Jump to content
Sign in to follow this  
DCCD

replace multiple strings in 100mb file

Recommended Posts

Hi, i wrote a script that can replace multiple strings in a xml file works fine but so slow!

I've used StringReplace ,_ReplaceStringInFile, StringRegExpReplace, all the same very slow,.

The number of replacements in the file about 8000

Any help would be greatly appreciated

#include <File.au3>
$path = @ScriptDir & '\xmlfo.xml'
$OXML = FileOpen($path, 256)
$XML = FileRead($OXML)

$term = 'post'
$nofr = 1
Local $aArray = StringRegExp($XML, '(?s)<entry[^>]*>.*?</entry>', 3)
FileClose($OXML)
$XL = $XML
If Not @error Then
    For $i = 0 To UBound($aArray) - 1
        ;get data start
        ;ConsoleWrite ( $aArray[0] &' '&$i& @CRLF)
        $date = StringRegExp($aArray[$i], '(?i)<published>(.*?)</published>', 3)
        If @error Then
            $date = StringRegExp("date err", "(.{33,}?(?:\s)|.+)", 3)
        ElseIf Not @error Then
            ;ConsoleWrite($date[0] & ' ' & $i & @CRLF)
        EndIf
        $kind = StringRegExp($aArray[$i], '(?i)<category>(.*?)</category>', 3)
        If @error Then
            $kind = StringRegExp("kind err", "(.{33,}?(?:\s)|.+)", 3)
            ;ConsoleWrite ( $kind[0] &' '&$i& @CRLF)
        ElseIf Not @error Then
            ;ConsoleWrite ( $kind[0] &' '&$i& @CRLF)
        EndIf
        If $kind[0] = $term And Data(getdate($date[0], 'year'), getdate($date[0], 'month')) = True Then

            _ReplaceStringInFile($path, $aArray[$i], '')

            If Not @error Then
                ;MsgBox(16,'',$XL)
                ConsoleWrite($nofr & ' ' & $i & @CRLF)
                $nofr = $nofr + 1
            EndIf
            ;FileDelete(@ScriptDir & '\XML_output.xml')
            ;FileWrite (@ScriptDir & '\XML_output.xml', StringToBinary ( StringReplace($temp, $aArray[$i], "") , 4) )
        Else
            ConsoleWrite ('err0x0'& @CRLF)
        EndIf
    Next
EndIf
Edited by DCCD

Share this post


Link to post
Share on other sites

Well, you have a huge issue with loading and unloading 100mb's into memory over and over.

Every call to _ReplaceStringInFile opens the file twice.

So... 2 suggestions I can think of.

1.  Ditch _ReplaceStringInFile() and just read the file into memory once, enum each line, keep what you want, remove what you don't (would require a second string to write back to the file, I say a second string because _ArrayDelete ReDims the array every time).

2.  Read the file into chunks and repeat step 1.

Edit:

If this is some type of database script, sqlite would make a lot more sense.

Edited by SmOke_N

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.

Share this post


Link to post
Share on other sites

This : StringRegExp("date err", "(.{33,}?(?:s)|.+)", 3)

and this : StringRegExp("kind err", "(.{33,}?(?:s)|.+)", 3)

has not sense...

Can you post a sample of your XML file, and explain us what exactly you want to replace by what ?

Share this post


Link to post
Share on other sites

You're going to have us guess without your code and an example file of what you've tried aren't you :( ... ?


Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.

Share this post


Link to post
Share on other sites

Well, you have a huge issue with loading and unloading 100mb's into memory over and over.

Every call to _ReplaceStringInFile opens the file twice.

So... 2 suggestions I can think of.

1.  Ditch _ReplaceStringInFile() and just read the file into memory once, enum each line, keep what you want, remove what you don't (would require a second string to write back to the file, I say a second string because _ArrayDelete ReDims the array every time).

2.  Read the file into chunks and repeat step 1.

Edit:

If this is some type of database script, sqlite would make a lot more sense.

 

each text string need to be replaced may contain more than 500 characters/numbers.

Share this post


Link to post
Share on other sites

I'm sorry, I don't see the relevance to your statement/reply.

 

Edit:

This would speed up your script exponentially.

#include <File.au3>
$path = @ScriptDir & '\xmlfo.xml'
$OXML = FileOpen($path, 256)
$XML = FileRead($OXML)
FileClose($OXML)

$term = 'post'
$nofr = 1
Local $aArray = StringRegExp($XML, '(?s)<entry[^>]*>.*?</entry>', 3)

If Not @error Then
    For $i = 0 To UBound($aArray) - 1
        ;get data start
        ;ConsoleWrite ( $aArray[0] &' '&$i& @CRLF)
        $date = StringRegExp($aArray[$i], '(?i)<published>(.*?)</published>', 3)
        If @error Then
            $date = StringRegExp("date err", "(.{33,}?(?:\s)|.+)", 3)
        ElseIf Not @error Then
            ;ConsoleWrite($date[0] & ' ' & $i & @CRLF)
        EndIf
        $kind = StringRegExp($aArray[$i], '(?i)<category>(.*?)</category>', 3)
        If @error Then
            $kind = StringRegExp("kind err", "(.{33,}?(?:\s)|.+)", 3)
            ;ConsoleWrite ( $kind[0] &' '&$i& @CRLF)
        ElseIf Not @error Then
            ;ConsoleWrite ( $kind[0] &' '&$i& @CRLF)
        EndIf
        If $kind[0] = $term And Data(getdate($date[0], 'year'), getdate($date[0], 'month')) = True Then
            
            $XML = StringReplace($XML, $aArray[$i], '')
            ;_ReplaceStringInFile($path, $aArray[$i], '')

            If Not @error Then
                ;MsgBox(16,'',$XL)
                ConsoleWrite($nofr & ' ' & $i & @CRLF)
                $nofr = $nofr + 1
            EndIf
            ;FileDelete(@ScriptDir & '\XML_output.xml')
            ;FileWrite (@ScriptDir & '\XML_output.xml', StringToBinary ( StringReplace($temp, $aArray[$i], "") , 4) )
        Else
            ConsoleWrite ('err0x0'& @CRLF)
        EndIf
    Next
EndIf

Global $ghOpen = FileOpen($path, $FO_UTF8_NOBOM + $FO_OVERWRITE)
FileWrite($ghOpen, $XML)
FileClose($ghOpen)

Here, as suggested before, we are only opening the file, reading the file, and writing to the file 1 time.

Your way, it was opening, reading to memory, writing as many times as the loop was long.

One thing is different, the FileOpen at the bottom of the script, you never told _ReplaceStringInFile how to write the data back to the file, so it was writing it regularly, I added $FO_UTF8_NOBOM strictly because that's how you opened it before in your code example.

So you may want to backup your xml file before using this code (just FYI).

Edited by SmOke_N

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.

Share this post


Link to post
Share on other sites

I'm sorry, I don't see the relevance to your statement/reply.

 

Edit:

This would speed up your script exponentially.

#include <File.au3>
$path = @ScriptDir & '\xmlfo.xml'
$OXML = FileOpen($path, 256)
$XML = FileRead($OXML)
FileClose($OXML)

$term = 'post'
$nofr = 1
Local $aArray = StringRegExp($XML, '(?s)<entry[^>]*>.*?</entry>', 3)

If Not @error Then
    For $i = 0 To UBound($aArray) - 1
        ;get data start
        ;ConsoleWrite ( $aArray[0] &' '&$i& @CRLF)
        $date = StringRegExp($aArray[$i], '(?i)<published>(.*?)</published>', 3)
        If @error Then
            $date = StringRegExp("date err", "(.{33,}?(?:\s)|.+)", 3)
        ElseIf Not @error Then
            ;ConsoleWrite($date[0] & ' ' & $i & @CRLF)
        EndIf
        $kind = StringRegExp($aArray[$i], '(?i)<category>(.*?)</category>', 3)
        If @error Then
            $kind = StringRegExp("kind err", "(.{33,}?(?:\s)|.+)", 3)
            ;ConsoleWrite ( $kind[0] &' '&$i& @CRLF)
        ElseIf Not @error Then
            ;ConsoleWrite ( $kind[0] &' '&$i& @CRLF)
        EndIf
        If $kind[0] = $term And Data(getdate($date[0], 'year'), getdate($date[0], 'month')) = True Then
            
            $XML = StringReplace($XML, $aArray[$i], '')
            ;_ReplaceStringInFile($path, $aArray[$i], '')

            If Not @error Then
                ;MsgBox(16,'',$XL)
                ConsoleWrite($nofr & ' ' & $i & @CRLF)
                $nofr = $nofr + 1
            EndIf
            ;FileDelete(@ScriptDir & '\XML_output.xml')
            ;FileWrite (@ScriptDir & '\XML_output.xml', StringToBinary ( StringReplace($temp, $aArray[$i], "") , 4) )
        Else
            ConsoleWrite ('err0x0'& @CRLF)
        EndIf
    Next
EndIf

Global $ghOpen = FileOpen($path, $FO_UTF8_NOBOM + $FO_OVERWRITE)
FileWrite($ghOpen, $XML)
FileClose($ghOpen)

Here, as suggested before, we are only opening the file, reading the file, and writing to the file 1 time.

Your way, it was opening, reading to memory, writing as many times as the loop was long.

One thing is different, the FileOpen at the bottom of the script, you never told _ReplaceStringInFile how to write the data back to the file, so it was writing it regularly, I added $FO_UTF8_NOBOM strictly because that's how you opened it before in your code example.

So you may want to backup your xml file before using this code (just FYI).

 

@SmOke_N, Thank you for all your help  ^_^ and I apologize for the late response :sweating:

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
Sign in to follow this  

  • Similar Content

    • By jmp
      I am adding labour charge to total paid amount using : 
      #include <IE.au3> #include <Array.au3> $oIE = _IEAttach ("Shop") $oTable = _IETableGetCollection ($oIE, 1) $aTableData3 = _IETableWriteToArray ($oTable) Local $sitem1 = $aTableData3[5][1] Local $sitem2 = $aTableData3[5][2] Local $lcharge = "10" ;add manualy using inputbox, becuase not generating online Local $atotPric = "Payable Total Price " Local $oTds = _IETagNameGetCollection($oIE, "td") For $oTd In $oTds If $oTd.Innertext = $atotPric Then $iatotPric = $oTd.NextElementSibling.innertext MsgBox (0, "2", $iatotPric) EndIf Next $irCtotal = StringFormat("%.2f", $sitem1 + $sitem2 + $lcharge) $crTotp = StringReplace(_IEBodyReadHTML($oIE), $iatotPric, $irCtotal) _IEBodyWriteHTML ($oIE, $crTotp) But, It was also changing Total price, I want to change only Payable Total Price.

    • By nacerbaaziz
      hello sirs
      i've some questions about StringRegExpReplace i hope you can help me
       
      i tried to make a function that give me the host of the url and other give me the url with out host
      for example i've this link
      https://www.example.com/vb/result.php
      i need the first give me the
      example.com
      and the other give me 
      /vb/result.php
      i find that
      $s_source = "https://www.google.com/vb/index.php" Local $s_Host = StringRegExpReplace($s_Source, '.*://(.*?)/.*', '\1') Local $s_Page = StringRegExpReplace($s_source, '.*://.*?(/.*)', '\1') msgBox(64, $s_Host, $s_Page)  
      but i found some problems i need your help to correct it
      first: when i get the host if the url has www i want to remove it
      second: if the url with out host did not have other things 
      i need the result to be ""
      e.g
      https://www.example.com
      the first i want it
      example.com
      and the second i want it to be ""
      i hope that you can help me
      thanks in advance
    • By jmp
      Script running good but error in line 7.
      When i run this script :
      #include <IE.au3> #include <Array.au3> $oIE = _IEAttach ("Shop") $oTable = _IETableGetCollection ($oIE, 1) $aTableData = _IETableWriteToArray ($oTable) For $inumber = 1 To UBound($aTableData) -1 $table = $aTableData[4][$inumber] MsgBox(0, "", $table) Next I got Error: array variable has incorrect number of subscripts or subscript dimension range exceeded
    • By nacerbaaziz
      Good morning guys, i hope that you're all well.
      guys, i have a problem and i hope that you can help me
      i've created an 3d array
      the array Contain a Categories info
      as folow
      $array[n][0][0] = Categorie name
      $array[n][0][1] = Categorie file path
      $array[n][0][2] = Categorie contents number
      $array[n][m][0] = link name
      $array[n][m][1] = link url
      $array[n][m][2] = link section name
      in my tool i want to add an option to delete a Category
      as you know the Categorie mean that must delete a region from the array
      when i tried to use _arrayDelete
      with the 2d array it work well
      but here i couldn't find any way to do that, can any one help me please?
      thanks in advance.
×
×
  • Create New...