replace multiple strings in 100mb file

DCCD

Hi, I wrote a script that replaces multiple strings in an XML file. It works fine, but it is very slow!

I've tried StringReplace, _ReplaceStringInFile, and StringRegExpReplace; they are all equally slow.

The file needs about 8000 replacements.

Any help would be greatly appreciated.

#include <File.au3>
$path = @ScriptDir & '\xmlfo.xml'
$OXML = FileOpen($path, 256)
$XML = FileRead($OXML)

$term = 'post'
$nofr = 1
Local $aArray = StringRegExp($XML, '(?s)<entry[^>]*>.*?</entry>', 3)
FileClose($OXML)
$XL = $XML
If Not @error Then
    For $i = 0 To UBound($aArray) - 1
        ;get data start
        ;ConsoleWrite ( $aArray[0] &' '&$i& @CRLF)
        $date = StringRegExp($aArray[$i], '(?i)<published>(.*?)</published>', 3)
        If @error Then
            $date = StringRegExp("date err", "(.{33,}?(?:\s)|.+)", 3)
        ElseIf Not @error Then
            ;ConsoleWrite($date[0] & ' ' & $i & @CRLF)
        EndIf
        $kind = StringRegExp($aArray[$i], '(?i)<category>(.*?)</category>', 3)
        If @error Then
            $kind = StringRegExp("kind err", "(.{33,}?(?:\s)|.+)", 3)
            ;ConsoleWrite ( $kind[0] &' '&$i& @CRLF)
        ElseIf Not @error Then
            ;ConsoleWrite ( $kind[0] &' '&$i& @CRLF)
        EndIf
        If $kind[0] = $term And Data(getdate($date[0], 'year'), getdate($date[0], 'month')) = True Then

            _ReplaceStringInFile($path, $aArray[$i], '')

            If Not @error Then
                ;MsgBox(16,'',$XL)
                ConsoleWrite($nofr & ' ' & $i & @CRLF)
                $nofr = $nofr + 1
            EndIf
            ;FileDelete(@ScriptDir & '\XML_output.xml')
            ;FileWrite (@ScriptDir & '\XML_output.xml', StringToBinary ( StringReplace($temp, $aArray[$i], "") , 4) )
        Else
            ConsoleWrite ('err0x0'& @CRLF)
        EndIf
    Next
EndIf
Edited by DCCD

SmOke_N

Well, you have a huge issue with loading and unloading 100 MB into memory over and over.

Every call to _ReplaceStringInFile opens the file twice.

So... 2 suggestions I can think of.

1.  Ditch _ReplaceStringInFile() and just read the file into memory once, enumerate each line, keep what you want, and remove what you don't. This requires a second string to write back to the file; I say a second string because _ArrayDelete ReDims the array every time.

2.  Read the file in chunks and repeat step 1 for each chunk.
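
Suggestion 1 could look roughly like the sketch below. It is untested on a file this size and rests on several assumptions: _ShouldRemove() is a hypothetical stand-in for the poster's own category/date test, the output path xmlfo_out.xml is made up, and entries are located with plain StringInStr searches for '<entry' and '</entry>' rather than a regex. Each kept piece is appended to a second string, so the file is read once and written once.

```autoit
#include <FileConstants.au3>

Local $sPath = @ScriptDir & '\xmlfo.xml'
Local $sOutPath = @ScriptDir & '\xmlfo_out.xml' ; hypothetical output file

; Read the whole file once.
Local $hIn = FileOpen($sPath, $FO_READ + $FO_UTF8_NOBOM)
If $hIn = -1 Then Exit 1
Local $sXML = FileRead($hIn)
FileClose($hIn)

Local $sOpen = '<entry', $sClose = '</entry>'
Local $sOut = '', $iPos = 1, $iStart, $iEnd, $sEntry

While 1
    ; Find the next entry at or after $iPos (case-sensitive search).
    $iStart = StringInStr($sXML, $sOpen, 1, 1, $iPos)
    If $iStart = 0 Then ExitLoop
    $iEnd = StringInStr($sXML, $sClose, 1, 1, $iStart)
    If $iEnd = 0 Then ExitLoop
    $iEnd += StringLen($sClose) ; first position after the entry

    ; Copy the text between the previous entry and this one.
    If $iStart > $iPos Then $sOut &= StringMid($sXML, $iPos, $iStart - $iPos)

    ; Keep the entry unless the filter says to drop it.
    $sEntry = StringMid($sXML, $iStart, $iEnd - $iStart)
    If Not _ShouldRemove($sEntry) Then $sOut &= $sEntry
    $iPos = $iEnd
WEnd

; Copy whatever follows the last entry.
$sOut &= StringMid($sXML, $iPos)

; Write the result once, to a separate file so the original stays intact.
Local $hOut = FileOpen($sOutPath, $FO_OVERWRITE + $FO_UTF8_NOBOM)
If $hOut = -1 Then Exit 1
FileWrite($hOut, $sOut)
FileClose($hOut)

; Hypothetical filter: replace the body with the real category/date check.
Func _ShouldRemove($sEntry)
    Return StringRegExp($sEntry, '(?i)<category>post</category>') = 1
EndFunc
```

Unlike calling StringReplace once per entry, this walks the string a single time from front to back, and it never searches the whole file for a 500-character entry text.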

Edit:

If this is some type of database script, sqlite would make a lot more sense.

Edited by SmOke_N

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.

kylomas

DCCD,

See jdelaney's sig for working with XML files directly. 

kylomas



"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill

jguinch

This: StringRegExp("date err", "(.{33,}?(?:\s)|.+)", 3)

and this: StringRegExp("kind err", "(.{33,}?(?:\s)|.+)", 3)

make no sense...

Can you post a sample of your XML file, and explain exactly what you want to replace, and with what?
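
As far as I can tell, each of those two fallback calls only produces a one-element array holding the literal text ("date err" or "kind err"): the string is too short for the first alternative, so .+ matches it whole. A simpler way to handle a missing tag, sketched here with two made-up sample entries, is to check @error right after the call and skip the entry with ContinueLoop:

```autoit
#include <StringConstants.au3>

; Two made-up entries for illustration; the second has no <published> tag.
Local $aEntries[2] = [ _
        '<entry><published>2014-01-02</published><category>post</category></entry>', _
        '<entry><category>post</category></entry>']
Local $aDate

For $i = 0 To UBound($aEntries) - 1
    $aDate = StringRegExp($aEntries[$i], '(?i)<published>(.*?)</published>', $STR_REGEXPARRAYGLOBALMATCH)
    If @error Then
        ; No match: skip this entry instead of inventing a placeholder value.
        ConsoleWrite('entry ' & $i & ': no <published> tag, skipped' & @CRLF)
        ContinueLoop
    EndIf
    ConsoleWrite('entry ' & $i & ': ' & $aDate[0] & @CRLF)
Next
```

The @error check has to come directly after the StringRegExp call, because the next function call resets it.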

SmOke_N

You're going to have us guess without your code and an example file of what you've tried, aren't you :( ... ?

DCCD

Each text string that needs to be replaced may contain more than 500 characters.

SmOke_N

I'm sorry, I don't see the relevance to your statement/reply.

 

Edit:

This should speed up your script dramatically.

#include <File.au3>
$path = @ScriptDir & '\xmlfo.xml'
$OXML = FileOpen($path, 256)
$XML = FileRead($OXML)
FileClose($OXML)

$term = 'post'
$nofr = 1
Local $aArray = StringRegExp($XML, '(?s)<entry[^>]*>.*?</entry>', 3)

If Not @error Then
    For $i = 0 To UBound($aArray) - 1
        ;get data start
        ;ConsoleWrite ( $aArray[0] &' '&$i& @CRLF)
        $date = StringRegExp($aArray[$i], '(?i)<published>(.*?)</published>', 3)
        If @error Then
            $date = StringRegExp("date err", "(.{33,}?(?:\s)|.+)", 3)
        ElseIf Not @error Then
            ;ConsoleWrite($date[0] & ' ' & $i & @CRLF)
        EndIf
        $kind = StringRegExp($aArray[$i], '(?i)<category>(.*?)</category>', 3)
        If @error Then
            $kind = StringRegExp("kind err", "(.{33,}?(?:\s)|.+)", 3)
            ;ConsoleWrite ( $kind[0] &' '&$i& @CRLF)
        ElseIf Not @error Then
            ;ConsoleWrite ( $kind[0] &' '&$i& @CRLF)
        EndIf
        If $kind[0] = $term And Data(getdate($date[0], 'year'), getdate($date[0], 'month')) = True Then
            
            $XML = StringReplace($XML, $aArray[$i], '')
            ;_ReplaceStringInFile($path, $aArray[$i], '')

            If Not @error Then
                ;MsgBox(16,'',$XL)
                ConsoleWrite($nofr & ' ' & $i & @CRLF)
                $nofr = $nofr + 1
            EndIf
            ;FileDelete(@ScriptDir & '\XML_output.xml')
            ;FileWrite (@ScriptDir & '\XML_output.xml', StringToBinary ( StringReplace($temp, $aArray[$i], "") , 4) )
        Else
            ConsoleWrite ('err0x0'& @CRLF)
        EndIf
    Next
EndIf

Global $ghOpen = FileOpen($path, $FO_UTF8_NOBOM + $FO_OVERWRITE)
FileWrite($ghOpen, $XML)
FileClose($ghOpen)

Here, as suggested before, we open the file, read it, and write to it only once.

Your version opened the file, read it into memory, and wrote it back once per loop iteration.

One thing is different: the FileOpen at the bottom of the script. You never told _ReplaceStringInFile how to write the data back to the file, so it wrote it back the default way. I added $FO_UTF8_NOBOM strictly because that's how you opened the file (mode 256) in your code example.

So you may want to back up your XML file before using this code (just FYI).
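
A minimal sketch of that backup step, assuming a backup name of xmlfo.xml.bak (my choice, not from the thread). FileCopy with $FC_OVERWRITE replaces any earlier backup, and the script stops if the copy fails, so the overwrite never runs without a backup in place.

```autoit
#include <FileConstants.au3>

Local $sPath = @ScriptDir & '\xmlfo.xml'
Local $sBackup = $sPath & '.bak' ; hypothetical backup name

; FileCopy returns 1 on success and 0 on failure.
If Not FileCopy($sPath, $sBackup, $FC_OVERWRITE) Then
    ConsoleWrite('Backup failed, leaving ' & $sPath & ' untouched' & @CRLF)
    Exit 1
EndIf

; Only now is it safe to open $sPath with $FO_OVERWRITE and write the new content.
```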

Edited by SmOke_N
DCCD

@SmOke_N, Thank you for all your help  ^_^ and I apologize for the late response :sweating:
