replace multiple strings in 100mb file

DCCD

Hi, I wrote a script that replaces multiple strings in an XML file. It works fine, but it is very slow!

I've tried StringReplace, _ReplaceStringInFile and StringRegExpReplace; all of them are equally slow.

The file needs about 8000 replacements.

Any help would be greatly appreciated.

#include <File.au3>
$path = @ScriptDir & '\xmlfo.xml'
$OXML = FileOpen($path, 256)
$XML = FileRead($OXML)

$term = 'post'
$nofr = 1
Local $aArray = StringRegExp($XML, '(?s)<entry[^>]*>.*?</entry>', 3)
FileClose($OXML)
$XL = $XML
If Not @error Then
    For $i = 0 To UBound($aArray) - 1
        ;get data start
        ;ConsoleWrite ( $aArray[0] &' '&$i& @CRLF)
        $date = StringRegExp($aArray[$i], '(?i)<published>(.*?)</published>', 3)
        If @error Then
            $date = StringRegExp("date err", "(.{33,}?(?:\s)|.+)", 3)
        ElseIf Not @error Then
            ;ConsoleWrite($date[0] & ' ' & $i & @CRLF)
        EndIf
        $kind = StringRegExp($aArray[$i], '(?i)<category>(.*?)</category>', 3)
        If @error Then
            $kind = StringRegExp("kind err", "(.{33,}?(?:\s)|.+)", 3)
            ;ConsoleWrite ( $kind[0] &' '&$i& @CRLF)
        ElseIf Not @error Then
            ;ConsoleWrite ( $kind[0] &' '&$i& @CRLF)
        EndIf
        If $kind[0] = $term And Data(getdate($date[0], 'year'), getdate($date[0], 'month')) = True Then

            _ReplaceStringInFile($path, $aArray[$i], '')

            If Not @error Then
                ;MsgBox(16,'',$XL)
                ConsoleWrite($nofr & ' ' & $i & @CRLF)
                $nofr = $nofr + 1
            EndIf
            ;FileDelete(@ScriptDir & '\XML_output.xml')
            ;FileWrite (@ScriptDir & '\XML_output.xml', StringToBinary ( StringReplace($temp, $aArray[$i], "") , 4) )
        Else
            ConsoleWrite ('err0x0'& @CRLF)
        EndIf
    Next
EndIf
SmOke_N

Well, you have a huge issue: you are loading and unloading 100 MB into memory over and over.

Every call to _ReplaceStringInFile opens the file twice.

So... 2 suggestions I can think of.

1.  Ditch _ReplaceStringInFile() and just read the file into memory once, enumerate each line, keep what you want, and drop what you don't. This requires a second string to write back to the file; I say a second string because _ArrayDelete ReDims the array every time it is called.

2.  Read the file in chunks and repeat step 1 for each chunk.

Edit:

If this is some type of database script, SQLite would make a lot more sense.
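
Suggestion 1 above could be sketched roughly like this. It is only a sketch, not tested against a 100 MB file: _FilterEntries and _KeepEntry are made-up names, and _KeepEntry is a stand-in that only checks the category (the date test from the original script is left out because its helper functions were not posted).

```autoit
; Sketch: one read, one pass, one output string.
; _FilterEntries() and _KeepEntry() are hypothetical names, not AutoIt UDFs.

Func _FilterEntries(Const ByRef $sXml)
    Local $aEntries = StringRegExp($sXml, '(?s)<entry[^>]*>.*?</entry>', 3)
    If @error Then Return $sXml ; no <entry> blocks found, nothing to remove

    Local $sOut = ''   ; the "second string" that will be written back
    Local $iPos = 1    ; 1-based offset of the first character not yet copied
    Local $iStart, $iLen

    For $i = 0 To UBound($aEntries) - 1
        ; Matches come back in document order, so searching from $iPos
        ; finds this occurrence and not an earlier one.
        $iStart = StringInStr($sXml, $aEntries[$i], 1, 1, $iPos)
        If $iStart = 0 Then ContinueLoop
        $iLen = StringLen($aEntries[$i])

        ; Copy the text between the previous entry and this one.
        $sOut &= StringMid($sXml, $iPos, $iStart - $iPos)
        ; Copy the entry itself only when it should stay in the file.
        If _KeepEntry($aEntries[$i]) Then $sOut &= $aEntries[$i]

        $iPos = $iStart + $iLen
    Next

    ; Copy whatever follows the last entry.
    Return $sOut & StringMid($sXml, $iPos)
EndFunc

; Hypothetical stand-in for the real test: keep everything that is
; not in the 'post' category.
Func _KeepEntry($sEntry)
    Local $aKind = StringRegExp($sEntry, '(?i)<category>(.*?)</category>', 3)
    If @error Then Return True ; no category tag: leave the entry alone
    Return $aKind[0] <> 'post'
EndFunc
```

The point of tracking $iPos is that the big string is walked once from start to end, instead of being searched from the beginning for every one of the ~8000 entries.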

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.

kylomas

DCCD,

See jdelaney's sig for working with XML files directly. 

kylomas


"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill

jguinch

This: StringRegExp("date err", "(.{33,}?(?:\s)|.+)", 3)

and this: StringRegExp("kind err", "(.{33,}?(?:\s)|.+)", 3)

make no sense...

Can you post a sample of your XML file, and explain to us exactly what you want to replace, and with what?
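
For what it's worth, the two calls quoted above only seem to exist to produce a one-element array holding a placeholder string. A simpler way to get the same effect might look like the sketch below; _GetTag is a hypothetical helper name, and the sample entry is invented for illustration.

```autoit
; Returns the text of the first <$sTag>...</$sTag> pair in $sEntry,
; or $sDefault when the tag is missing.
; _GetTag is a hypothetical helper, not part of the AutoIt UDF library.
Func _GetTag($sEntry, $sTag, $sDefault = '')
    ; Flag 1 returns the capture groups of the first match only.
    Local $aMatch = StringRegExp($sEntry, '(?i)<' & $sTag & '>(.*?)</' & $sTag & '>', 1)
    If @error Then Return $sDefault
    Return $aMatch[0]
EndFunc

; Invented sample entry, for illustration only.
Local $sEntry = '<entry><published>2013-05-01</published></entry>'
ConsoleWrite(_GetTag($sEntry, 'published', 'date err') & @CRLF)
ConsoleWrite(_GetTag($sEntry, 'category', 'kind err') & @CRLF)
```

With the sample entry, the first call should return the published date and the second should fall back to 'kind err', because there is no category tag. Inside the loop it may be cleaner still to ContinueLoop when a tag is missing rather than carry a placeholder around.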

SmOke_N

You're going to have us guess without your code and an example file of what you've tried aren't you :( ... ?

DCCD

Each text string that needs to be replaced may contain more than 500 characters.

SmOke_N

I'm sorry, I don't see the relevance to your statement/reply.

 

Edit:

This should speed up your script dramatically.

#include <File.au3>
$path = @ScriptDir & '\xmlfo.xml'
$OXML = FileOpen($path, 256)
$XML = FileRead($OXML)
FileClose($OXML)

$term = 'post'
$nofr = 1
Local $aArray = StringRegExp($XML, '(?s)<entry[^>]*>.*?</entry>', 3)

If Not @error Then
    For $i = 0 To UBound($aArray) - 1
        ;get data start
        ;ConsoleWrite ( $aArray[0] &' '&$i& @CRLF)
        $date = StringRegExp($aArray[$i], '(?i)<published>(.*?)</published>', 3)
        If @error Then
            $date = StringRegExp("date err", "(.{33,}?(?:\s)|.+)", 3)
        ElseIf Not @error Then
            ;ConsoleWrite($date[0] & ' ' & $i & @CRLF)
        EndIf
        $kind = StringRegExp($aArray[$i], '(?i)<category>(.*?)</category>', 3)
        If @error Then
            $kind = StringRegExp("kind err", "(.{33,}?(?:\s)|.+)", 3)
            ;ConsoleWrite ( $kind[0] &' '&$i& @CRLF)
        ElseIf Not @error Then
            ;ConsoleWrite ( $kind[0] &' '&$i& @CRLF)
        EndIf
        If $kind[0] = $term And Data(getdate($date[0], 'year'), getdate($date[0], 'month')) = True Then
            
            $XML = StringReplace($XML, $aArray[$i], '')
            ;_ReplaceStringInFile($path, $aArray[$i], '')

            If @extended Then ; StringReplace reports the number of replacements in @extended
                ;MsgBox(16,'',$XL)
                ConsoleWrite($nofr & ' ' & $i & @CRLF)
                $nofr = $nofr + 1
            EndIf
            ;FileDelete(@ScriptDir & '\XML_output.xml')
            ;FileWrite (@ScriptDir & '\XML_output.xml', StringToBinary ( StringReplace($temp, $aArray[$i], "") , 4) )
        Else
            ConsoleWrite ('err0x0'& @CRLF)
        EndIf
    Next
EndIf

Global $ghOpen = FileOpen($path, $FO_UTF8_NOBOM + $FO_OVERWRITE)
FileWrite($ghOpen, $XML)
FileClose($ghOpen)

Here, as suggested before, we open the file, read it, and write to it only once.

Your way, it was opening the file, reading it into memory, and writing it back once for every replacement in the loop.

One thing is different: the FileOpen at the bottom of the script. You never told _ReplaceStringInFile how to write the data back to the file, so it was writing with its default encoding. I added $FO_UTF8_NOBOM strictly because that's how you opened the file in your code example.

So you may want to back up your XML file before using this code (just FYI).
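
A minimal sketch of that backup-then-write step, assuming a '.bak' copy next to the original is acceptable. _WriteWithBackup is a hypothetical helper name; FileCopy, FileOpen, FileWrite and the $FC_/$FO_ constants are the standard AutoIt ones.

```autoit
#include <FileConstants.au3>

; Hypothetical helper: back up $sPath, then overwrite it with $sData
; as UTF-8 without a BOM. Returns True on success, False with @error set otherwise.
Func _WriteWithBackup($sPath, $sData)
    ; Keep a copy next to the original; replace an older backup if present.
    If Not FileCopy($sPath, $sPath & '.bak', $FC_OVERWRITE) Then Return SetError(1, 0, False)

    ; Same encoding flag the file was read with, plus overwrite mode.
    Local $hFile = FileOpen($sPath, $FO_UTF8_NOBOM + $FO_OVERWRITE)
    If $hFile = -1 Then Return SetError(2, 0, False)

    Local $iOk = FileWrite($hFile, $sData)
    FileClose($hFile)
    If Not $iOk Then Return SetError(3, 0, False)
    Return True
EndFunc
```

Calling it as _WriteWithBackup($path, $XML) in place of the three lines at the bottom of the script would leave xmlfo.xml.bak behind in case the result is not what was expected.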

DCCD

@SmOke_N, Thank you for all your help  ^_^ and I apologize for the late response :sweating:
