Jump to content
Sign in to follow this  
DCCD

replace multiple strings in 100mb file

Recommended Posts

Hi, i wrote a script that can replace multiple strings in a xml file works fine but so slow!

I've used StringReplace ,_ReplaceStringInFile, StringRegExpReplace, all the same very slow,.

The number of replacements in the file about 8000

Any help would be greatly appreciated

#include <File.au3>
$path = @ScriptDir & '\xmlfo.xml'
$OXML = FileOpen($path, 256)
$XML = FileRead($OXML)

$term = 'post'
$nofr = 1
Local $aArray = StringRegExp($XML, '(?s)<entry[^>]*>.*?</entry>', 3)
FileClose($OXML)
$XL = $XML
If Not @error Then
    For $i = 0 To UBound($aArray) - 1
        ;get data start
        ;ConsoleWrite ( $aArray[0] &' '&$i& @CRLF)
        $date = StringRegExp($aArray[$i], '(?i)<published>(.*?)</published>', 3)
        If @error Then
            $date = StringRegExp("date err", "(.{33,}?(?:\s)|.+)", 3)
        ElseIf Not @error Then
            ;ConsoleWrite($date[0] & ' ' & $i & @CRLF)
        EndIf
        $kind = StringRegExp($aArray[$i], '(?i)<category>(.*?)</category>', 3)
        If @error Then
            $kind = StringRegExp("kind err", "(.{33,}?(?:\s)|.+)", 3)
            ;ConsoleWrite ( $kind[0] &' '&$i& @CRLF)
        ElseIf Not @error Then
            ;ConsoleWrite ( $kind[0] &' '&$i& @CRLF)
        EndIf
        If $kind[0] = $term And Data(getdate($date[0], 'year'), getdate($date[0], 'month')) = True Then

            _ReplaceStringInFile($path, $aArray[$i], '')

            If Not @error Then
                ;MsgBox(16,'',$XL)
                ConsoleWrite($nofr & ' ' & $i & @CRLF)
                $nofr = $nofr + 1
            EndIf
            ;FileDelete(@ScriptDir & '\XML_output.xml')
            ;FileWrite (@ScriptDir & '\XML_output.xml', StringToBinary ( StringReplace($temp, $aArray[$i], "") , 4) )
        Else
            ConsoleWrite ('err0x0'& @CRLF)
        EndIf
    Next
EndIf
Edited by DCCD

Share this post


Link to post
Share on other sites

Well, you have a huge issue with loading and unloading 100mb's into memory over and over.

Every call to _ReplaceStringInFile opens the file twice.

So... 2 suggestions I can think of.

1.  Ditch _ReplaceStringInFile() and just read the file into memory once, enum each line, keep what you want, remove what you don't (would require a second string to write back to the file, I say a second string because _ArrayDelete ReDims the array every time).

2.  Read the file into chunks and repeat step 1.

Edit:

If this is some type of database script, sqlite would make a lot more sense.

Edited by SmOke_N

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.

Share this post


Link to post
Share on other sites

This : StringRegExp("date err", "(.{33,}?(?:s)|.+)", 3)

and this : StringRegExp("kind err", "(.{33,}?(?:s)|.+)", 3)

has not sense...

Can you post a sample of your XML file, and explain us what exactly you want to replace by what ?

Share this post


Link to post
Share on other sites

You're going to have us guess without your code and an example file of what you've tried aren't you :( ... ?


Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.

Share this post


Link to post
Share on other sites

Well, you have a huge issue with loading and unloading 100mb's into memory over and over.

Every call to _ReplaceStringInFile opens the file twice.

So... 2 suggestions I can think of.

1.  Ditch _ReplaceStringInFile() and just read the file into memory once, enum each line, keep what you want, remove what you don't (would require a second string to write back to the file, I say a second string because _ArrayDelete ReDims the array every time).

2.  Read the file into chunks and repeat step 1.

Edit:

If this is some type of database script, sqlite would make a lot more sense.

 

each text string need to be replaced may contain more than 500 characters/numbers.

Share this post


Link to post
Share on other sites

I'm sorry, I don't see the relevance to your statement/reply.

 

Edit:

This would speed up your script exponentially.

#include <File.au3>
$path = @ScriptDir & '\xmlfo.xml'
$OXML = FileOpen($path, 256)
$XML = FileRead($OXML)
FileClose($OXML)

$term = 'post'
$nofr = 1
Local $aArray = StringRegExp($XML, '(?s)<entry[^>]*>.*?</entry>', 3)

If Not @error Then
    For $i = 0 To UBound($aArray) - 1
        ;get data start
        ;ConsoleWrite ( $aArray[0] &' '&$i& @CRLF)
        $date = StringRegExp($aArray[$i], '(?i)<published>(.*?)</published>', 3)
        If @error Then
            $date = StringRegExp("date err", "(.{33,}?(?:\s)|.+)", 3)
        ElseIf Not @error Then
            ;ConsoleWrite($date[0] & ' ' & $i & @CRLF)
        EndIf
        $kind = StringRegExp($aArray[$i], '(?i)<category>(.*?)</category>', 3)
        If @error Then
            $kind = StringRegExp("kind err", "(.{33,}?(?:\s)|.+)", 3)
            ;ConsoleWrite ( $kind[0] &' '&$i& @CRLF)
        ElseIf Not @error Then
            ;ConsoleWrite ( $kind[0] &' '&$i& @CRLF)
        EndIf
        If $kind[0] = $term And Data(getdate($date[0], 'year'), getdate($date[0], 'month')) = True Then
            
            $XML = StringReplace($XML, $aArray[$i], '')
            ;_ReplaceStringInFile($path, $aArray[$i], '')

            If Not @error Then
                ;MsgBox(16,'',$XL)
                ConsoleWrite($nofr & ' ' & $i & @CRLF)
                $nofr = $nofr + 1
            EndIf
            ;FileDelete(@ScriptDir & '\XML_output.xml')
            ;FileWrite (@ScriptDir & '\XML_output.xml', StringToBinary ( StringReplace($temp, $aArray[$i], "") , 4) )
        Else
            ConsoleWrite ('err0x0'& @CRLF)
        EndIf
    Next
EndIf

Global $ghOpen = FileOpen($path, $FO_UTF8_NOBOM + $FO_OVERWRITE)
FileWrite($ghOpen, $XML)
FileClose($ghOpen)

Here, as suggested before, we are only opening the file, reading the file, and writing to the file 1 time.

Your way, it was opening, reading to memory, writing as many times as the loop was long.

One thing is different, the FileOpen at the bottom of the script, you never told _ReplaceStringInFile how to write the data back to the file, so it was writing it regularly, I added $FO_UTF8_NOBOM strictly because that's how you opened it before in your code example.

So you may want to backup your xml file before using this code (just FYI).

Edited by SmOke_N

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.

Share this post


Link to post
Share on other sites

I'm sorry, I don't see the relevance to your statement/reply.

 

Edit:

This would speed up your script exponentially.

#include <File.au3>
$path = @ScriptDir & '\xmlfo.xml'
$OXML = FileOpen($path, 256)
$XML = FileRead($OXML)
FileClose($OXML)

$term = 'post'
$nofr = 1
Local $aArray = StringRegExp($XML, '(?s)<entry[^>]*>.*?</entry>', 3)

If Not @error Then
    For $i = 0 To UBound($aArray) - 1
        ;get data start
        ;ConsoleWrite ( $aArray[0] &' '&$i& @CRLF)
        $date = StringRegExp($aArray[$i], '(?i)<published>(.*?)</published>', 3)
        If @error Then
            $date = StringRegExp("date err", "(.{33,}?(?:\s)|.+)", 3)
        ElseIf Not @error Then
            ;ConsoleWrite($date[0] & ' ' & $i & @CRLF)
        EndIf
        $kind = StringRegExp($aArray[$i], '(?i)<category>(.*?)</category>', 3)
        If @error Then
            $kind = StringRegExp("kind err", "(.{33,}?(?:\s)|.+)", 3)
            ;ConsoleWrite ( $kind[0] &' '&$i& @CRLF)
        ElseIf Not @error Then
            ;ConsoleWrite ( $kind[0] &' '&$i& @CRLF)
        EndIf
        If $kind[0] = $term And Data(getdate($date[0], 'year'), getdate($date[0], 'month')) = True Then
            
            $XML = StringReplace($XML, $aArray[$i], '')
            ;_ReplaceStringInFile($path, $aArray[$i], '')

            If Not @error Then
                ;MsgBox(16,'',$XL)
                ConsoleWrite($nofr & ' ' & $i & @CRLF)
                $nofr = $nofr + 1
            EndIf
            ;FileDelete(@ScriptDir & '\XML_output.xml')
            ;FileWrite (@ScriptDir & '\XML_output.xml', StringToBinary ( StringReplace($temp, $aArray[$i], "") , 4) )
        Else
            ConsoleWrite ('err0x0'& @CRLF)
        EndIf
    Next
EndIf

Global $ghOpen = FileOpen($path, $FO_UTF8_NOBOM + $FO_OVERWRITE)
FileWrite($ghOpen, $XML)
FileClose($ghOpen)

Here, as suggested before, we are only opening the file, reading the file, and writing to the file 1 time.

Your way, it was opening, reading to memory, writing as many times as the loop was long.

One thing is different, the FileOpen at the bottom of the script, you never told _ReplaceStringInFile how to write the data back to the file, so it was writing it regularly, I added $FO_UTF8_NOBOM strictly because that's how you opened it before in your code example.

So you may want to backup your xml file before using this code (just FYI).

 

@SmOke_N, Thank you for all your help  ^_^ and I apologize for the late response :sweating:

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
Sign in to follow this  

  • Recently Browsing   0 members

    No registered users viewing this page.

  • Similar Content

    • By goku200
      I'm having some issues with writing to column C when an element is found. It works on C2 but it does not continue to C3, C4, C5, etc..... I'm wanting to write "test" if the element //input[@id='username'] is found  $someUser = _WD_FindElement($sSession, $_WD_LOCATOR_ByXPath, "//input[@id='username']"). I have attached my HTML and Excel file along with my AutoIt code below:
      #Include "wd_core.au3" #Include "wd_helper.au3" #Include "wd_core.au3" #Include "File.au3" #Include "Array.au3" #Include "Excel.au3" Local $sDesiredCapabilities, $sSession _WD_Startup() $Ssession = _WD_CreateSession($sDesiredCapabilities) _WD_Navigate($sSession, "https://127.0.0.1/test.html") _WD_LoadWait($sSession) Local $oExcel = _Excel_Open() Local $oWorkbook = _Excel_BookOpen($oExcel, "C:\Users\<Username>\Downloads\test.xlsx") Local $aArrayTest1 = _Excel_RangeRead($oWorkbook, 1, $oWorkbook.ActiveSheet.Usedrange.Columns("A:A")) Local $aArrayTest2 = _Excel_RangeRead($oWorkbook, 1, $oWorkbook.ActiveSheet.Usedrange.Columns("B:B")) For $i = 0 To UBound($aArrayTest1) - 1 _WD_Navigate($Ssession, $aArrayTest1[$i]) _WD_LoadWait($sSession) $someUser = _WD_FindElement($sSession, $_WD_LOCATOR_ByXPath, "//input[@id='username']") _WD_SetElementValue($sSession, $someUser, $aArrayTest2[$i]) Local $sElement = _WD_FindElement($sSession, $_WD_LOCATOR_ByXPath, "//input[@type='submit'][@value='Submit']") _WD_ElementAction($sSession, $sElement, 'click') _WD_LoadWait($sSession) Sleep(5000) If $someUser Then Local $aArray2D[2] = ["test"] _Excel_RangeWrite($oWorkbook, $oWorkbook.ActiveSheet, $aArray2D, "C2") EndIf Next Func SetupChrome() _WD_Option('Driver', 'chromedriver.exe') _WD_Option('Port', 9515) _WD_Option('DriverParams', '--log-path="' & @ScriptDir & '\chrome.log"') $sDesiredCapabilities = '{"capabilities": {"alwaysMatch": {"goog:chromeOptions": {"w3c": true, "args":["start-maximized","disable-infobars"]}}}}' EndFunc ;==>SetupChrome  
      test.html test.xlsx
    • By roeselpi
      hello again,
      it has been a long time since i have been here and a long time since i last used autoit. ever so often when the time allows me to, then i follow up on an idea that i had a long time ago. i have done all the work on paper but now it is up to writing it in autoit and i keep stumbling over many little issues here and there. sometimes after a few days i will try again and get a step further but sometimes it just will not help no matter how long i try and think about a solution. for most of you it will be the basics but for me it is not all that easy, but at least i give it a try.
      right, down to business:
      here is my code:
      #include <MsgBoxConstants.au3> #include <StringConstants.au3> #include <Array.au3> #include <String.au3> ; ; PART 1: define replacements and check with msgbox ; Global $y, $z $y = "Yes" $z = "No" MsgBox(0,"replacements", $y & @CRLF & $z) ;the replacements in a message box ; ; PART 2: set the texts and check via console and msgbox ; Global $my1string = "abab" ;the first specified text MsgBox(0,"my1string", $my1string) ;the message box to output the first specified text Global $my2string = "icic" ;the second specified text MsgBox(0,"my2string", $my2string) ;the message box to output the second specified text ; ; PART 3: transform the strings to individual arrays ; $my1array = StringSplit($my1string, "") $my1array[0] = "" _ArrayDelete($my1array, 0) _ArrayDisplay($my1array, "my1array") ;the display of the first specified array $my2array = StringSplit($my2string, "") $my2array[0] = "" _ArrayDelete($my2array, 0) _ArrayDisplay($my2array, "my2array") ;the display of the first specified array ; ; PART 4: create an empty array for filling ; Global $OutputArray[4] $OutputArray[0] = "" _ArrayDisplay($OutputArray, "OutputArray") ;the display of the first specified array ; ; PART 5: compare & fill empty OutputArray with data after evaluation ; Global $i, $j, $k For $i = 0 to UBound($my1array) -1 For $j = 0 to UBound($my2array) -1 For $k = 0 to UBound($OutputArray) -1 If $my1array[$i] = "a" And $my2array[$j] = "i" Then $OutputArray[$k] = $y Else $OutputArray[$k] = $z EndIf Next Next Next _ArrayDisplay($OutputArray, "OutputArray") ;the display of the Newly filled Array In "Part 2" i make a string that is converted to an array in "Part 3" ... Now, I know that "a" and "i" are always in the exact same spot in both arrays and so i wanted to compare this and make a further array to document my findings by saying "yes" or "no" ... however my new array keeps saying just "no" allthough i can clearly see and know that it should say:
      yes no yes no my guess is that there is something wrong within my for-loops and that the counting is somehow "off" i guess that when the first for-loop is finished it reaches the second whilst the second for-loop is checking the first which would explain why it always says "no" instead of seeing the obvious.
      so my question would be: what is wrong with my for-loop? or where am i making an error that ultimately gives me the wrong results?
      help is much appreciated.
      kind regards
      roeselpi
       
       
      PS: sorry for my not so great english spelling ... stupid german sitting here trying out intermediate english skills.
    • By Jangal
      Hello friends
      This app is slow
      How to increase its speed?
       
      #include <Array.au3> #include <StringConstants.au3> #include <File.au3> #include <String.au3> Global $aWord[][2]  = [[1, "google"],[2,"hello"]]


        Global $sFileName = @ScriptDir & "\1.txt" ; 2MB Text File Local $sFileRead = FileRead($sFileName) Local $res = StringRegExp($sFileRead, "(*UCP)\b[\pL\d]{2,}", 3) _ArrayDisplay($res)   for $sWord in $res     $iIndex = _ArraySearch($aWord, $sWord, 0, 0, 0, 0, 1, 2)     ;MsgBox(0,0,$iIndex)     if $iIndex == -1 Then         Local $aFill = [[0,$sWord]]         _ArrayAdd($aWord,$aFill)        ;      Else         $aWord[$iIndex][0] +=1     EndIf   Next _ArrayDisplay($aWord)

      1.txt
    • By Colduction
      Hi dear friends!, i'm sorry for creating a new thread (a new problem), i have over than 9 lists that i want to combine them to be this (in this example, there are 3 test files):


      I've written a little code for splitting main information, but i really confused how to make results as "Output.txt", here is that code:
       
      $sRegex_1 = StringRegExp(FileRead("1.txt"), '(?s:(?<=\=\=\r\n)(.*?)(?=\r\n\=\=))', 3) $sRegex_2 = StringRegExp(FileRead("2.txt"), '(?s:(?<=\=\=\r\n)(.*?)(?=\r\n\=\=))', 3) $sRegex_3 = StringRegExp(FileRead("3.txt"), '(?s:(?<=\=\=\r\n)(.*?)(?=\r\n\=\=))', 3) For $i = 0 To UBound($sRegex_1) - 1 ConsoleWrite($sRegex_1[$i] & @CRLF) For $j = 0 To UBound($sRegex_2) - 1 ConsoleWrite($sRegex_2[$j] & @CRLF) For $k = 0 To UBound($sRegex_3) - 1 ConsoleWrite($sRegex_3[$k] & @CRLF) Next Next Next  
    • By nacerbaaziz
      hello evrybody
      here is an example about how to split your texts using a delimiter with the ability to select how much of delimiters shows in each colum  with $i_number
      e.g
      you have a long text and you want to split it in an array
      that evry colum have a number (n) of lines
      i made a function that do that for you
      just call it with a three params
      $s_text
      your text
      $i_number
      the number that you want to put in each col
      $s_siparator
      the siparator
      default is "|"
      here is the function with example
      i hope that it will be useful for you
       
      ****
       
      #include <Array.au3> $s_txt = "some text1some text2|some text3|some text4|some text5|some text6" $array = splitText($s_txt, 2) _ArrayDisplay($array) Func splitText($s_text, $i_number, $s_siparator = "|") Local $a_TXT = StringSplit($s_text, $s_siparator) Local $a_Return[$a_TXT[0] + 1] If ($a_TXT[0] <= $i_number) Or ($i_number <= 0) Then ReDim $a_Return[2] $a_Return[0] = 1 $a_Return[1] = $s_text Return $a_Return EndIf Local $i_Processed = 1, $i_arrayProcessed = 1 Do For $i = $i_Processed To ($i_Processed + $i_number) - 1 If ($a_TXT[0] < $i) Then ExitLoop If Not ($a_Return[$i_arrayProcessed]) Then $a_Return[$i_arrayProcessed] = $a_TXT[$i] Else $a_Return[$i_arrayProcessed] &= $s_siparator & $a_TXT[$i] EndIf $i_Processed += 1 Next $i_arrayProcessed += 1 Until ($a_TXT[0] < $i_Processed) ReDim $a_Return[$i_arrayProcessed] $a_Return[0] = $i_arrayProcessed - 1 Return $a_Return EndFunc ;==>splitText
      accept my greetings
      thanks to
      @Dan_555
      for his notes
       
×
×
  • Create New...