
Posted

Hi, I have a text file with over 160k lines. I'm looking for a way to search it quickly.

I have about 40 pictures I'm searching for.

The text file is structured like this:  b.imageUrl = "/images//G_SS_Preview.png";

The data I have to search for is only "Preview.png", but searching for that alone gives multiple hits in this text document that I don't want. I'm wondering if I can require a line to include both "b.imageUrl" and "Preview.png" to count as a match, so that I can quickly change the image variable to search for all my images?

Posted
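$word1 = "Preview.png"
$word2 = "b.imageUrl"

$file_in = "input_file.txt"
$file_out = "output_file.txt"

; Read the whole file and split it into lines ($text_in[0] holds the line count)
$text_in = FileRead($file_in)
$text_in = StringSplit($text_in, @CRLF, 1) ; flag 1 = use the entire @CRLF string as the delimiter

$text_out = ''

; Keep only the lines that contain both search terms
For $i = 1 To $text_in[0]
    $line = $text_in[$i]

    If StringInStr($line, $word1) And StringInStr($line, $word2) Then
        $text_out &= $line & @CRLF
    EndIf
Next

; Overwrite the output file with the matching lines
FileDelete($file_out)
FileWrite($file_out, $text_out)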

 

Posted (edited)

Here is an alternative way. You get an array with the matching line numbers.

#include <File.au3>
#include <Array.au3>
Global $sFilePath, $aSearchArr, $aResultArr[0], $sSearch1, $sSearch2
$sFilePath = @ScriptDir & '\input_file.txt'
$sSearch1  = "b.imageUrl"
$sSearch2  = "Preview.png"
_FileReadToArray($sFilePath, $aSearchArr)
If @error Then Exit
For $i = 1 To $aSearchArr[0]
    If StringInStr($aSearchArr[$i], $sSearch1) And StringInStr($aSearchArr[$i], $sSearch2) Then
        _ArrayAdd($aResultArr, "LineNo. : " &  $i & " => " & $aSearchArr[$i])
    EndIf
Next
_ArrayDisplay($aResultArr, 'Matches : ')

 

EDIT: With 160,000 lines, one StringRegExp call might be faster than two StringInStr calls (try it out ;))

If StringRegExp($aSearchArr[$i], '(?i)' & $sSearch1 & '.*' & $sSearch2) Then
    _ArrayAdd($aResultArr, "LineNo. : " &  $i & " => " & $aSearchArr[$i])
EndIf

 

input_file.txt

Edited by Musashi


"In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move."

Posted

@Musashi

For the fun :idiot:

#include <Array.au3>

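$txt = FileRead(@ScriptDir & '\input_file.txt')
Local $sSearch1  = "b.imageUrl", $sSearch2  = "Preview.png"

; Execute() prefixes every line with its line number (the Assign/Eval pair acts as a counter),
; then StringRegExp with flag 3 returns every line containing both terms as an array
$aResult = StringRegExp(Execute ( "'" & StringRegExpReplace(StringReplace($txt, "'", "''"), "(?m)^", "' & Assign(""iReplace"", Eval(""iReplace"")+1) * Eval(""iReplace"") & ' - ") & "'" ), '(?i).*?\Q' & $sSearch1 & '\E.*?\Q' & $sSearch2 & '\E.*', 3)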

_ArrayDisplay($aResult)

 

Posted
25 minutes ago, mikell said:

@Musashi

For the fun :idiot:

I would have been deeply grieved if you had not delivered a "one-liner" :lol:. Nice job (as usual).


Posted

Thanks  :)
Obviously the best script to use is yours (or the one from Zedna)
BTW about your edit, may I suggest

If StringRegExp($aSearchArr[$i], '(?i)\Q' & $sSearch1 & '\E.*?\Q' & $sSearch2 & '\E') Then

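\Q...\E makes the regex treat the search terms literally, so special characters in them (the "." in "b.imageUrl", or a "*" in something like "b*.imageURL") are not interpreted as regex metacharacters.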
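A minimal sketch of the difference, using a made-up line in which the dot of "b.imageUrl" has been replaced by another character (the test line is hypothetical, not taken from the real file). StringRegExp with its default flag returns 1 for a match and 0 otherwise:

```autoit
; Hypothetical line: "b.imageUrl" appears here as "bXimageUrl"
Local $sLine = 'bXimageUrl = "/images//G_SS_Preview.png";'
Local $sSearch1 = "b.imageUrl", $sSearch2 = "Preview.png"

; Unescaped pattern: "." matches any character, so "bXimageUrl" still matches
Local $iPlain = StringRegExp($sLine, '(?i)' & $sSearch1 & '.*' & $sSearch2)

; \Q...\E makes every character literal, so this line is rejected
Local $iQuoted = StringRegExp($sLine, '(?i)\Q' & $sSearch1 & '\E.*?\Q' & $sSearch2 & '\E')

ConsoleWrite("Unescaped: " & $iPlain & @CRLF & "Escaped:   " & $iQuoted & @CRLF)
```

The unescaped pattern reports a false match (1), while the \Q...\E version correctly reports none (0).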

Posted
11 minutes ago, mikell said:

BTW about your edit, may I suggest ...

Your suggestions are always welcome, especially when it comes to regular expressions :).

I have implemented your (more elegant) variant into the script :

#include <File.au3>
#include <Array.au3>
Global $sFilePath, $aSearchArr, $aResultArr[0], $sSearch1, $sSearch2
$sFilePath = @ScriptDir & '\input_file.txt'
$sSearch1  = "b.imageUrl"
$sSearch2  = "Preview.png"
_FileReadToArray($sFilePath, $aSearchArr)
If @error Then Exit
For $i = 1 To $aSearchArr[0]
    If StringRegExp($aSearchArr[$i], '(?i)\Q' & $sSearch1 & '\E.*?\Q' & $sSearch2 & '\E') Then
        _ArrayAdd($aResultArr, "LineNo. : " &  $i & " => " & $aSearchArr[$i])
    EndIf
Next
_ArrayDisplay($aResultArr, 'Matches : ')

Now @Acce has several solutions to choose from, and that's all that matters.


Posted (edited)

Thanks guys. Actually, I solved this myself about 5 minutes after posting. Here is what I did:

If StringInStr($Line, $Search) And StringInStr($Line, "b.imageUrl") Then
    ; do something with the matching line
EndIf

I will look at what you have suggested and see if it is better.

StringRegExp looks very interesting to use.

Thanks for your help :)

Edited by Acce
Posted (edited)
21 hours ago, Musashi said:

EDIT: With 160,000 lines, one StringRegExp call might be faster than two StringInStr calls (try it out ;))

 

 

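StringInStr() is much faster when its casesense option is ON (1), so when speed is important and the data allows it, I use casesense=1. To keep the search effectively case-insensitive, I lowercase both the search terms and the file text with StringLower() first.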

 

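; Lowercase the search terms (and below, the whole file) so casesense = 1 still finds every match
$word1 = StringLower("Preview.png")
$word2 = StringLower("b.imageUrl")

$file_in = "input_file.txt"
$file_out = "output_file.txt"

$text_in = StringLower(FileRead($file_in))
$text_in = StringSplit($text_in, @CRLF, 1) ; flag 1 = use the entire @CRLF string as the delimiter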

$text_out = ''

For $i = 1 To $text_in[0]
    $line = $text_in[$i]

    If StringInStr($line, $word1, 1) And StringInStr($line, $word2, 1) Then
        $text_out &= $line & @CRLF
    EndIf
Next

FileDelete($file_out)
FileWrite($file_out, $text_out)
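casesense is the third parameter of StringInStr() (0 = case-insensitive, the default; 1 = case-sensitive), and the function returns the position of the match or 0. A small illustration, on a sample line in the format from the first post, of why the script above lowercases both the file text and the search terms before using casesense=1:

```autoit
; Sample line in the format shown in the first post
Local $sLine = 'b.imageUrl = "/images//G_SS_Preview.png";'

; casesense = 0 (the default): case-insensitive, match found at position 1
Local $iDefault = StringInStr($sLine, "b.imageurl")
; casesense = 1: "imageurl" differs from "imageUrl", so 0 (not found)
Local $iCaseSense = StringInStr($sLine, "b.imageurl", 1)
; Lowercasing the text first makes the case-sensitive search find it again
Local $iLowered = StringInStr(StringLower($sLine), "b.imageurl", 1)

ConsoleWrite($iDefault & " " & $iCaseSense & " " & $iLowered & @CRLF)
```

This prints "1 0 1": without StringLower() on the line, the case-sensitive search would silently miss "b.imageUrl".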

 

Edited by Zedna
Posted (edited)

As I said in my previous post, it seems that StringInStr with CaseSense=1 is faster than RegExp (in this case).

Here are my testing data/script:

I built input_file.txt (160,000 lines) from input_file0.txt (10 lines) by repeating it 16,000 times with this helper script:

$file_in = "input_file0.txt"
$file_out = "input_file.txt"

$text_in = FileRead($file_in)
$text_out = ''

For $i = 1 To 16000
    $text_out &= $text_in
Next

FileDelete($file_out)
FileWrite($file_out, $text_out)

 

And here is the main test script, which times both variants (StringInStr vs. RegExp):

$word1 = StringLower("Preview.png")
$word2 = StringLower("b.imageUrl")

$file_in = "input_file.txt"
$file_out = "output_file.txt"

$text_in = StringLower(FileRead($file_in))
$text_in = StringSplit($text_in, @CRLF, 1)

$text_out = ''

$start = TimerInit()
For $i = 1 To $text_in[0]
    $line = $text_in[$i]

    If StringInStr($line, $word1, 1) And StringInStr($line, $word2, 1) Then
        $text_out &= $line & @CRLF
    EndIf
Next
$end1 = TimerDiff($start)

FileDelete($file_out)
FileWrite($file_out, $text_out)

$file_out = "output_file2.txt"
$text_out = ''

$start = TimerInit()
For $i = 1 To $text_in[0]
    $line = $text_in[$i]

    If StringRegExp($line, '(?i)' & $word1 & '.*' & $word2) Then
        $text_out &= $line & @CRLF
    EndIf
Next
$end2 = TimerDiff($start)

FileDelete($file_out)
FileWrite($file_out, $text_out)

MsgBox(0,'Time','StringInStr: ' & $end1 & @CRLF & 'RegExp: ' & $end2)

Result (in milliseconds, as returned by TimerDiff):

StringInStr: 849.7688
RegExp: 1190.2022
 

... so StringInStr is FASTER than RegExp in this case.

 

Note:

I'm not a RegExp expert, so I couldn't fix the problem in this (copied) RegExp: output_file2.txt ends up empty.

But I think the comparison is still relevant despite this bug.
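A likely cause of the empty output_file2.txt: the pattern is built as $word1 & '.*' & $word2, i.e. "preview.png" followed later by "b.imageurl", but in the data "b.imageUrl" comes first on the line, so no line can match. A sketch of the corrected test, with the terms in line order and quoted with \Q...\E as suggested earlier in the thread (the sample line is hypothetical):

```autoit
; Same lowercased search terms as in the test script
Local $word1 = StringLower("Preview.png")
Local $word2 = StringLower("b.imageUrl")

; Original order: "preview.png" must appear BEFORE "b.imageurl", which never happens
Local $sBroken = '(?i)' & $word1 & '.*' & $word2
; Fixed: match the terms in the order they appear on the line, taken literally
Local $sFixed = '(?i)\Q' & $word2 & '\E.*?\Q' & $word1 & '\E'

; Hypothetical sample line in the format from the first post (lowercased like the test data)
Local $sLine = StringLower('b.imageUrl = "/images//G_SS_Preview.png";')

Local $iBroken = StringRegExp($sLine, $sBroken) ; 0: no match
Local $iFixed = StringRegExp($sLine, $sFixed)   ; 1: match
ConsoleWrite("Broken: " & $iBroken & @CRLF & "Fixed:  " & $iFixed & @CRLF)
```

Building the pattern string once, before the loop, also avoids repeating the concatenation on every one of the 160,000 iterations.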

input_file0.txt

Edited by Zedna
Posted

Wow, so many replies to this little question, I really appreciate it. Just a silly question then: how do I activate CaseSense=1?

Posted

Nice, I didn't see it the first time, thanks. Searching a text file this big really has to be as fast as possible. Thanks again for the help; I really wasn't expecting this to be faster than RegExp.

Posted (edited)
1 minute ago, Acce said:

Nice, I didn't see it the first time, thanks. Searching a text file this big really has to be as fast as possible. Thanks again for the help; I really wasn't expecting this to be faster than RegExp.

What I find funny about my script is that it browses these 160k lines looking for download links for PNG files, and the files get downloaded faster than it takes to look up the links, lol.

Edited by Acce
Posted (edited)

Post a sample of your input TXT file; based on what the real data looks like, we can optimize the search with tricks like this:

For $i = 1 To $text_in[0]
    $line = $text_in[$i]
    If StringLen($line) < 22 Then ContinueLoop ; optimization: skip lines too short to hold both search terms

    If StringInStr($line, $word1, 1) And StringInStr($line, $word2, 1) Then
        $text_out &= $line & @CRLF
    EndIf
Next
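The 22 above is a hard-coded threshold. A hedged alternative is to derive the minimum length from the search terms themselves: as long as the two terms cannot overlap (true for "b.imageurl" and "preview.png"), a line shorter than StringLen($word1) + StringLen($word2), 21 characters here, cannot contain both. The sample lines below are hypothetical:

```autoit
Local $word1 = StringLower("b.imageUrl")
Local $word2 = StringLower("Preview.png")
; Shortest line that could hold both (non-overlapping) terms: 10 + 11 = 21 characters
Local $iMinLen = StringLen($word1) + StringLen($word2)

; Hypothetical sample lines (already lowercased)
Local $aLines[4] = ['{', 'b.imageurl = "/images//g_ss_preview.png";', 'x = "preview.png";', '']
Local $iMatches = 0, $line

For $i = 0 To UBound($aLines) - 1
    $line = $aLines[$i]
    If StringLen($line) < $iMinLen Then ContinueLoop ; cheap length check first

    If StringInStr($line, $word1, 1) And StringInStr($line, $word2, 1) Then
        $iMatches += 1
        ConsoleWrite("Match: " & $line & @CRLF)
    EndIf
Next
ConsoleWrite("Matches: " & $iMatches & @CRLF)
```

Only the second sample line reaches the StringInStr() checks; the other three are skipped by the length test.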

 

EDIT:

Here is another little optimization:

Instead of

$word1 = StringLower("Preview.png")
$word2 = StringLower("b.imageUrl")

use this:

$word1 = StringLower("b.imageUrl")
$word2 = StringLower("Preview.png")

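--> i.e. search for "b.imageUrl" first. Since And is short-circuit evaluated in AutoIt, the second StringInStr() only runs on lines where the first one matched, so the term that rejects more lines should come first.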
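Since the first post mentions about 40 images, each of which would otherwise need its own pass over the 160k lines, one possible approach is to lowercase and split the text once, keep only the lines containing "b.imageurl", and then check each image name against that much smaller set. A sketch under assumptions: the sample text and the image names in $aImages are hypothetical placeholders, and CRLF line endings are assumed as in the scripts above:

```autoit
; Hypothetical sample data; in the real script this would be FileRead("input_file.txt")
Local $sText = 'b.imageUrl = "/images//G_SS_Preview.png";' & @CRLF & _
        'x.imageUrl = "/images//H_SS_Preview.png";' & @CRLF & _
        'b.imageUrl = "/images//K_SS_Preview.png";' & @CRLF

; Hypothetical image names to look for (the real list would hold ~40 names)
Local $aImages[3] = ["G_SS_Preview.png", "H_SS_Preview.png", "M_SS_Preview.png"]

; Lowercase everything once so casesense = 1 can be used in both passes
Local $aLines = StringSplit(StringLower($sText), @CRLF, 1)

; Pass 1: keep only the lines that contain "b.imageurl"
Local $aCandidates[$aLines[0]], $iCount = 0
For $i = 1 To $aLines[0]
    If StringInStr($aLines[$i], "b.imageurl", 1) Then
        $aCandidates[$iCount] = $aLines[$i]
        $iCount += 1
    EndIf
Next

; Pass 2: look for each image name among the candidate lines only
Local $sImage, $iFound = 0
For $j = 0 To UBound($aImages) - 1
    $sImage = StringLower($aImages[$j])
    For $i = 0 To $iCount - 1
        If StringInStr($aCandidates[$i], $sImage, 1) Then
            $iFound += 1
            ConsoleWrite($aImages[$j] & " -> " & $aCandidates[$i] & @CRLF)
        EndIf
    Next
Next
```

Whether this beats 40 separate scans depends on how many lines contain "b.imageUrl"; it is worth timing with TimerInit()/TimerDiff() as in the test script above.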

Edited by Zedna
Posted

Thanks for all your help. I'm not sure why I had them in the opposite order. This was a small topic, but there is a lot of helpful info here. I can't really say more than thanks again :)
