buymeapc

Searching Between Two Large Arrays


Ok, so I tried your regex code against the log I have (65MB), and it took about 42 seconds to search the entire log for the criteria specified in the variable. The loop method took about 17 seconds.


Ok, I mocked up a quick demo of what I'm working with. Since not many people have huge logs lying around, the demo code below generates one: it creates a 20MB text file to read from. The search itself (not counting the text file creation) takes about 60 seconds to run. Is there a better way to search the huge array for case-sensitive search criteria, including partial matches?

#include <array.au3>
#include <file.au3>
$fileName = @ScriptDir&"\Test.txt"; <---Sample file name to use
$sText = 'struct pPob1 "0000000000h 000000029142000000848300000000000000030000000003762500000000000000000000000003762500000000 01780010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"' & @CRLF & _
'struct pPob2 01 "V 000000079160013033010000000000601400000019020000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"' & @CRLF & _
'struct pPob2 02 "V 000000297090048917010000000002312800000065810000000000000000000000000000000000020000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"' & @CRLF & _
'struct pAeb1 "87 104                0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"' & @CRLF & _
'struct pAeb_Dx "0000000000000000000000000000000000000000"' & @CRLF & _
'struct pAeb_Op 1 "00613000000000000000000000000000000000000V 910000002000000100000000000"' & @CRLF & _
'struct pAeb_Op 02 "00616000000000000000000000000000000000000V 910000002000000100000000000"' & @CRLF & _
'ACE Edit V10, R4, EdtRC=00' & @CRLF & _
'struct pPws1 "0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"' & @CRLF & _
'struct pOob2 "0000000000000000000000000000000000000000000000000000000000000000000000000"' & @CRLF & _
'struct pPaths->system "C:\Inetpub\wwwroot\HSS\Data"' & @CRLF & _
'struct irec "010001     09  2010100100600000838700010000050000000000000050000178001100000000005000175000000000000000000000000100981000010981000000001FS2010 0800002000 0800002000AL     10000000001010200 0800002000NATIONAL 08000020001010200 0000000000 000000000000000000000000000000000000000000000000000000000000000000000000000020106 0000000000ALLEDITSOFF   000000000000000000002175000178000000    0000L999        09 20101001 apc010h NA000"' & @CRLF & _
'13:07:53 opcode=16, OptRC=00, APC, Group=55/55 V10/10, GrpRC=00, Price=h /h , PrcRC=00' & @CRLF

; If the test text file is less than 20MB, then write to it until it is greater than 20MB
If FileGetSize($fileName) < 20971520 Then
; Open the file and write a ton of data to it to make it huge
$hFile = FileOpen($fileName, 1)
Do
     Sleep(10)
     FileWrite($hFile, $sText)
Until FileGetSize($fileName) > 20971520; Write to it until it's greater than 20MB - This could take a while!
FileClose($hFile)
EndIf

; Read the file into an array since I'll be using that array to import into a virtual listview later
Dim $aNewFile, $aRestore, $aRest, $fCase
_FileReadToArray($fileName, $aNewFile)

; This array is the final output which will be the same size as the above array and will let us know which lines have a corresponding match or partial match
Dim $aFinal[UBound($aNewFile)]
$aFinal[0] = UBound($aNewFile)-1

; Create the search criteria array
$aRestore = _AddSearchDefaults()

$time = TimerInit()
For $x = 1 To $aFinal[0] ; omit these 3 lines if you don't need the "No Match" text
$aFinal[$x] = "No Match"
Next
If IsArray($aRestore) Then
For $s = 1 To $aRestore[0][0]
     $fCase = $aRestore[$s][1]
     $aRest = $aRestore[$s][0]
     For $t = 1 To $aNewFile[0]
         If StringInStr($aNewFile[$t], $aRest, $fCase) Then $aFinal[$t] = "---------->Match!"
     Next
Next
EndIf
_ArrayDisplay($aFinal, TimerDiff($time))
Exit

; ~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*

Func _AddSearchDefaults()
; This function creates the array of search criteria used to search the text file
Local $hiLiteDefaults[127][2], $iCount = 0
Local $aNumbers[127] = [126,"01","02","03","04","05","06","07","08","09","10","11","12","13","14","15","16","17","18","23","30","31","71","72","73","74","81","82","83","84", _
"88","89","90","95","96","97","98","99","01","02","05","06","07","08","09","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27", _
"28","46","62","70","87","02","03","04","05","06","07","08","09","10","11","12","60","61","87","88","89","95","01","02","03","04","05","06","07","15","16","17","18", _
"19","20","21","22","23","24","88","89","91","92","93","94","95","96","97","98","99","01","02","03","04","05","06","07","08","09","10","13","14","15","87"]
; Populate the array with search terms
$hiLiteDefaults[0][0] = 126
For $x = 1 To 126
     Switch $x
         Case 1 To 38
         $hiLiteDefaults[$x][0] = 'struct pEcb "'&$aNumbers[$x]
         Case 39 To 69
         $hiLiteDefaults[$x][0] = 'struct pOob1 "'&$aNumbers[$x]
         Case 70 To 86
         $hiLiteDefaults[$x][0] = 'struct pAeb1 "'&$aNumbers[$x]
         Case 87 To 114
         $hiLiteDefaults[$x][0] = 'struct pGob1 "'&$aNumbers[$x]
         Case 115 To 126
         $hiLiteDefaults[$x][0] = 'struct pLeb1 "'&$aNumbers[$x]
     EndSwitch
     $hiLiteDefaults[$x][1] = 1 ; Make them all case sensitive for this example - this won't be the case outside of this example
Next
_ArrayDisplay($hiLiteDefaults, "Search Criteria")
Return $hiLiteDefaults
EndFunc

Here is the topic I posted a while ago, which is ultimately what I'm going for, but I couldn't figure out how to highlight text in a RichEdit control, which is why I switched to a virtual listview instead. The RichEdit approach was also much slower than a virtual listview. At least, this code was.

Thanks for all the help!


If I were to make some assumptions from looking at your example above, I could offer some useful speedups, but I'm afraid I'd fall prey to the ASS-U-ME trap again.

I'm guessing that either of these assumptions would be incorrect:

1. Since every search key in the example above begins with "struct" they all will?

2. Since in the example every line that contains a matching search key has the matching string beginning in column 1, they always will?

Lacking some pattern or rules regarding the content of the search keys or the location of the target data, I can't see an avenue for improving beyond the current brute-force method.
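For illustration only, here is a rough sketch of the kind of speedup that would be available if both assumptions did hold (every key starts with "struct " and every match begins in column 1). The sample lines, the two keys, and the names $aLines, $aTerms, $aMatched and $iTests are made up for this sketch and are not taken from the demo.

```autoit
; Sketch under two ASSUMPTIONS: every key starts with "struct " and matches begin in column 1.
; Sample data and variable names are hypothetical.
Local $aLines[5] = [4, 'struct pAeb1 "87 104"', 'ACE Edit V10', '13:07:53 opcode=16', 'struct pOob2 "00"']
Local $aTerms[3] = [2, 'struct pAeb1 "87', 'struct pEcb "01']
Local $aMatched[$aLines[0] + 1], $iTests = 0

For $t = 1 To $aLines[0]
    ; One cheap, case-sensitive test rejects every line that cannot match any key
    If Not (StringLeft($aLines[$t], 7) == "struct ") Then ContinueLoop
    For $s = 1 To $aTerms[0]
        $iTests += 1
        ; Compare only the start of the line, since matches are assumed to begin in column 1
        If StringLeft($aLines[$t], StringLen($aTerms[$s])) == $aTerms[$s] Then
            $aMatched[$t] = True
            ExitLoop
        EndIf
    Next
Next
ConsoleWrite("Key comparisons performed: " & $iTests & @CRLF)
```

Without the prefix test, all four lines would be compared against the keys; with it, only the two "struct " lines are. If either assumption fails, this shortcut silently misses matches, so it is only safe when the rules are known.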


1. Since every search key in the example above begins with "struct" they all will?

Ok, I see what you mean. The search criteria could be anything. They don't have to start with "struct"; "ACE" would be just as valid. "struct" was just what I had been using.

2. Since in the example every line that contains a matching search key has the matching string beginning in column 1, they always will?

The search criteria could be in any part of the line, not necessarily at the beginning. It could also be somewhere in the middle...which makes things rather difficult, of course.

I'm basically trying to find whatever search criteria is in the $aRestore array within the text file that is being read into the $aNewFile array. I'm really only after the line number that has the search criteria in it. My ultimate goal is to display the text file in a GUI and highlight the lines that match the search criteria that is set by the user. So, if someone adds "hamburger" to the search array and it's found within one of the lines of the text file, that entire line is highlighted.
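Stripped down to that goal (collect the numbers of the lines that contain any search term), the search could look roughly like the sketch below. It is a minimal example on made-up data, not the demo code: $aLines, $aTerms and $sLineNumbers are hypothetical names, and the second column of $aTerms is passed as the casesense argument of StringInStr. The loops are swapped relative to the demo so that each line stops being tested after its first hit.

```autoit
; Sketch: collect the numbers of all lines that contain at least one search term.
; Sample data and variable names are hypothetical.
Local $aLines[5] = [4, 'I ate a hamburger', 'ACE Edit V10', 'struct pOob2 "00"', 'Hamburger Hill']
Local $aTerms[3][2] = [[2, 0], ['hamburger', 1], ['ACE', 0]] ; [term, casesense flag]
Local $sLineNumbers = ""

For $t = 1 To $aLines[0]
    For $s = 1 To $aTerms[0][0]
        If StringInStr($aLines[$t], $aTerms[$s][0], $aTerms[$s][1]) Then
            $sLineNumbers &= $t & "|"
            ExitLoop ; one hit is enough to highlight the line
        EndIf
    Next
Next
ConsoleWrite("Matching lines: " & $sLineNumbers & @CRLF)
```

Line 4 is not reported because 'hamburger' is searched case-sensitively here. The delimited string could be turned into an array with StringSplit before feeding it to the listview.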

I hope I'm explaining this right...

Attached is a screenshot of something similar to what I'm trying to accomplish.

post-8728-0-73283400-1348245781_thumb.gi


Since the data will most probably be strings (if not, this is useless):

You can use 26 arrays, each holding the strings that start with a given letter of the alphabet.

When you search for a term, take its first letter and search only the array for that letter.

For example, to search for 'Hello' you would look in $Data8 (H is the 8th letter).
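A small sketch of that bucketing idea follows. Instead of 26 separate arrays it uses one 27-slot array of delimited line-number lists, which is a simplification chosen for this example; _LetterSlot, $aBucket and the sample lines are hypothetical. Note the limitation: bucketing by first letter only helps when a match has to start at the beginning of a line, so it does not cover terms that may appear mid-line.

```autoit
; Sketch: bucket line numbers by the first letter of each line.
; ASSUMPTION: a match must start in column 1. Sample data and names are hypothetical.
Local $aLines[5] = [4, 'Hello world', 'hamburger', 'Help me', 'ACE Edit']
Local $aBucket[27] ; slots 1..26 = A..Z, slot 0 = anything else
Local $sKey = 'Hello', $sHits = "", $sCandidates, $aCand, $iLine

For $t = 1 To $aLines[0]
    $aBucket[_LetterSlot($aLines[$t])] &= $t & "|"
Next

; Only the lines filed under H (slot 8) are tested for 'Hello'
$sCandidates = StringTrimRight($aBucket[_LetterSlot($sKey)], 1) ; drop the trailing "|"
If $sCandidates <> "" Then
    $aCand = StringSplit($sCandidates, "|")
    For $i = 1 To $aCand[0]
        $iLine = Number($aCand[$i])
        ; == is a case-sensitive comparison
        If StringLeft($aLines[$iLine], StringLen($sKey)) == $sKey Then $sHits &= $iLine & " "
    Next
EndIf
ConsoleWrite("Lines starting with " & $sKey & ": " & $sHits & @CRLF)

; Map the first character of a string to 1..26 (A..Z, ignoring case), or 0 for anything else
Func _LetterSlot($sText)
    Local $iCode = Asc(StringUpper(StringLeft($sText, 1)))
    If $iCode >= 65 And $iCode <= 90 Then Return $iCode - 64
    Return 0
EndFunc
```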




Okay, I think I get what you are trying to do - here are my thoughts on the subject:

You do not need to search a string again after a match has already been found. So for each new search pattern you could avoid certain array elements on subsequent runs. Increases in speed will depend on the percentage of matches you expect to find. If it is a high number then the difference in speed should be greater.
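The simplest form of this is a flag per line that is checked before calling StringInStr. Here is a minimal sketch on made-up data; $aLines, $aTerms, $aMatched and $iTests are illustrative names, not taken from the demo.

```autoit
; Sketch: skip lines that already matched an earlier search term.
; Sample data and variable names are hypothetical.
Local $aLines[5] = [4, 'struct pAeb1 "87 104"', 'ACE Edit V10', 'struct pOob2 "00"', 'plain text']
Local $aTerms[3][2] = [[2, 0], ['struct pAeb1 "87', 1], ['ACE', 1]] ; [term, casesense flag]
Local $aMatched[$aLines[0] + 1] ; elements start out empty, which evaluates as False
Local $iTests = 0

For $s = 1 To $aTerms[0][0]
    For $t = 1 To $aLines[0]
        If $aMatched[$t] Then ContinueLoop ; already flagged, no need to test again
        $iTests += 1
        If StringInStr($aLines[$t], $aTerms[$s][0], $aTerms[$s][1]) Then $aMatched[$t] = True
    Next
Next
ConsoleWrite("StringInStr calls: " & $iTests & @CRLF)
```

With these four lines and two terms, the second pass skips line 1, so seven calls are made instead of eight. The inner loop still visits every element, though; it only avoids the string search.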

There are several ways to achieve this. I would shift matching elements to the end of the array. The first match found is swapped with the final element, UBound($array) - 1, the second match with UBound($array) - 2, and so on. The next time you loop through the array, you only go as far as UBound($array) - $matches_Already_Found - 1. This is a complicated procedure and requires careful handling of the loop iteration count and of determining when to quit a loop.
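One way the swap could be sketched is shown below. As a variation that is my own assumption rather than part of the suggestion, the swapping is done on a separate index array ($aIdx) so that the original line numbers survive, since the line number is what ends up being highlighted. All names and sample data are hypothetical.

```autoit
; Sketch: partition an index array so that matched lines are never visited again.
; Sample data and variable names are hypothetical.
Local $aLines[6] = [5, 'struct pAeb1 "87 104"', 'ACE Edit V10', 'struct pOob2 "00"', 'plain text', 'ACE again']
Local $aTerms[3][2] = [[2, 0], ['ACE', 1], ['struct pAeb1 "87', 1]] ; [term, casesense flag]
Local $aIdx[$aLines[0] + 1], $aMatched[$aLines[0] + 1]
Local $iLast = $aLines[0], $iPos, $iLine

; $aIdx holds line numbers; slots 1..$iLast are the lines that are still unmatched
For $n = 1 To $aLines[0]
    $aIdx[$n] = $n
Next

For $s = 1 To $aTerms[0][0]
    $iPos = 1
    While $iPos <= $iLast
        $iLine = $aIdx[$iPos]
        If StringInStr($aLines[$iLine], $aTerms[$s][0], $aTerms[$s][1]) Then
            $aMatched[$iLine] = True
            $aIdx[$iPos] = $aIdx[$iLast] ; pull the last unmatched index into this slot
            $aIdx[$iLast] = $iLine       ; park the matched line beyond the boundary
            $iLast -= 1                  ; shrink the unmatched region; $iPos stays put
        Else
            $iPos += 1
        EndIf
    WEnd
Next
ConsoleWrite("Unmatched lines remaining: " & $iLast & @CRLF)
```

The key detail is that $iPos is not advanced after a swap, because the element pulled in from the end has not been tested yet. A While loop is used instead of For so that the upper bound can shrink during the pass.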

Edited by czardas

