Mingre (author), posted July 29, 2016 (edited)

@jchd (and to other kind souls)

I have two HTML files saved as <*.txt>: one is the original version (ORIG.txt) and the other is the simplified one (SIMPLE.txt; simplified in the sense that all tag parameters are stripped off, e.g., <p style="bla bla"> turned to <p>). (Both are attached on this post.)

I already got another code working, but for curiosity's sake, can you guys enlighten me why this code doesn't work on the original HTML?

    #include <Array.au3>

    Local $fileread = FileRead(@ScriptDir & '\ORIG.txt') ; Function does not work on this
    ;Local $fileread = FileRead(@ScriptDir & '\SIMPLE.txt') ; Function works on the simplified version of HTML.
    Local $ha[1]
    __retrieveList($ha, $fileread)
    _ArrayDisplay($ha)

    Func __retrieveList(ByRef $array, Const $string, $iRecursion = -1)
        If $iRecursion = -1 Then
            ReDim $array[1][2]
            $array[0][0] = ''
            $array[0][1] = ''
        EndIf
        $iRecursion += 1
        Local Const $regEx_Start = "<li[^>]*?>", _
                $regEx_End = "<\/li>", _
                $regex = '(?imsx)' & _
                '(?(DEFINE) (?<LiStart> ' & $regEx_Start & ' ) )' & _
                '(?(DEFINE) (?<LiEnd> ' & $regEx_End & ' ) )' & _
                '(?(DEFINE) (?<LiBlock> (?&LiStart) (?: (?&LiBlock)* | .*? )* (?&LiEnd) ) )' & _
                '(?&LiBlock)'
        Local $data = StringRegExp($string, $regex, 3) ; Not Const because this will be modified later.
        Local $aTemp, $iUbound
        For $i = 0 To UBound($data) - 1 Step +1
            $data[$i] = StringRegExpReplace($data[$i], '(?imsx)\A' & $regEx_Start & '(.*)' & $regEx_End & '\Z', '$1')
            $iUbound = UBound($array)
            If String($array[$iUbound - 1][0]) = "" Then $iUbound -= 1
            ReDim $array[$iUbound + 1][2]
            $array[$iUbound][0] = $iRecursion ;$data[$i]
            If Not StringRegExp($data[$i], $regex) Then
                $array[$iUbound][1] = $data[$i]
                ContinueLoop
            EndIf
            $aTemp = StringRegExp($data[$i], '(?imsx)(\A.*?)(?:' & $regEx_Start & ')', 3)
            $array[$iUbound][1] = $aTemp[0]
            __retrieveList($array, $data[$i], $iRecursion)
        Next
        $iRecursion -= 1
        Return 1 ; $data
    EndFunc   ;==>__retrieveList

SCiTE output:

    >Running:(3.3.14.2):C:\Program Files\AutoIt3\autoit3.exe "C:\Documents and Settings\G99\Desktop\__retrieveList.au3"
    --> Press Ctrl+Alt+Break to Restart or Ctrl+Break to Stop
    !>19:39:49 AutoIt3.exe ended.rc:-1073741819
    +>19:39:49 AutoIt3Wrapper Finished.
    >Exit code: 3221225477    Time: 0.6851

Anyway, here's the other code I'm referring to, quite a different approach but basically does what I want. This works on both HTML files.

    #include <Array.au3>

    Local $fileread = FileRead(@ScriptDir & '\ORIG.txt') ; This works.
    ;Local $fileread = FileRead(@ScriptDir & '\SIMPLE.txt') ; This works.
    Local $ha[1]
    __retrieveList($ha, $fileread)
    _ArrayDisplay($ha)

    Func __retrieveList(ByRef $array, $s__parsedRight, $i__recurse = -1)
        If $i__recurse = -1 Then
            ReDim $array[1][2]
            $array[0][0] = ''
            $array[0][1] = ''
        EndIf
        $i__recurse += 1
        Local Const $s__keyWord = 'li', _
                $s__start = "<" & $s__keyWord & "[^>]*?>", _
                $s__end = "<\/" & $s__keyWord & ">", _
                $s__regEx = "(?ims)\A.*?(" & $s__start & ".*?" & $s__end & ")"
        Local $s__parsedLeft = '', $a__temp[1], $i__uBound
        Do
            Switch UBound(StringRegExp($s__parsedLeft, $s__start, 3))
                Case 0
                Case UBound(StringRegExp($s__parsedLeft, $s__end, 3))
                    $a__temp = StringRegExp($s__parsedLeft, '\A' & $s__start & '(.*)' & $s__end & '\Z', 3)
                    $i__uBound = UBound($array)
                    If String($array[$i__uBound - 1][0]) = "" Then $i__uBound -= 1
                    ReDim $array[$i__uBound + 1][2]
                    $array[$i__uBound][0] = $i__recurse
                    $array[$i__uBound][1] = $a__temp[0]
                    If StringRegExp($a__temp[0], $s__start) Then __retrieveList($array, $a__temp[0], $i__recurse)
                Case Else
                    $s__parsedLeft &= __parse($s__parsedRight, "(?ims)(\A.*?" & $s__end & ")")
                    ContinueLoop
            EndSwitch
            $s__parsedLeft = __parse($s__parsedRight, $s__regEx)
        Until @error
        $i__recurse -= 1
    EndFunc   ;==>__retrieveList

    Func __parse(ByRef $parsedString, $s__regEx)
        Local Const $a__temp = StringRegExp($parsedString, $s__regEx, 3)
        If @error Then Return SetError(1, 0, 0)
        $parsedString = StringRegExpReplace($parsedString, $s__regEx, "")
        Return $a__temp[0]
    EndFunc   ;==>__parse

Attachments: SIMPLE.txt, ORIG.txt

Edited July 29, 2016 by Mingre: Added SCiTE output.

jchd, posted July 29, 2016

I have no time to dig into the details, but I strongly suspect that the crash is due to PCRE (the regexp engine) exhausting the stack space allocated to it. There is an open request to compile PCRE into AutoIt with an option forcing use of the heap in lieu of the stack, which would get rid of this kind of issue.
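
If the stack really is the limit, one way to sidestep it is to keep the pattern flat and do the nesting bookkeeping in AutoIt itself. Below is a minimal sketch of that idea, not code from this thread: `_ListItemsByDepth` and its variable names are made up for illustration, and it assumes the `<li>` tags in the input are properly balanced. It matches one `<li ...>` or `</li>` tag per call, relies on `StringRegExp` flag 1 setting `@extended` to the offset just after the match, and tracks depth with a plain counter, so the regex never has to recurse into a nested block.

```autoit
#include <Array.au3>

; Hypothetical helper (not from the thread): returns a 2D array where
; column 0 is the nesting depth of an <li> item and column 1 is the text
; between its opening tag and the next <li> or </li> tag.
Func _ListItemsByDepth(Const $sHtml)
    Local $aItems[1][2], $iCount = 0
    Local $iDepth = -1, $iPos = 1, $iOpenEnd = 0
    Local $aTag, $iTagEnd, $iTagStart
    While 1
        ; Flat pattern: one opening or closing li tag, searched from $iPos.
        $aTag = StringRegExp($sHtml, '(?i)(</?li\b[^>]*>)', 1, $iPos)
        If @error Then ExitLoop ; no more li tags
        $iTagEnd = @extended ; offset of the first character after the match
        $iTagStart = $iTagEnd - StringLen($aTag[0])
        ; If the previous tag opened an item, everything up to this tag is its text.
        If $iOpenEnd Then
            ReDim $aItems[$iCount + 1][2]
            $aItems[$iCount][0] = $iDepth
            $aItems[$iCount][1] = StringMid($sHtml, $iOpenEnd, $iTagStart - $iOpenEnd)
            $iCount += 1
        EndIf
        If StringLeft($aTag[0], 2) = '</' Then
            $iDepth -= 1 ; closing tag: step back out one level
            $iOpenEnd = 0
        Else
            $iDepth += 1 ; opening tag: step in one level
            $iOpenEnd = $iTagEnd
        EndIf
        $iPos = $iTagEnd
    WEnd
    Return $aItems
EndFunc   ;==>_ListItemsByDepth

; Usage with a small made-up sample instead of ORIG.txt.
Local $sDemo = '<ul><li>one<ul><li class="x">one-a</li></ul></li><li>two</li></ul>'
Local $aDemo = _ListItemsByDepth($sDemo)
_ArrayDisplay($aDemo)
```

As in the first script above, an item that contains a nested list only reports the text that precedes its first nested `<li>`. The `\b` after `li` is an extra guard so that tags such as `<link>` are not mistaken for list items.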

Mingre (author), posted July 29, 2016

Ohh, thanks!

czardas, posted July 29, 2016 (edited)

21 hours ago, jchd said: English "couple" often means 3

I get exactly where you're coming from: one or two equals a few! BitOR(1,2)

Edited July 29, 2016 by czardas