tsue
Posted December 12, 2010

Hello, I'm trying to extract part of each link from HTML like this:

```html
<TD WIDTH="150px" HEIGHT="150px"><A TARGET="_blank" A HREF="images/magic_knight_rayearth-love-alcione_lantys.jpg"><IMG SRC="galeria/_magic_knight_rayearth-love-alcione_lantys.jpg" BORDER="0" ALT="Alcione&Lantys" WIDTH="150" HEIGTH="150"></A></TD>
<TD WIDTH="150px" HEIGHT="150px"><A TARGET="_blank" A HREF="images/magic_knight_rayearth-love-alcione_sierra01.jpg"><IMG SRC="galeria/_magic_knight_rayearth-love-alcione_sierra01.jpg" BORDER="0" ALT="Alcione&Sierra" WIDTH="150" HEIGTH="150"></A></TD>
<TD WIDTH="150px" HEIGHT="150px"><A TARGET="_blank" A HREF="images/magic_knight_rayearth-love-alcione_sierra02.jpg"><IMG SRC="galeria/_magic_knight_rayearth-love-alcione_sierra02.jpg" BORDER="0" ALT="Alcione&Sierra" WIDTH="150" HEIGTH="150"></A></TD>
<TD WIDTH="150px" HEIGHT="150px"><A TARGET="_blank" A HREF="images/magic_knight_rayearth-love-alcione_zagato01.jpg"><IMG SRC="galeria/_magic_knight_rayearth-love-alcione_zagato01.jpg" BORDER="0" ALT="Alcione&Zagato" WIDTH="150" HEIGTH="150"></A></TD>
</TR><TR>
```

From each HREF I only need the part from "images" to ".jpg", for example:

images/magic_knight_rayearth-love-alcione_lantys.jpg

I've searched the AutoIt website but found nothing. Is this possible?
Realm
Posted December 12, 2010

Hello tsue,

Have you tried StringRegExp()? This example works fine with your HTML example.

```autoit
#include <Array.au3>

; @DesktopDir has no trailing backslash, so add one before the file name.
$fPath = @DesktopDir & '\HTML_Test.txt'
$HTML_text = FileRead($fPath)

; Flag 3 returns an array of all matches. The group captures whatever sits
; between IMG SRC=" and " BORDER.
$sre_Array = StringRegExp($HTML_text, 'IMG SRC\=\"(.*?)\" BORDER', 3)
_ArrayDisplay($sre_Array)
```

Realm
tsue (Author)
Posted December 12, 2010

I'm trying this, but I can't manage to get just images/magic_knight_rayearth-love-alcione_lantys.jpg. Here is the code:

```autoit
$array = StringRegExp("<TD WIDTH="150px" HEIGHT="150px"><A TARGET="_blank" A HREF="images/magic_knight_rayearth-love-alcione_lantys.jpg"><IMG SRC="galeria/_magic_knight_rayearth-love-alcione_lantys.jpg" BORDER="0" ALT="Alcione&Lantys" WIDTH="150" HEIGTH="150"></A></TD>", _
        "<(?i)TD WIDTH="150px" HEIGHT="150px"><A TARGET="_blank" A HREF="(.*?)"><(?i)IMG SRC="galeria/_magic_knight_rayearth-love-alcione_lantys.jpg" BORDER="0" ALT="Alcione&Lantys" WIDTH="150" HEIGTH="150"></A></TD>", 3)

For $i = 0 To UBound($array) - 1
    MsgBox(0, "RegExp Test with Option 2 - " & $i, $array[$i])
Next
```
Jury
Posted December 12, 2010

Try this:

```autoit
$array = StringRegExp($HTML_text, '(?s).*?(images.*?jpg).*?', 3, 1)

For $i = 0 To UBound($array) - 1
    MsgBox(0, "RegExp Test with Option 1 - " & $i, $array[$i])
Next
```
GEOSoft
Posted December 12, 2010 (edited)

```autoit
; .+? is lazy, so the capture stops at the first .jpg after HREF="
$aImages = StringRegExp($sHTML, "(?i)HREF\s*=\s*\x22(.+?\.jpg)", 3)
If Not @Error Then
    For $i = 0 To UBound($aImages) -1
        MsgBox(4096, "Result " & $i+1, $aImages[$i])
    Next
Else
    MsgBox(4096, "Error", "The expression returned error code " & @Error)
EndIf
```

Another one that might be what you really need would be:

```autoit
; Same idea, but capturing the IMG SRC path instead of the HREF path.
$aImages = StringRegExp($sHTML, "(?i)<img\ssrc\s*=\s*\x22(.+?\.jpg)", 3)
If Not @Error Then
    For $i = 0 To UBound($aImages) -1
        MsgBox(4096, "Result " & $i+1, $aImages[$i])
    Next
Else
    MsgBox(4096, "Error", "The expression returned error code " & @Error)
EndIf
```

EDIT: By the way, the test you ran was doomed to failure from the start because of all the double quotes in it:

```autoit
"<TD WIDTH="150px" HEIGHT="150px"><A TARGET="_blank" A HREF="images/magic_knight_rayearth-love-alcione_lantys.jpg"><IMG SRC="galeria/_magic_knight_rayearth-love-alcione_lantys.jpg" BORDER="0" ALT="Alcione&Lantys" WIDTH="150" HEIGTH="150"></A></TD>"
```

To use that string, wrap the whole thing in single quotes and do the same for your expression. Rather than put a literal double quote in the expression, I prefer to use \x22 to check whether a double quote appears at a given position.

Edited December 12, 2010 by GEOSoft

George
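The quoting advice in GEOSoft's post can be sketched as follows. This is only a minimal illustration, assuming a shortened one-cell version of the HTML from the first post; the variable names `$sHTML`, `$aQuoted` and `$aHex` are made up for the example.

```autoit
; One table cell, shortened from the first post. The literal is wrapped in
; single quotes, so the double quotes inside it need no escaping.
Local $sHTML = '<TD><A TARGET="_blank" A HREF="images/magic_knight_rayearth-love-alcione_lantys.jpg">' & _
        '<IMG SRC="galeria/_magic_knight_rayearth-love-alcione_lantys.jpg" BORDER="0"></A></TD>'

; Option A: single-quoted pattern that contains a literal double quote.
Local $aQuoted = StringRegExp($sHTML, '(?i)HREF\s*=\s*"(.+?\.jpg)', 3)

; Option B: \x22 is the hex escape for a double quote, so the pattern
; itself can stay inside double quotes.
Local $aHex = StringRegExp($sHTML, "(?i)HREF\s*=\s*\x22(.+?\.jpg)", 3)

; Both calls should capture the same HREF path.
If IsArray($aQuoted) And IsArray($aHex) Then
    MsgBox(4096, "Quoting test", "Single quotes: " & $aQuoted[0] & @CRLF & "\x22 escape: " & $aHex[0])
EndIf
```

Either form keeps the AutoIt string delimiters from colliding with the double quotes in the HTML, which is what broke the earlier test.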
tsue (Author)
Posted December 12, 2010

Thanks, it worked, and thanks to GEOSoft for the other information too.

Now I'm trying to understand how it works. So (?s) makes the following . match anything, even newlines, and * makes it repeat. The next part I don't really understand: ? (find the smallest match instead of the largest after a repeat character). But I found that if I take it out, it only gives me one result.

So is all of this first part there to search until it finds whatever is inside ()? And inside (), does it search from "images" to "jpg"? And why does .*? have to be added again afterwards?

Can you help me understand this? Thanks.
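One way to read the pattern discussed in this post is to build it up piece by piece. The sketch below is only an illustration: the two-link `$sHTML` sample and its file names are hypothetical, and `$sPattern` and `$aLinks` are names chosen for the example.

```autoit
; Hypothetical two-link sample, deliberately split over two lines with @CRLF.
Local $sHTML = '<A HREF="images/first.jpg"><IMG SRC="galeria/_first.jpg"></A>' & @CRLF & _
        '<A HREF="images/second.jpg"><IMG SRC="galeria/_second.jpg"></A>'

Local $sPattern = '(?s)'        ; "." may also match newline characters
$sPattern &= '.*?'              ; skip as little text as possible (lazy) ...
$sPattern &= '(images.*?jpg)'   ; ... then capture from "images" to the nearest "jpg"
$sPattern &= '.*?'              ; lazy and last in the pattern, so it matches nothing

; Flag 3 returns every capture in one array; the 4th parameter (1) is the start offset.
Local $aLinks = StringRegExp($sHTML, $sPattern, 3, 1)
If Not @error Then
    For $i = 0 To UBound($aLinks) - 1
        ConsoleWrite($aLinks[$i] & @CRLF)
    Next
EndIf
```

With this sample the loop should write the two `images/` paths and skip the `galeria/` ones, since only the text inside the parentheses is returned.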
Jury
Posted December 12, 2010

See attached (if I can get the attachment thingy to work).

regex.html
tsue (Author)
Posted December 14, 2010

Thank you for all of this information. I found that this gives the same result:

```autoit
$array = StringRegExp($HTML_text, '(?s)(images.*?jpg)', 3, 1)
```

From what I have read, I haven't found the reason why you add .*? outside the ().

Thanks
GEOSoft
Posted December 14, 2010

The *? tells PCRE to find the smallest match, which is particularly important when using (?s). Using your first post as an example, if you used .* without the ? inside the group, it would return

```
images/magic_knight_rayearth-love-alcione_lantys.jpg"><IMG SRC="galeria/_magic_knight_rayearth-love-alcione_lantys.jpg" BORDER="0" ALT="Alcione&Lantys" WIDTH="150" HEIGTH="150"></A></TD><TD WIDTH="150px" HEIGHT="150px"><A TARGET="_blank" A HREF="images/magic_knight_rayearth-love-alcione_sierra01.jpg"><IMG SRC="galeria/_magic_knight_rayearth-love-alcione_sierra01.jpg" BORDER="0" ALT="Alcione&Sierra" WIDTH="150" HEIGTH="150"></A></TD><TD WIDTH="150px" HEIGHT="150px"><A TARGET="_blank" A HREF="images/magic_knight_rayearth-love-alcione_sierra02.jpg"><IMG SRC="galeria/_magic_knight_rayearth-love-alcione_sierra02.jpg" BORDER="0" ALT="Alcione&Sierra" WIDTH="150" HEIGTH="150"></A></TD><TD WIDTH="150px" HEIGHT="150px"><A TARGET="_blank" A HREF="images/magic_knight_rayearth-love-alcione_zagato01.jpg"><IMG SRC="galeria/_magic_knight_rayearth-love-alcione_zagato01.jpg
```

in element 0 of the array. Using the ? forces it to stop matching after the first .jpg and start searching for the next match.

George
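The difference between the greedy and lazy forms can be checked with a small test. This is a sketch under stated assumptions: `$sHTML` is a shortened two-link version of the HTML from the first post, and the array names are illustrative. It also compares the pattern with and without the extra .*? outside the group, which is the question asked in the previous post.

```autoit
; Two links on one line, shortened from the first post's HTML.
Local $sHTML = '<A HREF="images/magic_knight_rayearth-love-alcione_lantys.jpg"></A>' & _
        '<A HREF="images/magic_knight_rayearth-love-alcione_sierra01.jpg"></A>'

; Greedy: ".*" runs to the LAST "jpg", so everything lands in element 0.
Local $aGreedy = StringRegExp($sHTML, '(?s)(images.*jpg)', 3)

; Lazy: ".*?" stops at the FIRST "jpg", then the search resumes for the next link.
Local $aLazy = StringRegExp($sHTML, '(?s)(images.*?jpg)', 3)

; Same lazy group, with the extra ".*?" on both sides as in the earlier answer.
Local $aPadded = StringRegExp($sHTML, '(?s).*?(images.*?jpg).*?', 3)

ConsoleWrite("Greedy matches: " & UBound($aGreedy) & @CRLF)
ConsoleWrite("Lazy matches:   " & UBound($aLazy) & @CRLF)
ConsoleWrite("Padded matches: " & UBound($aPadded) & @CRLF)
```

On this sample the greedy pattern should report a single match spanning both links, while the lazy and padded patterns should each report two. The .*? outside the group does not change the captures here: the regex engine already scans forward to the next "images" on its own, and a lazy .*? at the very end of a pattern matches nothing.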