tsue

substract words

9 posts in this topic

Hello, I'm trying to extract words from links. Example:

<TD WIDTH="150px" HEIGHT="150px"><A TARGET="_blank" A HREF="images/magic_knight_rayearth-love-alcione_lantys.jpg"><IMG SRC="galeria/_magic_knight_rayearth-love-alcione_lantys.jpg" BORDER="0" ALT="Alcione&Lantys" WIDTH="150" HEIGTH="150"></A></TD>

<TD WIDTH="150px" HEIGHT="150px"><A TARGET="_blank" A HREF="images/magic_knight_rayearth-love-alcione_sierra01.jpg"><IMG SRC="galeria/_magic_knight_rayearth-love-alcione_sierra01.jpg" BORDER="0" ALT="Alcione&Sierra" WIDTH="150" HEIGTH="150"></A></TD>

<TD WIDTH="150px" HEIGHT="150px"><A TARGET="_blank" A HREF="images/magic_knight_rayearth-love-alcione_sierra02.jpg"><IMG SRC="galeria/_magic_knight_rayearth-love-alcione_sierra02.jpg" BORDER="0" ALT="Alcione&Sierra" WIDTH="150" HEIGTH="150"></A></TD>

<TD WIDTH="150px" HEIGHT="150px"><A TARGET="_blank" A HREF="images/magic_knight_rayearth-love-alcione_zagato01.jpg"><IMG SRC="galeria/_magic_knight_rayearth-love-alcione_zagato01.jpg" BORDER="0" ALT="Alcione&Zagato" WIDTH="150" HEIGTH="150"></A></TD>

</TR><TR>

I only need the link from images to .jpg for each HREF, for example images/magic_knight_rayearth-love-alcione_lantys.jpg

I've searched the AutoIt website but found nothing.

Is this possible?


Hello tsue,

Have you tried StringRegExp()? This example works fine with your HTML example.

#include <Array.au3>

$fPath = @DesktopDir & '\HTML_Test.txt' ; @DesktopDir has no trailing backslash, so add one

$HTML_text = FileRead($fPath)

$sre_Array = StringRegExp($HTML_text,'IMG SRC\=\"(.*?)\" BORDER',3)

_ArrayDisplay($sre_Array)

Realm




I'm trying this, but I can't manage to get just images/magic_knight_rayearth-love-alcione_lantys.jpg. Here is the code:

$array = StringRegExp("<TD WIDTH="150px" HEIGHT="150px"><A TARGET="_blank" A HREF="images/magic_knight_rayearth-love-alcione_lantys.jpg"><IMG SRC="galeria/_magic_knight_rayearth-love-alcione_lantys.jpg" BORDER="0" ALT="Alcione&Lantys" WIDTH="150" HEIGTH="150"></A></TD>", "<(?i)TD WIDTH="150px" HEIGHT="150px"><A TARGET="_blank" A HREF="(.*?)"><(?i)IMG SRC="galeria/_magic_knight_rayearth-love-alcione_lantys.jpg" BORDER="0" ALT="Alcione&Lantys" WIDTH="150" HEIGTH="150"></A></TD>" , 3)

for $i = 0 to UBound($array) - 1
    msgbox(0, "RegExp Test with Option 2 - " & $i, $array[$i])
Next


try this:

$array = StringRegExp($HTML_text, '(?s).*?(images.*?jpg).*?', 3, 1)

For $i = 0 To UBound($array) - 1
    MsgBox(0, "RegExp Test with Option 1 - " & $i, $array[$i])
Next


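; .+? is lazy, so the capture stops at the first .jpg after HREF=" instead of running on to the IMG SRC later on the same line
$aImages = StringRegExp($sHTML, "(?i)HREF\s*=\s*\x22(.+?\.jpg)", 3)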
If Not @Error Then
    For $i = 0 To UBound($aImages) -1
        MsgBox(4096, "Result " & $i+1, $aImages[$i])
    Next
Else
    MsgBox(4096, "Error", "The expression returned error code " & @Error)
EndIf

Another one, which might be what you really need, is:

$aImages = StringRegExp($sHTML, "(?i)<img\ssrc\s*=\s*\x22(.+?\.jpg)", 3)
If Not @Error Then
    For $i = 0 To UBound($aImages) -1
        MsgBox(4096, "Result " & $i+1, $aImages[$i])
    Next
Else
    MsgBox(4096, "Error", "The expression returned error code " & @Error)
EndIf

EDIT:

By the way, the test you ran was doomed to failure from the start because of all the double quotes in it:

"<TD WIDTH="150px" HEIGHT="150px"><A TARGET="_blank" A HREF="images/magic_knight_rayearth-love-alcione_lantys.jpg"><IMG SRC="galeria/_magic_knight_rayearth-love-alcione_lantys.jpg" BORDER="0" ALT="Alcione&Lantys" WIDTH="150" HEIGTH="150"></A></TD>"

In order to use that you would wrap the whole thing in single quotes and do the same in your expression.

Rather than use it in the expression, I prefer to use \x22 to check if a double-quote appears at a given position.
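
A minimal sketch of the two quoting approaches described above. The $sHTML string is one row copied from the first post and hard-coded for the demo (an assumption; in practice it would come from FileRead), and the variable names $aLiteral and $aHex are made up for illustration.

```autoit
; Sketch: one row from the first post, wrapped in single quotes so the
; double quotes inside the HTML do not end the AutoIt string early.
Local $sHTML = '<TD WIDTH="150px" HEIGHT="150px"><A TARGET="_blank" A HREF="images/magic_knight_rayearth-love-alcione_lantys.jpg"><IMG SRC="galeria/_magic_knight_rayearth-love-alcione_lantys.jpg" BORDER="0" ALT="Alcione&Lantys" WIDTH="150" HEIGTH="150"></A></TD>'

; Option A: single-quoted pattern containing literal double quotes.
Local $aLiteral = StringRegExp($sHTML, '(?i)HREF="([^"]+)"', 3)

; Option B: \x22 is the regex escape for the double-quote character,
; so the pattern can sit in an ordinary double-quoted AutoIt string.
Local $aHex = StringRegExp($sHTML, "(?i)HREF=\x22([^\x22]+)\x22", 3)

; StringRegExp returns an array only when something matched.
If IsArray($aLiteral) And IsArray($aHex) Then
    MsgBox(4096, "Quoting test", "Literal quotes: " & $aLiteral[0] & @CRLF & "Hex escape: " & $aHex[0])
EndIf
```

Both calls should capture the same images/... path; which style to use is a matter of taste.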


George


Thanks, it worked, and thanks GEOSoft for the other information too.

Now I'm trying to understand how it works.

OK, so (?s) makes the . that follows match anything, even newlines, and * makes it repeat.

The next part I don't really understand: ? (find the smallest match instead of the largest after a repeat character). But I found that if I take it out, it only gives me 1 result.

So was all of this first part there to search until it finds whatever is inside ()?

And inside (), it searches from images to jpg?

And why does .*? have to be added again at the end?

Can you help me understand this? Thanks.


See attached (if I can get the attachment thingy to work).

regex.html


Thank you for all of this information. I found that this gives the same result:

$array = StringRegExp($HTML_text, '(?s)(images.*?jpg)', 3, 1)

From what I have read, I haven't found the reason why you add .*? outside ().

Thanks.


The ? after the * tells PCRE to find the smallest match, which is particularly important when using (?s). Using your first post as an example, if you did not add the ? after the * inside the group, it would return

images/magic_knight_rayearth-love-alcione_lantys.jpg"><IMG SRC="galeria/_magic_knight_rayearth-love-alcione_lantys.jpg" BORDER="0" ALT="Alcione&Lantys" WIDTH="150" HEIGTH="150"></A></TD>

<TD WIDTH="150px" HEIGHT="150px"><A TARGET="_blank" A HREF="images/magic_knight_rayearth-love-alcione_sierra01.jpg"><IMG SRC="galeria/_magic_knight_rayearth-love-alcione_sierra01.jpg" BORDER="0" ALT="Alcione&Sierra" WIDTH="150" HEIGTH="150"></A></TD>

<TD WIDTH="150px" HEIGHT="150px"><A TARGET="_blank" A HREF="images/magic_knight_rayearth-love-alcione_sierra02.jpg"><IMG SRC="galeria/_magic_knight_rayearth-love-alcione_sierra02.jpg" BORDER="0" ALT="Alcione&Sierra" WIDTH="150" HEIGTH="150"></A></TD>

<TD WIDTH="150px" HEIGHT="150px"><A TARGET="_blank" A HREF="images/magic_knight_rayearth-love-alcione_zagato01.jpg"><IMG SRC="galeria/_magic_knight_rayearth-love-alcione_zagato01.jpg

in element 0 of the array. Using the ? forces it to stop matching after the first .jpg and start searching for the next match.
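
A small sketch of the difference, assuming a cut-down two-row version of the HTML from the first post. The trimmed $sHTML string and the names $aGreedy and $aLazy are illustrative only. It uses tsue's shorter pattern without the outer .*?, which tsue reported gave the same result.

```autoit
; Sketch: two links, trimmed from the first post's HTML, joined by a line break.
Local $sHTML = '<A HREF="images/magic_knight_rayearth-love-alcione_lantys.jpg"><IMG SRC="galeria/_magic_knight_rayearth-love-alcione_lantys.jpg">' & @CRLF & _
        '<A HREF="images/magic_knight_rayearth-love-alcione_sierra01.jpg"><IMG SRC="galeria/_magic_knight_rayearth-love-alcione_sierra01.jpg">'

; Greedy: .* runs to the LAST "jpg" in the text, so everything ends up in one long match.
Local $aGreedy = StringRegExp($sHTML, '(?s)(images.*jpg)', 3)

; Lazy: .*? stops at the FIRST "jpg" after each "images", so each link is its own match.
Local $aLazy = StringRegExp($sHTML, '(?s)(images.*?jpg)', 3)

MsgBox(4096, "Greedy vs lazy", "Greedy matches: " & UBound($aGreedy) & @CRLF & "Lazy matches: " & UBound($aLazy))
```

With this input the greedy pattern should report a single match and the lazy pattern two, one per HREF.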

George
