StringRegExp confirmation


Hello,

I'm new to AutoIt and enjoying it a lot at the moment; I've made some pretty cool stuff!

The only thing I'm struggling with is regular expressions :-/

However, after much confusion and many different patterns, I managed to get what I wanted working. What I'm trying to confirm is whether I have done it correctly.

To all of you this is going to be the simplest RegExp match you've seen, but I get truly stuck when it comes to these.

This is meant to scan through the HTML pulled from a web page, find "<TR" case-insensitively, and then show me how many it found.

The following works, but have I done it correctly?

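; Sample input: three "<TR" occurrences in mixed case
$html = "<TR tes tesn .... yest /\/\@?''' <tR 1111 <tr adawd"

; Flag 3 returns an array of all matches; (?i) makes the pattern case-insensitive
$isFound = StringRegExp($html, "(?i)\<TR", 3)

For $element In $isFound
    ConsoleWrite($element & @CRLF)
Next

ConsoleWrite("Total Matches Found: " & UBound($isFound) & @CRLF)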

Just looking for some advice TBH :)

Anything is appreciated.

Thanks

Edited by Steveiwonder

They call me MrRegExpMan

Your pattern will work, but there is an easier way if all you really need is the count.

$html = "<TR tes tesn .... yest /\/\@?''' <tR 1111 <tr adawd"
StringRegExpReplace($html, "(?i)<tr.*?>", "")
If @Extended Then MsgBox(0, "Result", "There are " & @Extended & " <tr> elements on the page")
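
If the matches themselves are needed rather than just the count, the array approach from the original post still applies. One thing to watch, as far as I know: with flag 3, StringRegExp() sets @error when nothing matches, so the For...In loop has no array to iterate. A minimal sketch that guards against that case (the variable names are illustrative, and the backslash before "<" is dropped because "<" needs no escaping):

```autoit
; Sketch only: guard the array approach against the no-match case
Local $sHtml = "<TR tes tesn .... yest /\/\@?''' <tR 1111 <tr adawd"
Local $aFound = StringRegExp($sHtml, "(?i)<TR", 3)

If IsArray($aFound) Then
    ; At least one match: list each one, then report the total
    For $sMatch In $aFound
        ConsoleWrite($sMatch & @CRLF)
    Next
    ConsoleWrite("Total Matches Found: " & UBound($aFound) & @CRLF)
Else
    ; No array came back, so treat it as zero matches
    ConsoleWrite("Total Matches Found: 0" & @CRLF)
EndIf
```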

George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.

Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.***

The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.

Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else.

"Old age and treachery will always overcome youth and skill!"

StringRegExp() is nice (and very geeky, if you're into that), but it is not always the fastest way. If you just need to count instances, this might be quicker:

; Generate about 1K lines
$html = "<TR tes tesn .... yest >/\/\@?''' <tR 1111> <tr adawd>" & @CRLF
For $n = 1 To 10
    $html &= $html
Next

; With StringRegExp()
$iTimer = TimerInit()
For $n = 1 To 1000
    $isFound = StringRegExp($html, "(?i)\<TR", 3)
Next
$iCount = UBound($isFound)
$iTimer = TimerDiff($iTimer)
ConsoleWrite("Total StringRegExp() Matches Found: " & $iCount & "; In " & Round($iTimer/1000, 3) & "sec" & @CRLF)

; With StringReplace
$iTimer = TimerInit()
For $n = 1 To 1000
    $isFound = StringReplace($html, "<TR", "")
Next
$iCount = @extended
$iTimer = TimerDiff($iTimer)
ConsoleWrite("Total StringReplace() Matches Found: " & $iCount & "; In " & Round($iTimer/1000, 3) & "sec" & @CRLF)

Results on my CPU:

Total StringRegExp() Matches Found: 3072; In 28.271sec
Total StringReplace() Matches Found: 3072; In 8.33sec

About three times as fast.
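
The two counts agree because StringReplace() is case-insensitive by default (its optional casesense parameter defaults to 0), which lines up with the (?i) in the pattern. A small sketch of the difference, using a single line of the test data; the variable names are only for illustration:

```autoit
Local $sHtml = "<TR tes tesn .... yest >/\/\@?''' <tR 1111> <tr adawd>"

; Default casesense = 0: "<TR" also matches "<tR" and "<tr"
StringReplace($sHtml, "<TR", "")
Local $iAnyCase = @extended ; 3 replacements

; occurrence = 0 (replace all), casesense = 1: only the exact "<TR" counts
StringReplace($sHtml, "<TR", "", 0, 1)
Local $iExactCase = @extended ; 1 replacement

ConsoleWrite("Any case: " & $iAnyCase & "; exact case: " & $iExactCase & @CRLF)
```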

You would still want to use StringRegExp() for more complicated matches (e.g. "TR tags that do not contain any TD tags").
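
As a rough illustration of that kind of match, here is one possible pattern for counting rows with no TD inside. It is only a sketch: the sample markup is made up, it assumes every row has a closing </tr>, and it will not cope with nested tables.

```autoit
; Hypothetical sample: one row with a <td>, one with only <th>, one empty
Local $sHtml = "<tr><td>a</td></tr><tr><th>h</th></tr><TR></TR>"

; (?is)             case-insensitive, and "." also matches newlines
; <tr\b[^>]*>       an opening TR tag with optional attributes
; (?:(?!<td\b).)*?  any characters, as long as no "<td" starts at that point
; </tr>             the closing tag
StringRegExpReplace($sHtml, "(?is)<tr\b[^>]*>(?:(?!<td\b).)*?</tr>", "")
Local $iCount = @extended

ConsoleWrite("Rows without a <td>: " & $iCount & @CRLF)
```

A plain StringReplace() cannot express the "does not contain" condition, which is where the regular expression earns its keep.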

:)

P.S. If you are working with an active instance of IE, you could also just do _IETagNameGetCollection() and check @extended for the count. I haven't timed that.
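
A minimal sketch of that idea, assuming an Internet Explorer window is already open. "Example page" is a hypothetical window title, not something from this thread, and this version is untimed as well:

```autoit
#include <IE.au3>

; Attach to an existing IE window whose title contains this (hypothetical) text
Local $oIE = _IEAttach("Example page")
If @error Then
    ConsoleWrite("Could not attach to an IE window" & @CRLF)
    Exit
EndIf

; Capture @error and @extended right after the call, before another function resets them
Local $oRows = _IETagNameGetCollection($oIE, "tr")
Local $iError = @error, $iCount = @extended

If $iError Then
    ConsoleWrite("Could not get the <tr> collection" & @CRLF)
Else
    ConsoleWrite("There are " & $iCount & " <tr> elements on the page." & @CRLF)
EndIf
```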

Valuater's AutoIt 1-2-3, Class... Is now in Session!For those who want somebody to write the script for them: RentACoder"Any technology distinguishable from magic is insufficiently advanced." -- Geek's corollary to Clarke's law

@GEOSoft - Your code didn't seem to do anything :)

Did it work for you?

@Psalty - I will have a look at this and see how I get on, thanks. How come it's so fast?

Is there anywhere I can learn more about AutoIt regular expressions so I don't have to bug people on here?

Change it to this:

$html = "<TR tes tesn .... yest /\/\@?''' <tR 1111 <tr adawd"
StringRegExpReplace($html, "(?i)<tr.*?>", "")
$iCount = @Extended
MsgBox(0, "Result", "There are " & $iCount & " <tr> elements on the page.")
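
One likely reason the earlier version showed nothing: the pattern "(?i)<tr.*?>" needs a closing ">" after each "<tr", and the sample string above contains no ">" at all, so the count is 0. Real HTML has the closing brackets. A sketch with a made-up sample that does, using a slightly tighter pattern so that tags which merely start with "tr" (such as <track>) are not counted:

```autoit
; Hypothetical sample: two table rows plus a <track> tag that should be ignored
Local $sHtml = "<TR class='a'><td>1</td></TR> <tR><td>2</td></tr> <track src='x'>"

; \b requires "tr" to end at a word boundary; [^>]* skips the attributes up to ">"
StringRegExpReplace($sHtml, "(?i)<tr\b[^>]*>", "")
Local $iCount = @extended ; 2 for this sample

ConsoleWrite("There are " & $iCount & " <tr> elements in the sample." & @CRLF)
```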

George

Thanks a lot, both of you. Both work as needed.

I'm going to use GEOSoft's version for one reason only: I have no idea how to use regular expressions yet and I need to learn, so I figure this is the best way to start. It also seems more flexible for future use? (Correct me if I'm wrong.)

But thanks again, both of you :)
