Andrew Sparkes, posted October 25, 2005 (edited)

Is there a UDF or some special technique for parsing an HTML file in AutoIt? I have an HTML document and want to read a line from it. If regular expressions aren't supported, what do I do? Given something like

    <p>You have * new messages</p>

I'd like it to snag the *. Or am I missing the point entirely?

I looked around and haven't found anything in the help file or the forum. This is a common scripting task, so there are bound to be many solutions. Would anyone care to share one with me?

Here is another (very) simple example:

    <html>
    <head>
    <title>Testing page</title>
    </head>
    <p>
    The number is 347.
    </p>
    </html>

Example:

    $var = HTMLParse("test.htm", "The number is [0-9\s][0-9\s][0-9\s].")
    $var = StringReplace($var, "The number is ", "")
    $var = StringReplace($var, " ", "")

Is there an HTMLParse function that I don't know about, or some parameters for StringReplace?

EDIT: I only looked through the stable release's online documentation and forgot about the beta. The regular expression functions are there, so ignore the part about RegExp. The question about parsing HTML still stands.

Edited October 25, 2005 by Andrew Sparkes

---Sparkes.

poisonkiller, posted October 25, 2005

I hope this helps:

    #include <INet.au3>

    $var = _INetGetSource("test.htm")
    $var = _StringBetween($var, "The number is ", ".")
    MsgBox(0, "", "You have " & $var & " messages")

    Func _StringBetween($s, $from, $to)
        ; Position of the first character after $from
        $x = StringInStr($s, $from) + StringLen($from)
        ; Position of $to in the text after position $x,
        ; which equals the length of the wanted text
        $y = StringInStr(StringTrimLeft($s, $x), $to)
        Return StringMid($s, $x, $y)
    EndFunc

w0uter, posted October 25, 2005 (edited)

StringRegExp might be what you are after. It's in the beta.

Edited October 25, 2005 by w0uter

My UDFs: mem stuff: _Mem; ftp stuff: _FTP (OLD); inet stuff: _INetGetSource (OLD), _INetGetImage, _INetBrowse (Collection), _EncodeUrl, _NetStat, _Google; random stuff: _iPixelSearch, _DiceRoll

SpookMeister, posted October 25, 2005

Perhaps it would help if you gave us more context. Do you have the specified .htm file directly on your file system? Are you trying to check webmail somewhere? Are you receiving an email notification of some sort that contains this .htm information?

------

Off the top of my head: if this is a straight parse of information from a file, then you might use FileOpen() and StringInStr().

If you are doing webmail, there may be a utility they provide that gives you a pop-up or tray icon that you can manipulate to get the information.

There are usually a lot of ways to skin a cat... you just need specifics on the cat in question to do it "right".

Helpful tips: If you want better answers to your questions, take the time to reproduce your issue in a small "stand alone" example script whenever possible. Also, make sure you tell us 1) what you tried, 2) what you expected to happen, and 3) what happened instead.

Useful links: BrettF's update to LxP's "How to AutoIt" pdf; Valuater's Autoit 1-2-3; Download page for the latest versions of Autoit and SciTE

Quote:
<glyph> For example - if you came in here asking "how do I use a jackhammer" we might ask "why do you need to use a jackhammer"
<glyph> If the answer to the latter question is "to knock my grandmother's head off to let out the evil spirits that gave her cancer", then maybe the problem is actually unrelated to jackhammers

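For the plain-file case, the FileOpen() plus StringInStr() idea could look roughly like this. It is only a sketch: it assumes a recent AutoIt release (where FileRead() with no count reads the whole file), and the file name and marker strings are placeholders borrowed from the first post's example.

```autoit
; Hypothetical file name and markers, matching the example in the first post.
Local $sFile = "test.htm"
Local $sBefore = "The number is "
Local $sAfter = "."

; Read the whole file into one string.
Local $hFile = FileOpen($sFile, 0) ; 0 = read mode
If $hFile = -1 Then
    MsgBox(0, "Error", "Could not open " & $sFile)
    Exit
EndIf
Local $sText = FileRead($hFile)
FileClose($hFile)

; Find the leading marker.
Local $iStart = StringInStr($sText, $sBefore)
If $iStart = 0 Then
    MsgBox(0, "Result", "Leading marker not found")
    Exit
EndIf

; Keep only the text that follows the leading marker.
Local $sRest = StringMid($sText, $iStart + StringLen($sBefore))

; Find the trailing marker in what is left.
Local $iEnd = StringInStr($sRest, $sAfter)
If $iEnd = 0 Then
    MsgBox(0, "Result", "Trailing marker not found")
    Exit
EndIf

; Everything before the trailing marker is the value we want.
MsgBox(0, "Result", StringLeft($sRest, $iEnd - 1))
```

Checking each StringInStr() result against 0 before using it avoids returning junk when the page layout changes and a marker disappears.
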
eitan, posted October 25, 2005

    Func Parsehtml()
        ; get and save the file
        $htmlexist = InetGet("http://www.thesite.com/your.html", "C:\your dir\yourfile.html", 1)
        If $htmlexist = 1 Then ; checkpoint
            $file = FileOpen("C:\your dir\yourfile.html", 0) ; open the saved file
            $line1 = FileReadLine($file, 89) ; read the line you want by its number
            $line11 = StringReplace($line1, 'junk', "") ; replace junk with nothing
            FileClose($file)
        EndIf ; closes the If (see the follow-up post)
    EndFunc

Note that I used ' and not " in case you have " in the junk. Inside the function, keep applying StringReplace until your result is clean. Usually two sweeps will do it.

Cheers

eitan, posted October 25, 2005

Oh, I forgot the EndIf! It goes right before EndFunc; please add it in.

Andrew Sparkes (author), posted October 26, 2005

The function that poisonkiller posted worked beautifully and I'll probably just use that, but I'm still curious about this regexp problem. I have this:

    <blah blah blah>The number is 10.</blah blah blah>

If I run StringRegExp on the above string with "The number is [0-9]*\.", it returns false. That is understandable, as it is testing the whole string against the pattern. I want to know if there is a function that checks whether the string contains a regexp pattern and returns the matching substring, so I can work from there.

---Sparkes.

w0uter, posted October 26, 2005

You need a () to make it return something. Try something like:

    "The number is ([0-9]*)\."

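To make the capture-group suggestion concrete, here is a minimal sketch. It assumes a current AutoIt release, where passing flag 1 to StringRegExp() returns an array of the captured groups and sets @error when nothing matches; the 2005 beta may have behaved differently. The sample string and variable names are made up for illustration, and the pattern uses [0-9]+ rather than [0-9]* so that at least one digit is required.

```autoit
; Sample input, taken from the question above.
Local $sHtml = "<blah blah blah>The number is 10.</blah blah blah>"

; Flag 1 asks StringRegExp to return an array of captured groups
; instead of a plain true/false result.
Local $aMatch = StringRegExp($sHtml, "The number is ([0-9]+)\.", 1)

If @error Then
    MsgBox(0, "Result", "Pattern not found")
Else
    ; $aMatch[0] holds the text captured by the first pair of parentheses.
    MsgBox(0, "Result", "Captured: " & $aMatch[0])
EndIf
```

The pattern does not have to cover the whole string: the surrounding tags are simply skipped, and only the part inside the parentheses comes back.
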
erifash, posted October 26, 2005

Use my string parsing function:

    Func _StringParse($sz_str, $sz_before, $sz_after, $i_occurance = 0)
        Local $sz_sp1 = StringSplit($sz_str, $sz_before, 1)
        ; $sz_sp1[0] pieces means $sz_sp1[0] - 1 occurrences of $sz_before
        If $i_occurance < 0 Or $i_occurance >= $sz_sp1[0] Then
            SetError(1)
            Return ""
        EndIf
        ; Occurrence 0 selects the last piece; otherwise take the requested one
        Local $sz_sp2 = _Test($i_occurance = 0, StringSplit($sz_sp1[$sz_sp1[0]], $sz_after, 1), StringSplit($sz_sp1[$i_occurance + 1], $sz_after, 1))
        Return $sz_sp2[1]
    EndFunc

    Func _Test($b_Test, $v_True, $v_False)
        If $b_Test Then Return $v_True
        Return $v_False
    EndFunc

Example, with $str holding:

    <tag>first occurrence</tag>
    <tag>second occurrence</tag>
    <tag>third and last occurrence</tag>

_StringParse($str, "<tag>", "</tag>") would give you "third and last occurrence"
_StringParse($str, "<tag>", "</tag>", 1) would give you "first occurrence"
_StringParse($str, "<tag>", "</tag>", 2) would give you "second occurrence"
_StringParse($str, "<tag>", "</tag>", 3) would give you "third and last occurrence"

My UDFs: _FilePrint() | _ProcessGetName() | _Degree() and _Radian()
My Scripts: Drive Lock - Computer Lock Using a Flash Drive; AU3Chat - Simple Multiuser TCP Chatroom; StringChunk - Split a String Into Equal Parts; AutoProxy - Custom Webserver
