UnKnOwned Posted April 15, 2013

Good day,

I have a tag inside an HTML file that needs parsing.

TAG

    <div class="W3 Schools Link WHITESPACES " title="Hello World "></div>

CURRENT METHOD

    $Delim1=StringSplit($tagsParameters, ' ') ; ...I did not account for this reaction initially =/
    For $k = 1 to $Delim1[0]
        If $Delim1[$k] = '' then
            ; catch empty strings
        Else
            $Delim2 = StringSplit($Delim1[$k], '=')
            For $l = 1 to $Delim2[0]
                $KeyValue = $Delim2[1] ; *** KEY VALUE NAME ***
                $KeyParameter = $Delim2[2] ; *** KEY PARAMETER VALUE ***
            Next
            $KeyParameter = StringReplace($KeyParameter, '"', "")
        EndIf
    Next

I'm a moron of course, not compensating for all those " SPACES".

FUTILE ATTEMPTS

    $DelimX = StringSplit($tagsparameters, '="')
    For $l = 1 to $DelimX[0]
        $KeyValue = $DelimX[1] ; *** KEY VALUE NAME ***
        $KeyParameter = $DelimX[3] ; *** KEY PARAMETER VALUE ***
    Next
    msgbox(0, '', 'Key Value Name: '&$KeyValue&@LF&'Key Parameter: '&$KeyParameter)

This of course does not work, because 'title="Hello World "' exists in array[4].

I can't seem to remember how to get For $x = 1 to $arr[0][0] to cooperate properly. I'm overlooking something I already know the answer to, but too many hours staring at this screen have made me biased. The function was written completely absent-minded of the fact that there would be spaces in quoted strings. Too much of THIS_AndThatAndThisAndThat again has made me lazy.

Regex solutions are fine, but I would prefer to keep it as is.

Thank you & regards,
UnKnOwned

JohnOne Posted April 15, 2013

Why not just strip the string of spaces first? StringStripWS, maybe.

UnKnOwned Posted April 15, 2013

JohnOne,

Thank you for the prompt response. My apologies for not being more specific in the OP. The quoted string needs to remain intact, as it is passed along to another function. When the string is passed along, it throws the "Array has incorrect subscripts" error because of how the string was parsed initially. The parameters come back wrong: "Hello becomes an incorrect return value, whereas "HelloWorld" returns correctly.

Edited April 15, 2013 by UnKnOwned

JohnOne Posted April 15, 2013

I'll be honest, I cannot even tell what it is supposed to be doing.

UnKnOwned Posted April 15, 2013

    $tagsparameters = <div class="W3 Schools Link WHITESPACES " title="Hello World "></div>

Loop through these parameters. This is nothing more than a giant loop that runs through a list of functions, processing the parameters as it goes. So...

    $KeyValue = class
    $KeyParameter = "W3 Schools Link WHITESPACES"

...and

    $KeyValue = title
    $KeyParameter = "Hello World "

If the value is formatted as "HelloWorld" it works fine; formatted as "Hello World" it does not, because the first example code splits the string at whitespace. The split should fall between the attributes:

    class="W3 Schools Link WHITESPACES" (right here is the 1st parse) title="Hello World" someOtherParam="whatever" etc...

Instead, the first example code causes

    class="W3(1st parse) Schools Link WHITESPACES"

which is wrong. The second example code works more effectively but passes over all other names and params: class= would be picked up, but not title=.

I'm terribly sorry, I hope this somewhat makes sense... Again I'm being overly complicated in my explanations, sorry.

"HTML PARSER": simply get the parameters and values inside the tag.

Edited April 15, 2013 by UnKnOwned

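For readers who want to stay with StringSplit, as the original poster prefers, here is a minimal sketch of one way to keep quoted values intact: split on the double quote instead of on the space. It is not code from the thread. It assumes every value is wrapped in double quotes, that values contain no embedded double quotes, and that there is no whitespace between the key, the = and the opening quote. The names $sTag, $aParts, $sChunk, $sKey and $sValue are made up for illustration.

```autoit
Local $sTag = '<div class="W3 Schools Link WHITESPACES " title="Hello World "></div>'

; Split on the double quote: odd elements end in key=, even elements are values.
Local $aParts = StringSplit($sTag, '"')

For $i = 1 To $aParts[0] - 1 Step 2
    ; Strip leading and trailing whitespace from the chunk before the quote.
    Local $sChunk = StringStripWS($aParts[$i], 3)
    ; Skip chunks that do not end in "=" (assumption: key="value" has no gaps).
    If StringRight($sChunk, 1) <> "=" Then ContinueLoop
    $sChunk = StringTrimRight($sChunk, 1)
    ; The key is whatever follows the last space, which skips the "<div" prefix.
    Local $iSpace = StringInStr($sChunk, " ", 0, -1)
    Local $sKey = StringMid($sChunk, $iSpace + 1)
    ; The value is the next element, with its inner spaces untouched.
    Local $sValue = $aParts[$i + 1]
    ConsoleWrite($sKey & ' = "' & $sValue & '"' & @LF)
Next
```

Because the quote is the delimiter, spaces inside a value never trigger a split, which is the failure described in the post above. The trade-off is that unquoted or single-quoted attribute values are not handled.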
JohnOne Posted April 15, 2013

Is there a reason not to just use DaleHohm's IE UDF and get the info cleanly and correctly?

kylomas Posted April 15, 2013

UnKnOwned,

This works for the string posted. I use regexp because it's the easiest (best) way to do this.

    local $tagsparameters = '<div class="W3 Schools Link WHITESPACES " title="Hello World "></div>'
    local $aret = StringRegExp($tagsparameters,'(\w+="[ \w]+")',3), $aTemp

    for $1 = 0 to ubound($aret) - 1
        $aTemp = stringsplit($aret[$1],'=')
        if $aTemp[0] <> 2 then ConsoleWrite('Error - ' & $aret[$1] & @LF)
        ConsoleWrite(stringformat('Key #' & $1 & ' = %-20s value = %-20s',$aTemp[1],$aTemp[2]) & @LF)
    next

kylomas

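A possible variation on the regexp above, offered as a sketch rather than as kylomas's method: capture the key and the value in separate groups so that no second StringSplit is needed. The pattern here is an assumption of mine; it accepts any characters except a double quote inside the value, so punctuation and = signs in a value do not break the match. The variable names are illustrative only.

```autoit
Local $sTag = '<div class="W3 Schools Link WHITESPACES " title="Hello World "></div>'

; Flag 3 returns all global matches in one flat array: key, value, key, value, ...
Local $aMatches = StringRegExp($sTag, '([\w-]+)\s*=\s*"([^"]*)"', 3)

If @error Then
    ConsoleWrite('No attributes found' & @LF)
Else
    ; Walk the flat array two elements at a time.
    For $i = 0 To UBound($aMatches) - 2 Step 2
        ConsoleWrite(StringFormat('%-10s = "%s"', $aMatches[$i], $aMatches[$i + 1]) & @LF)
    Next
EndIf
```

Like any regexp over markup, this only covers double-quoted attributes on simple tags.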
UnKnOwned Posted April 15, 2013

JohnOne & kylomas,

Thank you both very much for your assistance.

JohnOne, I am curious about this approach, as it provides the data needed in a couple of other areas. Could you give an example, possibly grabbing not just the inner text but the values and parameters as well? I am having a bit of trouble getting this to work:

    #include <IE.au3>

    Local $oIE = _IEDocReadHTML("Test.html")
    Local $oElements = _IETagNameAllGetCollection($oIE)
    For $oElement In $oElements
        MsgBox(0, "Element Info", "Tagname: " & $oElement.tagname & @CR & "innerText: " & $oElement.innerText)
    Next

So: read the locally stored file and get the tag names, values, properties, and inner text.

kylomas, I do agree with you that it is the easiest way, and perhaps even the best, but correct me if I am wrong: is it the fastest way? If I am wrong, I will more than likely switch my approach.

kylomas Posted April 15, 2013

UnKnOwned,

"kylomas I do agree with you in being the easiest way, and perhaps even the best, but correct me if I am wrong. Is it the fastest way? If I am wrong I will more than likely switch over my approach."

No, this is the worst way to do it, for a variety of reasons. Read the IE doc and you'll see why that is the best approach. I only offered the string parser solution because you were already on that road.

kylomas

Edited April 15, 2013 by kylomas

UnKnOwned Posted April 15, 2013

kylomas,

Thank you for clarifying; I thought perhaps there was something I was missing. So, IE UDF it is. I searched the local help file and the forum with little to no luck.

How do I go about reading from a locally stored HTML file and using _IETagNameAllGetCollection($oIE)?

    <div value1="param 1" value2="param 2" />INNER TEXT</div>

I think I am missing something again; I cannot find how to retrieve the params or how to format the call properly.

Edited April 15, 2013 by UnKnOwned

kylomas Posted April 16, 2013

UnKnOwned,

Here is an example of something I use to process downloaded HTML. Maybe you can adapt it.

    #include <IE.au3>

    local $fln = 'k:\sd\sd0100\nba\boxes\400440940' ; this is a text file downloaded with inetget

    _get_links( fileread($fln) )
    ConsoleWrite(@error & @LF)

    func _get_links($html)
        Local $o_htmlfile = ObjCreate('HTMLFILE'), $str
        If Not IsObj($o_htmlfile) Then Return SetError(-1)
        $o_htmlfile.open()
        $o_htmlfile.write($html)
        $o_htmlfile.close()
        Local $ocol = _IETagnameGetCollection($o_htmlfile, 'a')
        if not isobj($ocol) then return seterror(-2)
        for $o in $ocol
            ConsoleWrite('innertext = ' & $o.innertext & @LF)
            ConsoleWrite('href = ' & $o.href & @LF)
            ConsoleWrite('-----------------------' & @LF)
        next
    endfunc

kylomas

UnKnOwned Posted April 16, 2013

kylomas,

Playing around with your solution lets me store predefined tags in an array and retrieve .tagname and .innerText whenever any of those tags are found in the document. This is working for me at the moment as a workaround for the method used in the AutoIt help documentation, which I cannot seem to make work.

AutoIt documentation example

    #include <IE.au3>

    Local $oIE = _IE_Example("basic")
    Local $oElements = _IETagNameAllGetCollection($oIE)
    For $oElement In $oElements
        MsgBox(0, "Element Info", "Tagname: " & $oElement.tagname & @CR & "innerText: " & $oElement.innerText)
    Next

I've tried combining your method with the doc example, again with no luck.

    Local $html = ( fileread('TEST.html') )
    Local $o_htmlfile = ObjCreate('HTMLFILE'), $str
    $o_htmlfile.open()
    $o_htmlfile.write($html)
    $o_htmlfile.close()
    Local $oElements = _IETagNameAllGetCollection($o_htmlfile)
    For $oElement In $oElements
        MsgBox(0, "Element Info", "Tagname: " & $oElement.tagname & @CR & "innerText: " & $oElement.innerText)
    Next

I finally got the code below to work, minus the ability to retrieve the attributes.

Workaround

    $val = 2
    Global $tags[$val] = ["td", "div"]
    $html = ( fileread('TEST.html') )
    Local $o_htmlfile = ObjCreate('HTMLFILE'), $str
    $o_htmlfile.open()
    $o_htmlfile.write($html)
    $o_htmlfile.close()
    For $j = 0 to $val - 1
        Local $x = _IETagnameGetCollection($o_htmlfile, $tags[$j])
        For $x in $x
            MsgBox(0, '', 'TagName = ' & $x.tagname & @LF & 'InnerText = ' & $x.innertext)
        Next
    Next

This allows me to loop through the array, but I still can't figure out how to retrieve value="params" etc. Looping through in this manner is sloppy and unnecessary.

I was able to get the doc help example to work with local files, but it ended up outputting HTML, HEAD, BODY, HTML comments and a whole lot of other unwanted data. How can I fix this?

Using Dale's IE functions, how can I catch the tags and separate the major parts, such as "tag name", "attributes" and "elements"/"innerText"?

Example

    <div class="W3 SchoolsLink" title="Hello World">Some Text</div>
    <div class="Autoit Forums" title="Rocks !!!">Some More Text</div>

I would like to end up with the following when output to the console or a MsgBox:

    TagName = "<div"
    Class = "W3 SchoolsLink"
    Title = "Hello World"
    Element = "Some Text"

    TagName = "<div"
    Class = "Autoit Forums"
    Title = "Rocks !!!"
    Element = "Some More Text"

Edited April 16, 2013 by UnKnOwned

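The thread ends without an answer to this last question. The following is an untested sketch of one possible approach, not a reply from any of the posters. It reuses the HTMLFILE object from kylomas's example, calls the DOM method getElementsByTagName directly instead of going through the IE UDF, and walks each element's attributes collection. The .specified check is there on the assumption that the collection may list every attribute the element supports, not only the ones written in the markup. The sample HTML is taken from the post above; the variable names are illustrative.

```autoit
; Sample markup from the post above, loaded from a string rather than a file.
Local $sHtml = '<div class="W3 SchoolsLink" title="Hello World">Some Text</div>' & _
        '<div class="Autoit Forums" title="Rocks !!!">Some More Text</div>'

Local $oDoc = ObjCreate('HTMLFILE')
If Not IsObj($oDoc) Then
    ConsoleWrite('Could not create the HTMLFILE object' & @LF)
    Exit 1
EndIf
$oDoc.open()
$oDoc.write($sHtml)
$oDoc.close()

; Ask only for the tag of interest, so HTML, HEAD and BODY are never returned.
For $oElement In $oDoc.getElementsByTagName('div')
    ConsoleWrite('TagName = ' & $oElement.tagName & @LF)
    ; Keep only the attributes that were actually set in the markup (assumption above).
    For $oAttr In $oElement.attributes
        If $oAttr.specified Then ConsoleWrite($oAttr.nodeName & ' = "' & $oAttr.nodeValue & '"' & @LF)
    Next
    ConsoleWrite('Element = "' & $oElement.innerText & '"' & @LF)
    ConsoleWrite('-----------------------' & @LF)
Next
```

To cover several tag names, as in the workaround above, the outer loop could be wrapped in a loop over the $tags array. For a file on disk, FileRead('TEST.html') would replace the hard-coded $sHtml string.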