vij, posted May 1, 2013 (edited)

I would like to read the HTML body of a page and then find the text that follows a particular string.

I am starting like this:

    #include <IE.au3>
    $ieObj = _IECreate("https://ahrefs.com/index.php")
    $str = _IEBodyReadHTML($ieObj)

I need to find the string that follows "a.src=document.location.protocol+" in the HTML body, which is:

    "//dnn506yrbagrg.cloudfront.net/pages/scripts/0014/6260.js?"+Math.floor(new Date().getTime()/3600000);

Does anyone already have a function that does this?

Thanks

Edited May 1, 2013 by vij

jdelaney, posted May 1, 2013 (edited)

Use _IELinkGetCollection, loop through until you find the obj.src you need, then use the proper _IEPropertyGet or _IEFormElementGetValue to grab your data.

Edited May 1, 2013 by jdelaney

vij (author), posted May 1, 2013 (edited)

jdelaney said: "Use _IELinkGetCollection, loop through until you find the obj.src you need, then use the proper _IEPropertyGet|_IEFormElementGetValue to grab your data"

The data I am looking for is not part of a link or an HTML tag. It is part of the JavaScript text within the page https://ahrefs.com/index.php.

I need to find the string that follows "a.src=document.location.protocol+".

    #include <IE.au3>
    $ieObj = _IECreate("https://ahrefs.com/index.php")
    $str = _IEBodyReadHTML($ieObj)

Edited May 1, 2013 by vij

jdelaney, posted May 1, 2013 (edited)

Same difference, but use __IEGetObjByName (script is the name) rather than _IELinkGetCollection. Inside the loop, run StringInStr on the obj.innertext.

Edited May 1, 2013 by jdelaney

vij (author), posted May 1, 2013 (edited)

jdelaney said: "Same difference, but use __IEGetObjByName (script is the name) rather than _IELinkGetCollection Inside the loop, do stringinstr for the obj.innertext"

I don't understand. How can that get the text after "a.src=document.location.protocol+", which is //dnn506yrbagrg.cloudfront.net/pages/scripts/0014/6260.js? from this part of the page source:

    function MentionsCheckForm() {
        $('div.alert').remove();
        $('input.alert-border').removeClass('alert-border');
        var Ret = true;
        var request = $.trim(MentionRequestObj.val());
        if (request == '') {
            MentionRequestObj.addClass('alert-border').after('<div class="alert font90">Incorrect Request</div>');
            Ret = false;
        }
        if (Ret) { ProcessObj.show(); } else { ProcessObj.hide(); }
        return Ret;
    }
    </script>
    <script type="text/javascript">
    setTimeout(function(){var a=document.createElement("script");
    var b=document.getElementsByTagName("script")[0];
    a.src=document.location.protocol+"//dnn506yrbagrg.cloudfront.net/pages/scripts/0014/6260.js?"+Math.floor(new Date().getTime()/3600000);
    a.async=true;a.type="text/javascript";b.parentNode.insertBefore(a,b)}, 1);
    </script></body>
    </html>

Edited May 1, 2013 by vij

Melba23 (Moderator), posted May 1, 2013

vij,

Putting that text into a file (so that you do not have to worry about the single/double quote mix when getting it into a string) makes it very easy to extract what you want:

    $sText = FileRead("Text.txt")
    $aExtract = StringRegExp($sText, "(?i)protocol\+\x22(.*js)\?", 3)
    ConsoleWrite($aExtract[0] & @CRLF)

M23

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind.

jdelaney, posted May 1, 2013 (edited)

    #include <IE.au3>

    $oIE = _IECreate("https://ahrefs.com/index.php")
    _IELoadWait($oIE)
    ; Collect every <script> element on the page
    $oScriptCol = $oIE.document.GetElementsByTagName("Script")
    ConsoleWrite($oScriptCol.length & @CRLF)
    For $oScript In $oScriptCol
        If StringInStr($oScript.innertext, "a.src=document.location.protocol+") Then
            ; Capture the text after the opening quote, up to a ");" followed by whitespace
            $adata = StringRegExp($oScript.innertext, "(?i)protocol\+\x22(.*js.*\);\s)", 3)
            ;~ $start = StringInStr($oScript.innertext, "a.src=document.location.protocol+")
            ;~ $end = StringInStr($oScript.innertext, "; ",default,Default,$start+1)
            ;~ ConsoleWrite($oScript.innertext & @CRLF & $end & @CRLF & StringMid($oScript.innertext,$start+StringLen("a.src=document.location.protocol+"),$end-10) & @CRLF)
            ; StringRegExp returns an array only when the pattern matched
            If IsArray($adata) Then ConsoleWrite($adata[0] & @CRLF)
        EndIf
    Next

Edited May 1, 2013 by jdelaney

vij (author), posted May 2, 2013 (edited)

Thank you very much, Melba23 and jdelaney. I was getting into a rut and you guys opened things up for me.

And I did this:

    #include <IE.au3>

    $oIE = _IECreate("https://ahrefs.com/index.php")
    $str = _IEBodyReadHTML($oIE)
    ; Lookbehind/lookahead: match only the text between the two known markers
    $answer = StringRegExp($str, "(?<=a\.src=document\.location\.protocol\+).*(?=\+Math\.floor\(new\ Date\(\)\.getTime\(\)/3600000\);)", 1)
    ; StringRegExp returns an array only when the pattern matched
    If IsArray($answer) Then ConsoleWrite($answer[0])

For those who find regular expressions daunting, there are regex designers. I used the one that comes with ZennoPoster; it helps you construct regular expressions quickly.

Edited May 2, 2013 by vij
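
For readers looking for a ready-made function, as asked in the first post: a regex-free way to get the text between two known markers is `_StringBetween` from the standard String.au3 UDF. The snippet below is only a minimal sketch, not something posted by the participants above. It assumes the page text contains the exact line quoted earlier in the thread; the variable names and the hard-coded sample string are illustrative, and the HTML that `_IEBodyReadHTML` returns from the live page may be quoted or spaced differently.

```autoit
#include <String.au3>

; Sample text copied from the page source quoted earlier in the thread.
; Single quotes are used for the AutoIt string so the inner double quotes need no escaping.
Local $sHtml = 'a.src=document.location.protocol+"//dnn506yrbagrg.cloudfront.net/pages/scripts/0014/6260.js?"+Math.floor(new Date().getTime()/3600000);'

; Marker text immediately before and after the wanted value (illustrative choices)
Local $sStart = 'a.src=document.location.protocol+"'
Local $sEnd = '"+Math.floor'

; _StringBetween returns a 0-based array of matches and sets @error when nothing is found
Local $aFound = _StringBetween($sHtml, $sStart, $sEnd)
If @error Then
    ConsoleWrite("Marker not found" & @CRLF)
Else
    ConsoleWrite($aFound[0] & @CRLF)
EndIf
```

To use it against the live page, replace the sample string with the result of `_IEBodyReadHTML` (or with a script element's innertext, as in jdelaney's loop). The markers are compared as literal text, so no regex escaping is needed.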