
Read Lines and Search String


vij

I would like to read the HTML body as lines and then find the text that follows a particular string.

I'm starting like this:

#include <IE.au3>


$ieObj=_IECreate("https://ahrefs.com/index.php")

$str = _IEBodyReadHTML($ieObj)

I need to find the string that follows "a.src=document.location.protocol+" in the HTML body,

which is "//dnn506yrbagrg.cloudfront.net/pages/scripts/0014/6260.js?"+Math.floor(new Date().getTime()/3600000);

Does anyone already have a function that does this?

Thanks :)

jdelaney

Use _IELinkGetCollection, loop through the collection until you find the obj.src you need, then use _IEPropertyGet or _IEFormElementGetValue, whichever fits, to grab your data.


IEbyXPATH-Grab IE DOM objects by XPATH IEscriptRecord-Makings of an IE script recorder ExcelFromXML-Create Excel docs without excel installed GetAllWindowControls-Output all control data on a given window.

vij

The data I am looking for is not part of a link or an HTML tag. It's part of the JavaScript text within the page https://ahrefs.com/index.php

Need to search for the string that follows "a.src=document.location.protocol+"

#include <IE.au3>

$ieObj = _IECreate("https://ahrefs.com/index.php")
$str = _IEBodyReadHTML($ieObj)
jdelaney

Same difference, but use

_IEGetObjByName (script is the name)

rather than _IELinkGetCollection.

Inside the loop, run StringInStr on the obj.innertext.

vij

I don't understand. How can that get the text after "a.src=document.location.protocol+", which is

//dnn506yrbagrg.cloudfront.net/pages/scripts/0014/6260.js?

from this page source:

function MentionsCheckForm()
{
$('div.alert').remove();
$('input.alert-border').removeClass('alert-border');
var Ret = true;

var request = $.trim(MentionRequestObj.val());
     if (request == '')
     {
         MentionRequestObj.addClass('alert-border').after('<div class="alert font90">Incorrect Request</div>');
Ret = false;
     }
     if (Ret)
{
ProcessObj.show();
}
else
{
ProcessObj.hide();
}
return Ret;
}
</script>
<script type="text/javascript">
setTimeout(function(){var a=document.createElement("script");
var b=document.getElementsByTagName("script")[0];
a.src=document.location.protocol+"//dnn506yrbagrg.cloudfront.net/pages/scripts/0014/6260.js?"+Math.floor(new Date().getTime()/3600000);
a.async=true;a.type="text/javascript";b.parentNode.insertBefore(a,b)}, 1);
</script></body>
</html>
Melba23

vij,

Putting that text into a file (so that you do not have to worry about the single/double quote mix getting it into a string) makes it very easy to extract what you want: ;)

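; Read the saved page source
$sText = FileRead("Text.txt")

; Capture everything between protocol+" and the "?" that follows "js"
$aExtract = StringRegExp($sText, "(?i)protocol\+\x22(.*js)\?", 3)

; StringRegExp sets @error when nothing matches, so check before indexing
If Not @error Then ConsoleWrite($aExtract[0] & @CRLF)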

M23
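
For anyone who wants to try the pattern without creating Text.txt first, here is a minimal sketch that runs it against the one relevant line pasted from the page source above. The variable name $sLine is only illustrative; the pattern is the same one M23 posted, with each piece annotated.

```autoit
; Sample line copied from the page source quoted earlier in the thread.
; Single quotes delimit the AutoIt string, so the inner double quotes need no escaping.
Local $sLine = 'a.src=document.location.protocol+"//dnn506yrbagrg.cloudfront.net/pages/scripts/0014/6260.js?"+Math.floor(new Date().getTime()/3600000);'

; (?i)        case-insensitive matching
; protocol\+  the literal text "protocol+"
; \x22        a double quote, written as a hex escape
; (.*js)      capture group: everything up to the last "js" that is followed by "?"
; \?          a literal question mark, left outside the capture
Local $aExtract = StringRegExp($sLine, "(?i)protocol\+\x22(.*js)\?", 3)

If @error Then
    ConsoleWrite("Pattern not found" & @CRLF)
Else
    ConsoleWrite($aExtract[0] & @CRLF)
EndIf
```

Because the quote and the question mark sit outside the parentheses, the captured text should be just the URL part, //dnn506yrbagrg.cloudfront.net/pages/scripts/0014/6260.js, without the surrounding quotes.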

jdelaney

#include <IE.au3>

$oIE = _IECreate("https://ahrefs.com/index.php")
_IELoadWait($oIE)

; Collect every <script> element on the page
$oScriptCol = $oIE.document.GetElementsByTagName("Script")
ConsoleWrite($oScriptCol.length & @CRLF)

For $oScript In $oScriptCol
    ; Only inspect the script block that contains the marker string
    If StringInStr($oScript.innertext, "a.src=document.location.protocol+") Then
        ; Capture from just after the opening quote through the closing ");"
        $adata = StringRegExp($oScript.innertext, "(?i)protocol\+\x22(.*js.*\);\s)", 3)
        ; Earlier attempt without a regex, left disabled:
;~      $start = StringInStr($oScript.innertext, "a.src=document.location.protocol+")
;~      $end = StringInStr($oScript.innertext, "; ",default,Default,$start+1)
;~      ConsoleWrite($oScript.innertext & @CRLF & $end & @CRLF & StringMid($oScript.innertext,$start+StringLen("a.src=document.location.protocol+"),$end-10) & @CRLF)
        ; Guard against a failed match before indexing the result
        If IsArray($adata) Then ConsoleWrite($adata[0] & @CRLF)
    EndIf
Next
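
The commented-out lines in that script hint at a second route that avoids regular expressions. A rough sketch of that idea follows, run against the single line from the page source rather than a live page. The names $sText, $sMarker, $iStart and $iEnd are illustrative, and it assumes the value ends at the first ";" after the marker.

```autoit
; Sample line copied from the page source quoted earlier in the thread
Local $sText = 'a.src=document.location.protocol+"//dnn506yrbagrg.cloudfront.net/pages/scripts/0014/6260.js?"+Math.floor(new Date().getTime()/3600000);'
Local $sMarker = "a.src=document.location.protocol+"

; StringInStr returns 0 when the substring is absent
Local $iMarker = StringInStr($sText, $sMarker)
If $iMarker = 0 Then
    ConsoleWrite("Marker not found" & @CRLF)
Else
    ; Position of the first character after the marker
    Local $iStart = $iMarker + StringLen($sMarker)
    ; First ";" at or after $iStart (the fifth parameter is the search start position)
    Local $iEnd = StringInStr($sText, ";", 0, 1, $iStart)
    If $iEnd = 0 Then
        ; No terminator found: fall back to everything after the marker
        ConsoleWrite(StringMid($sText, $iStart) & @CRLF)
    Else
        ; The third parameter of StringMid is a character count, not an end position
        ConsoleWrite(StringMid($sText, $iStart, $iEnd - $iStart) & @CRLF)
    EndIf
EndIf
```

Unlike the regex version, this keeps the surrounding double quotes and the "+Math.floor(...)" tail, so trim further if only the URL is wanted.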

vij

Thank you very much, Melba23 and jdelaney. I was getting into a rut and you guys opened things up for me :huggles:

Here is what I ended up with:

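#include <IE.au3>

$oIE = _IECreate("https://ahrefs.com/index.php")

; Read the whole body HTML, inline scripts included
$str = _IEBodyReadHTML($oIE)

; The lookbehind and lookahead pin the match between the marker and "+Math.floor(...);"
$answer = StringRegExp($str, "(?<=a\.src=document\.location\.protocol\+).*(?=\+Math\.floor\(new\ Date\(\)\.getTime\(\)/3600000\);)", 1)

; StringRegExp returns an array only when the pattern matched
If IsArray($answer) Then ConsoleWrite($answer[0] & @CRLF)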

For those who find regular expressions daunting, there are regex designers. I used the one that comes with ZennoPoster; it helps you construct regular expressions quickly :)
