xeroTechnologiesLLC

[Resolved] Reading a string off a webpage and assigning it to a variable

14 posts in this topic

#1 ·  Posted (edited)

Greetings,

I've been using VBScript, HTML, and various other languages to build a script from SecureCRT that captures certain data off a webpage after a few unrelated steps, and that works just fine.

But it relies on VBScript to read the entire webpage, uses a regular expression to capture the first ID#, and then assigns it to a variable.

I just started working with AutoIt and was curious whether there is a better way.

Essentially: capture the first ID# (or the ID# with a predictable report name beside it, e.g. 12345 team42_1:08pm) and assign it to a variable to be used later in the program.

I'm extremely new to AutoIt, so I'm not sure what the logic or syntax for this would be.

The second part of the problem is reading a link's href value, but I'm not quite at that stage yet and haven't searched the forums for it. I only mention it here to give some idea of what I will need the report ID for later.

Thanks in advance for any time available to answer this inquiry.

-Nick

Edited by xeroTechnologiesLLC




The _INetGetSource, StringInStr, StringSplit, and StringMid functions will probably be of use to you here. They are certainly not the only functions that could be used in a case like this, but I use them for a similar task.

For instance, I use code like this to find a particular word on a webpage, assign that word to a variable, and pass it on to another function.

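; $URL, $Debug, and Actions() are assumed to be defined elsewhere in the script
Local $Text_Array[5] = ["Registering", "Creating", "Modifying", "Activating", "Viewing User Profile"]

$Source = _INetGetSource($URL) ; download the page source as one string
For $a = 0 To UBound($Text_Array) - 1
    $KeyWord = StringInStr($Source, $Text_Array[$a], 1) ; case-sensitive search, 0 means not found
    If $KeyWord <> 0 Or $Debug Then ; if the text exists (or when debugging)
        $FoundWord = StringSplit(StringMid($Source, $KeyWord), " ") ; split everything from the keyword onward on spaces
        Actions($FoundWord[1]) ; element 0 holds the count, element 1 is the first word
    EndIf
Next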

- Bruce /*somdcomputerguy */  If you change the way you look at things, the things you look at change.


I get an invalid function error when using _INetGetSource. Is there an include that I need to use?


Never mind, including Inet.au3 did it.

So once I have the source, how can I get it into an array?

I'm assuming that if I simply assign the source to a variable, the variable won't hold all of it because of the number of characters. Or will it?


xeroTechnologiesLLC,

Variable size is not an issue. The help file does not give a max size, so memory might be the only real restriction (I'm not sure of that; perhaps an expert could advise). The first part of your problem is easily handled using the solution that somdcomputerguy gave in post #2.

Parsing out href strings can be accomplished in a number of ways. Provide the URL of the website you are parsing for further help.

kylomas
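
On the question of getting the source into an array, one option is to split it into lines with StringSplit. This is only a minimal sketch: the sample string and the variable names are made up for illustration, and it assumes the page source uses CRLF or LF line endings.

```autoit
; Hypothetical sample standing in for the string returned by _INetGetSource.
Local $sSource = "<html>" & @CRLF & "<td>12345</td>" & @CRLF & "</html>"

; Strip carriage returns first so that CRLF and LF sources split the same way.
Local $aLines = StringSplit(StringStripCR($sSource), @LF)

; By default, element 0 holds the number of pieces and the pieces start at index 1.
For $i = 1 To $aLines[0]
    ConsoleWrite($i & ": " & $aLines[$i] & @CRLF)
Next
```

Splitting is optional here, since the regular-expression functions can also search the whole source string directly.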



"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill


So I figured out what my problem was.

Apparently _INetGetSource doesn't read the active page / IE object.

I'm writing this for a page that's behind a secure service and requires a login. I send the login at the beginning of the script and it logs in just fine, but when I try to capture the source, it doesn't work. I get the 'please login' error page.

thoughts?


Unfortunately, I am unable to provide the URL as requested; it's an internal secure web tool.

(good times right? :) )


xeroTechnologiesLLC,

You have this working in VBScript? VBScript is easily translated to AutoIt.

kylomas




Disregard, I found a temporary solution:

$pagesource = _IEDocReadHTML($oIE)

I wrote the output to a file to verify, and it is capturing the right data.
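
For anyone following along, the approach above might look roughly like the sketch below. It reads the HTML from the browser object that the script already logged in with, which is likely why it avoids the 'please login' page. The URL and output path are hypothetical placeholders (the real tool's address is not given in this thread), and the login steps are left out.

```autoit
#include <IE.au3>
#include <FileConstants.au3>

; Hypothetical URL, used only as a placeholder for the internal tool.
Local $oIE = _IECreate("http://intranet.example/reports")

; ... the script's own login steps would run here ...

; Wait for the page to finish loading, then read the HTML from the live IE object.
_IELoadWait($oIE)
Local $pagesource = _IEDocReadHTML($oIE)

; Write the source to a file so it can be checked by hand (hypothetical path).
Local $hFile = FileOpen(@TempDir & "\pagesource.txt", $FO_OVERWRITE)
FileWrite($hFile, $pagesource)
FileClose($hFile)
```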


#10 ·  Posted (edited)

xeroTechnologiesLLC,

This will get you all the hrefs (assuming each one starts with "href='" and ends at the next "'"):

$href_array = StringRegExp($Value, "href='([^#].*?)'", 3)

where
$Value = the string from your web source
$href_array = an array of href elements

kylomas

edit: changed regex to not list "#" hrefs

Edited by kylomas
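
To see what that pattern returns, here is a small self-contained sketch. The sample markup is invented for illustration and uses single-quoted href attributes, as the pattern assumes.

```autoit
; Hypothetical sample markup standing in for the real page source.
Local $Value = "<a href='#'>top</a> <a href='report.php?id=12345'>team42</a> <a href='/logout'>out</a>"

; Flag 3 returns every capture-group match in a 0-based array.
Local $href_array = StringRegExp($Value, "href='([^#].*?)'", 3)

If @error Then
    ; For the array-returning flags, @error is set when nothing matched.
    ConsoleWrite("No href values found" & @CRLF)
Else
    For $i = 0 To UBound($href_array) - 1
        ConsoleWrite($href_array[$i] & @CRLF)
    Next
EndIf
```

With this sample string the loop prints report.php?id=12345 and /logout; the "#" link is skipped.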



Apologies for continuing with a dumb question:

How do I take that and capture the first ID# on the page?

I don't necessarily need the link; I just need to capture the first 5-digit number that shows up on the page (the first one is known and predictable).

I'm headed home for the night but will be back tomorrow to continue working on this.

I am not able to easily convert the VBScript version over to this either. I'm finding that a lot of the syntax is not the same, and that is what I am having trouble with.

Thanks.

Below is what I had before, which works:

' Below is to read the ID page so we can capture the ID# from the HTML
' It's assigned to an array because a string will not hold enough characters
dim plusJunk
  set junk = ie.document
  plusJunk = junk.body.innerHTML
dim data_array
  data_array = split(plusJunk, chr(13))
' Building the array in a way that we can read if we need to break down the report page for testing
dim a, temp
  temp = ""
  for a = 0 to ubound(data_array)
   temp = temp & cstr(a) & "  " & data_array(a) & chr(13)
  next
' This segment is creating a file to output data to.  Used only in testing - commented (REM) out otherwise. 
   Stuff = temp
   REM Set myFSO = CreateObject("Scripting.FileSystemObject")
   REM Set WriteStuff = myFSO.OpenTextFile("c:output.txt", 8, True)
   REM WriteStuff.WriteLine(Stuff)
   REM WriteStuff.Close
   REM SET WriteStuff = NOTHING
   REM SET myFSO = NOTHING
  
' Expression
  set arrayExp = new RegExp
   arrayexp.ignoreCase = true
   arrayexp.global = false
   arrayExp.pattern = ">[0-9][0-9][0-9][0-9][0-9]<"  ' regular expression search pattern
  
' Begin cycling through the array for the regular expression pattern above to find the ID# used in pulling the report
  for a = lbound(data_array) to ubound(data_array)
   if arrayExp.test(data_array(a)) then
    exit for
   end if
  next 
  
' Break down the ID# from the array line string found above
  for b = len(data_array(a)) to 1 step -1
   temp = mid(data_array(a), b, 1)
   if (asc(temp) >= 48) and (asc(temp) <= 57) then
    temp = mid(data_array(a), b - 4, 5)
    exit for
   end if
  next


$idNumber=StringRegExp($pagesource, "[^0-9a-zA-z]([0-9][0-9][0-9][0-9][0-9])[^0-9a-zA-z]", 3)
$keyword=stringinstr($pagesource, $idNumber, 0)
MsgBox(0,"",$keyword)

...is what I have now to try to capture the ID number on the page.

$pagesource is the information read by _IEDocReadHTML, and I've used a message box to verify that it is capturing everything I need, so it's good there.

But for some reason, the above code just returns a blank result and does not find the 5-digit number that represents the ID number. In the source it reads like >12345<, so I've tried various expressions:

[0-9][0-9][0-9][0-9][0-9]
[0-9]{5}
>[0-9][0-9][0-9][0-9][0-9]<
>[0-9]{5}<

None of them seem to be capturing anything.

Thoughts?


$pagesource = _IEDocReadHTML($oIE)
$idNumber=StringRegExp($pagesource, ">[0-9][0-9][0-9][0-9][0-9]<", 0)
MsgBox(0,"",$idNumber)
$keyword=stringinstr($pagesource, $idNumber, 0, 1)
MsgBox(0,"",$keyword)

...this is the code I'm working with as of 10:07am.

The first message box returns 1 and the second returns 40.

...not sure why.
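
A likely explanation for those two numbers: with flag 0, StringRegExp only reports whether the pattern matched (1 for yes, 0 for no), so the first box shows 1. StringInStr then searches the page for the text "1", and 40 is presumably the position of the first "1" in the source. The sketch below illustrates this with an invented sample string; the variable names are assumptions for illustration.

```autoit
; Hypothetical sample standing in for the output of _IEDocReadHTML.
Local $sPage = "<tr><td>Report 1</td><td>12345</td><td>team42_1:08pm</td></tr>"
Local $sPattern = ">[0-9]{5}<"

; Flag 0 only tests for a match and returns 1 (found) or 0 (not found).
Local $iFound = StringRegExp($sPage, $sPattern, 0)
ConsoleWrite("Flag 0 returns: " & $iFound & @CRLF)

; Searching for that return value looks for the text "1", not for the ID number.
ConsoleWrite("Position of the text '1': " & StringInStr($sPage, $iFound) & @CRLF)

; Flag 1 returns an array; with no capture group, element 0 is the whole match.
Local $aMatch = StringRegExp($sPage, $sPattern, 1)
If Not @error Then ConsoleWrite("Flag 1, element 0: " & $aMatch[0] & @CRLF)
```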


$idNumber=StringRegExp($pagesource, ">[0-9][0-9][0-9][0-9][0-9]<", 1)
$idNumber[0]=StringReplace($idNumber[0], ">", "")
$idNumber[0]=StringReplace($idNumber[0], "<", "")
msgbox(0, "d", $idNumber[0])

Disregard, I was missing the whole "access the array item" part.

Noob 101 :)

[RESOLVED]
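
As a possible simplification of the code above, a capture group can return just the digits, so the two StringReplace calls would not be needed. A minimal sketch, using an invented sample string in place of the real page source:

```autoit
; Hypothetical sample standing in for the output of _IEDocReadHTML.
Local $pagesource = "<td>12345</td><td>team42_1:08pm</td><td>67890</td>"

; The parentheses capture only the five digits; flag 1 stops at the first match.
Local $idNumber = StringRegExp($pagesource, ">([0-9]{5})<", 1)

If @error Then
    ; No match: $idNumber is not an array, so do not index into it.
    ConsoleWrite("No ID number found" & @CRLF)
Else
    ConsoleWrite("First ID: " & $idNumber[0] & @CRLF)
EndIf
```

With this sample string the script prints "First ID: 12345". Checking @error before indexing also avoids a subscript error on pages where no ID is present.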

