jhinesyo

Screen Scrape - Find Text to the right of other text

5 posts in this topic

Hello all AutoIt-ers... I need assistance with my idea for a program that screen-scrapes some information from a web page.

One of the software packages I use displays a web page in Internet Explorer that has some critical information in it. I need to be able to find a line in this text-based web page that starts with: (xxxx,xxxx) :

and return everything to the right of the : as a string.

There are many fields / text strings I wish to grab or scrape from the web page, so I figure I would need to:

"find" the beginning text string, and copy the value to the right to a variable.

Sample Web Page:

(0018,1080) : My Department

(0018,1090) : Last Name^First Name Middle^^^

(0020,1111) : 123456

(0018,0008) : My Organization

For example, if I wanted to find the name, I would search for the string (0018,1090), grab the data to the right, and store it in $txtName. The result would be:

$txtName = "Last Name^First Name Middle^^^"

Can AutoIt screen-scrape in this manner?

thanks in advance!!!!

Jeff
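One way to do the "find the tag, take everything to the right of the colon" step in a single call is a regular expression. This is a minimal sketch, not something suggested in the thread: `_GetTagValue` is a hypothetical helper name, and it assumes the page text is already in a string with one `(gggg,eeee) : value` field per line, as in the sample above.

```autoit
; Hypothetical helper: returns the text after "<tag> :" on the line that starts with $sTag,
; or "" with @error = 1 if the tag is not found.
; Assumes one "(gggg,eeee) : value" field per line, as in the sample page.
Func _GetTagValue($sText, $sTag)
	; StringStripCR turns every line break into a plain @LF; (?m) lets ^ and $ match at each line;
	; \Q...\E makes the parentheses and comma in the tag literal; \h* skips spaces/tabs.
	Local $aMatch = StringRegExp(StringStripCR($sText), "(?m)^\h*\Q" & $sTag & "\E\h*:\h*(.*?)\h*$", 1)
	If @error Then Return SetError(1, 0, "")
	Return $aMatch[0]
EndFunc

; Sample text taken from the post above
Local $sPage = "(0018,1080) : My Department" & @CRLF & @CRLF & _
		"(0018,1090) : Last Name^First Name Middle^^^" & @CRLF & @CRLF & _
		"(0020,1111) : 123456" & @CRLF & @CRLF & _
		"(0018,0008) : My Organization"

Local $txtName = _GetTagValue($sPage, "(0018,1090)")
ConsoleWrite($txtName & @CRLF) ; Last Name^First Name Middle^^^
```

Calling the helper once per tag keeps each field to one line; getting the page text into `$sPage` is a separate step.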


Well, you can get the website text into a string or an array. I prefer arrays, with one line per array element. Then loop through them, search for the text, and StringTrimLeft the matching line to get the remainder, something like:

; $lineArray must already hold the page text, one line per element
$result = ""
for $a = 0 to UBound($lineArray)-1
 if StringLeft($lineArray[$a],11) = "(0018,1090)" then ; <-- or use a variable to search for if necessary
  $result = StringTrimLeft($lineArray[$a],14) ; <-- "(xxxx,xxxx) : " is always 14 chars; if the prefix varies, search for a variable and StringTrimLeft StringLen($prefix) chars instead
 endif
 ; if $result <> "" then exitloop ; <-- possibly useful if more lines in the text might match and you only want the first one
next

Roses are FF0000, violets are 0000FF... All my base are belong to you.
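The loop above assumes `$lineArray` already exists. Here is a minimal self-contained sketch that builds it first with StringSplit(); the `$sPage` string is a stand-in for the real page text (an assumption), and `$sPrefix` is a hypothetical variable holding the search text, so the trim length follows from it instead of a hard-coded 14.

```autoit
; Stand-in for the copied web-page text (sample from the first post)
Local $sPage = "(0018,1080) : My Department" & @CRLF & _
		"(0018,1090) : Last Name^First Name Middle^^^" & @CRLF & _
		"(0020,1111) : 123456"

; StringStripCR turns CRLF into LF; StringSplit then gives one line per element.
; Element [0] holds the number of lines, so the real lines start at index 1.
Local $lineArray = StringSplit(StringStripCR($sPage), @LF)
Local $sPrefix = "(0018,1090) : "
Local $result = ""

For $a = 1 To $lineArray[0]
	If StringLeft($lineArray[$a], StringLen($sPrefix)) = $sPrefix Then
		$result = StringTrimLeft($lineArray[$a], StringLen($sPrefix))
		ExitLoop ; keep only the first match
	EndIf
Next

ConsoleWrite($result & @CRLF) ; Last Name^First Name Middle^^^
```

Without the StringSplit() line, `$lineArray` is empty and `$result` stays blank.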

I tried the code above and added a basic trigger:

WinWaitActive("Title")

then your code

then

MsgBox(4096, "Testing", $result)

$result is always blank, even though (0018,1090) is displayed on the page. I am not familiar with arrays in AutoIt. Maybe I need some more code beforehand to define a new array? Or are includes required for arrays?

thanks again!


You don't need includes for arrays, though you can include Array.au3 to implement some very useful array-related functions.

But did you actually read the text from the web page into an array? Otherwise you are searching through an empty array, and of course $result will be empty. If you read the web page contents into a string instead of an array, then either StringSplit() the string into an array, or skip my code and use, for instance, _StringBetween() from String.au3.
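A minimal sketch of the _StringBetween() route, assuming the page text is already in a string (the sample below stands in for it) and that each field ends with a line break. It takes the text between `"(0018,1090) : "` and the next line feed, with no array of lines needed.

```autoit
#include <String.au3>

; Stand-in for the page text; each field ends with a line break
Local $sPage = "(0018,1080) : My Department" & @CRLF & _
		"(0018,1090) : Last Name^First Name Middle^^^" & @CRLF & _
		"(0020,1111) : 123456" & @CRLF

; Normalise line endings to @LF so a single end marker works.
; _StringBetween returns an array of every match, or sets @error if there is none.
Local $aBetween = _StringBetween(StringStripCR($sPage), "(0018,1090) : ", @LF)
If @error Then
	ConsoleWrite("Tag not found" & @CRLF)
Else
	ConsoleWrite($aBetween[0] & @CRLF) ; Last Name^First Name Middle^^^
EndIf
```

The last field on the page needs a trailing line break for this end marker to match.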



Sorry it took so long to follow up, but it finally works well!

I always like to post my working code for others to see.

Here it is:

FYI: the web page is loaded by another macro. It's just an IE page with the title Dicom Dump [medical imaging stuff] and a whole bunch of text....

CODE

#include <Array.au3>

Dim $array[1] ; clipboard array
Dim $outarray[1] ; storage array
$pos = 0
$errorcatch = ""
$StudyDate = ""
$Accession = ""
$Modality = ""
$StudyDesc = ""
$PatientName = ""
$PatientID = ""
$PatientDOB = ""

Sleep(250) ; give MS Internet Explorer enough time to load the page.
$errorcatch = WinActivate("DICOM Dump for", "")
If $errorcatch = 0 Then
	MsgBox(4096, "Error", "Dicom Dump window failed to appear")
Else
	Send("^a") ; select dicom dump text
	Send("^c") ; copy it to clipboard
	Send("!{F4}") ; close the window
	Sleep(250) ; pause for clipboard to process the new data
	$DicomDump = ClipGet() ; store the dicom dump from clipboard into variable.
	$array = StringSplit(StringStripCR($DicomDump), @LF) ; split data into single line arrays to parse.
	If Not @error Then
		; Each block: find the first line containing the tag (partial match), then trim off
		; the tag and label text so only the value is left. The trim counts depend on this page's layout.
		$pos = _ArraySearch($array, "(0008, 0020)", 0, 0, 0, True)
		Select
			Case $pos = -1
				; not found
			Case Else
				$StudyDate = StringTrimLeft($array[$pos], 34)
				_ArrayAdd($outarray, $StudyDate)
		EndSelect

		$pos = _ArraySearch($array, "(0008, 0050)", 0, 0, 0, True)
		Select
			Case $pos = -1
				; not found
			Case Else
				$Accession = StringTrimLeft($array[$pos], 41)
				_ArrayAdd($outarray, $Accession)
		EndSelect

		$pos = _ArraySearch($array, "(0008, 0060)", 0, 0, 0, True)
		Select
			Case $pos = -1
				; not found
			Case Else
				$Modality = StringTrimLeft($array[$pos], 32)
				_ArrayAdd($outarray, $Modality)
		EndSelect

		$pos = _ArraySearch($array, "(0008, 1030)", 0, 0, 0, True)
		Select
			Case $pos = -1
				; not found
			Case Else
				$StudyDesc = StringTrimLeft($array[$pos], 42)
				_ArrayAdd($outarray, $StudyDesc)
		EndSelect

		$pos = _ArraySearch($array, "(0010, 0010)", 0, 0, 0, True)
		Select
			Case $pos = -1
				; not found
			Case Else
				$PatientName = StringTrimLeft($array[$pos], 39)
				_ArrayAdd($outarray, $PatientName)
		EndSelect

		$pos = _ArraySearch($array, "(0010, 0020)", 0, 0, 0, True)
		Select
			Case $pos = -1
				; not found
			Case Else
				$PatientID = StringTrimLeft($array[$pos], 34)
				_ArrayAdd($outarray, $PatientID)
		EndSelect

		$pos = _ArraySearch($array, "(0010, 0030)", 0, 0, 0, True)
		Select
			Case $pos = -1
				; not found
			Case Else
				$PatientDOB = StringTrimLeft($array[$pos], 44)
				_ArrayAdd($outarray, $PatientDOB)
		EndSelect
	EndIf
EndIf

_ArrayDisplay($outarray, "Array") ; display the array, holy cow, it works.

ClipPut("") ; clears the clipboard!
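Since the seven Select blocks above differ only in the tag, the trim count and the target variable, they could be folded into one helper. A minimal sketch: `_GetDumpField` is a hypothetical name, and the two sample lines are made up in a `(gggg, eeee) : value` layout, so the trim count of 15 (the length of `"(0010, 0010) : "`) applies only to them; the real DICOM Dump lines would keep the counts from the script above.

```autoit
#include <Array.au3>

; Hypothetical helper: find the first line containing $sTag and drop its first $iTrim characters.
; Returns "" and sets @error = 1 when the tag is not in the dump.
Func _GetDumpField(ByRef $aLines, $sTag, $iTrim)
	Local $iPos = _ArraySearch($aLines, $sTag, 0, 0, 0, 1) ; 1 = partial (substring) match
	If $iPos = -1 Then Return SetError(1, 0, "")
	Return StringTrimLeft($aLines[$iPos], $iTrim)
EndFunc

; Made-up two-line dump for illustration only
Local $aLines = StringSplit("(0010, 0010) : Last Name^First Name Middle^^^" & @LF & "(0010, 0020) : 123456", @LF)

Local $PatientName = _GetDumpField($aLines, "(0010, 0010)", 15)
Local $PatientID = _GetDumpField($aLines, "(0010, 0020)", 15)
Local $StudyDate = _GetDumpField($aLines, "(0008, 0020)", 15) ; not in the sample, so "" with @error = 1
ConsoleWrite($PatientName & " | " & $PatientID & @CRLF) ; Last Name^First Name Middle^^^ | 123456
```

Each field then becomes one line, which also makes it harder to assign a value to the wrong variable when copying a block.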

This forum is AWESOME and always gives every scripter the best support. I can't believe this stuff is free. I need to order the mouse pads and show this stuff off!!!

Jeff
