tommytx

StringInStr is not doing anything right..


Can someone please scan this code and tell me what I am doing wrong? The top portion is the same as the htm file on the web; I just dumped it here so I could explain. I am trying to use AutoIt to find the start and end positions of a piece of text so I can extract it, but I am having no luck. It can't be that hard.

The URL in the code points to the actual live HTML file, short.htm, if anyone wants to look.

Bottom line: when I run the program below, it does not give the correct positions for the items I am searching for.

Each of the lines below has 65 characters, so you can see "og:title" will be at 182, not 170 where it claims.
These are actual lines in the short.htm file.
********************************************
<h1><center>Testing</h1></center>XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
<meta property="og:title" content="Hampton VA Home for Sale" />XX
<meta property="og:type" content="landmark" />XXXXXXXXXXXXXXXXXXX


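#include <IE.au3>

$oIE = _IECreate("http://tomstats.info/zzzzz/short.htm")
$sHTML = _IEBodyReadHTML($oIE) ; HTML of the page body as IE reports it
MsgBox(0, "", $sHTML)
; StringInStr returns the 1-based position of the first match, or 0 if there is none
$oStart = StringInStr($sHTML, "og:title")
$oStop = StringInStr($sHTML, "/>")
MsgBox(0, "", $oStart & " - " & $oStop)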
 
These are some of the strings I searched for with the code above; mostly they did not come out at the position I count in the file.
The positions reported by the script are listed below.
For example, "og:title" is in the box above and it claims it's at 122, but in the actual short.htm file it's at 182.

; og:title = 89
; title = 92
; Testing = 15
; </h1> = 22
; og:url = 220


Boy, did you go about that the hard way! ;)

Just collect all the <meta> tags and look for the one you want:

$colMeta = _IETagNameGetCollection($oIE, "meta", -1)
$n = 0
For $oMeta In $colMeta
    ConsoleWrite($n & ": Property Name = " & $oMeta.Name & "; Content = " & $oMeta.Content & @LF)
    $n += 1
Next
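
A minimal sketch of picking out one tag from that collection, assuming the tags identify themselves through a `property` attribute (as in the og:title lines from the first post) rather than `name`. The `getAttribute` call and the variable names are illustrative assumptions, not part of the reply above, and the test page is assumed to still be reachable.

```autoit
#include <IE.au3>

; Assumption: the test page from the first post still exists.
Local $oIE = _IECreate("http://tomstats.info/zzzzz/short.htm")
Local $colMeta = _IETagNameGetCollection($oIE, "meta")
If Not IsObj($colMeta) Then Exit ; nothing to iterate if the lookup failed

For $oMeta In $colMeta
    ; getAttribute reads the attribute as written in the tag; it is used
    ; here because og:* tags carry property="..." instead of name="...".
    Local $sProp = $oMeta.getAttribute("property")
    If $sProp = "og:title" Then
        ConsoleWrite("og:title content = " & $oMeta.content & @CRLF)
        ExitLoop
    EndIf
Next
```

The same loop shape should work for other tags (for example "input") by changing the tag name and the attribute being compared.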

:)


Valuater's AutoIt 1-2-3, Class... Is now in Session!
For those who want somebody to write the script for them: RentACoder
"Any technology distinguishable from magic is insufficiently advanced." -- Geek's corollary to Clarke's law


Thanks for the advice above, and it will help me in the future, but I am looking for data all over the place in a huge page.

I just used the meta area since that was where the data I wanted began. And I am not so great at working all the DOM magic.

I am one of those avid believers in finding the start and finish and extracting what is between them, and it seems the AutoIt string functions do not work well on HTML unless you go the DOM route. But just this afternoon I figured out a way to jury-rig it and do the string searching very well. For example, _StringBetween works great for getting text data, but it seems it cannot find the HTML stuff, the tags wrapped in < >. My guess is that the <> tell the system it is HTML so don't mess with it, and therefore it does not search HTML very well. So here is the trick (workaround) for anyone who wants to use it.

We quickly remove all the < and > characters from the text so the string tools treat the entire damn file as text with absolutely zero HTML, and then they work on the entire file just wonderfully.

First: use StringReplace to replace < with a Z or any other unusual character like |

Second: use StringReplace to replace > with a Z or any other unusual character like &

Now that we have told the hunter (_StringBetween) that the entire file is no longer HTML, you have free rein to search for any freaking thing you like, so extract away. It now works fantastic. The two replace commands on the fly only take a second, and there is no modification to the original file.

Check it out if you like; the live link is in the code below.

I know this is horrible programming technique, but it works sooooo goood.

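#include <IE.au3>
#include <String.au3>
#include <Array.au3>

$oIE = _IECreate("http://tomstats.info/zzzzz/short1.htm", 1)
$sHTML = _IEBodyReadHTML($oIE)
; Swap the angle brackets for "Z" in the in-memory copy only
$sHTML = StringReplace($sHTML, "<", "Z")
$sHTML = StringReplace($sHTML, ">", "Z")
; The delimiters must use "Z" wherever the page had "<" or ">"
Local $oDat = _StringBetween($sHTML, 'src="', '"ZZ/DIVZ')
_ArrayDisplay($oDat, 'Default Search')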

Thanks for all the help. I will use the code to help me learn more about the DOM. I know I need it. My programming technique above will make a few of you throw up, but what the hell, it works great.

Thanks again, PSaltyDS
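
For comparison, a small sketch of `_StringBetween` used directly on markup, with delimiters that contain `<`, `>` and quotes and no character replacement. The sample string is an assumption: it is one line copied the way IE reported it later in this thread, not output fetched from the live page.

```autoit
#include <String.au3>

; Sample text written the way IE reported it later in this thread
; (upper-case tag name, attributes reordered, no closing "/").
Local $sHTML = '<META content="Hampton VA Home for Sale" property="og:title">XX'

; Both delimiters contain angle brackets and quotes; nothing is replaced first.
Local $aTitle = _StringBetween($sHTML, '<META content="', '" property="og:title">')
If @error Then
    ConsoleWrite("No match" & @CRLF)
Else
    ConsoleWrite($aTitle[0] & @CRLF)
EndIf
```

If this finds the title, the likelier cause of failed searches is that the delimiters were typed as they appear in the file rather than as IE rewrites them; that is a guess based on the thread, not something tested against the live page.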


Just referring to the original post (not as a solution):

IE interprets the HTML page differently from your HTML code.

If you write what IE read back to a file, like this:

$sHTML = _IEBodyReadHTML($oIE)
MsgBox(0, "", $sHTML)
FileWriteLine("ieout.txt", $sHTML)

you will see:

<h1><center>Testing</h1></center>XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
<meta property="og:title" content="Hampton VA Home for Sale" />XX
<meta property="og:type" content="landmark" />XXXXXXXXXXXXXXXXXXX

<H1>
<CENTER>Testing</H1></CENTER>XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
<META content="Hampton VA Home for Sale" property="og:title">XX
<META content=landmark property="og:type">XXXXXXXXXXXXXXXXXXX

Block 1 is your HTML page; block 2 is what IE thinks it should look like.

And don't forget that each line ends with "0D0A" = @CRLF. Actually everything is correct, except that what you want to see is not what IE provides.
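
The difference can be checked without a browser. This is a minimal sketch that assumes the two strings below, which are simply the first lines of each block quoted above joined with @CRLF; the variable names are made up for illustration.

```autoit
; First two lines of the file as written, joined with @CRLF.
Local $sFile = '<h1><center>Testing</h1></center>XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' & @CRLF & _
        '<meta property="og:title" content="Hampton VA Home for Sale" />XX'

; The same content as IE reported it (first three lines of block 2).
Local $sIE = '<H1>' & @CRLF & _
        '<CENTER>Testing</H1></CENTER>XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' & @CRLF & _
        '<META content="Hampton VA Home for Sale" property="og:title">XX'

; StringInStr returns a 1-based position, or 0 when the substring is absent.
ConsoleWrite("og:title in file text: " & StringInStr($sFile, "og:title") & @CRLF)
ConsoleWrite("og:title in IE text:   " & StringInStr($sIE, "og:title") & @CRLF)
ConsoleWrite('"/>" in file text:     ' & StringInStr($sFile, "/>") & @CRLF)
ConsoleWrite('"/>" in IE text:       ' & StringInStr($sIE, "/>") & @CRLF)
```

If positions in the file as served are what matter, one option is to search the raw download instead of IE's version, for example the result of `BinaryToString(InetRead($sURL))`, where `$sURL` is a placeholder for the page address.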

Edited by Tankbuster


Thanks a lot PSaltyDS... you piss me off.... I spent all afternoon working on this sucker and you make a simple solution in 30 seconds... Your code worked great for the metas, so I should have just used other tags like input, etc., to get all the other things I wanted.

However, I am not used to the extraction tools acting finicky when you try to extract any HTML stuff.

But by faking them out with <> removal, they think it's all text and work great.

I swear I am gonna learn some more DOM, as it would save me a ton of time.

Thanks again.


What is the significance of the -1... I thought if you left the end number blank it would then grab all the meta's. Can you explain the purpose of -1? Thanks.

$colMeta = _IETagNameGetCollection($oIE, "meta", -1)


Dog ate your helpfile?

[optional] specifies whether to return a collection or indexed instance

0 or positive integer returns an indexed instance

-1 = (Default) returns a collection

;)


What is the significance of the -1... I thought if you left the end number blank it would then grab all the meta's. Can you explain the purpose of -1? Thanks.

$colMeta = _IETagNameGetCollection($oIE, "meta", -1)

It does. -1 is the default value, so there is no difference between

$colMeta = _IETagNameGetCollection($oIE, "meta", -1)

and

$colMeta = _IETagNameGetCollection($oIE, "meta")

The first is simply "explicit" and therefore theoretically "better" coding.
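
A short sketch of the two modes of that third parameter, assuming the test page from the first post is still reachable; the variable names are illustrative.

```autoit
#include <IE.au3>

; Assumption: the test page from the first post still exists.
Local $oIE = _IECreate("http://tomstats.info/zzzzz/short.htm")

; Index omitted (same as -1): a collection of every <meta> element.
Local $colMeta = _IETagNameGetCollection($oIE, "meta")
If IsObj($colMeta) Then ConsoleWrite("meta tags found: " & $colMeta.length & @CRLF)

; Index 0: a single element object, the first <meta> on the page.
Local $oFirst = _IETagNameGetCollection($oIE, "meta", 0)
If IsObj($oFirst) Then ConsoleWrite("first meta content: " & $oFirst.content & @CRLF)
```

The collection form suits a For...In loop; the indexed form returns one element and cannot be looped over.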


Thanks a lot PSaltyDS... you piss me off.... I spent all afternoon working on this sucker and you make a simple solution in 30 seconds...

Sorry 'bout that. I'll try to be less helpful next time!

;)

