IAMK, posted September 21, 2018 (edited)

I am using the following code:

    Local $pageSource = _IEDocReadHTML($ie) ; this string is huge (~300k chars)

In that string, I need to find about 50 occurrences of:

    <h3 class='heading_title'>
    <a href="GYBERISH_ALPHANUMERIC_HERE"></a>

When I find <h3 class='heading_title'>, I want to split the next line on the double-quote character ("). That would give me an array of size 3, with the 2nd element being what I need.

However, after doing that, I want to execute other code and then continue from the place where I found the previous <h3 class='heading_title'> until I find the next <h3 class='heading_title'>.

The way I do it now is to split $pageSource by <h3 class='heading_title'>, but doing that on the entire source feels extremely bad:

    $sourceParts = StringSplit($pageSource, "<h3 class='heading_title'>", 1)

Also, if the "continue" step above can be done nicely, I could make the script faster by removing Local $pageSource = and feeding in the source directly from _IEDocReadHTML().

Note: $pageSource will be created and parsed 100+ times, so this script takes quite a long time to execute.

Question: What is the fastest way to get the GYBERISH_ALPHANUMERIC_HERE string, execute code, then continue to find the next GYBERISH_ALPHANUMERIC_HERE in the same string?

Edited September 21, 2018 by IAMK
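One way to "continue from the previous hit" without splitting the whole string is to keep a running offset and pass it as the start position of StringInStr. The following is only a minimal sketch under stated assumptions: $sSource is a hypothetical two-entry stand-in for the real page source, and the href value is assumed to sit between the first pair of double quotes after each marker.

```autoit
#include <StringConstants.au3>

; Hypothetical sample standing in for the real page source.
Local $sSource = "<h3 class='heading_title'>" & @CRLF & '<a href="abc123"></a>' & @CRLF & _
        "<h3 class='heading_title'>" & @CRLF & '<a href="def456"></a>'

Local Const $sMarker = "<h3 class='heading_title'>"
Local $iPos = 1 ; where the next search starts

While True
    ; Find the next marker, starting from where the previous pass stopped.
    Local $iFound = StringInStr($sSource, $sMarker, $STR_CASESENSE, 1, $iPos)
    If $iFound = 0 Then ExitLoop

    ; Assumption: the href value is enclosed by the first two double quotes after the marker.
    Local $iOpen = StringInStr($sSource, '"', $STR_CASESENSE, 1, $iFound + StringLen($sMarker))
    If $iOpen = 0 Then ExitLoop
    Local $iClose = StringInStr($sSource, '"', $STR_CASESENSE, 1, $iOpen + 1)
    If $iClose = 0 Then ExitLoop

    Local $sHref = StringMid($sSource, $iOpen + 1, $iClose - $iOpen - 1)
    ConsoleWrite($sHref & @CRLF) ; run the per-link code here

    $iPos = $iClose + 1 ; resume after this link on the next pass
WEnd
```

Because the offset only moves forward, each pass scans just the part of the string that has not been visited yet. If all hrefs can be collected up front instead, a single StringRegExp call with the global-match flag ($STR_REGEXPARRAYGLOBALMATCH) might be simpler still.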
FrancescoDiMuro, posted September 21, 2018 (edited)

@IAMK I don't know whether this is a possible solution, but you could read only the <a> elements from that web page and loop through them. To do so, use _IETagNameGetCollection(). I think that could make your comparison faster.

Edited September 21, 2018 by FrancescoDiMuro
Subz, posted September 21, 2018

Didn't really understand the OP, but you might be able to use something like this:

    #include <Array.au3>
    #include <IE.au3>

    Local $aHREF[1]
    Local $oIE = _IECreate()
    Local $oH3Tags = _IETagNameGetCollection($oIE, "H3")
    For $oH3Tag In $oH3Tags
        If $oH3Tag.ClassName = "heading_title" Then
            $oLinks = _IETagNameGetCollection($oH3Tag, "a")
            For $oLink In $oLinks
                _ArrayAdd($aHREF, $oLink.href)
            Next
        EndIf
    Next
    $aHREF[0] = UBound($aHREF) - 1 ; element 0 holds the number of links found
    _ArrayDisplay($aHREF)
IAMK (Author), posted September 21, 2018 (edited)

@FrancescoDiMuro Hmm, I thought that would also take some time, but @Subz's solution seems fast enough (for now). I will need to play around with it some more. To be honest, from the look of the code, I expected it to be much slower for some reason. Thank you.

Also, @FrancescoDiMuro, there are too many <a> tags. I think searching for h3 before searching for a works better.

Is there a built-in way to say I want to get the 1st, skip the 2nd, and skip the 3rd "a" tag in a For ... In ... loop? Or should I just set up a counter variable + If statement?

Edited September 21, 2018 by IAMK
FrancescoDiMuro, posted September 21, 2018

31 minutes ago, IAMK said:
    Or should I just set up a counter variable + if statement?

Check the @extended code (success) in the Return Value section of _IETagNameGetCollection:

    Success: an object variable containing the specified Tag collection, @extended = specified Tag count.
JLogan3o13 (Moderator), posted September 21, 2018 (edited)

44 minutes ago, IAMK said:
    Is there an inbuilt way to say I want to get the 1st, skip 2nd, skip 3rd "a" tag in the For ... In ... feature? Or should I just set up a counter variable + if statement?

Look in the help file for the For loop and the Step parameter. This can be modified for a For...In loop.

Edited September 21, 2018 by JLogan3o13
IAMK (Author), posted September 22, 2018

@JLogan3o13 Step was the first thing I tried, but I couldn't get it to work with For...In. How do you modify it?

@FrancescoDiMuro I tried looking up how to use @extended, but I couldn't find examples. I don't have to modify the _IETagNameGetCollection function itself, do I?
Subz, posted September 22, 2018 (edited)

Step example:

    For $i = 0 To 100 Step 10
        ConsoleWrite($i & @CRLF)
    Next

    For $i = 100 To 0 Step -20
        ConsoleWrite($i & @CRLF)
    Next

@extended example:

    Local $oH3Tags = _IETagNameGetCollection($oIE, "H3")
    Local $iCount = @extended ; number of H3 tags in the collection
    For $i = 0 To $iCount - 1 Step 2 ; collection indexes are 0-based
        ...

Edited September 22, 2018 by Subz: fixed @extended example
IAMK (Author), posted September 22, 2018 (edited)

@Subz That is For...To. I am trying to use For...In (with _IETagNameGetCollection). For now, I just use For...In over the entire collection and then do For...To...Step on the result. I will try playing with @extended. Thanks.

Edited September 22, 2018 by IAMK
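For...In has no Step clause, so one possible way to combine the two suggestions is to read @extended for the count and then fetch elements by index with the optional third parameter of _IETagNameGetCollection. This is a sketch only: the window title passed to _IEAttach is hypothetical, and the page is assumed to be loaded already.

```autoit
#include <IE.au3>

; Assumption: an IE window with this (hypothetical) title is already open and loaded.
Local $oIE = _IEAttach("My Page Title")
If @error Then Exit

Local $oH3Tags = _IETagNameGetCollection($oIE, "h3")
If Not IsObj($oH3Tags) Then Exit

For $oH3Tag In $oH3Tags
    If $oH3Tag.className <> "heading_title" Then ContinueLoop

    Local $oLinks = _IETagNameGetCollection($oH3Tag, "a")
    Local $iLinkCount = @extended ; read @extended right after the call, before another call resets it

    ; Index-based loop: Step 3 visits links 0, 3, 6, ... and skips the two in between.
    For $i = 0 To $iLinkCount - 1 Step 3
        Local $oLink = _IETagNameGetCollection($oH3Tag, "a", $i) ; third parameter = 0-based index
        If IsObj($oLink) Then ConsoleWrite($oLink.href & @CRLF)
    Next
Next
```

A counter variable inside the For...In loop, tested with Mod($iCounter, 3) = 0, would achieve the same result without the indexed lookups.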
IAMK (Author), posted September 22, 2018 (edited)

I have another question. This is about _IEGetProperty(). I have the following source:

    <div id='body_show_ori'>
    僕は今日、仕事に行かなくて、一日中休みました。体は、まだ少し痛いですが、昨日より耐え得ります。4時に、家からスカイプで会議に参加しました。明日、仕事に戻ります。この動画をもう投稿したことがありますが、とても好きから、また投稿します。浅紫の浴衣は、綺麗過ぎます!また、水色の浴衣は、中国っぽいと思います。<br/><object width="560" height="315">
    <param name="movie" value="https://www.youtube.com/v/Q2E7TLotcko"></param>
    <embed src="https://www.youtube.com/v/Q2E7TLotcko" type="application/x-shockwave-flash" width="560" height="315"></embed>
    </object>
    <br/>また、その浅紫の浴衣の子は、あるゲームの大好きなキャラに似ています。<br/><a href="https://imgur.com/u1WzDCL" target="_blank">https://imgur.com/u1WzDCL</a>
    </div>

If I get innertext, I get only the writing that is not inside any HTML, and if I get innerhtml, I get the writing plus the HTML for the YouTube video and Imgur link. However, what I want is:

    僕は今日、仕事に行かなくて、一日中休みました。体は、まだ少し痛いですが、昨日より耐え得ります。4時に、家からスカイプで会議に参加しました。明日、仕事に戻ります。この動画をもう投稿したことがありますが、とても好きから、また投稿します。浅紫の浴衣は、綺麗過ぎます!また、水色の浴衣は、中国っぽいと思います。
    https://www.youtube.com/v/Q2E7TLotcko
    また、その浅紫の浴衣の子は、あるゲームの大好きなキャラに似ています。
    https://imgur.com/u1WzDCL

I can get the text with innertext, and the links by reading the value and href attributes, but I can't get them in a way that keeps the original ordering shown in the snippet above. How would I go about doing that?

Note: There can be multiple videos and images in the source, in any order.

Edited September 22, 2018 by IAMK
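One possible approach that keeps the original order is to start from the innerhtml string and rewrite the tags in place with StringRegExpReplace, so text and URLs stay where they were. The sketch below works on a hypothetical, shortened literal instead of a live page, and it assumes the attributes are double-quoted as in the source above; the markup IE actually returns from innerhtml may be serialized differently, so the patterns may need adjusting.

```autoit
; Hypothetical, shortened stand-in for the innerhtml of <div id='body_show_ori'>.
Local $sHtml = 'First paragraph.<br/><object width="560" height="315">' & _
        '<param name="movie" value="https://www.youtube.com/v/Q2E7TLotcko"></param>' & _
        '<embed src="https://www.youtube.com/v/Q2E7TLotcko" type="application/x-shockwave-flash"></embed>' & _
        '</object><br/>Second paragraph.<br/>' & _
        '<a href="https://imgur.com/u1WzDCL" target="_blank">https://imgur.com/u1WzDCL</a>'

; 1. Replace each <param ... value="..."> with its value on its own line
;    (assumption: the only <param> tags present are the movie ones).
$sHtml = StringRegExpReplace($sHtml, '(?i)<param[^>]*\bvalue="([^"]*)"[^>]*>', @CRLF & '$1' & @CRLF)
; 2. Replace each link (opening tag, link text, closing tag) with its href.
$sHtml = StringRegExpReplace($sHtml, '(?is)<a\b[^>]*\bhref="([^"]*)"[^>]*>.*?</a>', '$1')
; 3. Turn <br> variants into line breaks.
$sHtml = StringRegExpReplace($sHtml, '(?i)<br\s*/?>', @CRLF)
; 4. Drop every remaining tag (<object>, <embed>, closing tags, ...).
$sHtml = StringRegExpReplace($sHtml, '<[^>]+>', '')
; 5. Collapse runs of blank lines into single line breaks.
$sHtml = StringRegExpReplace($sHtml, '(\r\n\s*)+', @CRLF)

ConsoleWrite($sHtml & @CRLF)
```

The <embed> tag is dropped in step 4 rather than converted, since it repeats the URL already taken from the <param> tag. Regular expressions are a fragile way to process HTML in general, so this is best treated as a starting point for markup whose shape is known.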
JLogan3o13 (Moderator), posted September 22, 2018

@IAMK try something like this:

    For $a = 0 To UBound($aArray) - 1 Step 3
        ...
    Next
IAMK (Author), posted September 22, 2018

@JLogan3o13 I have absolutely no idea why, but it turns out I don't need to step. Simply having a While loop over a third of the list does the job. E.g.:

    While($linkArray[0] < 20)
        Local $oH3Tags = _IETagNameGetCollection($ie, "H3")
        For $oH3Tag In $oH3Tags
            If($oH3Tag.ClassName = "heading_title") Then
                $oLinks = _IETagNameGetCollection($oH3Tag, "a")
                For $oLink In $oLinks
                    _ArrayAdd($linkArray, $oLink.href)
                Next
            EndIf
        Next
        $linkArray[0] = UBound($linkArray) - 1
    WEnd

Without the outer loop, the 20 elements in the array end up separated by 2 elements between each of them. It's black magic.