youtuber Posted November 19, 2017

Hi, I would like to get rid of the regex complexity and make this easier. How can I use _StringBetween several times in succession, or is there another way? Thanks.

#include <Array.au3>
#include <String.au3>

$string = _oHTTPGet("https://www.autoitscript.com/site/post-sitemap.xml")
$string = StringRegExpReplace($string, '(?s)[\n\r\t\v]', '')
$string = StringStripWS($string, 7)
$aData = _StringBetween($string, '<loc>','</loc>')

For $j = 0 To UBound($aData) - 1
    If IsArray($aData) Then
        ConsoleWrite("Line : " & @ScriptLineNumber & " : " & $aData[$j] & @CRLF)
        $string2 = _oHTTPGet($aData[$j])
        $string2 = StringRegExpReplace($string2, '(?s)[\n\r\t\v]', '')
        $string2 = StringStripWS($string2, 7)
        $aPostData = _StringBetween($string2, '</head>','<footer')
        For $s = 0 To UBound($aPostData) - 1
            If IsArray($aPostData) Then
                $StringBetw2 = _StringBetween($aPostData[$s], '<div class="entry-content">','<span class="synved-social-container');My question is exactly for this line
                $StringBetw2 = StringRegExpReplace($StringBetw2, "(?is)(<script[^>]+javascript.*?/script>)", "")
                $StringBetw2 = StringRegExpReplace($StringBetw2, '(?s)<.*?>', "" & @CRLF)
                $StringBetw2 = StringRegExpReplace($StringBetw2, '( )+', "")
                ConsoleWrite("Line : " & @ScriptLineNumber & " : " & $StringBetw2 & @CRLF)
            Else
                ConsoleWrite("Line : " & @ScriptLineNumber & " : " & " Problem " & @CRLF)
            EndIf
        Next
    EndIf
Next

Func _oHTTPGet($aUrL)
    Local $oHTTP = ObjCreate("winhttp.winhttprequest.5.1")
    $oHTTP.Open("GET", $aUrL, False)
    $oHTTP.SetRequestHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:55.0) Gecko/20100101 Firefox/55.0")
    $oHTTP.Send()
    If @error Then
        ConsoleWrite("Line : " & @ScriptLineNumber & " Not Connect " & @CRLF)
        $oHTTP = 0
        Return SetError(1)
    EndIf
    If $oHTTP.Status = 200 Then
        Local $sReceived = $oHTTP.ResponseText
        $oHTTP = Null
        Return $sReceived
    EndIf
    $oHTTP = Null
    Return -1
EndFunc
Jfish Posted November 19, 2017

Have a look at the string that is in $aPostData[$s]. I don't think it contains the substrings you are searching for in your _StringBetween call. I wrote that text to the console, then copied and pasted it into Notepad++, and could not find either the start or the end substring.
youtuber Posted November 19, 2017

It is there in the page source: view-source:https://www.autoitscript.com/site/autoit-news/autoit-v3-3-14-0-released/
mikell Posted November 19, 2017

Something like this

$StringBetw2 = _StringBetween($aPostData[$s], .....)
$StringBetw2 = StringRegExpReplace($StringBetw2, .....)

can't work anyway, because _StringBetween returns an array.
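To illustrate the point above, here is a minimal sketch of looping over the array that _StringBetween returns and running the regex on each element instead of on the array itself. The helper name _ExtractText and the sample HTML are hypothetical, made up for this example only; they are not part of the original script.

```autoit
#include <String.au3>

; Hypothetical helper (not from the original script): returns the tag-stripped
; text of every match between $sStart and $sEnd, one match per line.
Func _ExtractText($sHtml, $sStart, $sEnd)
    ; _StringBetween returns an array of matches and sets @error when nothing is found.
    Local $aBetween = _StringBetween($sHtml, $sStart, $sEnd)
    If @error Then Return SetError(1, 0, "")

    Local $sOut = ""
    For $i = 0 To UBound($aBetween) - 1
        ; Each element is a string, so it can be passed to StringRegExpReplace.
        $sOut &= StringRegExpReplace($aBetween[$i], '(?s)<.*?>', '') & @CRLF
    Next
    Return $sOut
EndFunc

; Made-up sample input, only for demonstration.
Local $sSample = '<div class="entry-content"><p>Hello</p></div>'
ConsoleWrite(_ExtractText($sSample, '<div class="entry-content">', '</div>'))
```

Collecting the results in a separate string keeps the array untouched while the loop is still running over it.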
youtuber Posted November 19, 2017 (edited)

I understand, thank you @mikell. Do you see a problem in the rest of my code?

$string = _oHTTPGet("https://www.autoitscript.com/site/post-sitemap.xml")
$string = StringRegExpReplace($string, '(?s)[\n\r\t\v]', '')
$string = StringStripWS($string, 7)
$aData = _StringBetween($string, '<loc>','</loc>')

For $j = 0 To UBound($aData) - 1
    If IsArray($aData) Then
        ConsoleWrite("Line : " & @ScriptLineNumber & " : " & $aData[$j] & @CRLF)
        $string2 = _oHTTPGet($aData[$j])
        $string2 = StringRegExpReplace($string2, '(?s)[\n\r\t\v]', '')
        $string2 = StringStripWS($string2, 7)
        $aPostData = _StringBetween($string2, '</head>','<footer')
        For $s = 0 To UBound($aPostData) - 1
            If IsArray($aPostData) Then
                $StringBetw2 = _StringBetween($aPostData[$s], '<div class="entry-content','<!-- .entry-content -->')
                For $i = 0 To UBound($StringBetw2) - 1
                    $StringBetw2 = StringRegExpReplace($StringBetw2[$i], "(?is)(<script[^>]+javascript.*?/script>)", "")
                    $StringBetw2 = StringRegExpReplace($StringBetw2, '(?s)<.*?>', "" & @CRLF)
                    $StringBetw2 = StringRegExpReplace($StringBetw2, '( )+', "")
                    ConsoleWrite("Line : " & @ScriptLineNumber & " : " & $StringBetw2 & @CRLF)
                Next
            Else
                ConsoleWrite("Line : " & @ScriptLineNumber & " : " & " Problem " & @CRLF)
            EndIf
        Next
    EndIf
Next
iamtheky Posted November 19, 2017 (edited)

Mikell is right, but it's also easy enough to stringify that array, and _StringBetween iterates its own damn self, so isn't this the same thing (save for whatever cleaning you were doing for the GET)?

        For $s = 0 To UBound($aPostData) - 1
            If IsArray($aPostData) Then
                $StringBetw2 = _ArrayToString(_StringBetween($aPostData[$s], '<div class="entry-content">','<span class="synved-social-container'));My question is exactly for this line
                ;~ $StringBetw2 = StringRegExpReplace($StringBetw2, "(?is)(<script[^>]+javascript.*?/script>)", "")
                ;~ $StringBetw2 = StringRegExpReplace($StringBetw2, '(?s)<.*?>', "" & @CRLF)
                ;~ $StringBetw2 = StringRegExpReplace($StringBetw2, '( )+', "")
                ConsoleWrite("Line : " & @ScriptLineNumber & " : " & $StringBetw2 & @CRLF)
            Else
                ConsoleWrite("Line : " & @ScriptLineNumber & " : " & " Problem " & @CRLF)
            EndIf
        Next
    EndIf
Next
youtuber Posted November 19, 2017 (edited)

On 11/19/2017 at 6:16 PM, iamtheky said: so isnt this the same thing (save for whatever cleaning you were doing for the GET)?

Those lines strip the HTML tags for me. I compared the output with them disabled and with them enabled. So should I use this?

#include <Array.au3>
#include <String.au3>

$string = _oHTTPGet("https://www.autoitscript.com/site/post-sitemap.xml")
$string = StringRegExpReplace($string, '(?s)[\n\r\t\v]', '')
$string = StringStripWS($string, 7)
$aData = _StringBetween($string, '<loc>','</loc>')

For $j = 0 To UBound($aData) - 1
    If IsArray($aData) Then
        ConsoleWrite("Line : " & @ScriptLineNumber & " : " & $aData[$j] & @CRLF)
        $string2 = _oHTTPGet($aData[$j])
        $string2 = StringRegExpReplace($string2, '(?s)[\n\r\t\v]', '')
        $string2 = StringStripWS($string2, 7)
        $aPostData = _StringBetween($string2, '</head>','<footer')
        For $s = 0 To UBound($aPostData) - 1
            If IsArray($aPostData) Then
                $StringBetw2 = _ArrayToString(_StringBetween($aPostData[$s], '<div class="entry-content','<!-- .entry-content -->'));My question is exactly for this line
                $StringBetw2 = StringRegExpReplace($StringBetw2, "(?is)(<script[^>]+javascript.*?/script>)", "")
                $StringBetw2 = StringRegExpReplace($StringBetw2, '(?s)<.*?>', "" & @CRLF)
                $StringBetw2 = StringRegExpReplace($StringBetw2, '( )+', "")
                ConsoleWrite("Line : " & @ScriptLineNumber & " : " & $StringBetw2 & @CRLF)
            Else
                ConsoleWrite("Line : " & @ScriptLineNumber & " : " & " Problem " & @CRLF)
            EndIf
        Next
    EndIf
Next

Func _oHTTPGet($aUrL)
    Local $oHTTP = ObjCreate("winhttp.winhttprequest.5.1")
    $oHTTP.Open("GET", $aUrL, False)
    $oHTTP.SetRequestHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:55.0) Gecko/20100101 Firefox/55.0")
    $oHTTP.Send()
    If @error Then
        ConsoleWrite("Line : " & @ScriptLineNumber & " Not Connect " & @CRLF)
        $oHTTP = 0
        Return SetError(1)
    EndIf
    If $oHTTP.Status = 200 Then
        Local $sReceived = $oHTTP.ResponseText
        $oHTTP = Null
        Return $sReceived
    EndIf
    $oHTTP = Null
    Return -1
EndFunc
iamtheky Posted November 19, 2017 (edited)

Nice. Combining some of those seems like a sporting next task, prior to nesting all that shit on one line for fun.
mikell Posted November 19, 2017

May I add... it could be a great idea to swap these two lines

For $s = 0 To UBound($aPostData) - 1
    If IsArray($aPostData) Then

because running a For/Next loop through an array works, generally, much better if the array concerned really is an array.
youtuber Posted November 19, 2017

@mikell So is this okay?

$string = _oHTTPGet("https://www.autoitscript.com/site/post-sitemap.xml")
$string = StringRegExpReplace($string, '(?s)[\n\r\t\v]', '')
$string = StringStripWS($string, 7)
$aData = _StringBetween($string, '<loc>','</loc>')

For $j = 0 To UBound($aData) - 1
    If IsArray($aData) Then
        ConsoleWrite("Line : " & @ScriptLineNumber & " : " & $aData[$j] & @CRLF)
        $string2 = _oHTTPGet($aData[$j])
        $string2 = StringRegExpReplace($string2, '(?s)[\n\r\t\v]', '')
        $string2 = StringStripWS($string2, 7)
        $aPostData = _StringBetween($string2, '</head>','<footer')
        For $s = 0 To UBound($aPostData) - 1
            If IsArray($aPostData) Then
                $StringBetw2 = _StringBetween($aPostData[$s], '<div class="entry-content','<!-- .entry-content -->')
                For $i = 0 To UBound($StringBetw2) - 1
                    If IsArray($StringBetw2) Then
                        $StringBetw2 = StringRegExpReplace($StringBetw2[$i], "(?is)(<script[^>]+javascript.*?/script>)", "")
                        $StringBetw2 = StringRegExpReplace($StringBetw2, '(?s)<.*?>', "" & @CRLF)
                        $StringBetw2 = StringRegExpReplace($StringBetw2, '( )+', "")
                        ConsoleWrite("Line : " & @ScriptLineNumber & " : " & $StringBetw2 & @CRLF)
                    EndIf
                Next
            Else
                ConsoleWrite("Line : " & @ScriptLineNumber & " : " & " Problem " & @CRLF)
            EndIf
        Next
    EndIf
Next
mikell Posted November 20, 2017

Well, I meant: check the array before the loop, so you can see the results/errors.

$string = _oHTTPGet("https://www.autoitscript.com/site/post-sitemap.xml")
$string = StringRegExpReplace($string, '(?s)[\n\r\t\v]', '')
$string = StringStripWS($string, 7)
$aData = _StringBetween($string, '<loc>','</loc>')

If IsArray($aData) Then
    For $j = 0 To UBound($aData) - 1
        ConsoleWrite("Line : " & @ScriptLineNumber & " : " & $aData[$j] & @CRLF)
        $string2 = _oHTTPGet($aData[$j])
        $string2 = StringRegExpReplace($string2, '(?s)[\n\r\t\v]', '')
        $string2 = StringStripWS($string2, 7)
        $aPostData = _StringBetween($string2, '</head>','<footer')
        If IsArray($aPostData) Then
            For $s = 0 To UBound($aPostData) - 1
                $StringBetw2 = _StringBetween($aPostData[$s], '<div class="entry-content','<!-- .entry-content -->')
                If IsArray($StringBetw2) Then
                    For $i = 0 To UBound($StringBetw2) - 1
                        ; Use a separate variable here: assigning the result back to
                        ; $StringBetw2 would overwrite the array while it is being looped over.
                        $sText = StringRegExpReplace($StringBetw2[$i], "(?is)(<script[^>]+javascript.*?/script>)", "")
                        $sText = StringRegExpReplace($sText, '(?s)<.*?>', "" & @CRLF)
                        $sText = StringRegExpReplace($sText, '( )+', "")
                        ConsoleWrite("Line : " & @ScriptLineNumber & " : " & $sText & @CRLF)
                    Next
                Else
                    ConsoleWrite("Line : " & @ScriptLineNumber & " : " & " Problem $StringBetw2" & @CRLF)
                EndIf
            Next
        Else
            ConsoleWrite("Line : " & @ScriptLineNumber & " : " & " Problem $aPostData" & @CRLF)
        EndIf
    Next
Else
    ConsoleWrite("Line : " & @ScriptLineNumber & " : " & " Problem $aData" & @CRLF)
EndIf
youtuber Posted November 20, 2017

@mikell Thank you. How can I avoid the long vertical gaps that occur in the output?
mikell Posted November 20, 2017

It's a matter of the patterns used in the SRER (StringRegExpReplace), depending on the expected result. But the gaps are whitespace, so StringStripWS with the adequate flag could do the job.
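As one possible way to act on that suggestion, the sketch below collapses each run of line breaks (with any surrounding whitespace) into a single @CRLF using StringRegExpReplace, then trims the ends with StringStripWS. The helper name _CollapseBlankLines and the sample text are assumptions made up for illustration, and the pattern is just one option; the right one depends on the expected result.

```autoit
#include <StringConstants.au3>

; Hypothetical helper (not from the thread): removes the vertical gaps left
; behind after tags have been replaced with @CRLF.
Func _CollapseBlankLines($sText)
    ; Replace every run of line breaks, plus the whitespace around it, with one @CRLF.
    Local $sOut = StringRegExpReplace($sText, '\s*[\r\n]+\s*', @CRLF)
    ; Strip whatever whitespace remains at the very start and end.
    Return StringStripWS($sOut, $STR_STRIPLEADING + $STR_STRIPTRAILING)
EndFunc

; Made-up sample input, only for demonstration.
Local $sRaw = "Title" & @CRLF & @CRLF & @CRLF & "First paragraph" & @CRLF & @CRLF & "Second paragraph" & @CRLF
ConsoleWrite(_CollapseBlankLines($sRaw) & @CRLF)
```

In the script from the thread this would be applied to the cleaned text just before the ConsoleWrite call.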