b47chguru (posted December 2, 2012)
Hi, I am working on a script that fetches a web page's source and extracts the table elements, but my regex isn't working properly. I want to extract the content between <table and </table> from the source. Code:

    #include <Array.au3> ; needed for _ArrayDisplay
    $file = FileOpen("tyu.txt")
    $file_content = FileRead($file)
    FileClose($file)
    $table = StringRegExp($file_content, "(?s)<table((?s).*?)</table>", 3)
    _ArrayDisplay($table)

This is the text file: http://www.comfaca.com/aiyo.txt
The regex works perfectly with the StringRegExpGui UDF, though. Thanks in advance.
Chance (posted December 2, 2012; edited December 3, 2012 by FlutterShy)
You want to extract proxies from hidemyass using regular expressions? I've tried that, and you're not going to get very far. If that is what you want to do, it gets incredibly difficult, and if you can't manage to simply extract a table from HTML, I don't see you succeeding with this anywhere in the near future. If it's not the proxies you're after, I can't imagine why you would want to extract the proxy table from their website.

Also, there are far simpler ways to make a successful "scraper". Hidemyass looks pretty, but it doesn't have all the proxies.

P.S. They implement about 7 or 8 different obfuscation methods to show their proxies. They do this because they obviously don't like scrapers taking their stuff, so on the user side the proxies show up looking nice, but the HTML code is a gigantic, confusing maze of HTML and CSS trickery that only renders the right numbers, and it's 10^100x simpler to just do it another way. This question is more suitable for a place like hackforums.net. They have working backdoored scrapers everywhere.

Nevermind. I guess when I tried before, a few months ago, I knew less AutoIt. This seems to work.
b47chguru (posted December 3, 2012)
Thanks for your reply. I have already succeeded in deobfuscating it. I don't understand why you said I should post this question in some hack forum when my question is about AutoIt regex. It would be of much help if anyone could provide me with a solution to this regex problem.
Chance (posted December 3, 2012; edited December 3, 2012 by FlutterShy)
b47chguru said: "i dont understand why you said i should post this question in some hackforum when my question is related to autoit regex.."
Whatever regular expression can be used in AutoIt can be used in most other languages.

b47chguru said: "i have already succeeded in deobfuscating it.."
I think you misunderstood me. But anyway, what the heck...

b47chguru said: "it would be of much help if anyone could provide me with a solution to this regex problem"

    $table = StringRegExp("<table HURRR DURRRF x3>YAY :D</table>", "(?s)(?i)<table(?i:[^>].*|)>(.*?)</table>", 3)
    ConsoleWrite($table[0] & @CR)
b47chguru (posted December 3, 2012)
Quoting Chance's code:

    $table = StringRegExp("<table HURRR DURRRF x3>YAY :D</table>", "(?s)(?i)<table(?i:[^>].+?|)>(.*?)</table>", 3)
    ConsoleWrite($table[0] & @CR)

Your regex doesn't work. Try it with the file from the link.
kylomas (posted December 3, 2012; edited December 3, 2012 by kylomas)
b47chguru,
Here is what you are asking for, although it is unlikely that it is what you want.

    (?m)<table[.\s\S]*</table

Note: rookie with regexp.
kylomas

Edit: Oops, forgot to add grouping. Try this pattern:

    (?m)(?:<table)([.\s\S]*)(?:</table)

"I like pigs. Dogs look up to us. Cats look down on us. Pigs treat us as equals." - Sir Winston Churchill
Chance (posted December 3, 2012)
Hmm, I guess I should have tried it on the file first. I'm wondering, though:

    <td class=" jbjnbjnbjnbj

Mine is only returning that, but why? It "looks" like it should have worked.
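One possible reason, offered as a guess rather than a diagnosis of the linked file: inside the tag group, `[^>].*` is greedy and, with `(?s)`, can run far past the opening tag's closing `>`. The sketch below uses made-up sample markup (`$sHtml`) and an assumed alternative pattern, `[^>]*`, which cannot cross a `>`.

```autoit
; Hypothetical sample markup, not the file from the thread.
Local $sHtml = '<table class="x"><tr><td>1</td></tr></table>'

; Pattern as posted: ".*" in the tag group is greedy, so it swallows the
; rest of the text and backtracks only to the LAST ">" before "</table>".
Local $aGreedy = StringRegExp($sHtml, "(?s)(?i)<table(?i:[^>].*|)>(.*?)</table>", 3)

; Assumed alternative: "[^>]*" stops at the tag's own closing ">".
Local $aBounded = StringRegExp($sHtml, "(?si)<table[^>]*>(.*?)</table>", 3)

ConsoleWrite("greedy : [" & $aGreedy[0] & "]" & @CRLF)  ; prints: greedy : []
ConsoleWrite("bounded: [" & $aBounded[0] & "]" & @CRLF) ; prints: bounded: [<tr><td>1</td></tr>]
```

On the one-tag test string from the earlier post there is only a single `>` before `</table>`, which would explain why the posted pattern looked fine there.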
kylomas (posted December 3, 2012; edited December 3, 2012 by kylomas)
@Fluttershy - Maybe because you are not in multiline mode. Don't know for sure, got a headache from thinking about regexp for 2 mins.
kylomas

@b47chguru - Don't assume 1 match; iterate through the results array.

Edit: Not true, s includes the EOL chars.
Chance (posted December 3, 2012)
Wut, I thought you always had to specify a capturing group within the regexp. I guess you learn stuff every day...

    $table = StringRegExp(FileRead(".\file.txt"), "(?s)(?i)<table(?i:[^>].*|)>.*</table>", 3)
    ConsoleWrite($table[0] & @CR)
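A small sketch of that point, using a made-up one-table string (`$sHtml` is an assumption, not the thread's file): with flag 3, a pattern without capturing groups returns each whole match, while a pattern with a group returns only the captured part.

```autoit
; Hypothetical sample input.
Local $sHtml = '<table id="t">x</table>'

; No capturing group: each array element is the full match.
Local $aWhole = StringRegExp($sHtml, "(?s)<table.*?</table>", 3)

; One capturing group: each array element is the group's text only.
Local $aGroup = StringRegExp($sHtml, "(?s)<table(.*?)</table>", 3)

ConsoleWrite($aWhole[0] & @CRLF) ; prints: <table id="t">x</table>
ConsoleWrite($aGroup[0] & @CRLF) ; prints:  id="t">x   (with a leading space)
```

Note that the grouped form, which is the pattern from the first post, also captures the tag's attributes and its closing `>`.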
kylomas (posted December 3, 2012)
@Fluttershy - See my addendum, the first pattern was wrong!!!
b47chguru (posted December 3, 2012)
@kylomas The regex still doesn't work with the file, but the funny thing is that the regex works with the StringRegExpGui UDF.
kylomas (posted December 3, 2012)
b47chguru,
Define "still doesn't work". I copied your file to a regexp tester and it worked fine. Are you sure that you saw my update? The 1st pattern was incorrect.
kylomas
b47chguru (posted December 3, 2012)
@kylomas Sorry, my bad... My first regex itself was working; the problem is that _ArrayDisplay doesn't display it. Anyway, thanks for helping out!
Chance (posted December 3, 2012; edited December 3, 2012 by FlutterShy)
Yeah, you're right, but you should be using "(?s)<table(.*?)</table>"; since you already defined (?s) once, you don't have to do it again, FYI.

lol, wow, I just did some testing: no, this is not correct.

Also, you're trying to extract the proxies, am I correct? Why not use tbody instead of table?
kylomas (posted December 3, 2012)
b47chguru,
Sure about that? Why would _ArrayDisplay not display an array, given that the regexp returned one? You can interrogate @error following a regexp call to make sure of the result.

Flag = 3 or 4:
    @error   Meaning
    0        Array is valid.
    1        Array is invalid. No matches.
    2        Bad pattern, array is invalid. @extended = offset of error in pattern.

kylomas
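The @error values listed above can be checked with something like the following minimal sketch. The sample string `$sHtml` and the variable names are assumptions for illustration; the pattern is the one from the first post without the repeated `(?s)`.

```autoit
; Made-up sample input, not the file from the thread.
Local $sHtml = "<table><tr><td>1</td></tr></table>"
Local $aTables = StringRegExp($sHtml, "(?s)<table(.*?)</table>", 3)

; Copy both macros right away; the next function call resets them.
Local $iError = @error, $iExtended = @extended

Switch $iError
    Case 0
        ConsoleWrite("Valid array, " & UBound($aTables) & " match(es)." & @CRLF)
    Case 1
        ConsoleWrite("No matches." & @CRLF)
    Case 2
        ConsoleWrite("Bad pattern at offset " & $iExtended & "." & @CRLF)
EndSwitch
```

Checking before touching `$aTables[0]` avoids indexing a value that is not an array when nothing matched.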
kylomas (posted December 3, 2012; edited December 3, 2012 by kylomas)
Or grab everything that is not HTML, like this:

    StringRegExp($str, '>([^<].*?)<', 3)
b47chguru (posted December 3, 2012)
Yes, the _ArrayDisplay function doesn't display the array element in this case, $table[0].
kylomas (posted December 3, 2012)
That is my point: do NOT assume just one match, iterate through the array.
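Iterating through the result could look roughly like this. It is a sketch under assumptions: `$sHtml` is made-up two-table input, and printing each element's length plus a short prefix is just one way to inspect long matches without relying on _ArrayDisplay.

```autoit
; Made-up sample input containing two tables.
Local $sHtml = "<table id='a'><tr><td>1</td></tr></table>" & @CRLF & _
        "<table id='b'><tr><td>2</td></tr></table>"

Local $aTables = StringRegExp($sHtml, "(?s)<table(.*?)</table>", 3)
If @error Then
    ConsoleWrite("No usable result from StringRegExp." & @CRLF)
    Exit
EndIf

; Walk every match instead of assuming there is only $aTables[0].
For $i = 0 To UBound($aTables) - 1
    ConsoleWrite("Match " & $i & ": " & StringLen($aTables[$i]) & " chars, starts with: " & _
            StringLeft($aTables[$i], 40) & @CRLF)
Next
```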
b47chguru (posted December 3, 2012)
@FlutterShy Yes, I am trying to extract proxies. I have made a partial CSS interpreter to decode it into the IP, and it works. The only problem I had was with this regex.
@kylomas Yes, thank you very much for helping to solve this problem.
Chance (posted December 3, 2012; edited December 3, 2012 by FlutterShy)
OK, I wanted to test myself for some reason and felt an urge to do it, especially after I asked how you deobfuscated your stuff and you replied that you'd post it sometime in the Example Scripts; people who say that rarely actually do. So here it is. This will recover every single proxy by parsing the HTML and CSS.

    #AutoIt3Wrapper_AU3Check_Parameters=-d -w 1 -w 2 -w 3 -w- 4 -w 5 -w 6 -w- 7
    ; #FUNCTION# ====================================================================================================================
    ; Name ..........: _UnHideMyAss
    ; Description ...: Recovers proxies from hidemyass.com
    ; Syntax ........: _UnHideMyAss($HTML)
    ; Parameters ....: $HTML - HTML web source.
    ; Return values .: A whitespace-delimited string list
    ; Author ........: FlutterShy
    ; Modified ......:
    ; Remarks .......: will most likely stop working after a month from this post
    ; Example .......: No
    ; ===============================================================================================================================
    Func _UnHideMyAss($HTML)
        Local $tables = StringRegExp($HTML, "(?s)(?i)<tbody.*>.*</tbody>", 3) ; extract entire table
        If @error Then Return SetError(1, 0, 0)
        Local $aBody = StringRegExp($tables[0], "(?s)<tr(?i:[^>].*)>.*</tr>", 3) ; get the smaller table
        If @error Then Return SetError(2, 0, 0)
        Local $Fields = StringRegExp($aBody[0], "<tr(?i:[^>].*)>((?s).*?)</tr>", 3) ; separate the groups of entries
        If @error Then Return SetError(3, 0, 0)
        Local $Step
        Local $Out
        Local $Styles
        Local $TempStyles
        For $o = 0 To UBound($Fields) - 1
            $HTML = StringReplace(StringReplace($Fields[$o], @CR, ""), @LF, "") ; remove all line breaks
            $aBody = StringRegExp($HTML, "<style>(.*?)</style>", 3) ; extract CSS styles
            If @error Then Return SetError(4, $o, 0)
            $Styles = StringRegExp($aBody[0], ".(.*?){display:(.+?)}", 3) ; get CSS values
            If @error Then ContinueLoop
            $TempStyles = $Styles
            ReDim $Styles[200][2]
            $Step = 0
            For $I = 0 To (UBound($TempStyles) - 1) Step 2 ; load styles into the array as name/value pairs
                $Styles[$Step][0] = $TempStyles[$I]
                $Styles[$Step][1] = $TempStyles[$I + 1]
                $Step += 1
            Next
            ReDim $Styles[$Step][2]
            $aBody = StringRegExp($HTML, '(?s)</style>.*?<span class="country">', 3) ; get the actual obfuscated proxy content
            If @error Then ContinueLoop
            $aBody[0] = StringRegExpReplace($aBody[0], '<span\sstyle="display:\s?none">.*?</span>', "") ; remove the ones that will not show up
            $aBody[0] = StringRegExpReplace($aBody[0], '<div\sstyle="display:\s?none">.*?</div>', "") ; separate regexp to avoid confusion
            For $I = 0 To UBound($Styles) - 1
                If $Styles[$I][1] == "none" Then _ ; remove the entities whose CSS style is display:none
                        $aBody[0] = StringRegExpReplace($aBody[0], '<(?:span|div)\sclass="' & $Styles[$I][0] & '">.*?</span>', "")
            Next
            $aBody[0] = StringRegExpReplace($aBody[0], '<(?:span|div).*?>([.\d]*?)</(?:span|div)>', "$1") ; remove dummy tags
            $aBody[0] = StringRegExpReplace(StringStripWS($aBody[0], 8), '<td>([^>]\d+)</td>', ":$1") ; set port
            $aBody[0] = StringRegExpReplace($aBody[0], '<[^<]*>', "") ; remove everything else now
            $Out &= $aBody[0] & " "
        Next
        Return SetError(0, 0, $Out)
    EndFunc

    Global $Result = _UnHideMyAss('$Source') ; '$Source' is as posted; presumably a placeholder for the page's HTML source
    Global $Discriminate = "312[8-9]|28134|54321|45612|443|1\d{2,3}|9\d{3}|8\d{1,3}"
    Global $aResult = StringRegExp($Result, "((?:25[0-5]|2[0-4]\d|1?[1-9]\d?|1\d\d).(?:(?:25[0-5]|2[0-4]\d|1?[1-9]\d?|1\d\d|0).){2}(?:25[0-5]|2[0-4]\d|1?[1-9]\d?|1\d\d|0):(?:" & $Discriminate & "))", 3)
    Global $Match = 0
    Global $OutPut
    $Result = StringSplit($Result, " ", 2)
    For $A = 0 To UBound($Result) - 1
        For $I = 0 To UBound($aResult) - 1
            If ($Result[$A] == $aResult[$I]) Then $Match = 1
        Next
        Switch $Match
            Case 0
                $Output &= $Result[$A] & @CRLF
            Case 1
                $Output &= $Result[$A] & @TAB & " HURR DUURRFF :D" & @CRLF
        EndSwitch
        $Match = 0
    Next
    ConsoleWrite($Output)

OMG THIS SITE! It stopped working already. Very easy to fix, though; I'll let anyone who's interested figure it out.