footswitch Posted September 21, 2010

Hello there,

I need to get every "function('random_string')" that is INSIDE the <script></script> tags and WITHOUT a preceding "{" character. So I'm trying to accomplish something in between these two:

#include <Array.au3>

$html="{function('00')--<script--{function('11')--function('22')--function('33')--</script>--<script--{function('44')--function('55')--function('66')--</script>--function('77')"

$array1=StringRegExp ($html, "(?s)(?i)<script.+function\('(.*?)'\).+</SCRIPT>",3)
If @error=1 Then ConsoleWrite("-> No matches for first RegExp!"&@CRLF)
; RegExp explained:
; sets flag "(?s)", which means: "." matches any character including newline
; sets flag "(?i)", which means: case insensitive
; looks for "<script", then a bunch of text must be present - ".+" - until it finds the last occurrence of "function('random string here')" - why only the last occurrence?
; then another bunch of text must be present and finally "</SCRIPT>" needs to appear after it all
_ArrayDisplay($array1)

$array2=StringRegExp ($html, "(?s)(?i)[^\{]function\('(.*?)'\)",3)
If @error=1 Then ConsoleWrite("-> No matches for second RegExp!"&@CRLF)
; RegExp explained:
; find every occurrence of "function('random string here')" that doesn't have a preceding "{"
_ArrayDisplay($array2)

I'm not quite the RegExp guru. Any thoughts?

Thanks,
footswitch

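On the "why only the last occurrence?" question in the comments above: ".+" is greedy, so it first runs to the end of the string and then backtracks only as far as it must, which lands the capture group on the last function('...') before the closing tag. The sketch below contrasts greedy and lazy quantifiers on a shortened copy of the test string. The variable names ($sHtml, $aGreedy, $aLazy) are illustrative and not from the original post.

```autoit
; Shortened copy of the test string: a single <script ... </script> block.
Local $sHtml = "<script--{function('11')--function('22')--function('33')--</script>"

; Greedy ".+": backtracks from the end, so the group captures the LAST function('...').
Local $aGreedy = StringRegExp($sHtml, "(?si)<script.+function\('(.*?)'\).+</script>", 3)
If Not @error Then ConsoleWrite("Greedy: " & $aGreedy[0] & @CRLF) ; should print 33

; Lazy ".+?": expands one character at a time, so the group captures the FIRST one.
Local $aLazy = StringRegExp($sHtml, "(?si)<script.+?function\('(.*?)'\).+?</script>", 3)
If Not @error Then ConsoleWrite("Lazy: " & $aLazy[0] & @CRLF) ; should print 11
```

Either way, each match consumes a whole script block, so one capture group can return at most one value per block, even with flag 3. That is why a single pattern of this shape cannot list every function inside a block.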
GMK Posted September 21, 2010

I'm still not entirely sure what you're after here. In this case, are you just after functions 11 through 66 (omitting functions 00 and 77), or are you after all functions?

In your StringRegExp, you'll want a "?" after "[^\{]", as your bracket does not always appear, at least in your $html string as you have it now.

Also, you'll probably want to check out the String Regular Expression Tester.

Ascend4nt Posted September 21, 2010

You could accomplish this in two steps. The first would be to isolate the portions that have surrounding <script </script> tags. (btw, why isn't there a '>' before the </script>?) Then you would work through each array element.

Another way is to identify the text that should come before the '--function' part of a statement. In the above scenario, you could do something like this (assuming 'script' or ')' comes before the dashes):

$aMatches=StringRegExp($html,"(?:\)|script)--\{?function.'(\d+)'",3)

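As written, the "\{?" in that pattern also accepts the brace-prefixed calls, so it returns 11 through 66. A variant that simply drops the optional brace appears to give the four values the original poster is after, but only under the same assumption: every wanted call is immediately preceded by "script--" or ")--", as in the toy string. This is a sketch of that variant, not something posted in the thread.

```autoit
Local $sHtml = "{function('00')--<script--{function('11')--function('22')--function('33')--</script>--" & _
        "<script--{function('44')--function('55')--function('66')--</script>--function('77')"

; Same prefix idea as above, minus the optional "{".
; The pattern deliberately stops before the closing ")" so that the ")" stays
; available as the prefix of the next match.
Local $aMatches = StringRegExp($sHtml, "(?:\)|script)--function.'(\d+)'", 3)
If @error Then
    ConsoleWrite("-> No matches" & @CRLF)
Else
    For $i = 0 To UBound($aMatches) - 1
        ConsoleWrite($aMatches[$i] & @CRLF) ; should list 22, 33, 55, 66
    Next
EndIf
```

This relies entirely on the "--" separators of the sample string, so it is unlikely to carry over to real HTML unchanged.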
Mison Posted September 22, 2010

"\w+\('\d+'\)"

GEOSoft Posted September 22, 2010

I'm pretty sure that you are not giving us an actual string to work with here, and that makes it somewhat difficult to give you a proper answer. Post some of the actual HTML code and tell us what you want for a result.

footswitch Posted September 22, 2010

Thank you for your feedback. I eventually ended up doing this in two steps last night. Not that it came to me at first. Sorry for not posting in time. Anyway, it would be interesting to see a way of doing this in just one step, if such a thing is possible.

The test string provided is all we need. I need all the functions() inside <script ... </script>, as long as they don't have a preceding "{". So, in this case, my output should be:

function('22')
function('33')
function('55')
function('66')

@Ascend4nt, the lack of the ">" is just my natural laziness, because of the many possible scenarios like <script>, <script javascript>, <script something>...

@GMK, I believe I do NOT need a "?" after "[^\{]", because I actually want to exclude these scenarios.

These are the lines I'm currently using:

$array1=StringRegExp ($html, "(?s)(?i)<script(.+?)</SCRIPT>",3)
; next we combine the matches all together:
$string=""
For $i=0 To UBound($array1)-1
    $string&=$array1[$i]
Next
$array2=StringRegExp ($string, "(?s)(?i)[^\{]function\('(.*?)'\)",3)

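On the "just one step" question: one possible single-pattern sketch uses two lookarounds. A negative lookbehind "(?<!\{)" rejects calls preceded by a brace, and a lookahead checks that the next script tag after the call is a closing one, meaning the call sits inside a block. It assumes script blocks are not nested and that the quoted argument contains no apostrophe, which is why the capture is "[^']*" rather than "(.*?)". The variable names are illustrative.

```autoit
Local $sHtml = "{function('00')--<script--{function('11')--function('22')--function('33')--</script>--" & _
        "<script--{function('44')--function('55')--function('66')--</script>--function('77')"

; (?<!\{)                          not preceded by "{"
; function\('([^']*)'\)            the call itself; the argument may not contain an apostrophe
; (?=(?:(?!<script).)*?</script>)  a closing </script> follows before any new <script
Local $sPattern = "(?is)(?<!\{)function\('([^']*)'\)(?=(?:(?!<script).)*?</script>)"

Local $aMatches = StringRegExp($sHtml, $sPattern, 3)
If @error Then
    ConsoleWrite("-> No matches" & @CRLF)
Else
    For $i = 0 To UBound($aMatches) - 1
        ConsoleWrite("function('" & $aMatches[$i] & "')" & @CRLF) ; should list 22, 33, 55, 66
    Next
EndIf
```

Unlike "[^\{]", the lookbehind consumes no character, so it also accepts a call at the very start of the input. The lookahead rescans the rest of the block for every candidate, so on large pages the two-step version above may well be the faster and safer choice.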
Mison Posted September 22, 2010

If the line always contains 3 function('num') calls between the <script> -- </script> tags, then this will work:

<script--{(\w+\('\d+'\))--(\w+\('\d+'\))--(\w+\('\d+'\))--.*?>

footswitch Posted September 22, 2010

Okay, I get your point. I never know what I can find inside a <script> tag. In my case, it contains several lines of code. Among that code, function('random_text') might appear, often more than once. I believe this would be a good test string:

$html="<html>(...)"&@CRLF& _
" this is a function('that i dont want in the output because its outside of the script tags');"&@CRLF& _
"<script javascript>"&@CRLF& _
"few lines of code here"&@CRLF& _
"few lines of code here"&@CRLF& _
"more code and then function('FIRST TAG testing one enter"&@CRLF& _
"two and thr33.'); and etc."&@CRLF& _
"this will continue through the ages"&@CRLF& _
"and possibly there's a second function('FIRST TAG with mor3 alphanumeric chars here')"&@CRLF& _
"; and now here it is a {function('i dont want to catch this one because it starts with {');}"&@CRLF& _
"few lines of code here"&@CRLF& _
"few lines of code here"&@CRLF& _
"and finally</script>(...)"&@CRLF& _
"remember that this {function('also cant be present in the output because its outside of the script tags');(...)</html>"&@CRLF& _
"<html>(...)"&@CRLF& _
" this is a function('that i dont want in the output because its outside of the script tags');"&@CRLF& _
"<script javascript>"&@CRLF& _
"few lines of code here"&@CRLF& _
"few lines of code here"&@CRLF& _
"more code and then function('SECOND TAG testing one enter"&@CRLF& _
"two and thr33.'); and etc."&@CRLF& _
"this will continue through the ages"&@CRLF& _
"and possibly there's a second function('SECOND TAG with mor3 alphanumeric chars here')"&@CRLF& _
"; and now here it is a {function('i dont want to catch this one because it starts with {');}"&@CRLF& _
"few lines of code here"&@CRLF& _
"few lines of code here"&@CRLF& _
"and finally</script>(...)"&@CRLF& _
"remember that this {function('also cant be present in the output because its outside of the script tags');(...)</html>"

From this, I only want this output:

(array)
0|FIRST TAG testing one entertwo and thr33.
1|FIRST TAG with mor3 alphanumeric chars here
2|SECOND TAG testing one entertwo and thr33.
3|SECOND TAG with mor3 alphanumeric chars here

The script that I posted earlier today (one RegExp over another) performs this operation successfully:

1. Get everything inside <script> tags
2. Get everything inside function(''), as long as function doesn't have a preceding {

Mison Posted September 23, 2010

I prefer your way too. Perhaps somebody could write a proper UDF for nested pattern matching.

footswitch Posted September 23, 2010

Yeah, like

_StringRegExpNested ( ByRef $aPatterns ) ; with a virtually unlimited number of nested RegExps

Just think about the combinations of flags, return values and error values... what a mess it would be.

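For what it's worth, a bare-bones version of that idea fits in a few lines if it sidesteps the flag and return-value combinations altogether. The sketch below is hypothetical. No such UDF ships with AutoIt, and the extra $sText parameter, the fixed flag 3, and the error convention (@error = 1, @extended = index of the failing stage) are all assumptions of mine.

```autoit
; Hypothetical helper: feeds every match of pattern N into pattern N+1.
; Returns an array with the matches of the last pattern.
; On failure returns 0, sets @error = 1 and @extended = index of the stage that matched nothing.
Func _StringRegExpNested($sText, ByRef $aPatterns)
    Local $aInput[1] = [$sText]
    Local $aOutput[1], $aFound, $iCount
    For $iStage = 0 To UBound($aPatterns) - 1
        $iCount = 0
        For $i = 0 To UBound($aInput) - 1
            $aFound = StringRegExp($aInput[$i], $aPatterns[$iStage], 3)
            If @error Then ContinueLoop ; this piece produced nothing at this stage
            ; Resize to exactly the number of results so far, then append the new ones.
            ReDim $aOutput[$iCount + UBound($aFound)]
            For $j = 0 To UBound($aFound) - 1
                $aOutput[$iCount] = $aFound[$j]
                $iCount += 1
            Next
        Next
        If $iCount = 0 Then Return SetError(1, $iStage, 0)
        $aInput = $aOutput ; results of this stage become the input of the next
    Next
    Return $aInput
EndFunc   ;==>_StringRegExpNested

; Usage with the first test string from this thread.
Local $sHtml = "{function('00')--<script--{function('11')--function('22')--function('33')--</script>--" & _
        "<script--{function('44')--function('55')--function('66')--</script>--function('77')"
Local $aPatterns[2] = ["(?is)<script(.+?)</script>", "(?is)(?<!\{)function\('([^']*)'\)"]

Local $aResult = _StringRegExpNested($sHtml, $aPatterns)
If @error Then
    ConsoleWrite("-> Stage " & @extended & " matched nothing" & @CRLF)
Else
    For $i = 0 To UBound($aResult) - 1
        ConsoleWrite($aResult[$i] & @CRLF)
    Next
EndIf
```

Each stage is applied to every block separately rather than to one concatenated string, so a match cannot straddle the boundary between two script blocks.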
GEOSoft Posted September 23, 2010

The whole concept makes one shudder.

Mat Posted September 23, 2010

Hang on a sec. Why don't you do this: http://www.autoitscript.com/forum/index.php?showtopic=119220&view=findpost&p=828731

It's a similar theory: you have opening brackets and closing brackets. Doing it on HTML would be more complicated, but if you keep going through to the next tag, it's possible.

And what you should remember is that regex is NOT magic. It's loops in strings.

footswitch Posted September 23, 2010

@Mat, I like what you did. But then there's the whole HTML enchilada: lots of conditional arguments, which I believe would really mess up the code.

Nested StringRegExps are precise, easy to understand, easy to tune up and acceptably efficient.

Fighting for the best way of reinventing the wheel, are we?

GEOSoft Posted September 23, 2010

No matter how many times you reinvent it, there is still going to be a flat spot someplace.

Mat Posted September 23, 2010

All I'm trying to point out is that although regex looks neat in AutoIt, it's just loops in strings. So if you were a computer and could think, you would think it was a mess. But computers aren't prone to expressing their opinions unless they are told to, so you get away with it.

You are right though: my method would get messy when you start to deal with strings to open and close rather than single characters, not to mention a host of other factors.

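To make the "loops in strings" point concrete, here is a rough hand-written counterpart of the two-step approach, using only StringInStr and StringMid. It is a sketch, not Mat's linked code: it assumes every "<script" has a matching "</script>", that tags are not nested, and that a call's argument ends at the first "')".

```autoit
Local $sHtml = "{function('00')--<script--{function('11')--function('22')--function('33')--</script>--" & _
        "<script--{function('44')--function('55')--function('66')--</script>--function('77')"

Local $iPos = 1, $iOpen, $iClose, $sBlock, $iFunc, $iStart, $iEnd
While 1
    ; Outer loop: find the next <script ... </script> block (case-insensitive search).
    $iOpen = StringInStr($sHtml, "<script", 0, 1, $iPos)
    If $iOpen = 0 Then ExitLoop
    $iClose = StringInStr($sHtml, "</script>", 0, 1, $iOpen)
    If $iClose = 0 Then ExitLoop
    $sBlock = StringMid($sHtml, $iOpen, $iClose - $iOpen)

    ; Inner loop: walk through every function('...') inside the block.
    $iFunc = 1
    While 1
        $iFunc = StringInStr($sBlock, "function('", 0, 1, $iFunc)
        If $iFunc = 0 Then ExitLoop
        $iStart = $iFunc + StringLen("function('")
        $iEnd = StringInStr($sBlock, "')", 0, 1, $iStart)
        If $iEnd = 0 Then ExitLoop
        ; Keep the call only if the character right before it is not "{".
        If StringMid($sBlock, $iFunc - 1, 1) <> "{" Then
            ConsoleWrite(StringMid($sBlock, $iStart, $iEnd - $iStart) & @CRLF)
        EndIf
        $iFunc = $iEnd + 2
    WEnd

    $iPos = $iClose + StringLen("</script>")
WEnd
```

Even this toy version needs two nested loops and several index variables, which roughly illustrates both sides of the exchange above: the regex version hides the same work, and the manual version grows quickly once real HTML is involved.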