DeltaRocked Posted November 2, 2012 (edited)

Hi,

Type String 1: <META CONTENT="1; url=http://roundtopstatebank.com" HTTP-EQUIV=refresh>
Type String 2: <META HTTP-EQUIV=refresh CONTENT="1; url=http://roundtopstatebank.com">

Present regex: (?i)(?:<[\s*]{0,1}meta http-equiv[\s*]{0,1}=[\s*]{0,1}["']{0,1}refresh["']{0,1}[^>]*)content[\s*]?=[\s*]?\"(.*?)\"

This regex extracts "Content" from String 2 but fails on String 1. Is there a way to have a single regex that extracts "Content" from both strings?

Regards

Chance Posted November 2, 2012 (edited)

pffff

    #include <Array.au3>

    Global $Test[2]
    $Test[0] = '<META CONTENT="1; url=http://roundtopstatebank.com" HTTP-EQUIV=refresh>'
    $Test[1] = '<META HTTP-EQUIV=refresh CONTENT="1; url=http://roundtopstatebank.com">'

    For $I = 0 To 1
        ; Flag 3 returns every capture as an array
        $Result = StringRegExp($Test[$I], "(?i)<META.*?CONTENT=""(.*?)"".*?>", 3)
        _ArrayDisplay($Result)
    Next

In case it's the URL you're after:

    #include <Array.au3>

    Global $Test[2]
    $Test[0] = '<META CONTENT="1; url=http://roundtopstatebank.com" HTTP-EQUIV=refresh>'
    $Test[1] = '<META HTTP-EQUIV=refresh CONTENT="1;url=http://roundtopstatebank.com">'

    For $I = 0 To 1
        ; Skip the "<delay>;" prefix and capture only what follows url=
        $Result = StringRegExp($Test[$I], "(?i)<META.*?CONTENT=[""'](?:\d{0,3};\s?url=)(.*?)[""'].*?>", 3)
        _ArrayDisplay($Result)
    Next

blityon Posted November 2, 2012 (edited)

    ; (?i) ignores case; \s.*? lets CONTENT be the first attribute or a later one
    $arrayDatos = StringRegExp($datos, '(?i)<META\s.*?CONTENT="(.*?)".*?>', 3)

Robjong Posted November 2, 2012

Try to avoid using a lazy dot ".*?"; instead use a negated character class such as "[^"]*" whenever possible. This avoids unnecessary backtracking and thus improves performance.

    #include <Array.au3>

    Global $sString = '<META CONTENT="1; url=http://roundtopstatebank.com" HTTP-EQUIV=refresh>' & @LF & _
            '<META HTTP-EQUIV=refresh CONTENT="1;url=http://roundtopstatebank.com">' & @LF

    $aResult = StringRegExp($sString, '(?i)<meta[^>]+content\s*=\s*"([^"]+)"[^>]*>', 3) ; content
    _ArrayDisplay($aResult, "Content")
    $aResult = StringRegExp($sString, '(?i)<meta[^>]+content\s*=\s*"\d+;\s*url=([^"]+)"[^>]*>', 3) ; URL only
    _ArrayDisplay($aResult, "URL only")

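Besides performance, the negated class also keeps the match inside a single tag. The following is only a minimal sketch of that point: the helper `_FirstCapture` and the sample subject are made up for illustration and do not come from the thread.

```autoit
; Hypothetical helper (not from the thread): first capture group, or "" when nothing matches.
Func _FirstCapture($sSubject, $sPattern)
    Local $aMatch = StringRegExp($sSubject, $sPattern, 1) ; flag 1: captures of the first match
    If @error Then Return ""
    Return $aMatch[0]
EndFunc

; Made-up subject: the META tag has no CONTENT attribute, but a later tag does.
Global $sSubject = '<META NAME="author"><FOO CONTENT="not a refresh">'

; The lazy dot is free to run past the ">" that closes the META tag.
ConsoleWrite('lazy dot: [' & _FirstCapture($sSubject, '(?i)<meta.*?content="(.*?)"') & ']' & @CRLF)
; prints: lazy dot: [not a refresh]

; [^>]+ cannot cross ">", so the search stays inside the META tag and finds nothing.
ConsoleWrite('negated : [' & _FirstCapture($sSubject, '(?i)<meta[^>]+content\s*=\s*"([^"]+)"') & ']' & @CRLF)
; prints: negated : []
```

On the two strings from the first post both patterns return the same content value; the difference only shows up on input like the one above.
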
Chance Posted November 2, 2012

Note taken, just opened my eyes to some flaws in a project of mine, thanks for the contribution!

guinness Posted November 2, 2012

Isn't using ? next to * pointless? So .*? is the same as .* but not .+?.

Robjong Posted November 2, 2012

No. .* matches 0 or more occurrences of any character (except newline, unless single-line mode (?s) is enabled), and .+ matches 1 or more occurrences of any character. Both consume the largest possible match; this is called greedy. In .*? the lazy modifier (?) tells the engine to take the smallest possible match instead: it starts with nothing and adds one character at a time, only as far as the rest of the pattern requires.

Because the pattern ".*?" starts and ends with a quote, the engine looks for 0 or more characters between two quotes, so it matches "" as well as "1". The pattern ".+?" fails against the subject "" because it needs at least 1 character between the quotes, but it does match "1".

Example:

    #include <Array.au3>

    $sSubject = 'A string with a "quoted part", and a separate " floating in there.'

    ; Greedy
    $aResult = StringRegExp($sSubject, '".*"', 3) ; matches: "quoted part", and a separate "
    _ArrayDisplay($aResult, "Greedy")

    ; Lazy
    $aResult = StringRegExp($sSubject, '".*?"', 3) ; matches: "quoted part"
    _ArrayDisplay($aResult, "Lazy")

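The "" versus "1" cases described above can be checked without any GUI windows. This is a small sketch, assuming a made-up helper `_FirstMatch` (not from the thread) that returns the full text of the first match; the short subjects are illustrative only.

```autoit
; Hypothetical helper (not from the thread): full text of the first match, or "<no match>".
Func _FirstMatch($sSubject, $sPattern)
    Local $aMatch = StringRegExp($sSubject, $sPattern, 2) ; flag 2: element 0 holds the full match
    If @error Then Return "<no match>"
    Return $aMatch[0]
EndFunc

ConsoleWrite(_FirstMatch('""', '".*?"') & @CRLF)             ; ""          (*? accepts zero characters)
ConsoleWrite(_FirstMatch('""', '".+?"') & @CRLF)             ; <no match>  (+? needs at least one character)
ConsoleWrite(_FirstMatch('"1"', '".+?"') & @CRLF)            ; "1"
ConsoleWrite(_FirstMatch('"1" and "2"', '".*"') & @CRLF)     ; "1" and "2" (greedy runs to the last quote)
ConsoleWrite(_FirstMatch('"1" and "2"', '".*?"') & @CRLF)    ; "1"         (lazy stops at the first closing quote)
ConsoleWrite(_FirstMatch('"1" and "2"', '"[^"]*"') & @CRLF)  ; "1"         (negated class, same result as lazy here)
```
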
guinness Posted November 2, 2012

I was having a dumb moment then. Thanks Robjong.

DeltaRocked (Author) Posted November 2, 2012

Wow, I learned a lot from these posts. Thank you, everyone. Now I need to test this regex on real-world pages, which incidentally include phishing, malware, and clean pages too.

Regards

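For real-world pages it may help to keep the http-equiv=refresh check from the original regex and to accept single-quoted values as well. The following is one possible sketch, not a pattern proposed in the thread: the function name `_GetRefreshContent` and the example.com test tags are assumptions, and the pattern has only been reasoned through against the inputs shown here.

```autoit
; Hypothetical helper (not from the thread): CONTENT value of a META refresh tag, or "".
Func _GetRefreshContent($sHtml)
    ; (?=...) peeks ahead for http-equiv=refresh anywhere before the closing ">",
    ; so the attribute order no longer matters.
    ; (["'']) remembers the opening quote and \1 requires the same quote to close the value.
    Local $sPattern = '(?i)<meta\b(?=[^>]*\bhttp-equiv\s*=\s*["'']?refresh\b)' & _
            '[^>]*\bcontent\s*=\s*(["''])(.*?)\1'
    Local $aMatch = StringRegExp($sHtml, $sPattern, 1) ; flag 1: captures of the first match
    If @error Then Return ""
    Return $aMatch[1] ; [0] is the quote character, [1] is the content value
EndFunc

; The two strings from the first post, plus two made-up tags.
ConsoleWrite(_GetRefreshContent('<META CONTENT="1; url=http://roundtopstatebank.com" HTTP-EQUIV=refresh>') & @CRLF)
ConsoleWrite(_GetRefreshContent('<META HTTP-EQUIV=refresh CONTENT="1; url=http://roundtopstatebank.com">') & @CRLF)
ConsoleWrite(_GetRefreshContent("<meta http-equiv='refresh' content='0;url=http://example.com/'>") & @CRLF)
ConsoleWrite('[' & _GetRefreshContent('<meta name="description" content="hello">') & ']' & @CRLF) ; not a refresh tag
```

The value still uses a lazy dot because the closing quote is only known through the backreference. Unquoted content values and a ">" inside an attribute value are not handled by this sketch.
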
Chance Posted November 5, 2012

So, what's it you're doing?

DeltaRocked (Author) Posted November 5, 2012

malware research.