DeltaRocked

[Solved] Regex Help

DeltaRocked

Hi,

Type String 1 :

<META CONTENT="1; url=http://roundtopstatebank.com" HTTP-EQUIV=refresh>

Type String 2 :

<META HTTP-EQUIV=refresh CONTENT="1; url=http://roundtopstatebank.com">

Present Regex:

(?i)(?:<[\s*]{0,1}meta http-equiv[\s*]{0,1}=[\s*]{0,1}["']{0,1}refresh["']{0,1}[^>]*)content[\s*]?=[\s*]?\"(.*?)\"

This regex extracts "Content" from String 2 but fails on String 1. Is there any way to have one regex that extracts "Content" from both strings?

regards

Chance

pffff

#include <Array.au3>
Global $Test[2]
$Test[0] = '<META CONTENT="1; url=http://roundtopstatebank.com" HTTP-EQUIV=refresh>'
$Test[1] = '<META HTTP-EQUIV=refresh CONTENT="1; url=http://roundtopstatebank.com">'

For $I = 0 To 1
$Result = StringRegExp($test[$I], "(?i)<META.*?CONTENT=""(.*?)"".*?>", 3)
_ArrayDisplay($Result)
Next

In case it's the URL you're after:

#include <Array.au3>
Global $Test[2]
$Test[0] = '<META CONTENT="1; url=http://roundtopstatebank.com" HTTP-EQUIV=refresh>'
$Test[1] = '<META HTTP-EQUIV=refresh CONTENT="1;url=http://roundtopstatebank.com">'

For $I = 0 To 1
$Result = StringRegExp($test[$I], "(?i)<META.*?CONTENT=[""'](?:\d{0,3};\s?url=)(.*?)[""'].*?>", 3)
_ArrayDisplay($Result)
Next

blityon

$arrayDatos = StringRegExp($datos, '(?i)<META\s.*?CONTENT="(.*?)".*?>', 3)

Robjong

Try to avoid using a lazy dot ".*?"; use a negated character class such as "[^"]*" instead whenever possible. This avoids unnecessary backtracking and so improves performance.

#include <Array.au3>


Global $sString = '<META CONTENT="1; url=http://roundtopstatebank.com" HTTP-EQUIV=refresh>' & @LF & _
        '<META HTTP-EQUIV=refresh CONTENT="1;url=http://roundtopstatebank.com">' & @LF


$aResult = StringRegExp($sString, '(?i)<meta[^>]+content\s*=\s*"([^"]+)"[^>]*>', 3) ; content
$aResult = StringRegExp($sString, '(?i)<meta[^>]+content\s*=\s*"\d+;\s*url=([^"]+)"[^>]*>', 3) ; URL only

_ArrayDisplay($aResult)
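
The patterns above match any meta tag with a content attribute. If only refresh tags should match, as in the original pattern, one option is a lookahead that checks for http-equiv=refresh without caring where it sits in the tag. This is only a sketch and not something posted in the thread: the third test line is a made-up non-refresh tag, and the quotes around refresh are treated as optional, as in the original pattern.

```autoit
; The third line is a hypothetical non-refresh tag, added to show what gets skipped
Local $sString = '<META CONTENT="1; url=http://roundtopstatebank.com" HTTP-EQUIV=refresh>' & @LF & _
        '<META HTTP-EQUIV=refresh CONTENT="1;url=http://roundtopstatebank.com">' & @LF & _
        '<META NAME=description CONTENT="not a redirect">' & @LF

; The lookahead (?=...) requires http-equiv=refresh somewhere before the closing ">"
; without consuming anything, so content may come before or after it.
; Inside a single-quoted AutoIt string, '' stands for one literal single quote.
Local $aResult = StringRegExp($sString, '(?i)<meta(?=[^>]*http-equiv\s*=\s*["'']?refresh)[^>]*content\s*=\s*"([^"]+)"', 3)

; With flag 3 an array is only returned when there is at least one match
If IsArray($aResult) Then
    For $i = 0 To UBound($aResult) - 1
        ConsoleWrite($aResult[$i] & @CRLF)
    Next
EndIf
```

With these three lines, only the content values of the first two tags should be written to the console.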

Chance

Note taken, just opened my eyes to some flaws in a project of mine, thanks for the contribution!

guinness

Isn't using ? next to * pointless? So .*? is the same as .* but not .+?.


Robjong

No.

.* will match 0 or more occurrences of any character (except newline, unless single-line mode (?s) is enabled),

.+ would match 1 or more occurrences of any character,

and both consume the largest possible match; this is called greedy.

But in .*? the lazy operator (?) tells it to take the smallest possible match: it starts by matching nothing and only takes another character when the rest of the pattern can't match otherwise.

Because the pattern ".*?" starts and ends with a quote, the engine will look for 0 or more characters between 2 quotes, so it would match "" as well as "1".

If we had the pattern ".+?" and matched it against the subject "" it would fail, because it needs at least 1 character between the quotes; it would still match "1".

Example:

#include <Array.au3>

$sSubject = 'A string with a "quoted part", and a separate " floating in there.'

; Greedy
$aResult = StringRegExp($sSubject, '".*"', 3) ; matches : "quoted part", and a separate "
_ArrayDisplay($aResult, "Greedy")

; Lazy
$aResult = StringRegExp($sSubject, '".*?"', 3) ; matches: "quoted part"
_ArrayDisplay($aResult, "Lazy")
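
The difference between ".*?" and ".+?" described above can be checked the same way. A small sketch, assuming two throwaway subjects that are not from the posts above: one with nothing between the quotes and one with a single character.

```autoit
; Hypothetical one-off subjects
Local $sEmpty = '""' ; two quotes with nothing between them
Local $sOne = '"1"'  ; two quotes around a single character

; Without a flag argument StringRegExp returns 1 for a match and 0 for no match
ConsoleWrite('".*?" against "":  ' & StringRegExp($sEmpty, '".*?"') & @CRLF) ; 1
ConsoleWrite('".+?" against "":  ' & StringRegExp($sEmpty, '".+?"') & @CRLF) ; 0, needs at least one character between the quotes
ConsoleWrite('".*?" against "1": ' & StringRegExp($sOne, '".*?"') & @CRLF)   ; 1
ConsoleWrite('".+?" against "1": ' & StringRegExp($sOne, '".+?"') & @CRLF)   ; 1
```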

guinness

I was having a dumb moment then. Thanks Robjong.


DeltaRocked

Wow, I learnt a lot from these posts. Thank you, everyone.

Now I need to test this regex on real-world pages, which incidentally include phishing, malware, and clean pages too.

regards

Chance

So, what is it you're doing?

DeltaRocked

Malware research.
