DeltaRocked

[Solved] Regex Help

DeltaRocked

Hi,

Type String 1 :

<META CONTENT="1; url=http://roundtopstatebank.com" HTTP-EQUIV=refresh>

Type String 2 :

<META HTTP-EQUIV=refresh CONTENT="1; url=http://roundtopstatebank.com">

Present Regex:

(?i)(?:<[\s*]{0,1}meta http-equiv[\s*]{0,1}=[\s*]{0,1}["']{0,1}refresh["']{0,1}[^>]*)content[\s*]?=[\s*]?\"(.*?)\"

This regex extracts the "Content" value from String 2 but fails on String 1. Is there a way to have one regex that extracts "Content" from both strings?

regards

Edited by DeltaRocked

Chance

pffff

#include <Array.au3>
Global $Test[2]
$Test[0] = '<META CONTENT="1; url=http://roundtopstatebank.com" HTTP-EQUIV=refresh>'
$Test[1] = '<META HTTP-EQUIV=refresh CONTENT="1; url=http://roundtopstatebank.com">'

For $I = 0 To 1
    $Result = StringRegExp($Test[$I], "(?i)<META.*?CONTENT=""(.*?)"".*?>", 3)
    _ArrayDisplay($Result)
Next

In case it's the URL you're after:

#include <Array.au3>
Global $Test[2]
$Test[0] = '<META CONTENT="1; url=http://roundtopstatebank.com" HTTP-EQUIV=refresh>'
$Test[1] = '<META HTTP-EQUIV=refresh CONTENT="1;url=http://roundtopstatebank.com">'

For $I = 0 To 1
    ; \d{0,3} matches the delay and \s? the optional space before "url="
    $Result = StringRegExp($Test[$I], "(?i)<META.*?CONTENT=[""'](?:\d{0,3};\s?url=)(.*?)[""'].*?>", 3)
    _ArrayDisplay($Result)
Next
Edited by FlutterShy

blityon

$arrayDatos = StringRegExp($datos, '<META .*?CONTENT="(.*?)".*?>', 3)

Edited by blityon

Robjong

Try to avoid the lazy dot ".*?"; use a negated character class such as "[^"]*" whenever possible. This avoids unnecessary backtracking and so usually improves performance.

#include <Array.au3>

Global $sString = '<META CONTENT="1; url=http://roundtopstatebank.com" HTTP-EQUIV=refresh>' & @LF & _
        '<META HTTP-EQUIV=refresh CONTENT="1;url=http://roundtopstatebank.com">' & @LF

; Full CONTENT value
$aResult = StringRegExp($sString, '(?i)<meta[^>]+content\s*=\s*"([^"]+)"[^>]*>', 3)
_ArrayDisplay($aResult, "Content")

; URL only
$aResult = StringRegExp($sString, '(?i)<meta[^>]+content\s*=\s*"\d+;\s*url=([^"]+)"[^>]*>', 3)
_ArrayDisplay($aResult, "URL only")
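
With flag 3, StringRegExp sets @error when nothing matches, so it may be worth checking that before using the result. Below is a minimal sketch of the negated-class approach wrapped in a function; the name _MetaContent and the variable $sHtml are made up for illustration and are not from the posts above. It assumes the CONTENT value is always in double quotes, so single-quoted or unquoted values would need a different pattern.

```autoit
; Hypothetical helper (not from the thread): returns an array holding the
; CONTENT value of every META tag in $sHtml, or sets @error if none match.
Func _MetaContent($sHtml)
    ; [^>]*? stays inside the tag; \b keeps "content" from matching mid-word.
    Local $aMatch = StringRegExp($sHtml, '(?i)<meta\s[^>]*?\bcontent\s*=\s*"([^"]*)"', 3)
    If @error Then Return SetError(1, 0, 0)
    Return $aMatch
EndFunc

Local $sHtml = '<META CONTENT="1; url=http://roundtopstatebank.com" HTTP-EQUIV=refresh>' & @LF & _
        '<META HTTP-EQUIV=refresh CONTENT="1;url=http://roundtopstatebank.com">'

Local $aContent = _MetaContent($sHtml)
If @error Then
    ConsoleWrite("No META CONTENT attribute found" & @CRLF)
Else
    For $i = 0 To UBound($aContent) - 1
        ConsoleWrite($aContent[$i] & @CRLF)
    Next
EndIf
```

Writing to the console instead of calling _ArrayDisplay keeps the sketch usable in a loop over many pages.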

Chance

Note taken. That just opened my eyes to some flaws in a project of mine; thanks for the contribution!

guinness

Isn't using ? next to * pointless? So .*? is the same as .* but not .+?.


UDF List:

 
_AdapterConnections()_AlwaysRun()_AppMon()_AppMonEx()_ArrayFilter/_ArrayReduce_BinaryBin()_CheckMsgBox()_CmdLineRaw()_ContextMenu()_ConvertLHWebColor()/_ConvertSHWebColor()_DesktopDimensions()_DisplayPassword()_DotNet_Load()/_DotNet_Unload()_Fibonacci()_FileCompare()_FileCompareContents()_FileNameByHandle()_FilePrefix/SRE()_FindInFile()_GetBackgroundColor()/_SetBackgroundColor()_GetConrolID()_GetCtrlClass()_GetDirectoryFormat()_GetDriveMediaType()_GetFilename()/_GetFilenameExt()_GetHardwareID()_GetIP()_GetIP_Country()_GetOSLanguage()_GetSavedSource()_GetStringSize()_GetSystemPaths()_GetURLImage()_GIFImage()_GoogleWeather()_GUICtrlCreateGroup()_GUICtrlListBox_CreateArray()_GUICtrlListView_CreateArray()_GUICtrlListView_SaveCSV()_GUICtrlListView_SaveHTML()_GUICtrlListView_SaveTxt()_GUICtrlListView_SaveXML()_GUICtrlMenu_Recent()_GUICtrlMenu_SetItemImage()_GUICtrlTreeView_CreateArray()_GUIDisable()_GUIImageList_SetIconFromHandle()_GUIRegisterMsg()_GUISetIcon()_Icon_Clear()/_Icon_Set()_IdleTime()_InetGet()_InetGetGUI()_InetGetProgress()_IPDetails()_IsFileOlder()_IsGUID()_IsHex()_IsPalindrome()_IsRegKey()_IsStringRegExp()_IsSystemDrive()_IsUPX()_IsValidType()_IsWebColor()_Language()_Log()_MicrosoftInternetConnectivity()_MSDNDataType()_PathFull/GetRelative/Split()_PathSplitEx()_PrintFromArray()_ProgressSetMarquee()_ReDim()_RockPaperScissors()/_RockPaperScissorsLizardSpock()_ScrollingCredits_SelfDelete()_SelfRename()_SelfUpdate()_SendTo()_ShellAll()_ShellFile()_ShellFolder()_SingletonHWID()_SingletonPID()_Startup()_StringCompact()_StringIsValid()_StringRegExpMetaCharacters()_StringReplaceWholeWord()_StringStripChars()_Temperature()_TrialPeriod()_UKToUSDate()/_USToUKDate()_WinAPI_Create_CTL_CODE()_WinAPI_CreateGUID()_WMIDateStringToDate()/_DateToWMIDateString()Au3 script parsingAutoIt SearchAutoIt3 PortableAutoIt3WrapperToPragmaAutoItWinGetTitle()/AutoItWinSetTitle()CodingDirToHTML5FileInstallrFileReadLastChars()GeoIP databaseGUI - Only Close ButtonGUI 
ExamplesGUICtrlDeleteImage()GUICtrlGetBkColor()GUICtrlGetStyle()GUIEventsGUIGetBkColor()Int_Parse() & Int_TryParse()IsISBN()LockFile()Mapping CtrlIDsOOP in AutoItParseHeadersToSciTE()PasswordValidPasteBinPosts Per DayPreExpandProtect GlobalsQueue()Resource UpdateResourcesExSciTE JumpSettings INISHELLHOOKShunting-YardSignature CreatorStack()Stopwatch()StringAddLF()/StringStripLF()StringEOLToCRLF()VSCROLLWM_COPYDATAMore Examples...

Updated: 22/04/2018

Robjong

No.

.* matches 0 or more occurrences of any character (except newline, unless single-line mode (?s) is enabled),

.+ matches 1 or more occurrences of any character.

Both consume the largest possible match; this is called greedy.

But in .*? the lazy modifier (?) tells the quantifier to take the smallest possible match that still lets the rest of the pattern succeed, so on its own it can match nothing at all.

Because the pattern ".*?" starts and ends with a quote, the engine looks for 0 or more characters between 2 quotes, so it matches "" as well as "1".

If we matched the pattern ".+?" against the subject "" it would fail, because it needs at least 1 character between the quotes; it would still match "1".

Example:

#include <Array.au3>

$sSubject = 'A string with a "quoted part", and a separate " floating in there.'

; Greedy
$aResult = StringRegExp($sSubject, '".*"', 3) ; matches : "quoted part", and a separate "
_ArrayDisplay($aResult, "Greedy")

; Lazy
$aResult = StringRegExp($sSubject, '".*?"', 3) ; matches: "quoted part"
_ArrayDisplay($aResult, "Lazy")
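
The difference between ".*?" and ".+?" described above can be checked the same way. This is only a small sketch using the two subjects from the explanation ("" and "1"); it relies on StringRegExp's default flag 0, which returns 1 for a match and 0 for no match.

```autoit
; Flag 0 (the default) returns 1 if the pattern matches and 0 if it does not.
ConsoleWrite(StringRegExp('""', '".*?"') & @CRLF)  ; 1: * accepts nothing between the quotes
ConsoleWrite(StringRegExp('""', '".+?"') & @CRLF)  ; 0: + needs at least one character between them
ConsoleWrite(StringRegExp('"1"', '".+?"') & @CRLF) ; 1: the "1" satisfies the + quantifier
```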

guinness

I was having a dumb moment then. Thanks Robjong.



DeltaRocked

Wow, I learnt a lot from these posts. Thank you, everyone.

Now I need to test this regex on real-world pages, which include phishing, malware, and clean pages too.

regards
