[Solved] Regex Help


Hi,

Type String 1 :

<META CONTENT="1; url=http://roundtopstatebank.com" HTTP-EQUIV=refresh>

Type String 2 :

<META HTTP-EQUIV=refresh CONTENT="1; url=http://roundtopstatebank.com">

Present Regex:

(?i)(?:<[\s*]{0,1}meta http-equiv[\s*]{0,1}=[\s*]{0,1}["']{0,1}refresh["']{0,1}[^>]*)content[\s*]?=[\s*]?\"(.*?)\"

This regex extracts the "Content" value from String 2 but fails on String 1. Is there a single regex that will extract "Content" from both strings?

regards

Edited by DeltaRocked

pffff

#include <Array.au3>
Global $Test[2]
$Test[0] = '<META CONTENT="1; url=http://roundtopstatebank.com" HTTP-EQUIV=refresh>'
$Test[1] = '<META HTTP-EQUIV=refresh CONTENT="1; url=http://roundtopstatebank.com">'

For $I = 0 To 1
$Result = StringRegExp($test[$I], "(?i)<META.*?CONTENT=""(.*?)"".*?>", 3)
_ArrayDisplay($Result)
Next

In case it's the URL you're after:

#include <Array.au3>
Global $Test[2]
$Test[0] = '<META CONTENT="1; url=http://roundtopstatebank.com" HTTP-EQUIV=refresh>'
$Test[1] = '<META HTTP-EQUIV=refresh CONTENT="1;url=http://roundtopstatebank.com">'

For $I = 0 To 1
$Result = StringRegExp($test[$I], "(?i)<META.*?CONTENT=[""'](?:\d{0,3};\s?url=)(.*?)[""'].*?>", 3)
_ArrayDisplay($Result)
Next
Edited by FlutterShy

Try to avoid the lazy dot ".*?"; use a negated character class such as "[^"]*" wherever possible. The class cannot run past the closing quote, so the engine avoids unnecessary backtracking, which usually improves performance.

#include <Array.au3>


Global $sString = '<META CONTENT="1; url=http://roundtopstatebank.com" HTTP-EQUIV=refresh>' & @LF & _
        '<META HTTP-EQUIV=refresh CONTENT="1;url=http://roundtopstatebank.com">' & @LF


$aResult = StringRegExp($sString, '(?i)<meta[^>]+content\s*=\s*"([^"]+)"[^>]*>', 3) ; content
$aResult = StringRegExp($sString, '(?i)<meta[^>]+content\s*=\s*"\d+;\s*url=([^"]+)"[^>]*>', 3) ; URL only

_ArrayDisplay($aResult)
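
The patterns in this post accept any meta tag with a content attribute, whereas the regex in the question also required http-equiv=refresh. A minimal sketch of one way to keep that check without depending on the attribute order, assuming a lookahead is acceptable here. The variable names and the third sample tag are made up for illustration, and a ">" inside an attribute value would still break it.

```autoit
#include <Array.au3>

; Sample input: the two strings from the question plus a hypothetical non-refresh tag
Global $sTags = '<META CONTENT="1; url=http://roundtopstatebank.com" HTTP-EQUIV=refresh>' & @LF & _
        '<META HTTP-EQUIV=refresh CONTENT="1; url=http://roundtopstatebank.com">' & @LF & _
        '<META NAME=description CONTENT="not a refresh tag">' & @LF

; The lookahead (?=...) checks the tag for http-equiv=refresh without consuming anything,
; so the content attribute may come before or after http-equiv.
; '' inside the single-quoted AutoIt string is a literal single quote.
Global $sPattern = '(?i)<meta(?=[^>]*\bhttp-equiv\s*=\s*["'']?refresh\b)[^>]*\bcontent\s*=\s*"([^"]*)"'

Global $aRefresh = StringRegExp($sTags, $sPattern, 3)
If @error Then
    ConsoleWrite("No refresh tag found" & @CRLF)
Else
    _ArrayDisplay($aRefresh, "Refresh content") ; the description tag is skipped
EndIf
```

The lookahead is anchored right after "<meta" and uses [^>]* so it cannot look beyond the end of the current tag.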

Note taken, just opened my eyes to some flaws in a project of mine, thanks for the contribution!

Isn't using ? next to * pointless? So .*? is the same as .* but not .+?.


No.

.* matches 0 or more occurrences of any character (except newline, unless single-line mode (?s) is enabled),

.+ matches 1 or more occurrences of any character.

Both consume the largest possible match; this is called greedy.

In .*?, the ? after the quantifier makes it lazy: it consumes as few characters as possible, starting with none, and only takes more when the rest of the pattern cannot match otherwise.

Because the pattern ".*?" starts and ends with a quote, the engine looks for 0 or more characters between 2 quotes, so it matches "" as well as "1".

The pattern ".+?" fails against the subject "" because it needs at least 1 character between the quotes, but it does match "1".

Example:

#include <Array.au3>

$sSubject = 'A string with a "quoted part", and a separate " floating in there.'

; Greedy
$aResult = StringRegExp($sSubject, '".*"', 3) ; matches : "quoted part", and a separate "
_ArrayDisplay($aResult, "Greedy")

; Lazy
$aResult = StringRegExp($sSubject, '".*?"', 3) ; matches: "quoted part"
_ArrayDisplay($aResult, "Lazy")
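
The difference between ".*?" and ".+?" on an empty pair of quotes is not covered by the example above, so here is a small sketch of it. The two subjects are made up for illustration; StringRegExp is called with its default flag, which only reports whether the pattern matched.

```autoit
; Hypothetical subjects: nothing between the quotes, and a single character between them
Global $aSubjects[2] = ['""', '"1"']
Global $aPatterns[2] = ['".*?"', '".+?"']

For $sSubject In $aSubjects
    For $sPattern In $aPatterns
        ; With the default flag (0) StringRegExp returns 1 on a match and 0 otherwise
        If StringRegExp($sSubject, $sPattern) Then
            ConsoleWrite($sPattern & " against " & $sSubject & ": match" & @CRLF)
        Else
            ConsoleWrite($sPattern & " against " & $sSubject & ": no match" & @CRLF)
        EndIf
    Next
Next
```

Only ".+?" against "" should report no match. Keep in mind that the dot also matches a quote character, so in a longer subject ".+?" can start at an empty pair and run on to the next quote; a negated class like "[^"]+" avoids that.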

I was having a dumb moment then. Thanks Robjong.
