Jump to content
Sign in to follow this  
GPinzone

Strange RegEx fix that seems unnecessary

Recommended Posts

GPinzone

I wrote a program to convert single and double quotes to curly single and double quotes. The program uses regular expression search and replaces to swap the straight quotes to their curly counterparts and to ignore any HTML tags, too.

#include <ButtonConstants.au3>
#include <EditConstants.au3>
#include <GUIConstantsEx.au3>
#include <WindowsConstants.au3>
#include <FontConstants.au3>


Func Curlify($sInput)
    Local $sOutput = StringRegExpReplace($sInput, "'(?!([^<]+)?>)", "’")
    $sOutput = StringRegExpReplace($sOutput, "((^|\s+)(<[^>]*>)*)’", "${1}‘")
    $sOutput = StringRegExpReplace($sOutput, '"(?!([^<]+)?>)', "”")
    $sOutput = StringRegExpReplace($sOutput, "((^|\s+)(<[^>]*>)*)”", "${1}“")
    ; Fix nested quotes
    While StringRegExp($sOutput, "(‘|“)(<[^>]*>)*(’|”)")
        $sOutput = StringRegExpReplace($sOutput, "((‘|“)(<[^>]*>)*)’", "${1}‘")
        $sOutput = StringRegExpReplace($sOutput, "((‘|“)(<[^>]*>)*)”", "${1}“")
    WEnd

    Return $sOutput
EndFunc   ;==>Curlify

$FormCurly = GUICreate("Form_Curly", 708, 439, 192, 124)
$Original = GUICtrlCreateEdit("", 0, 0, 305, 433)
GUICtrlSetFont(-1, 9, $FW_NORMAL, 0, "Courier New") ; Set the font of the previous control.
GUICtrlSetLimit(-1, 1500000)
$Modified = GUICtrlCreateEdit("", 408, 0, 297, 433)
GUICtrlSetFont(-1, 9, $FW_NORMAL, 0, "Courier New") ; Set the font of the previous control.
GUICtrlSetLimit(-1, 1500000)
$Curly = GUICtrlCreateButton("Curly", 320, 64, 75, 25)
$Reset = GUICtrlCreateButton("Reset", 320, 164, 75, 25)
GUISetState(@SW_SHOW)

While 1
    $nMsg = GUIGetMsg()
    Switch $nMsg
        Case $Curly
            GUICtrlSetData($Modified, Curlify(GUICtrlRead($Original)))

        Case $Reset
            GUICtrlSetData($Original, "")
            GUICtrlSetData($Modified, "")

        Case $GUI_EVENT_CLOSE
            Exit

    EndSwitch
WEnd

 

So why am I posting a question about a program that works?  This regex:

    $sOutput = StringRegExpReplace($sOutput, "((^|\s+)(<[^>]*>)*)’", "${1}‘")

should be able to be rewritten without the "+":

    $sOutput = StringRegExpReplace($sOutput, "((^|\s)(<[^>]*>)*)’", "${1}‘")

Same goes for the one to do double quotes, obviously.

However, if I remove the "+" from the "\s" in the regex, the program will fail in some cases. Something like <b>’<i>Don’t</i>’</b> at the beginning of a line won't get fixed. I suspect it's got something to do with the fact the the previous character (aside from the HTML tags) is a newline. The "\s" should work just fine with newlines. I'm stumped because the regex without the "+" modifier works fine on http://www.regextester.com/ when I test it.

 

 

Edited by GPinzone

Gerard J. Pinzonegpinzone AT yahoo.com

Share this post


Link to post
Share on other sites
guinness

62 posts in the forum should mean you have an understanding of how to post code.


UDF List:

 
_AdapterConnections()_AlwaysRun()_AppMon()_AppMonEx()_ArrayFilter/_ArrayReduce_BinaryBin()_CheckMsgBox()_CmdLineRaw()_ContextMenu()_ConvertLHWebColor()/_ConvertSHWebColor()_DesktopDimensions()_DisplayPassword()_DotNet_Load()/_DotNet_Unload()_Fibonacci()_FileCompare()_FileCompareContents()_FileNameByHandle()_FilePrefix/SRE()_FindInFile()_GetBackgroundColor()/_SetBackgroundColor()_GetConrolID()_GetCtrlClass()_GetDirectoryFormat()_GetDriveMediaType()_GetFilename()/_GetFilenameExt()_GetHardwareID()_GetIP()_GetIP_Country()_GetOSLanguage()_GetSavedSource()_GetStringSize()_GetSystemPaths()_GetURLImage()_GIFImage()_GoogleWeather()_GUICtrlCreateGroup()_GUICtrlListBox_CreateArray()_GUICtrlListView_CreateArray()_GUICtrlListView_SaveCSV()_GUICtrlListView_SaveHTML()_GUICtrlListView_SaveTxt()_GUICtrlListView_SaveXML()_GUICtrlMenu_Recent()_GUICtrlMenu_SetItemImage()_GUICtrlTreeView_CreateArray()_GUIDisable()_GUIImageList_SetIconFromHandle()_GUIRegisterMsg()_GUISetIcon()_Icon_Clear()/_Icon_Set()_IdleTime()_InetGet()_InetGetGUI()_InetGetProgress()_IPDetails()_IsFileOlder()_IsGUID()_IsHex()_IsPalindrome()_IsRegKey()_IsStringRegExp()_IsSystemDrive()_IsUPX()_IsValidType()_IsWebColor()_Language()_Log()_MicrosoftInternetConnectivity()_MSDNDataType()_PathFull/GetRelative/Split()_PathSplitEx()_PrintFromArray()_ProgressSetMarquee()_ReDim()_RockPaperScissors()/_RockPaperScissorsLizardSpock()_ScrollingCredits_SelfDelete()_SelfRename()_SelfUpdate()_SendTo()_ShellAll()_ShellFile()_ShellFolder()_SingletonHWID()_SingletonPID()_Startup()_StringCompact()_StringIsValid()_StringRegExpMetaCharacters()_StringReplaceWholeWord()_StringStripChars()_Temperature()_TrialPeriod()_UKToUSDate()/_USToUKDate()_WinAPI_Create_CTL_CODE()_WinAPI_CreateGUID()_WMIDateStringToDate()/_DateToWMIDateString()Au3 script parsingAutoIt SearchAutoIt3 PortableAutoIt3WrapperToPragmaAutoItWinGetTitle()/AutoItWinSetTitle()CodingDirToHTML5FileInstallrFileReadLastChars()GeoIP databaseGUI - Only Close ButtonGUI ExamplesGUICtrlDeleteImage()GUICtrlGetBkColor()GUICtrlGetStyle()GUIEventsGUIGetBkColor()Int_Parse() & Int_TryParse()IsISBN()LockFile()Mapping CtrlIDsOOP in AutoItParseHeadersToSciTE()PasswordValidPasteBinPosts Per DayPreExpandProtect GlobalsQueue()Resource UpdateResourcesExSciTE JumpSettings INISHELLHOOKShunting-YardSignature CreatorStack()Stopwatch()StringAddLF()/StringStripLF()StringEOLToCRLF()VSCROLLWM_COPYDATAMore Examples...

Updated: 22/04/2018

Share this post


Link to post
Share on other sites
GPinzone

OCD is a terrible mental issue.


Gerard J. Pinzonegpinzone AT yahoo.com

Share this post


Link to post
Share on other sites
iamtheky

it may be my ocd, but i would expect examples of where this succeeds and fails.  And if recent history is any indicator,  I would then expect the thread to grow and carry on with edge cases not mentioned in those examples while a battle for regex supremacy runs wild.

 

*and put your stuff in code tags, its for legibility purposes and consideration of the people you want to help you.  its not ocd its just your ugly ass word wall.

Edited by boththose

,-. .--. ________ .-. .-. ,---. ,-. .-. .-. .-.
|(| / /\ \ |\ /| |__ __||| | | || .-' | |/ / \ \_/ )/
(_) / /__\ \ |(\ / | )| | | `-' | | `-. | | / __ \ (_)
| | | __ | (_)\/ | (_) | | .-. | | .-' | | \ |__| ) (
| | | | |)| | \ / | | | | | |)| | `--. | |) \ | |
`-' |_| (_) | |\/| | `-' /( (_)/( __.' |((_)-' /(_|
'-' '-' (__) (__) (_) (__)

Share this post


Link to post
Share on other sites
mikell

And if recent history is any indicator,  I would then expect the thread to grow and carry on with edge cases not mentioned in those examples while a battle for regex supremacy runs wild.

Well, who knows...  :D

Share this post


Link to post
Share on other sites
guinness

OCD is a terrible mental issue.

With comments like that, it's pretty clear you won't be around the forum for very long.

Edited by guinness

UDF List:

 
_AdapterConnections()_AlwaysRun()_AppMon()_AppMonEx()_ArrayFilter/_ArrayReduce_BinaryBin()_CheckMsgBox()_CmdLineRaw()_ContextMenu()_ConvertLHWebColor()/_ConvertSHWebColor()_DesktopDimensions()_DisplayPassword()_DotNet_Load()/_DotNet_Unload()_Fibonacci()_FileCompare()_FileCompareContents()_FileNameByHandle()_FilePrefix/SRE()_FindInFile()_GetBackgroundColor()/_SetBackgroundColor()_GetConrolID()_GetCtrlClass()_GetDirectoryFormat()_GetDriveMediaType()_GetFilename()/_GetFilenameExt()_GetHardwareID()_GetIP()_GetIP_Country()_GetOSLanguage()_GetSavedSource()_GetStringSize()_GetSystemPaths()_GetURLImage()_GIFImage()_GoogleWeather()_GUICtrlCreateGroup()_GUICtrlListBox_CreateArray()_GUICtrlListView_CreateArray()_GUICtrlListView_SaveCSV()_GUICtrlListView_SaveHTML()_GUICtrlListView_SaveTxt()_GUICtrlListView_SaveXML()_GUICtrlMenu_Recent()_GUICtrlMenu_SetItemImage()_GUICtrlTreeView_CreateArray()_GUIDisable()_GUIImageList_SetIconFromHandle()_GUIRegisterMsg()_GUISetIcon()_Icon_Clear()/_Icon_Set()_IdleTime()_InetGet()_InetGetGUI()_InetGetProgress()_IPDetails()_IsFileOlder()_IsGUID()_IsHex()_IsPalindrome()_IsRegKey()_IsStringRegExp()_IsSystemDrive()_IsUPX()_IsValidType()_IsWebColor()_Language()_Log()_MicrosoftInternetConnectivity()_MSDNDataType()_PathFull/GetRelative/Split()_PathSplitEx()_PrintFromArray()_ProgressSetMarquee()_ReDim()_RockPaperScissors()/_RockPaperScissorsLizardSpock()_ScrollingCredits_SelfDelete()_SelfRename()_SelfUpdate()_SendTo()_ShellAll()_ShellFile()_ShellFolder()_SingletonHWID()_SingletonPID()_Startup()_StringCompact()_StringIsValid()_StringRegExpMetaCharacters()_StringReplaceWholeWord()_StringStripChars()_Temperature()_TrialPeriod()_UKToUSDate()/_USToUKDate()_WinAPI_Create_CTL_CODE()_WinAPI_CreateGUID()_WMIDateStringToDate()/_DateToWMIDateString()Au3 script parsingAutoIt SearchAutoIt3 PortableAutoIt3WrapperToPragmaAutoItWinGetTitle()/AutoItWinSetTitle()CodingDirToHTML5FileInstallrFileReadLastChars()GeoIP databaseGUI - Only Close ButtonGUI ExamplesGUICtrlDeleteImage()GUICtrlGetBkColor()GUICtrlGetStyle()GUIEventsGUIGetBkColor()Int_Parse() & Int_TryParse()IsISBN()LockFile()Mapping CtrlIDsOOP in AutoItParseHeadersToSciTE()PasswordValidPasteBinPosts Per DayPreExpandProtect GlobalsQueue()Resource UpdateResourcesExSciTE JumpSettings INISHELLHOOKShunting-YardSignature CreatorStack()Stopwatch()StringAddLF()/StringStripLF()StringEOLToCRLF()VSCROLLWM_COPYDATAMore Examples...

Updated: 22/04/2018

Share this post


Link to post
Share on other sites
GPinzone

it may be my ocd, but i would expect examples of where this succeeds and fails.

I did:

Something like <b>’<i>Don’t</i>’</b> at the beginning of a line won't get fixed.


Gerard J. Pinzonegpinzone AT yahoo.com

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
Sign in to follow this  

  • Similar Content

    • nend
      By nend
      This is a program that I made to help my self learn better regular expressions.
      There are a lot of other programs/website with the similar functions.
      But the main advantage of this program is that you don't have to click a button after every changes.
      The program detected changes and react on it.
      Function:
      Match Match of arrays Match and replace Load source data from website Load source data from a website with GET/POST Load text data from file Clear fields Export and Import settings (you can finish the expression a other time, just export/import it) Cheat sheet Generate AutoIt code The source code is not difficult and I think most user will understand it.
      In the zip file there are 2 export files (POST and a reg back example), you can drag and drop these files on the gui to import them.
      Download Regex Toolkit Regex toolkit.zip (Sourcode, exmaple and exe file)
      EDIT: Updated to version V1.2.0
      Changes are:
      Expand and collapse of the cheat sheet (Thanks to Melba23 for the Guiextender UDF) Usefull regular expressions websites links included in the program Text data update time EDIT: Updated to version V1.3.0
      Changes are:
       Automatic generate AutoIt code  Icons on the tab  Few minor bug fixes EDIT: Updated to version V1.4.0
      Changes are:
      Link to AutoIt regex helpfile If the regular expression has a error than the text becomes red Option Offset with Match and array of Matches Option Count with Match and replace Some small minor bug fixed EDIT: Updated to version V1.4.1
      Changes are:
      Small bug in "create AutoIt" code fixed
    • gruntydatsun
      By gruntydatsun
      I have an XML file and every time there are three lines in a row with only <null/> in them, i want to insert a fourth line with <null/>.   Each line starts with 3 white spaces, followed by <null/> and ends with a white space followed by CR LF.   The presence of the three lines as described is unique to the points where I want to insert a line in this document.
       I'm trying to figure out how to apply the repeating part of a regex  {1,4} but apply it to this whole segment. 
      So far I have the below which picks up an individual line ok:
      ^\s{3}<null/>\s\r\n I tried wrapping it all in braces () then adding {3} but I'm obviously getting something wrong. 
      Attached is a section from the xml file with a block of nulls that should be matched if anyone would like to have a look.
      Help_From_Forum.xml
    • milkmoron
      By milkmoron
      I am trying to search in a web browser dates XX/XX/XXXX that are also links. I want to click them after and remove them from the array. This is all I have so far. Nothing shows up. What am I doing wrong?
      ControlFocus ("Customer Center", "", "")
      Local $aArray = StringRegExp('(..)/(..)/(....)', '(..)/(..)/(....)', $STR_REGEXPARRAYFULLMATCH)
      For $i = 0 To UBound($aArray) - 1
          MsgBox($MB_SYSTEMMODAL, "RegExp Test with Option 2 - " & $i, $aArray[$i])
      Next
       
    • WoodGrain
      By WoodGrain
      Hi All,
      I'd like to replace 'COMMA' with ',' for example:
      $myString = "COMMA" StringRegExpReplace($myString, 'COMMA', ',') Now I've tried escaping the ',' in various ways unsuccessfully, such as:
      '[,]'
      "[,]"
      '\,'
      [,] seems to work in the pattern, I just can't figure out how to use it in the replace, and it seems everyone online is only interested in removing/replacing commas lol.
      I also tried creating and using a variable as the replacement but also didn't work:
      $myComma = "," $myString = "COMMA" StringRegExpReplace($myString, 'COMMA', $myComma) I'm sure it's super simple if someone could point me in the right direction - thanks.
    • therks
      By therks
      I'm looking for a regex genius, cus I'm stumped when it comes to assertions.
      So what I have now, is this regular expression: ([^|=]+)=([^|]+)
      It takes a string (user input) of keys=values separated by pipes (ie: "param=value|param=value") and splits them into an array.
      Example:
      $vParamData = 'example=value|fruit=apple|phrase=Hello world' $aRegEx = StringRegExp($vParamData, '([^|=]+)=([^|]+)', 3) ; Result ; [0] => example ; [1] => value ; [2] => fruit ; [3] => apple ; [4] => phrase ; [5] => Hello world So that's working fine, but I'm wondering if there's also a way I could have this capture escaped pipes instead of splitting by them.
      ie:
      $vParamData = 'pipe test=this \| is a pipe|example=value' $aRegEx = StringRegExp($vParamData, '([^|=]+)=([^|]+)', 3) ; I'm getting this: ; [0] => pipe test ; [1] => this \ ; [2] => example ; [3] => value ; But I'd like a result like this: ; [0] => pipe test ; [1] => this \| is a pipe ; [2] => example ; [3] => value Is there some pattern that would accomplish this, or am I better off parsing it some other way?
×