[Solved] Regular expression Unicode to ASCII

5 posts in this topic

#1 ·  Posted (edited)

 

mikell

;Global $sCharAllowed = ";!#$%&'()+-,.0123456789=@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{ }~"

Local $uStringIN = "頹-衙-浳-浤-搰-橱-煪๏๐๑๒๖D๚๛ẀẁẂẴẵẶặẸẹẺẻỈỉỊAịỌÆ¢™®¯ÌÍÎÏÐÑ×ØÙòóôŘřŢţŷOŸǾǿ₪₫€℅l№™Ωe⅛∑-•v8∫˜≠==□ּׂאַאָאּבּVגּדּהּוּזטּיּךּכשּתּוֹבֿכֿפֿAﭏﭽﮊﮋﮏﮐﮑﮒﮓﮔﺅﺆﺇTﺈﺉﺊﺋﺛﺜﺝﺮﺯﺰﺱﻐRﻑﻒﻓﻔﻕﻟﻠﻡﻢﻣﻤOﻥﻦﻧﻨﻰﻱﻲﻳﻴﻵﻶNﻷﻸﻹﻺﻻﻼْ%-㍱G-煱-둻-睤-㌹-"

Local $lStringOUT = StringRegExpReplace($uStringIN, '[[:^print:]]', "_")

ConsoleWrite($lStringOUT&@CRLF)

jguinch

Global $sCharAllowed = "\w;!#$%&'\(\)+,-.=@\[\]^`{ }~"

Local $uStringIN = "頹-衙-浳-浤-搰-橱-煪๏๐๑๒๖D๚๛ẀẁẂẴẵẶặẸẹẺẻỈỉỊAịỌÆ¢™®¯ÌÍÎÏÐÑ×ØÙòóôŘřŢţŷOŸǾǿ₪₫€℅l№™Ωe⅛∑-•v8∫˜≠==□ּׂאַאָאּבּVגּדּהּוּזטּיּךּכשּתּוֹבֿכֿפֿAﭏﭽﮊﮋﮏﮐﮑﮒﮓﮔﺅﺆﺇTﺈﺉﺊﺋﺛﺜﺝﺮﺯﺰﺱﻐRﻑﻒﻓﻔﻕﻟﻠﻡﻢﻣﻤOﻥﻦﻧﻨﻰﻱﻲﻳﻴﻵﻶNﻷﻸﻹﻺﻻﻼْ%-㍱G-煱-둻-睤-㌹-"
$replace = StringRegExpReplace($uStringIN, "[^" & $sCharAllowed& "]", "_")
MsgBox(0, "", @CRLF & $replace & @CRLF)

 

Edited by Trong

Regards,
 

Not tested but this should work

$lStringOUT = StringRegExpReplace($uStringIN, '[[:^print:]]', "_")
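For comparison, here is a sketch of a more explicit spelling (not taken from the posts in this topic): a negated range that treats everything outside U+0020 to U+007E as "not printable ASCII". The sample string $sSample is made up, and ChrW() is used so the result does not depend on how the script file is encoded.

```autoit
; Hypothetical sample: "caf", U+00E9, a space, U+20AC
Local $sSample = "caf" & ChrW(0x00E9) & " " & ChrW(0x20AC)

; Replace every character outside the printable ASCII range with "_"
Local $sClean = StringRegExpReplace($sSample, '[^\x20-\x7E]', "_")

ConsoleWrite($sClean & @CRLF) ; expected: caf_ _
```

The explicit range makes it easy to see which characters survive, at the cost of being less readable than the POSIX class.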

 


#3 ·  Posted (edited)

Or this?

Global $sCharAllowed = "\w;!#$%&'\(\)+,-.=@\[\]^`{}~"

Local $uStringIN = "頹-衙-浳-浤-搰-橱-煪๏๐๑๒๖๚๛ẀẁẂẴẵẶặẸẹẺẻỈỉỊịỌÆ¢™®¯ÌÍÎÏÐÑ×ØÙòóôŘřŢţŷŸǾǿ₪₫€℅l№™Ωe⅛∑-/•v8∫˜≠==□ּׂאַאָאּבּגּדּהּוּזטּיּךּכשּתּוֹבֿכֿפֿﭏﭽﮊﮋﮏﮐﮑﮒﮓﮔﺅﺆﺇﺈﺉﺊﺋﺛﺜﺝﺮﺯﺰﺱﻐﻑﻒﻓﻔﻕﻟﻠﻡﻢﻣﻤﻥﻦﻧﻨﻰﻱﻲﻳﻴﻵﻶﻷﻸﻹﻺﻻﻼْ%-㍱-煱-둻-睤-㌹-"

$replace = StringRegExpReplace($uStringIN, "[^" & $sCharAllowed& "]", "_")

MsgBox(0, "", @CRLF & $replace & @CRLF)

 

Edited by jguinch

Or this?

Global $sCharAllowed = "\w;!#$%&'\(\)+,-.=@\[\]^`{}~"

Local $uStringIN = "頹-衙-浳-浤-搰-橱-煪๏๐๑๒๖๚๛ẀẁẂẴẵẶặẸẹẺẻỈỉỊịỌÆ¢™®¯ÌÍÎÏÐÑ×ØÙòóôŘřŢţŷŸǾǿ₪₫€℅l№™Ωe⅛∑-/•v8∫˜≠==□ּׂאַאָאּבּגּדּהּוּזטּיּךּכשּתּוֹבֿכֿפֿﭏﭽﮊﮋﮏﮐﮑﮒﮓﮔﺅﺆﺇﺈﺉﺊﺋﺛﺜﺝﺮﺯﺰﺱﻐﻑﻒﻓﻔﻕﻟﻠﻡﻢﻣﻤﻥﻦﻧﻨﻰﻱﻲﻳﻴﻵﻶﻷﻸﻹﻺﻻﻼْ%-㍱-煱-둻-睤-㌹-"

$replace = StringRegExpReplace($uStringIN, "[" & $sCharAllowed& "]", "_")

MsgBox(0, "", @CRLF & $replace & @CRLF)

 

Output=頹-衙-浳-浤-搰-橱-煪๏๐๑๒๖๚๛ẀẁẂẴẵẶặẸẹẺẻỈỉỊịỌÆ¢™®¯ÌÍÎÏÐÑ×ØÙòóôŘřŢţŷŸǾǿ₪₫€℅l№™Ωe⅛∑-/•v8∫˜≠==□ּׂאַאָאּבּגּדּהּוּזטּיּךּכשּתּוֹבֿכֿפֿﭏﭽﮊﮋﮏﮐﮑﮒﮓﮔﺅﺆﺇﺈﺉﺊﺋﺛﺜﺝﺮﺯﺰﺱﻐﻑﻒﻓﻔﻕﻟﻠﻡﻢﻣﻤﻥﻦﻧﻨﻰﻱﻲﻳﻴﻵﻶﻷﻸﻹﻺﻻﻼْ%-㍱-煱-둻-睤-㌹-


@jguinch, something is wrong!
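One thing that may be worth comparing is the opening bracket: the pattern in post #3 starts with "[^" (a negated class), while the line quoted in this post starts with "[". This is only a sketch of how the two differ, on a made-up short input ($sTest is hypothetical; ChrW(0x1E80) is the Ẁ from the test string). It assumes the default mode, where \w covers ASCII word characters only.

```autoit
Global $sCharAllowed = "\w;!#$%&'\(\)+,-.=@\[\]^`{}~"

; Hypothetical short input: "A", U+1E80, "-", "b"
Local $sTest = "A" & ChrW(0x1E80) & "-b"

; Negated class: replaces everything that is NOT in the allowed set
Local $sNegated = StringRegExpReplace($sTest, "[^" & $sCharAllowed & "]", "_")

; Plain class: replaces everything that IS in the allowed set
Local $sPlain = StringRegExpReplace($sTest, "[" & $sCharAllowed & "]", "_")

; Expected under the assumption above: A_-b
ConsoleWrite($sNegated & @CRLF)
; Expected under the assumption above: only the U+1E80 character is left untouched
ConsoleWrite($sPlain & @CRLF)
```

Note that "-" is only allowed because ",-." inside the class forms a range from "," to "." that happens to include it.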

 


Regards,
 


  • Similar Content

    • nend
      By nend
      This is a program that I made to help myself learn regular expressions better.
      There are a lot of other programs/websites with similar functions.
      But the main advantage of this program is that you don't have to click a button after every change.
      The program detects changes and reacts to them.

      Function:
      - Match
      - Match of arrays
      - Match and replace
      - Load source data from website
      - Load source data from a website with POST
      - Load text data from file
      - Clear fields
      - Export and Import settings (you can finish the expression another time, just export/import it)
      - Cheat sheet
      The source code is not difficult and I think most users will understand it.
      This program does need the WinHTTP UDF https://www.autoitscript.com/forum/topic/84133-winhttp-functions/ (it's also included in the zip file)
      In the zip file there is also an export file (POST example); this is an example of website source code with POST.
      You can download it here Regex toolkit.zip
       
      EDIT: Updated to version V1.2.0
      Changes are:
      - expand and collapse of the cheat sheet (Thanks to Melba23 for the Guiextender UDF)
      - useful regular expression website links included in the program
      - text data update time
    • nikink
      By nikink
      Hi all, it's been a while since I last used regular expressions and I find myself out of time to experiment with this particular issue, so I throw myself upon your mercy and expertise.
      I am looking to create a function that will say whether a supplied string is a valid UUID.
      Local $sTestF = '4C4C4544-004A-4C10-8054-B7C04F46343'
      Local $sTestT = '4C4C4544-004A-4C10-8054-B7C04F463432'
      ConsoleWrite('False = ' & _IsValidUUID($sTestF) & @CRLF)
      ConsoleWrite('True = ' & _IsValidUUID($sTestT) & @CRLF)

      Func _IsValidUUID($sUUID)
          ;[\p{XDigit}]{8}-[\p{XDigit}]{4}-[34][\p{XDigit}]{3}-[89ab][\p{XDigit}]{3}-[\p{XDigit}]{12}
          ; Test UUID = '4C4C4544-004A-4C10-8054-B7C04F463432'
          Local $sRegExp = '([:xdigit:]){8}\-([:xdigit:]){4}\-([34])([:xdigit:]){3}\-([89ab])([:xdigit:]){3}\-([:xdigit:]){12}'
          ConsoleWrite(StringRegExp($sUUID, $sRegExp) & @CRLF)
          Local $Result = StringRegExp($sUUID, $sRegExp)
          ConsoleWrite($Result & @CRLF)
          If @error Then
              ConsoleWrite('Error: [' & @error & ']' & @CRLF)
              Return 'False'
          Else
              ConsoleWrite('Error2: [' & @error & ']' & @CRLF)
              Return 'True'
          EndIf
      EndFunc

      In the line under the Function call, you'll see the regex I found to do this from a google search. That was my starting point, and I'm trying to get it to work in Au3 and failing miserably.
      $sTestF is a known invalid String
      $sTestT is a known valid String
      Everything I've tried so far has produced the same results for both.
      Any help you could provide me is greatly appreciated. Thanks for your time!
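A possible direction, offered as a sketch rather than a confirmed fix: POSIX classes such as [:xdigit:] are only valid inside a bracket expression, so they need to be written as [[:xdigit:]]. The sketch also tests the return value of StringRegExp (1 or 0 with the default flag) instead of @error, and anchors the pattern so that a string that is too short or too long cannot match. The [34] and [89ab] constraints are kept from the pattern in the post; (?i) and the anchors are additions.

```autoit
; Sketch: returns True when $sUUID has the expected UUID shape, False otherwise
Func _IsValidUUID($sUUID)
    ; (?i) makes the match case-insensitive; ^ and \z anchor it to the whole string
    Local $sRegExp = '(?i)^[[:xdigit:]]{8}-[[:xdigit:]]{4}-[34][[:xdigit:]]{3}-[89ab][[:xdigit:]]{3}-[[:xdigit:]]{12}\z'
    ; With the default flag, StringRegExp returns 1 for a match and 0 otherwise
    Return StringRegExp($sUUID, $sRegExp) = 1
EndFunc

; The two test strings from the post: the first is one hex digit short
ConsoleWrite('False = ' & _IsValidUUID('4C4C4544-004A-4C10-8054-B7C04F46343') & @CRLF)
ConsoleWrite('True = ' & _IsValidUUID('4C4C4544-004A-4C10-8054-B7C04F463432') & @CRLF)
```

Returning a Boolean rather than the strings 'True' and 'False' lets the function be used directly in an If statement.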
    • sjaikumar
      By sjaikumar
      I am looking to write an automation script for converting the following SQL procedure code into VB code, as shown below.
      Example
       
      ALTER PROCEDURE [dbo].[firstprocedure]
      (
             @var1 varchar(10),
             @var2 varchar(7),
             @var3 float
      )

      CONVERSION

      Public Function firstprocedure(ByVal var1 As String, ByVal var2 As String, ByVal var3 As Integer) As DataSet
              Dim ds As New DataSet()
              '**************query  with stored procedure**********
              Dim CMD As New SqlCommand("GetCountOfTempGramWtsGwByFoodCodeProgressAndNewSequence")
              CMD.Connection = GetConnection()
              CMD.Parameters.Add("@var1", SqlDbType.VarChar).Value = var1
              CMD.Parameters.Add("@var2", SqlDbType.VarChar).Value = var2
              CMD.Parameters.Add("@var3", SqlDbType.Float).Value = var3
              CMD.CommandType = CommandType.StoredProcedure
              Dim adapter As New SqlDataAdapter(CMD)

       
      I will be reading the procedural code from the first file that has to be read and create the VB code by writing onto a new file.
      My approach is that I need the following information captured in Variables which I can insert later onto the new file as and where applicable.
       
      In order to do that I need to extract the following bit of information from the file to be READ
      Name of procedure : firstprocedure
      List of Variables : @var1, @var2, @var3  
      Data Types: varchar, varchar & float
       
      What I need help with is extracting the list of variables and data types in separate variables.
      I am looking to build a Regular Expression which I can use to achieve the same. 
      I tried making use of StringSplit function delimited on spaces(" ") but that did not work when reading the file from notepad. I reckon it does not detect spaces in the file.
      Please help me with the RegExp.
      Any other suggestions on how best to go about doing this conversion are also welcome.
      Thank You
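One way this extraction might be approached in AutoIt, as a rough sketch only: it assumes the procedure header looks like the example above, with every parameter written as @name followed by whitespace and a type. The variable names ($sSQL, $aName, $aParams) are made up for illustration. Using \s+ instead of a literal space also covers tabs and line breaks, which may be why splitting on " " did not work.

```autoit
; Hypothetical input: the procedure header from the example, as one string
Local $sSQL = "ALTER PROCEDURE [dbo].[firstprocedure] (" & @CRLF & _
        "    @var1 varchar(10)," & @CRLF & _
        "    @var2 varchar(7)," & @CRLF & _
        "    @var3 float" & @CRLF & ")"

; Flag 1: array of captured groups from the first match (here: the procedure name)
Local $aName = StringRegExp($sSQL, '(?i)PROCEDURE\s+(?:\[?\w+\]?\.)?\[?(\w+)\]?', 1)
If Not @error Then ConsoleWrite("Name: " & $aName[0] & @CRLF)

; Flag 3: flat array of all captures in order: name, type, name, type, ...
Local $aParams = StringRegExp($sSQL, '@(\w+)\s+(\w+)', 3)
If Not @error Then
    For $i = 0 To UBound($aParams) - 2 Step 2
        ConsoleWrite("@" & $aParams[$i] & " : " & $aParams[$i + 1] & @CRLF)
    Next
EndIf
```

When reading from a file, the whole text can be passed in place of $sSQL; the size suffix such as (10) is deliberately not captured here.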
       
       
    • cag8f
      By cag8f
      Hi all.   I'm revisiting poorly commented code of mine from over a year ago.  In one line I search a string for a regular expression and cannot figure out exactly what the code is searching for.  The expression is:
      .+(?>\R)
      I've tried to piece it together from the StringRegExp() page.  My educated guess is that it searches for newline characters in some capacity.  Here is what I have so far:
      - (?>\R):  The (?>...) indicates an atomic non-capturing group, meaning in-part that string matches are not recorded for later reference.  I'm not sure what the 'atomic' means though.  The description says that this 'locks,' which I'm also unclear on.  The \R matches any (Unicode) newline character.  So is this component somehow searching for new lines?
      - .+:  I'm not sure how these are modifying the above, nor exactly how they work together.  The . matches any single character except newline characters, unless (/S) is active.  How can I check if /S is active?  Would it be a parameter set in one of my options files?  And the + seems to match 1 or more.  This seems to make the preceding . redundant?
      The complete line of code is:
      $aArray = StringRegExp(_IEBodyReadText($oIE2), '.+(?>\R)', 3)
      I'm reading body text from an HTML page.  I can successfully print out each element of this array.  Each element contains one line of text, followed by a blank line.  Hopefully that helps to confirm things.
      Thanks in advance.
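A small experiment may help confirm the reading, sketched here on a made-up string ($sText is hypothetical). The + applies to the . before it, so .+ means "one or more characters that are not line breaks", i.e. one non-empty line, and \R then consumes the line break that ends it. Each array element should therefore be a line together with its terminator, which would explain the "blank line" after each element when printed. As far as the PCRE documentation goes, \R already behaves as an atomic group, so the (?>...) wrapper probably changes nothing here.

```autoit
; Hypothetical text with Windows line endings; the last line has no terminator
Local $sText = "first" & @CRLF & "second" & @CRLF & "last"

; .+ : one or more characters that are not line breaks (one non-empty line)
; \R : the line break that ends that line
Local $aLines = StringRegExp($sText, '.+(?>\R)', 3)

If Not @error Then
    For $i = 0 To UBound($aLines) - 1
        ; Print the line without its trailing break, plus the raw length,
        ; so the captured line terminator becomes visible as extra characters
        ConsoleWrite('[' & StringStripWS($aLines[$i], 2) & '] length ' & StringLen($aLines[$i]) & @CRLF)
    Next
EndIf
```

Because the pattern requires a line break after the text, a final line without one ("last" in this sketch) is not expected to appear in the array.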
    • GPinzone
      By GPinzone
      I wrote a program to convert single and double quotes to curly single and double quotes. The program uses regular expression search and replace to swap the straight quotes for their curly counterparts while ignoring any HTML tags.
      #include <ButtonConstants.au3>
      #include <EditConstants.au3>
      #include <GUIConstantsEx.au3>
      #include <WindowsConstants.au3>
      #include <FontConstants.au3>

      Func Curlify($sInput)
          Local $sOutput = StringRegExpReplace($sInput, "'(?!([^<]+)?>)", "’")
          $sOutput = StringRegExpReplace($sOutput, "((^|\s+)(<[^>]*>)*)’", "${1}‘")
          $sOutput = StringRegExpReplace($sOutput, '"(?!([^<]+)?>)', "”")
          $sOutput = StringRegExpReplace($sOutput, "((^|\s+)(<[^>]*>)*)”", "${1}“")
          ; Fix nested quotes
          While StringRegExp($sOutput, "(‘|“)(<[^>]*>)*(’|”)")
              $sOutput = StringRegExpReplace($sOutput, "((‘|“)(<[^>]*>)*)’", "${1}‘")
              $sOutput = StringRegExpReplace($sOutput, "((‘|“)(<[^>]*>)*)”", "${1}“")
          WEnd
          Return $sOutput
      EndFunc   ;==>Curlify
      $FormCurly = GUICreate("Form_Curly", 708, 439, 192, 124)
      $Original = GUICtrlCreateEdit("", 0, 0, 305, 433)
      GUICtrlSetFont(-1, 9, $FW_NORMAL, 0, "Courier New") ; Set the font of the previous control.
      GUICtrlSetLimit(-1, 1500000)
      $Modified = GUICtrlCreateEdit("", 408, 0, 297, 433)
      GUICtrlSetFont(-1, 9, $FW_NORMAL, 0, "Courier New") ; Set the font of the previous control.
      GUICtrlSetLimit(-1, 1500000)
      $Curly = GUICtrlCreateButton("Curly", 320, 64, 75, 25)
      $Reset = GUICtrlCreateButton("Reset", 320, 164, 75, 25)
      GUISetState(@SW_SHOW)
      While 1
          $nMsg = GUIGetMsg()
          Switch $nMsg
              Case $Curly
                  GUICtrlSetData($Modified, Curlify(GUICtrlRead($Original)))
              Case $Reset
                  GUICtrlSetData($Original, "")
                  GUICtrlSetData($Modified, "")
              Case $GUI_EVENT_CLOSE
                  Exit
          EndSwitch
      WEnd
       
      So why am I posting a question about a program that works?  This regex:
          $sOutput = StringRegExpReplace($sOutput, "((^|\s+)(<[^>]*>)*)’", "${1}‘")
      should be able to be rewritten without the "+":
          $sOutput = StringRegExpReplace($sOutput, "((^|\s)(<[^>]*>)*)’", "${1}‘")
      Same goes for the one to do double quotes, obviously.
      However, if I remove the "+" from the "\s" in the regex, the program will fail in some cases. Something like <b>’<i>Don’t</i>’</b> at the beginning of a line won't get fixed. I suspect it's got something to do with the fact that the previous character (aside from the HTML tags) is a newline. The "\s" should work just fine with newlines. I'm stumped because the regex without the "+" modifier works fine on http://www.regextester.com/ when I test it.