[Solved] Regular expression Unicode to ASCII

VIP

 

mikell

;Global $sCharAllowed = ";!#$%&'()+-,.0123456789=@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{ }~"

Local $uStringIN = "頹-衙-浳-浤-搰-橱-煪๏๐๑๒๖D๚๛ẀẁẂẴẵẶặẸẹẺẻỈỉỊAịỌÆ¢™®¯ÌÍÎÏÐÑ×ØÙòóôŘřŢţŷOŸǾǿ₪₫€℅l№™Ωe⅛∑-•v8∫˜≠==□ּׂאַאָאּבּVגּדּהּוּזטּיּךּכשּתּוֹבֿכֿפֿAﭏﭽﮊﮋﮏﮐﮑﮒﮓﮔﺅﺆﺇTﺈﺉﺊﺋﺛﺜﺝﺮﺯﺰﺱﻐRﻑﻒﻓﻔﻕﻟﻠﻡﻢﻣﻤOﻥﻦﻧﻨﻰﻱﻲﻳﻴﻵﻶNﻷﻸﻹﻺﻻﻼْ%-㍱G-煱-둻-睤-㌹-"

Local $lStringOUT = StringRegExpReplace($uStringIN, '[[:^print:]]', "_")

ConsoleWrite($lStringOUT&@CRLF)

jguinch

Global $sCharAllowed = "\w;!#$%&'\(\)+,-.=@\[\]^`{ }~"

Local $uStringIN = "頹-衙-浳-浤-搰-橱-煪๏๐๑๒๖D๚๛ẀẁẂẴẵẶặẸẹẺẻỈỉỊAịỌÆ¢™®¯ÌÍÎÏÐÑ×ØÙòóôŘřŢţŷOŸǾǿ₪₫€℅l№™Ωe⅛∑-•v8∫˜≠==□ּׂאַאָאּבּVגּדּהּוּזטּיּךּכשּתּוֹבֿכֿפֿAﭏﭽﮊﮋﮏﮐﮑﮒﮓﮔﺅﺆﺇTﺈﺉﺊﺋﺛﺜﺝﺮﺯﺰﺱﻐRﻑﻒﻓﻔﻕﻟﻠﻡﻢﻣﻤOﻥﻦﻧﻨﻰﻱﻲﻳﻴﻵﻶNﻷﻸﻹﻺﻻﻼْ%-㍱G-煱-둻-睤-㌹-"
Local $replace = StringRegExpReplace($uStringIN, "[^" & $sCharAllowed & "]", "_")
MsgBox(0, "", @CRLF & $replace & @CRLF)

 

Edited by Trong

Regards,
 

mikell

Not tested but this should work

$lStringOUT = StringRegExpReplace($uStringIN, '[[:^print:]]', "_")
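
For comparison, here is a minimal sketch of the same idea with the range written out explicitly. It assumes that "ASCII" here means the printable range U+0020 to U+007E, and that POSIX classes such as [:print:] are limited to ASCII as long as (*UCP) is not set in the pattern. The names $sInput and $sOutput are only illustrative and do not come from the thread.

```autoit
; Sketch only: $sInput and $sOutput are hypothetical names.
; ChrW() builds the non-ASCII characters, so the script does not depend on the file encoding.
Local $sInput = "A" & ChrW(0x20AC) & "-z" & ChrW(0x1EC8)

; Replace every character outside printable ASCII (U+0020..U+007E) with an underscore.
Local $sOutput = StringRegExpReplace($sInput, "[^\x20-\x7E]", "_")

ConsoleWrite($sOutput & @CRLF) ; prints A_-z_
```

The explicit range does not rely on how the engine defines "printable", which may be easier to reason about if (*UCP) is ever added to the pattern.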

 

jguinch

Or this?

Global $sCharAllowed = "\w;!#$%&'\(\)+,-.=@\[\]^`{}~"

Local $uStringIN = "頹-衙-浳-浤-搰-橱-煪๏๐๑๒๖๚๛ẀẁẂẴẵẶặẸẹẺẻỈỉỊịỌÆ¢™®¯ÌÍÎÏÐÑ×ØÙòóôŘřŢţŷŸǾǿ₪₫€℅l№™Ωe⅛∑-/•v8∫˜≠==□ּׂאַאָאּבּגּדּהּוּזטּיּךּכשּתּוֹבֿכֿפֿﭏﭽﮊﮋﮏﮐﮑﮒﮓﮔﺅﺆﺇﺈﺉﺊﺋﺛﺜﺝﺮﺯﺰﺱﻐﻑﻒﻓﻔﻕﻟﻠﻡﻢﻣﻤﻥﻦﻧﻨﻰﻱﻲﻳﻴﻵﻶﻷﻸﻹﻺﻻﻼْ%-㍱-煱-둻-睤-㌹-"

Local $replace = StringRegExpReplace($uStringIN, "[^" & $sCharAllowed & "]", "_")

MsgBox(0, "", @CRLF & $replace & @CRLF)

 

Edited by jguinch
VIP

or this ?

Global $sCharAllowed = "\w;!#$%&'\(\)+,-.=@\[\]^`{}~"

Local $uStringIN = "頹-衙-浳-浤-搰-橱-煪๏๐๑๒๖๚๛ẀẁẂẴẵẶặẸẹẺẻỈỉỊịỌÆ¢™®¯ÌÍÎÏÐÑ×ØÙòóôŘřŢţŷŸǾǿ₪₫€℅l№™Ωe⅛∑-/•v8∫˜≠==□ּׂאַאָאּבּגּדּהּוּזטּיּךּכשּתּוֹבֿכֿפֿﭏﭽﮊﮋﮏﮐﮑﮒﮓﮔﺅﺆﺇﺈﺉﺊﺋﺛﺜﺝﺮﺯﺰﺱﻐﻑﻒﻓﻔﻕﻟﻠﻡﻢﻣﻤﻥﻦﻧﻨﻰﻱﻲﻳﻴﻵﻶﻷﻸﻹﻺﻻﻼْ%-㍱-煱-둻-睤-㌹-"

$replace = StringRegExpReplace($uStringIN, "[" & $sCharAllowed& "]", "_")

MsgBox(0, "", @CRLF & $replace & @CRLF)

 

Output=頹-衙-浳-浤-搰-橱-煪๏๐๑๒๖๚๛ẀẁẂẴẵẶặẸẹẺẻỈỉỊịỌÆ¢™®¯ÌÍÎÏÐÑ×ØÙòóôŘřŢţŷŸǾǿ₪₫€℅l№™Ωe⅛∑-/•v8∫˜≠==□ּׂאַאָאּבּגּדּהּוּזטּיּךּכשּתּוֹבֿכֿפֿﭏﭽﮊﮋﮏﮐﮑﮒﮓﮔﺅﺆﺇﺈﺉﺊﺋﺛﺜﺝﺮﺯﺰﺱﻐﻑﻒﻓﻔﻕﻟﻠﻡﻢﻣﻤﻥﻦﻧﻨﻰﻱﻲﻳﻴﻵﻶﻷﻸﻹﻺﻻﻼْ%-㍱-煱-둻-睤-㌹-


@jguinch Something is wrong! The output is the same as the input.

 


Regards,
 
