Chimp

regexp and ANSI escape sequences


regex and iso escape sequences

Hi, I would like to extract all ISO escape sequences embedded in a string and separate them from the rest of the string, while keeping the information about their position, so that, for example, a string like this one (or an even more complex one):

(the string could start with either normal text or ISO sequences)
 

'\u001B[4mUnicorn\u001B[0m'

should be transformed into an array like this:

$a[0] = '\u001B[4m'   ; first iso escape sequence
$a[1] = 'Unicorn'     ; normal text
$a[2] = '\u001B[0m'   ; second iso escape sequence
... and so on

(Note: in the example above, control codes are written in the '\u001B' notation for the ASCII "esc" char, and a similar notation is used for other control chars; in the real string to be parsed, those control chars are embedded as single bytes with values from 01 to 31.) At this link (http://artscene.textfiles.com/ansi/) there are many examples of real ANSI text files.
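In AutoIt the same sample is built from real bytes, with Chr(27) standing in for \u001B. The following minimal sketch (variable names are illustrative) builds it and maps ESC back to the \u001B notation, to confirm that both forms describe the same string:

```autoit
; Build the example string from real bytes: Chr(27) is the ESC control character (0x1B)
Local $sAnsi = Chr(27) & "[4mUnicorn" & Chr(27) & "[0m"

; Render each ESC byte in the \u001B notation used above, to compare the two forms
Local $sShown = StringReplace($sAnsi, Chr(27), "\u001B")
ConsoleWrite($sShown & @CRLF)                        ; \u001B[4mUnicorn\u001B[0m
ConsoleWrite("Length: " & StringLen($sAnsi) & @CRLF) ; Length: 15 (each ESC counts as one char)
```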

Searching the web, I've found some possible solutions that use regexps for a similar purpose. Among them, the regexp pattern posted by kfir at the following link (https://stackoverflow.com/questions/14693701/how-can-i-remove-the-ansi-escape-sequences-from-a-string-in-python) seems able to catch a wider range of ISO escape sequences (not only color sequences), but my lack of regexp skills prevents me from evaluating and testing such patterns.
I would be very grateful if some regexp guru could come to my rescue...

Thanks everybody for reading...

Edited by Chimp

small minds discuss people average minds discuss events great minds discuss ideas.... and use AutoIt....


I found that, while you can cajole a regexp into giving you the positions of arbitrary escape sequences, it is much slower than simply walking the string with StringInStr: on the order of 2:1 or 3:1.

You're better off doing this:

#include <StringConstants.au3> ; for $STR_CASESENSE (= 1, exact comparison)

Local $yourstr = "Mytest" & Chr(1) & "String" & Chr(0) & "Mytest" & Chr(3) & "String" & "Mytest" & Chr(31) & "String"
;------------------------------
; Lookup string holding every control character Chr(0) .. Chr(31)
Local $sMatch = "", $sChr
For $i = 0 To 31
    $sMatch &= Chr($i)
Next

; Walk the string one character at a time and report each control character found
For $i = 1 To StringLen($yourstr)
    $sChr = StringMid($yourstr, $i, 1)
    If StringInStr($sMatch, $sChr, $STR_CASESENSE) > 0 Then ; exact (case-sensitive) comparison
        ConsoleWrite("Esc \" & Asc($sChr) & " @ Pos:" & $i & @CRLF)
        ;More Processing
    EndIf
Next
;---------------------------------------------------------
;OR Faster: compare each byte value directly, no lookup string needed
Local $iChr
For $i = 1 To BinaryLen($yourstr)
    $iChr = BinaryMid($yourstr, $i, 1)

    If $iChr < 32 Then
        ConsoleWrite("Esc " & $iChr & " @ Pos:" & $i & @CRLF)
        ;More Processing
    EndIf
Next

Console output:
Esc \1 @ Pos:7
Esc \0 @ Pos:14
Esc \3 @ Pos:21
Esc \31 @ Pos:34
Esc 0x01 @ Pos:7
Esc 0x00 @ Pos:14
Esc 0x03 @ Pos:21
Esc 0x1F @ Pos:34
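For comparison, the regexp route mentioned above could look roughly like the following sketch. It is not the code that was timed, and no timing is claimed for it: it calls StringRegExp in single-match mode inside a loop and uses @extended as the next start offset, the same idiom shown in the StringRegExp help-file example. The sample string leaves out Chr(0) on purpose, so the subject contains only ordinary text and control bytes 1 to 31.

```autoit
#include <StringConstants.au3> ; $STR_REGEXPARRAYMATCH (= 1, first match only)

Local $sStr = "Mytest" & Chr(1) & "String" & Chr(3) & "Mytest" & Chr(31) & "String"
Local $iOffset = 1, $aHit, $sPositions = ""

While 1
    ; Look for the next control character at or after $iOffset
    $aHit = StringRegExp($sStr, "[\x01-\x1F]", $STR_REGEXPARRAYMATCH, $iOffset)
    If @error Then ExitLoop ; no further match
    $iOffset = @extended    ; position just after the match = where the next search starts
    $sPositions &= "Esc \" & Asc($aHit[0]) & " @ Pos:" & ($iOffset - 1) & @CRLF
WEnd
ConsoleWrite($sPositions) ; Esc \1 @ Pos:7, Esc \3 @ Pos:14, Esc \31 @ Pos:21 (one per line)
```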

 

Edited by Bilgus


Thanks @Bilgus for your reply, but maybe I haven't been very clear about what I'm trying to achieve (sorry).
Your way catches every single control char along the string and gives the position where it was found, but an ISO escape sequence is a compound set of chars: it starts with the two bytes Chr(27) & '[' and is followed by a variable number of chars depending on the particular sequence. I would like to catch any of the many possible escape sequences, splitting them out of the whole string as described in post #1.

Edited by Chimp


Ah, sorry, I misunderstood: I thought you were looking for their start positions, but you want to split on them.

Maybe like this:

#include <Array.au3>


; The sample is built in chunks so that no single script line gets too long
Local $yourstr = 'Regulartext' & CHR(0x1B) & '[40m' & CHR(0x1B) & '[2J' & CHR(0x1B) & '[2C' & CHR(0x1B) & '[0;34m³   ' & CHR(0x1B) & '[1;37m³   ' & CHR(0x1B) & '[0;34m³' & CHR(0x1B) & '[31mÜ' & CHR(0x1B) & '[33mßßß' & CHR(0x1B) & '[s' & CHR(0x1B) & '[u' & CHR(0x1B) & '[31mßßßßßßßÛßßß' & CHR(0x1B) & '[33mß' & CHR(0x1B) & '[31mß' & CHR(0x1B) & '[33mß' & CHR(0x1B) & '[31mß' & CHR(0x1B) & '[33mßßß' & CHR(0x1B) & '[31mß'
$yourstr &= CHR(0x1B) & '[s' & CHR(0x1B) & '[uÛßßßß' & CHR(0x1B) & '[33mß' & CHR(0x1B) & '[31mß' & CHR(0x1B) & '[33mßßß' & CHR(0x1B) & '[31mÛßßßßßßßßÛßßßßß' & CHR(0x1B) & '[33mß' & CHR(0x1B) & '[s' & CHR(0x1B) & '[u' & CHR(0x1B) & '[31mßÛßß' & CHR(0x1B) & '[33mß' & CHR(0x1B) & '[31mß' & CHR(0x1B) & '[33mßß' & CHR(0x1B) & '[31mßßß' & CHR(0x1B) & '[33mß' & CHR(0x1B) & '[31mÜ' & CHR(0x1B) & '[34m³'
$yourstr &= CHR(0x1B) & '[s' & CHR(0x1B) & '[u   ³' & CHR(0x1B) & '[2;1H  ÄÄÄÄ' & CHR(0x1B) & '[1;37mÛ' & CHR(0x1B) & '[0;34mÄÄ' & CHR(0x1B) & '[31mÜ' & CHR(0x1B) & '[33mß' & CHR(0x1B) & '[34mÄ' & CHR(0x1B) & '[s' & CHR(0x1B) & '[u' & CHR(0x1B) & '[1;35mÜÛÛÛÛÛÛÛÛÜ ÛÛÛÛÛÛÛÛÛÛÛÜßÛÛÛÛÛÛÛß ÛÛÛÛÛÛÛ Ü²ÛÛÛÛ'
$yourstr &= CHR(0x1B) & '[s' & CHR(0x1B) & '[uÛÛÜßÛÛÛÛÛÛÛÛÛÜ' & CHR(0x1B) & '[0;33mßÜ' & CHR(0x1B) & '[34mÄ' & CHR(0x1B) & '[1;37mß' & CHR(0x1B) & '[0;34m³Ä' & CHR(0x1B) & '[3;1H  ' & CHR(0x1B) & '[s' & CHR(0x1B) & '[u³   ' & CHR(0x1B) & '[1;37mÛ  ' & CHR(0x1B) & '[0;33m±' & CHR(0x1B) & '[34m³' & CHR(0x1B) & '[1;35m²²ÛÛÛÛÛÛÛÛÛÝÛÛÛÛÛÛÛ'
$yourstr &= CHR(0x1B) & '[s' & CHR(0x1B) & '[uÛÛÛÛÛÛÜßÛÛÛß ÜÞÛÛÛ²ÛÛÛÞ²ÛÛÛ ²ÛÛÛÝÛÛÛÛÛÛÛÛÛÛÛÜ' & CHR(0x1B) & '[0;31mß' & CHR(0x1B) & '[sUnicorn' & CHR(0x1B) & '[u' & CHR(0x1B) & '[1;37mÛ' & CHR(0x1B) & '[4;1H  ' & CHR(0x1B) & '[0;34mÄÄ' & CHR(0x1B) & '[1mÄÄ' & CHR(0x1B) & '[37mÛ' & CHR(0x1B) & '[34mÄ' & CHR(0x1B) & '[30m°'
$yourstr &= CHR(0x1B) & '[s' & CHR(0x1B) & '[u' & CHR(0x1B) & '[0;33m°' & CHR(0x1B) & '[1;30m°' & CHR(0x1B) & '[35m±Û²ÛÛÛßßÞÛß ²Û²Û' & CHR(0x1B) & '[30m°°±±² ' & CHR(0x1B) & '[35mÜ' & CHR(0x1B) & '[s' & CHR(0x1B) & '[uÜÜÜÜÛß Ü²ÛÞßÞ²Û²Û Þ±²ÛÛ ±²ÛÛݲÛÛ' & CHR(0x1B) & '[30m°°±²' & CHR(0x1B) & '[35mÛÛÛÛÛ'
$yourstr &= CHR(0x1B) & '[s' & CHR(0x1B) & '[u' & CHR(0x1B) & '[30m°' & CHR(0x1B) & '[37mÛ' & CHR(0x1B) & '[30m±' & CHR(0x1B) & '[0;34mÄ' & CHR(0x1B) & '[5;1H  ' & CHR(0x1B) & '[1;37mÜ  ' & CHR(0x1B) & '[30m°' & CHR(0x1B) & '[s' & CHR(0x1B) & '[u' & CHR(0x1B) & '[37mÛ' & CHR(0x1B) & '[30m°±' & CHR(0x1B) & '[37mÜ' & CHR(0x1B) & '[30m°'
$yourstr &= CHR(0x1B) & '[35m°±²²ÛBLAG  þß   ²±ÛÛ ÛÜ   ²Û' & CHR(0x1B) & '[s' & CHR(0x1B) & '[uÛÛß °±²²Û  Þ±²ÛÛ' & CHR(0x1B) & '[30m²' & CHR(0x1B) & '[35mÞ°±²Û ²²ÛÛݱ²Û' & CHR(0x1B) & '[30mÄ' & CHR(0x1B) & '[0;34m' & CHR(0x1B) & '[s' & CHR(0x1B) & '[uÄ' & CHR(0x1B) & '[1;35mÜÛÛÛÛßXKCD?' & CHR(0x1B) & '[30m°' & CHR(0x1B) & '[0;33mÜ' & CHR(0x1B) & ''
;------------------------------
$aMatch = StringRegExp($yourstr, "[^\x1B]++|\x1B\[[0-?]*[ -/]*[@-~]", 3)
ConsoleWrite("Err:" & @Error & @CRLF)
;---------------------------------------------------------
_ArrayDisplay($aMatch)

I grabbed that regex from your linked Stack Overflow page; you could just '|' in other start sequences as well.

But if you can link me a good file that fails, I can build it up a bit more.
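To illustrate the '|' idea, the following sketch adds a third alternative for two-byte escapes, that is ESC followed by one character from @ to _ (such as ESC M). That extra class is an assumption added here, not part of the pattern above; it is written as [@-Z\\-_] so that it skips '[' and leaves CSI sequences to the second alternative.

```autoit
#include <StringConstants.au3> ; $STR_REGEXPARRAYGLOBALMATCH (= 3)

; Text, ESC M (a two-byte escape), text, a CSI sequence, text
Local $sTest = "A" & Chr(27) & "MB" & Chr(27) & "[1mC"

; 1) run of non-ESC text  2) CSI: ESC [ params intermediates final byte
; 3) two-byte escape: ESC + one char in @..Z or \.._ (0x40-0x5F except '[')
Local $sPattern = "[^\x1B]++|\x1B\[[0-?]*[ -/]*[@-~]|\x1B[@-Z\\-_]"
Local $aParts = StringRegExp($sTest, $sPattern, $STR_REGEXPARRAYGLOBALMATCH)

For $i = 0 To UBound($aParts) - 1
    ConsoleWrite($i & ": " & StringReplace($aParts[$i], Chr(27), "<ESC>") & @CRLF)
Next
; 0: A / 1: <ESC>M / 2: B / 3: <ESC>[1m / 4: C
```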


Hi @Bilgus, thanks for your regexp pattern;
at first glance it seems to work... I will do some intensive testing.
P.S. You say you grabbed that pattern from the page I linked in post #1. Since there are more patterns on that page, isn't the one a bit further down (the second-to-last), posted by user "kfir", able to catch a wider range of ISO escape sequences? (I say this only because that pattern is longer, so I suppose it makes a more accurate search... just a wild guess.) Thanks again.



I don't know; the first one seems to match more, but perhaps some of what it captures isn't a valid sequence. You can try this one: it won't grab the non-escape text, but you can see whether it gets any more sequences than the first, and let me know the format of any extras it captures.

(\e\[\??\d+[hl]|\e[=<>a-kzNM78]|\e[\(\)][a-b0-2]|\e\[\d{0,2}[ma-dgkjqi]|\e\[\d+;\d+[hfy]?|\e\[;?[hf]|\e#[3-68]|\e[01356]n|\eO[mlnp-z]?|\e/Z|\e\d+|\e\[\?\d;\d0c|\e\d;\dR)

also the first one can be rewritten as

[^\e]++|\e\[[0-?]*[ -/]*[@-~]

Apparently \e is the escape character (in PCRE, \e matches ESC, 0x1B).
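PCRE, which StringRegExp uses, treats \e and \x1B as the same character. A minimal check, matching the whole four-character string ESC [ 1 m with each spelling:

```autoit
Local $sSeq = Chr(27) & "[1m"
Local $iWithE = StringRegExp($sSeq, "^\e\[1m$")     ; \e   = PCRE escape for ESC
Local $iWithHex = StringRegExp($sSeq, "^\x1B\[1m$") ; \x1B = the same character by code
ConsoleWrite($iWithE & " " & $iWithHex & @CRLF)     ; 1 1
```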

Edited by Bilgus


OK, so I found the list that the other poster used to build his captures.

I think this is about right:

[^\e]++|\e\[?[!-\?]*[!-\/]*[0-~]*
#include <Array.au3>


Local $yourstr = "Regulartext" & CHR(0x1B) &  "[20h Set new line mode   LMN" & CHR(0x1B) &  "[?1h   Set cursor key to application   DECCKM none Set ANSI (versus VT52)  DECANM" & _
CHR(0x1B) &  "[?3h  Set number of columns to 132    DECCOLM" & CHR(0x1B) &  "[?4h   Set smooth scrolling    DECSCLM" & CHR(0x1B) &  "[?5h   Set reverse video on screen DECSCNM" & _
CHR(0x1B) &  "[?6h  Set origin to relative  DECOM" & CHR(0x1B) &  "[?7h Set auto-wrap mode  DECAWM" & CHR(0x1B) &  "[?8h    Set auto-repeat mode    DECARM" & _
CHR(0x1B) &  "[?9h  Set interlacing mode    DECINLM  " & CHR(0x1B) &  "[20l Set line feed mode  LMN" & CHR(0x1B) &  "[?1l   Set cursor key to cursor    DECCKM" & CHR(0x1B) &  _
"[?2l   Set VT52 (versus ANSI)  DECANM" & CHR(0x1B) &  "[?3l    Set number of columns to 80 DECCOLM" & CHR(0x1B) &  "[?4l   Set jump scrolling  DECSCLM" & CHR(0x1B) &  "[?5l   Set normal video on screen  DECSCNM" & _
CHR(0x1B) &  "[?6l  Set origin to absolute  DECOM" & CHR(0x1B) &  "[?7l Reset auto-wrap mode    DECAWM" & CHR(0x1B) &  "[?8l    Reset auto-repeat mode  DECARM" & CHR(0x1B) &  "[?9l    Reset interlacing mode  DECINLM  " & _
CHR(0x1B) &  "= Set alternate keypad mode   DECKPAM" & CHR(0x1B) &  ">  Set numeric keypad mode DECKPNM  " & CHR(0x1B) &  "(A   Set United Kingdom G0 character set setukg0" & CHR(0x1B) &  ")A Set United Kingdom G1 character set setukg1"
$yourstr &= CHR(0x1B) &  "(B Set United States G0 character set  setusg0" & CHR(0x1B) &  ")B Set United States G1 character set  setusg1" & CHR(0x1B) &  "(0 Set G0 special chars. & line set    setspecg0" & CHR(0x1B) &  ")0   Set G1 special chars. & line set    setspecg1" & CHR(0x1B) &  "(1   Set G0 alternate character ROM  setaltg0" & CHR(0x1B) &  ")1    Set G1 alternate character ROM  setaltg1"
$yourstr &= CHR(0x1B) &  "(2    Set G0 alt char ROM and spec. graphics  setaltspecg0" & CHR(0x1B) &  ")2    Set G1 alt char ROM and spec. graphics  setaltspecg1  " & CHR(0x1B) &  "N   Set single shift 2  SS2" & CHR(0x1B) &  "O  Set single shift 3  SS3  "
$yourstr &= CHR(0x1B) &  "[m   Turn off character attributes   SGR0" & CHR(0x1B) &  "[0m   Turn off character attributes   SGR0" & CHR(0x1B) &  "[1m   Turn bold mode on   SGR1" & CHR(0x1B) &  "[2m   Turn low intensity mode on  SGR2" & CHR(0x1B) &  "[4m   Turn underline mode on  SGR4" & CHR(0x1B) &  "[5m   Turn blinking mode on   SGR5" & CHR(0x1B) &  "[7m   Turn reverse video on   SGR7" & CHR(0x1B) &  "[8m   Turn invisible text mode on SGR8  "
$yourstr &= CHR(0x1B) &  "[Line;Liner Set top and bottom lines of a window    DECSTBM  " & CHR(0x1B) &  "[ValueA  Move cursor up n lines  CUU" & CHR(0x1B) &  "[ValueB    Move cursor down n lines    CUD" & CHR(0x1B) &  "[ValueC    Move cursor right n lines   CUF" & CHR(0x1B) &  "[ValueD    Move cursor left n lines    CUB"
$yourstr &= CHR(0x1B) &  "[H Move cursor to upper left corner    cursorhome" & CHR(0x1B) &  "[;H Move cursor to upper left corner    cursorhome" & CHR(0x1B) &  "[Line;ColumnH   Move cursor to screen location v,h  CUP" & CHR(0x1B) &  "[f Move cursor to upper left corner    hvhome" & CHR(0x1B) &  "[;f Move cursor to upper left corner    hvhome" & CHR(0x1B) &  "[Line;Columnf   Move cursor to screen location v,h  CUP"
$yourstr &= CHR(0x1B) &  "D  Move/scroll window up one line  IND" & CHR(0x1B) &  "M  Move/scroll window down one line    RI" & CHR(0x1B) &  "E   Move to next line   NEL" & CHR(0x1B) &  "7  Save cursor position and attributes DECSC" & CHR(0x1B) &  "8    Restore cursor position and attributes  DECSC  "
$yourstr &= CHR(0x1B) &  "H  Set a tab at the current column HTS" & CHR(0x1B) &  "[g Clear a tab at the current column   TBC" & CHR(0x1B) &  "[0g    Clear a tab at the current column   TBC" & CHR(0x1B) &  "[3g    Clear all tabs  TBC  " & CHR(0x1B) &  "#3   Double-height letters, top half DECDHL"
Local $yourstr2 = CHR(0x1B) &  "#4  Double-height letters, bottom half  DECDHL" & CHR(0x1B) &  "#5  Single width, single height letters DECSWL" & CHR(0x1B) & _
"#6 Double width, single height letters DECDWL  " & CHR(0x1B) &  "[K    Clear line from cursor right    EL0" & CHR(0x1B) &  "[0K    Clear line from cursor right    EL0" & CHR(0x1B) &  "[1K    Clear line from cursor left EL1" & CHR(0x1B) &  "[2K    Clear entire line   EL2  " & CHR(0x1B) &  "[J   Clear screen from cursor down   ED0" & CHR(0x1B) &  "[0J Clear screen from cursor down ED0" & CHR(0x1B) & "[1J Clear screen from cursor up ED1" & _
 CHR(0x1B) &  "[2J Clear entire screen ED2  " & _
 CHR(0x1B) &  "5n Device status report DSR" & _
 CHR(0x1B) &  "0n Response: terminal is OK DSR" & _
 CHR(0x1B) &  "3n Response: terminal is not OK DSR  " & _
 CHR(0x1B) &  "6n Get cursor position DSR" & _
 CHR(0x1B) &  "Line;ColumnR Response: cursor is at v,h CPR  " & _
 CHR(0x1B) &  "[c Identify what terminal type DA" & _
 CHR(0x1B) &  "[0c Identify what terminal type (another) DA" & _
 CHR(0x1B) &  "[?1;Value0c Response: terminal type code n DA  " & _
 CHR(0x1B) &  "c Reset terminal to initial state RIS  " & _
 CHR(0x1B) &  "#8 Screen alignment display DECALN" & _
 CHR(0x1B) &  "[2;1y Confidence power up test DECTST" & _
 CHR(0x1B) &  "[2;2y Confidence loopback test DECTST" & _
 CHR(0x1B) &  "[2;9y Repeat power up test DECTST" & _
 CHR(0x1B) &  "[2;10y Repeat loopback test DECTST  " & _
 CHR(0x1B) &  "[0q Turn off all four leds DECLL0" & _
 CHR(0x1B) &  "[1q Turn on LED #1 DECLL1" & _
 CHR(0x1B) &  "[2q Turn on LED #2 DECLL2" & _
 CHR(0x1B) &  "[3q Turn on LED #3 DECLL3" & _
 CHR(0x1B) &  "[4q Turn on LED #4 DECLL4     Codes for use in VT52 compatibility mode" & _
 CHR(0x1B) &  "< Enter/exit ANSI mode (VT52) setansi  " & _
 CHR(0x1B) &  "= Enter alternate keypad mode altkeypad" & _
 CHR(0x1B) &  "> Exit alternate keypad mode numkeypad  " & _
 CHR(0x1B) &  "F Use special graphics character set setgr" & _
 CHR(0x1B) &  "G Use normal US/UK character set resetgr  " & _
 CHR(0x1B) &  "A Move cursor up one line cursorup" & _
 CHR(0x1B) &  "B Move cursor down one line cursordn" & _
 CHR(0x1B) &  "C Move cursor right one char cursorrt" & _
 CHR(0x1B) &  "D Move cursor left one char cursorlf" & _
 CHR(0x1B) &  "H Move cursor to upper left corner cursorhome" & _
 CHR(0x1B) &  "LineColumn Move cursor to v,h location cursorpos(v,h)" & _
 CHR(0x1B) &  "I Generate a reverse line-feed revindex  " & _
 CHR(0x1B) &  "K Erase to end of current line cleareol" & _
 CHR(0x1B) &  "J Erase to end of screen cleareos  " & _
 CHR(0x1B) &  "Z Identify what the terminal is ident" & _
 CHR(0x1B) &  "/Z Correct response to ident identresp       VT100 Special Key Codes These are sent from the terminal back to the computer when the particular key is pressed. Note that the numeric keypad keys send different codes in numeric mode than in alternate mode. See" & _
 CHR(0x1B) &  "ape codes above to change keypad mode.     Function Keys: " & _
 CHR(0x1B) &  "9 PF1" & _
 CHR(0x1B) &  "OQ PF2" & _
 CHR(0x1B) &  "OR PF3" & _
 CHR(0x1B) &  "OS PF4    Arrow Keys:    Reset Set up" & CHR(0x1B) &  "A" & CHR(0x1B) &  "OA down" & CHR(0x1B) &  "B" & CHR(0x1B) &  "OB right" & CHR(0x1B) &  "C" &  CHR(0x1B) &  "OC left" & CHR(0x1B) &  "D" & CHR(0x1B) &  "OD    Numeric Keypad Keys: " & _
 CHR(0x1B) &  "Op 0" & _
 CHR(0x1B) &  "Oq 1" & _
 CHR(0x1B) &  "Or 2" & _
 CHR(0x1B) & "Os 3" & _
 CHR(0x1B) & "Ot4" & _
 CHR(0x1B) & "Ou5" & _
 CHR(0x1B) & "Ov6" & _
 CHR(0x1B) & "Ow7" & _
 CHR(0x1B) & "Ox8" & _
 CHR(0x1B) & "Oy9" & _
 CHR(0x1B) & "Om-(minus)" & _
 CHR(0x1B) & "Ol,(comma)" & _
 CHR(0x1B) & "On.(period)" & _
 CHR(0x1B) & "OM^M    Printing: " & _
 CHR(0x1B) & "[iPrint ScreenPrint the current screen" & _
 CHR(0x1B) & "[1iPrint LinePrint the current line" & _
 CHR(0x1B) & "[4iStop Print LogDisable log" & _
 CHR(0x1B) & "[5iStart Print LogStart log; all received text is echoed to a printer"
;------------------------------
$aMatch = StringRegExp($yourstr & $yourstr2, "[^\e]++|\e\[?[!-\?]*[!-\/]*[0-~]*", 3)
ConsoleWrite("Err:" & @Error & @CRLF)
;---------------------------------------------------------
_ArrayDisplay($aMatch)

 


Hi @Bilgus, thanks for your post,

From a quick test, I see that the main problem is that the ANSI escape sequences tested here all end with a white space, while real ones usually don't (nor do they necessarily end with a lowercase 'm', as we might expect): the normal text following an escape sequence is attached directly to its end, without any space.
I think this is the main difficulty: identifying an escape sequence embedded in the "middle" of text, where the start of the sequence is marked by the two bytes "esc" + "[" while its end depends on the particular sequence...
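The problem can be reproduced with a small sketch (the sample string is illustrative). In the pattern from the previous post, the trailing [0-~]* accepts every character from 0x30 to 0x7E, letters included, so it keeps consuming after the final byte; only a character outside that range, such as a space, stops it:

```autoit
#include <StringConstants.au3> ; $STR_REGEXPARRAYGLOBALMATCH (= 3)

; Real-world layout: text follows the final 'm' with no space in between
Local $sGlued = Chr(27) & "[1mHello"
Local $aOut = StringRegExp($sGlued, "[^\e]++|\e\[?[!-\?]*[!-\/]*[0-~]*", $STR_REGEXPARRAYGLOBALMATCH)

; [0-~]* also swallows "Hello", so the whole string comes back as ONE element
ConsoleWrite(UBound($aOut) & @CRLF) ; 1
```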

Edited by Chimp


As I said, if you can get me a sample file that shows exactly the conditions you need and where it fails, I can create a regexp to match it; but my output will only be as good as the input (GIGO, so to speak).


According to what is exemplified at this link, http://man7.org/linux/man-pages/man4/console_codes.4.html (see the "ECMA-48 CSI sequences" section), ANSI/ISO escape sequences should conform to the following rules:

  1. They start with the so-called CSI (Control Sequence Introducer), which is nothing more than the two codes Chr(27) & Chr(91), i.e. "esc[".
  2. Next there can be none, one or more decimal numbers separated by semicolons.
  3. At the end, an alphabetic letter terminates the sequence. Upper or lower case makes a difference: this final letter determines the action to be taken. (White space within the escape sequence, that is, between <esc> and the last char, should be ignored.)
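Taken literally, rules 1 to 3 fit in one pattern for a single sequence. The following is a minimal sketch that assumes whitespace may appear anywhere between ESC [ and the final letter, and that the final letter is ASCII; it checks a handful of sample sequences, including some with empty parameters:

```autoit
; Rule 1: ESC [   Rule 2: digits and semicolons (whitespace tolerated)   Rule 3: one final letter
Local $sRule = "^\x1B\[[\d;\s]*[A-Za-z]$"

Local $aSamples[6]
$aSamples[0] = Chr(27) & "[m"          ; no parameters at all
$aSamples[1] = Chr(27) & "[40;31m"     ; two parameters
$aSamples[2] = Chr(27) & "[ 47 ; 32 m" ; spaces inside are tolerated
$aSamples[3] = Chr(27) & "[3;;4m"      ; empty parameter between semicolons
$aSamples[4] = Chr(27) & "[2J"         ; a different final letter
$aSamples[5] = Chr(27) & "[ 47;32 !"   ; no final letter: not a sequence

For $i = 0 To UBound($aSamples) - 1
    ; StringRegExp with the default flag returns 1 on a match, 0 otherwise
    ConsoleWrite(StringReplace($aSamples[$i], Chr(27), "esc") & " -> " & StringRegExp($aSamples[$i], $sRule) & @CRLF)
Next
```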

 

[Image: ch3examp2a]

In short, that's all.

The wished-for regexp should capture any sequence matching points 1, 2 and 3 and return a global match array: normal text goes in one element of the array; as soon as an escape sequence is caught, it is stored in the next element, the following normal text in the next one, and so on until the end of the text.
So, for example, the following text
 

"Hello, " & chr(27) & "[ 40;31mThis is red on black " & chr(27) & "[ 47;32m And this is green on white"

should be split into the following array elements, for example:
 

[0] Hello,
[1] esc[ 40;31m
[2] This is red on black
[3] esc[ 47;32m
[4] And this is green on white

Another, similar(?) regexp should instead remove all escape sequences from the whole string, returning only the cleaned "normal" text.

... I don't know if the above is GIGO, but ANSI escape sequences work more or less like that. Are there regular expressions capable of producing the above array and the cleaned string?
Sorry if I was a bit wordy... any help will be very welcome. Thank you in advance.

Edited by Chimp
inserted image


I don't know if what's below is GIGO, but I believe it fits the specs:

#include <Array.au3> ; needed for _ArrayDisplay

Local $s = "Hello, " & Chr(27) & "[ 40;31mThis is red on black " & _
            Chr(27) & "[ 47;32m And this is green on white" & _
            Chr(27) & "[ 47 ;  32   m And this is also green on white (more spaces)" & _
            Chr(27) & "[1234567890123456798Kvftio this matches the definition, albeit probably invalid ANSI escape" & _
            Chr(27) & "[ 47;32 !!!! this is not an ANSI sequence"
Local $aRes = StringRegExp($s, "(?x)" & _
                            "(?(DEFINE) (?<ANSI_Escape> \[\s*\d+\s* (?:;\s*\d+\s*)* [[:alpha:]]) )" & _
                            "(?| \x1B (?&ANSI_Escape) | (?:[^\x1B] (?!(?&ANSI_Escape)))+ )", 3)
_ArrayDisplay($aRes, "Mixed results")

Local $sClean = StringRegExpReplace($s, "(?x) (\x1B \[\s*\d+\s* (?:;\s*\d+\s*)* [[:alpha:]])",  "")
MsgBox(0, "Pure text", $sClean)

The advantage of using DEFINE in such a regexp is that you can specify very explicitly what's what and give it a name, using a construct equivalent to a subroutine in a procedural programming language. Using (?x) allows insignificant whitespace in the regexp, making it easier to read, dissect and understand.
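A stripped-down illustration of both features, unrelated to ANSI, as a minimal sketch (the subpattern name num is arbitrary):

```autoit
; (?x)           : whitespace inside the pattern is ignored, so it can be laid out for reading
; (?(DEFINE)...) : declares the named subpattern "num" without matching anything by itself
; (?&num)        : calls that subpattern, much like calling a subroutine
Local $sPat = "(?x) ^ (?(DEFINE) (?<num> \d+ ) ) (?&num) - (?&num) $"

ConsoleWrite(StringRegExp("12-345", $sPat) & @CRLF) ; 1
ConsoleWrite(StringRegExp("12-ab", $sPat) & @CRLF)  ; 0
```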


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


@jchd, have you seen that spec? lol

I think this should do it:

"\x9b|\e\[[:<=>?]?[\d;]*[\x20-/]?[@-~]|\cX|\ea|\x9d|\e]|\x9f|\e_|\x90|\eP|\x9e|\e\^(?:[\x0f\x0e\t-\r\x20-\x7e]*|[\xA0-\xFE\t-\r\x20-\x7e]*)\cG|\x9c|\e\\|\cX|\ea|\x98|\eX[^\x98\x9c]*?\cG|\x9c|\e\\|\cX|\ea|\e[\x20-/]*(?:[0-~]|\cX|\ea)"

@Chimp, you just aren't going to get a single regexp that does everything you want. Use this one to gather the escape sequences, then run it again to replace them with an easy-to-use sentinel.

Split on that sentinel and rebuild from there.

I take no credit for the regexp above; I used the definitions here to build it:

https://metacpan.org/source/JOSEF/Ecma48-Util-0.01/lib/Ecma48/Util.pm


Something like this

#include <Array.au3>
#include <StringConstants.au3>

;........
Local $sRegEsc = "\x9b|\e\[[:<=>?]?[\d;]*[\x20-/]?[@-~]|\cX|\ea|\x9d|\e]|\x9f|\e_|\x90|\eP|\x9e|\e\^(?:[\x0f\x0e\t-\r\x20-\x7e]*|[\xA0-\xFE\t-\r\x20-\x7e]*)\cG|\x9c|\e\\|\cX|\ea|\x98|\eX[^\x98\x9c]*?\cG|\x9c|\e\\|\cX|\ea|\e[\x20-/]*(?:[0-~]|\cX|\ea)"

Local $aMatch = StringRegExp($yourstr, $sRegEsc, 3)
ConsoleWrite("Err:" & @error & @CRLF)

Local $sEscSentinel = "[#ESCSEQ#]"
Local $sStringRem = StringRegExpReplace($yourstr, $sRegEsc, "^###^" & $sEscSentinel & "^###^")
ConsoleWrite("Err:" & @error & @CRLF)

Local $aResult = StringSplit($sStringRem, "^###^", $STR_ENTIRESPLIT)
ConsoleWrite("Err:" & @error & @CRLF)

If IsArray($aMatch) And IsArray($aResult) Then
    Local $iEsc = 0
    For $i = 1 To $aResult[0]
        If $aResult[$i] == $sEscSentinel Then
            $aResult[$i] = $aMatch[$iEsc]
            $iEsc += 1
        EndIf
    Next
EndIf
;---------------------------------------------------------
_ArrayDisplay($aResult)

 


Thanks @jchd, your pattern is nearly perfect, with just one small gap: it should also catch sequences with no parameters at all between the CSI (esc[) and the final character, as well as any combination of numbers and/or semicolons, since ANSI treats a missing parameter (or an empty slot between semicolons) as an implicit default value (0 for the SGR color/attribute codes). So, for example, parameter strings like the following should be considered valid escape sequences: esc[m, esc[;m, esc[3;;4m, esc[0;m, esc[;2;;3m and so on... all valid.
Thanks a lot for your pattern



@Bilgus, thanks for your pattern, but testing it with the example string from @jchd's post above, it also splits incomplete sequences, or those containing spaces, in the wrong way. Perhaps, with a little calibration, jchd's pattern is OK and able to correctly catch most of the sequences. Thank you anyway.


4 hours ago, Bilgus said:

@Jchd have you seen that spec lol

Oh yes, I know there's much more than what I exemplified. I simply followed Chimp's restricted spec and forgot to make the numeric part optional, a gross oversight. It's been many blue moons since I typed ANSI escapes by heart.

3 hours ago, Chimp said:

perhaps, with a little calibration, the pattern by  jchd is OK and is able to correctly catch most of the sequences.

Should do what you asked for:

#include <Array.au3> ; needed for _ArrayDisplay

Local $s = "Hello, " & Chr(27) & "[ 40;31mThis is red on black " & _
            Chr(27) & "[ 47;32m And this is green on white" & _
            Chr(27) & "[ 47 ;  32   m And this is also green on white (more spaces)" & _
            Chr(27) & "[1234567890123456798Kvftio this matches the definition, albeit probably invalid ANSI escape" & _
            Chr(27) & "[ 47;32 !!!! this is not an ANSI sequence" & _
            Chr(27) & "[ z but this one is OK and final."
Local $aRes = StringRegExp($s, "(?x)" & _
                            "(?(DEFINE) (?<ANSI_Escape> \[ (?:\s*\d*\s*;?)* [[:alpha:]]) )" & _
                            "(?| \x1B (?&ANSI_Escape) | (?:[^\x1B] (?!(?&ANSI_Escape)))+ )", 3)
_ArrayDisplay($aRes, "Mixed results")

Local $sClean = StringRegExpReplace($s, "(?x) (\x1B \[ (?:\s*\d*\s*;?)* [[:alpha:]])",  "")
MsgBox(0, "Pure text", $sClean)
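As a possible variant (a sketch, not jchd's code): the nested group (?:\s*\d*\s*;?)* accepts exactly the runs of whitespace, digits and semicolons, so it can be flattened to [\s\d;]*, which avoids nested quantifiers. In the pattern above, the text branch's lookahead tests for '[' rather than ESC '[', so a character immediately before an unescaped "[digits letter" is left out of every element, and so is an ESC that does not start a valid sequence. The variant below uses a plain [^\x1B]+ text branch plus a lone \x1B fallback, so joining the elements gives back the original string; the ASCII final letter [A-Za-z] is an assumption.

```autoit
#include <StringConstants.au3> ; $STR_REGEXPARRAYGLOBALMATCH (= 3)

Local $s = "Hello, " & Chr(27) & "[ 40;31mRed " & Chr(27) & "[mPlain" & Chr(27) & "[ 47;32 !!!"

; 1) ESC [ + any run of digits/semicolons/whitespace + one final letter
; 2) a run of text without ESC
; 3) a stray ESC that does not start a valid sequence, kept as its own element
Local $aRes = StringRegExp($s, "\x1B\[[\d;\s]*[A-Za-z]|[^\x1B]+|\x1B", $STR_REGEXPARRAYGLOBALMATCH)

; Lossless check: gluing the elements back together must rebuild the input
Local $sJoined = ""
For $i = 0 To UBound($aRes) - 1
    $sJoined &= $aRes[$i]
    ConsoleWrite($i & ": " & StringReplace($aRes[$i], Chr(27), "esc") & @CRLF)
Next
ConsoleWrite("Lossless: " & ($sJoined == $s) & @CRLF) ; Lossless: True
```

Here the seven elements are "Hello, ", esc[ 40;31m, "Red ", esc[m, "Plain", a lone esc, and "[ 47;32 !!!".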

 



Thanks a lot @jchd, now it seems "perfect"... or at least, for the use I have to make of it, it's more than OK.
(P.S. I'm playing around with a simple ANSI file viewer.)
Many thanks to everyone.



... one more question about this, please:

how can @jchd's patterns be modified so that, in addition to capturing escape sequences, they also capture some single control characters, say Chr(10), Chr(13) and Chr(30)? (Maybe by creating a group of control codes that can easily be modified by adding or deleting some of them in the pattern?) So, for example, the following string

"Hello, " & Chr(27) & "[ 40;31mThis is red on black " & chr(13) & "today is Tuesday" & chr(10) & " ... and so on"

could be transformed into the following array

[0] Hello, 
[1] esc[ 40;31m
[2] This is red on black 
[3] @CR
[4] today is Tuesday
[5] @LF
[6] ... and so on
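A one-step alternative, sketched here as an assumption rather than a tested recipe: keep the wanted control characters in one variable and splice it both into a class of its own and into the negated text class, so that each listed control character becomes its own element, even in the middle of a text run. The escape-sequence part uses a flattened [\d;\s]* parameter class and an ASCII final letter; the variable name $sCtrl is illustrative.

```autoit
#include <StringConstants.au3> ; $STR_REGEXPARRAYGLOBALMATCH (= 3)

; Editable list of single control characters to split out (PCRE escapes, no brackets)
Local $sCtrl = "\r\n\x1E" ; Chr(13), Chr(10), Chr(30)

; 1) ESC sequence  2) one listed control char  3) text without ESC or listed controls  4) stray ESC
Local $sPattern = "\x1B\[[\d;\s]*[A-Za-z]|[" & $sCtrl & "]|[^\x1B" & $sCtrl & "]+|\x1B"

Local $s = "Hello, " & Chr(27) & "[ 40;31mThis is red on black " & Chr(13) & "today is Tuesday" & Chr(10) & " ... and so on"
Local $aRes = StringRegExp($s, $sPattern, $STR_REGEXPARRAYGLOBALMATCH)

For $i = 0 To UBound($aRes) - 1
    ; Make ESC, CR and LF visible in the console
    ConsoleWrite($i & ": [" & StringReplace(StringReplace(StringReplace($aRes[$i], Chr(27), "esc"), @CR, "@CR"), @LF, "@LF") & "]" & @CRLF)
Next
```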


Thanks for any help.

Edited by Chimp


Instead of wandering down hazardous wild paths, I would personally use a second step as a workaround :idiot:
(BTW, probably much easier to manage)

#Include <Array.au3>

Local $s = "Hello, " & Chr(27) & "[ 40;31mThis is red on black " & chr(13) & _
            Chr(27) & "[ 47;32m And this is green on white" & chr(10) & _
            Chr(27) & "[ 47 ;  32   m And this is also green on white (more spaces)" & _
            Chr(27) & "[1234567890123456798Kvftio this matches the definition, albeit probably invalid ANSI escape" & _
            Chr(27) & "[ 47;32 !!!! this is not an ANSI sequence" & _
            Chr(27) & "[ z but this one is OK and final."
Local $aRes = StringRegExp($s, "(?x)" & _
                            "(?(DEFINE) (?<ANSI_Escape> \[ (?:\s*\d*\s*;?)* [[:alpha:]]) )" & _
                            "(?| \x1B (?&ANSI_Escape) | (?:[^\x1B] (?!(?&ANSI_Escape)))+ )", 3)
;_ArrayDisplay($aRes, "Mixed results")

$s2 = _ArrayToString($aRes, "*")
$s2 = StringRegExpReplace($s2, '(?=[\r\n])', "*")
$aRes2 = StringSplit($s2, '*', 2)
_ArrayDisplay($aRes2, "Mixed results2")

;check
;Msgbox(0,"", "chr(" & asc($aRes2[3]) & ")" &@cr& "chr(" & asc($aRes2[6]) & ")" )

 


Thanks @mikell, I was thinking about how to modify that good pattern by @jchd, but you are right: this way is easier to manage and does its job well. Thanks for this two-step version.
P.S. I see that I can add or delete other control codes to be caught (e.g. Chr(30)) by changing the list in the pattern, using hex notation to specify exactly the wanted control codes. Also, I've used Chr(0) as the separator so that I can safely use the asterisk within the main text. Is it OK like this, or is there a better way?
Thanks again for the help.

#include <Array.au3>

Local $s = "Hello, " & Chr(27) & "[ 40;31mThis is red* on black " & Chr(30) & _
        Chr(27) & "[ 47;32m And this is green* on white" & Chr(10) & _
        Chr(27) & "[ 47 ;  32   m And this is also green on white (more spaces)" & _
        Chr(27) & "[1234567890123456798Kvftio this matches the definition, albeit probably invalid ANSI escape" & _
        Chr(27) & "[ 47;32 !!!! this is not an ANSI sequence" & _
        Chr(27) & "[ z but this one is OK and final."
Local $aRes = StringRegExp($s, "(?x)" & _
        "(?(DEFINE) (?<ANSI_Escape> \[ (?:\s*\d*\s*;?)* [[:alpha:]]) )" & _
        "(?| \x1B (?&ANSI_Escape) | (?:[^\x1B] (?!(?&ANSI_Escape)))+ )", 3)
; _ArrayDisplay($aRes, "Mixed results")

$s2 = _ArrayToString($aRes, Chr(0))
$s2 = StringRegExpReplace($s2, '(?=[\x1E\x0A])', Chr(0))
$aRes2 = StringSplit($s2, Chr(0), 2)
_ArrayDisplay($aRes2, "Mixed results2")

;check
MsgBox(0, "", "chr(" & Asc($aRes2[3]) & ")" & @CR & "chr(" & Asc($aRes2[6]) & ")")

 

 

Also, another question please: how can I also clean the string of those control codes with another pattern?
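One way to do it, as a minimal sketch: put both kinds of match into a single alternation and replace them with an empty string. The control list in $sCtrl (Chr(13), Chr(10), Chr(30)) and the flattened parameter class are assumptions chosen to mirror the examples in this thread.

```autoit
; Editable control list: PCRE escapes, no brackets
Local $sCtrl = "\r\n\x1E"
Local $s = "Hello, " & Chr(27) & "[ 40;31mThis is red on black " & Chr(30) & "today" & Chr(10) & " and so on"

; Remove every ESC [ params letter sequence and every listed control character in one pass
Local $sClean = StringRegExpReplace($s, "\x1B\[[\d;\s]*[A-Za-z]|[" & $sCtrl & "]", "")
ConsoleWrite($sClean & @CRLF) ; Hello, This is red on black today and so on
```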

Edited by Chimp

small minds discuss people average minds discuss events great minds discuss ideas.... and use AutoIt....

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
Sign in to follow this  

  • Recently Browsing   0 members

    No registered users viewing this page.
