regexp and ANSI escape sequences


Hi, I would like to extract all ISO escape sequences embedded in a string and separate them from the rest of the string while keeping the information about their position, so that, for example, a string like this one (or an even more complex one):

(the string could start with normal text or iso sequences)
 

'\u001B[4mUnicorn\u001B[0m'

should be 'transformed' into an array like this:

$a[0] = '\u001B[4m'   ; first ISO escape sequence
$a[1] = 'Unicorn'     ; normal text
$a[2] = '\u001B[0m'   ; second ISO escape sequence
... and so on

(Note: in the string above the control codes are written in escaped notation, for example "\u001B" for the ASCII "esc" char, and a similar notation is used for other control chars. In the real string to be parsed those control chars are embedded as single bytes with a value from 01 to 31.) There are many examples of real ANSI text files at this link: http://artscene.textfiles.com/ansi/

Searching the web, I found some possible solutions that use regexp for a similar purpose. Among them, the regexp pattern posted by kfir at the following link (https://stackoverflow.com/questions/14693701/how-can-i-remove-the-ansi-escape-sequences-from-a-string-in-python) seems able to catch a wider range of ISO escape sequences (not only color sequences), but my lack of regexp skills prevents me from evaluating and testing such patterns.
I would be very grateful if some regexp guru could come to my rescue...

Thanks, everybody, for reading...

Edited by Chimp

 

image.jpeg.9f1a974c98e9f77d824b358729b089b0.jpeg Chimp

small minds discuss people average minds discuss events great minds discuss ideas.... and use AutoIt....


I discovered that while you can cajole regexp into giving you the position of arbitrary escape sequences, it is much slower than just walking the string with StringInStr, on the order of 2:1 or 3:1.

You are better off doing this:

Local $yourstr = "Mytest" & Chr(1) & "String" & Chr(0) & "Mytest" & Chr(3) & "String" & "Mytest" & Chr(31) & "String"
;------------------------------
Local Const $STR_CASESENSE = 1 ; 1 = case-sensitive comparison (same value as in StringConstants.au3)
Local $sMatch
For $i = 0 To 31
    $sMatch &= Chr($i)
Next

For $i = 1 To StringLen($yourstr)
    $sChr = StringMid($yourstr, $i, 1)
    If StringInStr($sMatch, $sChr, $STR_CASESENSE) > 0 Then ;(note the casesense)
        ConsoleWrite("Esc \" & Asc($sChr) & " @ Pos:" & $i & @CRLF)
        ;More Processing
    EndIf
Next
;---------------------------------------------------------
;OR Faster

For $i = 1 To BinaryLen($yourstr)
    $iChr = BinaryMid($yourstr, $i, 1)

    If $iChr < 32 Then
        ConsoleWrite("Esc " & $iChr & " @ Pos:" & $i & @CRLF)
        ;More Processing
    EndIf
Next

Output:

Esc \1 @ Pos:7
Esc \0 @ Pos:14
Esc \3 @ Pos:21
Esc \31 @ Pos:34
Esc 0x01 @ Pos:7
Esc 0x00 @ Pos:14
Esc 0x03 @ Pos:21
Esc 0x1F @ Pos:34

 

Edited by Bilgus

Thanks @Bilgus for your reply, but maybe I have not been very clear about what I'm trying to achieve (sorry).
Using your way I can catch all "single" control chars found along the string and get the location where each was found, but an ISO "escape sequence" is a 'complex' set of several chars starting with the two bytes chr(27) & '[' and followed by a variable number of chars depending on the particular sequence itself. I would like to catch "any" of the many possible escape sequences, splitting them from the whole string as described in post #1.

Edited by Chimp

 


Ah, sorry, I misunderstood. I thought you were looking for their start positions, but you want to split on them.

Maybe like this:

#include <Array.au3>


; Test string built from a real ANSI file; split over continuation lines ( _ ) so the statement stays valid.
Local $yourstr = 'Regulartext' & CHR(0x1B) & '[40m' & CHR(0x1B) & '[2J' & CHR(0x1B) & '[2C' & CHR(0x1B) & '[0;34m³   ' & CHR(0x1B) & '[1;37m³   ' & CHR(0x1B) & '[0;34m³' & CHR(0x1B) & '[31mÜ' & CHR(0x1B) & '[33mßßß' & CHR(0x1B) & '[s' & CHR(0x1B) & '[u' & _
    CHR(0x1B) & '[31mßßßßßßßÛßßß' & CHR(0x1B) & '[33mß' & CHR(0x1B) & '[31mß' & CHR(0x1B) & '[33mß' & CHR(0x1B) & '[31mß' & CHR(0x1B) & '[33mßßß' & CHR(0x1B) & '[31mß' & CHR(0x1B) & '[s' & CHR(0x1B) & '[uÛßßßß' & CHR(0x1B) & '[33mß' & CHR(0x1B) & '[31mß' & CHR(0x1B) & '[33mßßß' & _
    CHR(0x1B) & '[31mÛßßßßßßßßÛßßßßß' & CHR(0x1B) & '[33mß' & CHR(0x1B) & '[s' & CHR(0x1B) & '[u' & CHR(0x1B) & '[31mßÛßß' & CHR(0x1B) & '[33mß' & CHR(0x1B) & '[31mß' & CHR(0x1B) & '[33mßß' & CHR(0x1B) & '[31mßßß' & CHR(0x1B) & '[33mß' & CHR(0x1B) & '[31mÜ' & CHR(0x1B) & '[34m³' & CHR(0x1B) & '[s' & _
    CHR(0x1B) & '[u   ³' & CHR(0x1B) & '[2;1H  ÄÄÄÄ' & CHR(0x1B) & '[1;37mÛ' & CHR(0x1B) & '[0;34mÄÄ' & CHR(0x1B) & '[31mÜ' & CHR(0x1B) & '[33mß' & CHR(0x1B) & '[34mÄ' & CHR(0x1B) & '[s' & CHR(0x1B) & '[u' & _
    CHR(0x1B) & '[1;35mÜÛÛÛÛÛÛÛÛÜ ÛÛÛÛÛÛÛÛÛÛÛÜßÛÛÛÛÛÛÛß ÛÛÛÛÛÛÛ Ü²ÛÛÛÛ' & CHR(0x1B) & '[s' & CHR(0x1B) & '[uÛÛÜßÛÛÛÛÛÛÛÛÛÜ' & CHR(0x1B) & '[0;33mßÜ' & CHR(0x1B) & '[34mÄ' & CHR(0x1B) & '[1;37mß' & CHR(0x1B) & '[0;34m³Ä' & CHR(0x1B) & '[3;1H  ' & CHR(0x1B) & '[s' & _
    CHR(0x1B) & '[u³   ' & CHR(0x1B) & '[1;37mÛ  ' & CHR(0x1B) & '[0;33m±' & CHR(0x1B) & '[34m³' & CHR(0x1B) & '[1;35m²²ÛÛÛÛÛÛÛÛÛÝÛÛÛÛÛÛÛ' & CHR(0x1B) & '[s' & _
    CHR(0x1B) & '[uÛÛÛÛÛÛÜßÛÛÛß ÜÞÛÛÛ²ÛÛÛÞ²ÛÛÛ ²ÛÛÛÝÛÛÛÛÛÛÛÛÛÛÛÜ' & CHR(0x1B) & '[0;31mß' & CHR(0x1B) & '[sUnicorn' & CHR(0x1B) & '[u' & CHR(0x1B) & '[1;37mÛ' & CHR(0x1B) & '[4;1H  ' & CHR(0x1B) & '[0;34mÄÄ' & CHR(0x1B) & '[1mÄÄ' & CHR(0x1B) & '[37mÛ' & CHR(0x1B) & '[34mÄ' & CHR(0x1B) & '[30m°' & CHR(0x1B) & '[s' & _
    CHR(0x1B) & '[u' & CHR(0x1B) & '[0;33m°' & CHR(0x1B) & '[1;30m°' & CHR(0x1B) & '[35m±Û²ÛÛÛßßÞÛß ²Û²Û' & CHR(0x1B) & '[30m°°±±² ' & CHR(0x1B) & '[35mÜ' & CHR(0x1B) & '[s' & CHR(0x1B) & '[uÜÜÜÜÛß Ü²ÛÞßÞ²Û²Û Þ±²ÛÛ ±²ÛÛݲÛÛ' & CHR(0x1B) & _
    '[30m°°±²' & CHR(0x1B) & '[35mÛÛÛÛÛ' & CHR(0x1B) & '[s' & CHR(0x1B) & '[u' & CHR(0x1B) & '[30m°' & CHR(0x1B) & '[37mÛ' & CHR(0x1B) & '[30m±' & CHR(0x1B) & '[0;34mÄ' & CHR(0x1B) & '[5;1H  ' & CHR(0x1B) & '[1;37mÜ  ' & CHR(0x1B) & '[30m°' & CHR(0x1B) & '[s' & _
    CHR(0x1B) & '[u' & CHR(0x1B) & '[37mÛ' & CHR(0x1B) & '[30m°±' & CHR(0x1B) & '[37mÜ' & CHR(0x1B) & '[30m°' & CHR(0x1B) & '[35m°±²²ÛBLAG  þß   ²±ÛÛ ÛÜ   ²Û' & CHR(0x1B) & '[s' & CHR(0x1B) & '[uÛÛß °±²²Û  Þ±²ÛÛ' & CHR(0x1B) & '[30m²' & _
    CHR(0x1B) & '[35mÞ°±²Û ²²ÛÛݱ²Û' & CHR(0x1B) & '[30mÄ' & CHR(0x1B) & '[0;34m' & CHR(0x1B) & '[s' & CHR(0x1B) & '[uÄ' & CHR(0x1B) & '[1;35mÜÛÛÛÛßXKCD?' & CHR(0x1B) & '[30m°' & CHR(0x1B) & '[0;33mÜ' & CHR(0x1B) & ''
;------------------------------
$aMatch = StringRegExp($yourstr, "[^\x1B]++|\x1B\[[0-?]*[ -/]*[@-~]", 3)
ConsoleWrite("Err:" & @Error & @CRLF)
;---------------------------------------------------------
_ArrayDisplay($aMatch)

I grabbed that regex from your linked Stack Overflow page; you could just '|' other beginning sequences onto it as well.

If you can link me a good file that fails, I can build it up a bit more.


Hi @Bilgus, thanks for your regexp pattern.
At first glance it seems to work... I will do some 'intensive' testing...
P.S. You say you grabbed that pattern from the link I posted in post #1. Since there are several patterns on that page, isn't the one located a bit further down (the penultimate), posted by the user "kfir", able to catch a wider range of ISO escape sequences? (I say this only because that pattern is longer, so I suppose that it makes a more accurate search... just a wild guess.) Thanks again.

 


I don't know. The first one seems to match more, but perhaps its captures include some invalid sequences. You can try this one: it won't grab the non-escape text, but you can see whether it catches any more sequences than the first and let me know the format of the extras that are captured.

(\e\[\??\d+[hl]|\e[=<>a-kzNM78]|\e[\(\)][a-b0-2]|\e\[\d{0,2}[ma-dgkjqi]|\e\[\d+;\d+[hfy]?|\e\[;?[hf]|\e#[3-68]|\e[01356]n|\eO[mlnp-z]?|\e/Z|\e\d+|\e\[\?\d;\d0c|\e\d;\dR)

also the first one can be rewritten as

[^\e]++|\e\[[0-?]*[ -/]*[@-~]

apparently \e is the escape character
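
A quick way to check that claim is to test both notations against a string that holds only the ESC byte. This is a minimal sketch, assuming the default StringRegExp mode (flag 0), which returns 1 on a match and 0 otherwise; the variable name $sEsc is only for illustration.

```autoit
Local $sEsc = Chr(27) ; the ESC control character, hex 1B

; With the default flag (0) StringRegExp returns 1 on a match, 0 otherwise.
ConsoleWrite("\e   matches ESC: " & StringRegExp($sEsc, "^\e$") & @CRLF)
ConsoleWrite("\x1B matches ESC: " & StringRegExp($sEsc, "^\x1B$") & @CRLF)
```

If both lines report 1, the two spellings are interchangeable inside a pattern.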

Edited by Bilgus

Ok So I found the list that other poster used to make his captures

I think this is about right

[^\e]++|\e\[?[!-\?]*[!-\/]*[0-~]*
#include <Array.au3>


Local $yourstr = "Regulartext" & CHR(0x1B) &  "[20h Set new line mode   LMN" & CHR(0x1B) &  "[?1h   Set cursor key to application   DECCKM none Set ANSI (versus VT52)  DECANM" & _
CHR(0x1B) &  "[?3h  Set number of columns to 132    DECCOLM" & CHR(0x1B) &  "[?4h   Set smooth scrolling    DECSCLM" & CHR(0x1B) &  "[?5h   Set reverse video on screen DECSCNM" & _
CHR(0x1B) &  "[?6h  Set origin to relative  DECOM" & CHR(0x1B) &  "[?7h Set auto-wrap mode  DECAWM" & CHR(0x1B) &  "[?8h    Set auto-repeat mode    DECARM" & _
CHR(0x1B) &  "[?9h  Set interlacing mode    DECINLM  " & CHR(0x1B) &  "[20l Set line feed mode  LMN" & CHR(0x1B) &  "[?1l   Set cursor key to cursor    DECCKM" & CHR(0x1B) &  _
"[?2l   Set VT52 (versus ANSI)  DECANM" & CHR(0x1B) &  "[?3l    Set number of columns to 80 DECCOLM" & CHR(0x1B) &  "[?4l   Set jump scrolling  DECSCLM" & CHR(0x1B) &  "[?5l   Set normal video on screen  DECSCNM" & _
CHR(0x1B) &  "[?6l  Set origin to absolute  DECOM" & CHR(0x1B) &  "[?7l Reset auto-wrap mode    DECAWM" & CHR(0x1B) &  "[?8l    Reset auto-repeat mode  DECARM" & CHR(0x1B) &  "[?9l    Reset interlacing mode  DECINLM  " & _
CHR(0x1B) &  "= Set alternate keypad mode   DECKPAM" & CHR(0x1B) &  ">  Set numeric keypad mode DECKPNM  " & _
CHR(0x1B) &  "(A   Set United Kingdom G0 character set setukg0" & CHR(0x1B) &  ")A Set United Kingdom G1 character set setukg1" & _
CHR(0x1B) &  "(B Set United States G0 character set  setusg0" & CHR(0x1B) &  ")B Set United States G1 character set  setusg1" & _
CHR(0x1B) &  "(0 Set G0 special chars. & line set    setspecg0" & CHR(0x1B) &  ")0   Set G1 special chars. & line set    setspecg1" & _
CHR(0x1B) &  "(1   Set G0 alternate character ROM  setaltg0" & CHR(0x1B) &  ")1    Set G1 alternate character ROM  setaltg1" & _
CHR(0x1B) &  "(2    Set G0 alt char ROM and spec. graphics  setaltspecg0" & CHR(0x1B) &  ")2    Set G1 alt char ROM and spec. graphics  setaltspecg1  " & _
CHR(0x1B) &  "N   Set single shift 2  SS2" & CHR(0x1B) &  "O  Set single shift 3  SS3  " & _
CHR(0x1B) &  "[m   Turn off character attributes   SGR0" & CHR(0x1B) &  "[0m   Turn off character attributes   SGR0" & _
CHR(0x1B) &  "[1m   Turn bold mode on   SGR1" & CHR(0x1B) &  "[2m   Turn low intensity mode on  SGR2" & _
CHR(0x1B) &  "[4m   Turn underline mode on  SGR4" & CHR(0x1B) &  "[5m   Turn blinking mode on   SGR5" & _
CHR(0x1B) &  "[7m   Turn reverse video on   SGR7" & CHR(0x1B) &  "[8m   Turn invisible text mode on SGR8  " & _
CHR(0x1B) &  "[Line;Liner Set top and bottom lines of a window    DECSTBM  " & _
CHR(0x1B) &  "[ValueA  Move cursor up n lines  CUU" & CHR(0x1B) &  "[ValueB    Move cursor down n lines    CUD" & _
CHR(0x1B) &  "[ValueC    Move cursor right n lines   CUF" & CHR(0x1B) &  "[ValueD    Move cursor left n lines    CUB" & _
CHR(0x1B) &  "[H Move cursor to upper left corner    cursorhome" & CHR(0x1B) &  "[;H Move cursor to upper left corner    cursorhome" & _
CHR(0x1B) &  "[Line;ColumnH   Move cursor to screen location v,h  CUP" & _
CHR(0x1B) &  "[f Move cursor to upper left corner    hvhome" & CHR(0x1B) &  "[;f Move cursor to upper left corner    hvhome" & _
CHR(0x1B) &  "[Line;Columnf   Move cursor to screen location v,h  CUP" & _
CHR(0x1B) &  "D  Move/scroll window up one line  IND" & CHR(0x1B) &  "M  Move/scroll window down one line    RI" & _
CHR(0x1B) &  "E   Move to next line   NEL" & CHR(0x1B) &  "7  Save cursor position and attributes DECSC" & _
CHR(0x1B) &  "8    Restore cursor position and attributes  DECSC  " & CHR(0x1B) &  "H  Set a tab at the current column HTS" & _
CHR(0x1B) &  "[g Clear a tab at the current column   TBC" & CHR(0x1B) &  "[0g    Clear a tab at the current column   TBC" & _
CHR(0x1B) &  "[3g    Clear all tabs  TBC  " & CHR(0x1B) &  "#3   Double-height letters, top half DECDHL"
Local $yourstr2 = CHR(0x1B) &  "#4  Double-height letters, bottom half  DECDHL" & CHR(0x1B) &  "#5  Single width, single height letters DECSWL" & CHR(0x1B) & _
"#6 Double width, single height letters DECDWL  " & CHR(0x1B) &  "[K    Clear line from cursor right    EL0" & CHR(0x1B) &  "[0K    Clear line from cursor right    EL0" & CHR(0x1B) &  "[1K    Clear line from cursor left EL1" & CHR(0x1B) &  "[2K    Clear entire line   EL2  " & CHR(0x1B) &  "[J   Clear screen from cursor down   ED0" & CHR(0x1B) &  "[0J Clear screen from cursor down ED0" & CHR(0x1B) & "[1J Clear screen from cursor up ED1" & _
 CHR(0x1B) &  "[2J Clear entire screen ED2  " & _
 CHR(0x1B) &  "5n Device status report DSR" & _
 CHR(0x1B) &  "0n Response: terminal is OK DSR" & _
 CHR(0x1B) &  "3n Response: terminal is not OK DSR  " & _
 CHR(0x1B) &  "6n Get cursor position DSR" & _
 CHR(0x1B) &  "Line;ColumnR Response: cursor is at v,h CPR  " & _
 CHR(0x1B) &  "[c Identify what terminal type DA" & _
 CHR(0x1B) &  "[0c Identify what terminal type (another) DA" & _
 CHR(0x1B) &  "[?1;Value0c Response: terminal type code n DA  " & _
 CHR(0x1B) &  "c Reset terminal to initial state RIS  " & _
 CHR(0x1B) &  "#8 Screen alignment display DECALN" & _
 CHR(0x1B) &  "[2;1y Confidence power up test DECTST" & _
 CHR(0x1B) &  "[2;2y Confidence loopback test DECTST" & _
 CHR(0x1B) &  "[2;9y Repeat power up test DECTST" & _
 CHR(0x1B) &  "[2;10y Repeat loopback test DECTST  " & _
 CHR(0x1B) &  "[0q Turn off all four leds DECLL0" & _
 CHR(0x1B) &  "[1q Turn on LED #1 DECLL1" & _
 CHR(0x1B) &  "[2q Turn on LED #2 DECLL2" & _
 CHR(0x1B) &  "[3q Turn on LED #3 DECLL3" & _
 CHR(0x1B) &  "[4q Turn on LED #4 DECLL4     Codes for use in VT52 compatibility mode" & _
 CHR(0x1B) &  "< Enter/exit ANSI mode (VT52) setansi  " & _
 CHR(0x1B) &  "= Enter alternate keypad mode altkeypad" & _
 CHR(0x1B) &  "> Exit alternate keypad mode numkeypad  " & _
 CHR(0x1B) &  "F Use special graphics character set setgr" & _
 CHR(0x1B) &  "G Use normal US/UK character set resetgr  " & _
 CHR(0x1B) &  "A Move cursor up one line cursorup" & _
 CHR(0x1B) &  "B Move cursor down one line cursordn" & _
 CHR(0x1B) &  "C Move cursor right one char cursorrt" & _
 CHR(0x1B) &  "D Move cursor left one char cursorlf" & _
 CHR(0x1B) &  "H Move cursor to upper left corner cursorhome" & _
 CHR(0x1B) &  "LineColumn Move cursor to v,h location cursorpos(v,h)" & _
 CHR(0x1B) &  "I Generate a reverse line-feed revindex  " & _
 CHR(0x1B) &  "K Erase to end of current line cleareol" & _
 CHR(0x1B) &  "J Erase to end of screen cleareos  " & _
 CHR(0x1B) &  "Z Identify what the terminal is ident" & _
 CHR(0x1B) &  "/Z Correct response to ident identresp       VT100 Special Key Codes These are sent from the terminal back to the computer when the particular key is pressed. Note that the numeric keypad keys send different codes in numeric mode than in alternate mode. See" & _
 CHR(0x1B) &  "ape codes above to change keypad mode.     Function Keys: " & _
 CHR(0x1B) &  "9 PF1" & _
 CHR(0x1B) &  "OQ PF2" & _
 CHR(0x1B) &  "OR PF3" & _
 CHR(0x1B) &  "OS PF4    Arrow Keys:    Reset Set up" & CHR(0x1B) &  "A" & CHR(0x1B) &  "OA down" & CHR(0x1B) &  "B" & CHR(0x1B) &  "OB right" & CHR(0x1B) &  "C" &  CHR(0x1B) &  "OC left" & CHR(0x1B) &  "D" & CHR(0x1B) &  "OD    Numeric Keypad Keys: " & _
 CHR(0x1B) &  "Op 0" & _
 CHR(0x1B) &  "Oq 1" & _
 CHR(0x1B) &  "Or 2" & _
 CHR(0x1B) & "Os 3" & _
 CHR(0x1B) & "Ot4" & _
 CHR(0x1B) & "Ou5" & _
 CHR(0x1B) & "Ov6" & _
 CHR(0x1B) & "Ow7" & _
 CHR(0x1B) & "Ox8" & _
 CHR(0x1B) & "Oy9" & _
 CHR(0x1B) & "Om-(minus)" & _
 CHR(0x1B) & "Ol,(comma)" & _
 CHR(0x1B) & "On.(period)" & _
 CHR(0x1B) & "OM^M    Printing: " & _
 CHR(0x1B) & "[iPrint ScreenPrint the current screen" & _
 CHR(0x1B) & "[1iPrint LinePrint the current line" & _
 CHR(0x1B) & "[4iStop Print LogDisable log" & _
 CHR(0x1B) & "[5iStart Print LogStart log; all received text is echoed to a printer"
;------------------------------
$aMatch = StringRegExp($yourstr & $yourstr2, "[^\e]++|\e\[?[!-\?]*[!-\/]*[0-~]*", 3)
ConsoleWrite("Err:" & @Error & @CRLF)
;---------------------------------------------------------
_ArrayDisplay($aMatch)

 


Hi @Bilgus, thanks for your post,

From a quick test I see that the main problem is that the ANSI escape sequences tested here all end with a white space, while usually they don't end with a white space (nor with a lowercase 'm', as we might expect). The normal text following an escape sequence is attached directly to the end of the sequence without any spaces.
I think this is the main difficulty: identifying an escape sequence embedded in the "middle" of text, where the start of the sequence is marked by the 2 bytes "esc" + "[" while the end of the sequence depends on the particular sequence itself...

Edited by Chimp

 


According to what is exemplified at this link, http://man7.org/linux/man-pages/man4/console_codes.4.html (see the ECMA-48 CSI sequences section), ANSI/ISO escape sequences should conform to the following "rules":

  1. They start with the so-called CSI (Control Sequence Introducer), which is nothing more than the 2 codes chr(27) & chr(91), that is "esc[".
  2. Then there can be none, one or more decimal numbers separated by semicolons.
  3. At the end comes an alphabetic letter that terminates the sequence; upper and lowercase are distinct. This final letter determines the action to be taken. (White spaces within the escape sequence, that is between <esc> and the last char, should be ignored.)

 

[image: ch3examp2a]

In short, that's all.

The desired regexp should capture any sequence conforming to points 1, 2 and 3 and return a global match array in which "normal text" is returned in one element; as soon as an escape sequence is caught, it is stored in the next element, the following normal text in the element after that, and so on until the end of the text.
So, for example, the following text
 

"Hello, " & chr(27) & "[ 40;31mThis is red on black " & chr(27) & "[ 47;32m And this is green on white"

should be split into array elements like the following, for example:

[0] Hello, 
[1] esc[ 40;31m
[2] This is red on black 
[3] esc[ 47;32m
[4]  And this is green on white

Another similar(?) regexp should instead remove all escape sequences from the whole string, returning only the cleaned "normal" text.

... I don't know if the above is GIGO, but ANSI escape sequences work more or less like that. Are there regular expressions capable of producing the above array and the cleaned string?
Sorry if I was a bit wordy... any help will be very welcome. Thank you in advance.

Edited by Chimp
inserted image

 


Don't know if that's GIGO below but I believe it fits the specs:

#include <Array.au3> ; needed for _ArrayDisplay

Local $s = "Hello, " & Chr(27) & "[ 40;31mThis is red on black " & _
            Chr(27) & "[ 47;32m And this is green on white" & _
            Chr(27) & "[ 47 ;  32   m And this is also green on white (more spaces)" & _
            Chr(27) & "[1234567890123456798Kvftio this matches the definition, albeit probably invalid ANSI escape" & _
            Chr(27) & "[ 47;32 !!!! this is not an ANSI sequence"
Local $aRes = StringRegExp($s, "(?x)" & _
                            "(?(DEFINE) (?<ANSI_Escape> \[\s*\d+\s* (?:;\s*\d+\s*)* [[:alpha:]]) )" & _
                            "(?| \x1B (?&ANSI_Escape) | (?:[^\x1B] (?!(?&ANSI_Escape)))+ )", 3)
_ArrayDisplay($aRes, "Mixed results")

Local $sClean = StringRegExpReplace($s, "(?x) (\x1B \[\s*\d+\s* (?:;\s*\d+\s*)* [[:alpha:]])",  "")
MsgBox(0, "Pure text", $sClean)

The advantage of using DEFINE in such a regexp is that you can very explicitly specify what's what and give it a name, using a construct equivalent to a subroutine in a procedural programming language. Using (?x) allows insignificant whitespace in the regexp, making it easier to read/dissect/understand.
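
To see the DEFINE mechanism in isolation, here is a small sketch on a deliberately simpler task: finding semicolon-separated number lists. The subpattern name Number, the sample text and the variable names are assumptions made up for this illustration; they are not part of the ANSI pattern above.

```autoit
#include <Array.au3>

; (?x)            : whitespace inside the pattern is ignored
; (?(DEFINE) ...) : declares named subpatterns, matches nothing by itself
; (?&Number)      : calls the named subpattern, like calling a subroutine
Local $sPattern = "(?x)" & _
        "(?(DEFINE) (?<Number> \d+ ) )" & _
        "(?&Number) (?: ; (?&Number) )*"

; Flag 3 returns an array with every match found in the text.
Local $aLists = StringRegExp("40;31 and 7 and 1;2;3", $sPattern, 3)
ConsoleWrite(_ArrayToString($aLists, " | ") & @CRLF)
```

The definition is written once and reused twice, so changing what counts as a number (for example allowing surrounding spaces) only needs an edit in one place.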

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


@Jchd have you seen that spec lol

I think this should do it

"\x9b|\e\[[:<=>?]?[\d;]*[\x20-/]?[@-~]|\cX|\ea|\x9d|\e]|\x9f|\e_|\x90|\eP|\x9e|\e\^(?:[\x0f\x0e\t-\r\x20-\x7e]*|[\xA0-\xFE\t-\r\x20-\x7e]*)\cG|\x9c|\e\\|\cX|\ea|\x98|\eX[^\x98\x9c]*?\cG|\x9c|\e\\|\cX|\ea|\e[\x20-/]*(?:[0-~]|\cX|\ea)"

@Chimp you just aren't going to get a single regexp that will do everything you want. Use this one to gather the esc sequences, then run it again to replace them with an easy-to-use sentinel.

Split on that sentinel and rebuild from there.

I take no credit for the regexp above; I used the definitions here to build it:

https://metacpan.org/source/JOSEF/Ecma48-Util-0.01/lib/Ecma48/Util.pm


Something like this

#include <Array.au3>
#include <StringConstants.au3>

;........
Local $sRegEsc = "\x9b|\e\[[:<=>?]?[\d;]*[\x20-/]?[@-~]|\cX|\ea|\x9d|\e]|\x9f|\e_|\x90|\eP|\x9e|\e\^(?:[\x0f\x0e\t-\r\x20-\x7e]*|[\xA0-\xFE\t-\r\x20-\x7e]*)\cG|\x9c|\e\\|\cX|\ea|\x98|\eX[^\x98\x9c]*?\cG|\x9c|\e\\|\cX|\ea|\e[\x20-/]*(?:[0-~]|\cX|\ea)";;;

Local $aMatch = StringRegExp($yourstr, $sRegEsc, 3)
ConsoleWrite("Err:" & @error & @CRLF)

Local $sEscSentinel = "[#ESCSEQ#]"
Local $sStringRem = StringRegExpReplace($yourstr, $sRegEsc, "^###^" & $sEscSentinel & "^###^")
ConsoleWrite("Err:" & @error & @CRLF)

Local $aResult = StringSplit($sStringRem, "^###^", $STR_ENTIRESPLIT)
ConsoleWrite("Err:" & @error & @CRLF)

If IsArray($aMatch) And IsArray($aResult) Then
    Local $iEsc = 0
    For $i = 1 To $aResult[0]
        If $aResult[$i] == $sEscSentinel Then
            $aResult[$i] = $aMatch[$iEsc]
            $iEsc += 1
        EndIf
    Next
EndIf
;---------------------------------------------------------
_ArrayDisplay($aResult)

 


Thanks @jchd, your pattern is nearly perfect. Just one small gap: it should also catch sequences where there are no parameters at all between the CSI (esc[) and the final character, or where there is any combination of numbers and/or semicolons, since ANSI treats an absent parameter (including an empty slot between semicolons) as the implicit value 0. So, for example, sequences like the following should all be considered valid: esc[m or esc[;m or esc[3;;4m or esc[0;m or esc[;2;;3m and so on.
Thanks a lot for your pattern
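
The sequences listed above can serve as a quick acceptance check. The following is a minimal sketch, assuming a relaxed parameter part that allows any mix of digits, semicolons and spaces (possibly none) before the final letter; the pattern and the names $aSamples and $sPattern are illustrative assumptions, not the pattern posted earlier in the thread.

```autoit
; The parameter strings quoted above, without the leading ESC.
Local $aSamples[5] = ["[m", "[;m", "[3;;4m", "[0;m", "[;2;;3m"]

; Assumed pattern: ESC "[", then any mix of digits, semicolons and
; spaces (possibly empty), then exactly one terminating letter.
Local $sPattern = "^\x1B\[[\d;\s]*[[:alpha:]]$"

For $sSample In $aSamples
    ; StringRegExp with the default flag returns 1 on a match, 0 otherwise.
    ConsoleWrite("esc" & $sSample & " -> " & StringRegExp(Chr(27) & $sSample, $sPattern) & @CRLF)
Next
```

Any sample reported as 0 would point to a case that the parameter part still rejects.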

 


@Bilgus, thanks for your pattern, but testing it with the example string from @jchd's post above, it also splits, or splits in the wrong way, sequences that are incomplete or contain spaces; perhaps, with a little calibration, the pattern by jchd is OK and is able to correctly catch most of the sequences. Thank you anyway.

 


4 hours ago, Bilgus said:

@Jchd have you seen that spec lol

Oh yes, I know there's much more than what I exemplified. I simply followed Chimp's restricted spec and forgot to make the numeric part optional, a gross oversight. It's been many blue moons since I typed ANSI escapes by heart.

3 hours ago, Chimp said:

perhaps, with a little calibration, the pattern by  jchd is OK and is able to correctly catch most of the sequences.

Should do what you asked for:

#include <Array.au3> ; needed for _ArrayDisplay

Local $s = "Hello, " & Chr(27) & "[ 40;31mThis is red on black " & _
            Chr(27) & "[ 47;32m And this is green on white" & _
            Chr(27) & "[ 47 ;  32   m And this is also green on white (more spaces)" & _
            Chr(27) & "[1234567890123456798Kvftio this matches the definition, albeit probably invalid ANSI escape" & _
            Chr(27) & "[ 47;32 !!!! this is not an ANSI sequence" & _
            Chr(27) & "[ z but this one is OK and final."
Local $aRes = StringRegExp($s, "(?x)" & _
                            "(?(DEFINE) (?<ANSI_Escape> \[ (?:\s*\d*\s*;?)* [[:alpha:]]) )" & _
                            "(?| \x1B (?&ANSI_Escape) | (?:[^\x1B] (?!(?&ANSI_Escape)))+ )", 3)
_ArrayDisplay($aRes, "Mixed results")

Local $sClean = StringRegExpReplace($s, "(?x) (\x1B \[ (?:\s*\d*\s*;?)* [[:alpha:]])",  "")
MsgBox(0, "Pure text", $sClean)

 


Thanks a lot @jchd, now it seems "perfect" .... or, at least, for the use that I have to do of it, it's more than OK.
(P.S. ...I'm playing around with a simple ANSI file viewer.)
Many thanks to everyone

 


... one more question about this, please.

How can the patterns by @jchd be modified so that, in addition to capturing escape sequences, some single control characters are also captured, for example chr(10), chr(13) and chr(30)? (Maybe by creating a group of control codes in the pattern that can be easily modified by adding or deleting some of them?) So, for example, the following string

"Hello, " & Chr(27) & "[ 40;31mThis is red on black " & chr(13) & "today is Tuesday" & chr(10) & " ... and so on"

could be transformed into the following array

[0] Hello, 
[1] esc[ 40;31m
[2] This is red on black 
[3] @CR
[4] today is Tuesday
[5] @LF
[6] ... and so on


thanks for any help

Edited by Chimp

 


Instead of walking into hazardous wild ways, I would personally use a 2nd step as a workaround
(BTW much easier to manage, probably)

#Include <Array.au3>

Local $s = "Hello, " & Chr(27) & "[ 40;31mThis is red on black " & chr(13) & _
            Chr(27) & "[ 47;32m And this is green on white" & chr(10) & _
            Chr(27) & "[ 47 ;  32   m And this is also green on white (more spaces)" & _
            Chr(27) & "[1234567890123456798Kvftio this matches the definition, albeit probably invalid ANSI escape" & _
            Chr(27) & "[ 47;32 !!!! this is not an ANSI sequence" & _
            Chr(27) & "[ z but this one is OK and final."
Local $aRes = StringRegExp($s, "(?x)" & _
                            "(?(DEFINE) (?<ANSI_Escape> \[ (?:\s*\d*\s*;?)* [[:alpha:]]) )" & _
                            "(?| \x1B (?&ANSI_Escape) | (?:[^\x1B] (?!(?&ANSI_Escape)))+ )", 3)
;_ArrayDisplay($aRes, "Mixed results")

$s2 = _ArrayToString($aRes, "*")
$s2 = StringRegExpReplace($s2, '(?=[\r\n])', "*")
$aRes2 = StringSplit($s2, '*', 2)
_ArrayDisplay($aRes2, "Mixed results2")

;check
;Msgbox(0,"", "chr(" & asc($aRes2[3]) & ")" &@cr& "chr(" & asc($aRes2[6]) & ")" )

 


Thanks @mikell. I was thinking about how to modify that good pattern by @jchd, but you are right: this way is easier to manage and it does its job well. Thanks for this 2-step version.
P.S. I see that I can add or delete other control codes to be caught (for example chr(30)) by changing the list in the pattern, using hex notation to specify exactly the wanted control codes. Also, I've used chr(0) as the replacement char so that I can safely use the asterisk within the main text. Is it OK like this, or is there a better way?
Thanks again for the help.

#include <Array.au3>

Local $s = "Hello, " & Chr(27) & "[ 40;31mThis is red* on black " & Chr(30) & _
        Chr(27) & "[ 47;32m And this is green* on white" & Chr(10) & _
        Chr(27) & "[ 47 ;  32   m And this is also green on white (more spaces)" & _
        Chr(27) & "[1234567890123456798Kvftio this matches the definition, albeit probably invalid ANSI escape" & _
        Chr(27) & "[ 47;32 !!!! this is not an ANSI sequence" & _
        Chr(27) & "[ z but this one is OK and final."
Local $aRes = StringRegExp($s, "(?x)" & _
        "(?(DEFINE) (?<ANSI_Escape> \[ (?:\s*\d*\s*;?)* [[:alpha:]]) )" & _
        "(?| \x1B (?&ANSI_Escape) | (?:[^\x1B] (?!(?&ANSI_Escape)))+ )", 3)
; _ArrayDisplay($aRes, "Mixed results")

$s2 = _ArrayToString($aRes, Chr(0))
$s2 = StringRegExpReplace($s2, '(?=[\x1E\x0A])', Chr(0))
$aRes2 = StringSplit($s2, Chr(0), 2)
_ArrayDisplay($aRes2, "Mixed results2")

;check
MsgBox(0, "", "chr(" & Asc($aRes2[3]) & ")" & @CR & "chr(" & Asc($aRes2[6]) & ")")

 

 

Also, another question please: how can I also clean the string of those control codes with another pattern?
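
One possible approach, offered only as a sketch, is to run StringRegExpReplace twice: once to drop the escape sequences and once to drop the chosen single control characters. The escape-sequence pattern below is an assumed simplified form of the one used in this thread (any mix of digits, semicolons and spaces between "esc[" and the final letter), and the character class in the second step is just an example list.

```autoit
Local $s = "Hello, " & Chr(27) & "[ 40;31mThis is red on black " & Chr(30) & _
        "today is Tuesday" & Chr(10) & " ... and so on"

; Step 1: remove the escape sequences (assumed simplified pattern).
Local $sClean = StringRegExpReplace($s, "\x1B\[[\d;\s]*[[:alpha:]]", "")

; Step 2: remove the selected single control characters.
; Edit the class to add or drop codes, e.g. add \x0D for Chr(13).
$sClean = StringRegExpReplace($sClean, "[\x1E\x0A]", "")

ConsoleWrite($sClean & @CRLF)
```

Keeping the two steps separate means the list of control codes can be changed without touching the escape-sequence pattern.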

Edited by Chimp

 
