StringRegExp : keep only number and chars

myspacee

Hello all,

I ran into a strange problem while using StringReplace.

I have a thousand or more files to check in order to create a report.

All of these files have some ASCII control codes at the head (proprietary format) :mellow:

e.g.:

NULNULSTXDC2 some text find also number then again text

Is it possible to:

- open the file

- store everything in a variable

- keep only text/numbers (remove all symbols, ASCII control codes, etc.)

Thank you for reading and for any info,

m.

PS: I can post some files if needed.

Melba23

myspacee,

Not sure you need a SRE for this:

; Create a "binary" string as you would get from a file read in Binary format
$sText = "0x"
For $i = 0 To 127
    $sText &= Hex($i, 2)
Next
MsgBox(0,"Binary String", $sText)

; Move through the "binary" string and remove all characters below 32
$sNewText = "0x"
For $i = 0 to 127
    $sChar = BinaryMid($sText, 1 + $i, 1)
    ConsoleWrite($sChar & @CRLF)
    If $sChar > 31 Then $sNewText &= StringTrimLeft($sChar, 2)
Next
MsgBox(0, "", $sNewText)

If you want to remove other symbols, just change the If in the second loop to a Switch and use as many Case statements as you need. :mellow:
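
A minimal sketch of the Switch idea, assuming you work on an ordinary string with Asc() rather than on the "binary" string above. The sample text and the variable names are made up for illustration, and the kept ranges (digits, letters, space) are just one possible choice:

```autoit
; Sketch: keep only digits, letters and spaces by checking each character code.
; $sRaw is a made-up sample: STX and DC2 control codes followed by text.
Local $sRaw = Chr(2) & Chr(18) & "some text 123 *^@ more text"
Local $sClean = ""
Local $sChar

For $i = 1 To StringLen($sRaw)
    $sChar = StringMid($sRaw, $i, 1)
    Switch Asc($sChar)
        Case 48 To 57, 65 To 90, 97 To 122 ; 0-9, A-Z, a-z
            $sClean &= $sChar
        Case 32 ; space
            $sClean &= $sChar
        ; any other code (control codes, symbols) is dropped
    EndSwitch
Next

ConsoleWrite($sClean & @CRLF)
```

Adding another Case line is all it takes to keep (or treat differently) a further group of characters.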

M23


Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind.

My UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 

martin

I'm not sure what you want to remove, since an ASCII code might represent a character you want to keep.

Anyway, the easiest way to do it might be to decide what you want to keep.

Suppose you want to keep all digits, all letters a to z, spaces and any vertical or horizontal whitespace character. Then you could remove everything else from the string $s like this:

$sStripped = StringRegExpReplace($s,"[^0-9,a-z,A-Z, ,\h,\v]","")

Serial port communications UDF Includes functions for binary transmission and reception.
printing UDF Useful for graphs, forms, labels, reports etc.
Add User Call Tips to SciTE for functions in UDFs not included with AutoIt and for your own scripts.
Functions with parameters in OnEvent mode and for Hot Keys One function replaces GuiSetOnEvent, GuiCtrlSetOnEvent and HotKeySet.
UDF IsConnected2 for notification of status of connected state of many urls or IPs, without slowing the script.

dani

@martin

Actually, your pattern now keeps the comma as well :mellow: Inside a character class a comma is not a separator; it is a literal character that becomes part of the class. Just omit the commas and it will work.

Compare:

$s = "123a,bc*^@),."
$sStripped_1 = StringRegExpReplace($s,"[^0-9,a-z,A-Z, ,\h,\v]","") 
$sStripped_2 = StringRegExpReplace($s,"[^0-9a-zA-Z\s]","") ; Space is included in \h btw -- \h = tabs & spaces so I left " " out. Also, as far as I know \h\v == \s

ConsoleWrite($sStripped_1 & @CR)
ConsoleWrite($sStripped_2 & @CR)
myspacee

Thank you for the replies,

but I can't solve it and I'm going mad.

Here is a small extract of my script:

#include <Array.au3>
#include <File.au3>

; Gather the file list into an array
$fileList = _FileListToArray(@ScriptDir, "*.", 1)
If @error = 0 Then ; if some files exist

    ; Loop through the array from 1 to the last file
    For $X = 1 To $fileList[0]
        ToolTip("", 0, 0)

        ; Read the file
        $foo = FileOpen($fileList[$X], 0)
        $bar = FileRead($foo)

        MsgBox(0, $fileList[$X], $bar)

        FileClose($foo)
    Next
EndIf

I've posted a zipped folder with 204 'txt' files for testing:

http://www.webalice.it/t.bavaro/pvv47p.zip

I can't solve this riddle...

m.

martin

@dani. Yes, thanks for pointing out my mistake.

@myspacee

I expect it does seem a bit strange, but the files you are reading start with ASCII code 00, which is used to mark the end of a string, so the string you try to display will be "".

Try this:

#include <Array.au3>
#include <File.au3>

; Gather the file list into an array
$fileList = _FileListToArray(@ScriptDir, "*.", 1)
If @error = 0 Then ; if some files exist
    ConsoleWrite("No. of files = " & $fileList[0] & @CRLF)

    ; Loop through the array from 1 to the last file
    For $X = 1 To $fileList[0]
        ToolTip("", 0, 0)

        $bar = StringRegExpReplace(FileRead($fileList[$X]), "[^0-9a-zA-Z \h\v]", "")

        MsgBox(0, $fileList[$X], $bar)
    Next
EndIf
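
To check whether a file really starts with NUL bytes, one option is to read its first few bytes in binary mode and print them. The sketch below is only an illustration: it writes its own small test file (the file name and the byte values are made up to mimic the NUL NUL STX DC2 header from the first post) and then reads the header back.

```autoit
; Sketch: write a small test file that starts with NUL NUL STX DC2, then
; read its first four bytes back in binary mode. The file name is hypothetical.
Local $sPath = @TempDir & "\nul_head_demo.bin"

; 2 = write (erase previous contents), 16 = binary mode
Local $hOut = FileOpen($sPath, 2 + 16)
FileWrite($hOut, Binary("0x0000021248656C6C6F")) ; 00 00 02 12 followed by "Hello"
FileClose($hOut)

Local $hIn = FileOpen($sPath, 16) ; read, binary mode
Local $bHead = FileRead($hIn, 4)  ; first four bytes as binary data
FileClose($hIn)

; Binary data concatenated with a string is shown in hex form (0x...)
ConsoleWrite("First bytes: " & $bHead & @CRLF)
FileDelete($sPath)
```

Pointing $sPath at one of the real files (and dropping the write and delete steps) shows what that file actually begins with.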

myspacee

Thank you Martin,

incredible, but it doesn't work with the posted files...

I see ASCII code 00 and can't find a way to get around it.

If I remove it manually (e.g. in Notepad) it works.

I can't do that by hand for the whole mass of files; there are too many.

I must find a solution...

m.

MvGulik
whatever Edited by MvGulik

"Straight_and_Crooked_Thinking" : A "classic guide to ferreting out untruths, half-truths, and other distortions of facts in political and social discussions."
"The Secrets of Quantum Physics" : New and excellent 2 part documentary on Quantum Physics by Jim Al-Khalili. (Dec 2014)

"Believing what you know ain't so" ...

Knock Knock ...
 

martin

This works for me with the posted files, which I checked before I posted the last time.

#include <Array.au3>
#include <File.au3>

DirCreate(@ScriptDir & "\Atemp")
; Gather the file list into an array
$fileList = _FileListToArray(@ScriptDir, "*.", 1)
If @error = 0 Then ; if some files exist
    ConsoleWrite("No. of files = " & $fileList[0] & @CRLF)

    ; Loop through the array from 1 to the last file
    For $X = 1 To $fileList[0]
        ToolTip($X, 0, 0)

        $bar = StringRegExpReplace(FileRead($fileList[$X]), "[^0-9a-zA-Z \h\v]", "")
        FileWrite("Atemp\" & $fileList[$X], $bar)
        ; MsgBox(0, $fileList[$X], $bar)
    Next
EndIf

It converts the posted files in about 2 seconds.


myspacee

Thank you again Martin,

but in the folder that the last script creates, I find zeroed files...

Maybe the language setting can influence the results?

An ANSI/Unicode/UTF issue?

I do get some binary result when I use FileOpen to force binary (byte) reading:

#include <Array.au3>
#Include <File.au3>

dircreate(@scriptDir & "\Atemp")
;Gather files list into an array
$fileList = _FileListToArray(@ScriptDir, "*.", 1)
if @Error = 0 then ;if some files exist
    ConsoleWrite("No. of files = " & $fileList[0] & @CRLF)

    ;Loop through array from 1 to last file
    For $X = 1 to $fileList[0]
        ToolTip($x,0,0)
        ConsoleWrite("file = " & $fileList[$X] & @CRLF)
        
        
        $foo = FileOpen (@ScriptDir & "\" & $fileList[$X], 16)
        $bar = StringRegExpReplace(FileRead ($foo),"[^0-9a-zA-Z \h\v]","")
        if @Error then msgbox(0,"a",@Error)
            
        filewrite("Atemp\" & $fileList[$X], $bar )
        if @Error then msgbox(0,"b",@Error)



        fileclose($foo)
    next
EndIf

!?

m.

martin

There is probably something I don't understand. When I use the script I posted to create new files in Atemp, I get files that I can open and read in SciTE or Notepad.

Here is a screenshot showing the binary values and the text for the first converted file.

[Screenshot: post-3602-12680889017326_thumb.png]

Does that look like what you get? (I forgot to make sure the cursor wasn't in the screenshot.)

EDIT: But you aren't using the code I posted! What happens if you do use the code I posted?

myspacee

Martin,

I used the code you posted and it returned a directory of zeroed files.

I added some error checking and the FileOpen/FileClose functions,

but I always got zeroed files until I forced binary (byte) reading.

When I'm in the office I'll post a zipped directory with your code and my files, so you can test your script with my 'setup'.

Thank you again for your time,

m.

martin

OK, but if it doesn't work when you try with the posted files, yet it does work when I try with the same files, then I don't know how to make any progress. If you run exactly the code I posted on exactly the files you gave, and the first file created doesn't look exactly like the result I showed, then I'm a bit lost. The code I posted removes NULs, so I don't understand the need to read in binary mode. If you are going to read the whole file, then FileRead on its own is sufficient; FileOpen, FileRead, FileClose are not needed.

I'm using AutoIt version 3.3.6.0 and Beta 3.3.5.6. Both seem to give the same results.
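
As a small illustration of the FileRead point, here is a sketch that passes a file name straight to FileRead instead of a handle. The temp file name and its contents are made up for the demo; the pattern is the one used in the scripts above.

```autoit
; Sketch: FileRead accepts a file name directly, so no handle is needed
; when the whole file is read in one go. The file name is hypothetical.
Local $sPath = @TempDir & "\fileread_demo.txt"
FileDelete($sPath) ; start clean, since FileWrite by name appends
FileWrite($sPath, "some text 123 *^@ more text")

Local $sAll = FileRead($sPath) ; reads the whole file by name, no FileOpen/FileClose
Local $sClean = StringRegExpReplace($sAll, "[^0-9a-zA-Z \h\v]", "")

ConsoleWrite($sClean & @CRLF)
FileDelete($sPath)
```

A handle from FileOpen is still the way to go when a file is read in pieces or when a specific mode (such as binary) is required.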


trancexx

There was a change in RegExp behavior regarding NULL character. I'm not sure this is documented. Some time ago I made a ticket regarding NULL and RegExp but that went nowhere. Nevertheless, seems it was not for nothing after all.

myspacee should really say what version of AutoIt she/he is using. It makes no sense to help with something that is outdated.


♡♡♡

.

eMyvnE

martin

Thanks trancexx, that could explain it. Let's see what myspacee says.


myspacee

Upgrading from 3.3.0.0 to 3.3.6.0 solved the problem.

It's scary to upgrade the AutoIt version, though: this upgrade fixed the RegExp problem

but broke my FTP_Ex.au3.

D:\Prj\myScript\103_indagine_territoriale\FTP_Ex.au3(10,40) : ERROR: $GENERIC_READ previously declared as a 'Const'
Global Const $GENERIC_READ = 0x80000000
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
D:\Prj\myScript\103_indagine_territoriale\FTP_Ex.au3(11,41) : ERROR: $GENERIC_WRITE previously declared as a 'Const'
Global Const $GENERIC_WRITE = 0x40000000
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
D:\Prj\myScript\103_indagine_territoriale\FTP_Ex.au3(22,268) : ERROR: $tagWIN32_FIND_DATA previously declared as a 'Const'
Global Const $tagWIN32_FIND_DATA = "DWORD dwFileAttributes; dword ftCreationTime[2]; dword ftLastAccessTime[2]; dword ftLastWriteTime[2]; DWORD nFileSizeHigh; DWORD nFileSizeLow; dword dwReserved0; dword dwReserved1; CHAR cFileName[260]; CHAR cAlternateFileName[14];"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
D:\Prj\myScript\103_indagine_territoriale\FTP_20.au3 - 3 error(s), 0 warning(s)

Rollback :mellow:

m.

trancexx

If you insist (no matter how stupid that is) on using the old version of AutoIt then change related code to:

$bar = StringReplace(FileRead ($fileList[$X]), Chr(0), "")
$bar = StringRegExpReplace($bar,"[^0-9a-zA-Z \h\v]","")

myspacee

trancexx,

stupid or not, I have some AutoIt scripts in production, and not only in my office.

The development -> test -> production chain can't be broken in my case.

Rolling back to 3.3.0.0 solved the problem; I chose the lesser evil.

Thank you again,

m.

martin

I'm glad you fixed your problem but I am dismayed that you would rather roll back to 3.3.0.0 than add 3 semicolons to comment out the constants that are now already defined for you.
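
For reference, a sketch of what that edit could look like in FTP_Ex.au3, assuming the three declarations sit at lines 10, 11 and 22 as the error log reports and the rest of the file stays untouched. The values are copied from that log; whether the newer AutoIt includes define exactly the same values has not been checked here.

```autoit
; FTP_Ex.au3 (excerpt): the three duplicate declarations, commented out
; because the error log says these names are already declared as Const.
;Global Const $GENERIC_READ = 0x80000000
;Global Const $GENERIC_WRITE = 0x40000000
;Global Const $tagWIN32_FIND_DATA = "DWORD dwFileAttributes; dword ftCreationTime[2]; dword ftLastAccessTime[2]; dword ftLastWriteTime[2]; DWORD nFileSizeHigh; DWORD nFileSizeLow; dword dwReserved0; dword dwReserved1; CHAR cFileName[260]; CHAR cAlternateFileName[14];"
```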

MvGulik
whatever Edited by MvGulik

