Sign in to follow this  
Followers 0
donvier

Reformatting Text into CSV

14 posts in this topic

Hello everyone,

I am in a situation where a have thousands of separate text files that are formatted as:

FILEID: "FileName"

PATH: "FilePath"

TYPE: "JPG"

SECLEV: "10"

STATID: "ID"

USRID: "UserName"

REQDATE: "03/02/2009"

REQTIME: "09:18:07"

GENDATE: "03/02/2009"

GENTIME: "09:18:07"

PROGID: "ProgramID"

GROUPID: "Photo"

DESC: "TextDescription"

What I need to do is take the text out of each of these files and format it into a single CSV file. To top that off, I also need to chop off the tags so that I am only left with the values. My end result with the sample above would be:

"FileName","FilePath","JPG","10","ID","UserName","03/02/2009","09:18:07","03/02/2009","09:18:07","ProgramID","Photo","TextDescription"

Right now, I have 2 main issues/questions.

#1: Do any of your have any ideas on how I could trim the tags out so that I just have the values? (PROGID: "ProgramID" would be "ProgramID")

#2: None of these text files have end of file tags, is it possible to tell the app to move on when the rest of the file is blank?

Right now I have code setup to pull each file and show me the lines individually (in a message box at the moment for testing).

$search = FileFindFirstFile("C:\Users\Test\Desktop\BaTest\*.arf")
dim $Path = "C:\Users\Test\Desktop\BaTest\"

If $search = -1 Then
    MsgBox(0, "Error", "No files matched")
    Exit
EndIf

While 1
    $file = FileFindNextFile($search)
    If @error Then ExitLoop

    $fileopen = FileOpen($Path & $file, 0)

If $fileopen = -1 then
    MsgBox(0, "Error", "Open Fail")
    Exit
EndIf

While 1
    $line = FileReadLine($fileopen)
    ;If $line = "            " Then ExitLoop
        MsgBox(0, "LineRead=", $line)
WEnd
WEnd

; Close the search handle
FileClose($search)

Any help would be appreciated! Even if it's to start over and think about this a different way.

Thanks!

Share this post


Link to post
Share on other sites



#2 ·  Posted (edited)

If it's *exactly* the way you posted it, this should work for you ($sInput is the data you posted above):

$sOutput=StringRegExpReplace($sInput,'[^"]+("[^"]+")','$1,')
$sOutput=StringTrimRight($sOutput,1)    ; strip last ','
$sOutput=StringStripWS($sOutput,2)  ; get rid of any whitespace after

*edit: forgot about the trailing whitespace (added StringStripWS)

Edited by Ascend4nt

Share this post


Link to post
Share on other sites

Something like this can be modified to read from the file instead of this array:

$sInput = 'FILEID: "FileName"|PATH: "FilePath"|TYPE: "JPG"|SECLEV: "10"|STATID: "ID"|' & _
        'USRID: "UserName"|REQDATE: "03/02/2009"|REQTIME: "09:18:07"|GENDATE: "03/02/2009"|' & _
        'GENTIME: "09:18:07"|PROGID: "ProgramID"|GROUPID: "Photo"|DESC: "TextDescription"'
$aInput = StringSplit($sInput, "|")

For $n = 1 To $aInput[0]
    $aData = StringRegExp($aInput[$n], '\A(.+):\s\"(.+)\"', 3)
    If (@error = 0) And IsArray($aData) And UBound($aData) >= 2 Then
        ConsoleWrite($n & ":  " & $aData[0] & " = " & $aData[1] & @LF)
    Else
        ConsoleWrite($n & ":  Failed" & @LF)
    EndIf
Next

;)


Valuater's AutoIt 1-2-3, Class... Is now in Session!For those who want somebody to write the script for them: RentACoder"Any technology distinguishable from magic is insufficiently advanced." -- Geek's corollary to Clarke's law

Share this post


Link to post
Share on other sites

#4 ·  Posted (edited)

Here my version:

Txt2CSV(@ScriptDir & "\Text1.arf")

Func Txt2CSV($file)
    Local $csv, $i, $txt, $hFile, $aSplit, $aRegEx
    $hFile = FileOpen($file)
    $txt = FileRead($file)
    FileClose($hFile)
    $aSplit = StringSplit($txt, Chr(13), 2)
    For $i = 0 To UBound($aSplit) - 1
    $aRegEx = StringRegExp($aSplit[$i], "(?i)(.+): (.+)", 1) ;split string
    $csv &= $aRegEx[1] & "," ;and merge csv like
    Next
    $csv = StringTrimRight($csv, 1) ;remove last comma
    $hSave = FileOpen(StringTrimRight($file, 3) & "csv", 2) ; replace file extension with csv
    FileWrite($hSave, $csv)
    FileClose($hSave)
EndFunc

Br,

UEZ

Edit: oversaw that you need only the values

Edited by UEZ

Please don't send me any personal message and ask for support! I will not reply!

Selection of finest graphical examples at Codepen.io

The own fart smells best!
Her 'sikim hıyar' diyene bir avuç tuz alıp koşma!
¯\_(ツ)_/¯

Share this post


Link to post
Share on other sites

Thanks for all the replies, it seems everyone has a different way of doing it and i'm almost there.

$search = FileFindFirstFile("C:\Users\test\Desktop\BaTest\*.arf")
If $search = -1 Then
    MsgBox(0, "Error", "No files matched")
    Exit
EndIf

; Path to the files
dim $Path = "C:\Users\test\Desktop\BaTest\"

While 1
    $file = FileFindNextFile($search)
    If @error Then ExitLoop

    $fileopen = FileOpen($Path & $file, 0)

If $fileopen = -1 then
    MsgBox(0, "Error", "Open Fail")
    Exit
EndIf

While 1
    $sInput = FileReadLine($fileopen)

    $sOutput=StringRegExpReplace($sInput,'[^"]+("[^"]+")','$1,')
    $sOutput=StringTrimRight($sOutput,1)    ; strip last ','
    $sOutput=StringStripWS($sOutput,2)  ; get rid of any whitespace after
;open csv file
    $DataFile = FileOpen("C:\DataFileIndex.csv", 1)
        If $DataFile = -1 Then
        MsgBox(0, "Error", "Unable to open file.")
        Exit
        EndIf
; write values to csv file
FileWrite($DataFile, $sOutput & ",")
FileClose($DataFile)
        ;MsgBox(0, "LineRead=", $sOutput)
WEnd
WEnd

; Close the search handle
FileClose($search)

The only problem left is the fact that none of the files this app will be reading have end of file characters so the loop doesnt restart after the last value because it doesnt know it should. Any thoughts? Other than that, this thing does exactly what I need. Thanks again for all the input!

Share this post


Link to post
Share on other sites

#6 ·  Posted (edited)

You should check @error after FileReadLine():

While 1
    $file = FileFindNextFile($search)

    ; ...

    While 1
        $sInput = FileReadLine($fileopen)
        If @error Then ExitLoop ; Done, next file
        
        $sOutput=StringRegExpReplace($sInput,'[^"]+("[^"]+")','$1,')
        
        ; ...

    WEnd
WEnd

;)

Edited by PsaltyDS

Valuater's AutoIt 1-2-3, Class... Is now in Session!For those who want somebody to write the script for them: RentACoder"Any technology distinguishable from magic is insufficiently advanced." -- Geek's corollary to Clarke's law

Share this post


Link to post
Share on other sites

#7 ·  Posted (edited)

Thanks for all the replies, it seems everyone has a different way of doing it and i'm almost there.

The only problem left is the fact that none of the files this app will be reading have end of file characters so the loop doesnt restart after the last value because it doesnt know it should. Any thoughts? Other than that, this thing does exactly what I need. Thanks again for all the input!

donvier, in using my PCRE, you wouldn't read line-by-line. Just read the entire file contents (assuming they aren't files >2 billion characters) by 'FileRead()'. You don't even need FileOpen/Close for the input file this way. As far as the 'end of file characters', I don't know what you mean by that - is it whitespace? If so, just strip the whitespace from the FileRead - probably best to do that before using my StringRegExpReplace().

Also, use ClipPut() with the output or somesuch with MsgBox()'s to debug the thing to make sure everything is coming out right.

*edit: also, you don't need to insert commas, as they are included in the StringRegExpReplace().

Edited by Ascend4nt

Share this post


Link to post
Share on other sites

First I would like to say that you guys are awesome. As you can obviously tell, I dont have much experience with scripting or coding and am just trying to break into it. What I was saying about the EOF character was a misunderstanding on my part from what I was reading in the help file, it was just white space and after adding the @error line as PsaltyDS suggested it fixed the never ending loop.

I switched over to FileRead instead of doing it line by line and used the code examples Ascend4nt suggested and got exactly what I was trying to acheive. As you can see below, I was not able to get FileRead to work without doing the file open/close. If I got rid of that it would just create a blank message. I'm sure you'll notice other inefficiencies that I caused in here but here is the final piece: or at least the piece before I add a config file so I don't have to hard code paths.

$search = FileFindFirstFile("C:\Users\test\Desktop\BaTest\*.arf")
If $search = -1 Then
    MsgBox(0, "Error", "No files matched")
    Exit
EndIf

dim $Path = "C:\Users\test\Desktop\BaTest\"

While 1
    $file = FileFindNextFile($search)
    If @error Then ExitLoop

    $fileopen = FileOpen($Path & $file, 0)
    If $fileopen = -1 then
        MsgBox(0, "Error", "Open Fail")
        Exit
EndIf

While 1
    $sInput = FileRead($fileopen)
    If @error Then ExitLoop
    $sOutput=StringRegExpReplace($sInput,'[^"]+("[^"]+")','$1,')
    $sOutput=StringTrimRight($sOutput,1)    ; strip last ','
    $sOutput=StringStripWS($sOutput,2)  ; get rid of any whitespace after

        ;MsgBox(0, "FileRead=", $sOutput)
$DataFile = FileOpen("C:\DataFileIndex.csv", 1)
        If $DataFile = -1 Then
        MsgBox(0, "Error", "Unable to open file.")
        Exit
        EndIf
FileWrite($DataFile, @CR)
FileWrite($DataFile, $sOutput)
FileClose($DataFile)
        FileClose($fileopen)
WEnd
WEnd

; Close the search handle
FileClose($search)

Share this post


Link to post
Share on other sites

Apparently my test set wasnt big enough. there are some occurences where the value maybe blank.

Instead of the original:

FILEID: "FileName"

PATH: "FilePath"

TYPE: "JPG"

SECLEV: "10"

STATID: "ID"

USRID: "UserName"

REQDATE: "03/02/2009"

REQTIME: "09:18:07"

GENDATE: "03/02/2009"

GENTIME: "09:18:07"

PROGID: "ProgramID"

GROUPID: "Photo"

DESC: "TextDescription"

it would be:

FILEID: "FileName"

PATH: "FilePath"

TYPE: "JPG"

SECLEV: ""

STATID: "ID"

USRID: "UserName"

REQDATE: "03/02/2009"

REQTIME: "09:18:07"

GENDATE: "03/02/2009"

GENTIME: "09:18:07"

PROGID: "ProgramID"

GROUPID: "Photo"

DESC: "TextDescription"

So instead of the trimmed value, I get:

"FileName","FilePath","JPG",

SECLEV: """ID","UserName","03/26/2009","15:29:35","03/26/2009","15:29:35","ProgramID","Photo","TextDescription",

I would be alright with having a junk value in there like 1 or something if it would be easier to just insert a value? What are your thoughts?

Share this post


Link to post
Share on other sites

#10 ·  Posted (edited)

First change the order so that the whitespace gets destroyed before the StringRegExpReplace(), then change one character to make it find blank strings (+ [1 or more occurrences] becomes a * [0 or more occurrences]):

$sOutput=StringStripWS($sOutput,2) ; get rid of any whitespace at the end
$sOutput=StringRegExpReplace($sInput,'[^"]+("[^"]*")','$1,')
$sOutput=StringTrimRight($sOutput,1) ; strip last ','

And just a suggestion, but you also might want to keep track of the files that FileRead() fails on? (Just to be safe)

*edit: and I just realized - you are still opening the files - this is unnecessary. Plus when a FileRead() fails, you do an ExitLoop instead of properly closing the File handles you open (another reason why just reading and not first opening a file is good). And if you are changing all that, 'ContinueLoop' would make sense if you don't want it to stop at the first file it finds an error on - though you'll still want to know which files are failing I would think..

Edited by Ascend4nt

Share this post


Link to post
Share on other sites

Now correct me if I'm wrong as I'm getting most of my other information from the help file.

If a filename is given rather than a file handle - the file will be opened and closed during the function call

I can only use handles, because my file names change constantly, right?

For some reason I can only get the FileRead to work when I first open the file. If I do specify a file individually, it will do the read without me having to open/close the file, but not with the handle...

Thanks for that explanation on the regexp replace Ascend4nt, that did fix the blank value issue.

Share this post


Link to post
Share on other sites

All you need to do is:

$sInput = FileRead($Path & $file)

I don't think your code is well-written however. You have two while loops which doesn't make sense to me - you read from the file only one time, then you are done - no need to loop in that part.

Also, you'd probably only want to open the output file once *outside* of the loop. Once the loop is done, you can close the output file.

Share this post


Link to post
Share on other sites

I don't think your code is well-written however. You have two while loops which doesn't make sense to me - you read from the file only one time, then you are done - no need to loop in that part.

I definitely agree that its not well written. When I said new to scripting, this is quite frankly my first useful utility. I moved the data file opening to outside of the loop, got rid of the second loop, no longer opening the file for fileread. I do have a question on the ContinueLoop. How would I implement that if I wanted it to write it to a log file?

At the FileRead @error, would it look like (doesnt feel right):

$sInput = FileRead($VarLocation & $file)
    If @error = 1 Then Filewriteline($LogFileERR, "Error Processing File:" & $file)
ElseIf @error = 1 Then ContinueLoop
?

Below is how my loop is currently setup now.

While 1
    $file = FileFindNextFile($search)
    If @error Then ExitLoop

    $sInput = FileRead($VarLocation & $file)
    If @error = 1 Then Filewriteline($LogFileERR, "Error Processing File:" & $file & "File Read Failed")

    $sOutput=StringStripWS($sInput,2)  ; get rid of any whitespace after
    $sOutput=StringRegExpReplace($sOutput,'[^"]+("[^"]*")','$1,')
    $sOutput=StringTrimRight($sOutput,1)    ; strip last ','

Filewriteline($LogFile, "Processed File:" & $file)

FileWrite($DataFile, @CR & @LF)
FileWrite($DataFile, $sOutput)

WEnd

Share this post


Link to post
Share on other sites

Since you are new to AutoIt, I suggest you look into Tutorials and the example code provided in the Help file. KaFu has a nice link collection on his profile. Melba23 made a nice post about where to look for good tutorials on this thread.

I'll give you a freebie though:

While 1
    $file = FileFindNextFile($search)
    If @error Then ExitLoop

    $sInput = FileRead($VarLocation & $file)
    If @error Then
        Filewriteline($LogFileERR, "Error Processing File:" & $file & "File Read Failed")
    Else
        $sOutput=StringStripWS($sInput,2) ; get rid of any whitespace after
        $sOutput=StringRegExpReplace($sOutput,'[^"]+("[^"]*")','$1,')
        $sOutput=StringTrimRight($sOutput,1)    ; strip last ','

        Filewriteline($LogFile, "Processed File:" & $file)
        FileWrite($DataFile, @CRLF&$sOutput)
    EndIf
WEnd

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now
Sign in to follow this  
Followers 0