donvier Posted August 30, 2010 Share Posted August 30, 2010 Hello everyone, I am in a situation where a have thousands of separate text files that are formatted as: FILEID: "FileName" PATH: "FilePath" TYPE: "JPG" SECLEV: "10" STATID: "ID" USRID: "UserName" REQDATE: "03/02/2009" REQTIME: "09:18:07" GENDATE: "03/02/2009" GENTIME: "09:18:07" PROGID: "ProgramID" GROUPID: "Photo" DESC: "TextDescription" What I need to do is take the text out of each of these files and format it into a single CSV file. To top that off, I also need to chop off the tags so that I am only left with the values. My end result with the sample above would be: "FileName","FilePath","JPG","10","ID","UserName","03/02/2009","09:18:07","03/02/2009","09:18:07","ProgramID","Photo","TextDescription" Right now, I have 2 main issues/questions. #1: Do any of your have any ideas on how I could trim the tags out so that I just have the values? (PROGID: "ProgramID" would be "ProgramID") #2: None of these text files have end of file tags, is it possible to tell the app to move on when the rest of the file is blank? Right now I have code setup to pull each file and show me the lines individually (in a message box at the moment for testing). $search = FileFindFirstFile("C:\Users\Test\Desktop\BaTest\*.arf") dim $Path = "C:\Users\Test\Desktop\BaTest\" If $search = -1 Then MsgBox(0, "Error", "No files matched") Exit EndIf While 1 $file = FileFindNextFile($search) If @error Then ExitLoop $fileopen = FileOpen($Path & $file, 0) If $fileopen = -1 then MsgBox(0, "Error", "Open Fail") Exit EndIf While 1 $line = FileReadLine($fileopen) ;If $line = " " Then ExitLoop MsgBox(0, "LineRead=", $line) WEnd WEnd ; Close the search handle FileClose($search) Any help would be appreciated! Even if it's to start over and think about this a different way. Thanks! Link to comment Share on other sites More sharing options...
Ascend4nt Posted August 30, 2010 Share Posted August 30, 2010 (edited) If it's *exactly* the way you posted it, this should work for you ($sInput is the data you posted above): $sOutput=StringRegExpReplace($sInput,'[^"]+("[^"]+")','$1,') $sOutput=StringTrimRight($sOutput,1) ; strip last ',' $sOutput=StringStripWS($sOutput,2) ; get rid of any whitespace after *edit: forgot about the trailing whitespace (added StringStripWS) Edited August 30, 2010 by Ascend4nt My contributions: Performance Counters in Windows - Measure CPU, Disk, Network etc Performance | Network Interface Info, Statistics, and Traffic | CPU Multi-Processor Usage w/o Performance Counters | Disk and Device Read/Write Statistics | Atom Table Functions | Process, Thread, & DLL Functions UDFs | Process CPU Usage Trackers | PE File Overlay Extraction | A3X Script Extract | File + Process Imports/Exports Information | Windows Desktop Dimmer Shade | Spotlight + Focus GUI - Highlight and Dim for Eyestrain Relief | CrossHairs (FullScreen) | Rubber-Band Boxes using GUI's (_GUIBox) | GUI Fun! | IE Embedded Control Versioning (use IE9+ and HTML5 in a GUI) | Magnifier (Vista+) Functions UDF | _DLLStructDisplay (Debug!) | _EnumChildWindows (controls etc) | _FileFindEx | _ClipGetHTML | _ClipPutHTML + ClipPutHyperlink | _FileGetShortcutEx | _FilePropertiesDialog | I/O Port Functions | File(s) Drag & Drop | _RunWithReducedPrivileges | _ShellExecuteWithReducedPrivileges | _WinAPI_GetSystemInfo | dotNETGetVersions | Drive(s) Power Status | _WinGetDesktopHandle | _StringParseParameters | Screensaver, Sleep, Desktop Lock Disable | Full-Screen Crash Recovery Wrappers/Modifications of others' contributions: _DOSWildcardsToPCRegEx (original code: RobSaunder's) | WinGetAltTabWinList (original: Authenticity) UDF's added support/programming to: _ExplorerWinGetSelectedItems | MIDIEx UDF (original code: eynstyne) (All personal code/wrappers centrally located at Ascend4nt's AutoIT Code) Link to comment Share on other sites More sharing options...
PsaltyDS Posted August 30, 2010 Share Posted August 30, 2010 Something like this can be modified to read from the file instead of this array: $sInput = 'FILEID: "FileName"|PATH: "FilePath"|TYPE: "JPG"|SECLEV: "10"|STATID: "ID"|' & _ 'USRID: "UserName"|REQDATE: "03/02/2009"|REQTIME: "09:18:07"|GENDATE: "03/02/2009"|' & _ 'GENTIME: "09:18:07"|PROGID: "ProgramID"|GROUPID: "Photo"|DESC: "TextDescription"' $aInput = StringSplit($sInput, "|") For $n = 1 To $aInput[0] $aData = StringRegExp($aInput[$n], '\A(.+):\s\"(.+)\"', 3) If (@error = 0) And IsArray($aData) And UBound($aData) >= 2 Then ConsoleWrite($n & ": " & $aData[0] & " = " & $aData[1] & @LF) Else ConsoleWrite($n & ": Failed" & @LF) EndIf Next Valuater's AutoIt 1-2-3, Class... Is now in Session!For those who want somebody to write the script for them: RentACoder"Any technology distinguishable from magic is insufficiently advanced." -- Geek's corollary to Clarke's law Link to comment Share on other sites More sharing options...
UEZ Posted August 30, 2010 Share Posted August 30, 2010 (edited) Here my version: Txt2CSV(@ScriptDir & "\Text1.arf") Func Txt2CSV($file) Local $csv, $i, $txt, $hFile, $aSplit, $aRegEx $hFile = FileOpen($file) $txt = FileRead($file) FileClose($hFile) $aSplit = StringSplit($txt, Chr(13), 2) For $i = 0 To UBound($aSplit) - 1 $aRegEx = StringRegExp($aSplit[$i], "(?i)(.+): (.+)", 1) ;split string $csv &= $aRegEx[1] & "," ;and merge csv like Next $csv = StringTrimRight($csv, 1) ;remove last comma $hSave = FileOpen(StringTrimRight($file, 3) & "csv", 2) ; replace file extension with csv FileWrite($hSave, $csv) FileClose($hSave) EndFunc Br, UEZ Edit: oversaw that you need only the values Edited August 30, 2010 by UEZ Please don't send me any personal message and ask for support! I will not reply! Selection of finest graphical examples at Codepen.io The own fart smells best! ✌Her 'sikim hıyar' diyene bir avuç tuz alıp koşma!¯\_(ツ)_/¯ ٩(●̮̮̃•̃)۶ ٩(-̮̮̃-̃)۶ૐ Link to comment Share on other sites More sharing options...
donvier Posted August 30, 2010 Author Share Posted August 30, 2010 Thanks for all the replies, it seems everyone has a different way of doing it and i'm almost there. expandcollapse popup$search = FileFindFirstFile("C:\Users\test\Desktop\BaTest\*.arf") If $search = -1 Then MsgBox(0, "Error", "No files matched") Exit EndIf ; Path to the files dim $Path = "C:\Users\test\Desktop\BaTest\" While 1 $file = FileFindNextFile($search) If @error Then ExitLoop $fileopen = FileOpen($Path & $file, 0) If $fileopen = -1 then MsgBox(0, "Error", "Open Fail") Exit EndIf While 1 $sInput = FileReadLine($fileopen) $sOutput=StringRegExpReplace($sInput,'[^"]+("[^"]+")','$1,') $sOutput=StringTrimRight($sOutput,1) ; strip last ',' $sOutput=StringStripWS($sOutput,2) ; get rid of any whitespace after ;open csv file $DataFile = FileOpen("C:\DataFileIndex.csv", 1) If $DataFile = -1 Then MsgBox(0, "Error", "Unable to open file.") Exit EndIf ; write values to csv file FileWrite($DataFile, $sOutput & ",") FileClose($DataFile) ;MsgBox(0, "LineRead=", $sOutput) WEnd WEnd ; Close the search handle FileClose($search) The only problem left is the fact that none of the files this app will be reading have end of file characters so the loop doesnt restart after the last value because it doesnt know it should. Any thoughts? Other than that, this thing does exactly what I need. Thanks again for all the input! Link to comment Share on other sites More sharing options...
PsaltyDS Posted August 30, 2010 Share Posted August 30, 2010 (edited) You should check @error after FileReadLine(): While 1 $file = FileFindNextFile($search) ; ... While 1 $sInput = FileReadLine($fileopen) If @error Then ExitLoop ; Done, next file $sOutput=StringRegExpReplace($sInput,'[^"]+("[^"]+")','$1,') ; ... WEnd WEnd Edited August 30, 2010 by PsaltyDS Valuater's AutoIt 1-2-3, Class... Is now in Session!For those who want somebody to write the script for them: RentACoder"Any technology distinguishable from magic is insufficiently advanced." -- Geek's corollary to Clarke's law Link to comment Share on other sites More sharing options...
Ascend4nt Posted August 30, 2010 Share Posted August 30, 2010 (edited) Thanks for all the replies, it seems everyone has a different way of doing it and i'm almost there. The only problem left is the fact that none of the files this app will be reading have end of file characters so the loop doesnt restart after the last value because it doesnt know it should. Any thoughts? Other than that, this thing does exactly what I need. Thanks again for all the input!donvier, in using my PCRE, you wouldn't read line-by-line. Just read the entire file contents (assuming they aren't files >2 billion characters) by 'FileRead()'. You don't even need FileOpen/Close for the input file this way. As far as the 'end of file characters', I don't know what you mean by that - is it whitespace? If so, just strip the whitespace from the FileRead - probably best to do that before using my StringRegExpReplace().Also, use ClipPut() with the output or somesuch with MsgBox()'s to debug the thing to make sure everything is coming out right.*edit: also, you don't need to insert commas, as they are included in the StringRegExpReplace(). Edited August 30, 2010 by Ascend4nt My contributions: Performance Counters in Windows - Measure CPU, Disk, Network etc Performance | Network Interface Info, Statistics, and Traffic | CPU Multi-Processor Usage w/o Performance Counters | Disk and Device Read/Write Statistics | Atom Table Functions | Process, Thread, & DLL Functions UDFs | Process CPU Usage Trackers | PE File Overlay Extraction | A3X Script Extract | File + Process Imports/Exports Information | Windows Desktop Dimmer Shade | Spotlight + Focus GUI - Highlight and Dim for Eyestrain Relief | CrossHairs (FullScreen) | Rubber-Band Boxes using GUI's (_GUIBox) | GUI Fun! | IE Embedded Control Versioning (use IE9+ and HTML5 in a GUI) | Magnifier (Vista+) Functions UDF | _DLLStructDisplay (Debug!) | _EnumChildWindows (controls etc) | _FileFindEx | _ClipGetHTML | _ClipPutHTML + ClipPutHyperlink | _FileGetShortcutEx | _FilePropertiesDialog | I/O Port Functions | File(s) Drag & Drop | _RunWithReducedPrivileges | _ShellExecuteWithReducedPrivileges | _WinAPI_GetSystemInfo | dotNETGetVersions | Drive(s) Power Status | _WinGetDesktopHandle | _StringParseParameters | Screensaver, Sleep, Desktop Lock Disable | Full-Screen Crash Recovery Wrappers/Modifications of others' contributions: _DOSWildcardsToPCRegEx (original code: RobSaunder's) | WinGetAltTabWinList (original: Authenticity) UDF's added support/programming to: _ExplorerWinGetSelectedItems | MIDIEx UDF (original code: eynstyne) (All personal code/wrappers centrally located at Ascend4nt's AutoIT Code) Link to comment Share on other sites More sharing options...
donvier Posted August 31, 2010 Author Share Posted August 31, 2010 First I would like to say that you guys are awesome. As you can obviously tell, I dont have much experience with scripting or coding and am just trying to break into it. What I was saying about the EOF character was a misunderstanding on my part from what I was reading in the help file, it was just white space and after adding the @error line as PsaltyDS suggested it fixed the never ending loop. I switched over to FileRead instead of doing it line by line and used the code examples Ascend4nt suggested and got exactly what I was trying to acheive. As you can see below, I was not able to get FileRead to work without doing the file open/close. If I got rid of that it would just create a blank message. I'm sure you'll notice other inefficiencies that I caused in here but here is the final piece: or at least the piece before I add a config file so I don't have to hard code paths. expandcollapse popup$search = FileFindFirstFile("C:\Users\test\Desktop\BaTest\*.arf") If $search = -1 Then MsgBox(0, "Error", "No files matched") Exit EndIf dim $Path = "C:\Users\test\Desktop\BaTest\" While 1 $file = FileFindNextFile($search) If @error Then ExitLoop $fileopen = FileOpen($Path & $file, 0) If $fileopen = -1 then MsgBox(0, "Error", "Open Fail") Exit EndIf While 1 $sInput = FileRead($fileopen) If @error Then ExitLoop $sOutput=StringRegExpReplace($sInput,'[^"]+("[^"]+")','$1,') $sOutput=StringTrimRight($sOutput,1) ; strip last ',' $sOutput=StringStripWS($sOutput,2) ; get rid of any whitespace after ;MsgBox(0, "FileRead=", $sOutput) $DataFile = FileOpen("C:\DataFileIndex.csv", 1) If $DataFile = -1 Then MsgBox(0, "Error", "Unable to open file.") Exit EndIf FileWrite($DataFile, @CR) FileWrite($DataFile, $sOutput) FileClose($DataFile) FileClose($fileopen) WEnd WEnd ; Close the search handle FileClose($search) Link to comment Share on other sites More sharing options...
donvier Posted August 31, 2010 Author Share Posted August 31, 2010 Apparently my test set wasnt big enough. there are some occurences where the value maybe blank.Instead of the original:FILEID: "FileName"PATH: "FilePath"TYPE: "JPG"SECLEV: "10"STATID: "ID"USRID: "UserName"REQDATE: "03/02/2009"REQTIME: "09:18:07"GENDATE: "03/02/2009"GENTIME: "09:18:07"PROGID: "ProgramID"GROUPID: "Photo"DESC: "TextDescription"it would be:FILEID: "FileName"PATH: "FilePath"TYPE: "JPG"SECLEV: ""STATID: "ID"USRID: "UserName"REQDATE: "03/02/2009"REQTIME: "09:18:07"GENDATE: "03/02/2009"GENTIME: "09:18:07"PROGID: "ProgramID"GROUPID: "Photo"DESC: "TextDescription"So instead of the trimmed value, I get:"FileName","FilePath","JPG",SECLEV: """ID","UserName","03/26/2009","15:29:35","03/26/2009","15:29:35","ProgramID","Photo","TextDescription",I would be alright with having a junk value in there like 1 or something if it would be easier to just insert a value? What are your thoughts? Link to comment Share on other sites More sharing options...
Ascend4nt Posted August 31, 2010 Share Posted August 31, 2010 (edited) First change the order so that the whitespace gets destroyed before the StringRegExpReplace(), then change one character to make it find blank strings (+ [1 or more occurrences] becomes a * [0 or more occurrences]):$sOutput=StringStripWS($sOutput,2) ; get rid of any whitespace at the end $sOutput=StringRegExpReplace($sInput,'[^"]+("[^"]*")','$1,') $sOutput=StringTrimRight($sOutput,1) ; strip last ','And just a suggestion, but you also might want to keep track of the files that FileRead() fails on? (Just to be safe)*edit: and I just realized - you are still opening the files - this is unnecessary. Plus when a FileRead() fails, you do an ExitLoop instead of properly closing the File handles you open (another reason why just reading and not first opening a file is good). And if you are changing all that, 'ContinueLoop' would make sense if you don't want it to stop at the first file it finds an error on - though you'll still want to know which files are failing I would think.. Edited August 31, 2010 by Ascend4nt My contributions: Performance Counters in Windows - Measure CPU, Disk, Network etc Performance | Network Interface Info, Statistics, and Traffic | CPU Multi-Processor Usage w/o Performance Counters | Disk and Device Read/Write Statistics | Atom Table Functions | Process, Thread, & DLL Functions UDFs | Process CPU Usage Trackers | PE File Overlay Extraction | A3X Script Extract | File + Process Imports/Exports Information | Windows Desktop Dimmer Shade | Spotlight + Focus GUI - Highlight and Dim for Eyestrain Relief | CrossHairs (FullScreen) | Rubber-Band Boxes using GUI's (_GUIBox) | GUI Fun! | IE Embedded Control Versioning (use IE9+ and HTML5 in a GUI) | Magnifier (Vista+) Functions UDF | _DLLStructDisplay (Debug!) | _EnumChildWindows (controls etc) | _FileFindEx | _ClipGetHTML | _ClipPutHTML + ClipPutHyperlink | _FileGetShortcutEx | _FilePropertiesDialog | I/O Port Functions | File(s) Drag & Drop | _RunWithReducedPrivileges | _ShellExecuteWithReducedPrivileges | _WinAPI_GetSystemInfo | dotNETGetVersions | Drive(s) Power Status | _WinGetDesktopHandle | _StringParseParameters | Screensaver, Sleep, Desktop Lock Disable | Full-Screen Crash Recovery Wrappers/Modifications of others' contributions: _DOSWildcardsToPCRegEx (original code: RobSaunder's) | WinGetAltTabWinList (original: Authenticity) UDF's added support/programming to: _ExplorerWinGetSelectedItems | MIDIEx UDF (original code: eynstyne) (All personal code/wrappers centrally located at Ascend4nt's AutoIT Code) Link to comment Share on other sites More sharing options...
donvier Posted August 31, 2010 Author Share Posted August 31, 2010 Now correct me if I'm wrong as I'm getting most of my other information from the help file.If a filename is given rather than a file handle - the file will be opened and closed during the function callI can only use handles, because my file names change constantly, right? For some reason I can only get the FileRead to work when I first open the file. If I do specify a file individually, it will do the read without me having to open/close the file, but not with the handle...Thanks for that explanation on the regexp replace Ascend4nt, that did fix the blank value issue. Link to comment Share on other sites More sharing options...
Ascend4nt Posted August 31, 2010 Share Posted August 31, 2010 All you need to do is: $sInput = FileRead($Path & $file) I don't think your code is well-written however. You have two while loops which doesn't make sense to me - you read from the file only one time, then you are done - no need to loop in that part. Also, you'd probably only want to open the output file once *outside* of the loop. Once the loop is done, you can close the output file. My contributions: Performance Counters in Windows - Measure CPU, Disk, Network etc Performance | Network Interface Info, Statistics, and Traffic | CPU Multi-Processor Usage w/o Performance Counters | Disk and Device Read/Write Statistics | Atom Table Functions | Process, Thread, & DLL Functions UDFs | Process CPU Usage Trackers | PE File Overlay Extraction | A3X Script Extract | File + Process Imports/Exports Information | Windows Desktop Dimmer Shade | Spotlight + Focus GUI - Highlight and Dim for Eyestrain Relief | CrossHairs (FullScreen) | Rubber-Band Boxes using GUI's (_GUIBox) | GUI Fun! | IE Embedded Control Versioning (use IE9+ and HTML5 in a GUI) | Magnifier (Vista+) Functions UDF | _DLLStructDisplay (Debug!) | _EnumChildWindows (controls etc) | _FileFindEx | _ClipGetHTML | _ClipPutHTML + ClipPutHyperlink | _FileGetShortcutEx | _FilePropertiesDialog | I/O Port Functions | File(s) Drag & Drop | _RunWithReducedPrivileges | _ShellExecuteWithReducedPrivileges | _WinAPI_GetSystemInfo | dotNETGetVersions | Drive(s) Power Status | _WinGetDesktopHandle | _StringParseParameters | Screensaver, Sleep, Desktop Lock Disable | Full-Screen Crash Recovery Wrappers/Modifications of others' contributions: _DOSWildcardsToPCRegEx (original code: RobSaunder's) | WinGetAltTabWinList (original: Authenticity) UDF's added support/programming to: _ExplorerWinGetSelectedItems | MIDIEx UDF (original code: eynstyne) (All personal code/wrappers centrally located at Ascend4nt's AutoIT Code) Link to comment Share on other sites More sharing options...
donvier Posted September 2, 2010 Author Share Posted September 2, 2010 I don't think your code is well-written however. You have two while loops which doesn't make sense to me - you read from the file only one time, then you are done - no need to loop in that part. I definitely agree that its not well written. When I said new to scripting, this is quite frankly my first useful utility. I moved the data file opening to outside of the loop, got rid of the second loop, no longer opening the file for fileread. I do have a question on the ContinueLoop. How would I implement that if I wanted it to write it to a log file? At the FileRead @error, would it look like (doesnt feel right): $sInput = FileRead($VarLocation & $file) If @error = 1 Then Filewriteline($LogFileERR, "Error Processing File:" & $file) ElseIf @error = 1 Then ContinueLoop? Below is how my loop is currently setup now. While 1 $file = FileFindNextFile($search) If @error Then ExitLoop $sInput = FileRead($VarLocation & $file) If @error = 1 Then Filewriteline($LogFileERR, "Error Processing File:" & $file & "File Read Failed") $sOutput=StringStripWS($sInput,2) ; get rid of any whitespace after $sOutput=StringRegExpReplace($sOutput,'[^"]+("[^"]*")','$1,') $sOutput=StringTrimRight($sOutput,1) ; strip last ',' Filewriteline($LogFile, "Processed File:" & $file) FileWrite($DataFile, @CR & @LF) FileWrite($DataFile, $sOutput) WEnd Link to comment Share on other sites More sharing options...
Ascend4nt Posted September 2, 2010 Share Posted September 2, 2010 Since you are new to AutoIt, I suggest you look into Tutorials and the example code provided in the Help file. KaFu has a nice link collection on his profile. Melba23 made a nice post about where to look for good tutorials on this thread.I'll give you a freebie though:While 1 $file = FileFindNextFile($search) If @error Then ExitLoop $sInput = FileRead($VarLocation & $file) If @error Then Filewriteline($LogFileERR, "Error Processing File:" & $file & "File Read Failed") Else $sOutput=StringStripWS($sInput,2) ; get rid of any whitespace after $sOutput=StringRegExpReplace($sOutput,'[^"]+("[^"]*")','$1,') $sOutput=StringTrimRight($sOutput,1) ; strip last ',' Filewriteline($LogFile, "Processed File:" & $file) FileWrite($DataFile, @CRLF&$sOutput) EndIf WEnd My contributions: Performance Counters in Windows - Measure CPU, Disk, Network etc Performance | Network Interface Info, Statistics, and Traffic | CPU Multi-Processor Usage w/o Performance Counters | Disk and Device Read/Write Statistics | Atom Table Functions | Process, Thread, & DLL Functions UDFs | Process CPU Usage Trackers | PE File Overlay Extraction | A3X Script Extract | File + Process Imports/Exports Information | Windows Desktop Dimmer Shade | Spotlight + Focus GUI - Highlight and Dim for Eyestrain Relief | CrossHairs (FullScreen) | Rubber-Band Boxes using GUI's (_GUIBox) | GUI Fun! | IE Embedded Control Versioning (use IE9+ and HTML5 in a GUI) | Magnifier (Vista+) Functions UDF | _DLLStructDisplay (Debug!) | _EnumChildWindows (controls etc) | _FileFindEx | _ClipGetHTML | _ClipPutHTML + ClipPutHyperlink | _FileGetShortcutEx | _FilePropertiesDialog | I/O Port Functions | File(s) Drag & Drop | _RunWithReducedPrivileges | _ShellExecuteWithReducedPrivileges | _WinAPI_GetSystemInfo | dotNETGetVersions | Drive(s) Power Status | _WinGetDesktopHandle | _StringParseParameters | Screensaver, Sleep, Desktop Lock Disable | Full-Screen Crash Recovery Wrappers/Modifications of others' contributions: _DOSWildcardsToPCRegEx (original code: RobSaunder's) | WinGetAltTabWinList (original: Authenticity) UDF's added support/programming to: _ExplorerWinGetSelectedItems | MIDIEx UDF (original code: eynstyne) (All personal code/wrappers centrally located at Ascend4nt's AutoIT Code) Link to comment Share on other sites More sharing options...
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now