Jump to content

text conversion (large amount of data)


Recommended Posts

Hi all,

I need to convert a space delimited text file in a .csv compatible file of a large amount of data coming from a logger.

Becouse of the file format, I need to check if the line:

- is a number > conver the multiple space between the numbers in ";"

- - the numers could be fraction so the "." eg 23.15.59 or 120.98 are imported as is

- is empty > do not import

- is alfa-numeric text > do not import

Any advice where I can start from?

More infos:

- the .txt file could be very large: from 10Mb to 1Gb !

- the .csv file will be used as source for an Access DB only for generate reports

- if the row start with 'Created on' the next 10 characters are used to fill the first 'column'

- if the second character of each row is 0 or 1 or 2 the entire row must be converted

Edited by headhole
Link to comment
Share on other sites

It sounds like a prime opportunity for using regular expressions. If you haven't used regex before, it will take some on-line reading, but you will be able to get the results you want with very little code. Start by looking at StringRegExp in the AutoIt help. There is a fairly good reference in there for constructing string matching patterns. An on-line regex primer will get you started in understanding the reference.

Link to comment
Share on other sites

Well, to start with, you can use FileOpen (& later FileClose).

Then depending on file size, either FileRead (text as one lump) or FileReadLine.

That last is probably best, maybe in conjunction with _FileCountLines (or you will need to split the text up based on carriage returns and line feeds).

Each line as it is read, can then be passed through StringSplit to get an array of what you want.

You can if needed, then do StringIsDigit checks, etc for your numbered sections.

And like has been suggested, you could use StringRegExp ... if you can grasp its usage, which is not for beginners.

If you whip up some code, and if needed, put it in the proper section (General Help & Support) instead of Dev Chat, then you should receive plenty of help to continue on from there.

The Help file is your friend though, and should be your first port of call.

Edited by TheSaint

Make sure brain is in gear before opening mouth!
Remember, what is not said, can be just as important as what is said.

Spoiler

What is the Secret Key? Life is like a Donut

If I put effort into communication, I expect you to read properly & fully, or just not comment.
Ignoring those who try to divert conversation with irrelevancies.
If I'm intent on insulting you or being rude, I will be obvious, not ambiguous about it.
I'm only big and bad, to those who have an over-active imagination.

I may have the Artistic Liesense ;) to disagree with you. TheSaint's Toolbox (be advised many downloads are not working due to ISP screwup with my storage)

userbar.png

Link to comment
Share on other sites

What you want to do is relatively easy bread and butter to do with AutoIt. You will get more precise help if you post a small sample of one of your input files (20 lines should be enough) together with another file with the input data changed to how you want the output csv to appear.

"Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the universe trying to build bigger and better idiots. So far, the universe is winning."- Rick Cook

Link to comment
Share on other sites

Another route would be to directly insert captured and filtered data into the database itself directly, instead of going thru a CSV file in between.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

Thanks to All for the kind replay.

As far I know, the better result are made looking in the help file.

So I've come on this result (is not quick as I thought, but it helps anyway):

; conversione file datalogger - COPYRIGHT 2013 Ronchese Carlo

; escludere le righe vuote

; converte sequenze di spazio in punto e virgola (delimitatori)

; se la riga inizia con 'Created on' i 10 caratteri successivi devono essere copiati nella prima colonna

; se il 2^ carattere di ogni riga contiene '0' '1' '2' allora importa la riga

#include <File.au3>

$file = "c:252_12.txt" ;posizione e nome del file del logger

Local $d ;data

Local $t ;tempo

Local $i ;linea numero

Local $a ;?

Local $mid ;secondo carattere di ogni riga

Local $s ;stringa da memorizzare

Local $h ;intestazioni di colonna

$h = "Date;Timestamp;Taste;MotorPosition;DeltaN1;DeltaN2;AlfaRif;dTIME_MAC1;dTIME_MAC2;MAC_POT1;MAC_POT2;MaxMotorPos;AlfaRun;MotorOnRinse;MotorOnTurnOff;Rise_time;Down_time;Erog_time;"

Local $out = FileOpen("c:test.txt", 1) ;crea un file in scrittura per accodare le linee

FileOpen($file, 0) ;legge il file di origine dal logger

FileWriteLine($out,$h) ;scrive l'intestazione delle colonne

For $i = 1 to _FileCountLines($file)

$line = FileReadLine($file, $i) ;legge la riga numero...

$line = StringStripWS($line, 4) ;toglie gli spazi multipli

$line = StringReplace($line," ",";") ;gli spazi singoli vengono sostituiti con il punto e virgola

$line = StringReplace($line,".",":") ;il punto viene cambiato con i due punti

$t = StringMid($line,1,9) ;memorizza l'ora

$mid = StringMid($line, 2, 1) ;legge il secondo carattere della riga

If $mid = "r" Then $d=StringMid($line, 12, 10) ;se contiene la lettera 'r' allora memorizza la data

$s = $d & $t & StringMid($line,14) ;la linea da memorizzare

ToolTip($s,100,100,"Scrivendo...") ;visualizza la riga che va a scrivere

Switch $mid

Case "0"

FileWriteLine($out,$s)

;ControlSend("[Class:Notepad]","","Edit1",$d & $t & StringMid($line,14))

;ControlSend("[Class:Notepad]","","Edit1",@CR)

Case "1"

FileWriteLine($out,$s)

;ControlSend("[Class:Notepad]","","Edit1",$d & $t & StringMid($line,14))

;ControlSend("[Class:Notepad]","","Edit1",@CR)

Case "2"

FileWriteLine($out,$s)

;ControlSend("[Class:Notepad]","","Edit1",$d & $t & StringMid($line,14))

;ControlSend("[Class:Notepad]","","Edit1",@CR)

EndSwitch

Next

FileClose($file) ;chiude il file del logger

FileClose("test.txt") ;chiude il file convertito

That's all.

Because of large amount of data (before conversion about 13Mb, after conversion about 3,5Mb) I want to leave the .csv as backup of logging. With VBA (yep, really simple) and MsAccess I'll use a temp table where import the data, do some filtering with queryes, and export the result on reports.

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...