Jump to content

Tsv to Csv question about performance


rootx
 Share

Recommended Posts

Hi guys, which is the fastest way to convert Tsv Database to Csv? I write on the fly this.... If you need a DB test try this https://datasets.imdbws.com/

#include <Array.au3>
#include <File.au3>
#include <MsgBoxConstants.au3>
#include <Date.au3>

Local $hTimer = TimerInit()

Local $rArr,$nFile = @ScriptDir&"\new.csv"

$F = _FileReadToArray(@ScriptDir&"\data.tsv",$rArr)

FileOpen($nFile)

For $x = 0 to UBound($rArr)-1
    FileWrite($nFile,StringReplace(StringReplace($rArr[$x],@TAB,","),"\N","")&@CRLF)
    ConsoleWrite(StringReplace(StringReplace($rArr[$x],@TAB,","),"\N","")&@CRLF)
Next
FileClose($nFile)

Local $fDiff = TimerDiff($hTimer)

MsgBox("Finito","Time: ",_Convert($fDiff))

Func _Convert($ms)
   Local $day, $hour, $min, $sec
   _TicksToTime($ms, $hour, $min, $sec)
   If $hour > 24 Then
       $day = $hour/24
       $hour = Mod($hour, 24)
   EndIf
   Return StringFormat("DAY:%02i,HOUR:%02i,MIN:%02i,SEC:%02i", $day, $hour, $min, $sec)
EndFunc

 

Link to comment
Share on other sites

You could try using my _CSVsplit(), changing the delimiter to TAB. Then use _ArrayToCSV() to create the CSV. It probably won't be very fast, but syntax issues are less likely to occur.
https://www.autoitscript.com/forum/topic/155748-csvsplit/

Edited by czardas
Link to comment
Share on other sites

Would this work for you, provided you're positive that input file exists?

Local $nFile = @ScriptDir & "\new.csv"
FileDelete($nFile)
FileWrite($nFile, StringRegExpReplace(FileRead(@ScriptDir & "\data.tsv"), "\t", ","))

 

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

Thx, you forgot \N :bye:

Lines 4729498,  382Mb dimension, on my I7-6700HQ 2.60Ghz  only 12 seconds! Nice

#include <File.au3>
#include <MsgBoxConstants.au3>
#include <Date.au3>

Local $hTimer = TimerInit()

Local $nFile = @ScriptDir & "\new.csv"
FileDelete($nFile)
FileWrite($nFile, StringRegExpReplace(FileRead(@ScriptDir & "\data.tsv"), "[\t\\N]", ","))
Local $fDiff = TimerDiff($hTimer)

MsgBox("Finito","Time: ",_Convert($fDiff))

Func _Convert($ms)
   Local $day, $hour, $min, $sec
   _TicksToTime($ms, $hour, $min, $sec)
   If $hour > 24 Then
       $day = $hour/24
       $hour = Mod($hour, 24)
   EndIf
   Return StringFormat("DAY:%02i,HOUR:%02i,MIN:%02i,SEC:%02i", $day, $hour, $min, $sec)
EndFunc

 

Link to comment
Share on other sites

In a NFC-conforming .TSV file there is no need to handle \ nor N nor \N specifically. Perhaps your input format uses a non-standard convention.

I wonder what you mean by \N as in previous posts (using StringReplace) it would mean the ASCII sequence \N literally, while with your last modification (using StringRegexReplace) it means a backslash or character N.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

I have a lot of records like this

tt0000004,short,Un bon bock,Un bon bock,0,1892,,\N,,\N,,,Animation,Short

so it must be deleted to have an empty field.

tt0000004,short,Un bon bock,Un bon bock,0,1892,,,,,,,Animation,Short

 

Link to comment
Share on other sites

That's a peculiar need indeed.

Then previous codes won't work as expected. This shoud work and still run fast:

Local $hTimer = TimerInit()

Local $nFile = @ScriptDir & "\new.csv"
FileDelete($nFile)
Local $sText = FileRead(@ScriptDir & "\data.tsv")
$sText = StringReplace($sText, "\N", "", 0, $STR_NOCASESENSEBASIC)
FileWrite($nFile, StringReplace($sText, @TAB, ",", 0, $STR_NOCASESENSEBASIC))
Local $fDiff = TimerDiff($hTimer)

MsgBox("Finito","Time: ",_Convert($fDiff))

Func _Convert($ms)
   Local $day, $hour, $min, $sec
   _TicksToTime($ms, $hour, $min, $sec)
   If $hour > 24 Then
       $day = $hour/24
       $hour = Mod($hour, 24)
   EndIf
   Return StringFormat("DAY:%02i,HOUR:%02i,MIN:%02i,SEC:%02i", $day, $hour, $min, $sec)
EndFunc

 

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

OK Thx again, my Db source is IMDB and you can download it from the official website https://datasets.imdbws.com/ . Another question about Sqlite limits.. https://www.sqlite.org/limits.html  , do you think it is possible to manage a database with 5 million records in Sqlite without slowing down? Only simple search queries of the movie name field, I need a portable solution.

Link to comment
Share on other sites

The largest SQLite DB I've had to do work with was greater than 120 Tb years ago and was hit 24/7 by hundreds of users. OK, it ran on fine tuned not-so-special hardware server (yet with loads of RAM) with very fined-tuned SQLite code, and I do mean very, very optimized code. But the competitive "solution" by Oracle was 25 times more expensive with inferior results.

Remember that your GPS is most probably using a GIS system sit on an SQLite DB. Realize how many polygons of all kinds such a DB must hold to represent your country roads, plus all extra information.

In your simple use case (5M rows is a baby DB), all you have to do is carefully plan your DB layout according to your actual needs and think twice about the queries you'll find most useful next year. You may want to create an FTS5 virtual table to lookup partial titles or words from titles for instance, support Unicode search for foreign characters, a.s.o. Deduce from SQLite "EXPLAIN QUERY PLAN ..." which index will help your actual queries, but don't rely on what you believe will help.

I recommend you use a good SQLite DB manager (e.g. SQLite Expert) to start playing with a simple design and adjust it to satisfy your needs. Don't rush coding anything in AutoIt first, do this last.

If you need guidance for a skeleton setup, just ask.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

THX THX THX, My goal is learn Sqlite3 Db because is portable multiplatform and free, I'll explain my problem, I wrote some applications that run queries on gmaps api with reverse geocode etc .. there is a limit to the queries 4day  I would like to preserve the results and use two searches, the first on my local db the second on gmaps in case it fails. ( PS... I know other solutions like open map, but they are not as accurate as gmaps) To practice I used a table of a baby DB from IMDB...

SQLite Expert is Free 4 personal use and work fine and thx 4 FTS5 approach.

My last question how can I expand memory limit 4 Sqlite?? Like query_cache_limit etc... in Mysql?

Link to comment
Share on other sites

You can increase default cache size for a given session by using a pragma (see "PRAGMA schema.cache_size = pages;") and DB page size is defined at DB creation or changeable after that (see PRAGMA schema.page_size = bytes;").

Select a DB page size which works well for your use case and a cache size reasonnably wide enough. Don't waste too much time trying to fine-tune this until much later).

You may also benefit from the json support built in SQLite, in case your API yields json.

The reference site for documentation and more is of course http://www.sqlite.org/

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

SQLite json1 extension first appeared as an optional loadable module in version 3.9.0 dated 2015-10-14.
It's now part of the amalgamation (a large monolithic C source with most used defaults) yet requires a compile directive to be built into the resulting DLL.

It's easy to use it without messing with any compiler toolchain: current SQLite Expert version 5 offers it out of the box, so you can play with json functions right after installing it. Expert v5 is now stable enough to be used for almost any purpose, even if I have some griefs, features requests and small bugs to submit to Bogdan (the author).

As a first step, familiarize yourself with the documentation and try the examples there, by using the SQL tab of Expert:
http://www.sqlite.org/json1.html

Once you feel comfortable with most/all json functions, confront with your actual use case(s). It works like a charm for both efficiently producing json-formatted output suitable for external consumption (e.g. to web server) or manipulating stored json data, all within SQLite fast code.

A number of loadable extensions appeared alongside mainstream SQLite code, as you can see in the release history:
http://www.sqlite.org/changes.html
Search the page for the word "extension".

Enjoy.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...