Help make this faster :)


JRowe
I'm building a database generator for a hobby project. It takes a ginormous parts-of-speech text file and turns it into an SQLite database. I'm going to do some statistical text parsing and some other things that might generate some neat results.

Anyway, I have the generator coded and everything is good, except that it's slow. I've got the main loop in a For...Next with no sleeps. Is there anything else I can do, aside from removing the GUICtrlSetData calls, which I want to keep so I can monitor progress?

CODE
#include <ButtonConstants.au3>
#include <EditConstants.au3>
#include <GUIConstantsEx.au3>
#include <ProgressConstants.au3>
#include <StaticConstants.au3>
#include <WindowsConstants.au3>
#include <SQLite.au3>
#include <SQLite.dll.au3>

#Region ### START Koda GUI section ### Form=
$Form1 = GUICreate("Parts of Speech Database Generator", 455, 197, 192, 124)
$Progress1 = GUICtrlCreateProgress(8, 168, 438, 17)
$Label1 = GUICtrlCreateLabel("Lines Processed:", 168, 144, 85, 17)
$Label2 = GUICtrlCreateLabel("12345", 264, 144, 34, 17)
$Input1 = GUICtrlCreateInput("Input1", 64, 8, 361, 21)
$Label3 = GUICtrlCreateLabel("Line:", 32, 8, 27, 17)
$Label4 = GUICtrlCreateLabel("Phrase Size:", 32, 56, 63, 17)
$Label5 = GUICtrlCreateLabel("Homonyms:", 36, 72, 59, 17)
$Label6 = GUICtrlCreateLabel("Word or Phrase:", 14, 40, 81, 17)
$Button1 = GUICtrlCreateButton("Pause", 152, 104, 75, 25, 0)
$Button2 = GUICtrlCreateButton("Resume", 232, 104, 75, 25, 0)
$Label7 = GUICtrlCreateLabel("Label7", 99, 40, 36, 17)
$Label8 = GUICtrlCreateLabel("Label8", 99, 56, 36, 17)
$Label9 = GUICtrlCreateLabel("Label9", 99, 72, 36, 17)
$Label10 = GUICtrlCreateLabel("HomonymID:", 199, 40, 65, 17)
$Label11 = GUICtrlCreateLabel("Part Of Speech:", 184, 71, 80, 17)
$Label12 = GUICtrlCreateLabel("Label12", 272, 40, 42, 17)
$Label13 = GUICtrlCreateLabel("Label13", 272, 71, 42, 17)
GUISetState(@SW_SHOW)
#EndRegion ### END Koda GUI section ###

;Load Text File
$file = FileOpen("part-of-speech.txt", 0)
If $file = -1 Then
    MsgBox(0, "Error", "Unable to open file.")
    Exit
EndIf
MsgBox(0,"File Loaded", "Loading Complete")

;LoadDataBase
_SQLite_Startup()
$database =_SQLite_Open("POSDatabase.db")

$CurrentPos = 1

For $x = 0 to 295172 Step 1
    $nMsg = GUIGetMsg()
    Switch $nMsg
        Case $GUI_EVENT_CLOSE
            _SQLite_Close()
            _SQLite_Shutdown()
            Exit
        Case $Button1
            Pause()
    EndSwitch
    ProcessLine($CurrentPos)
    $CurrentPos +=1
    GUICtrlSetData($Progress1, Round(($CurrentPos/295172)*100)+1)
    GuiCtrlSetData($Label2, $CurrentPos)
Next

MsgBox(0, "Wow", "All Done!")
_SQLite_Close()
_SQLite_Shutdown()
Exit

Func ProcessLine($position)
    $line = FileReadLine($file, $position)
    $a = StringRegExp($line, "[\w!'-.]+", 3)
    $b = UBound($a)-1
    $word = $a[0]

    For $i = 1 to $b-1 step 1
        $word = $word & " " & $a[$i]
    Next
    ;Get an array of all parts of speech, even if only 1
    $MeaningsArray = StringSplit($a[$b], "")
    $NumMeanings = UBound($MeaningsArray)-1
    $HomonymID = $b
    For $loops = 1 to $NumMeanings Step 1
        $currentMeaning = $MeaningsArray[$loops]
        GUICtrlSetData($Label12, $loops)
        GUICtrlSetData($Label13, $currentMeaning)
        _SQLite_Exec($database,'INSERT INTO PartsOfSpeech (HomonymID,NumMeanings,PartOfSpeech,PhraseSize,Word) VALUES ("'&$loops&'","'&$NumMeanings&'","'&$currentMeaning&'","'&$b&'","'&$word&'");')
    Next
    GUICtrlSetData($Input1, $line)
    GUICtrlSetData($Label7, $word)
    GUICtrlSetData($Label8, $b)
    GUICtrlSetData($Label9, $NumMeanings)
EndFunc

Func Pause()
    While 1
        $nMsg = GUIGetMsg()
        Switch $nMsg
            Case $Button2
                ExitLoop
        EndSwitch
    WEnd
EndFunc

Any ideas on what to do to make it faster? I don't mind running it forever, but if there's something that will speed it up significantly, I would greatly appreciate advice. :)

There's about 295172 +/- another 50k entries overall. At the end of the process, I'll be able to make sentence diagrams.

yay. really. :mellow:

I figured a single table approach will be fine this early, later on I'll isolate and refine any query performance issues. Anyway, thanks for taking a look!


Will running this compiled instead of in SciTE make it any faster? (Probably, and it's probably dumb of me to have to ask, but I've not encountered this before, lol.)


Nope.

SciTE = AutoIt --> Script

"Compiled" = .exe --> Script --> AutoIt

Or something like that.


I thought SciTE might have a debug functionality hooked in there somewhere. Anyway, at least I can mess with the script and just leave the compiled one running in the background. 8300 done so far :mellow:


Well, 17 hours later, I'm at 230k, and it's getting progressively slower. It's down to about 3 entries every 2 seconds.

Is there a way to speed up either the SQL or any other aspect? Including setting higher process priority or giving AutoIt more RAM... anything would help at this point.

I don't want to mess with the priority or RAM settings unless I know what the effects will be. Thanks :mellow:

According to my quick and dirty calculations, it will take around 18 hours to finish the last 65k entries, and if I could speed that up I'd be really happy :(

Edited by jrowe

Without the file, or part of it, I can't test this:

#include <ButtonConstants.au3>
#include <EditConstants.au3>
#include <GUIConstantsEx.au3>
#include <ProgressConstants.au3>
#include <StaticConstants.au3>
#include <WindowsConstants.au3>
#include <SQLite.au3>
#include <SQLite.dll.au3>

#Region ### START Koda GUI section ### Form=
$Form1 = GUICreate("Parts of Speech Database Generator", 455, 197, 192, 124)
$Progress1 = GUICtrlCreateProgress(8, 168, 438, 17)
$Label1 = GUICtrlCreateLabel("Lines Processed:", 168, 144, 85, 17)
$Label2 = GUICtrlCreateLabel("12345", 264, 144, 34, 17)
$Input1 = GUICtrlCreateInput("Input1", 64, 8, 361, 21)
$Label3 = GUICtrlCreateLabel("Line:", 32, 8, 27, 17)
$Label4 = GUICtrlCreateLabel("Phrase Size:", 32, 56, 63, 17)
$Label5 = GUICtrlCreateLabel("Homonyms:", 36, 72, 59, 17)
$Label6 = GUICtrlCreateLabel("Word or Phrase:", 14, 40, 81, 17)
$Button1 = GUICtrlCreateButton("Pause", 152, 104, 75, 25, 0)
$Button2 = GUICtrlCreateButton("Resume", 232, 104, 75, 25, 0)
$Label7 = GUICtrlCreateLabel("Label7", 99, 40, 36, 17)
$Label8 = GUICtrlCreateLabel("Label8", 99, 56, 36, 17)
$Label9 = GUICtrlCreateLabel("Label9", 99, 72, 36, 17)
$Label10 = GUICtrlCreateLabel("HomonymID:", 199, 40, 65, 17)
$Label11 = GUICtrlCreateLabel("Part Of Speech:", 184, 71, 80, 17)
$Label12 = GUICtrlCreateLabel("Label12", 272, 40, 42, 17)
$Label13 = GUICtrlCreateLabel("Label13", 272, 71, 42, 17)
GUISetState(@SW_SHOW)
#EndRegion ### END Koda GUI section ###


;Load Text File
$file = FileOpen("part-of-speech.txt", 0)
If $file = -1 Then
MsgBox(0, "Error", "Unable to open file.")
Exit
EndIf
MsgBox(0,"File Loaded", "Loading Complete")


;LoadDataBase
_SQLite_Startup()
$database =_SQLite_Open("POSDatabase.db")

$CurrentPos = 1

$Str = FileRead($file)

$aFile = StringSplit($Str,@crlf,1)

_SQLite_Exec($database,"begin")
For $x = 1 to Ubound($aFile) -1;295172 Step 1
    $nMsg = GUIGetMsg()
    Switch $nMsg
        Case $GUI_EVENT_CLOSE
        _SQLite_Exec($database,"End")
        _SQLite_Close()
        _SQLite_Shutdown()
        Exit
        Case $Button1
        Pause()
    EndSwitch
    ProcessLine($aFile[$x])
;ProcessLine($CurrentPos)
    $CurrentPos +=1
    GUICtrlSetData($Progress1, Round(($CurrentPos/295172)*100)+1)
    GuiCtrlSetData($Label2, $CurrentPos)
    
    If StringRight($x,3) = "000" then;Every 1000 entries make it save the journal file
        _SQLite_Exec($database,"End")
        _SQLite_Exec($database,"Begin")
    EndIf
    
Next

_SQLite_Exec($database,"End")
MsgBox(0, "Wow", "All Done!")
_SQLite_Close()
_SQLite_Shutdown()
Exit

Func ProcessLine($line);line is now already text rather than a line position $position 
;$line = FileReadLine($file, $position)
    $a = StringRegExp($line, "[\w!'-.]+", 3)
    $b = UBound($a)-1
    $word = $a[0]
    
    For $i = 1 to $b-1 step 1
        $word = $word & " " & $a[$i]
    Next 
;Get an array of all parts of speech, even if only 1
    $MeaningsArray = StringSplit($a[$b], "")
    $NumMeanings = UBound($MeaningsArray)-1
    $HomonymID = $b
    For $loops = 1 to $NumMeanings Step 1
        $currentMeaning = $MeaningsArray[$loops]
        GUICtrlSetData($Label12, $loops)
        GUICtrlSetData($Label13, $currentMeaning)
        _SQLite_Exec($database,'INSERT INTO PartsOfSpeech (HomonymID,NumMeanings,PartOfSpeech,PhraseSize,Word) VALUES ("'&$loops&'","'&$NumMeanings&'","'&$currentMeaning&'","'&$b&'","'&$word&'");')
    Next
    GUICtrlSetData($Input1, $line)
    GUICtrlSetData($Label7, $word)
    GUICtrlSetData($Label8, $b)
    GUICtrlSetData($Label9, $NumMeanings)
EndFunc

Func Pause()
    _SQLite_Exec($database,"End")
    While 1
        $nMsg = GUIGetMsg()
        Switch $nMsg
            Case $Button2
            ExitLoop
        EndSwitch
    WEnd
    _SQLite_Exec($database,"Begin")
EndFunc
Edited by ChrisL

Ahh, sorry, it's at http://downloads.sourceforge.net/wordlist/pos-1.zip , and an empty database is here: POSDatabase.txt (just rename it to .db).

Thanks. I'll give it a go, I'll be generating more databases. Not sure how much is left on this one, it's been going for 33 hours now, lol. As soon as I check the other computer I'll post an update.


My version seems to go like chuff!

On my laptop it is doing 1000 lines in about 14 seconds

Edited by ChrisL

Added estimated time based on the last 1000 entries (although maths was never my strong point so I may have got this arse about face!) and removed the counter you don't need

#include <ButtonConstants.au3>
#include <EditConstants.au3>
#include <GUIConstantsEx.au3>
#include <ProgressConstants.au3>
#include <StaticConstants.au3>
#include <WindowsConstants.au3>
#include <SQLite.au3>
#include <SQLite.dll.au3>
#Include <Date.au3>

#Region ### START Koda GUI section ### Form=
$Form1 = GUICreate("Parts of Speech Database Generator", 455, 197, 192, 124)
$Progress1 = GUICtrlCreateProgress(8, 168, 438, 17)
$Label1 = GUICtrlCreateLabel("Lines Processed:", 168, 144, 85, 17)
$Label2 = GUICtrlCreateLabel("12345", 264, 144, 84, 17)
$Input1 = GUICtrlCreateInput("Input1", 64, 8, 361, 21)
$Label3 = GUICtrlCreateLabel("Line:", 32, 8, 27, 17)
$Label4 = GUICtrlCreateLabel("Phrase Size:", 32, 56, 63, 17)
$Label5 = GUICtrlCreateLabel("Homonyms:", 36, 72, 59, 17)
$Label6 = GUICtrlCreateLabel("Word or Phrase:", 14, 40, 81, 17)
$Button1 = GUICtrlCreateButton("Pause", 152, 104, 75, 25, 0)
$Button2 = GUICtrlCreateButton("Resume", 232, 104, 75, 25, 0)
$Label7 = GUICtrlCreateLabel("Label7", 99, 40, 36, 17)
$Label8 = GUICtrlCreateLabel("Label8", 99, 56, 36, 17)
$Label9 = GUICtrlCreateLabel("Label9", 99, 72, 36, 17)
$Label10 = GUICtrlCreateLabel("HomonymID:", 199, 40, 65, 17)
$Label11 = GUICtrlCreateLabel("Part Of Speech:", 184, 71, 80, 17)
$Label12 = GUICtrlCreateLabel("Label12", 272, 40, 42, 17)
$Label13 = GUICtrlCreateLabel("Label13", 272, 71, 42, 17)
$Label_timeRemaining = GuiCtrlCreateLabel("", 10, 134, 105, 27)
GUISetState(@SW_SHOW)
#EndRegion ### END Koda GUI section ###

Local $Secs, $Mins, $Hour

;Load Text File
$file = FileOpen("part-of-speech.txt", 0)
If $file = -1 Then
MsgBox(0, "Error", "Unable to open file.")
Exit
EndIf
MsgBox(0,"File Loaded", "Loading Complete")


;LoadDataBase
_SQLite_Startup()
$database =_SQLite_Open("POSDatabase.db")

;$CurrentPos = 1;Dont need this

$Str = FileRead($file)

$aFile = StringSplit($Str,@crlf,1)

$str =""

_SQLite_Exec($database,"Begin")
$qty = Ubound($aFile) -1
$iTimer = TimerInit()
For $x = 1 to $qty;295172 Step 1
    $nMsg = GUIGetMsg()
    Switch $nMsg
        Case $GUI_EVENT_CLOSE
        _SQLite_Exec($database,"End")
        _SQLite_Close()
        _SQLite_Shutdown()
        Exit
        Case $Button1
        Pause()
    EndSwitch
    ProcessLine($aFile[$x])
;ProcessLine($CurrentPos)
;$CurrentPos +=1;Dont need this
    GUICtrlSetData($Progress1, Round(($x/$qty)*100)+1)
    GuiCtrlSetData($Label2, $x & "/" & $qty)
    
    If StringRight($x,3) = "000" then;Every 1000 entries make it save the journal file
        $Diff = Int(TimerDiff($iTimer))
        $TicksRemaining = (($qty - $x) / 1000 ) * $Diff 
        _TicksToTime($TicksRemaining, $Hour, $Mins, $Secs)
        GuiCtrlSetData($Label_timeRemaining,"Estimated remaining " & StringFormat("%02i:%02i:%02i", $Hour, $Mins, $Secs))
        _SQLite_Exec($database,"End")
        _SQLite_Exec($database,"Begin")
        $iTimer = TImerInit()
    EndIf
    
Next

_SQLite_Exec($database,"End")
MsgBox(0, "Wow", "All Done!")
_SQLite_Close()
_SQLite_Shutdown()
Exit

Func ProcessLine($line);line is now already text rather than a line position $position 
;$line = FileReadLine($file, $position)
    $a = StringRegExp($line, "[\w!'-.]+", 3)
    $b = UBound($a)-1
    $word = $a[0]
    
    For $i = 1 to $b-1 step 1
        $word = $word & " " & $a[$i]
    Next 
;Get an array of all parts of speech, even if only 1
    $MeaningsArray = StringSplit($a[$b], "")
    $NumMeanings = UBound($MeaningsArray)-1
    $HomonymID = $b
    For $loops = 1 to $NumMeanings Step 1
        $currentMeaning = $MeaningsArray[$loops]
        GUICtrlSetData($Label12, $loops)
        GUICtrlSetData($Label13, $currentMeaning)
        _SQLite_Exec($database,'INSERT INTO PartsOfSpeech (HomonymID,NumMeanings,PartOfSpeech,PhraseSize,Word) VALUES ("'&$loops&'","'&$NumMeanings&'","'&$currentMeaning&'","'&$b&'","'&$word&'");')
    Next
    GUICtrlSetData($Input1, $line)
    GUICtrlSetData($Label7, $word)
    GUICtrlSetData($Label8, $b)
    GUICtrlSetData($Label9, $NumMeanings)
EndFunc

Func Pause()
    _SQLite_Exec($database,"End")
    While 1
        $nMsg = GUIGetMsg()
        Switch $nMsg
            Case $Button2
            ExitLoop
            Case $GUI_EVENT_CLOSE
            ;The transaction was already ended at the top of Pause(), so just close
            _SQLite_Close()
            _SQLite_Shutdown()
            Exit
        EndSwitch
    WEnd
    _SQLite_Exec($database,"Begin")
EndFunc

Edit: Set $str back to an empty string once it has been split into the array, just to free some memory.

Edited by ChrisL
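
The time estimate in the script above takes how long the last 1000 lines took and multiplies it by the number of 1000-line batches still to go. A minimal standalone sketch of that arithmetic follows. The helper name `_EstimateRemainingMs` is hypothetical and the sample figures are made up for illustration; the estimate is only as good as the assumption that the last batch is representative.

```autoit
#include <Date.au3>

; Hypothetical helper (not from the thread): milliseconds left,
; given how long the most recent batch of lines took.
Func _EstimateRemainingMs($iTotal, $iDone, $iBatchSize, $fBatchMs)
    Return (($iTotal - $iDone) / $iBatchSize) * $fBatchMs
EndFunc

Local $iHour, $iMins, $iSecs

; Made-up sample: 295172 lines in total, 230000 already done,
; and the last 1000 lines took 14000 ms.
Local $fRemaining = _EstimateRemainingMs(295172, 230000, 1000, 14000)

; _TicksToTime expects milliseconds and fills in hours, minutes and seconds
_TicksToTime(Int($fRemaining), $iHour, $iMins, $iSecs)
ConsoleWrite(StringFormat("Estimated remaining %02i:%02i:%02i", $iHour, $iMins, $iSecs) & @CRLF)
```

With these sample numbers there are about 65 batches left at 14 seconds each, so the estimate comes out at roughly a quarter of an hour.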

Awesome, I appreciate that. So, what you did was save the database every 1000 entries, then restart, so the journal didn't get too large, and tweaked some of the data stuff.

The other version finished, at least :( I'll use this version for the definitions database, and then the wordnet links and so on. Much thanks! :mellow:
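
The batching described here can be sketched on its own. When no transaction is open, SQLite treats every INSERT as its own transaction, so wrapping a batch of inserts in one explicit transaction avoids paying that cost per row. In SQLite, END is an alias for COMMIT, so the "Begin"/"End" pairs in the script do the same job as the BEGIN/COMMIT used below. This is a sketch under assumptions: the `Demo` table and the batch size are made up, an in-memory database is used so nothing is written to disk (which also means it only shows the pattern, not the speed-up), and `_SQLite_Startup()` is assumed to be able to find the SQLite DLL on your machine.

```autoit
#include <SQLite.au3>

_SQLite_Startup()
If @error Then Exit MsgBox(16, "SQLite", "The SQLite DLL could not be loaded.")

; No filename = in-memory database, used here for demonstration only
Local $hDB = _SQLite_Open()
_SQLite_Exec($hDB, "CREATE TABLE Demo (Id INTEGER, Word TEXT);") ; hypothetical table

Local Const $iBatch = 1000 ; rows per transaction, same figure as in the thread

_SQLite_Exec($hDB, "BEGIN;")
For $i = 1 To 5000
    _SQLite_Exec($hDB, "INSERT INTO Demo (Id, Word) VALUES (" & $i & ", 'word" & $i & "');")
    ; Every $iBatch rows, commit what we have and open a fresh transaction
    If Mod($i, $iBatch) = 0 Then
        _SQLite_Exec($hDB, "COMMIT;")
        _SQLite_Exec($hDB, "BEGIN;")
    EndIf
Next
_SQLite_Exec($hDB, "COMMIT;") ; commit whatever is left in the final batch

_SQLite_Close($hDB)
_SQLite_Shutdown()
```

`Mod($i, $iBatch) = 0` is a numeric equivalent of the `StringRight($x,3) = "000"` test in the script, and it keeps working if the batch size is changed to something that is not a power of ten.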


Well yeah, that was part of it, but the real speed increase came from loading the file into memory and splitting it into an array of lines. Doing a FileReadLine by line number each time was crippling your speed. Writing chunks into the database is also faster.
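
To spell out why that matters: when FileReadLine is given an explicit line number, AutoIt has to read from the start of the file up to that line on every call. Reading lines 1 to n that way costs on the order of n*n/2 line reads, which for 295172 lines is roughly 43 billion, and it explains why the original run kept getting slower the further it went. A minimal sketch of the read-once approach, assuming part-of-speech.txt sits next to the script as in the thread (the variable names are illustrative):

```autoit
Local $sPath = "part-of-speech.txt" ; assumed to be in the script directory

; Slow pattern from the original script, shown only for comparison:
;     For $n = 1 To $iLineCount
;         $sLine = FileReadLine($hFile, $n) ; rescans the file up to line $n
;     Next

; Faster pattern: read the whole file once, then split it in memory
Local $sText = FileRead($sPath)
If @error Then Exit MsgBox(16, "Error", "Unable to read " & $sPath)

; Strip carriage returns so the split works for CRLF and LF-only files alike
$sText = StringStripCR($sText)
Local $aLines = StringSplit($sText, @LF) ; $aLines[0] holds the number of lines
$sText = "" ; release the raw copy now that the array exists

Local $iNonEmpty = 0
For $n = 1 To $aLines[0]
    ; Each element is already the line text; process $aLines[$n] here
    If $aLines[$n] <> "" Then $iNonEmpty += 1
Next
ConsoleWrite("Non-empty lines: " & $iNonEmpty & @CRLF)
```

The script in the thread splits on `@crlf` with flag 1, which is fine for a file with Windows line endings; stripping the carriage returns first is just a way to avoid depending on that.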
