Multiple instances of AutoIt writing to the same file simultaneously...


deezed

Hey guys,

OK, I'll try to describe my issue as best I can. Here goes:

I have a script that simply

- reads a RANDOM line from a rather large file of ~30,000 lines

- does its thing with the line it picked

- after it's done, uses the _FileWriteToLine("file", $linenumber, "", 1) command to delete the line it just used, because it's not needed anymore

- loops back to begin the same process again

Now, if I leave it at that, it obviously works without a hitch. However, I thought it would be faster to compile the script and run multiple copies of it... no copy should ever read the same line as another, because I delete the lines I use, right? A multi-threading solution of sorts.

WRONG.

The issue I run into is that the file starts getting chopped down (sometimes by thousands of lines at a time) very quickly. I assume it's because I have multiple handles to the file being written to at the same time, or something along those lines (pun intended), and it just bugs out. Now I'm stuck thinking of a solution.

I've tried several of the 'file delete' and 'file write' functions posted on this board to no avail; the text file still gets messed up in a short amount of time.

I'm guessing I need something in the script that checks whether the file is currently open for writing, and waits until the file is free before proceeding with the line deletion.

My goal is to run at least 5 copies simultaneously to get the job done quicker.

Any tips? Or is there no solution to my conundrum?

czardas

Hmm, not at all sure about this. If anything, I suspect that running multiple instances would make the whole process slower; it isn't going to create any more RAM. With a dual core you can run two instances to gain (I imagine) double speed, and with a quad core four instances to gain even more. Don't ask me how it works though. I suggest you rethink your idea. Writing to the same file simultaneously is not recommended even with multiple cores.


water

I would do it in a single script and try to do it all in memory. So don't read a line, process it, and write/delete it one at a time. Use _FileReadToArray to read the whole file into memory, process the array, and write the results back to disk in one go.
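A minimal sketch of that approach, using the standard File.au3 UDF (ProcessLine is a placeholder for whatever work is done per line):

```autoit
#include <File.au3>

; Read the whole file once, work on the array, write it back once.
Local $aLines
If Not _FileReadToArray("file.txt", $aLines) Then Exit
For $i = 1 To $aLines[0] ; element 0 holds the line count
    ; $aLines[$i] = ProcessLine($aLines[$i]) ; placeholder for the real work
Next
_FileWriteFromArray("file.txt", $aLines, 1) ; start at index 1 to skip the count
```

This replaces tens of thousands of per-line disk operations with one read and one write.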


My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2018-10-19 - Version 1.4.10.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (2018-09-01 - Version 1.3.4.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts
PowerPoint (2017-06-06 - Version 0.0.5.0) - Download - General Help & Support
Excel - Example Scripts - Wiki
Word - Wiki
 
Tutorials:

ADO - Wiki

 

deezed

But then how would I go about speeding it up using multiple compiled exes? I can run the script by itself without any problems whatsoever; it's just slow. It's when I run multiple instances of the script that I run into problems. I don't see how each of them having its own separate memory would solve my problem, as I want them all to read/write one single file. I was under the impression that multiple running compiled AutoIt scripts can't all share/modify one array in memory; they'd each have their own separate copies... which defeats my purpose.

I'm *assuming* my problem would be solved by some sort of function that checks/detects whether a file is presently being written to, makes the script sleep until it isn't, and then proceeds to write to the file once it's 'free'...

My current theory as to why this isn't working is that multiple instances of my compiled script are trying to write to the file at the same time, and that's causing my bugs.

Here's a very bastardized version of what I figure the solution *might* be:

#include <File.au3>

While 1
    Global $howmanylines = _FileCountLines("file.txt") ; count how many lines there are
    Global $k = Random(1, $howmanylines, 1)            ; pick a random line number
    Global $fileline = FileReadLine("text.txt", $k)    ; read that line

    ; .....
    ; bunch of irrelevant code
    ; .....

    ;If file.txt is currently being written to, then   ; this is my 'theory' on the solution, obviously not actual code
    ;    sleep until the file isn't being written to anymore
    ;EndIf

    _FileWriteToLine("file.txt", $k, "", 1) ; delete the line we parsed at the beginning of the loop
WEnd

Something like that would maybe solve it? I have no idea, since I'm a newb.

I hope I made it clear.

Just to reiterate: I can run my script (in SciTE or compiled) just fine. It's when I run *multiple* instances of said script that it starts butchering file.txt, presumably due to multiple concurrent attempts to write to the same exact file.
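For what it's worth, the commented-out "wait until the file is free" check could be approximated with a named Windows mutex, so only one instance touches the file at a time. A sketch, assuming the WinAPI UDF wrappers available in current AutoIt releases (older versions shipped them in WinAPIEx.au3), with $k being the line number picked earlier:

```autoit
#include <File.au3>
#include <WinAPIProc.au3>

; Every instance creates/opens the same named mutex before writing.
; _WinAPI_WaitForSingleObject blocks until no other instance holds the lock.
Local $hMutex = _WinAPI_CreateMutex("Global\file_txt_lock", False)
_WinAPI_WaitForSingleObject($hMutex)

_FileWriteToLine("file.txt", $k, "", 1) ; delete the processed line, exclusively

_WinAPI_ReleaseMutex($hMutex) ; let the next instance proceed
```

Note this only serializes the writes; two instances can still read (and then delete) the same random line, so it narrows the window rather than closing it completely.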


water

Your script is slow because it does so many I/O operations.

Read the file in one operation using _FileReadToArray, process all records in the array and write the result in one operation to disk.

That makes it a lot faster and doesn't require multiple instances.


deezed

I understand what you're saying, water, but my script isn't limited to reading/deleting text; that's a very small part of it (and ironically the part that's giving me the most trouble). It also involves connecting to my webserver and sending/receiving data, which is obviously a tad slower. One pass through my script's loop takes ~10 seconds, for example. So the bottleneck isn't the file I/O, as that's done relatively quickly. I'm just trying to make it 'multi-threaded' in the sloppiest, quickest way possible, and I thought it would work.

Edit: I made a boo-boo in my code above.

Global $fileline = FileReadLine("text.txt", $k) ; read that line number

should read

Global $fileline = FileReadLine("file.txt", $k) ; read that line number

water

Doing _FileCountLines for every record you process is very slow, because the function reads the whole file and counts the lines.

As I have only seen the file-handling portion of your script, I don't know what else slows it down.

Before doing any optimization I would recommend some time measurements.

Use TimerInit and TimerDiff to see how long each part of your script takes and where to improve it.
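For example (the stages in the comments are placeholders for the script's actual sections):

```autoit
; Time each stage separately to find the real bottleneck.
Local $hTimer = TimerInit()
; ... read the line from the file ...
ConsoleWrite("File read:   " & TimerDiff($hTimer) & " ms" & @CRLF)

$hTimer = TimerInit()
; ... GET/POST requests to the webserver ...
ConsoleWrite("Web request: " & TimerDiff($hTimer) & " ms" & @CRLF)

$hTimer = TimerInit()
; ... delete the processed line ...
ConsoleWrite("File write:  " & TimerDiff($hTimer) & " ms" & @CRLF)
```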


deezed

Making the GET/POST requests to my webserver is what takes the most time, I can tell you that for sure. Those requests take 95-99% of the time.

Optimizing those portions of my script won't make a big enough difference; I just want multiple instances working concurrently without bugs, that's it.

This is already getting out of hand; I'm asking relatively simple questions and being told something completely different.

water

On this forum we try to teach a man to fish rather than hand-feed him for the rest of his life.

The "relatively simple question" you ask has already been answered in the first reply: "Writing to the same file simultaneously is not recommended even with multi-core."

deezed

I don't know if you're trolling me or if you lack reading comprehension.

Anyway, we're in agreement here: I DON'T WANT TO WRITE TO THE FILE SIMULTANEOUSLY. In fact, that's the thing I'm trying to *avoid*. So a function that detects whether a file is currently being written to, sleeps until it isn't, and then proceeds with the script is *EXACTLY* what I want, as shown in my code example above. As I've stated, the bottleneck isn't the reading/writing; it's the web stuff I'm doing. I'll probably have to solve this one myself, because the so-called MVPs seem to be useless around here; I literally couldn't make it any clearer. I can't post until Oct 6, 12:33 am due to the 5-post limit on new accounts, but I'll try to have a solution and post it after that time, to help other people in the future.

water

I don't know if you're trolling me or if you lack reading comprehension.

I suggest you slow down a bit!

Everyone who has posted in this thread is trying to help you! This isn't the first thread where a user asks for a solution he can't implement himself, but after some discussion and an overall look at the script design the final solution looked completely different.

What I would do:

  • You need a script (Script A) that starts multiple instances (e.g. 10) of your processing script (Script B)
  • Let Script A split the input file into multiple files (e.g. 10), each with only a few thousand records
  • Pass each filename to an instance of Script B
  • This way every instance has its own file to work with; no more shared write access needed
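A rough sketch of Script A under that plan. The name "worker.exe" stands in for the compiled Script B, which would pick up its input file from $CmdLine[1]:

```autoit
#include <File.au3>

Local $iInstances = 10
Local $aLines
If Not _FileReadToArray("file.txt", $aLines) Then Exit
Local $iChunk = Ceiling($aLines[0] / $iInstances)

For $i = 0 To $iInstances - 1
    ; write this instance's slice of the lines to its own file
    Local $sPart = "part" & $i & ".txt"
    Local $hFile = FileOpen($sPart, 2) ; 2 = open for write, erase previous contents
    Local $iEnd = ($i + 1) * $iChunk
    If $iEnd > $aLines[0] Then $iEnd = $aLines[0]
    For $j = $i * $iChunk + 1 To $iEnd
        FileWriteLine($hFile, $aLines[$j])
    Next
    FileClose($hFile)
    Run('worker.exe "' & $sPart & '"') ; start Script B on its own file
Next
```

Since no two workers ever share a file, the corruption problem disappears without any locking.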
hannes08

Another solution would be to use a small database instead of a plain text file.

Anyway, listen to what water says, as he knows what he's talking about. :)
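Sketched with the SQLite UDF that ships with AutoIt (the table and column names are invented for illustration): each instance claims one unprocessed row inside a write transaction, so two instances can never grab the same line.

```autoit
#include <SQLite.au3>

_SQLite_Startup()
Local $hDB = _SQLite_Open("work.db")

; BEGIN IMMEDIATE takes the write lock up front, so claiming a row is atomic
_SQLite_Exec($hDB, "BEGIN IMMEDIATE;")
Local $aResult, $iRows, $iCols
_SQLite_GetTable2d($hDB, "SELECT id, data FROM lines WHERE done = 0 LIMIT 1;", $aResult, $iRows, $iCols)
If $iRows > 0 Then
    _SQLite_Exec($hDB, "UPDATE lines SET done = 1 WHERE id = " & $aResult[1][0] & ";")
EndIf
_SQLite_Exec($hDB, "COMMIT;")

; ... process $aResult[1][1] (the claimed line) ...

_SQLite_Close($hDB)
_SQLite_Shutdown()
```

If BEGIN IMMEDIATE fails because another instance holds the lock, _SQLite_Exec returns an error and the instance can simply Sleep briefly and retry.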


Regards, Hannes
[spoiler]If you can't convince them, confuse them![/spoiler]

water

Anyway, listen to what water says, as he knows what he's talking about. :)

Thanks for the compliment :>

czardas

I have no idea what this is about - only 30,000 lines; that can be handled in memory. Random line selection is very slow - shuffle the array instead, using one of the excellent random array snippets somewhere around here. Then loop through the shuffled array [bunch of irrelevant code] and dump the resulting lines into a file.
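The shuffle part could look like this - a standard Fisher-Yates pass over an array of the shape _FileReadToArray produces (element 0 holds the count); the function name is made up:

```autoit
; Shuffle once up front, then consume the lines in order.
; No repeated Random() picks and no line deletions needed.
Func _ShuffleLines(ByRef $aArray)
    For $i = $aArray[0] To 2 Step -1
        Local $j = Random(1, $i, 1)   ; random integer in [1, $i]
        Local $tmp = $aArray[$i]
        $aArray[$i] = $aArray[$j]
        $aArray[$j] = $tmp
    Next
EndFunc
```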

water

The OP describes in post #8 that the time-consuming processing is the GET/POST requests to his webserver. That's why he wants to run multiple instances of his script.

(That's what I understand with my lack of reading comprehension ;) )


czardas

Yeah, I read that, but I still don't understand how that changes things, or why it requires multiple instances of the script.

water

The OP hasn't provided any timing information, but I can imagine that processing 30,000 records and interacting with a webserver for each of those records can take quite long.

So if he can run the webserver part in parallel, that should reduce the total run time.


czardas

Well, I would be interested to hear the actual justification behind this, if it really is necessary to spawn multiple processes. I suppose it depends on the nature and order of the tasks. I can imagine it in some circumstances.

water

This is the next question I would ask. But the OP seems to be quite reluctant to give us additional information.


czardas

Well water, I didn't like the way he responded to you in post #10, yet you still tried to help him. That's admirable!

