Stew Posted February 12, 2019 (edited)

(Edited from original. Please note that I AM NOT AN AUTOIT EXPERT. I write code using AutoIt frequently, but I am no expert, especially when it comes to I/O, so any remarks that start with "Why did you..." can be answered by referring to the first sentence. This project was done in AutoIt because of an interface I built to display the data.)

Attached are a program and an ASCII input file I wrote to read stock price data, convert it to binary, and then read it back into the program in binary. The goal was to show the increased performance of reading the files in binary and to provide a demo of how to read/write binary Int32, Int64, Double and String values for anyone who might find it helpful. The results on my PC were as follows:

Time to read ASCII file only: 456.981951167202
ASCII read & process time: 6061.83075631701
Binary write file time: 14787.9184635239
Time just to read binary file: 42.418867292311
Binary read and process time: 4515.16129830537

A couple of things to note:

1) The 32 MB ASCII file took 10x longer to read than the 15 MB binary file. I'm not entirely sure why; both were read into a buffer.

2) The binary write takes a long time, but I made no effort to optimize it because the plan was to write this file only once, so I don't mind if writing is slow. I care much more about how long it takes to read the file, because I will be reading it many times.

3) There was a modest gain from converting the ASCII file to binary in terms of file size and reading speed. So, big picture... I'm not sure it's worth the effort to convert the files to binary, even though most of the data in the binary file is numerical. That was actually surprising, as I expected there would be more of a difference.

Any ideas on how to get the binary data to read at a faster rate would be great.

binary.au3
2019_02_08.zip

(Edited February 14, 2019 by Stew: new version)
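For readers who don't want to open the attachment, the core of the binary path looks roughly like the sketch below. This is not the attached binary.au3; the file name and record layout (one Int32 date followed by one Double price) are assumptions for illustration only. The trick is to overlay a byte array on an int or double with DllStruct so the raw bytes can be written and read directly.

; Minimal sketch: write one Int32 and one Double as raw bytes, then read them back.
Local Const $sFile = @ScriptDir & "\demo.bin" ; hypothetical output file

; Overlay structs: the same memory viewed as a number and as raw bytes.
Local $tInt = DllStructCreate("int")
Local $tIntBytes = DllStructCreate("byte[4]", DllStructGetPtr($tInt))
Local $tDbl = DllStructCreate("double")
Local $tDblBytes = DllStructCreate("byte[8]", DllStructGetPtr($tDbl))

; --- write ---
Local $hOut = FileOpen($sFile, 2 + 16) ; 2 = overwrite, 16 = binary mode
DllStructSetData($tInt, 1, 20190208) ; e.g. a date packed as yyyymmdd
FileWrite($hOut, DllStructGetData($tIntBytes, 1)) ; 4 raw bytes
DllStructSetData($tDbl, 1, 104.57) ; e.g. a closing price
FileWrite($hOut, DllStructGetData($tDblBytes, 1)) ; 8 raw bytes
FileClose($hOut)

; --- read ---
Local $hIn = FileOpen($sFile, 16) ; binary mode, read
Local $bAll = FileRead($hIn) ; whole file into one binary buffer
FileClose($hIn)
DllStructSetData($tIntBytes, 1, BinaryMid($bAll, 1, 4)) ; bytes 1-4 -> Int32
DllStructSetData($tDblBytes, 1, BinaryMid($bAll, 5, 8)) ; bytes 5-12 -> Double
ConsoleWrite(DllStructGetData($tInt, 1) & " / " & DllStructGetData($tDbl, 1) & @CRLF)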
jchd Posted February 13, 2019
Sorry but this is useless bloatware.
Stew (author) Posted February 13, 2019
5 hours ago, jchd said: Sorry but this is useless bloatware.
Based upon this naive response I'm guessing you never had to deal with massive amounts of data. For anyone having to read large data files, it is very important to be able to read/write binary. I have massive amounts of ASCII stock price data that takes a long time to read as ASCII files; by converting the files to binary, my software can now read them in a fraction of the time it previously required. Having seen how many posts there are on these forums from programmers trying to figure out how to read/write binary using AutoIt, I expect it will be helpful to many others as well. If not, no problem. Few photons and electrons were injured in the creation of this forum posting.
Earthshine Posted February 13, 2019
He is referring to the fact that this functionality already exists in AutoIt proper. BinaryToString() and StringToBinary() are already an established thing, and the help file also details how to use them. Anyone can write a loop that reads in data and writes out binary with such functions, so this was kind of like reinventing the wheel, I guess. Still, thanks for the effort, and if it works well for you, then you must be doing something right! Happy programming.
Stew (author) Posted February 13, 2019
Thanks for your response. I think a detailed explanation may be helpful, so let's go through an example that should make clear why it matters not to treat numerical data as a string.

Let's say I have a large number: 2141231231 (10 characters). I have two options for writing that number to a file: as a string of 11 bytes (the 10 characters plus a delimiter, written either as ASCII or as binary), or as an Int32, which is 4 bytes (written in binary). BinaryToString() and StringToBinary() will treat this as 11 bytes, so the file is larger, though admittedly that depends on the number of characters in each number; the file could actually be smaller if the numerical data were mostly numbers from -9 to 99 (remember, you still need a 1-character delimiter).

But that's not the real problem. After reading this data in as a string, you still need to convert the string to a number before it can be used in mathematical calculations; the string itself is of no value for calculations. That requires several steps, with a computational penalty for each one. For every data point I read in binary as a string, I first have to use BinaryToString() to get the data into string format, then StringSplit() to separate the large string into individual strings for each number, and finally Number() to convert each string into a number. When you have millions of data points this is very slow and unnecessary. It is much better to write the data in binary as an Int32, Int64 or Double and read it back in binary as the same data type. By doing that I eliminate BinaryToString(), StringSplit() and Number() from the process.

For small files this is a non-issue in terms of time savings, although I think it's actually easier to read/write these files the way I outlined than to use multiple steps to convert between strings and numbers. But for large datasets the time saved by reading the files this way is significant. You definitely don't want to read/write numbers as strings when dealing with a lot of data. Hope that helps.
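To make the two paths concrete, here is a minimal sketch of each (the file names are hypothetical and the layout is assumed, not taken from the attached script): parsing one value out of comma-delimited text versus reading it directly as 4 raw bytes.

; Path 1: number stored as delimited text -- three conversion steps per value.
Local $hTxt = FileOpen(@ScriptDir & "\prices.txt", 16) ; hypothetical text file, binary mode (16)
Local $sText = BinaryToString(FileRead($hTxt)) ; step 1: binary buffer -> string
FileClose($hTxt)
Local $aFields = StringSplit($sText, ",") ; step 2: split on the delimiter
Local $iValue1 = Number($aFields[1]) ; step 3: string -> number

; Path 2: number stored as a raw Int32 -- one 4-byte copy per value.
Local $tInt = DllStructCreate("int")
Local $tBytes = DllStructCreate("byte[4]", DllStructGetPtr($tInt))
Local $hBin = FileOpen(@ScriptDir & "\prices.bin", 16) ; hypothetical binary file
DllStructSetData($tBytes, 1, FileRead($hBin, 4)) ; copy 4 bytes straight into the struct
Local $iValue2 = DllStructGetData($tInt, 1) ; already a usable Int32
FileClose($hBin)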
Earthshine Posted February 13, 2019
I didn't misunderstand you -- there are also other functions to convert a string to an int or a float before writing out to binary. I understand what you are doing, but it can be done in other ways. But whatever -- you do your thing.
JLogan3o13 (Moderator) Posted February 13, 2019
2 hours ago, Stew said: Based upon this naive response I'm guessing you never had to deal with massive amounts of data.
Based on this naive response, you have made everyone on this forum who knows the depth of jchd's technical knowledge laugh.
Stew (author) Posted February 13, 2019
Well, maybe jchd's technical knowledge is good, but the communication skills could use some work. Based upon his/her response I'm guessing I probably didn't hurt their feelings too much, but it's very nice that you stood up for them.

Earthshine, actually I think there is still some misunderstanding: I am not converting strings to ints or floats before writing out to binary. My data is already in int and float format in memory, because that is the form I need it in to do computations. I just want to write it out in that format without going through a string in the process, since that is an unnecessary and computationally expensive step. The suggestion to use BinaryToString() and StringToBinary() would introduce this unnecessary step. I want to read/write Int32, Int64 and Double values directly, and that is the subject of the code I posted.

If anyone has a better way to read/write numerical data in the most computationally efficient manner possible, please post it here. My only goal is to minimize computational and I/O time when reading large numerical datasets. If I'm reinventing the wheel, then I (and I suspect many others) have missed the posting of the wheel somewhere on this forum or in the help files. Please tell me where you've hidden the wheel. I'd be thrilled if someone had an even faster way to read/write large files of numerical data than the method I came up with, so please post code if you have it. I for one would use your code.
jchd Posted February 13, 2019
I must apologize for my cryptic answer, hastily written. It wasn't meant to be rude to you, just to point out that such code isn't a valuable, efficient solution.

Quote: I have massive amounts of ascii stock price data that takes long periods to read as ascii files. By converting the files to binary my software can now read them in a fraction of the time it previously required.

I feel this now turns out to be a code optimization request, rather than a failed demo that plain vanilla code is inferior. I gently suggest you open a new thread in the General Help forum with a significant sample of input data and your requirements for processing. I strongly suspect there are faster regular ways to process text (ASCII) data than converting the whole thing to binary and then extracting the wanted pieces out of a large chunk of converted binary. The goal of most of the fora here is to maximize the usefulness of AutoIt-based code, so we're fully spot-on, and there are a number of seasoned contributors here willing to provide as much help as possible.
Earthshine Posted February 13, 2019
Memory-mapped files come to mind as a way to handle huge ASCII files very quickly.
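For anyone curious what that looks like from AutoIt, below is a rough sketch using raw kernel32 calls (the input file name is hypothetical and error handling is minimal; AutoIt's WinAPI UDFs also wrap these functions). The whole file is mapped into the process's address space and read through a struct overlay instead of FileRead.

; Rough sketch: map a large read-only file into memory and peek at its first bytes.
Local Const $sFile = @ScriptDir & "\2019_02_08.txt" ; hypothetical input file

; CreateFileW: GENERIC_READ (0x80000000), FILE_SHARE_READ (1), OPEN_EXISTING (3)
Local $aFile = DllCall("kernel32.dll", "handle", "CreateFileW", "wstr", $sFile, _
        "dword", 0x80000000, "dword", 1, "ptr", 0, "dword", 3, "dword", 0, "ptr", 0)
If @error Or $aFile[0] = Ptr(-1) Then Exit ConsoleWrite("Cannot open file" & @CRLF)
Local $hFile = $aFile[0]

; CreateFileMappingW with PAGE_READONLY (2); size 0 = map the whole file
Local $aMap = DllCall("kernel32.dll", "handle", "CreateFileMappingW", "handle", $hFile, _
        "ptr", 0, "dword", 2, "dword", 0, "dword", 0, "ptr", 0)
Local $hMap = $aMap[0]

; MapViewOfFile with FILE_MAP_READ (4); returns a pointer to the file contents
Local $aView = DllCall("kernel32.dll", "ptr", "MapViewOfFile", "handle", $hMap, _
        "dword", 4, "dword", 0, "dword", 0, "ulong_ptr", 0)
Local $pView = $aView[0]

; Overlay a struct on the mapped memory and read it without any FileRead call
Local $tHead = DllStructCreate("char[64]", $pView)
ConsoleWrite("First 64 bytes: " & DllStructGetData($tHead, 1) & @CRLF)

DllCall("kernel32.dll", "bool", "UnmapViewOfFile", "ptr", $pView)
DllCall("kernel32.dll", "bool", "CloseHandle", "handle", $hMap)
DllCall("kernel32.dll", "bool", "CloseHandle", "handle", $hFile)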
Stew (author) Posted February 13, 2019
jchd, thanks for your comments. I'm not actually requesting any code optimization, and I'm also not suggesting that the plain vanilla code is inferior; AutoIt is an incredible tool as is. I was just trying to contribute some code that other programmers might find helpful. I'm certain you could make AutoIt read ASCII files faster with some tweaking, but it won't compete with reading the same data in binary format. It's simply impossible when one approach has to convert the data in several steps and the other doesn't. So I would not submit anything to the General Help forum that I know has no chance of success, and hopefully everyone recognizes this and won't waste time trying to do the impossible. Reading numerical data in binary will ALWAYS be much faster than reading it in ASCII.

So, hopefully this series of posts makes things clear. The sample code I submitted in the first post is ideally suited for instances where:

1) The data files are large
2) The data is primarily numerical
3) The data files need to be read several times

This set of criteria is actually quite common among people doing research (like myself). If these criteria are met, then it's better to convert the files to binary once and only read them in binary format going forward. The sample code I submitted can be used to accomplish this goal. That being said, the code is incredibly simple and can be used even if the files are not large and are not entirely numerical. As long as you are careful to read and write the data in the same way, it will work for any file and will ALWAYS be faster than reading the same set of data from an ASCII file.

Again, I was just trying to help other programmers. I suspect there will be programmers who need this for their research, but even if I'm wrong about that, at least I finally figured it out so I can use it in my own research. I have yet to see a forum post or example code explaining how to read and write all types of data in binary using AutoIt. Now there is one.
Stew (author) Posted February 14, 2019
I forgot another application. If you are not dealing with large amounts of data, but you do have an application that does a lot of I/O to disk and some of the data is numerical, then you still want to do that I/O in binary because of the increased speed. If the data is entirely string data, then you will probably not see a significant change in speed between ASCII and binary I/O.
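For mixed records like that, one common layout is to length-prefix each string so it can sit alongside the raw numbers. The sketch below is illustrative only; the file name and fields (a ticker string plus a price) are made up, not taken from the attached script.

; Sketch: write one record as [Int32 string length][UTF-8 string bytes][Double], then read it back.
Local Const $sFile = @ScriptDir & "\mixed.bin" ; hypothetical file
Local $tInt = DllStructCreate("int")
Local $tIntB = DllStructCreate("byte[4]", DllStructGetPtr($tInt))
Local $tDbl = DllStructCreate("double")
Local $tDblB = DllStructCreate("byte[8]", DllStructGetPtr($tDbl))

; --- write ---
Local $hOut = FileOpen($sFile, 2 + 16) ; overwrite + binary
Local $bName = StringToBinary("MSFT", 4) ; 4 = UTF-8
DllStructSetData($tInt, 1, BinaryLen($bName))
FileWrite($hOut, DllStructGetData($tIntB, 1)) ; string length as raw Int32
FileWrite($hOut, $bName) ; string bytes
DllStructSetData($tDbl, 1, 104.57)
FileWrite($hOut, DllStructGetData($tDblB, 1)) ; price as raw Double
FileClose($hOut)

; --- read back in the same order ---
Local $hIn = FileOpen($sFile, 16)
DllStructSetData($tIntB, 1, FileRead($hIn, 4))
Local $iLen = DllStructGetData($tInt, 1)
Local $sName = BinaryToString(FileRead($hIn, $iLen), 4) ; 4 = UTF-8
DllStructSetData($tDblB, 1, FileRead($hIn, 8))
Local $nPrice = DllStructGetData($tDbl, 1)
FileClose($hIn)
ConsoleWrite($sName & " = " & $nPrice & @CRLF)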
Earthshine Posted February 14, 2019
ASCII is all text.
jchd Posted February 14, 2019
If/when I have to process a huge amount of numerical data provided in ASCII (for instance, gazillions of points and other data from some geodesic source), I build a database from the source.
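In AutoIt terms that could look roughly like the sketch below, which loads a comma-delimited price file into an SQLite table via the bundled SQLite UDF. The file name, column names and line format are assumptions for illustration, not the format of the attached sample file.

#include <SQLite.au3>

; Rough sketch: import comma-delimited rows (date,open,high,low,close,volume) into SQLite.
Local Const $sCsv = @ScriptDir & "\2019_02_08.txt" ; hypothetical input file
Local Const $sDb = @ScriptDir & "\prices.db" ; hypothetical database file

_SQLite_Startup() ; load sqlite3.dll
Local $hDb = _SQLite_Open($sDb)
_SQLite_Exec($hDb, "CREATE TABLE IF NOT EXISTS prices (trade_date TEXT, open_price REAL, high REAL, low REAL, close_price REAL, volume INTEGER);")

Local $aLines = FileReadToArray($sCsv)
If @error Then Exit ConsoleWrite("Cannot read input file" & @CRLF)
_SQLite_Exec($hDb, "BEGIN;") ; one transaction makes the bulk insert fast
For $i = 0 To UBound($aLines) - 1
    Local $aF = StringSplit($aLines[$i], ",", 2) ; 2 = no count element
    If UBound($aF) < 6 Then ContinueLoop ; skip malformed lines
    _SQLite_Exec($hDb, "INSERT INTO prices VALUES (" & _SQLite_FastEscape($aF[0]) & "," & _
            $aF[1] & "," & $aF[2] & "," & $aF[3] & "," & $aF[4] & "," & $aF[5] & ");")
Next
_SQLite_Exec($hDb, "COMMIT;")

; Once imported, queries replace ad-hoc file parsing, e.g. a quick aggregate:
Local $aRow
_SQLite_QuerySingleRow($hDb, "SELECT COUNT(*), MAX(close_price) FROM prices;", $aRow)
ConsoleWrite("rows: " & $aRow[0] & "  max close: " & $aRow[1] & @CRLF)

_SQLite_Close($hDb)
_SQLite_Shutdown()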
argumentum Posted February 14, 2019
On 2/12/2019 at 6:43 PM, Stew said: I wrote this sample script to help anyone trying to read and write binary files
I believe this is what you ended up using as a solution for the question in your first posting back in '15. A file is a file that is a file, if that makes any sense. Changing the delimitation format to one more useful for the case makes sense. As the post above says, using a standard database may be better. Standardized formats are peer reviewed and help keep one from reinventing wheels.
PS: By posting, one clarifies concepts. We all learn in the end. Thanks for sharing your ideas and findings.
Stew (author) Posted February 14, 2019
argumentum, you are correct. I hadn't looked at this code in some time and decided this week to tackle the problem again, and this was the solution; until now I have just been reading the ASCII files instead. If you search the forum you will find I am not alone in trying to read/write binary numerical data; there are several posts on this topic spanning several years. I have updated the original post above with the full code I use to read the stock price data in ASCII, write it in binary, and read it back in binary. I included an ASCII file to read in the event anyone wants to mess with the code.
argumentum Posted February 14, 2019
3 hours ago, Stew said: If you search the forum you will find I am not alone in trying to read/write binary numerical data
Again, a file is a file that is a file, if that makes any sense. And it usually does not. Even a folder is a file with a "Directory" attribute. All files are binary. Period. The ASCII table is an agreed standard. The file you provided is a comma-delimited file; it could have been written in a way that distinguishes strings from numbers by encapsulating the strings in " symbols, with anything not encapsulated then being a number. But the file generator did not do this, as it is not really needed unless there is a comma in the string. So if you have a zero, would you write 0x00 in binary? What if it is a real? CPUs work with integers. Long story short, your concept of binary read/write and of what a character represents is mistaken. If you feel you would welcome help, help is at hand.
4 hours ago, Stew said: I am not alone in trying to read/write binary numerical data
There are many who are clueless. Read up. Right now you may not welcome this posting, but once you take in the proper knowledge, it will make sense. I write this humbly; I've held your views before.
Stew (author) Posted February 14, 2019
Although I'm no AutoIt expert, I've been writing complex software for computational fluid dynamics, 3D computer modeling, computer-aided diagnosis, natural language processing, etc. for 30 years. I've also written code that manipulates bits for 3D modeling, switches endianness for reading DICOM files, and so on. So I have a very good understanding of the way data is represented in a computer's memory and stored on a hard drive. We can get into esoteric discussions about whether data is real, binary, etc., but obviously it's all binary. Most of us simply refer to the type of data we are working with, and I'm primarily working with real numbers (float, double) and integers (int, long) for this project. If you really want to be helpful, then suggest a faster method for doing I/O in this application and I'll test your ideas. Or post some code; that would be even better! I don't care whether you want to write ASCII (text, strings or whatever you want to call it), binary, Martian, etc. I just want to get the data from the hard drive into RAM and associated with the proper variables as fast as possible. I acknowledge that shoving all the data into a database is probably the best solution, but this is just a fun project for me, so I want to keep it as simple as possible. I'm not excited about introducing a database.
makeithere Posted December 26, 2019
Here is a quick bit of DICOM image reading and folder opening/reading software. Because of the endianness, the tag bytes are reversed. If the images are not in a subfolder, that is opened instead of the folder. The idea is to convert the binary parts to hex and use regex to search and replace. There is probably a better way to do this, but it has worked for me. Hope this helps.
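As a rough illustration of the hex-plus-regex idea (the tag, explicit-VR handling and file name below are assumptions, and real DICOM parsing needs far more care), finding and decoding the PatientName (0010,0010) element could look something like this:

; Sketch: find the PatientName tag (0010,0010) in a DICOM file by scanning its hex dump.
; With explicit little-endian VR, the tag bytes appear reversed: 10 00 10 00, then "PN", then a 2-byte length.
Local $hFile = FileOpen(@ScriptDir & "\image.dcm", 16) ; hypothetical file, binary mode
Local $sHex = Hex(FileRead($hFile)) ; whole file as one hex string
FileClose($hFile)

; group+element reversed (10001000), VR "PN" (504E), then capture the 2-byte length
Local $aHit = StringRegExp($sHex, "10001000504E([0-9A-F]{4})", 3)
If Not @error Then
    Local $sLenHex = $aHit[0]
    Local $iLen = Dec(StringMid($sLenHex, 3, 2) & StringMid($sLenHex, 1, 2)) ; swap bytes: little-endian length
    Local $iStart = StringInStr($sHex, "10001000504E" & $sLenHex) + 12 + 4 ; skip tag, VR and length (in hex chars)
    Local $sName = BinaryToString(Binary("0x" & StringMid($sHex, $iStart, $iLen * 2)))
    ConsoleWrite("PatientName: " & $sName & @CRLF)
EndIf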