Ghost_Line Posted 23 hours ago Posted 23 hours ago Hello, In a very near future, I will have to compare one 1.5M lines file with a 9M lines file (both CSV files), in order to see which of the first exists in the second. Usually, I tend to use loops for checking value per value from the first array with the second (by escaping my verification loop when the value is found), but considering the excessive amount of combinations here, I'm asking for advices. (to be cristal clear, it's about references who need to be deactivated: the main file of 9M is the total of ref in the main system, and there's only 1.5M refs in the second system, who need to be cleaned up by deactivation items who are not sold in store anymore) Thanks for the help !
argumentum Posted 22 hours ago Posted 22 hours ago I'd put all that in a SQLite DB and let the DB engine do the heavy lifting. Make a query and that should make it manageable. Maybe make a 512 hash of each line will simplify the DB query. Or to be able to compare 1D arrays. My 2 cents SOLVE-SMART and Ghost_Line 2 Follow the link to my code contribution ( and other things too ). FAQ - Please Read Before Posting.
SOLVE-SMART Posted 22 hours ago Posted 22 hours ago Hi @Ghost_Line, how your CSV structure (columns) look like? Which values do you want to compare? Without further information I simply suggest this: Bulk the data in a database (like SQLite). Do your comparison based on your criterias with SQL. Best regards Sven Ghost_Line 1 ==> AutoIt related: 🔗 Organization AutoIt Community, 🔗 GitHub, 🔗 Discord Server, 🔗 Cheat Sheet, 🔗 autoit-webdriver-boilerplate Spoiler 🌍 Au3Forums 🎲 AutoIt (en) Cheat Sheet 📊 AutoIt limits/defaults 💎 Code Katas: [...] (comming soon) 🎭 Collection of GitHub users with AutoIt projects 🐞 False-Positives 🔮 Me on GitHub 💬 Opinion about new forum sub category 📑 UDF wiki list ✂ VSCode-AutoItSnippets 📑 WebDriver FAQs 👨🏫 WebDriver Tutorial (coming soon)
Ghost_Line Posted 22 hours ago Author Posted 22 hours ago Just now, argumentum said: I'd put all that in a SQLite DB and let the DB engine do the heavy lifting. Make a query and that should make it manageable. Maybe make a 512 hash of each line will simplify the DB query. Or to be able to compare 1D arrays. My 2 cents Hi Argumentum, Tahnks for this quick answer ! By any chance, maybe you've a comprehensive example somewhere (one who is up to date with the state of the library as it exists now) ?
SOLVE-SMART Posted 22 hours ago Posted 22 hours ago 2 minutes ago, argumentum said: I'd put all that in a SQLite DB and let the DB engine do the heavy lifting Oh 😁 , same idea, but I am a bit too late @argumentum. Best regards Sven argumentum 1 ==> AutoIt related: 🔗 Organization AutoIt Community, 🔗 GitHub, 🔗 Discord Server, 🔗 Cheat Sheet, 🔗 autoit-webdriver-boilerplate Spoiler 🌍 Au3Forums 🎲 AutoIt (en) Cheat Sheet 📊 AutoIt limits/defaults 💎 Code Katas: [...] (comming soon) 🎭 Collection of GitHub users with AutoIt projects 🐞 False-Positives 🔮 Me on GitHub 💬 Opinion about new forum sub category 📑 UDF wiki list ✂ VSCode-AutoItSnippets 📑 WebDriver FAQs 👨🏫 WebDriver Tutorial (coming soon)
Solution SOLVE-SMART Posted 22 hours ago Solution Posted 22 hours ago 2 minutes ago, Ghost_Line said: By any chance, maybe you've a comprehensive example somewhere (one who is up to date with the state of the library as it exists now) ? There are many examples regarding SQLite handling. Maybe this example could help as a start. Best regards Sven Ghost_Line 1 ==> AutoIt related: 🔗 Organization AutoIt Community, 🔗 GitHub, 🔗 Discord Server, 🔗 Cheat Sheet, 🔗 autoit-webdriver-boilerplate Spoiler 🌍 Au3Forums 🎲 AutoIt (en) Cheat Sheet 📊 AutoIt limits/defaults 💎 Code Katas: [...] (comming soon) 🎭 Collection of GitHub users with AutoIt projects 🐞 False-Positives 🔮 Me on GitHub 💬 Opinion about new forum sub category 📑 UDF wiki list ✂ VSCode-AutoItSnippets 📑 WebDriver FAQs 👨🏫 WebDriver Tutorial (coming soon)
Ghost_Line Posted 22 hours ago Author Posted 22 hours ago (edited) 3 minutes ago, SOLVE-SMART said: Hi @Ghost_Line, how your CSV structure (columns) look like? Which values do you want to compare? Without further information I simply suggest this: Bulk the data in a database (like SQLite). Do your comparison based on your criterias with SQL. Best regards Sven Hi Sven, It will probably looks like something like that (shown in Excel for readability): (the first row is the ID of the item, the second one the color, the third one the size and the last one the barcode) I plan to compare on a combination of the 3 first rows (because of some internal issue on the barcode value in some cases). Edited 22 hours ago by Ghost_Line
argumentum Posted 22 hours ago Posted 22 hours ago 1 minute ago, Ghost_Line said: ...comprehensive... Yes..., no. Am anything but comprehensive But you've got the idea. Give it a try. Compare timings. @SOLVE-SMART has a good head, follow his advice Ghost_Line and SOLVE-SMART 2 Follow the link to my code contribution ( and other things too ). FAQ - Please Read Before Posting.
Nine Posted 22 hours ago Posted 22 hours ago Another solution is to have a small ASM injection code into the script. That way you should perform the comparaison in less than 1 sec. SOLVE-SMART 1 “They did not know it was impossible, so they did it” ― Mark Twain Spoiler Block all input without UAC Save/Retrieve Images to/from Text Monitor Management (VCP commands) Tool to search in text (au3) files Date Range Picker Virtual Desktop Manager Sudoku Game 2020 Overlapped Named Pipe IPC HotString 2.0 - Hot keys with string x64 Bitwise Operations Multi-keyboards HotKeySet Recursive Array Display Fast and simple WCD IPC Multiple Folders Selector Printer Manager GIF Animation (cached) Debug Messages Monitor UDF Screen Scraping Round Corner GUI UDF Multi-Threading Made Easy Interface Object based on Tag
SOLVE-SMART Posted 22 hours ago Posted 22 hours ago Off-topic: Spoiler 1 minute ago, argumentum said: @SOLVE-SMART has a good head, follow his advice Thanks for the 💐 . You deserve the same praise, because you had the idea a few seconds before me 😅 . Topic: 2 minutes ago, Nine said: Another solution is to have a small ASM injection code into the script. That way you should perform the comparaison in less than 1 sec. Wow, I could not do that - no experience so far with such a approach. But very interesting. May I ask for resources/references to learn about this @Nine? Best regards Sven ==> AutoIt related: 🔗 Organization AutoIt Community, 🔗 GitHub, 🔗 Discord Server, 🔗 Cheat Sheet, 🔗 autoit-webdriver-boilerplate Spoiler 🌍 Au3Forums 🎲 AutoIt (en) Cheat Sheet 📊 AutoIt limits/defaults 💎 Code Katas: [...] (comming soon) 🎭 Collection of GitHub users with AutoIt projects 🐞 False-Positives 🔮 Me on GitHub 💬 Opinion about new forum sub category 📑 UDF wiki list ✂ VSCode-AutoItSnippets 📑 WebDriver FAQs 👨🏫 WebDriver Tutorial (coming soon)
Nine Posted 22 hours ago Posted 22 hours ago 2 minutes ago, SOLVE-SMART said: May I ask for resources/references to learn about this This is what I use for creating the code : https://defuse.ca/online-x86-assembler.htm The instruction reference : https://www.felixcloutier.com/x86/ You will need to search for a more comprehensive tutorial about coding in assembler. There is quite a number of examples made here in the forum, that could help you get started. Good luck. SOLVE-SMART 1 “They did not know it was impossible, so they did it” ― Mark Twain Spoiler Block all input without UAC Save/Retrieve Images to/from Text Monitor Management (VCP commands) Tool to search in text (au3) files Date Range Picker Virtual Desktop Manager Sudoku Game 2020 Overlapped Named Pipe IPC HotString 2.0 - Hot keys with string x64 Bitwise Operations Multi-keyboards HotKeySet Recursive Array Display Fast and simple WCD IPC Multiple Folders Selector Printer Manager GIF Animation (cached) Debug Messages Monitor UDF Screen Scraping Round Corner GUI UDF Multi-Threading Made Easy Interface Object based on Tag
Ghost_Line Posted 22 hours ago Author Posted 22 hours ago (edited) 2 hours ago, Nine said: Another solution is to have a small ASM injection code into the script. That way you should perform the comparaison in less than 1 sec. Salut Nine, Merci pour la proposition, mais je suis bcp trop c*n pour comprendre comment faire une chose pareille 😅 (Thanks for the proposal, but I'm far too d*mb to understand how to do such a thing) Edited 20 hours ago by Ghost_Line
Nine Posted 22 hours ago Posted 22 hours ago @Ghost_Line It's always fun to try doing assembler when the comparaison is not too hefty. The good thing about this solution is that it does not require any external support out of the script. Give us an example of the two files, the precise rules and expected output. Someone here may decide to jump into it. Once you got the basic code, it will be quite easy for you to modify it if the rules change over the time. Ghost_Line 1 “They did not know it was impossible, so they did it” ― Mark Twain Spoiler Block all input without UAC Save/Retrieve Images to/from Text Monitor Management (VCP commands) Tool to search in text (au3) files Date Range Picker Virtual Desktop Manager Sudoku Game 2020 Overlapped Named Pipe IPC HotString 2.0 - Hot keys with string x64 Bitwise Operations Multi-keyboards HotKeySet Recursive Array Display Fast and simple WCD IPC Multiple Folders Selector Printer Manager GIF Animation (cached) Debug Messages Monitor UDF Screen Scraping Round Corner GUI UDF Multi-Threading Made Easy Interface Object based on Tag
Ghost_Line Posted 21 hours ago Author Posted 21 hours ago If you insist I will arrange the compared data in order to have what I have in this example on both side: I was thinking of concatenating part (or all) the rows into one unique value, to have something more easy to compare.
jchd Posted 20 hours ago Posted 20 hours ago As others have said, you can use SQLite to perform such task. Since both files are .CSV, you don't even have to write code at all: use the SQLite command-line tool to load each .CSV in its own table and perform the comparison. Something like: SELECT GA_CODEARTICLE1, GA_DESIGNATION1, GA_CODEDIM1 FROM <SMALLER FILE> INTERSECT SELECT GA_CODEARTICLE1, GA_DESIGNATION1, GA_CODEDIM1 FROM <LARGER FILE> It woiuld greatly help to create on index on those 3 column on each table before launching the comparison. Various CLI options let you work with memory- or disk-based DBs. You can as well obtain the output in .CSV in other format (e.g. HTML, JSON, ...). SOLVE-SMART and Ghost_Line 2 This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe hereRegExp tutorial: enough to get startedPCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta. SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)
Ghost_Line Posted 20 hours ago Author Posted 20 hours ago (edited) 38 minutes ago, jchd said: As others have said, you can use SQLite to perform such task. Since both files are .CSV, you don't even have to write code at all: use the SQLite command-line tool to load each .CSV in its own table and perform the comparison. Something like: SELECT GA_CODEARTICLE1, GA_DESIGNATION1, GA_CODEDIM1 FROM <SMALLER FILE> INTERSECT SELECT GA_CODEARTICLE1, GA_DESIGNATION1, GA_CODEDIM1 FROM <LARGER FILE> It woiuld greatly help to create on index on those 3 column on each table before launching the comparison. Various CLI options let you work with memory- or disk-based DBs. You can as well obtain the output in .CSV in other format (e.g. HTML, JSON, ...). Merci pour le coup de main, je vais creuser ça (thanks for your help, I'll explore this) Edited 20 hours ago by Ghost_Line
argumentum Posted 20 hours ago Posted 20 hours ago 7 minutes ago, Ghost_Line said: je vais creuser ça oh French. The language of love ... 🤔 Careful how you look into it ? Note to self: learn French Follow the link to my code contribution ( and other things too ). FAQ - Please Read Before Posting.
Ghost_Line Posted 20 hours ago Author Posted 20 hours ago 3 minutes ago, argumentum said: oh French. The language of love ... 🤔 Careful how you look into it ? Note to self: learn French 😄
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now