MrCheese Posted July 26, 2018 Posted July 26, 2018 Hi guys, See attached for an array example. to simplify what i want to achieve, I want to split this array into 9 different csv files. the first file would contain the list of "key" and the corresponding "ID1", the second would have "key" and the "ID2", the third would have "key" and "ID3" However, I want to remove all the rows that don't have an ID recorded in the respective ID2, ID3 4...5...6 etc, so the file only contains row items with a key and the ID. Would be the best way to loop through the rows and delete the row if the array field is blank - would I then need to repeat that row ID to check that the row that its replaced is also empty (ie the one after the one I just deleted)? I see this getting messy. or _arraySort, and delete everything below the last filled row? <-- this might be best? Or should I use the excel UDF, apply a filter (not selecting the blanks), then create/export to the array->csv? Super keen to hear your thoughts. thanks! IDArray.csv
FrancescoDiMuro Posted July 26, 2018 Posted July 26, 2018 Hi @MrCheese What if you read the file with _FileReadToArray(), and loop through the array to remove what you don't need, and then write the "cleaned" array back to the file with _FileWriteFromArray() ? Click here to see my signature: Spoiler ALWAYS GOOD TO READ: Forum Rules Forum Etiquette
MrCheese Posted July 26, 2018 Author Posted July 26, 2018 (edited) so, i currently do that, to build the array in its entirety, its the looping to clean that I'm not sure about. Noting that the full data set has 200k rows. Will this be a problem for an array? Edited July 26, 2018 by MrCheese
FrancescoDiMuro Posted July 26, 2018 Posted July 26, 2018 I don't know how much time could it takes to process all the rows, but for sure, the size of array is still in the limit ( 16,777,216 elements ). Just give it a try Click here to see my signature: Spoiler ALWAYS GOOD TO READ: Forum Rules Forum Etiquette
MrCheese Posted July 26, 2018 Author Posted July 26, 2018 ok thats fine then. So... how to clean out the partially empty row, and keep the full row of key and IDX?
FrancescoDiMuro Posted July 26, 2018 Posted July 26, 2018 @MrCheese The sample file you did attach has the ID1 column always filled... So, what are you trying to do? It's not necessary to split the file in 9 files... Click here to see my signature: Spoiler ALWAYS GOOD TO READ: Forum Rules Forum Etiquette
MrCheese Posted July 26, 2018 Author Posted July 26, 2018 yes, ID1 is always filled - but the other IDs are not. So ID1.csv will have KEY,ID1 ID2.csv will have KEY,ID2 ID3.csv will have KEY,ID3 IDn.csv will have KEY,IDn no rows with a blank ID field.
FrancescoDiMuro Posted July 26, 2018 Posted July 26, 2018 What did you try so far? Click here to see my signature: Spoiler ALWAYS GOOD TO READ: Forum Rules Forum Etiquette
MrCheese Posted July 26, 2018 Author Posted July 26, 2018 I wanted some opinions on methods/ functions to test out first. Once i am on my way usually i can figure out the code. I thought about things logically earlier, but wanted some input before I went forth.
FrancescoDiMuro Posted July 26, 2018 Posted July 26, 2018 6 minutes ago, MrCheese said: I thought about things logically earlier, but wanted some input before I went forth. _FileReadToArray(), array variables, For...Next loop with Step -1 ( from the "bottom" to the "top" of the array ), and some If conditions I've already made a solution, but let's see yours Click here to see my signature: Spoiler ALWAYS GOOD TO READ: Forum Rules Forum Etiquette
mikell Posted July 26, 2018 Posted July 26, 2018 For the fun $s = FileRead("IDArray.csv") For $i = 0 to 8 $k = StringRegExpReplace($s, '(?imx)^(\d+|key) , (?:\w*,){' & $i & '} , .*$\R?', "") $k = StringRegExpReplace($k, '(?imx)^(\d+|key) , (?:\w*,){' & $i & '} (\w+) .*$', "$1,$2") ; Msgbox(0,"", $k) FileWrite("IDArray_ID" & $i+1 & ".csv", $k) Next Will this work on a 200k rows text ? I don't know ... FrancescoDiMuro 1
FrancescoDiMuro Posted July 26, 2018 Posted July 26, 2018 @mikell One day, I hope to understand this amazing function called StringRegExp! Click here to see my signature: Spoiler ALWAYS GOOD TO READ: Forum Rules Forum Etiquette
MrCheese Posted July 26, 2018 Author Posted July 26, 2018 4 hours ago, FrancescoDiMuro said: _FileReadToArray(), array variables, For...Next loop with Step -1 ( from the "bottom" to the "top" of the array ), and some If conditions I've already made a solution, but let's see yours Thanks! yes, starting from the bottom! Removes the issue I was wondering about. I'll have a solution tomorrow morning.
mikell Posted July 26, 2018 Posted July 26, 2018 (edited) Well, I was curious and just tested my regex thing on a 3800k lines file (100k * the provided csv) I get the 9 resulting csv files in 40 sec. Not so bad Edited July 26, 2018 by mikell 380 instead of 3800 is a terrible typo !
iamtheky Posted July 26, 2018 Posted July 26, 2018 (edited) also, if the ID# portion itself does not matter, just the contents $s = FileRead("IDArray.csv") msgbox(0, '' , stringregexpreplace(StringTrimLeft($s , 41) , ",+" , ",")) Edited July 26, 2018 by iamtheky ,-. .--. ________ .-. .-. ,---. ,-. .-. .-. .-. |(| / /\ \ |\ /| |__ __||| | | || .-' | |/ / \ \_/ )/ (_) / /__\ \ |(\ / | )| | | `-' | | `-. | | / __ \ (_) | | | __ | (_)\/ | (_) | | .-. | | .-' | | \ |__| ) ( | | | | |)| | \ / | | | | | |)| | `--. | |) \ | | `-' |_| (_) | |\/| | `-' /( (_)/( __.' |((_)-' /(_| '-' '-' (__) (__) (_) (__)
AutoBert Posted July 26, 2018 Posted July 26, 2018 1 hour ago, mikell said: Will this work on a 200k rows text ? I don't know ... i am nearly sure as none off these limits: Quote VAR_SUBSCRIPT_ELEMENTS 16,777,216 Maximum number of elements for an array. VAR_SUBSCRIPT_MAX 64 Maximum number of subscripts for an array. is reached (by my interepreting the opening post).
mikell Posted July 26, 2018 Posted July 26, 2018 (edited) 2 hours ago, AutoBert said: i am nearly sure as none off these limits is reached Hmm. This $t = TimerInit() $s = FileRead("IDArray.csv") ; currently 38 lines Local $r For $i = 1 to 100000 $r &= $s & @crlf Next $s = $r For $i = 0 to 8 $k = StringRegExpReplace($s, '(?imx)^(\d+|key) , (?:\w*,){' & $i & '}, .*$\R?', "") $k = StringRegExpReplace($k, '(?imx)^(\d+|key) , (?:\w*,){' & $i & '}(\w+) .*$', "$1,$2") ; Msgbox(0,"", $k) ;;;;;;; FileWrite("IDArray_ID" & $i+1 & ".csv", $k) Next Msgbox(0,"", TimerDiff($t)/1000 ) works, but if I try more then I get "error allocating memory" Anyway more than 3800k lines means a huge amount of data, so it would surely be a much better way to manage this using a SQLite database Edited July 26, 2018 by mikell
FrancescoDiMuro Posted July 26, 2018 Posted July 26, 2018 (edited) @mikell It could be definitely a better way to remove blank rows! @MrCheese Take a look here about _SQLite* functions There are very flexible and easy functions to work with Edited July 26, 2018 by FrancescoDiMuro Click here to see my signature: Spoiler ALWAYS GOOD TO READ: Forum Rules Forum Etiquette
mikell Posted July 26, 2018 Posted July 26, 2018 5 minutes ago, FrancescoDiMuro said: It could be definitely a better way to remove blank rows! It depends on the data, the needed use in the script, and so on The regex way is probably much faster than using the _Array* funcs. On the example above I didn't try the latter but it should give a little more than 40s I guess FrancescoDiMuro 1
AutoBert Posted July 26, 2018 Posted July 26, 2018 13 minutes ago, mikell said: Anyway more than 3800k lines means a huge amount of data, Yes, and 3800K * 5 elements per line= 19000k breaks the limit, only 3355443 lines are possible.
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now