MrCheese

Array extract - removing rows that has a blank field


Hi guys,

See attached for an array example.

To simplify what I want to achieve: I want to split this array into 9 different CSV files.

The first file would contain the list of "key" values and the corresponding "ID1", the second would have "key" and "ID2", the third would have "key" and "ID3", and so on.

However, I want to remove all the rows that don't have an ID recorded in the respective column (ID2, ID3, ID4, and so on), so each file only contains rows with both a key and an ID.

Would the best way be to loop through the rows and delete a row if the array field is blank? Would I then need to re-check the same row index, since the row that replaced the one I just deleted (i.e. the one after it) might also be empty? I can see this getting messy.

Or use _ArraySort and delete everything below the last filled row? <-- this might be best?

Or should I use the Excel UDF, apply a filter (not selecting the blanks), then export the result to the array -> CSV?

 

Super keen to hear your thoughts.

Thanks!

IDArray.csv


Hi @MrCheese:)
What if you read the file with _FileReadToArray(), and loop through the array to remove what you don't need, and then write the "cleaned" array back to the file with _FileWriteFromArray() ? 


Click here to see my signature:

Spoiler

Thoughts:

  • I will always thank you for the time you spent for me.
    I'm here to ask, and from your response, I'd like to learn.
    By my knowledge, I can help someone else, and "that someone" could help in turn another, and so on.

/*--------------------------------------------------------------------------------------------------------------------------------------------------------------------------*/

ALWAYS GOOD TO READ:

 


So, I currently do that to build the array in its entirety; it's the looping to clean it that I'm not sure about. Note that the full data set has 200k rows. Will this be a problem for an array?

Edited by MrCheese


I don't know how long it would take to process all the rows, but the size of the array is certainly still within the limit (16,777,216 elements).
Just give it a try :)



@MrCheese
The sample file you attached always has the ID1 column filled...
So, what are you trying to do?
It's not necessary to split the file into 9 files... :)

 



Yes, ID1 is always filled, but the other IDs are not.

So ID1.csv will have KEY,ID1

ID2.csv will have KEY,ID2

ID3.csv will have KEY,ID3

IDn.csv will have KEY,IDn

with no rows that have a blank ID field.
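
For what it's worth, the layout described above could probably be produced without holding all 200k rows in an array at all. Here is a minimal sketch, assuming the file is plain comma-separated text with one header row and nine ID columns, and that no field contains a quoted comma. The output names ID1.csv to ID9.csv follow the description above; the variable names and paths are only illustrative.

```autoit
; Sketch: write one "key,IDn" file per ID column, skipping rows whose IDn is blank.
; Assumes IDArray.csv sits next to the script and looks like: key,ID1,...,ID9
Local Const $sSource = @ScriptDir & "\IDArray.csv"
Local Const $iIdCount = 9 ; assumed number of ID columns

Local $hIn = FileOpen($sSource, 0) ; 0 = read mode
If $hIn = -1 Then Exit MsgBox(16, "Error", "Cannot open " & $sSource)

; Open one output file per ID column (2 = write mode, erases previous contents)
Local $ahOut[$iIdCount + 1]
For $n = 1 To $iIdCount
    $ahOut[$n] = FileOpen(@ScriptDir & "\ID" & $n & ".csv", 2)
Next

Local $sLine, $aField
While 1
    $sLine = FileReadLine($hIn)
    If @error Then ExitLoop ; end of file (or read error)
    ; $aField[0] = field count, [1] = key, [2] = ID1, [3] = ID2, ...
    $aField = StringSplit($sLine, ",")
    For $n = 1 To $iIdCount
        If $n + 1 > $aField[0] Then ExitLoop ; row is shorter than expected
        ; Only write the row when this ID field is not blank
        If StringStripWS($aField[$n + 1], 8) <> "" Then
            FileWriteLine($ahOut[$n], $aField[1] & "," & $aField[$n + 1])
        EndIf
    Next
WEnd

FileClose($hIn)
For $n = 1 To $iIdCount
    FileClose($ahOut[$n])
Next
```

Because each line is handled as it is read, nothing ever has to be deleted from an array, and the header row ends up as the first line of every output file.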


What have you tried so far? :)



I wanted some opinions on methods/functions to test out first.

Once I am on my way, I can usually figure out the code.

I thought about things logically earlier, but wanted some input before I went forth.

6 minutes ago, MrCheese said:

I thought about things logically earlier, but wanted some input before I went forth.

_FileReadToArray(), array variables, a For...Next loop with Step -1 (from the "bottom" to the "top" of the array), and some If conditions :)
I've already made a solution, but let's see yours :)
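
To illustrate why looping from the bottom helps, here is a small sketch on made-up data (not the solution mentioned above). It assumes a 2D array with the key in column 0 and one ID column in column 1. On 200k rows, calling _ArrayDelete once per blank row may be slow, so treat this as a demonstration of the loop direction only.

```autoit
#include <Array.au3>

; Hypothetical sample data: column 0 = key, column 1 = one ID column
Local $aData[5][2] = [["key", "ID2"], ["1", "A10"], ["2", ""], ["3", ""], ["4", "B20"]]

; Walk from the last row to the first, so deleting a row never shifts
; a row that has not been checked yet
For $i = UBound($aData) - 1 To 0 Step -1
    If $aData[$i][1] == "" Then _ArrayDelete($aData, $i)
Next

_ArrayDisplay($aData, "Rows with an ID")
```

Rows "2" and "3" are both blank and adjacent, which is exactly the case that goes wrong with a top-down loop: after deleting row "2", row "3" moves up into the index that was just checked.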



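Just for fun  :)

; Read the whole CSV into one string
Local $s = FileRead("IDArray.csv"), $k

For $i = 0 To 8
  ; Pass 1: drop every line whose column $i + 1 is empty
  ; (followed by a comma or, for the last column, by the end of the line)
  $k = StringRegExpReplace($s, '(?imx)^(\d+|key) , (?:\w*,){' & $i & '} (?:,|$) .*$\R?', "")
  ; Pass 2: keep only the key and that ID column on the remaining lines
  $k = StringRegExpReplace($k, '(?imx)^(\d+|key) , (?:\w*,){' & $i & '} (\w+) .*$', "$1,$2")
  ; MsgBox(0, "", $k)
  FileWrite("IDArray_ID" & $i + 1 & ".csv", $k)
Next

Will this work on a 200k-row text? I don't know...     o:)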
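
For anyone who wants to see the two regex passes step by step, here is a reduced sketch on a made-up sample with three ID columns instead of nine. The patterns are a simplified variant written for this illustration (they also treat an empty last column as blank), so take them as an assumption and not as the exact code above.

```autoit
; Hypothetical three-ID sample; the real file has nine ID columns
Local $sSample = "key,ID1,ID2,ID3" & @CRLF & _
        "1,A,,C" & @CRLF & _
        "2,D,E,F" & @CRLF & _
        "3,G,H,"

Local $i = 1 ; 0-based index of the ID column to extract, so 1 means ID2

; Pass 1: delete whole lines where that column is empty.
; (?x) lets the pattern contain spaces for readability, (?m) makes ^ match at each line start.
Local $sKept = StringRegExpReplace($sSample, '(?mx)^ \w+ , (?:\w*,){' & $i & '} (?:,|$) .* \R?', "")

; Pass 2: reduce each remaining line to "key,ID"
Local $sResult = StringRegExpReplace($sKept, '(?mx)^ (\w+) , (?:\w*,){' & $i & '} (\w+) .*', "$1,$2")

ConsoleWrite($sResult & @CRLF)
```

Row 1 has an empty ID2 and is dropped in pass 1. The header and rows 2 and 3 survive and come out as key,ID2 then 2,E then 3,H. Row 3 has a blank ID3, but that only matters when $i is 2.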


@mikell
One day, I hope to understand this amazing function called StringRegExp! :D


4 hours ago, FrancescoDiMuro said:

_FileReadToArray(), array variables, a For...Next loop with Step -1 (from the "bottom" to the "top" of the array), and some If conditions :)
I've already made a solution, but let's see yours :)

Thanks! Yes, starting from the bottom! That removes the issue I was wondering about. I'll have a solution tomorrow morning.


Well, I was curious and just tested my regex thing on a 3800k-line file (100k * the provided CSV).
I get the 9 resulting CSV files in 40 sec. Not so bad  :D

Edited by mikell
380 instead of 3800 is a terrible typo !


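Also, if the ID# portion itself does not matter, just the contents:

$s = FileRead("IDArray.csv")

; Skip the first 41 characters (presumably the header line), then collapse each run of commas into one
MsgBox(0, '', StringRegExpReplace(StringTrimLeft($s, 41), ",+", ","))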

 

Edited by iamtheky

1 hour ago, mikell said:

Will this work on a 200k rows text ? I don't know ...   

I am nearly sure it will, as none of these limits:

Quote
VAR_SUBSCRIPT_ELEMENTS 16,777,216 Maximum number of elements for an array.
VAR_SUBSCRIPT_MAX 64 Maximum number of subscripts for an array.

is reached (by my interpretation of the opening post).

2 hours ago, AutoBert said:

I am nearly sure it will, as none of these limits is reached

Hmm. This

Local $t = TimerInit()
Local $s = FileRead("IDArray.csv") ; currently 38 lines
Local $r = "", $k
; Build a large test string: 100,000 copies of the sample file
For $i = 1 To 100000
  $r &= $s & @CRLF
Next
$s = $r

For $i = 0 To 8
  ; Pass 1 removes lines whose column $i + 1 is empty, pass 2 keeps only "key,ID"
  $k = StringRegExpReplace($s, '(?imx)^(\d+|key) , (?:\w*,){' & $i & '} (?:,|$) .*$\R?', "")
  $k = StringRegExpReplace($k, '(?imx)^(\d+|key) , (?:\w*,){' & $i & '}(\w+) .*$', "$1,$2")
;  MsgBox(0, "", $k)
;;;;;;;  FileWrite("IDArray_ID" & $i+1 & ".csv", $k)
Next
MsgBox(0, "", TimerDiff($t) / 1000) ; elapsed time in seconds

This works, but if I try more I get "error allocating memory".
Anyway, more than 3800k lines means a huge amount of data, so it would surely be much better to manage this using a SQLite database  :)

Edited by mikell


@mikell

It could definitely be a better way to remove blank rows! :)

@MrCheese

Take a look here at the _SQLite* functions :)

They are very flexible and easy functions to work with :)
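
As a rough idea of what the SQLite route could look like, here is a minimal sketch on a tiny in-memory table. The table name ids, the column name rowkey and the sample values are made up for the illustration, and it assumes sqlite3.dll can be located by _SQLite_Startup(). Importing the real CSV would need an extra loading step that is not shown.

```autoit
#include <Array.au3>
#include <SQLite.au3>

; Assumption: sqlite3.dll can be found by _SQLite_Startup() (e.g. next to the script)
_SQLite_Startup()
If @error Then Exit MsgBox(16, "SQLite", "sqlite3.dll could not be loaded")

_SQLite_Open() ; no file name = temporary in-memory database

; Hypothetical table with a key column and two ID columns
_SQLite_Exec(-1, "CREATE TABLE ids (rowkey TEXT, ID1 TEXT, ID2 TEXT);")
_SQLite_Exec(-1, "INSERT INTO ids VALUES ('1', 'A', 'B');")
_SQLite_Exec(-1, "INSERT INTO ids VALUES ('2', 'C', '');")
_SQLite_Exec(-1, "INSERT INTO ids VALUES ('3', 'D', 'E');")

; Select key and ID2 only for rows where ID2 is not blank
Local $aResult, $iRows, $iColumns
_SQLite_GetTable2d(-1, "SELECT rowkey, ID2 FROM ids WHERE ID2 <> '';", $aResult, $iRows, $iColumns)
_ArrayDisplay($aResult, "key / ID2") ; row 0 holds the column names

_SQLite_Close()
_SQLite_Shutdown()
```

The blank-row filter becomes a single WHERE clause, and the same query with ID3, ID4 and so on would give the other files.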

Edited by FrancescoDiMuro

5 minutes ago, FrancescoDiMuro said:

It could definitely be a better way to remove blank rows!

It depends on the data, the needed use in the script, and so on  :)
The regex way is probably much faster than using the _Array* funcs. On the example above I didn't try the latter, but it should take a little more than 40 s, I guess  ;)

13 minutes ago, mikell said:

Anyway more than 3800k lines means a huge amount of data,

Yes, and 3800k lines * 5 elements per line = 19000k elements, which breaks the limit; only 3,355,443 lines are possible.
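
The arithmetic behind that figure can be checked with a few lines. This is only a sketch: the 5 elements per line come from the post above, and the 10-column case (key plus nine IDs) is an assumption about the full file.

```autoit
Local Const $iMaxElements = 16777216 ; VAR_SUBSCRIPT_ELEMENTS from the quoted limits
Local Const $iColumns = 5            ; elements per line, as assumed in the post above

; Largest number of rows a 2D array with that many columns can hold
ConsoleWrite("Max rows at 5 columns: " & Floor($iMaxElements / $iColumns) & @CRLF)

; Elements needed for the 3,800,000-line test file
ConsoleWrite("Elements for 3,800,000 rows: " & 3800000 * $iColumns & @CRLF)

; Same limit if all 10 columns (key + ID1..ID9) were loaded (assumption)
ConsoleWrite("Max rows at 10 columns: " & Floor($iMaxElements / 10) & @CRLF)
```

This gives 3355443 rows at 5 columns, 19000000 elements for the test file, and 1677721 rows at 10 columns, so the 200k-row data set fits comfortably either way.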

