czardas Posted June 4, 2010 Share Posted June 4, 2010 (edited) I figured this was the best forum to post this topic.I have a 2D array of approx 10,000 elements. It contains about 8,800 unassigned elements. My concern is that testing all these unassigned elements during a repeated array search seems long winded and tedious. So why not create a smaller array with only 1,000 assigned elements and search that. Scaled down reproducerexpandcollapse popup#include <Array.au3> Local $data[14] = [4,5,2,3,0,1,7,6,9,8,12,11,10,13] Local $2D_array[8][7] ; This array is needed for another process. ; The 1st column is a key which determines the number of entries in each row. ; Numbers in this 1st column are excluded from the search results. $2D_array[2][0] = 1 $2D_array[3][0] = 1 $2D_array[4][0] = 6 $2D_array[5][0] = 2 $2D_array[6][0] = 3 $2D_array[7][0] = 1 ; Populate the remaining columns of the 2D array. $a = 0 For $i = 2 To 7 For $j = 1 To $2D_array[$i][0] $2D_array[$i][$j] = $data[$a] $a += 1 Next Next ; NONE OF THE CODE ABOVE CAN NOT BE ALTERED. ; Breakng convention. Local $index_matching_array[14] ; Array created to avoid testing many empty cells during the search (for sparce population in a large array). $a = 0 For $i = 2 To 7 For $j = 1 To $2D_array[$i][0] $index_matching_array[$a] = $i $a += 1 Next Next find() Func find() Local $x = Random(0,13,1) ; Whatever element we want to find in the 2D array. Local $data_index = _ArraySearch($data, $x, 0, 0, 0, 1) ; This search will be repeated thousands of times. Local $return_value = $index_matching_array[$data_index] ; Display result => MsgBox(0, "Index in 2D_array", $x & " was found at " & @CRLF & "array index " & $return_value) _ArrayDisplay($2D_array) EndFuncSo now I am breaking convention by duplicating information. I know this topic has been discussed before, and that data duplication is considered bad practice. But is it always so bad? Maybe another dumb ass question. Edited June 4, 2010 by czardas operator64 ArrayWorkshop Link to comment Share on other sites More sharing options...
evilertoaster Posted June 4, 2010 Share Posted June 4, 2010 My concern is that testing all these unassigned elements during a repeated array search seems long winded and tediousIt usually is. If you know a bounds or range limit to the set you're searching, use it.So why not create a smaller array with only 1,000 assigned elements and search that.You could... it should be unnecessary. Like I mentioned, it's better to simply limit the range of the search.So now I am breaking convention by duplicating information. I know this topic has been discussed before, and that data duplication is considered bad practice. But is it always so bad?I'd hesitate to say it's always bad to duplicate information... In this case I would think it is bad though, since there should be no reason too. I thought the _ArraySearch allowed you to specify a starting a stopping point for the search (aka the 'range' to search) I'm not sure how it works with 2d arrays, but you could most certainly write your own 2D array searching function if speed is your concern. Link to comment Share on other sites More sharing options...
jvanegmond Posted June 4, 2010 Share Posted June 4, 2010 (edited) See for yourself.Also, $a is always the same as $i - 2.expandcollapse popup#include <Array.au3> Local $data[14] = [4,5,2,3,0,1,7,6,9,8,12,11,10,13] Local $2D_array[8][7] ; This array is needed for another process. ; The 1st column is a key which determines the number of entries in each row. ; Numbers in this 1st column are excluded from the search results. $2D_array[2][0] = 1 $2D_array[3][0] = 1 $2D_array[4][0] = 6 $2D_array[5][0] = 2 $2D_array[6][0] = 3 $2D_array[7][0] = 1 ; Populate the remaining columns of the 2D array. $a = 0 For $i = 2 To 7 For $j = 1 To $2D_array[$i][0] $2D_array[$i][$j] = $data[$a] $a += 1 Next Next ; NONE OF THE CODE ABOVE CAN NOT BE ALTERED. $a2 = 0 $b2 = 0 For $i = 0 to 10000 $n = Random(0, 13, 1) ; This is the value we want to find $a1 = TimerInit() $a = createAndFind($n) $a2 += TimerDiff($a1) $b1 = TimerInit() $b = justFind($n) $b2 += TimerDiff($b1) Next ConsoleWrite("Create array and find total time: " & $a2 & @CRLF) ConsoleWrite("Just find total time: " & $b2 & @CRLF) Func justFind($x) $a = 0 For $i = 2 To 7 For $j = 1 To $2D_array[$i][0] if $i = $x Then Return $a EndIf $a += 1 Next Next Return $a EndFunc Func createAndFind($x) ; Breaks convention Local $index_matching_array[14] $a = 0 For $i = 2 To 7 For $j = 1 To $2D_array[$i][0] $index_matching_array[$a] = $i $a += 1 ; $a is always the same as $i - 2 Next Next $index = _ArraySearch($data, $x, 0, 0, 0, 1) Return $index_matching_array[$index] EndFuncEdit: And when a programmer tells you that it's bad to duplicate data, then he's talking about databases. Edited June 4, 2010 by Manadar github.com/jvanegmond Link to comment Share on other sites More sharing options...
czardas Posted June 4, 2010 Author Share Posted June 4, 2010 (edited) I am beginning to think perhaps I ought to have posted this in Help and Support. Manadar, you make everything seem so easy. Thank you for this illustration. I'm sure it will be quicker regardless of the array size. For some reason I was fixated on using the built in _arraySearch function. A classic case of I couldn't see the wood for the trees. Still I thought the topic may be of general interest, even if just to discover how others would approach such a problem. And your solution is so very much to the point. Great! Thanks also to evilertoaster for your comments. Edit: I just spotted one thing about it I need to test. The second array $index_matching_array only needs to be created once.expandcollapse popup#include <Array.au3> Local $data[14] = [4,5,2,3,0,1,7,6,9,8,12,11,10,13] Local $2D_array[8][7] ; This array is needed for another process. ; The 1st column is a key which determines the number of entries in each row. ; Numbers in this 1st column are excluded from the search results. $2D_array[2][0] = 1 $2D_array[3][0] = 1 $2D_array[4][0] = 6 $2D_array[5][0] = 2 $2D_array[6][0] = 3 $2D_array[7][0] = 1 ; Populate the remaining columns of the 2D array. $a = 0 For $i = 2 To 7 For $j = 1 To $2D_array[$i][0] $2D_array[$i][$j] = $data[$a] $a += 1 Next Next ; NONE OF THE CODE ABOVE CAN NOT BE ALTERED. Local $index_matching_array[14] $a = 0 For $i = 2 To 7 For $j = 1 To $2D_array[$i][0] $index_matching_array[$a] = $i $a += 1 Next Next $a2 = 0 $b2 = 0 For $i = 0 to 10000 $n = Random(0, 13, 1) ; This is the value we want to find $a1 = TimerInit() $a = Find($n) $a2 += TimerDiff($a1) $b1 = TimerInit() $b = justFind($n) $b2 += TimerDiff($b1) Next ConsoleWrite("Create array and find total time: " & $a2 & @CRLF) ConsoleWrite("Just find total time: " & $b2 & @CRLF) Func justFind($x) $a = 0 For $i = 2 To 7 For $j = 1 To $2D_array[$i][0] if $i = $x Then Return $a EndIf $a += 1 Next Next Return $a EndFunc Func Find($x) $index = _ArraySearch($data, $x, 0, 0, 0, 1) Return $index_matching_array[$index] EndFuncUpdate: I took the array creation out of the loop and ran the test again. It appears that your search function => justFind() is still faster than using _arraysearch() on the smaller 1D array. Edited June 5, 2010 by czardas operator64 ArrayWorkshop Link to comment Share on other sites More sharing options...
jvanegmond Posted June 4, 2010 Share Posted June 4, 2010 (edited) Hey! I was working on that. It's only _ArraySearch that makes it slow. If your data never changes, you can create the special array once, and then search through it manually. This is faster (not including the time creating the array) expandcollapse popup#include <Array.au3> Local $data[14] = [4,5,2,3,0,1,7,6,9,8,12,11,10,13] Local $2D_array[8][7] ; This array is needed for another process. ; The 1st column is a key which determines the number of entries in each row. ; Numbers in this 1st column are excluded from the search results. $2D_array[2][0] = 1 $2D_array[3][0] = 1 $2D_array[4][0] = 6 $2D_array[5][0] = 2 $2D_array[6][0] = 3 $2D_array[7][0] = 1 ; Populate the remaining columns of the 2D array. $a = 0 For $i = 2 To 7 For $j = 1 To $2D_array[$i][0] $2D_array[$i][$j] = $data[$a] $a += 1 Next Next ; NONE OF THE CODE ABOVE CAN NOT BE ALTERED. Local $index_matching_array[14] $a = 0 For $i = 2 To 7 For $j = 1 To $2D_array[$i][0] $index_matching_array[$a] = $i $a += 1 ; $a is always the same as $i - 2 Next Next $a2 = 0 $b2 = 0 For $i = 0 to 10000 $n = Random(0, 13, 1) ; This is the value we want to find $a1 = TimerInit() $a = Find($n) $a2 += TimerDiff($a1) $b1 = TimerInit() $b = justFind($n) $b2 += TimerDiff($b1) Next ConsoleWrite("Find from array total time: " & $a2 & @CRLF) ConsoleWrite("Just find total time: " & $b2 & @CRLF) Func justFind($x) $a = 0 For $i = 2 To 7 For $j = 1 To $2D_array[$i][0] if $i = $x Then Return $a EndIf $a += 1 Next Next Return $a EndFunc Func Find($x) For $i = 0 to 13 if $data[$i] = $x then return $index_matching_array[$i] EndIf Next EndFunc Edited June 4, 2010 by Manadar github.com/jvanegmond Link to comment Share on other sites More sharing options...
czardas Posted June 4, 2010 Author Share Posted June 4, 2010 (edited) If your data never changes, you can create the special array once, and then search through it manually. This is faster (not including the time creating the array)Let's see if I got this right? Duplication of data is possibly a good idea sometimes. Wow!Thanks again. Much appreciated. Edited June 5, 2010 by czardas operator64 ArrayWorkshop Link to comment Share on other sites More sharing options...
evilertoaster Posted June 4, 2010 Share Posted June 4, 2010 Let's see if I got this right? Duplication of data is possibly a good idea sometimes. Wow!Humm, I don't see how we're getting to this conclusion...The goal is finding the index of the value isn't it?Comparing a function searches a 1d array for an index to a function that searches a 2d array? In the 2n array you'd need to return 2 values anyways, even if the 1d search goes faster you still haven't resolved it to the original (2d) index... (or if the original is not important why not just sort it and to a 1d binary search to begin with?)*ramble ramble*... o well, if you found your answer congrats Link to comment Share on other sites More sharing options...
czardas Posted June 4, 2010 Author Share Posted June 4, 2010 (edited) In the 2n array you'd need to return 2 values anywaysIf that was the case then I would have to agree with you. However I am only interested in the returned row index value, and not the column in which the match was found. The 2D array stores information to be used elsewhere for a different process, otherwise it would be superfluous. I had originally planned to delete the original data and only search the 2D array. On reflection, with some redesigning, I should be able to avoid this duplication. But I can imagine circumstances where I might need the 2D array, as well as calling on the original data to more quickly reproduce these values. If any of that makes sense. It is a consequence of my attempts to avoid duplicating data, that I ran into this scenario in the first place. I certainly agree that duplication should be avoided. But it occasionally seems more straight forward to simply allow it. Perhaps that's just me being lazy, or too short sighted to see another solution.I also would like to point out that: I would not post a topic like this, if I did not value people's opinions. What it boils down to is whether duplication avoidance should be treated as a general guideline or as a definitive rule written in stone. Edited June 5, 2010 by czardas operator64 ArrayWorkshop Link to comment Share on other sites More sharing options...
czardas Posted June 5, 2010 Author Share Posted June 5, 2010 (edited) The more I think about this, the more I keep thinking there must be a better way. After a while of pondering I have come up with a solution. I only need to populate the first column of the 2D array. I can still find the index values as if I had searched a fully populated array. Only if and when required should I populate the array with more data. Bingo! Edit: Bah! This still doesn't actually avoid duplicating data. Once the array is populated, the data will be duplicated. I guess I'll just have to follow my intuition on this one. Manadar's speed test is swaying my decision. Edited June 5, 2010 by czardas operator64 ArrayWorkshop Link to comment Share on other sites More sharing options...
Moderators SmOke_N Posted June 6, 2010 Moderators Share Posted June 6, 2010 Why not use SQLite to manage such big arrays and dupe issues. You'd avoid all your speed issue concerns. Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer. Link to comment Share on other sites More sharing options...
evilertoaster Posted June 6, 2010 Share Posted June 6, 2010 There are some cases where 'duplicating' (or preprocessing) data is fine as far as I'm concerned. Sorting lists before processing is common for speed purposes. If the 2-D array you're searching doesn't have any quantifiable bounds and you're going to be doing a lot of intense processing over a small subset (which you can't logicality describe for looping) it would make sense to pre-arrange the data... Then again, depending on how important speed really is for you, AutoIt isn't always the best option in general. Link to comment Share on other sites More sharing options...
AndyG Posted June 6, 2010 Share Posted June 6, 2010 If you have trouble with the speed in searching arrays, and the "Double Data", why use arrays?There are much faster ways sometimes if you use a lists, a queue, a stack or a hash-table instead of an array.In this forum I have not searched intensively for examples, because in the german forum is a little "tutorial" with some examples. One of them might help you...http://www.autoit.de/index.php?page=Thread&postID=24276#post24276 Link to comment Share on other sites More sharing options...
czardas Posted June 6, 2010 Author Share Posted June 6, 2010 (edited) Thanks everyone for your input. Lots of interesting suggestions and ideas to chew over. I ought to explain what I'm up to. The concept is very simple but the implementation I find quite tricky. The code is part of a fully functional encryption program which has undergone multiple tests and rebuilds over the last few months. The architecture has been clearly defined for a while now, but from the perspective of another programmer looking at it, the code needs further attention. Fortunately I am not confined by a deadline. I actually need to rip the heart out of it and make it presentable. It is vitaly important that the data be held in RAM, so using a database is out of the question. Holding the information in an array apparently simplifies several tasks, though I am very interested to look into your suggested alternatives. The size of the 2D array is not really so large (256 rows x 54 columns), but the number of repeat searches will depend entirely on the length of the encrypted data. Edited June 6, 2010 by czardas operator64 ArrayWorkshop Link to comment Share on other sites More sharing options...
jchd Posted June 7, 2010 Share Posted June 7, 2010 It is vitaly important that the data be held in RAM, so using a database is out of the question.You miss the point about SQLite flexibility: it will happily create a memory database which you can then search with all the power of SQL yet using fast SQLite C code.Depending on your precise requirements, you can create indexes on whatever combination of column / type you need and still enjoy very fast answer.I routinely use it for temporary storage and searches and when you have to repeatedly lookup the data with non-trivial conditions, it quickly becomes much faster than AutoIt loops.As a bonus, you can load an SQLite DB from disk and save it back if you need it a some later point in time, all of this without even closing it. See here.That is exactly one of the reasons for which SQLite is so widely used: high-efficiency embedded application data storage / retrieval. This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe hereRegExp tutorial: enough to get startedPCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta. SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt) Link to comment Share on other sites More sharing options...
czardas Posted June 7, 2010 Author Share Posted June 7, 2010 (edited) Yeah, not sure what you meant by bad. When I say bad, I mean that it is considered bad to duplicate data because it generally hinders performance. In this case the duplication appears to be enhancing performance, which perhaps seems counter-intuative. I am therefore reluctant to change it. That doesn't necessarily mean that it will be the final solution: though probably. You miss the point about SQLite flexibility: it will happily create a memory database which you can then search with all the power of SQL yet using fast SQLite C code. Ah yes, I missunderstood that point. Thanks. However I am not so much concerned with speed, as I am with presenting my method in code that is well written and easy to understand. I am finding that to be a bit of a challenge. Also I'm not keen on bundling 3rd party software (I'm pressuming that would be required) with a portable application: which is how I intend to release it. However it sounds like I ought to investigate the functionality of SQLite at some point. It sounds very interesting. Any further enhancements to performance can be made later, if it turns out to be a worthwhile project. Changing the subject slightly Nobody spotted the mistake yet? Edited June 7, 2010 by czardas operator64 ArrayWorkshop Link to comment Share on other sites More sharing options...
Valik Posted June 7, 2010 Share Posted June 7, 2010 Performance entails more than just "speed". Resource usage (memory for example) is also a metric in performance. Duplicating data may allow speed to improve under certain circumstances but it does affect memory usage adversely. It's a trade-off and you must determine whether the increase in usage justifies the speed. Link to comment Share on other sites More sharing options...
GEOSoft Posted June 7, 2010 Share Posted June 7, 2010 czardas, you are aware that there is a SQLite UDF included with AutoIt are you not? Nothing more to be installed. George Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.*** The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number. Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else. "Old age and treachery will always overcome youth and skill!" Link to comment Share on other sites More sharing options...
wraithdu Posted June 7, 2010 Share Posted June 7, 2010 If you just need to find an index, you could use a Scripting.Dictionary and store the array of data as the data element. Link to comment Share on other sites More sharing options...
czardas Posted June 7, 2010 Author Share Posted June 7, 2010 (edited) czardas, you are aware that there is a SQLite UDF included with AutoIt are you not? Nothing more to be installed.Now that's a totally differnt story. Very useful indeed. Thanks again to everyone for sharing your knowledge with me, and to Valik for clarifying my understanding of the subject. Edited June 7, 2010 by czardas operator64 ArrayWorkshop Link to comment Share on other sites More sharing options...
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now