phatzilla
Active Members
Posts: 117
Getting unique elements from 2D arrays
phatzilla replied to phatzilla's topic in AutoIt General Help and Support
Yes, the last one as it appears in the text file. I'm currently trying the solutions presented in this thread, will report back.
Update 1: kylomas, your script's performance (based on a 3M line file):
Time to load array from file = 21.6615
Time to load dictionary = 95.1497
Time to load unique array = 23.6044
Not bad ;), I thought it'd take longer!
Getting unique elements from 2D arrays
phatzilla replied to phatzilla's topic in AutoIt General Help and Support
Unfortunately, his solution doesn't necessarily apply to the problem I presented in this thread, but rather to the problem you had already helped me solve in the other thread.
Getting unique elements from 2D arrays
phatzilla replied to phatzilla's topic in AutoIt General Help and Support
Hi mikell,
It appears your solution works; however, I tested with a dataset of ~6k lines and it took about 25 seconds... I'm working with over 1M lines here, so I don't know if it's viable. Is there any way to greatly optimize this, or is it just a fact I'll have to deal with? I guess in theory I will have to scan 1,000,000 * 1,000,000 array elements, which is quite a bit, so maybe I'm screwed. Unless there's a way to systematically break it down into smaller arrays of 1000 * 1000 and then combine, or something... Another thing worth mentioning is that column 1 contains no duplicates; it's only column 2 which does.
Getting unique elements from 2D arrays
phatzilla replied to phatzilla's topic in AutoIt General Help and Support
Actually, this is a separate issue from the other thread.
I'm back begging for more help. Say I have a 2D array (read from file) with ":" as the delimiter:

87000026:abc
87000090:def
87000021:ghi
87000027:def
87000089:abc
87000028:abc
87000094:def

The output should be:

87000021:ghi
87000028:abc
87000094:def

So I want to get rid of all the duplicates in column 2 (except the LAST one), while maintaining the corresponding string in column 1. _ArrayUnique only creates a new 1D array of either column 1 or 2, not both... Keep in mind this will be used on files with over 1M lines...
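For reference, the "keep the last occurrence per column-2 value" operation is doable in a single linear pass with a hash map, so 1M lines never needs the 1,000,000 * 1,000,000 scan feared above. Here is a sketch of the idea in Python rather than AutoIt (the same approach maps onto a Scripting.Dictionary), using the sample data from the post:

```python
def keep_last_by_value(lines, delim=":"):
    """Keep only the LAST line for each column-2 value, preserving
    the order in which those surviving lines appear in the input."""
    last = {}  # column-2 value -> index of its last occurrence
    for i, line in enumerate(lines):
        _, value = line.split(delim, 1)
        last[value] = i          # later rows simply overwrite earlier ones
    return [lines[i] for i in sorted(last.values())]

lines = [
    "87000026:abc", "87000090:def", "87000021:ghi",
    "87000027:def", "87000089:abc", "87000028:abc", "87000094:def",
]
print(keep_last_by_value(lines))
# → ['87000021:ghi', '87000028:abc', '87000094:def']
```

One pass to build the map and one pass to emit survivors, so the cost is proportional to the number of lines, not its square.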
Problem is solved! (for now). Thank you for everyone's help... Got the time down to ~10 seconds max for the biggest files.
18 replies
Tagged with: array, performance (and 3 more)
phatzilla reacted to a post in a topic: Optimizing array difference search speed (or other solutions welcome!)
Actually, I just adjusted my script based on mikell's suggestion....

Global $a = FileReadToArray("info-file(200k lines).txt")
Global $b = FileReadToArray("compare-file(100k lines).txt")
Local $sda = ObjCreate("Scripting.Dictionary")
Local $sdb = ObjCreate("Scripting.Dictionary")
Local $sdc = ObjCreate("Scripting.Dictionary")
For $i In $a
    $sda.Item($i)
Next
For $i In $b
    $sdb.Item($i)
Next
For $i In $a
    If $sdb.Exists($i) Then $sdc.Item($i)
Next
$asd3 = $sdc.Keys()
For $i In $asd3
    If $sda.Exists($i) Then $sda.Remove($i)
    If $sdb.Exists($i) Then $sdb.Remove($i)
Next
$asd1 = $sda.Keys()
$asd2 = $sdb.Keys()
_ArrayDisplay($asd1, "$asd1")
_ArrayDisplay($asd2, "$asd2")
_ArrayDisplay($asd3, "$asd3")

Now this is much faster... $asd1 is the only array that I need, as it contains the lines that are *ONLY* found in the "info" file (because the "other" compare file will never contain any lines that aren't found in the info file). Is there any way to speed this script up? I don't really need the other 2 new arrays, $asd2 and $asd3; $asd1 contains all the information I need.
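Since the compare file never contains lines absent from the info file, the three dictionaries can in principle be reduced to one lookup structure over $b: $asd1 is simply "every line of $a that is not in $b". A minimal Python sketch of that reduction (illustrative only, not the author's AutoIt; the file names in the post stand in for whatever data is loaded):

```python
def only_in_first(a_lines, b_lines):
    """Lines that appear in a_lines but not in b_lines.
    One hash set and one pass -- no intersection dictionary needed."""
    b_set = set(b_lines)
    return [line for line in a_lines if line not in b_set]

a = ["x1", "x2", "x3", "x4"]  # stand-in for the info file
b = ["x2", "x4"]              # stand-in for the compare file
print(only_in_first(a, b))    # → ['x1', 'x3']
```

This does strictly less work than building $sdc and then removing its keys from $sda, while producing the same $asd1.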
They are text files. Each line is about 100-120 alphanumeric characters in length.
BrewManNH,
Will _ArrayUnique remove ALL duplicate entries? If I recall, it simply keeps one of the entries. If you run:
1 3 3 2
through _ArrayUnique, you'll get:
1 3 2
I need it to be:
1 2
What I need is a new array that's the difference between the "main file" and the "other file". So if the main file array is:
2 1 4 5 3
and the other file array is:
2 3 1
the new array should be:
4 5
In other words, I don't want to preserve any instances of the duplicate lines. The _Diff function achieves this, but I need to get it a bit faster, if possible... Or, failing that, a new solution.
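Two distinct operations come up in this post: stripping every instance of any duplicated value, and taking the difference of two files. A Python sketch of both, for contrast, using the sample data from the post (hash-map counts, so each is a linear pass rather than AutoIt's nested scans):

```python
from collections import Counter

def drop_all_duplicated(items):
    """Remove every instance of any value that occurs more than once."""
    counts = Counter(items)
    return [x for x in items if counts[x] == 1]

def difference(main, other):
    """Values of `main` that do not occur in `other` (what _Diff returns)."""
    other_set = set(other)
    return [x for x in main if x not in other_set]

print(drop_all_duplicated([1, 3, 3, 2]))       # → [1, 2]
print(difference([2, 1, 4, 5, 3], [2, 3, 1]))  # → [4, 5]
```

Since the main file is stated to contain only unique lines, the two operations coincide here, which is why a plain set difference suffices.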
Hi gang,
I'm in a bit of a pickle here. The gist of it is that I have one "main" file with ~200k lines, and literally hundreds of other files each ranging from 1k - 100k lines. I need to go through each of the "other" files and (separately) compare them to the "main" file, and save the differences between the two (no duplicates). The issue is that each comparison (especially if the "other" file has 50+k lines) takes over a minute each... Is there any way to cut this time down? As far as I know I'm using the most optimized array difference script. Here's a rough mockup of the script I'm currently using. Note: the main file has all unique lines, and the "other" files won't ever have any lines that *DO NOT* appear in the main file, if that helps.

#include <Array.au3>
#include <File.au3>

Global $info_file
Global $compare_file
Global $Differece

$info_file = FileReadToArray("info-file(200k lines).txt")
$compare_file = FileReadToArray("compare-file(100k lines).txt")
$Differece = _Diff($info_file, $compare_file, 0) ; get the difference between 2 arrays, NO duplicates
_ArrayDisplay($Differece)

;=================================================
; Function Name:   _Diff($Set1, $Set2 [, $GetAll=0 [, $Delim=Default]])
; Description::    Find values in $Set1 that do not occur in $Set2
; Parameter(s):    $Set1   set 1 (1D-array or delimited string)
;                  $Set2   set 2 (1D-array or delimited string)
;                  optional: $GetAll  0 - only one occurrence of every difference is shown (Default)
;                                     1 - all differences are shown, allowing duplicates
;                  optional: $Delim   Delimiter for strings (Default uses the separator character set by Opt("GUIDataSeparatorChar"))
; Return Value(s): Success  1D-array of values from $Set1 that do not occur in $Set2
;                  Failure  -1  @error set, the set that was given as an array isn't a 1D-array
; Note:            Comparison is case-sensitive! - i.e. number 9 is different to string '9'!
; Author(s):       BugFix (bugfix@autoit.de), modified by ParoXsitiC for faster _Diff (formerly _GetIntersection)
;=================================================
Func _Diff(ByRef $Set1, ByRef $Set2, $GetAll = 0, $Delim = Default)
    Local $o1 = ObjCreate("System.Collections.ArrayList")
    Local $o2 = ObjCreate("System.Collections.ArrayList")
    Local $oDiff1 = ObjCreate("System.Collections.ArrayList")
    Local $tmp, $i
    If $GetAll <> 1 Then $GetAll = 0
    If $Delim = Default Then $Delim = Opt("GUIDataSeparatorChar")
    If Not IsArray($Set1) Then
        If Not StringInStr($Set1, $Delim) Then
            $o1.Add($Set1)
        Else
            $tmp = StringSplit($Set1, $Delim, 1)
            For $i = 1 To UBound($tmp) - 1
                $o1.Add($tmp[$i])
            Next
        EndIf
    Else
        If UBound($Set1, 0) > 1 Then Return SetError(1, 0, -1)
        For $i = 0 To UBound($Set1) - 1
            $o1.Add($Set1[$i])
        Next
    EndIf
    If Not IsArray($Set2) Then
        If Not StringInStr($Set2, $Delim) Then
            $o2.Add($Set2)
        Else
            $tmp = StringSplit($Set2, $Delim, 1)
            For $i = 1 To UBound($tmp) - 1
                $o2.Add($tmp[$i])
            Next
        EndIf
    Else
        If UBound($Set2, 0) > 1 Then Return SetError(1, 0, -1)
        For $i = 0 To UBound($Set2) - 1
            $o2.Add($Set2[$i])
        Next
    EndIf
    For $tmp In $o1
        If Not $o2.Contains($tmp) And ($GetAll Or Not $oDiff1.Contains($tmp)) Then $oDiff1.Add($tmp)
    Next
    If $oDiff1.Count <= 0 Then Return 0
    Local $aOut[$oDiff1.Count]
    $i = 0
    For $tmp In $oDiff1
        $aOut[$i] = $tmp
        $i += 1
    Next
    Return $aOut
EndFunc   ;==>_Diff
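The likely bottleneck in _Diff is that ArrayList.Contains is a linear scan, so the main loop is O(n * m); swapping those lookups for hash tables makes it O(n + m). A Python sketch contrasting the two shapes (illustrative only, not a drop-in replacement for the AutoIt above; sizes chosen small enough to run quickly):

```python
import time

def diff_list(set1, set2):
    """Mirrors _Diff's ArrayList.Contains approach: every membership
    check scans a list, giving O(len1 * len2) overall."""
    out = []
    for v in set1:
        if v not in set2 and v not in out:   # both are linear scans
            out.append(v)
    return out

def diff_hashed(set1, set2):
    """Same result using hash lookups: O(len1 + len2) overall."""
    s2, seen, out = set(set2), set(), []
    for v in set1:
        if v not in s2 and v not in seen:
            seen.add(v)
            out.append(v)
    return out

a = [str(i) for i in range(4000)]
b = [str(i) for i in range(0, 4000, 2)]
t0 = time.perf_counter()
r1 = diff_list(a, b)
t1 = time.perf_counter()
r2 = diff_hashed(a, b)
t2 = time.perf_counter()
assert r1 == r2
print(f"list-based: {t1 - t0:.3f}s, hash-based: {t2 - t1:.3f}s")
```

The gap widens quadratically with input size, which is consistent with a 50k-line comparison taking over a minute while the dictionary version in this thread finishes in seconds.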
Thanks, jchd, that seems to have worked. I didn't realize that the client follows the server's command regarding keeping/closing a connection. Is there any other way to forcefully close it without altering the headers that you're sending over to the server? I realize that the server will probably still have their side of the connection open, but that's inconsequential to me.
Hello Trancexx,
I have an interesting conundrum concerning HTTP(S) requests. Here is a sample script:

#include "WinHttp.au3"

Global $schunk = ""
Global $sdata = ""

While 1
    Global $hOpen = _WinHttpOpen('Mozilla/5.0 (Windows NT 6.1; rv:6.0.1) Gecko/20100101 Firefox/6.0.1')
    _WinHttpSetTimeouts($hOpen, 3000, 3000, 3000, 3000)
    Global $hconnect = _WinHttpConnect($hOpen, "www.maxmind.com", $INTERNET_DEFAULT_HTTPS_PORT) ; Https connection
    Global $hrequest = _WinHttpOpenRequest($hconnect, "GET", "/geoip/v2.0/city_isp_org/me", Default, Default, Default, $WINHTTP_FLAG_SECURE) ; Https connection
    _WinHttpAddRequestHeaders($hrequest, "Accept: */*")
    _WinHttpAddRequestHeaders($hrequest, "Accept-Language: en-US,en;q=0.8")
    _WinHttpAddRequestHeaders($hrequest, "Referer: http://www.maxmind.com/en/locate_my_ip")
    _WinHttpAddRequestHeaders($hrequest, "X-Requested-With: XMLHttpRequest")
    ;_WinHttpSetCredentials($hRequest, $WINHTTP_AUTH_TARGET_SERVER, $WINHTTP_AUTH_SCHEME_BASIC, "test", "test") ; *****for some reason, this set credentials option lets the connection close*****

    ; Send request
    _WinHttpSendRequest($hrequest)
    ; Wait for the response
    _WinHttpReceiveResponse($hrequest)

    ; See if there is data to read
    If _WinHttpQueryDataAvailable($hrequest) Then
        ; Read
        While 1
            $sChunk = _WinHttpReadData($hrequest)
            If @error Then ExitLoop
            $sData &= $sChunk
        WEnd
    EndIf

    MsgBox(4096, '', 'The connection should still be open, check netstat')
    _WinHttpCloseHandle($hRequest)
    _WinHttpCloseHandle($hConnect)
    _WinHttpCloseHandle($hOpen)
    $sdata = ""
    $schunk = ""
    MsgBox(4096, '', 'The connection should now be closed, check netstat')
WEnd

The issue I'm having is that the initial connection stays open despite closing all the handles, and if I were to allow the script to continue looping, it would continue sending requests through that same open connection instead of closing it / opening a new one.
I have found a couple of 'hacks' to remedy this:
1) Comment out the _WinHttpQueryDataAvailable section
2) Enable the _WinHttpSetCredentials option
Both of these workarounds seem to force the connection to close properly. What gives?
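The behaviour jchd describes in the reply above (the client obeys the server's Connection header) is plain HTTP/1.1 keep-alive semantics, and it can be observed outside WinHTTP. A self-contained Python sketch with a throwaway local server (illustrative only; no real host such as maxmind.com is contacted, and the canned responses are invented for the demo):

```python
import socket
import threading
from http.client import HTTPConnection

# Whether the client should treat the connection as done is driven by
# the Connection header the *server* sends back, not by the client.
CANNED = {
    "close": b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok",
    "keep":  b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: keep-alive\r\n\r\nok",
}

def serve_once(payload):
    """Tiny one-shot server: answers a single request with a canned response."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)

    def handler():
        conn, _ = srv.accept()
        conn.recv(65536)          # read (and ignore) the request
        conn.sendall(payload)
        conn.close()
        srv.close()

    threading.Thread(target=handler, daemon=True).start()
    return srv.getsockname()[1]   # ephemeral port the client should hit

def will_close(kind):
    port = serve_once(CANNED[kind])
    c = HTTPConnection("127.0.0.1", port)
    c.request("GET", "/")
    resp = c.getresponse()
    resp.read()
    flag = resp.will_close        # computed from the server's response headers
    c.close()
    return flag

print(will_close("close"), will_close("keep"))  # → True False
```

So a server that answers with "Connection: keep-alive" is explicitly inviting the client to reuse the socket, which matches the looping-through-one-connection symptom in the post.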
I have a string "abc123def432ghi678". I'd like to replace each digit in this string with another DIFFERENT digit. Here's what I have so far:

$string = "abc123def432ghi678"
If StringRegExp($string, "[0-9]") Then
    $newstring = StringRegExpReplace($string, "[0-9]", Random(0, 9, 1))
EndIf
MsgBox(4096, '', $newstring)

The problem with this is twofold:
1) It replaces each digit with ONE digit that it chooses; so, for example, the new string will turn out to be abc111def111ghi111 or abc555def555ghi555, etc.
2) There's no guarantee that it will replace the digit with a different one, since it's just using a random function.
How would I go about replacing each digit in the given string with a different new digit?
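The two bugs above have one root cause: Random() is evaluated once, before the replace runs, so every match gets the same literal replacement. The fix is to pick a fresh digit per match and to exclude the old digit from the candidates. In AutoIt that means looping over the characters, since StringRegExpReplace only takes a fixed replacement string; here is a sketch of the logic in Python, where re.sub accepts a per-match callback:

```python
import random
import re

def replace_digits(s):
    """Replace every digit with a randomly chosen DIFFERENT digit;
    non-digit characters pass through unchanged."""
    def swap(m):
        old = m.group(0)
        # candidate digits exclude the one being replaced
        return random.choice([d for d in "0123456789" if d != old])
    return re.sub(r"[0-9]", swap, s)   # swap() runs once per match

s = "abc123def432ghi678"
out = replace_digits(s)
print(out)  # random, but every digit differs from the original in place
```

The callback is what restores "once per match" semantics; any scheme that computes the replacement before the scan starts will repeat one digit everywhere.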
Counting and sorting duplicate lines
phatzilla replied to phatzilla's topic in AutoIt General Help and Support
To keep it short, what I have is basically:

For $l = 1 To UBound($unique_array) - 1
    Global $replace = StringReplace($trend_array_string, $unique_array[$l], "ReplacementString")
    Global $numreplacements = @extended
    ConsoleWrite("The number of replacements done was : " & $unique_array[$l] & " : " & $numreplacements & @CRLF)
Next

And the output:
The number of replacements done was : #SheNeverLeft : 2 The number of replacements done was : #LL2014 : 1 The number of replacements done was : "Between Two Ferns" : 4 The number of replacements done was : #askgrizfolk : 1 The number of replacements done was : #LameApocalypses : 13 The number of replacements done was : "Gold Glove" : 1 The number of replacements done was : "Diwali in India" : 2 The number of replacements done was : "Jeanie Buss" : 4 The number of replacements done was : #1Dbigannouncement : 3 The number of replacements done was : "Avengers 2" : 8 The number of replacements done was : #OnTheRoadAgain1D : 1 The number of replacements done was : #OttawaShooting : 14 The number of replacements done was : #FastFoodSlogans : 1 The number of replacements done was : "Darnell Coles" : 1 The number of replacements done was : #MostClutch : 1 The number of replacements done was : #OnTheRoadAgainTour : 1 The number of replacements done was : "Happy Mole Day" : 6 The number of replacements done was : #OTRATour : 1 The number of replacements done was : #AYASummit : 2 The number of replacements done was : "Lisa Ann" : 1 The number of replacements done was : #StartSitESPN : 1 The number of replacements done was : #BryantAndNashNewVideo : 5 The number of replacements done was : "Kevin Vickers" : 1 The number of replacements done was : #poptech : 1 The number of replacements done was : #AvengersAgeOfUltron : 13 The number of replacements done was : #Engage2014 : 1 The number of replacements done was : Halloween : 9 The number of replacements done was : Canada : 7 The number of replacements done was : Christmas : 3 The number of
replacements done was : Scorpio : 1 The number of replacements done was : #indysm : 1 The number of replacements done was : #HappyBirthdayGrandpaGrande : 2 The number of replacements done was : #StealMyGIF : 1 The number of replacements done was : #ICryAtRavesWhen : 6 The number of replacements done was : #AgeofUltron : 5 The number of replacements done was : "White House" : 2 The number of replacements done was : #PandaFunkFamily : 2 The number of replacements done was : "Happy Diwali" : 6 The number of replacements done was : "Frank Ocean" : 1 The number of replacements done was : #ZachGrandtourage : 1 The number of replacements done was : Ottawa : 15 The number of replacements done was : "Kim Possible" : 1 The number of replacements done was : "Lizzie McGuire" : 2 The number of replacements done was : #Paperwork : 1 The number of replacements done was : "Even Stevens" : 1 The number of replacements done was : "Jessica Lange" : 2 The number of replacements done was : "That's So Raven" : 1 The number of replacements done was : "Thinking About You - Frank Ocean" : 2 The number of replacements done was : "Gods & Monsters" : 2 The number of replacements done was : "Edward Mordrake" : 2 The number of replacements done was : "Lurie Poston" : 1 The number of replacements done was : #thankyouvessel : 3 The number of replacements done was : Viscant : 1 The number of replacements done was : "Gods and Monsters" : 1 The number of replacements done was : #WorldSeriesGame2 : 5 The number of replacements done was : "S Club 7" : 1 The number of replacements done was : "Mark Jackson" : 1 The number of replacements done was : "Legally Blonde" : 1 The number of replacements done was : #DontAskBeau : 1 The number of replacements done was : "Nick Swisher" : 1 The number of replacements done was : "Zach Mettenberger" : 2 The number of replacements done was : "Teaser Trail" : 2 The number of replacements done was : PrincAss : 1 The number of replacements done was : #AskBeau : 1 The 
number of replacements done was : Strickland : 1 The number of replacements done was : #VoightsRage : 1 The number of replacements done was : #tiannaQA : 1 The number of replacements done was : Dora : 1 The number of replacements done was : Patti : 1 The number of replacements done was : #ReplaceAnAnimeTitleWithAss : 3 The number of replacements done was : "One in 5,000" : 1 The number of replacements done was : #CrawfordsNewVideo : 1 The number of replacements done was : #100Things : 1 The number of replacements done was : "My Cinnamon Twist" : 1 The number of replacements done was : Kunitz : 1 The number of replacements done was : "Paranormal Activity 3" : 1 The number of replacements done was : #BabyDaddyChat : 1 The number of replacements done was : Ultron : 19 The number of replacements done was : #NYGovDebate : 1 The number of replacements done was : "Joe Torre" : 1 The number of replacements done was : "Key & Peele" : 1 The number of replacements done was : "Watching Casper" : 1 The number of replacements done was : "Oliver and Thea" : 1 The number of replacements done was : #willmakesushappy : 1 The number of replacements done was : #ignitethegrind : 1 The number of replacements done was : #AskSierraDallas : 1 The number of replacements done was : "James Spader" : 1 The number of replacements done was : Drumline : 2 The number of replacements done was : "Young Thug" : 1 The number of replacements done was : #ASKLOHANTHONY : 1 The number of replacements done was : #Z100Rules : 1 The number of replacements done was : "Nathan Cirillo" : 2 The number of replacements done was : "Michael Zehaf-Bibeau" : 2 The number of replacements done was : "Jersey Shore" : 1 The number of replacements done was : #ListenToGhostOnYouTube : 1 The number of replacements done was : #5SOSAmnesiaLyrics : 2 The number of replacements done was : #BigTimeLyrics : 1 The number of replacements done was : "Ben Bradlee" : 2 The number of replacements done was : Makonnen : 1 The number of 
replacements done was : Inbox : 3 The number of replacements done was : #ANDvAFC : 1 The number of replacements done was : #yesboo : 1 The number of replacements done was : #SELFIEFORSEB : 1 The number of replacements done was : "Liverpool 0-3 Real Madrid" : 1 The number of replacements done was : Poldi : 1 The number of replacements done was : Podolski : 1 The number of replacements done was : "WHY IS FOOD SO GOOD" : 1 The number of replacements done was : Olympiacos : 1 The number of replacements done was : #AskZachAttack : 2 The number of replacements done was : #LiverpoolVsRealMadrid : 2 The number of replacements done was : #IfICouldTimeTravel : 2 The number of replacements done was : Reus : 1 The number of replacements done was : "Google Inbox" : 2 The number of replacements done was : Coutinho : 1 The number of replacements done was : Mignolet : 1 The number of replacements done was : "You'll Never Walk Alone" : 1 The number of replacements done was : "David J. Stern Sports Scholarship" : 1 The number of replacements done was : Reds : 1 The number of replacements done was : Parliament : 1

So now I have the unique list, with the corresponding number of occurrences. How would I extract the top X lines?