Compare how many items are equal in two strings



darkshark,

I figured out enough about regex to make this thing work. Here's what I have:

2 arrays, each consisting of 5000 rows, each row containing 15 discrete numbers from 1 to 25, not sorted

The code will count the number of times 11,12,13,14,15 are in both arrays at the same offset.

EDIT:

This has been changed to count the number of times each of the numbers 1 through 25 occurs. Run time increased to .25 seconds.

The code runs in 200-400 milliseconds depending on processor load (that includes writing to 3 files and gathering debug data).

As soon as I can figure out how to post the code I will.

kylomas

Edited by kylomas

Forum Rules         Procedure for posting code

"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill


darkshark,

Here's the code (patience, these forums and I do NOT get along).

I've included the code that generates the test data as well. It's obvious how this works when you see it.

#include <date.au3>
#include <array.au3>
#include <timers.au3>

; generate test data
;gen_test_data()

; run match test
test_data()

func test_data()

    local $a10  = stringsplit(fileread("d:\atest10.txt"),@crlf,3)
    local $a20  = stringsplit(fileread("d:\atest20.txt"),@crlf,3)
    local $mtch = 0, $out = '', $mtcho = '', $ulimit

    local $st = timerinit()

    ; set upper limit to smallest array in case they are not equal

    if ubound($a10) > ubound($a20) then
        $ulimit = ubound($a20)
    else
        $ulimit = ubound($a10)
    endif

    ; darkshark
    ; this for loop is the guts of the match.  It requires two 0 based arrays, how you get there is up to you
    ; the string that I am constructing is to check accuracy (output to file at end of loop).

    for $i = 0 to $ulimit - 1

        for $j = 1 to 25
            ; \b matches whole numbers only, so 1 is not found inside 11 or 21
            if StringRegExp($a10[$i],'\b' & $j & '\b') then
                if StringRegExp($a20[$i],'\b' & $j & '\b') then
                    $mtch += 1                      ;<---- if you are in this part of the code you do something with the match
                    $mtcho &= ' ' & $j              ;<---- I am just counting the matches and constructing a string for output
                EndIf
            EndIf
        next

        $out &= 'offset(' & $i & ') = ' & $mtch & ' matches   / matched on ' & $mtcho & @crlf
        $mtch  = 0
        $mtcho = ''

    next

    consolewrite(int(timerdiff($st)) / 1000 & ' sec to run ' & $ulimit & ' entries' & @crlf)

    filedelete('d:\atest30.txt')
    filewrite('d:\atest30.txt',$out)

endfunc

func gen_test_data()

    if fileexists("d:\atest10.txt") then filedelete("d:\atest10.txt")
    if fileexists("d:\atest20.txt") then filedelete("d:\atest20.txt")


    local $cnt = 5000, $a10[$cnt], $a20[$cnt], $tmpnum
    local $st = timerinit()

    ; rows 0 to $cnt - 1, so exactly $cnt rows per array
    for $i = 0 to $cnt - 1
        for $j = 1 to 15
            ; draw until the number is not yet in this row; the commas keep 1 from matching inside 11 or 21
            do
                $tmpnum = random(1,25,1)
            until not stringinstr(',' & $a10[$i] & ',', ',' & $tmpnum & ',')
            $a10[$i] &= $tmpnum
            if $j < 15 then $a10[$i] &= ','
        Next
    Next
    filewrite("d:\atest10.txt",_arraytostring($a10,@crlf))

    for $i = 0 to $cnt - 1
        for $j = 1 to 15
            do
                $tmpnum = random(1,25,1)
            until not stringinstr(',' & $a20[$i] & ',', ',' & $tmpnum & ',')
            $a20[$i] &= $tmpnum
            if $j < 15 then $a20[$i] &= ','
        Next
    Next
    filewrite("d:\atest20.txt",_arraytostring($a20,@crlf))

    consolewrite('Time to generate test arrays = ' & (int(timerdiff($st))) / 1000 & " Secs" & @crlf)

endfunc

I hope that worked, and Good Luck!

kylomas

Edited by kylomas


Here is an SQLite version.

#include <array.au3>
#include <SQLite.au3>
#include <SQLite.dll.au3>

Global $FinArray, $iRows, $iColumns, $sql_cmd

$string1 = "01,02,03,04,05,06,07,08,09,10,11,12,13,14,15"
$string2 = "06,07,08,09,10,11,12,13,14,15,16,17,18,19,20"

$sArray1 = StringSplit ($string1 , ",")
$sArray2 = StringSplit ($string2 , ",")

_SQLite_Startup ()
_SQLite_Open () ; Open a :memory: database
_SQLite_Exec (-1, "BEGIN;")
_SQLite_Exec (-1, "CREATE TABLE work (value);")
_SQLite_Exec (-1, "CREATE INDEX ix1 on work (value);")

; group 1
For $i = 1 to $sArray1[0]
    $sql_cmd &= "INSERT INTO work VALUES ('" & $sArray1[$i] & "');"
Next
_SQLite_Exec (-1, $sql_cmd)
$sql_cmd = ''
    
; group 2
For $i = 1 to $sArray2[0]
    $sql_cmd &= "INSERT INTO work VALUES ('" & $sArray2[$i] & "');"
Next
_SQLite_Exec (-1, $sql_cmd)

_SQLite_Exec (-1, "COMMIT;")
; two ways, the same result
_SQLite_GetTable2d (-1, "SELECT value, count(*) FROM work GROUP BY value HAVING count(*) > 1;", $FinArray, $iRows, $iColumns) 
;~ _SQLite_GetTable2d (-1, "SELECT distinct value FROM work t1 WHERE (SELECT count(*) FROM work t2 WHERE t2.value = t1.value) > 1 ORDER BY value;", $FinArray, $iRows, $iColumns)
_SQLite_Close()
_SQLite_Shutdown()

_ArrayDisplay ($FinArray, 'Equals number: ' & $iRows) ; number is in title, the same as ubound($FinArray)-1

EDIT: It can be optimized by using StringRegExpReplace instead of StringSplit plus a For...Next loop to prepare the SQL INSERT statements.

EDIT2: The advantage of this approach becomes evident when you have a large number of items/numbers.
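
A minimal sketch of what the first EDIT might look like, assuming the input string holds only digits separated by commas (as $string1 does in the script above). The pattern and the replacement text are my own illustration, not tested against the full script:

```autoit
; Sketch only: build every INSERT in one StringRegExpReplace call
; instead of StringSplit + For...Next.
; Assumption: the string contains only numbers separated by commas.
Local $string1 = "01,02,03,04,05,06,07,08,09,10,11,12,13,14,15"

; each run of digits (plus an optional trailing comma) becomes one INSERT statement;
; ${1} is the captured number
Local $sql_cmd = StringRegExpReplace($string1, "(\d+),?", "INSERT INTO work VALUES ('${1}');")

ConsoleWrite($sql_cmd & @CRLF)
```

The resulting string can be passed to _SQLite_Exec in the same way as the $sql_cmd built by the loops.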

Edited by Zedna

@Zedna: I'm absolutely not familiar with SQLite. Very nice example to learn more about SQLite!

Thanks for sharing!

Br,

UEZ

Please don't send me any personal message and ask for support! I will not reply!

Selection of finest graphical examples at Codepen.io

The own fart smells best!
Her 'sikim hıyar' diyene bir avuç tuz alıp koşma!
¯\_(ツ)_/¯  ٩(●̮̮̃•̃)۶ ٩(-̮̮̃-̃)۶ૐ


How do you copy code with color?

Go to Test thread

http://www.autoitscript.com/forum/topic/98043-test-thread/

and use the full editor, where there is an AU3 icon button. Just select your code and press this button.

Or you may do it "by hand":

[autoit ] your code here [/autoit ]

Note that the extra space inside the brackets is there only to show the syntax here.


I compared the speed of the scripts posted in this topic:

#include <array.au3>
#include <Math.au3>
#include <SQLite.au3>
#include <SQLite.dll.au3>

Global $count = 10000
Global $number 
Global $string1, $string2
Global $FinArray, $iRows, $iColumns, $sql_cmd

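; construct random arrays
; note: each string ends with a trailing comma, so StringSplit below returns one extra, empty element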
For $i = 1 to $count 
    $string1 &= Random(1, 25, 1) & ','
Next
For $i = 1 to $count 
    $string2 &= Random(1, 25, 1) & ','
Next

$sArray1 = StringSplit ($string1 , ",")
$sArray2 = StringSplit ($string2 , ",")

; ******************************

$start = TimerInit()
_SQLite_Startup ()
_SQLite_Open () ; Open a :memory: database
_SQLite_Exec (-1, "BEGIN;")
_SQLite_Exec (-1, "CREATE TABLE work (value);")
_SQLite_Exec (-1, "CREATE INDEX ix1 on work (value);")

; group 1
For $i = 1 to $sArray1[0]
    $sql_cmd &= "INSERT INTO work VALUES ('" & $sArray1[$i] & "');"
Next
_SQLite_Exec (-1, $sql_cmd)
$sql_cmd = ''
    
; group 2
For $i = 1 to $sArray2[0]
    $sql_cmd &= "INSERT INTO work VALUES ('" & $sArray2[$i] & "');"
Next
_SQLite_Exec (-1, $sql_cmd)

_SQLite_Exec (-1, "COMMIT;")
; two ways, the same result
_SQLite_GetTable2d (-1, "SELECT value, count(*) FROM work GROUP BY value HAVING count(*) > 1;", $FinArray, $iRows, $iColumns) 
;~ _SQLite_GetTable2d (-1, "SELECT distinct value FROM work t1 WHERE (SELECT count(*) FROM work t2 WHERE t2.value = t1.value) > 1 ORDER BY value;", $FinArray, $iRows, $iColumns)
_SQLite_Close()
_SQLite_Shutdown()
$time = TimerDiff($start)

ConsoleWrite('Test 1 size:' & $count & ' Number:' & $iRows & ' time:' & Round($time,0) & @CRLF)
; number and time is in title, the same as ubound($FinArray)-1
;~ _ArrayDisplay ($FinArray, 'Number:' & $iRows & ' time:' & Round($time,0)) 
msgbox (0 , 'Test 1 size:' & $count, 'Number:' & $iRows  & ' time:' & Round($time,0))

; ******************************

$number = 0
Global $iIndex1, $iIndex2 = 1

$start = TimerInit()

_ArraySort($sArray1, 0, 1)
_ArraySort($sArray2, 0, 1)

For $iIndex1 = 1 To $sArray1[0]
    For $iIndex2 = $iIndex2 To $sArray2[0]
        If $sArray1[$iIndex1] < $sArray2[$iIndex2] Then
            $iIndex2 = _Max($iIndex2 - 1, 1)
            ExitLoop
        EndIf
        If $sArray1[$iIndex1] = $sArray2[$iIndex2] Then
            $number += 1
            ExitLoop
        EndIf
    Next
Next

$time = TimerDiff($start)

ConsoleWrite('Test 2 size:' & $count & ' Number:' & $number & ' time:' & Round($time,0) & @CRLF)
msgbox (0 , 'Test 2 size:' & $count , 'Number:' & $number & ' time:' & Round($time,0))

; ******************************

Dim $FinArray[1]

$start = TimerInit()

For $i = 1 to $sArray1[0]
    For $k = 1 to $sArray2[0]
        If $sArray1[$i] = $sArray2[$k] Then
            _ArrayAdd ($FinArray , $sArray1[$i])
            ExitLoop
        Endif
    Next
Next

;~ _ArrayDelete ($FinArray , 0)
$number = UBound($FinArray) - 1
$time = TimerDiff($start)
;~ _ArrayDisplay ($FinArray)

ConsoleWrite('Test 3 size:' & $count & ' Number:' & $number & ' time:' & Round($time,0) & @CRLF)
msgbox (0 , 'Test 3 size:' & $count , 'Number:' & $number & ' time:' & Round($time,0))

; ******************************

RESULTS: The winner is, of course, SQLite.

size is the UBound of both arrays

time is in milliseconds

number is the result

Test 1 size:1000 Number:26 time:119

Test 2 size:1000 Number:1001 time:431

Test 3 size:1000 Number:1001 time:2032

Test 1 size:10000 Number:26 time:1142

Test 2 size:10000 Number:10001 time:5585

Test 3 size:10000 Number:10001 time:210665

Test 1 size:100000 Number:26 time:12583

Test 2 size:100000 Number:100001 time:90013

Test 3 size:100000 don't try this :-)
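
The Number column can look odd at first: 26 values when only 1 to 25 are generated, and 1001 for a size of 1000. A likely explanation, from reading the script rather than from anything stated in the post, is that $string1 and $string2 each end with a comma, so StringSplit returns one extra, empty element per array and that empty value also counts as a match. A small illustration with made-up input:

```autoit
; Illustration only: a trailing delimiter makes StringSplit return one extra, empty element.
Local $aParts = StringSplit("12,19,3,", ",")

ConsoleWrite("count = " & $aParts[0] & @CRLF)                   ; 4, not 3
ConsoleWrite("last  = '" & $aParts[$aParts[0]] & "'" & @CRLF)   ; '' (empty string)

; trimming the trailing comma first gives the expected count
Local $aClean = StringSplit(StringTrimRight("12,19,3,", 1), ",")
ConsoleWrite("count = " & $aClean[0] & @CRLF)                   ; 3
```

Note also that Test 1 counts distinct values present in both arrays, while Tests 2 and 3 count the elements of the first array that have a match in the second, so the numbers are not directly comparable.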

Edited by Zedna

Zedna,

Your test is against arrays that look like this:

arr1[1] = 12        arr2[1] = 5
arr1[2] = 19        arr2[2] = 4
...
arr1[1000] = 3      arr2[1000] = 11

I believe that darkshark wants to compare strings that look like:

arr1[1] = 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
arr2[1] = 25,24,23,22,21,20,19,18,17,16,15,14,13,12,11
...
arr1[1000] = 1,25,2,24,3,23,4,22,5,21,6,20,7,19,8
arr2[1000] = 1,3,5,7,9,11,13,15,17,19,21,23,25,2,4

and he has indicated that there are no dups within any single element.

The example code that I posted compares each respective element of the two arrays for dups, e.g.

arr1[3] is compared to arr2[3]

arr1[2504] is compared to arr2[2504]

Numbers in common within respective array elements are counted/reported.

The test case I use is 2 arrays with 5000 elements each. Each element is comprised of 15 discrete numbers.

Test times using this data vary from .22 to .25 seconds, depending on cpu load.
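
To make the per-element comparison concrete, here is a minimal sketch of counting the numbers two rows have in common. It is not the code posted earlier: _CountCommon is a hypothetical helper name, and it assumes comma-separated rows with no spaces and no dups within a row.

```autoit
; Sketch: count the numbers two comma-separated rows have in common.
; _CountCommon is a hypothetical helper, not part of the script posted earlier.
Func _CountCommon($sRowA, $sRowB)
    Local $aNums = StringSplit($sRowA, ",", 2) ; flag 2: 0-based array, no count in element 0
    Local $sPadded = "," & $sRowB & ","        ; wrap row B so every number sits between commas
    Local $iCommon = 0

    For $i = 0 To UBound($aNums) - 1
        ; ",1," cannot match inside ",11," or ",21,", so only whole numbers are counted
        If StringInStr($sPadded, "," & $aNums[$i] & ",") Then $iCommon += 1
    Next

    Return $iCommon
EndFunc

; the two rows are the arr1[1] / arr2[1] example above; they share 11,12,13,14,15
ConsoleWrite(_CountCommon("1,2,3,4,5,6,7,8,9,10,11,12,13,14,15", "25,24,23,22,21,20,19,18,17,16,15,14,13,12,11") & @CRLF) ; 5
```

Calling this once per offset gives the same kind of per-row count as the regex loop, without the risk of a short number matching inside a longer one.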

@darkshark - let us know if either of these solutions fits your needs.

thanks,

kylomas


Sorry guys, yesterday I had to leave and had no time to let you know.

Well, thank you all for the replies, I'm very grateful for your help!

Zedna, thank you for the SQLite example. I had never really used it, and it is very good for arrays with many rows!

And kylomas, thank you so much for your code. I found it well thought out; I had not thought of doing it that way before!

Thank you to everyone who replied to the questions!

I love this forum :P


Here is a benchmark with my code:

#include <Array.au3>

$max = 50000
$string1 = "11,12,13,14,15"
Dim $aString[$max + 1][5] = [[11,12,13,14,15]]
Local $sArray[$max + 1]
$sArray[0] = $max

For $i = 1 To $max
    For $j = 0 To 14
        $sArray[$i] &= Random(1, 25, 1) & ","
    Next
    $sArray[$i]  = StringLeft($sArray[$i], StringLen($sArray[$i]) - 1)
Next

$benchmark = TimerInit()
$aS1 = StringSplit($string1, ",", 2)
For $h = 1 To $sArray[0]
    $string = ""
    $c = 0
    For $i = 0 To UBound($aS1) - 1
        If StringInStr("," & $sArray[$h] & ",", "," & $aS1[$i] & ",") Then
            StringReplace("," & $sArray[$h] & ",",   "," & $aS1[$i] & ",",    "," & $aS1[$i] & ",")
            $aString[$h][$i] = @extended
            $c += 1
        EndIf
    Next
Next
ConsoleWrite(Round(TimerDiff($benchmark) / 1000 , 4) & " seconds" & @CRLF)
_ArrayDisplay($aString)

It creates 50000 lines, each holding 15 random numbers from 1 to 25, and compares each line with the numbers 11-15.

The result is a new array with the count of each number found per line.

On my notebook (Intel Core i5 540M @2.5 GHz) on Win7 x64 it took approx. 3.1362 seconds.
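
One caveat when counting with StringReplace and a comma-wrapped search term: replacements do not overlap, so in ",11,11," the second ",11," shares a comma with the first and is not counted. A minimal alternative sketch, assuming comma-separated rows; _CountNumber is a hypothetical helper name and is not part of the benchmark above:

```autoit
; Sketch: count how often one number occurs in a comma-separated row.
; _CountNumber is a hypothetical helper name.
Func _CountNumber($sRow, $iNumber)
    ; \b anchors the match on whole numbers, so 1 is not found inside 11 or 21
    StringRegExpReplace($sRow, "\b" & $iNumber & "\b", "")
    Return @extended ; number of replacements performed = number of occurrences
EndFunc

ConsoleWrite(_CountNumber("11,11,3,21,11", 11) & @CRLF) ; 3
ConsoleWrite(_CountNumber("11,11,3,21,11", 1) & @CRLF)  ; 0
```

I have not timed this against the StringInStr/StringReplace version, so treat it as a correctness sketch rather than a faster replacement.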

Br,

UEZ

Edited by UEZ


I modified the code appropriately for the numbers 1-25.

If you look at the array $aString you will see how often each of the numbers (11-15) was found. Each row of $aString corresponds to a line of the original array, and the values in the cells are the counts of the numbers found. Row 0 of $aString is the header with the numbers 11-15.

Br,

UEZ


I'm going to leave this alone.  Obviously I don't understand what the issue is.  If I've wasted your time, I apologize.

My SQLite example probably doesn't fulfill his exact specification,

but it clearly shows the principles of searching for duplicates in arrays using SQLite.

So if he wants, he can adapt my example to his needs.

Edited by Zedna
