ame1011

Comparison of Bodies of Text

4 posts in this topic

#1 ·  Posted (edited)

Hi there. To give a little background: I work for a computer store that sells parts to other companies and also runs a retail store. My job is 90% web design; the rest of the time I get to build apps (web or otherwise, read: AutoIt) that make our lives easier here at the office. Our website contains our whole parts database, and each part should have an image attached to it. A large number of parts have no image at the moment, because the system that was in place before I arrived required the company to insert every image manually.

That said, I have an app that I wrote in AutoIt. It works well, but I'm really picky when it comes to programming, and I want everything to run efficiently and quickly.

The app reads our parts database from an Excel file and pulls the name/description field for a given part; if it's an ASUS part, it looks it up on the ASUS site. Yes, the name and description are combined into one field. It was done incorrectly from the start, and there are too many items in there to start doing it the proper way now. To look up a part, my program takes the first word of our description (usually the item's name, or part of it) and runs a search on the ASUS site. It then stores the names, descriptions and images of all items returned by the search and iterates through them to determine which one best matches my description. Finally, it downloads the corresponding image for my part and stores it locally.

My calculateScores function is shown below. It decides which item from the search results best matches my part. Can anyone see a better way of doing this? By better I mean both faster and more accurate, if both can be achieved at once.

Func calculateScores($sDesc)
    ;Compare with asus descs and titles, calculate scores
    displayStatusTraytip("Calculating best match for: " & $sDesc)
    for $x = 1 to UBound($w_titles)-1
        ;if no image found for this item, continue the loop
        if NOT $w_images[$x] then ContinueLoop
        ;break our desc into array, 1 word per element
        $aOurDesc = StringSplit($sDesc, " ")
        ;break their title and desc into array, 1 word per element
        $aWTitle = StringSplit($w_titles[$x], " ")
        $aWDetails = StringSplit($w_details[$x], " ")
        
        ;determine how many of the words in their title are found in our desc
        $scoreSub = 0
        $scoreTotal = 0
        for $y = 1 to $aWTitle[0]
            for $z = 1 to $aOurDesc[0]
                If $aWTitle[$y] == $aOurDesc[$z] Then $scoreSub += 1
            Next
        Next
        ;mult score by 2 for title to give it more strength because it's more important
        $scoreSub *= 2
        ;divide by number of words in our desc
        $scoreSub /= $aOurDesc[0]
        ;set total score to value
        $scoreTotal = $scoreSub
        ;reset subtotal score
        $scoreSub = 0
        
        ;determine how many of the words in their desc are found in our desc
        for $y = 1 to $aWDetails[0]
            for $z = 1 to $aOurDesc[0]
                If $aWDetails[$y] == $aOurDesc[$z] Then $scoreSub += 1
            Next
        Next
        ;divide by number of words in our desc
        $scoreSub /= $aOurDesc[0]
        ;add to total score
        $scoreTotal += $scoreSub
        ;increase scores array size
        ReDim $w_scores[UBound($w_scores)+1]
        ;add score info to scores array
        $w_scores[UBound($w_scores)-1] = $scoreTotal
    Next
    ;best match
    $chosenMatch = _ArrayMaxIndex($w_scores,1,1)
    Return $chosenMatch 
EndFunc
Edited by ame1011

[font="Impact"] I always thought dogs laid eggs, and I learned something today. [/font]

#2 ·  Posted (edited)

I've heard of algorithms that count the number of operations required to convert one string into another and assign a score based on that count. Has anyone heard of this? Does anyone know of another string comparison algorithm?
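The kind of algorithm described here is usually called edit distance; the best-known variant is the Levenshtein distance, which counts single-character insertions, deletions and substitutions. Below is a minimal sketch of it in AutoIt. The names _EditDistance and _Similarity are made up for this example (they are not built-in or UDF functions), and turning the distance into a 0..1 score by dividing by the longer length is just one common choice, not the only one.

```autoit
; Hypothetical helper (not part of AutoIt): Levenshtein edit distance,
; i.e. the minimum number of single-character insertions, deletions and
; substitutions needed to turn $sA into $sB. Comparison is case-sensitive.
Func _EditDistance($sA, $sB)
    Local $iLenA = StringLen($sA), $iLenB = StringLen($sB)
    If $iLenA = 0 Then Return $iLenB
    If $iLenB = 0 Then Return $iLenA

    ; $aPrev is the previous row of the table, $aCurr the row being filled
    Local $aPrev[$iLenB + 1], $aCurr[$iLenB + 1]
    For $j = 0 To $iLenB
        $aPrev[$j] = $j
    Next

    Local $iCost, $iMin
    For $i = 1 To $iLenA
        $aCurr[0] = $i
        For $j = 1 To $iLenB
            ; cost 0 if the characters match, 1 otherwise
            $iCost = 1
            If StringMid($sA, $i, 1) == StringMid($sB, $j, 1) Then $iCost = 0
            ; cheapest of deletion, insertion, substitution
            $iMin = $aPrev[$j] + 1
            If $aCurr[$j - 1] + 1 < $iMin Then $iMin = $aCurr[$j - 1] + 1
            If $aPrev[$j - 1] + $iCost < $iMin Then $iMin = $aPrev[$j - 1] + $iCost
            $aCurr[$j] = $iMin
        Next
        ; arrays are copied on assignment, so $aPrev keeps this row
        $aPrev = $aCurr
    Next
    Return $aPrev[$iLenB]
EndFunc

; Hypothetical helper: case-insensitive similarity between 0 (nothing in
; common) and 1 (identical), normalised by the longer string's length.
Func _Similarity($sA, $sB)
    Local $iMax = StringLen($sA)
    If StringLen($sB) > $iMax Then $iMax = StringLen($sB)
    If $iMax = 0 Then Return 1
    Return 1 - _EditDistance(StringLower($sA), StringLower($sB)) / $iMax
EndFunc
```

For example, _EditDistance("kitten", "sitting") is 3. The run time grows with the product of the two string lengths, so it is probably better suited to comparing short titles or individual words than whole descriptions.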

Edited by ame1011


Well, I've altered it a bit so that it does a few more comparisons. The function takes at most 100 ms even when there are many results, so I don't think speed is a factor. However, is there any way to increase its accuracy?

Func calculateScores($sDesc)
    
    displayStatusTraytip("Calculating best match for: " & $sDesc)
    
    ;iterate through all search results
    for $x = 1 to UBound($w_titles)-1
        ;if no image found for this item, continue the loop
        if NOT $w_images[$x] then ContinueLoop
        ;break our desc into array, 1 word per element
        $aOurDesc = StringSplit($sDesc, " ")
        ;break their title and desc into array, 1 word per element
        $aWTitle = StringSplit($w_titles[$x], " ")
        $aWDetails = StringSplit($w_details[$x], " ")
        
        ;score vars
        $scoreSub = 0
        $scoreTotal = 0
        
        ;----------------THEIR TITLE <=> OUR DESC (EXACT)--------------
        for $y = 1 to $aWTitle[0]
            for $z = 1 to $aOurDesc[0]
                If $aWTitle[$y] == $aOurDesc[$z] Then $scoreSub += 1
            Next
        Next
        ;mult score by 2 for title to give it more strength because it's more important
        $scoreSub *= 2
        ;divide by number of words in our desc
        $scoreSub /= $aOurDesc[0]
        ;add to total score
        $scoreTotal += $scoreSub
        ;reset subtotal score
        $scoreSub = 0
        
        ;----------------OUR DESC => THEIR TITLE (STRINGINSTR)----------------
        for $y = 1 to $aOurDesc[0]
            for $z = 1 to $aWTitle[0]
                If StringLen($aOurDesc[$y]) > 1 And StringLen($aWTitle[$z]) > 1 And StringInStr($aWTitle[$z], $aOurDesc[$y]) Then $scoreSub += .5
            Next
        Next
        ;mult score by 2 for title to give it more strength because it's more important
        $scoreSub *= 2
        ;divide by number of words in our desc
        $scoreSub /= $aOurDesc[0]
        ;add to total score
        $scoreTotal += $scoreSub
        ;reset subtotal score
        $scoreSub = 0
        
        ;----------------THEIR TITLE => OUR DESC (STRINGINSTR)----------------
        for $y = 1 to $aWTitle[0]
            for $z = 1 to $aOurDesc[0]
                If StringLen($aWTitle[$y]) > 1 And StringLen($aOurDesc[$z]) > 1 And StringInStr($aOurDesc[$z], $aWTitle[$y]) Then $scoreSub += .5
            Next
        Next
        ;mult score by 2 for title to give it more strength because it's more important
        $scoreSub *= 2
        ;divide by number of words in their title
        $scoreSub /= $aWTitle[0]
        ;add to total score
        $scoreTotal += $scoreSub
        ;reset subtotal score
        $scoreSub = 0
        
        ;-------------------THEIR DESC <=> OUR DESC (EXACT)-------------------
        for $y = 1 to $aWDetails[0]
            for $z = 1 to $aOurDesc[0]
                If $aWDetails[$y] == $aOurDesc[$z] Then $scoreSub += 1
            Next
        Next
        ;divide by number of words in our desc
        $scoreSub /= $aOurDesc[0]
        ;add to total score
        $scoreTotal += $scoreSub
        ;reset subtotal score
        $scoreSub = 0
        
        ;---------------------OUR DESC => THEIR DESC (STRINGINSTR)-------------
        for $y = 1 to $aOurDesc[0]
            for $z = 1 to $aWDetails[0]
                If StringLen($aOurDesc[$y]) > 1 And StringLen($aWDetails[$z]) > 1 And StringInStr($aWDetails[$z], $aOurDesc[$y]) Then $scoreSub += .5
            Next
        Next
        ;divide by number of words in our desc
        $scoreSub /= $aOurDesc[0]
        ;add to total score
        $scoreTotal += $scoreSub
        ;reset subtotal score
        $scoreSub = 0
        
        ;----------------THEIR DESC => OUR DESC (STRINGINSTR)----------------
        for $y = 1 to $aWDetails[0]
            for $z = 1 to $aOurDesc[0]
                If StringLen($aWDetails[$y]) > 1 And StringLen($aOurDesc[$z]) > 1 And StringInStr($aOurDesc[$z], $aWDetails[$y]) Then $scoreSub += .5
            Next
        Next
        ;divide by number of words in their desc
        $scoreSub /= $aWDetails[0]
        ;add to total score
        $scoreTotal += $scoreSub
        ;reset subtotal score
        $scoreSub = 0
        
        ;increase scores array size
        ReDim $w_scores[UBound($w_scores)+1]
        ;add score info to scores array
        $w_scores[UBound($w_scores)-1] = $scoreTotal
        
    Next
    
    ;best match
    $chosenMatch = _ArrayMaxIndex($w_scores,1,1)
    Return $chosenMatch 
    
EndFunc
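
One thing that may be costing accuracy in the function above: results without an image are skipped and get no slot in $w_scores, so the index returned by _ArrayMaxIndex may not line up with $w_titles whenever some results lack an image (this assumes $w_scores holds only its unused element 0 at the start of each call, and that the caller uses the returned index to look up $w_titles or $w_images). The sketch below is one possible way around that, not a drop-in replacement. It tracks the best index directly and compares whole words case-insensitively. _WordOverlap and _BestMatch are hypothetical names, the arrays are passed in as parameters instead of being read from globals, and the 2x title weight is carried over from the code above.

```autoit
; Hypothetical helper: fraction of the words in $sOurs that also occur
; as whole words in $sTheirs (case-insensitive).
Func _WordOverlap($sOurs, $sTheirs)
    Local $aOurs = StringSplit(StringLower($sOurs), " ")
    ; pad with spaces so that only whole words can match
    Local $sHaystack = " " & StringLower($sTheirs) & " "
    Local $iHits = 0
    For $i = 1 To $aOurs[0]
        If $aOurs[$i] = "" Then ContinueLoop
        If StringInStr($sHaystack, " " & $aOurs[$i] & " ") Then $iHits += 1
    Next
    Return $iHits / $aOurs[0]
EndFunc

; Hypothetical replacement for the scoring loop: returns the index into
; $aTitles of the best match, or -1 if no result has an image.
; Element 0 of each array is ignored, as in the original code.
Func _BestMatch($sDesc, ByRef $aTitles, ByRef $aDetails, ByRef $aImages)
    Local $iBest = -1, $fBest = -1, $fScore
    For $x = 1 To UBound($aTitles) - 1
        ; skip results without an image, but keep $x as the real index
        If Not $aImages[$x] Then ContinueLoop
        ; title counts double, as in the original scoring
        $fScore = 2 * _WordOverlap($sDesc, $aTitles[$x]) + _WordOverlap($sDesc, $aDetails[$x])
        If $fScore > $fBest Then
            $fBest = $fScore
            $iBest = $x
        EndIf
    Next
    Return $iBest
EndFunc
```

Because the best index is remembered inside the loop, no separate scores array or ReDim is needed, and the return value can be used directly as an index into the title and image arrays.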


#4 ·  Posted (edited)

Without getting into too much detail, shouldn't you be using StringInStr() to compare things?

Unless your manufacturer has duplicate part numbers, which I doubt, just strip your strings so that your description string is shorter and contains the part number, while the manufacturer string is left complete. Then go:

If StringInStr($ManufacturerString, $OurString) Then
    ;There is our match
    DoStuff()
EndIf

That's what I do anyway...

Edit: Nevermind, you are doing that already....
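
For what it's worth, the stripping step mentioned above might look something like the sketch below. It assumes that differences in case, spacing and punctuation (such as "P5Q-Pro" versus "p5q pro") are the main obstacle to a plain StringInStr() match. _Normalize and _ContainsPart are hypothetical helper names, and the sample strings are made up for illustration.

```autoit
; Hypothetical helper: lower-case the text and drop everything that is
; not a letter or a digit, so "P5Q-Pro" and "p5q pro" become "p5qpro".
Func _Normalize($sText)
    Return StringRegExpReplace(StringLower($sText), "[^a-z0-9]", "")
EndFunc

; Hypothetical helper: True if the normalised part number occurs
; anywhere in the normalised manufacturer string.
Func _ContainsPart($sManufacturerString, $sOurPartNumber)
    Local $sNeedle = _Normalize($sOurPartNumber)
    ; an empty needle would otherwise never produce a meaningful match
    If $sNeedle = "" Then Return False
    Return StringInStr(_Normalize($sManufacturerString), $sNeedle) > 0
EndFunc

; Made-up sample strings
If _ContainsPart("ASUS P5Q-Pro Motherboard", "p5q pro") Then
    ConsoleWrite("There is our match" & @CRLF)
EndIf
```

Removing the spaces also joins neighbouring words together, so very short part numbers could match by accident; a minimum length check on the needle may be worth adding.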

Edited by Oldschool
