Hey guys,

So we recently moved our company Knowledge Base (KB) to an in-house solution rather than paying a monthly subscription for someone else to host it. The move itself is done, but there is an issue. No one restricted access to the original KB site, which meant anyone was able to edit it as they pleased. As a result, some of the old KB's features (picture/video/internal links) are still being used, and I need to find out which pages have links that will inevitably go dead once we stop paying for the old service.

So, I feel like I'm 90% of the way to getting this working correctly. The steps are fairly simple:

  1. Log into the KB
  2. Load a list of URLs
  3. Pull the HTML and search for "helpjuice"
  4. Log that the URL has been checked
  5. Check for links on this page and compare them against the list of URLs
  6. Log any new URLs that are missing from the list

Now, I have no idea if this is the best way of doing this, but I've made it 90% of the way so far and I would like to figure this specific problem out. If anyone has a better/more effective method, please point me in the right direction!

Here is the code I have so far.

#include <File.au3>
#include <FileConstants.au3>
#include <WinAPIFiles.au3>
#include <IE.au3>
#include <MsgBoxConstants.au3>
#include <Array.au3>


Global $aGlobalLinks[1][3] = [["Link", "Checked", "Hit"]] ;Create Array with Headers
;//Load KB and login by submit the login form.
Local $oIE = _IECreate ("https://website.com/kb/")
Local $oForm = _IEFormGetObjByName($oIE, "login-form")
Local $oEmail = _IEFormElementGetObjByName($oForm, "email")
Local $oPassword = _IEFormElementGetObjByName($oForm, "password")
_IEFormElementSetValue($oEmail, "email@email.com")
_IEFormElementSetValue($oPassword, "password")
Sleep(500)
_IEFormSubmit($oForm)
;_IEQuit($oIE)

;//Load a second window to confirm it shows we are logged in. Grab a list of links as a jumping off point if the KB_URL.txt is empty
;//Mainly used for Debugging
Local $oIE = _IECreate ("https://website.com/kb/")
$oLinks = _IELinkGetCollection($oIE)
For $oLink In $oLinks
    _ArraySearch($aGlobalLinks, $oLink.href)
    If @error = 6 Then
        Local $aFill[1][3] = [[$oLink.href, "No", "No"]]
        _ArrayAdd($aGlobalLinks, $aFill)
    EndIf
Next
;//Close second window of IE
_IEQuit($oIE)
;//Attempt to load the KB_URL.txt
LoadFile()
;//Setting $r to 1 since the txt file and the $aGlobalLinks will have a header in index 0
Global $r = 1
Do
    If $aGlobalLinks[$r][1] = "No" Then ;Skip Entries that have already been checked
        Local $oIE = _IECreate($aGlobalLinks[$r][0], 0, 1, 0) ;Create new window with the first 'unchecked' link entry
        _IELoadWait($oIE, 100, 2500) ;Wait 2.5 seconds for page to load
        If @error = 6 Then ;@error = 6 means the Timeout was met..Send {ESC} to stop the loading
            ConsoleWrite("Webpage timed out...Sending ESC..." & @CRLF)
            Send("{ESC}")
            Sleep(500)
        EndIf
        Local $sHTML = _IEDocReadHTML($oIE) ;Read the HTML and look for any trace of 'helpjuice'
        Local $result = StringRegExp($sHTML, ".*?helpjuice.*?", 0)
        ConsoleWrite("RegExt result: " & $result & @CRLF)

        If $result = 1 Then ;A result of 1 means there is a match
            $aGlobalLinks[$r][1] = "Yes" ;Set 'Checked' to 'Yes'
            $aGlobalLinks[$r][2] = "Yes" ;Set 'Hit' to 'Yes'
            ConsoleWrite("~~~~~~~~~~~~~ H I T ~~~~~~~~~~~~~ " & $aGlobalLinks[$r][0] & @CRLF)
        Else
            $aGlobalLinks[$r][1] = "Yes" ;Set 'Checked' to 'Yes'
        EndIf

        $oLinks = _IELinkGetCollection($oIE) ;Grab a list of links from the existing site
        ;_ArrayDisplay($oLinks)

        For $oLink In $oLinks ;Loop through all the links found and add any new links to the end of $aGlobalLinks
            ConsoleWrite("String: " & String($oLink) & @CRLF)
            ConsoleWrite("Looking for: " & $oLink.href & @CRLF)
            _ArraySearch($aGlobalLinks, $oLink.href)
            If @error = 6 Then
                Local $aFill[1][3] = [[$oLink.href, "No", "No"]] ;Set 'Checked' and 'Hit' to 'No' because this link has not been checked yet
                _ArrayAdd($aGlobalLinks, $aFill)
            EndIf
        Next

        _IEQuit($oIE)

        ;This next part is to reduce the amount times we write to the file
        ;Changing the '10' to '100' means the script will save the changes every 100 entries
        If IsInt($r/10) = 1 Then
            SaveFile()
            ConsoleWrite("File Saved: " & $r & "/" & UBound($aGlobalLinks) & @CRLF)
        EndIf

        ConsoleWrite("Completed " & $r & "/" & UBound($aGlobalLinks) & @CRLF)
    EndIf

    $r += 1
Until $r >= UBound($aGlobalLinks) ;Stop after the last valid index (UBound - 1) to avoid a subscript error

;_ArrayDisplay($aGlobalLinks)


Func SaveFile()
    $oFile = FileOpen(@ScriptDir & "\KB_URLs.txt", 2)
    FileClose($oFile)
    _FileWriteFromArray(@ScriptDir & "\KB_URLs.txt", $aGlobalLinks, 0, UBound($aGlobalLinks), ",")
EndFunc


Func LoadFile()
    _FileReadToArray(@ScriptDir & "\KB_URLs.txt", $aGlobalLinks, $FRTA_NOCOUNT, ",")
    _ArrayDisplay($aGlobalLinks)
EndFunc
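
One thing that may be worth guarding in LoadFile() above: _FileReadToArray returns 0 and sets @error when KB_URLs.txt is missing or unreadable, and as far as I know it can also reset the ByRef variable it was given. A minimal sketch that reads into a temporary array first, so a failed read leaves the in-memory list alone. The name LoadFileSafe and its parameters are hypothetical, not part of the original script:

```autoit
#include <File.au3>
#include <AutoItConstants.au3>

; Hypothetical replacement for LoadFile().
; Returns True and overwrites $aLinks only if a 3-column list was read.
Func LoadFileSafe(ByRef $aLinks, $sPath)
    If Not FileExists($sPath) Then Return False
    Local $aTemp
    ; Read into a temporary array so a failed read cannot touch $aLinks
    If Not _FileReadToArray($sPath, $aTemp, $FRTA_NOCOUNT, ",") Then Return False
    ; Expect the same layout as $aGlobalLinks: Link, Checked, Hit
    If UBound($aTemp, $UBOUND_COLUMNS) <> 3 Then Return False
    $aLinks = $aTemp
    Return True
EndFunc
```

It would be called as LoadFileSafe($aGlobalLinks, @ScriptDir & "\KB_URLs.txt"), keeping the links collected from the home page whenever it returns False.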

 

The error I'm getting has to do with the collection returned by _IELinkGetCollection. I followed the examples in the AutoIt help file, which use $oLink.href in several places, but I haven't been able to find much on when/how to use the .href property.

Here is the Console Output of the error:
 

RegExt result: 0
Looking for: https://website.com/kb/
Looking for: https://website.com/kb/207557-abc-bank-homepage#
Looking for: https://website.com/kb/
Looking for: https://website.com/kb/19011-partners-and-isos
Looking for: https://website.com/kb/46470-onsite-install-partners
Looking for: https://website.com/en/small-business/payments-and-processing/abc-merchant-services.html
Looking for: https://website.com/Clover
Looking for: https://website.com/screens/signup/?integrations_id=12345
Looking for: https://website.com/instruction/import-an-inventory-menu-spreadsheet/?userDevice=web
Looking for: https://website.com/instruction/import-an-inventory-menu-spreadsheet/?userDevice=web
Looking for: https://website.com/instruction/import-an-inventory-menu-spreadsheet/?userDevice=web
Looking for: https://website.com/appmarket/apps/Z6GMBJ5HCBEQA?clientCountry=US
Looking for: https://website.com/kb/207557-abc-bank-homepage#panel3a
Looking for: https://website.com/kb/207557-abc-bank-homepage#panel4a
Looking for: https://website.com/kb/207557-abc-bank-homepage#panel5a
"C:\Users\Jon\Desktop\KB Scrub\HTMLDOC_Test.au3" (65) : ==> The requested action with this object has failed.:
ConsoleWrite("Looking for: " & $oLink.href & @CRLF)
ConsoleWrite("Looking for: " & $oLink^ ERROR

Any insight is appreciated!

Link to comment
Share on other sites

Use IsObj to check that it is an object, for example:

$oLinks = _IELinkGetCollection($oIE) ;Grab a list of links from the existing site
        If IsObj($oLinks) Then
            For $oLink In $oLinks ;Loop through all the links found and add any new links to the end of $aGlobalLinks
                If IsObj($oLink) Then
                    ConsoleWrite("String: " & String($oLink) & @CRLF)
                    ConsoleWrite("Looking for: " & $oLink.href & @CRLF)
                    _ArraySearch($aGlobalLinks, $oLink.href)
                    If @error = 6 Then
                        Local $aFill[1][3] = [[$oLink.href, "No", "No"]] ;Set 'Checked' and 'Hit' to 'No' because this link has not been checked yet
                        _ArrayAdd($aGlobalLinks, $aFill)
                    EndIf
                EndIf
             Next
         EndIf

 

Link to comment
Share on other sites

47 minutes ago, Subz said:

Use IsObj to check that it is an object, for example:

$oLinks = _IELinkGetCollection($oIE) ;Grab a list of links from the existing site
        If IsObj($oLinks) Then
            For $oLink In $oLinks ;Loop through all the links found and add any new links to the end of $aGlobalLinks
                If IsObj($oLink) Then
                    ConsoleWrite("String: " & String($oLink) & @CRLF)
                    ConsoleWrite("Looking for: " & $oLink.href & @CRLF)
                    _ArraySearch($aGlobalLinks, $oLink.href)
                    If @error = 6 Then
                        Local $aFill[1][3] = [[$oLink.href, "No", "No"]] ;Set 'Checked' and 'Hit' to 'No' because this link has not been checked yet
                        _ArrayAdd($aGlobalLinks, $aFill)
                    EndIf
                EndIf
             Next
         EndIf

 

Ya, it was just to double check the error wasn't related to the object.

EDIT: Sorry, I thought you were asking why IsObj was in there. I had that check in there before but removed it, since the error occurs specifically when reading the .href property.

Edited by Kidney
Link to comment
Share on other sites

23 hours ago, Nine said:

You could try using :

$sHRef = $olink.getAttribute("href")
if $sHRef = "" then
    ; href not found
else
    ; href found
endif

 

Nine,

I implemented your suggestion into the Do Until loop with some console outputs. Here is the updated Do Until loop:
 

Local $r = 1
Do
    If $aGlobalLinks[$r][1] = "No" Then
        ConsoleWrite("Opening: " & $aGlobalLinks[$r][0] & @CRLF)
        Local $oIE = _IECreate($aGlobalLinks[$r][0], 0, 1, 0)
        _IELoadWait($oIE, 100, 2500)
        If @error = 6 Then
            ConsoleWrite("Webpage timed out...Sending ESC..." & @CRLF)
            Send("{ESC}")
            Sleep(500)
        EndIf
        Local $sHTML = _IEDocReadHTML($oIE)
        Local $result = StringRegExp($sHTML, ".*?helpjuice.*?", 0)
        ConsoleWrite("RegExt result: " & $result & " (0 = No Match, 1 = Match)" & @CRLF)

        If $result = 1 Then
            $aGlobalLinks[$r][1] = "Yes"
            $aGlobalLinks[$r][2] = "Yes"
            ConsoleWrite("~~~~~~~~~~~~~ H I T ~~~~~~~~~~~~~ " & $aGlobalLinks[$r][0] & @CRLF)
        Else
            $aGlobalLinks[$r][1] = "Yes"
        EndIf

        $oLinks = _IELinkGetCollection($oIE)
        _ArrayDisplay($oLinks)

        For $oLink In $oLinks
            $sHRef = $oLink.getAttribute("href")
            if $sHRef = "" then
                ConsoleWrite("This bitch empty..." & @CRLF)
            else
                ConsoleWrite("$sHRef: " & $sHRef & @CRLF)
            endif
            ConsoleWrite("Looking for: " & $oLink.href & @CRLF)
            _ArraySearch($aGlobalLinks, $oLink.href)
            If @error = 6 Then
                Local $aFill[1][3] = [[$oLink.href, "No", "No"]]
                $lastIndex = _ArrayAdd($aGlobalLinks, $aFill)
            EndIf
        Next

        _IEQuit($oIE)

        If IsInt($r/10) = 1 Then
            SaveFile()
            ConsoleWrite("File Saved: " & $r & "/" & UBound($aGlobalLinks) & @CRLF)
        EndIf

        ConsoleWrite("Completed " & $r & "/" & UBound($aGlobalLinks) & @CRLF)
    EndIf

    $r += 1
Until $r >= UBound($aGlobalLinks) ;Stop after the last valid index (UBound - 1) to avoid a subscript error

 

The resulting console output is kinda cluttered, but hopefully it's easy enough to follow. Most of the lines are there to compare $sHRef against $oLink.href.

Opening: https://website.com/kb/207557-abc-bank-homepage
RegExt result: 0 (0 = No Match, 1 = Match)
$sHRef: https://website.com/kb/
Looking for: https://website.com/kb/
$sHRef: #
Looking for: https://website.com/kb/207557-abc-bank-homepage#
$sHRef: https://website.com/kb/
Looking for: https://website.com/kb/
$sHRef: https://website.com/kb/19011-partners-and-isos
Looking for: https://website.com/kb/19011-partners-and-isos
$sHRef: https://website.com/kb/46470-onsite-install-partners
Looking for: https://website.com/kb/46470-onsite-install-partners
$sHRef: https://www.abc.com/en/small-business/payments-and-processing/abc-merchant-services.html
Looking for: https://www.abc.com/en/small-business/payments-and-processing/abc-merchant-services.html
$sHRef: /flower
Looking for: https://website.com/flower
$sHRef: https://www.website.com/screens/signup/?integrations_id=5MUL96
Looking for: https://www.website.com/screens/signup/?integrations_id=5MUL96
$sHRef: https://website.com/instruction/import-an-inventory-menu-spreadsheet/?userDevice=web
Looking for: https://website.com/instruction/import-an-inventory-menu-spreadsheet/?userDevice=web
$sHRef: https://website.com/instruction/import-an-inventory-menu-spreadsheet/?userDevice=web
Looking for: https://website.com/instruction/import-an-inventory-menu-spreadsheet/?userDevice=web
$sHRef: https://website.com/instruction/import-an-inventory-menu-spreadsheet/?userDevice=web
Looking for: https://website.com/instruction/import-an-inventory-menu-spreadsheet/?userDevice=web
$sHRef: https://www.flower.com/appmarket/apps/Z6GMBJ5HCBEQA?clientCountry=US
Looking for: https://www.flower.com/appmarket/apps/Z6GMBJ5HCBEQA?clientCountry=US
$sHRef: #panel3a
Looking for: https://website.com/kb/207557-abc-bank-homepage#panel3a
$sHRef: #panel4a
Looking for: https://website.com/kb/207557-abc-bank-homepage#panel4a
$sHRef: #panel5a
Looking for: https://website.com/kb/207557-abc-bank-homepage#panel5a
$sHRef: //email@lastproblem.com
"F:\Transfer folder\KB Scrub\HTMLDOC_Test.au3" (64) : ==> The requested action with this object has failed.:
ConsoleWrite("Looking for: " & $oLink.href & @CRLF)
ConsoleWrite("Looking for: " & $oLink^ ERROR

 

Looks like the issue is with a listed email address, which I'm not interested in adding to the list of links to check anyway. Maybe I should implement a check for external links that go outside of our KB. The problem is that I would most likely need to read $oLink.href to see whether it points into the KB. An example of what I'm talking about is in the output above, but I'll paste it below as well:
 

$sHRef: /flower
Looking for: https://website.com/flower

Since $sHRef doesn't contain the full URL like $oLink.href does, I would need to work out what $sHRef is relative to, correct? I'm not very literate in HTML, so I might need to learn more on that front to make some headway.

Still curious if anyone has any input on the matter though! Thanks for the help!
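
Going by the output above, getAttribute("href") returns the attribute text as written in the page source ("/flower", "#panel3a"), while $oLink.href is the absolute URL the browser resolved it to. So a relative value like "/flower" is relative to the site root. For keeping the crawl inside the KB, a minimal sketch that works on the resolved URL only: it drops the "#fragment" part so in-page anchors collapse to one entry, then checks for the KB prefix. $sKbBase and both function names are assumptions for illustration:

```autoit
; Sketch only: $sKbBase, _StripFragment and _IsKbLink are hypothetical names.
Global Const $sKbBase = "https://website.com/kb/"

; Drop the "#fragment" part so in-page anchors map to a single URL.
Func _StripFragment($sUrl)
    Local $iPos = StringInStr($sUrl, "#")
    If $iPos = 0 Then Return $sUrl
    Return StringLeft($sUrl, $iPos - 1)
EndFunc

; True only for absolute URLs that start with the KB base address
; (AutoIt's = operator compares strings case-insensitively).
Func _IsKbLink($sUrl)
    Return StringLeft($sUrl, StringLen($sKbBase)) = $sKbBase
EndFunc

; Quick manual check with URLs taken from the console output above
ConsoleWrite(_StripFragment("https://website.com/kb/207557-abc-bank-homepage#panel3a") & @CRLF)
ConsoleWrite(_IsKbLink("https://website.com/kb/19011-partners-and-isos") & @CRLF)
ConsoleWrite(_IsKbLink("https://website.com/flower") & @CRLF)
```

Inside the For loop, the idea would be to call _StripFragment on the resolved URL and only _ArrayAdd it when _IsKbLink returns True.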

Link to comment
Share on other sites

Well use this then

$sHRef = $oLink.getAttribute("href")
if $sHRef = "" then
   ConsoleWrite("This bitch empty..." & @CRLF)
else
   $sHref = StringLeft ($sHref, 8) = "//email@" ? "" : $oLink.href ; in other words check for the href that failed
endif
ConsoleWrite("Looking for: " & $sHref & @CRLF)
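
A more general guard than matching one known bad prefix is to register a COM error handler, so that a failing .href is reported and skipped instead of halting the script. A minimal sketch, assuming a recent AutoIt version where @error is set after a failed COM call once a handler is registered. _ComErrHandler and _SafeHref are hypothetical names:

```autoit
; Sketch only: _ComErrHandler and _SafeHref are hypothetical names.
; Keep the handler object in a Global so it stays registered for the whole run.
Global $oComErr = ObjEvent("AutoIt.Error", "_ComErrHandler")

; Called by AutoIt when a COM call fails; logs it instead of stopping the script.
Func _ComErrHandler($oError)
    ConsoleWrite("COM error 0x" & Hex($oError.number) & " at line " & $oError.scriptline & @CRLF)
EndFunc

; Returns the resolved URL of a link object, or "" if it cannot be read.
Func _SafeHref($oLink)
    If Not IsObj($oLink) Then Return ""
    Local $sHref = $oLink.href
    If @error Then Return ""
    Return $sHref
EndFunc
```

In the loop, $sHref = _SafeHref($oLink) followed by If $sHref = "" Then ContinueLoop would skip any link whose href cannot be read, whatever its cause.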

 

Link to comment
Share on other sites

16 minutes ago, Nine said:

Well use this then

$sHRef = $oLink.getAttribute("href")
if $sHRef = "" then
   ConsoleWrite("This bitch empty..." & @CRLF)
else
   $sHref = StringLeft ($sHref, 8) = "//email@" ? "" : $oLink.href ; in other words check for the href that failed
endif
ConsoleWrite("Looking for: " & $sHref & @CRLF)

 

So this error actually highlighted an issue with the porting method used from the old KB to the new internal KB. The problem seems to be the way our developers handled emails/hyperlinks that were in a table or that got put into a table. Either way, here is an example of what the page source looks like with the email entry:
 

[Image: page source showing the email entry]

And here is the result on the page:
[Image: the rendered page]

 

 

So, what I've decided is to remove the hyperlinks for email addresses, and I was able to proceed with more pages. So now I'll be able to kill 2 birds with 1 program! 😋

Edited by Kidney
Link to comment
Share on other sites
