Hey guys,

So we recently moved our company Knowledge Base (KB) from a hosted service with a monthly subscription to an in-house solution. There is one issue, though: no one restricted access to the original KB site, so anyone could edit it as they pleased. As a result, some of the old KB's features (pictures, videos, internal links) are still being used, and I need to find out which pages have links that will inevitably go dead once we stop paying for the old service.

So, I feel like I'm 90% of the way to getting this working correctly. The steps are fairly simple:

  1. Log into the KB
  2. Load a list of URLs
  3. Pull the HTML and search for "helpjuice"
  4. Log that the URL has been checked
  5. Check for links on this page and compare them against the list of URLs
  6. Log any new URLs that are missing from the list

Now, I have no idea if this is the best way to do this, but since I'm 90% of the way there I would like to figure this specific problem out. If anyone has a better or more effective method, please point me in the right direction!

Here is the code I have so far.

#include <File.au3>
#include <FileConstants.au3>
#include <WinAPIFiles.au3>
#include <IE.au3>
#include <MsgBoxConstants.au3>
#include <Array.au3>


Global $aGlobalLinks[1][3] = [["Link", "Checked", "Hit"]] ;Create Array with Headers
;//Load the KB and log in by submitting the login form.
Local $oIE = _IECreate ("https://website.com/kb/")
Local $oForm = _IEFormGetObjByName($oIE, "login-form")
Local $oEmail = _IEFormElementGetObjByName($oForm, "email")
Local $oPassword = _IEFormElementGetObjByName($oForm, "password")
_IEFormElementSetValue($oEmail, "email@email.com")
_IEFormElementSetValue($oPassword, "password")
Sleep(500)
_IEFormSubmit($oForm)
;_IEQuit($oIE)

;//Load a second window to confirm it shows we are logged in. Grab a list of links as a jumping off point if the KB_URL.txt is empty
;//Mainly used for Debugging
Local $oIE = _IECreate ("https://website.com/kb/")
$oLinks = _IELinkGetCollection($oIE)
For $oLink In $oLinks
    _ArraySearch($aGlobalLinks, $oLink.href)
    If @error = 6 Then
        Local $aFill[1][3] = [[$oLink.href, "No", "No"]]
        _ArrayAdd($aGlobalLinks, $aFill)
    EndIf
Next
;//Close second window of IE
_IEQuit($oIE)
;//Attempt to load the KB_URL.txt
LoadFile()
;//Setting $r to 1 since the txt file and the $aGlobalLinks will have a header in index 0
Global $r = 1
Do
    If $aGlobalLinks[$r][1] = "No" Then ;Skip Entries that have already been checked
        Local $oIE = _IECreate($aGlobalLinks[$r][0], 0, 1, 0) ;Create new window with the first 'unchecked' link entry
        _IELoadWait($oIE, 100, 2500) ;Wait 2.5 seconds for page to load
        If @error = 6 Then ;@error = 6 means the Timeout was met..Send {ESC} to stop the loading
            ConsoleWrite("Webpage timed out...Sending ESC..." & @CRLF)
            Send("{ESC}")
            Sleep(500)
        EndIf
        Local $sHTML = _IEDocReadHTML($oIE) ;Read the HTML and look for any trace of 'helpjuice'
        Local $result = StringRegExp($sHTML, ".*?helpjuice.*?", 0)
        ConsoleWrite("RegExt result: " & $result & @CRLF)

        If $result = 1 Then ;A result of 1 means there is a match
            $aGlobalLinks[$r][1] = "Yes" ;Set 'Checked' to 'Yes'
            $aGlobalLinks[$r][2] = "Yes" ;Set 'Hit' to 'Yes'
            ConsoleWrite("~~~~~~~~~~~~~ H I T ~~~~~~~~~~~~~ " & $aGlobalLinks[$r][0] & @CRLF)
        Else
            $aGlobalLinks[$r][1] = "Yes" ;Set 'Checked' to 'Yes'
        EndIf

        $oLinks = _IELinkGetCollection($oIE) ;Grab a list of links from the existing site
        ;_ArrayDisplay($oLinks)

        For $oLink In $oLinks ;Loop through all the links found and add any new links to the end of $aGlobalLinks
            ConsoleWrite("String: " & String($oLink) & @CRLF)
            ConsoleWrite("Looking for: " & $oLink.href & @CRLF)
            _ArraySearch($aGlobalLinks, $oLink.href)
            If @error = 6 Then
                Local $aFill[1][3] = [[$oLink.href, "No", "No"]] ;Set 'Checked' and 'Hit' to 'No' because this link has not been checked yet
                _ArrayAdd($aGlobalLinks, $aFill)
            EndIf
        Next

        _IEQuit($oIE)

        ;This next part is to reduce the amount times we write to the file
        ;Changing the '10' to '100' means the script will save the changes every 100 entries
        If IsInt($r/10) = 1 Then
            SaveFile()
            ConsoleWrite("File Saved: " & $r & "/" & UBound($aGlobalLinks) & @CRLF)
        EndIf

        ConsoleWrite("Completed " & $r & "/" & UBound($aGlobalLinks) & @CRLF)
    EndIf

    $r += 1
Until $r >= UBound($aGlobalLinks) ;Valid row indexes end at UBound - 1, so stop before $r reaches UBound

;_ArrayDisplay($aGlobalLinks)


Func SaveFile()
    $oFile = FileOpen(@ScriptDir & "\KB_URLs.txt", 2)
    FileClose($oFile)
    _FileWriteFromArray(@ScriptDir & "\KB_URLs.txt", $aGlobalLinks, 0, UBound($aGlobalLinks), ",")
EndFunc


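Func LoadFile()
    ;Read into a temporary variable so a missing or unreadable file does not replace the seeded $aGlobalLinks
    Local $aLoaded
    _FileReadToArray(@ScriptDir & "\KB_URLs.txt", $aLoaded, $FRTA_NOCOUNT, ",")
    If Not @error Then $aGlobalLinks = $aLoaded
    _ArrayDisplay($aGlobalLinks)
EndFunc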

 

The error I'm getting has to do with the function _IELinkGetCollection. I used the examples in the AutoIt Help section, and there are multiple uses of $oLink.href. I haven't been able to find much on when or how to use .href.

Here is the console output of the error:
 

RegExt result: 0
Looking for: https://website.com/kb/
Looking for: https://website.com/kb/207557-abc-bank-homepage#
Looking for: https://website.com/kb/
Looking for: https://website.com/kb/19011-partners-and-isos
Looking for: https://website.com/kb/46470-onsite-install-partners
Looking for: https://website.com/en/small-business/payments-and-processing/abc-merchant-services.html
Looking for: https://website.com/Clover
Looking for: https://website.com/screens/signup/?integrations_id=12345
Looking for: https://website.com/instruction/import-an-inventory-menu-spreadsheet/?userDevice=web
Looking for: https://website.com/instruction/import-an-inventory-menu-spreadsheet/?userDevice=web
Looking for: https://website.com/instruction/import-an-inventory-menu-spreadsheet/?userDevice=web
Looking for: https://website.com/appmarket/apps/Z6GMBJ5HCBEQA?clientCountry=US
Looking for: https://website.com/kb/207557-abc-bank-homepage#panel3a
Looking for: https://website.com/kb/207557-abc-bank-homepage#panel4a
Looking for: https://website.com/kb/207557-abc-bank-homepage#panel5a
"C:\Users\Jon\Desktop\KB Scrub\HTMLDOC_Test.au3" (65) : ==> The requested action with this object has failed.:
ConsoleWrite("Looking for: " & $oLink.href & @CRLF)
ConsoleWrite("Looking for: " & $oLink^ ERROR

Any insight is appreciated!


Use IsObj to check that it is an object, for example:

$oLinks = _IELinkGetCollection($oIE) ;Grab a list of links from the existing site
        If IsObj($oLinks) Then
            For $oLink In $oLinks ;Loop through all the links found and add any new links to the end of $aGlobalLinks
                If IsObj($oLink) Then
                    ConsoleWrite("String: " & String($oLink) & @CRLF)
                    ConsoleWrite("Looking for: " & $oLink.href & @CRLF)
                    _ArraySearch($aGlobalLinks, $oLink.href)
                    If @error = 6 Then
                        Local $aFill[1][3] = [[$oLink.href, "No", "No"]] ;Set 'Checked' and 'Hit' to 'No' because this link has not been checked yet
                        _ArrayAdd($aGlobalLinks, $aFill)
                    EndIf
                EndIf
             Next
         EndIf

 

47 minutes ago, Subz said:

Use IsObj to check that it is an object, for example:

$oLinks = _IELinkGetCollection($oIE) ;Grab a list of links from the existing site
        If IsObj($oLinks) Then
            For $oLink In $oLinks ;Loop through all the links found and add any new links to the end of $aGlobalLinks
                If IsObj($oLink) Then
                    ConsoleWrite("String: " & String($oLink) & @CRLF)
                    ConsoleWrite("Looking for: " & $oLink.href & @CRLF)
                    _ArraySearch($aGlobalLinks, $oLink.href)
                    If @error = 6 Then
                        Local $aFill[1][3] = [[$oLink.href, "No", "No"]] ;Set 'Checked' and 'Hit' to 'No' because this link has not been checked yet
                        _ArrayAdd($aGlobalLinks, $aFill)
                    EndIf
                EndIf
             Next
         EndIf

 

Ya, it was just to double-check that the error wasn't related to the object.

EDIT: Sorry, I thought you were asking why IsObj was in there. I had that line in there before but removed it, since the error occurs specifically when using the .href property.
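
Since the failure comes from reading the .href property rather than from a missing object, one option is a script-wide COM error handler, so that a bad link gets logged and skipped instead of ending the script. A minimal sketch, assuming hypothetical names ($g_oComError, $g_iComErrors, _OnComError, _SafeHref) that are not part of IE.au3:

```autoit
; Register one script-wide COM error handler. Keep the returned object in a
; variable; the handler is only active while that object exists.
Global $g_oComError = ObjEvent("AutoIt.Error", "_OnComError")
Global $g_iComErrors = 0 ; how many COM errors the handler has seen so far

; Called by AutoIt whenever a COM method call or property access fails.
Func _OnComError($oError)
    $g_iComErrors += 1
    ConsoleWrite("COM error 0x" & Hex($oError.number) & " on script line " & _
            $oError.scriptline & ": " & $oError.windescription & @CRLF)
EndFunc   ;==>_OnComError

; Hypothetical helper: return the resolved href of a link object,
; or "" when the value is not an object or reading the property fails.
Func _SafeHref($oLink)
    If Not IsObj($oLink) Then Return ""
    Local $iBefore = $g_iComErrors
    Local $sHref = $oLink.href
    If $g_iComErrors <> $iBefore Then Return "" ; the handler fired, so the read failed
    Return $sHref
EndFunc   ;==>_SafeHref
```

In the link loop, _SafeHref($oLink) would stand in for the direct $oLink.href reads, and an empty result means the link can simply be skipped.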

Edited by Kidney
23 hours ago, Nine said:

You could try using :

$sHRef = $olink.getAttribute("href")
if $sHRef = "" then
    ; href not found
else
    ; href found
endif

 

Nine,

I implemented your suggestion into the Do Until loop with some console outputs. Here is the updated Do Until loop:
 

Local $r = 1
Do
    If $aGlobalLinks[$r][1] = "No" Then
        ConsoleWrite("Opening: " & $aGlobalLinks[$r][0] & @CRLF)
        Local $oIE = _IECreate($aGlobalLinks[$r][0], 0, 1, 0)
        _IELoadWait($oIE, 100, 2500)
        If @error = 6 Then
            ConsoleWrite("Webpage timed out...Sending ESC..." & @CRLF)
            Send("{ESC}")
            Sleep(500)
        EndIf
        Local $sHTML = _IEDocReadHTML($oIE)
        Local $result = StringRegExp($sHTML, ".*?helpjuice.*?", 0)
        ConsoleWrite("RegExt result: " & $result & " (0 = No Match, 1 = Match)" & @CRLF)

        If $result = 1 Then
            $aGlobalLinks[$r][1] = "Yes"
            $aGlobalLinks[$r][2] = "Yes"
            ConsoleWrite("~~~~~~~~~~~~~ H I T ~~~~~~~~~~~~~ " & $aGlobalLinks[$r][0] & @CRLF)
        Else
            $aGlobalLinks[$r][1] = "Yes"
        EndIf

        $oLinks = _IELinkGetCollection($oIE)
        ;_ArrayDisplay($oLinks) ;$oLinks is a COM collection, not an array, so _ArrayDisplay cannot show it

        For $oLink In $oLinks
            $sHRef = $oLink.getAttribute("href")
            if $sHRef = "" then
                ConsoleWrite("This bitch empty..." & @CRLF)
            else
                ConsoleWrite("$sHRef: " & $sHRef & @CRLF)
            endif
            ConsoleWrite("Looking for: " & $oLink.href & @CRLF)
            _ArraySearch($aGlobalLinks, $oLink.href)
            If @error = 6 Then
                Local $aFill[1][3] = [[$oLink.href, "No", "No"]]
                $lastIndex = _ArrayAdd($aGlobalLinks, $aFill)
            EndIf
        Next

        _IEQuit($oIE)

        If IsInt($r/10) = 1 Then
            SaveFile()
            ConsoleWrite("File Saved: " & $r & "/" & UBound($aGlobalLinks) & @CRLF)
        EndIf

        ConsoleWrite("Completed " & $r & "/" & UBound($aGlobalLinks) & @CRLF)
    EndIf

    $r += 1
Until $r >= UBound($aGlobalLinks) ;Valid row indexes end at UBound - 1, so stop before $r reaches UBound

 

The resulting console output is kind of cluttered, but hopefully it's easy enough to follow. Most of the lines are there to compare $sHRef against $oLink.href.

Opening: https://website.com/kb/207557-abc-bank-homepage
RegExt result: 0 (0 = No Match, 1 = Match)
$sHRef: https://website.com/kb/
Looking for: https://website.com/kb/
$sHRef: #
Looking for: https://website.com/kb/207557-abc-bank-homepage#
$sHRef: https://website.com/kb/
Looking for: https://website.com/kb/
$sHRef: https://website.com/kb/19011-partners-and-isos
Looking for: https://website.com/kb/19011-partners-and-isos
$sHRef: https://website.com/kb/46470-onsite-install-partners
Looking for: https://website.com/kb/46470-onsite-install-partners
$sHRef: https://www.abc.com/en/small-business/payments-and-processing/abc-merchant-services.html
Looking for: https://www.abc.com/en/small-business/payments-and-processing/abc-merchant-services.html
$sHRef: /flower
Looking for: https://website.com/flower
$sHRef: https://www.website.com/screens/signup/?integrations_id=5MUL96
Looking for: https://www.website.com/screens/signup/?integrations_id=5MUL96
$sHRef: https://website.com/instruction/import-an-inventory-menu-spreadsheet/?userDevice=web
Looking for: https://website.com/instruction/import-an-inventory-menu-spreadsheet/?userDevice=web
$sHRef: https://website.com/instruction/import-an-inventory-menu-spreadsheet/?userDevice=web
Looking for: https://website.com/instruction/import-an-inventory-menu-spreadsheet/?userDevice=web
$sHRef: https://website.com/instruction/import-an-inventory-menu-spreadsheet/?userDevice=web
Looking for: https://website.com/instruction/import-an-inventory-menu-spreadsheet/?userDevice=web
$sHRef: https://www.flower.com/appmarket/apps/Z6GMBJ5HCBEQA?clientCountry=US
Looking for: https://www.flower.com/appmarket/apps/Z6GMBJ5HCBEQA?clientCountry=US
$sHRef: #panel3a
Looking for: https://website.com/kb/207557-abc-bank-homepage#panel3a
$sHRef: #panel4a
Looking for: https://website.com/kb/207557-abc-bank-homepage#panel4a
$sHRef: #panel5a
Looking for: https://website.com/kb/207557-abc-bank-homepage#panel5a
$sHRef: //email@lastproblem.com
"F:\Transfer folder\KB Scrub\HTMLDOC_Test.au3" (64) : ==> The requested action with this object has failed.:
ConsoleWrite("Looking for: " & $oLink.href & @CRLF)
ConsoleWrite("Looking for: " & $oLink^ ERROR

 

Looks like the issue is with a listed email address, which I am not interested in adding to the list of links to check anyway. Maybe I should implement a check for external links that go outside of our KB. The problem is that I would most likely need to check $oLink.href to see whether it refers to the KB. An example of what I'm talking about is in the output above, but I'll paste it below as well:
 

$sHRef: /flower
Looking for: https://website.com/flower

Since $sHRef doesn't return the full URL like $oLink.href does, I would need to find out what $sHRef is relative to, correct? I'm not very literate in HTML, so I might need to learn more on that front to make some headway.

Still curious if anyone has any input on the matter, though! Thanks for the help!
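
Going by the output above, $oLink.href already returns the resolved absolute URL ("/flower" comes back as "https://website.com/flower"), so a KB-only filter can work on that value with plain string checks. A rough sketch, assuming the KB lives under the prefix stored in $KB_BASE; the constant and both helper names are hypothetical:

```autoit
; Assumption: every KB article URL starts with this prefix
Global Const $KB_BASE = "https://website.com/kb/"

; Hypothetical helper: drop the "#fragment" part so in-page anchors such as
; "#panel3a" are not queued as separate pages.
Func _StripFragment($sUrl)
    Local $iHash = StringInStr($sUrl, "#")
    If $iHash = 0 Then Return $sUrl
    Return StringLeft($sUrl, $iHash - 1)
EndFunc   ;==>_StripFragment

; Hypothetical helper: True when a fully resolved URL points inside the KB.
; The = operator compares strings case-insensitively in AutoIt.
Func _IsKbUrl($sUrl)
    Return StringLeft($sUrl, StringLen($KB_BASE)) = $KB_BASE
EndFunc   ;==>_IsKbUrl
```

With these, a link would only be added to $aGlobalLinks when _IsKbUrl() is True, and _StripFragment() would be applied before the _ArraySearch call so that the #panel3a, #panel4a and #panel5a variants collapse into one entry.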


Well, use this then:

$sHRef = $oLink.getAttribute("href")
if $sHRef = "" then
   ConsoleWrite("This bitch empty..." & @CRLF)
else
   $sHref = StringLeft ($sHref, 8) = "//email@" ? "" : $oLink.href ; in other words check for the href that failed
endif
ConsoleWrite("Looking for: " & $sHref & @CRLF)
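
The check above is tied to the one "//email@" prefix that failed. A slightly more general variant could look at the raw attribute for anything that resembles an email address before touching $oLink.href. This is only a heuristic sketch; _IsEmailHref is a hypothetical helper, and it would also skip legitimate URLs that happen to contain "@" in their path:

```autoit
; Hypothetical helper: True when a raw href attribute looks like an email
; address rather than a page to crawl.
Func _IsEmailHref($sRawHref)
    Local $sTrimmed = StringStripWS($sRawHref, 3) ; 3 = strip leading and trailing whitespace
    If StringLeft($sTrimmed, 7) = "mailto:" Then Return True
    ; Drop any query string or fragment first, so an "@" inside them is ignored
    Local $sPath = StringRegExpReplace($sTrimmed, "[?#].*$", "")
    Return StringInStr($sPath, "@") > 0
EndFunc   ;==>_IsEmailHref
```

In the loop, the value from $oLink.getAttribute("href") would be passed to _IsEmailHref() first, and $oLink.href would only be read when it returns False.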

 

16 minutes ago, Nine said:

Well use this then

$sHRef = $oLink.getAttribute("href")
if $sHRef = "" then
   ConsoleWrite("This bitch empty..." & @CRLF)
else
   $sHref = StringLeft ($sHref, 8) = "//email@" ? "" : $oLink.href ; in other words check for the href that failed
endif
ConsoleWrite("Looking for: " & $sHref & @CRLF)

 

So this error actually highlighted an issue with the method used to port the old KB to the new internal KB. The problem seems to be the way our developers handled emails and hyperlinks that were in a table or that got put into a table. Either way, here is an example of what the page source looks like with the email entry:

[Screenshot: page source with the email entry]

And here is the result on the page:

[Screenshot: the rendered page]

 

 

So, I've decided to remove the hyperlinks for email addresses, and I was able to proceed with more pages. So now I'll be able to kill two birds with one program! 😋

Edited by Kidney
