GEOSoft

Find Links In Html Files

6 posts in this topic

#1 ·  Posted (edited)

I know there are lots of apps out there that will check the links in a file. The problem is that none of them finds links in the <head> section (such as JavaScript links). This one does. It finds all local and external links, including bookmarks. This is just the beginning of the project. By tomorrow it will also find links to image files, and you will be able to use checkboxes to select the types of links to extract. In the end, the idea is to have it verify and repair local links, and write all selected link types to an INI file so that you can modify external links and import them back into your HTML files.

http://dundats.mvps.org/autoit/

Right now the output text file is created in the same folder as your initial file, but I might change that and let the user select the folder where the file is saved. The output file also shows the line number each link was found on, so that it will be easier to push a modified link back into the file.
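For readers who want to see the idea of pairing each link with its line number, here is a minimal sketch. It is not the script from the download link above: the function name _ListLinks and the regular expression are assumptions made for illustration, and the pattern only catches quoted href and src attributes.

```autoit
; Hypothetical helper (not the author's script): print "LineNum=Link"
; for every quoted href/src value found in an HTML file.
Func _ListLinks($sHtmlFile)
    Local $hFile = FileOpen($sHtmlFile, 0) ; 0 = read mode
    If $hFile = -1 Then Return SetError(1, 0, 0)

    Local $iLine = 0, $sLine, $aMatches
    While 1
        $sLine = FileReadLine($hFile)
        If @error Then ExitLoop ; end of file or read error
        $iLine += 1
        ; Flag 3 returns every match of the capture group as an array
        $aMatches = StringRegExp($sLine, '(?i)(?:href|src)\s*=\s*["'']([^"'']+)["'']', 3)
        If @error Then ContinueLoop ; no link on this line
        For $i = 0 To UBound($aMatches) - 1
            ConsoleWrite($iLine & "=" & $aMatches[$i] & @CRLF)
        Next
    WEnd
    FileClose($hFile)
    Return 1
EndFunc   ;==>_ListLinks
```

Because the file is scanned from top to bottom, links in the <head> section are picked up the same way as links in the body.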

Edited by GEOSoft

George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.

Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.***

The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.

Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else.

"Old age and treachery will always overcome youth and skill!"


Here is the update

  • The script now extracts image source links
  • The type of links to extract is now selectable
  • Results are written to an INI file instead of a txt file
The last item was changed primarily for the next stage of the project (verify/repair). I'll get busy on that in a day or so. I just have to figure out how to resolve relative links like ../../myfile.htm. Take a look and give me some feedback, please. Remember that this script is primarily for use on HTML files.
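One possible way to handle relative links such as ../../myfile.htm is to resolve them against the folder of the HTML file with _PathFull() from File.au3. This is only a sketch under assumptions: the helper name _ResolveLocalLink and the sample paths are made up, and a real link would first need any #bookmark or query string stripped off.

```autoit
#include <File.au3>

; Hypothetical helper: turn a relative link into a full local path.
Func _ResolveLocalLink($sHtmlFile, $sLink)
    ; Folder that contains the HTML file, without the trailing backslash
    Local $sBaseDir = StringLeft($sHtmlFile, StringInStr($sHtmlFile, "\", 0, -1) - 1)
    ; HTML links use forward slashes; Windows paths use backslashes
    Local $sRelative = StringReplace($sLink, "/", "\")
    Return _PathFull($sRelative, $sBaseDir)
EndFunc   ;==>_ResolveLocalLink

; Example paths are invented for illustration
Local $sTarget = _ResolveLocalLink("C:\site\docs\help\index.htm", "../../myfile.htm")
If Not FileExists($sTarget) Then ConsoleWrite("Broken link: " & $sTarget & @CRLF)
```

Once the link is a full path, FileExists() is enough to decide whether a local link is broken.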

George


Well, the verification now works. It will find and verify all links on a page: external, local, image and bookmark. The links are all written to an INI file by line number, as is a list of the broken links. But now the real fun starts: trying to figure out how to replace a string in a line by line number. We really need a FileWriteLineNum(file,LineNum) function. :think: The only way I see right now is to loop with FileReadLine($File1, LineNum) followed by FileWriteLine($File2, Line), replacing each broken link along the way, with a line counter starting at 0. I haven't looked at it that closely yet, but I have a hunch I might have to run that against the page four times (once for each section of the INI file).
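The read-and-copy approach described above could look roughly like this. It is a sketch, not the project's code: _CopyWithFix and its parameters are hypothetical names, and it fixes a single link on a single line per pass.

```autoit
; Hypothetical helper: copy $sSource to $sDest, replacing one link on one line.
Func _CopyWithFix($sSource, $sDest, $iTargetLine, $sOldLink, $sNewLink)
    Local $hIn = FileOpen($sSource, 0) ; 0 = read mode
    If $hIn = -1 Then Return SetError(1, 0, 0)
    Local $hOut = FileOpen($sDest, 2) ; 2 = write mode, erasing previous contents
    If $hOut = -1 Then
        FileClose($hIn)
        Return SetError(2, 0, 0)
    EndIf

    Local $iLine = 0, $sLine ; line counter starts at 0
    While 1
        $sLine = FileReadLine($hIn)
        If @error Then ExitLoop ; end of file or read error
        $iLine += 1
        ; Only the target line is touched; every other line is copied as-is
        If $iLine = $iTargetLine Then $sLine = StringReplace($sLine, $sOldLink, $sNewLink)
        FileWriteLine($hOut, $sLine)
    WEnd
    FileClose($hIn)
    FileClose($hOut)
    Return 1
EndFunc   ;==>_CopyWithFix
```

If the replacements from all INI sections were loaded into one array first, a single pass over the page might be enough instead of one pass per section.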

The INI structure is

[section]
Line#=Link
Line#=Link
etc=etc

and then on to the next section.
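A layout like this maps directly onto the built-in INI functions. The sketch below uses an assumed file name (links.ini), an assumed section name (External) and invented example URLs.

```autoit
; Assumed file and section names, for illustration only
Local $sIni = @ScriptDir & "\links.ini"

; Store each link under the line number it was found on
IniWrite($sIni, "External", "12", "http://www.example.com/")
IniWrite($sIni, "External", "47", "http://www.example.org/page.htm")

; Read the section back: $aLinks[$i][0] is the key, $aLinks[$i][1] the value
Local $aLinks = IniReadSection($sIni, "External")
If Not @error Then
    For $i = 1 To $aLinks[0][0]
        ConsoleWrite("Line " & $aLinks[$i][0] & " -> " & $aLinks[$i][1] & @CRLF)
    Next
EndIf
```

One thing to watch with this layout: IniWrite() replaces the value of an existing key, so two links of the same type on the same line would need distinct keys.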


George


We really need a FileWriteLineNum(file,LineNum) function.

;========================================
;Function name:    _FileWriteToLine
;Description:        Write text to specified line in a file
;Parameters:
;                    $sFile - The file to write to
;                    $iLine - The line number to write to
;                    $sText - The text to write
;                    $fOverWrite - if set to 1 will overwrite the old line
;                    if set to 0 will not overwrite
;Requirement(s):      None
;Return Value(s):    On success - 1
;                     On Failure - 0 And sets @ERROR
;                               @ERROR = 1 - File has less lines than $iLine
;                               @ERROR = 2 - File does not exist
;                               @ERROR = 3 - Error opening file
;                               @ERROR = 4 - $iLine is invalid
;                               @ERROR = 5 - $fOverWrite is invalid
;                               @ERROR = 6 - $sText is invalid
;Author(s):        cdkid
;Note(s):
;=========================================
Func _FileWriteToLine($sFile, $iLine, $sText, $fOverWrite = 0)
    If $iLine <= 0 Then
        SetError(4)
        Return 0
    EndIf
    If Not IsString($sText) Then
        SetError(6)
        Return 0
    EndIf
    If $fOverWrite <> 0 And $fOverWrite <> 1 Then
        SetError(5)
        Return 0
    EndIf
    If Not FileExists($sFile) Then
        SetError(2)
        Return 0
    EndIf
    Local $filtxt = FileRead($sFile, FileGetSize($sFile))
    $filtxt = StringSplit($filtxt, @CRLF, 1)
    If $filtxt[0] < $iLine Then ; $filtxt[0] holds the number of lines
        SetError(1)
        Return 0
    EndIf
    Local $fil = FileOpen($sFile, 2)
    If $fil = -1 Then
        SetError(3)
        Return 0
    EndIf
    For $i = 1 To UBound($filtxt) - 1
        If $i = $iLine Then
            If $fOverWrite = 1 Then
                If $sText <> '' Then
                    FileWrite($fil, $sText & @CRLF)
                Else
                    FileWrite($fil, $sText)
                EndIf
            EndIf
            If $fOverWrite = 0 Then
                ; Insert the new text, then keep the original line
                FileWrite($fil, $sText & @CRLF)
                FileWrite($fil, $filtxt[$i] & @CRLF)
            EndIf
        ElseIf $i < UBound($filtxt, 1) - 1 Then
            FileWrite($fil, $filtxt[$i] & @CRLF)
        ElseIf $i = UBound($filtxt, 1) - 1 Then
            FileWrite($fil, $filtxt[$i])
        EndIf
    Next
    FileClose($fil)
    Return 1
EndFunc  ;==>_FileWriteToLine

It's in Beta\include\File.au3

hope that helps

8)



#5 ·  Posted (edited)

It's in Beta\include\File.au3

hope that helps

8)

Thanks, Valuater.

I've been looking at that UDF for a couple of days now, and I might be able to work with it. There is a lot of checking in the code that I won't need, so I can pare it down quite a bit. What I really need is to be able to just push a replacement string to a known line number without having to re-read the whole file. The only check I need is already written: it uses FileGetTime() to make sure that the file has not been changed since the first read. If it has, a message box tells you to re-read the file before continuing, because if the file has changed then the line numbers may have changed too.

It would be nice if AI3 had a PushToLine function but that's probably asking a bit much of the developers.
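As a rough illustration of combining the FileGetTime() check with the UDF, here is one possible wrapper. The name _PushToLine and the $sStamp parameter are hypothetical, and the sketch still relies on _FileWriteToLine(), which, as the listing posted above shows, rewrites the whole file internally.

```autoit
#include <File.au3>

; Hypothetical wrapper: overwrite a line only if the file is unchanged.
; $sStamp is assumed to be the result of FileGetTime($sFile, 0, 1)
; taken when the file was first read.
Func _PushToLine($sFile, $iLine, $sNewLine, $sStamp)
    ; Option 0 = last modified time, format 1 = "YYYYMMDDHHMMSS" string
    If FileGetTime($sFile, 0, 1) <> $sStamp Then
        MsgBox(48, "File changed", "Please re-read the file before continuing.")
        Return SetError(1, 0, 0)
    EndIf
    ; Last parameter 1 = overwrite the existing line
    Return _FileWriteToLine($sFile, $iLine, $sNewLine, 1)
EndFunc   ;==>_PushToLine
```

Every successful write changes the modified time, so the caller would need to refresh the stored timestamp before pushing the next line.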

Edited by GEOSoft

George


It would be nice if AI3 had a PushToLine function but that's probably asking a bit much of the developers.

well... now you can create a new UDF for the next Beta release

8)

