
Check PDF for broken links



Hello,

I am wondering whether anyone could give me some guidance on how to check a PDF file for broken links (internal or external).

I think the process would go something like:

1) scan the document for a link

2) click the link

3) make sure it went where it was supposed to go

4) repeat until EOF

I understand how to do this with web pages, but I need help applying it to a PDF file.

Thanks!
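For external links, steps 2 to 4 can be automated instead of clicking each link by hand. Below is a minimal sketch, written in Python rather than AutoIt so it runs on its own, assuming the URLs have already been pulled out of the PDF. `check_url` and `check_links` are hypothetical helper names, and a HEAD request is only a heuristic: some servers answer HEAD differently from a normal page load.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Tuple


def check_url(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status for url, or 0 if no answer was received."""
    try:
        # HEAD avoids downloading the page body; urlopen follows redirects itself
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered with an error status such as 404
    except (OSError, ValueError):
        return 0  # DNS failure, refused connection, timeout or malformed URL


def check_links(urls: Iterable[str],
                fetch: Callable[[str], int] = check_url) -> List[Tuple[str, int]]:
    """Steps 2 to 4: 'click' every link and return the ones that did not work."""
    broken = []
    for url in urls:
        status = fetch(url)
        if not 200 <= status < 400:
            broken.append((url, status))
    return broken
```

`check_links(urls)` returns an empty list when every URL answers with a success status; the `fetch` parameter lets you swap in a different check (for example a full GET) without touching the loop.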


I have used pdftotext.exe, a PDF command-line tool, to extract the links and open them in the default browser.

In case it helps:

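; Example paths: adjust to where your PDF and pdftotext.exe live
Local $sPdfFile = @DesktopDir & '\any.pdf'
Local $sPdfToText = @DesktopDir & '\pdftotext.exe'
Local $sTxtFile = @TempDir & '\pdf_links.tmp' ; writable without admin rights, unlike C:\
; Convert the PDF to plain text: pdftotext <pdf file> <text file>
RunWait('"' & $sPdfToText & '" "' & $sPdfFile & '" "' & $sTxtFile & '"', '', @SW_HIDE)
; Collect every http:// or https:// URL, stopping at whitespace or a delimiter
Local $aLinks = StringRegExp(FileRead($sTxtFile), '(?i)https?://[^\s"<>()]+', 3)
Local $iErr = @error ; save it before FileDelete resets @error
FileDelete($sTxtFile)
If $iErr Then
    ConsoleWrite('No links found.' & @CRLF)
    Exit
EndIf
For $i = 0 To UBound($aLinks) - 1
    ConsoleWrite($i + 1 & " link : " & $aLinks[$i] & @CRLF)
    ShellExecute($aLinks[$i]) ; open in the default browser
Next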

AutoIt 3.3.14.2 X86 - SciTE 3.6.0 - WIN 8.1 X64 - Other Example Scripts


wakillon,

Thanks for the suggestion. However, the links inside the PDF aren't spelled out in the document text, so when I run pdftotext on it I get only the visible text, not the link targets.

This tool sounds like it will do what you need. It outputs a file in the format:

Link Text|Hyperlink

Link Text2|Hyperlink2

The trial version is limited, but at least you can see whether it does what you need before you buy.



Did you try my solution?

It extracts all URLs found in a PDF.





Yes, I did try the solution, and if the full link was in the PDF it was captured. However, links whose full path is not visible to the reader (for example, internal links that go to another part of the document) were not captured.

It turns out that all the links are listed if you open the PDF in a text editor. However, I have not been able to work out how they are mapped within the document.

Thanks again for your help.
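For context on that mapping: in a PDF, a clickable link is normally a link annotation (a dictionary with `/Subtype /Link`) listed in a page's `/Annots` array. Its target is either a `/Dest` entry (a place in the same document) or an action dictionary under `/A`, whose `/S` entry names the action type: `/URI` for a web address, `/GoTo` for another place in the same file, `/GoToR` for another PDF and `/Launch` for opening a file. The sketch below (Python, to keep it self-contained) is only a rough tally by regular expression, assuming those dictionaries are stored uncompressed; `classify_links` is a hypothetical name, and dictionaries inside compressed streams will not be seen.

```python
import re
from collections import Counter

# Action types from the PDF specification that matter for link checking;
# \b stops GoTo from matching the start of GoToR (and similar names).
ACTION_RE = re.compile(rb"/S\s*/(URI|GoToR|GoTo|Launch)\b")
# A direct destination on a link or bookmark; \b skips the catalog's /Dests entry.
DEST_RE = re.compile(rb"/Dest\b")


def classify_links(pdf_bytes: bytes) -> Counter:
    """Count link targets by kind in an uncompressed PDF (a rough tally, not a parser)."""
    kinds = Counter(m.group(1).decode("ascii") for m in ACTION_RE.finditer(pdf_bytes))
    dest_count = len(DEST_RE.findall(pdf_bytes))
    if dest_count:
        kinds["Dest"] = dest_count
    return kinds
```

Calling it on the bytes of a file (`open("any.pdf", "rb").read()`) shows at a glance which kinds of links a document contains, which tells you which of them need more than a URL check.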


wakillon, I'm far from an expert. :mellow:

ctgilbert, if the PDF is not encoded, you can use a RegExp to search for "/URI /URI (http://www.autoitscript.com)" to find external links. Other types of links (within the document, to other documents, opening a file, playing multimedia, etc.) are much more difficult to handle because of the many different action types.
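A minimal sketch of that RegExp idea (in Python, so it runs stand-alone), extended to the common case where the PDF is "encoded": it also inflates every zlib (FlateDecode) stream, which is how many PDF 1.5+ files store their dictionaries in compressed object streams. `extract_uris` and `inflated_streams` are hypothetical names; the sketch only handles `/URI` values written as literal strings in parentheses (not hex strings) and skips streams that use other filters or are encrypted.

```python
import re
import zlib
from typing import Iterator, List

# Raw bytes of every stream object (the data between 'stream' and 'endstream').
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# /URI followed by a literal string; (?:\\.|[^\\)])* also accepts escapes such as \)
URI_RE = re.compile(rb"/URI\s*\(((?:\\.|[^\\)])*)\)")


def inflated_streams(pdf_bytes: bytes) -> Iterator[bytes]:
    """Yield the decompressed data of every stream that zlib can inflate."""
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj keeps trailing bytes (e.g. the newline before
            # 'endstream') in unused_data instead of raising an error
            data = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not FlateDecode: images, other filters or encrypted data
        yield data


def extract_uris(pdf_bytes: bytes) -> List[str]:
    """Collect unique /URI targets from the raw file and from inflated streams."""
    found: List[str] = []
    for chunk in [pdf_bytes, *inflated_streams(pdf_bytes)]:
        for raw in URI_RE.findall(chunk):
            # undo simple backslash escapes of PDF literal strings, e.g. \) -> )
            text = re.sub(rb"\\(.)", rb"\1", raw, flags=re.DOTALL).decode("latin-1")
            if text not in found:
                found.append(text)
    return found
```

The same two patterns should carry over to AutoIt's StringRegExp with little change, but the decompression step has no direct built-in equivalent there.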



Stop being modest: if you are not a PDF expert, who is?



3 years later...

Hi All,

I have 10 PDFs cross-linking to approximately 1000-1500 PDF documents (all in one large folder), and I need to check that all the links (approximately 1000-1500 of them) are good. The links were created as "link to a file", NOT as links to a web address, and the link target does not show up after converting to text with pdftotext.

Any ideas on a tool or method to check that each "link to a file" points to the correct document, i.e. that the document is actually there?

Also, to add to the difficulty, some links are in the bookmarks rather than in the PDF content!

Cheers

Yab
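One possible starting point for "link to a file" targets, as a minimal Python sketch rather than a tested tool: such links are usually `/GoToR` (open another PDF) or `/Launch` (open a file) actions whose `/F` entry holds the file name, and bookmarks are outline items that can carry the same kind of `/A` action, so scanning every dictionary in the file covers both page links and bookmarks. The sketch assumes the dictionaries are uncompressed, that `/F` is a plain string rather than a file-specification dictionary, and that relative names are relative to the linking PDF's folder; `file_link_targets` and `missing_targets` are hypothetical names.

```python
import os
import re
from typing import List

# Innermost dictionaries: '<<' ... '>>' with no nested '<<' or '>>' inside.
INNER_DICT_RE = re.compile(rb"<<((?:(?!<<|>>).)*)>>", re.DOTALL)
# Actions that open another file: GoToR (another PDF) and Launch (any file).
FILE_ACTION_RE = re.compile(rb"/S\s*/(?:GoToR|Launch)\b")
# /F given as a literal string; a file-specification dictionary is NOT handled.
FILE_SPEC_RE = re.compile(rb"/F\s*\(((?:\\.|[^\\)])*)\)")


def file_link_targets(pdf_bytes: bytes) -> List[str]:
    """Return the /F file names of all GoToR/Launch actions (links and bookmarks)."""
    targets = []
    for match in INNER_DICT_RE.finditer(pdf_bytes):
        body = match.group(1)
        if not FILE_ACTION_RE.search(body):
            continue
        spec = FILE_SPEC_RE.search(body)
        if spec:
            # undo simple backslash escapes of PDF literal strings
            name = re.sub(rb"\\(.)", rb"\1", spec.group(1), flags=re.DOTALL)
            targets.append(name.decode("latin-1"))
    return targets


def missing_targets(pdf_path: str) -> List[str]:
    """List file-link targets that do not exist, resolved against the PDF's folder."""
    with open(pdf_path, "rb") as handle:
        data = handle.read()
    base_dir = os.path.dirname(os.path.abspath(pdf_path))
    missing = []
    for target in file_link_targets(data):
        # assumption: relative names are relative to the folder of the linking PDF
        if not os.path.isfile(os.path.normpath(os.path.join(base_dir, target))):
            missing.append(target)
    return missing
```

Running `missing_targets` on each of the 10 source PDFs and printing any non-empty result should list the broken file links; if a file yields no targets at all, its dictionaries are probably compressed and need inflating first.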
