Jump to content
Sign in to follow this  
JonF

Modify properties of an existing PDF file

Recommended Posts

There's several topics and UDFs I've found that create PDF files, but I don't see anything that changes the properties of an existing PDF. I've tried PDFTK and neither update_info or update_info_utf8 change the properties as displayed in Acrobat.

We want to file a whole bunch of PDFs of journal articles as references, with the Title property set to the title of the paper, the Author property set to the author(s) names, the Subject property set to the journal reference in an extremely fixed format, and the Keywords set to the complete citation ready for copying and pasting. For example:

Title: Tests for skewness, kurtosis, and normality for time series data

Author: Bai J, Ng S

Subject: J Bus Econ Stat. 2005 Jan;23(1):49-60

Keywords: Bai J, Ng S. Tests for skewness, kurtosis, and normality for time series data. J Bus Econ Stat. 2005 Jan;23(1):49-60.

Then we name the file with the authors, a hyphen, and the title:

Bai J, Ng S-Tests for skewness, kurtosis, and normality for time series data.pdf

Collecting this information is a pain, and often one wants to scroll through the article while entering this info into a dialog box. Obviously an AutoIT program can create the keywords field and the filename from the other properties, and there may be opportunities to help out putting the rest of the information together.

Any way to write these properties to a PDF, or do I have to open Acrobat and the Properties dialog and punch the values in? (I do have full Acrobat).

Share this post


Link to post
Share on other sites

Mmmh,

I shortly tested it with PdfTk. It can dump out the data, but it seems the update works only for userdefined, not general, meta data / InfoKeys.

However Adobe (full Version can do - have a look at the Acro-JS help file),

mbtPdfAsm can do - you may also download the GUI = BeCyPdfAsm for quick testing -

and also BeCyPdfMetaEdit can do it. With all you can work with batch jobs.

Urls you can find with search here or in the Intranet. I'm just a little bit in hurry.

HTH, Reinhard

Share this post


Link to post
Share on other sites

IF your pdf has not outlines (those have /Title also), here's a fast hack, no error checking:

#include <Array.au3>;just for display the arrays

_Test()

Func _Test()
Local $sFile = @ScriptDir & "\PDFReference13.pdf"

Local $aOldData = _PDF_GetProperties($sFile)
_ArrayDisplay($aOldData)

Local $aNewData[6][2] = [["Title", "New Title"],["Producer", "New Producer"],["Author", "New Author"],["Creator", "New Creator"],["Subject", "New Subject"],["Keywords", "New keywords"]]

Local $sNewFile = _PDF_SetProperties($sFile, $aOldData, $aNewData)
Local $aCheck = _PDF_GetProperties($sNewFile)
_ArrayDisplay($aCheck)
EndFunc ;==>_Test

Func _PDF_GetProperties($sFile)
Local $a_Prop[6][2] = [["Title", ""],["Producer", ""],["Author", ""],["Creator", ""],["Subject", ""],["Keywords", ""]]
Local $hFile = FileOpen($sFile)
Local $sTxt = FileRead($hFile)
FileClose($hFile)
Local $title = StringRegExp($sTxt, "(?i)(/Title) {0,1}\((.*?)\)", 1)
If @error = 1 Then
$a_Prop[0][1] = "no match"
Else
$a_Prop[0][1] = $title[1]
EndIf
Local $producer = StringRegExp($sTxt, "(?i)(/Producer) {0,1}\((.*?)\)", 1)
If @error = 1 Then
$a_Prop[1][1] = "no match"
Else
$a_Prop[1][1] = $producer[1]
EndIf
Local $author = StringRegExp($sTxt, "(?i)(/Author) {0,1}\((.*?)\)", 1)
If @error = 1 Then
$a_Prop[2][1] = "no match"
Else
$a_Prop[2][1] = $author[1]
EndIf
Local $creator = StringRegExp($sTxt, "(?i)(/Creator) {0,1}\((.*?)\)", 1)
If @error = 1 Then
$a_Prop[3][1] = "no match"
Else
$a_Prop[3][1] = $creator[1]
EndIf
Local $subject = StringRegExp($sTxt, "(?i)(/Subject) {0,1}\((.*?)\)", 1)
If @error = 1 Then
$a_Prop[4][1] = "no match"
Else
$a_Prop[4][1] = $subject[1]
EndIf
Local $keywords = StringRegExp($sTxt, "(?i)(/Keywords) {0,1}\((.*?)\)", 1)
If @error = 1 Then
$a_Prop[5][1] = "no match"
Else
$a_Prop[5][1] = $keywords[1]
EndIf
Return $a_Prop
EndFunc ;==>_PDF_GetProperties

Func _PDF_SetProperties($sFile, $aOld, $aNew)
Local $hFile = FileOpen($sFile)
Local $sTxt = FileRead($hFile)
FileClose($hFile)
For $i = 0 To UBound($aOld) - 1
If $aOld[$i][1] <> "no match" Or $aOld[$i][1] <> "" Then
$sTxt = StringRegExpReplace($sTxt, "(?i)(/" & $aOld[$i][0] & ") {0,1}\((.*?)\)", "/" & $aOld[$i][0] & " (" & $aNew[$i][1] &") ", 1)
EndIf
Next
Local $sFileName = StringRegExpReplace($sFile, ".*\\(.*).{4}", "$1")
Local $sNewFile = StringReplace($sFile, $sFileName, $sFileName & "_mod.pdf")
Local $hNew = FileOpen($sNewFile, 18)
FileWrite($hNew, $sTxt)
FileClose($hNew)
Return $sNewFile
EndFunc ;==>_PDF_SetProperties

It will create a new pdf with the name original_mod.pdf and with the new properties.

I suck at RegExp, but this script is tested on several pdf's, WITHOUT outlines (in this case the Title property is altered, the rest is ok).

If you use parentheses within a field, e.g. John (Cheese) Doe, the right replacement is John \(Cheese\) Doe.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
Sign in to follow this  

  • Recently Browsing   0 members

    No registered users viewing this page.

  • Similar Content

    • By diff
      Hello,
      still learning and trying to understand AutoIT but having problem in filling my PDF file.
       
      So my code looks like similar to this:
      Global $1 = "text text 44444444" Global $2 = "texting2 texting2" Global $3 = "newtext3 next3" ShellExecute ("C:\Users\XXX\Desktop\myPDF.pdf") WinWaitActive("MyPDF.pdf - Adobe Acrobat Reader DC") Send ("{TAB}") ClipPut($1) Send ("^v") Send ("{TAB 3}") ClipPut($2) Send("^v") Send ("{TAB}") ClipPut($3) Send("^v") So its fill my PDF form, the first field looks good, the code add the text text 4444, then second should be $2 with texting2 texting2 but for some reason the code uses for second and third field after TAB only variable $3.
      So, I receive in $2 and $3 for some reason same newtext3 next3 in both, why its skipping the variable $2? Maybe there also much better solution for instant text? Because Send writes with delay by letters which I don't like.
      Thanks!
    • By Fenzik
      Hello!
      i wrote this function as alternative to using the Com Object or Commandline version of this project, discussed also earlyer on this forum.
      Project site - http://ebstudio.info/home/xdoc2txt.html
      Advantage of this implementation is that you do not need to register Com dll, using regsvr32.
      But you still need the project Dll (xd2txlib.dll).
      Enjoy!
      ; #FUNCTION# ==================================================================================================================== ; Name ..........: _ExtractText ; Description ...: Extracts text from advanced documment formats (Doc, Docx, ODT, XLS, ...) ; Syntax ........: _ExtractText($sFilename[, $bProperties = False[, $hDll = 0]]) ; Parameters ....: $sFilename - a string value. ; $bProperties - [optional] a boolean value. Default is False. If True, documment properties will be returned instead of the text. ; $hDll - [optional] a handle value. Default is 0. Optional handle to previously opened xd2txlib.dll. By default the xd2txlib.dll (Expected in @scriptdir) will be opened and closed during the function call. ; Return value .: String, containing the text or documment properties or empty string and Error as follows: ;1 - The file does not exists. ;2 - Error during opening xd2txlib.dll. ;3 - No text returned. ; Author ........: Fenzik ; Modified ......: ; Remarks .......: Project site - http://ebstudio.info/home/xdoc2txt.html ; Related .......: ; Link ..........: ; Example .......: No ; =============================================================================================================================== Func _ExtractText($sFilename, $bProperties = False, $hDll = 0) If Not FileExists($sFilename) Then Return SetError(1, "", "") Local $bLoaded = False If $hDll = 0 Then $hDll = DllOpen(@scriptdir&"\xd2txlib.dll") If $hDll = -1 Then Return SetError(2, "", "") $bLoaded = True Endif $aResult = DllCall($hDll, "int:cdecl", "ExtractText", "WSTR", $sFilename, "BOOL", $bProperties, "WSTR*", "") If $aResult[0] = 0 Then Return SetError(3, "", "") If $bLoaded = True Then DllClose($hDll) Return $aResult[3] EndFunc  
       
      xd2txlib-example.zip
    • By MakzNovice
      Hello Experts,
      I have Zero experience with Autoit + Adobe Acrobat, and I really in need to get this working as PoC.
      I am trying to automate some manual actions below are the steps I would like to do.
      INPUT to script : 
      1. PDF file to open
      2. String that I would like to add as \\Server\Directory name
      Steps : 
      1. Open the file in Adobe Acrobat Pro
      2. Browse to View > Tools > Send For Review > Open (see image 1)
      3. On the launched tool bar click on "Send for Shared Connecting" (see image 2)
      4. Next select option "Automatically Collect comments on my..." in dropdown and click 'Next' (see image 3)
      5. Select radiobutton "Network folder" and paste the input "\\Server\Directory" in text field and click 'Next' (see image 4)
      Experts, I would really appreciate a quick script which I can run and get rolling.
      Please note, I would not likwe to rely on MouseClick and/or cordinates match approach.
      PLEASE SUPPORT!!!!
      Makz
      **********************************************************************************************************
      Image 1

       
      Image 2

       
      Image 3

       
      Image 4

    • By jitendriya
      Hi every one .
      I want to read a pdf file and write into a excel using autoit , so how can i do this with out using third party server please tell me .
      Thank you..
    • By mLipok
      ; #INDEX# =======================================================================================================================
      ; Title .........: UDF for "Debenu Quick PDF Library"
      ; AutoIt Version : 3.3.10.2++
      ; Language ......: English
      ; Description ...: A collection of functions for Debenu Quick PDF Library
      ; Author(s) .....: mLipok
      ; Modified ......:
      ; ===============================================================================================================================
      Release note:
       
       
      Erratum v0.7:
       
      Forum link:
       
       
×
×
  • Create New...