
Xdoc2txt: extracting text from advanced document formats using the DLL



Hello!

I wrote this function as an alternative to the COM object or command-line version of this project, which was also discussed earlier on this forum.

Project site - http://ebstudio.info/home/xdoc2txt.html

The advantage of this implementation is that you do not need to register the COM DLL with regsvr32.

You still need the project DLL (xd2txlib.dll), though.

Enjoy!

; #FUNCTION# ====================================================================================================================
; Name ..........: _ExtractText
; Description ...: Extracts text from advanced document formats (DOC, DOCX, ODT, XLS, ...)
; Syntax ........: _ExtractText($sFilename[, $bProperties = False[, $hDll = 0]])
; Parameters ....: $sFilename           - a string value. Path of the document to read.
;                  $bProperties         - [optional] a boolean value. Default is False. If True, document properties will be returned instead of the text.
;                  $hDll                - [optional] a handle value. Default is 0. Optional handle to a previously opened xd2txlib.dll. By default xd2txlib.dll (expected in @ScriptDir) is opened and closed during the function call.
; Return value .: String containing the text or document properties, or an empty string with @error set as follows:
;1 - The file does not exist.
;2 - Error opening xd2txlib.dll.
;3 - No text returned.
; Author ........: Fenzik
; Modified ......:
; Remarks .......: Project site - http://ebstudio.info/home/xdoc2txt.html
; Related .......:
; Link ..........:
; Example .......: No
; ===============================================================================================================================
Func _ExtractText($sFilename, $bProperties = False, $hDll = 0)
    If Not FileExists($sFilename) Then Return SetError(1, 0, "")
    Local $bLoaded = False
    If $hDll = 0 Then
        $hDll = DllOpen(@ScriptDir & "\xd2txlib.dll")
        If $hDll = -1 Then Return SetError(2, 0, "")
        $bLoaded = True
    EndIf
    Local $aResult = DllCall($hDll, "int:cdecl", "ExtractText", "WSTR", $sFilename, "BOOL", $bProperties, "WSTR*", "")
    Local $iCallError = @error ; DllCall sets @error and returns no array when the call itself fails
    ; Close the DLL before any return so the handle is not leaked on failure.
    If $bLoaded Then DllClose($hDll)
    If $iCallError Or $aResult[0] = 0 Then Return SetError(3, 0, "")
    Return $aResult[3]
EndFunc   ;==>_ExtractText
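
A minimal usage sketch for the function above, showing how the three documented @error values might be handled. It assumes the function has been saved as "ExtractText.au3" next to the script and that a file called "sample.docx" exists there; both names are hypothetical and only serve the illustration.

```autoit
; Assumption: _ExtractText() from this post is saved as "ExtractText.au3" (hypothetical file name)
; in the script directory, together with xd2txlib.dll.
#include "ExtractText.au3"

Local $sFile = @ScriptDir & "\sample.docx" ; hypothetical test document
Local $sText = _ExtractText($sFile)

; @error is read once by Switch, directly after the call.
Switch @error
    Case 0
        ConsoleWrite($sText & @CRLF)
    Case 1
        ConsoleWrite("File not found: " & $sFile & @CRLF)
    Case 2
        ConsoleWrite("Could not open xd2txlib.dll in " & @ScriptDir & @CRLF)
    Case 3
        ConsoleWrite("No text was returned for " & $sFile & @CRLF)
    Case Else
        ConsoleWrite("Unexpected error while reading " & $sFile & @CRLF)
EndSwitch
```

Passing True as the second argument should return the document properties instead of the body text, as described in the function header.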

 

 

xd2txlib-example.zip

Edited by Fenzik
Attached the example instead of original project.

Hi Fenzik,

I made this small piece of code for testing and I got the following error: !>10:25:20 AutoIt3.exe ended.rc:-1073740940.

Could you please point out what I'm doing wrong? Thanks in advance.

 

Global $bProperties, $hDll
Global $sMessage = "Hold down Ctrl or Shift to choose multiple files."
Global $sFileOpenDialog = FileOpenDialog($sMessage, @ScriptDir & "\", "Text files (*.docx;*.doc;*.rtf;*.wri;*.txt)|Excel (*.xls;*.xlsx)", BitOR($FD_FILEMUSTEXIST, $FD_MULTISELECT))
If @error Then
    MsgBox($MB_SYSTEMMODAL, "", "No file(s) were selected.")
    Exit
Else
    FileChangeDir(@ScriptDir)
    $sFileOpenDialog = StringReplace($sFileOpenDialog, "|", @CRLF)
    MsgBox($MB_SYSTEMMODAL, "", "You chose the following files:" & @CRLF & $sFileOpenDialog)
EndIf

Global $FoundText = _ExtractText($sFileOpenDialog, $bProperties, $hDll) ; $bProperties = False --------------------------------
If Not @error Then ConsoleWrite($FoundText)
Exit

; #FUNCTION# ====================================================================================================================
; Name ..........: _ExtractText
; Description ...: Extracts text from advanced document formats (DOC, DOCX, ODT, XLS, ...)
; Syntax ........: _ExtractText($sFilename[, $bProperties = False[, $hDll = 0]])
; Parameters ....: $sFilename           - a string value.
;                  $bProperties         - [optional] a boolean value. Default is False. If True, document properties will be returned instead of the text.
;                  $hDll                - [optional] a handle value. Default is 0. Optional handle to a previously opened xd2txlib.dll. By default xd2txlib.dll (expected in @ScriptDir) is opened and closed during the function call.
; Return value .: String containing the text or document properties, or an empty string with @error set as follows:
;1 - The file does not exist.
;2 - Error opening xd2txlib.dll.
;3 - No text returned.
; Author ........: Fenzik
; Modified ......:
; Remarks .......: Project site - http://ebstudio.info/home/xdoc2txt.html
; Related .......:
; Link ..........:
; Example .......: No
; ===============================================================================================================================
Func _ExtractText($sFilename, $bProperties = False, $hDll = 0)
    If Not FileExists($sFilename) Then Return SetError(1, "", "")
    Local $bLoaded = False
    If $hDll = 0 Then
        $hDll = DllOpen(@ScriptDir & "\xd2txlib.dll")
        If $hDll = -1 Then Return SetError(2, "", "")
        $bLoaded = True
    EndIf
    $aResult = DllCall($hDll, "int:cdecl", "ExtractText", "WSTR", $sFilename, "BOOL", $bProperties, "WSTR*", "")
    If $aResult[0] = 0 Then Return SetError(3, "", "")
    If $bLoaded = True Then DllClose($hDll)
    Return $aResult[3]
EndFunc   ;==>_ExtractText

 


OK, I made a few corrections and added comments to your example.

Here it is:

#include <FileConstants.au3>
#include <MsgBoxConstants.au3>

Global $properties = False
Global $foundtext = ""
Global $sMessage = "Hold down Ctrl or Shift to choose multiple files."
Global $sFileOpenDialog = FileOpenDialog($sMessage, @ScriptDir & "\", "Text files (*.docx;*.doc;*.rtf;*.wri;*.txt)|Excel (*.xls;*.xlsx)", BitOR($FD_FILEMUSTEXIST, $FD_MULTISELECT))
If @error Then
  MsgBox($MB_SYSTEMMODAL, "", "No file(s) were selected.")
  Exit
EndIf
;FileChangeDir(@ScriptDir)
;It is not necessary to change the working directory here.
;First we must know whether the user selected one file or several.
If Not StringInStr($sFileOpenDialog, "|") Then
  ;Only one file selected
  MsgBox($MB_SYSTEMMODAL, "", "You chose the following file:" & @CRLF & $sFileOpenDialog)
  ;so let's convert it here.
  $foundtext = _ExtractText($sFileOpenDialog)
  ;In this case the DLL is opened and closed during the function call.
  ;$bProperties is False by default, so there is no need to pass it if you want it to be False.
  ;Show the result
  If Not @error Then ConsoleWrite($foundtext)
Else
  $files = StringSplit($sFileOpenDialog, "|") ;Multiple files
  ;The directory is in $files[1], so we have to join the directory and each file name.
  $sFileOpenDialog = ""
  For $i = 1 To $files[0] - 1
    $sFileOpenDialog &= $files[1] & "\" & $files[$i + 1] & @CRLF
  Next
  ;Now we have the full paths of the selected files separated by @CRLF and can show them in a MsgBox.
  MsgBox($MB_SYSTEMMODAL, "", "You chose the following files:" & @CRLF & $sFileOpenDialog)
  ;And here was the problem: you passed the whole set of files at once, separated by @CRLF, with the directory only on the first line.
  ;So let them be converted and shown one by one, passing the handle of a DLL that was opened once beforehand.
  $hXd2tx = DllOpen(@ScriptDir & "\xd2txlib.dll")
  If $hXd2tx = -1 Then Exit MsgBox($MB_SYSTEMMODAL, "", "Could not open xd2txlib.dll.")
  ;Flag 1 splits on the whole @CRLF string instead of on @CR and @LF separately.
  $files = StringSplit($sFileOpenDialog, @CRLF, 1)
  For $i = 1 To $files[0]
    ;The trailing @CRLF leaves one empty last element; _ExtractText rejects it with @error = 1.
    $foundtext = _ExtractText($files[$i], $properties, $hXd2tx)
    If Not @error Then ConsoleWrite($foundtext)
  Next
  ;Close the DLL
  DllClose($hXd2tx)
EndIf

; #FUNCTION# ====================================================================================================================
; Name ..........: _ExtractText
; Description ...: Extracts text from advanced document formats (DOC, DOCX, ODT, XLS, ...)
; Syntax ........: _ExtractText($sFilename[, $bProperties = False[, $hDll = 0]])
; Parameters ....: $sFilename           - a string value.
;                  $bProperties         - [optional] a boolean value. Default is False. If True, document properties will be returned instead of the text.
;                  $hDll                - [optional] a handle value. Default is 0. Optional handle to a previously opened xd2txlib.dll. By default xd2txlib.dll (expected in @ScriptDir) is opened and closed during the function call.
; Return value .: String containing the text or document properties, or an empty string with @error set as follows:
;1 - The file does not exist.
;2 - Error opening xd2txlib.dll.
;3 - No text returned.
; Author ........: Fenzik
; Modified ......:
; Remarks .......: Project site - http://ebstudio.info/home/xdoc2txt.html
; Related .......:
; Link ..........:
; Example .......: No
; ===============================================================================================================================
Func _ExtractText($sFilename, $bProperties = False, $hDll = 0)
  If Not FileExists($sFilename) Then Return SetError(1, 0, "")
  Local $bLoaded = False
  If $hDll = 0 Then
    $hDll = DllOpen(@ScriptDir & "\xd2txlib.dll")
    If $hDll = -1 Then Return SetError(2, 0, "")
    $bLoaded = True
  EndIf
  Local $aResult = DllCall($hDll, "int:cdecl", "ExtractText", "WSTR", $sFilename, "BOOL", $bProperties, "WSTR*", "")
  Local $iCallError = @error ; DllCall sets @error and returns no array when the call itself fails
  ; Close the DLL before any return so the handle is not leaked on failure.
  If $bLoaded Then DllClose($hDll)
  If $iCallError Or $aResult[0] = 0 Then Return SetError(3, 0, "")
  Return $aResult[3]
EndFunc   ;==>_ExtractText
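
The path handling in the example above (one selected file returns a full path, several selected files return "directory|file1|file2|...") could also be isolated in a small helper. The following is only a sketch of that idea; `_DialogResultToPaths` is a hypothetical name, not part of any UDF, and the demo string at the bottom is made up.

```autoit
; Turns a FileOpenDialog() result into a 0-based array of full paths.
; _DialogResultToPaths is a hypothetical helper name used for illustration only.
Func _DialogResultToPaths($sDialogResult)
    ; A single selection is already a full path.
    If Not StringInStr($sDialogResult, "|") Then
        Local $aSingle[1] = [$sDialogResult]
        Return $aSingle
    EndIf
    ; Multiple selection: element 1 is the directory, the remaining elements are file names.
    Local $aParts = StringSplit($sDialogResult, "|")
    Local $sDir = $aParts[1]
    If StringRight($sDir, 1) <> "\" Then $sDir &= "\"
    Local $aPaths[$aParts[0] - 1]
    For $i = 2 To $aParts[0]
        $aPaths[$i - 2] = $sDir & $aParts[$i]
    Next
    Return $aPaths
EndFunc   ;==>_DialogResultToPaths

; Demo with a made-up dialog result: writes one full path per line to the console.
Local $aDemo = _DialogResultToPaths("C:\Docs|a.docx|b.xls")
For $i = 0 To UBound($aDemo) - 1
    ConsoleWrite($aDemo[$i] & @CRLF)
Next
```

Each element of the returned array can then be passed to _ExtractText together with a DLL handle that was opened once, which avoids splitting a @CRLF-joined string a second time.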

 

Edited by Fenzik

I have the latest versions of:

-SciTE Version 3.6.0
-AutoIt 3.3.14.5
-xdoc2txt 32-bit (x86) version in the script directory (Windows OS is 32-bit / 64-bit).

I also tested it with both the 32-bit (x86) and 64-bit (x64) versions of xdoc2txt, and I installed both the "Microsoft Visual C++ 2010 Redistributable Package (x86)" and the "Microsoft Visual C++ 2010 Redistributable Package (x64)" to rule out any dependency issue.

But it still shows the same error: !>17:08:44 AutoIt3.exe ended.rc:-1073740940
+>17:08:44 AutoIt3Wrapper Finished.

 


My friends, you can use this code (it uses COM without registering the DLL):

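MsgBox(0, 'Result', _ExtractTextCOM("sample.docx", False))

Func _ExtractTextCOM($sFilename, $bProperties = False)
    ; Pick the COM DLL that matches the bitness of the running interpreter.
    Local $hXd2tx = DllOpen(@ScriptDir & "\xd2txcom" & (@AutoItX64 ? '_64' : '') & ".dll")
    ; DllOpen() reports failure by returning -1, so test the handle rather than @error.
    If $hXd2tx = -1 Then Return SetError(1, 0, '')
    Local $oXd2tx = ObjCreate('{4ECE8E8A-BCC2-4709-BCAE-264210DF321B}', '{EB26F494-4E90-4432-9BA6-C6D9CDEE25C4}', $hXd2tx)
    ; Stop here if the object could not be created from the DLL.
    If Not IsObj($oXd2tx) Then Return SetError(2, 0, '')
    Return $oXd2tx.ExtractText($sFilename, $bProperties)
EndFunc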

 

Edited by moimon
Wrong Code

@jcpetu:

Unfortunately, I have no idea.

It works perfectly on my side.

Which bitness of AutoIt are you running?

The DLL is x86.

So the script should be run with the x86 version of AutoIt or compiled as x86.

What about your SciTE settings? Is it set to prefer the x64 AutoIt?

Good luck! Unfortunately I have no other idea how to make it work in your environment.
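
One way to rule out a bitness mismatch is to check it at the top of the script. This is only a sketch: it assumes the xd2txlib.dll in the script directory is the 32-bit build, as in this thread, and that the script is started from SciTE with AutoIt3Wrapper, which is what reads the directive on the first line.

```autoit
#AutoIt3Wrapper_UseX64=n
; The directive above asks AutoIt3Wrapper (SciTE) to run and compile the script as 32-bit.
; Assumption: xd2txlib.dll in @ScriptDir is the x86 build.

; @AutoItX64 is 1 when the script runs under the 64-bit interpreter.
If @AutoItX64 Then
    ConsoleWrite("Running as 64-bit AutoIt; a 32-bit xd2txlib.dll cannot be loaded into this process." & @CRLF)
    Exit 1
EndIf
ConsoleWrite("Running as 32-bit AutoIt." & @CRLF)
```

If the script is started outside SciTE, the directive has no effect, so the @AutoItX64 check is the part that actually reports which interpreter is in use.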


@jcpetu:
I think I have solved your problem. :)
You probably don't have the Microsoft Visual C++ 2010 Redistributable Package for x86 installed, which the whole Xdoc2Txt project requires.
So try installing it, for example from here.
Then the x86 version of the DLL and the script should work, I hope.
