
Xdoc2txt, extracting text from advanced document formats, using DLL


Fenzik


Hello!

I wrote this function as an alternative to using the COM object or the command-line version of this project, which was also discussed earlier on this forum.

Project site - http://ebstudio.info/home/xdoc2txt.html

The advantage of this implementation is that you do not need to register the COM DLL with regsvr32.

But you still need the project DLL (xd2txlib.dll).

Enjoy!

; #FUNCTION# ====================================================================================================================
; Name ..........: _ExtractText
; Description ...: Extracts text from advanced document formats (DOC, DOCX, ODT, XLS, ...)
; Syntax ........: _ExtractText($sFilename[, $bProperties = False[, $hDll = 0]])
; Parameters ....: $sFilename           - a string value. Path of the document to read.
;                  $bProperties         - [optional] a boolean value. Default is False. If True, the document properties are returned instead of the text.
;                  $hDll                - [optional] a handle value. Default is 0. Handle to a previously opened xd2txlib.dll. By default, xd2txlib.dll (expected in @ScriptDir) is opened and closed during the function call.
; Return value .: String containing the text or the document properties, or an empty string with @error set as follows:
;1 - The file does not exist.
;2 - Error opening xd2txlib.dll.
;3 - No text returned (or the DllCall itself failed).
; Author ........: Fenzik
; Modified ......:
; Remarks .......: Project site - http://ebstudio.info/home/xdoc2txt.html
; Related .......:
; Link ..........:
; Example .......: No
; ===============================================================================================================================
Func _ExtractText($sFilename, $bProperties = False, $hDll = 0)
  If Not FileExists($sFilename) Then Return SetError(1, 0, "")
  Local $bLoaded = False
  If $hDll = 0 Then
    $hDll = DllOpen(@ScriptDir & "\xd2txlib.dll")
    If $hDll = -1 Then Return SetError(2, 0, "")
    $bLoaded = True
  EndIf
  Local $aResult = DllCall($hDll, "int:cdecl", "ExtractText", "WSTR", $sFilename, "BOOL", $bProperties, "WSTR*", "")
  Local $iError = @error ; save @error before the next function call resets it
  ; Close the DLL on every path if it was opened inside this function
  If $bLoaded Then DllClose($hDll)
  ; If DllCall failed, $aResult is not an array, so check @error first
  If $iError Then Return SetError(3, 0, "")
  If $aResult[0] = 0 Then Return SetError(3, 0, "")
  Return $aResult[3]
EndFunc   ;==>_ExtractText
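
A short usage sketch for the function above, showing how the three documented error codes can be told apart. It assumes _ExtractText() is defined in the same script and that xd2txlib.dll is in @ScriptDir; the file name sample.docx is only a placeholder for illustration.

```autoit
; Sketch only: paste below the _ExtractText() definition.
; "sample.docx" is a hypothetical file name.
Local $sText = _ExtractText(@ScriptDir & "\sample.docx")
Local $iError = @error ; copy @error before any other call resets it

Switch $iError
    Case 0
        ConsoleWrite($sText & @CRLF)
    Case 1
        ConsoleWrite("! The file does not exist." & @CRLF)
    Case 2
        ConsoleWrite("! Could not open xd2txlib.dll." & @CRLF)
    Case 3
        ConsoleWrite("! No text returned." & @CRLF)
EndSwitch
```

Copying @error into a variable first matters because almost every AutoIt function call resets it.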

 

 

xd2txlib-example.zip

Edited by Fenzik
Attached the example instead of original project.

Hi Fenzik,

I made this small piece of code for testing and I got the following error: !>10:25:20 AutoIt3.exe ended.rc:-1073740940.

Could you please point out what I'm doing wrong? Thanks in advance.

 

Global $bProperties, $hDll
Global $sMessage = "Hold down Ctrl or Shift to choose multiple files."
Global $sFileOpenDialog = FileOpenDialog($sMessage, @ScriptDir & "\", "Text files (*.docx;*.doc;*.rtf;*.wri;*.txt)|Excel (*.xls;*.xlsx)", BitOR($FD_FILEMUSTEXIST, $FD_MULTISELECT))
If @error Then
    MsgBox($MB_SYSTEMMODAL, "", "No file(s) were selected.")
    Exit
Else
    FileChangeDir(@ScriptDir)
    $sFileOpenDialog = StringReplace($sFileOpenDialog, "|", @CRLF)
    MsgBox($MB_SYSTEMMODAL, "", "You chose the following files:" & @CRLF & $sFileOpenDialog)
EndIf

Global $FoundText = _ExtractText($sFileOpenDialog, $bProperties, $hDll) ; $bProperties = False --------------------------------
If Not @error Then ConsoleWrite($FoundText)
Exit

; #FUNCTION# ====================================================================================================================
; Name ..........: _ExtractText
; Description ...: Extracts text from advanced document formats (Doc, Docx, ODT, XLS, ...)
; Syntax ........: _ExtractText($sFilename[, $bProperties = False[, $hDll = 0]])
; Parameters ....: $sFilename           - a string value.
;                  $bProperties         - [optional] a boolean value. Default is False. If True, the document properties are returned instead of the text.
;                  $hDll                - [optional] a handle value. Default is 0. Handle to a previously opened xd2txlib.dll. By default, xd2txlib.dll (expected in @ScriptDir) is opened and closed during the function call.
; Return value .: String containing the text or the document properties, or an empty string with @error set as follows:
;1 - The file does not exist.
;2 - Error opening xd2txlib.dll.
;3 - No text returned.
; Author ........: Fenzik
; Modified ......:
; Remarks .......: Project site - http://ebstudio.info/home/xdoc2txt.html
; Related .......:
; Link ..........:
; Example .......: No
; ===============================================================================================================================
Func _ExtractText($sFilename, $bProperties = False, $hDll = 0)
    If Not FileExists($sFilename) Then Return SetError(1, "", "")
    Local $bLoaded = False
    If $hDll = 0 Then
        $hDll = DllOpen(@ScriptDir & "\xd2txlib.dll")
        If $hDll = -1 Then Return SetError(2, "", "")
        $bLoaded = True
    EndIf
    $aResult = DllCall($hDll, "int:cdecl", "ExtractText", "WSTR", $sFilename, "BOOL", $bProperties, "WSTR*", "")
    If $aResult[0] = 0 Then Return SetError(3, "", "")
    If $bLoaded = True Then DllClose($hDll)
    Return $aResult[3]
EndFunc   ;==>_ExtractText

 


OK, I made a few corrections and added comments to your example.

So here it is:

#include <FileConstants.au3>
#include <MsgBoxConstants.au3>

Global $properties = False
Global $foundtext = ""
Global $files, $hXd2tx
Global $sMessage = "Hold down Ctrl or Shift to choose multiple files."
Global $sFileOpenDialog = FileOpenDialog($sMessage, @ScriptDir & "\", "Text files (*.docx;*.doc;*.rtf;*.wri;*.txt)|Excel (*.xls;*.xlsx)", BitOR($FD_FILEMUSTEXIST, $FD_MULTISELECT))
If @error Then
  MsgBox($MB_SYSTEMMODAL, "", "No file(s) were selected.")
  Exit
EndIf
; FileChangeDir(@ScriptDir)
; It is not necessary to change the working directory here.
; Here we must know whether the user selected one file or several.
If Not StringInStr($sFileOpenDialog, "|") Then
  ; Only one file was selected
  MsgBox($MB_SYSTEMMODAL, "", "You chose the following file:" & @CRLF & $sFileOpenDialog)
  ; so let's convert it here.
  $foundtext = _ExtractText($sFileOpenDialog)
  ; In this case the DLL is opened and closed during the function call.
  ; $bProperties is False by default, so you do not need to pass it if you want it False.
  ; Show the result
  If Not @error Then ConsoleWrite($foundtext)
Else
  $files = StringSplit($sFileOpenDialog, "|") ; Multiple files
  ; The directory is in $files[1], so we have to join it with each file name
  $sFileOpenDialog = ""
  For $i = 2 To $files[0]
    $files[$i] = $files[1] & "\" & $files[$i]
    $sFileOpenDialog &= $files[$i] & @CRLF
  Next
  ; Now you have the full paths of the selected files, separated by @CRLF, and you can show them in a MsgBox
  MsgBox($MB_SYSTEMMODAL, "", "You chose the following files:" & @CRLF & $sFileOpenDialog)
  ; And here was the problem: you passed the whole set of files, separated by @CRLF, with the directory path only on the first line.
  ; So let them be converted and shown one by one, passing the handle of the previously opened DLL.
  $hXd2tx = DllOpen(@ScriptDir & "\xd2txlib.dll")
  ; $files[2] to $files[$files[0]] now hold the full paths, so there is no need to split the string again
  For $i = 2 To $files[0]
    $foundtext = _ExtractText($files[$i], $properties, $hXd2tx)
    If Not @error Then ConsoleWrite($foundtext)
  Next
  ; Close the DLL
  DllClose($hXd2tx)
EndIf

; #FUNCTION# ====================================================================================================================
; Name ..........: _ExtractText
; Description ...: Extracts text from advanced document formats (DOC, DOCX, ODT, XLS, ...)
; Syntax ........: _ExtractText($sFilename[, $bProperties = False[, $hDll = 0]])
; Parameters ....: $sFilename           - a string value. Path of the document to read.
;                  $bProperties         - [optional] a boolean value. Default is False. If True, the document properties are returned instead of the text.
;                  $hDll                - [optional] a handle value. Default is 0. Handle to a previously opened xd2txlib.dll. By default, xd2txlib.dll (expected in @ScriptDir) is opened and closed during the function call.
; Return value .: String containing the text or the document properties, or an empty string with @error set as follows:
;1 - The file does not exist.
;2 - Error opening xd2txlib.dll.
;3 - No text returned (or the DllCall itself failed).
; Author ........: Fenzik
; Modified ......:
; Remarks .......: Project site - http://ebstudio.info/home/xdoc2txt.html
; Related .......:
; Link ..........:
; Example .......: No
; ===============================================================================================================================
Func _ExtractText($sFilename, $bProperties = False, $hDll = 0)
  If Not FileExists($sFilename) Then Return SetError(1, 0, "")
  Local $bLoaded = False
  If $hDll = 0 Then
    $hDll = DllOpen(@ScriptDir & "\xd2txlib.dll")
    If $hDll = -1 Then Return SetError(2, 0, "")
    $bLoaded = True
  EndIf
  Local $aResult = DllCall($hDll, "int:cdecl", "ExtractText", "WSTR", $sFilename, "BOOL", $bProperties, "WSTR*", "")
  Local $iError = @error ; save @error before the next function call resets it
  ; Close the DLL on every path if it was opened inside this function
  If $bLoaded Then DllClose($hDll)
  ; If DllCall failed, $aResult is not an array, so check @error first
  If $iError Then Return SetError(3, 0, "")
  If $aResult[0] = 0 Then Return SetError(3, 0, "")
  Return $aResult[3]
EndFunc   ;==>_ExtractText
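
The multi-select handling in the script above can also be pulled out into a small helper. This is only a sketch: the name _SelectionToPaths is made up for illustration and is not part of AutoIt or xdoc2txt. It relies on the behavior described in the comments above, where a multi-select FileOpenDialog returns "directory|file1|file2".

```autoit
; Hypothetical helper: turns a FileOpenDialog() result into an array of full paths.
; Element [0] holds the number of paths, elements [1..n] hold the paths.
Func _SelectionToPaths($sSelection)
    Local $aParts = StringSplit($sSelection, "|")
    ; A single selection is already a full path: [1, "C:\dir\file"]
    If $aParts[0] = 1 Then Return $aParts

    ; Several files: $aParts[1] is the directory, the rest are file names
    Local $sDir = $aParts[1]
    If StringRight($sDir, 1) <> "\" Then $sDir &= "\"

    Local $aPaths[$aParts[0]]
    $aPaths[0] = $aParts[0] - 1
    For $i = 2 To $aParts[0]
        $aPaths[$i - 1] = $sDir & $aParts[$i]
    Next
    Return $aPaths
EndFunc   ;==>_SelectionToPaths

; Usage with a made-up selection string
Local $aPaths = _SelectionToPaths("C:\Docs|a.docx|b.xls")
For $i = 1 To $aPaths[0]
    ConsoleWrite($aPaths[$i] & @CRLF) ; C:\Docs\a.docx, then C:\Docs\b.xls
Next
```

Each element can then be passed to _ExtractText() one at a time, together with a DLL handle that is opened once before the loop.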

 

Edited by Fenzik

Strange!

Are you using the latest versions of AutoIt and SciTE?

I have the latest versions of both and everything runs OK.

So try updating to the latest versions.

And do you have the file xd2txlib.dll in @ScriptDir?


I have the latest versions of:

-SciTE Version 3.6.0
-AutoIt 3.3.14.5
-xdoc2txt 32bit (x86) version (Windows OS is 32bit / 64bit) in the script directory.

I also tested it with both the xdoc2txt 32bit (x86) and the xdoc2txt 64bit (x64) versions, and I installed the "Microsoft Visual C++ 2010 Redistributable Package (x86)" and the "Microsoft Visual C++ 2010 Redistributable Package (x64)" to avoid any dependency issues.

But even so, it keeps showing the same error: !>17:08:44 AutoIt3.exe ended.rc:-1073740940
+>17:08:44 AutoIt3Wrapper Finished.
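
The rc value in the log above is a signed 32-bit number, and it is easier to look up when written as hex. A minimal sketch using only the built-in Hex() function:

```autoit
; Sketch: print the exit code from the SciTE output as an unsigned 32-bit hex value.
Local $iExitCode = -1073740940 ; the "rc:" value from the log above
ConsoleWrite("0x" & Hex($iExitCode, 8) & @CRLF) ; 0xC0000374
```

In Microsoft's NTSTATUS list, 0xC0000374 is STATUS_HEAP_CORRUPTION. That would suggest memory being overwritten somewhere around the DLL call rather than a missing file, but this is only an inference from the code, not a confirmed diagnosis.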

 


My friends, you can use this code (it uses COM without registering the DLL):

MsgBox(0, 'Result', _ExtractTextCOM("sample.docx", False))

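Func _ExtractTextCOM($sFilename, $bProperties = False)
    Local $hXd2tx = DllOpen(@ScriptDir & "\xd2txcom" & (@AutoItX64 ? '_64' : '') & ".dll")
    ; DllOpen() signals failure by returning -1, so test the handle rather than @error
    If $hXd2tx = -1 Then Return SetError(1, 0, '')
    Local $oXd2tx = ObjCreate('{4ECE8E8A-BCC2-4709-BCAE-264210DF321B}', '{EB26F494-4E90-4432-9BA6-C6D9CDEE25C4}', $hXd2tx)
    ; Do not call a method on something that is not an object
    If Not IsObj($oXd2tx) Then Return SetError(2, 0, '')
    Return $oXd2tx.ExtractText($sFilename, $bProperties)
EndFunc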

 

Edited by moimon
Wrong Code

@jcpetu:

Unfortunately, I have no idea.

It works perfectly on my side.

What about the bitness of AutoIt3.exe?

The DLL is x86.

So the script should be run with the x86 version of AutoIt or compiled as x86.

What about the SciTE settings? Is it perhaps set to prefer x64 AutoIt?

Good luck! Unfortunately, I have no other ideas for making it work in your environment.
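
One way to rule out a bitness mismatch is to check it in the script itself. The sketch below uses the @AutoItX64 macro and the #AutoIt3Wrapper_UseX64 directive, which only takes effect when the script is run or compiled through SciTE's AutoIt3Wrapper. The messages are illustrative.

```autoit
; Ask AutoIt3Wrapper (SciTE) to run and compile this script as x86
#AutoIt3Wrapper_UseX64=n

; Sketch: stop early if the script still ends up running as 64-bit,
; because a 64-bit process cannot load the 32-bit xd2txlib.dll.
If @AutoItX64 Then
    ConsoleWrite("! 64-bit AutoIt detected; use the x86 interpreter with the x86 DLL." & @CRLF)
    Exit 1
EndIf
ConsoleWrite("+ Running as 32-bit AutoIt." & @CRLF)
```

If the check passes and the crash remains, the bitness of the interpreter is probably not the cause.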


@jcpetu:
I think I have solved your problem. :)
You probably don't have the Microsoft Visual C++ 2010 Redistributable Package for x86 installed, which the whole Xdoc2Txt project requires.
So try installing it, for example from here.
Then the x86 version of the DLL and the script should work, I hope.

