
Xdoc2txt: extracting text from advanced document formats using the DLL



Hello!

I wrote this function as an alternative to using the COM object or the command-line version of this project, which has also been discussed earlier on this forum.

Project site - http://ebstudio.info/home/xdoc2txt.html

The advantage of this implementation is that you do not need to register the COM DLL with regsvr32.

However, you still need the project DLL (xd2txlib.dll).

Enjoy!

; #FUNCTION# ====================================================================================================================
; Name ..........: _ExtractText
; Description ...: Extracts text from advanced document formats (DOC, DOCX, ODT, XLS, ...)
; Syntax ........: _ExtractText($sFilename[, $bProperties = False[, $hDll = 0]])
; Parameters ....: $sFilename           - a string value, the path of the document.
;                  $bProperties         - [optional] a boolean value. Default is False. If True, the document properties are returned instead of the text.
;                  $hDll                - [optional] a handle value. Default is 0. Handle to a previously opened xd2txlib.dll. By default, xd2txlib.dll (expected in @ScriptDir) is opened and closed during the function call.
; Return value .: String containing the text or the document properties, or an empty string with @error set as follows:
;1 - The file does not exist.
;2 - Error opening xd2txlib.dll.
;3 - No text returned.
;4 - DllCall failed (@extended contains the DllCall @error value).
; Author ........: Fenzik
; Modified ......:
; Remarks .......: Project site - http://ebstudio.info/home/xdoc2txt.html
; Related .......:
; Link ..........:
; Example .......: No
; ===============================================================================================================================
Func _ExtractText($sFilename, $bProperties = False, $hDll = 0)
    If Not FileExists($sFilename) Then Return SetError(1, 0, "")
    Local $bLoaded = False
    If $hDll = 0 Then
        $hDll = DllOpen(@ScriptDir & "\xd2txlib.dll")
        If $hDll = -1 Then Return SetError(2, 0, "")
        $bLoaded = True
    EndIf
    Local $aResult = DllCall($hDll, "int:cdecl", "ExtractText", "WSTR", $sFilename, "BOOL", $bProperties, "WSTR*", "")
    Local $iCallError = @error ; save it before DllClose() can reset @error
    ; Close the DLL on every return path, but only if it was opened here
    If $bLoaded Then DllClose($hDll)
    ; If DllCall failed, $aResult is not an array, so check before indexing it
    If $iCallError Then Return SetError(4, $iCallError, "")
    If $aResult[0] = 0 Then Return SetError(3, 0, "")
    Return $aResult[3]
EndFunc   ;==>_ExtractText

xd2txlib-example.zip

Edited by Fenzik
Attached the example instead of the original project.

Hi Fenzik,

I made this small piece of code for testing, and I got the following error:
!>10:25:20 AutoIt3.exe ended.rc:-1073740940

Could you please point out what I'm doing wrong? Thanks in advance.


Global $bProperties, $hDll
Global $sMessage = "Hold down Ctrl or Shift to choose multiple files."
Global $sFileOpenDialog = FileOpenDialog($sMessage, @ScriptDir & "\", "Text files (*.docx;*.doc;*.rtf;*.wri;*.txt)|Excel (*.xls;*.xlsx)", BitOR($FD_FILEMUSTEXIST, $FD_MULTISELECT))
If @error Then
    MsgBox($MB_SYSTEMMODAL, "", "No file(s) were selected.")
    Exit
Else
    FileChangeDir(@ScriptDir)
    $sFileOpenDialog = StringReplace($sFileOpenDialog, "|", @CRLF)
    MsgBox($MB_SYSTEMMODAL, "", "You chose the following files:" & @CRLF & $sFileOpenDialog)
EndIf

Global $FoundText = _ExtractText($sFileOpenDialog, $bProperties, $hDll) ; $bProperties = False --------------------------------
If Not @error Then ConsoleWrite($FoundText)
Exit

; #FUNCTION# ====================================================================================================================
; Name ..........: _ExtractText
; Description ...: Extracts text from advanced document formats (DOC, DOCX, ODT, XLS, ...)
; Syntax ........: _ExtractText($sFilename[, $bProperties = False[, $hDll = 0]])
; Parameters ....: $sFilename           - a string value, the path of the document.
;                  $bProperties         - [optional] a boolean value. Default is False. If True, the document properties are returned instead of the text.
;                  $hDll                - [optional] a handle value. Default is 0. Handle to a previously opened xd2txlib.dll. By default, xd2txlib.dll (expected in @ScriptDir) is opened and closed during the function call.
; Return value .: String containing the text or the document properties, or an empty string with @error set as follows:
;1 - The file does not exist.
;2 - Error opening xd2txlib.dll.
;3 - No text returned.
; Author ........: Fenzik
; Modified ......:
; Remarks .......: Project site - http://ebstudio.info/home/xdoc2txt.html
; Related .......:
; Link ..........:
; Example .......: No
; ===============================================================================================================================
Func _ExtractText($sFilename, $bProperties = False, $hDll = 0)
    If Not FileExists($sFilename) Then Return SetError(1, "", "")
    Local $bLoaded = False
    If $hDll = 0 Then
        $hDll = DllOpen(@ScriptDir & "\xd2txlib.dll")
        If $hDll = -1 Then Return SetError(2, "", "")
        $bLoaded = True
    EndIf
    $aResult = DllCall($hDll, "int:cdecl", "ExtractText", "WSTR", $sFilename, "BOOL", $bProperties, "WSTR*", "")
    If $aResult[0] = 0 Then Return SetError(3, "", "")
    If $bLoaded = True Then DllClose($hDll)
    Return $aResult[3]
EndFunc   ;==>_ExtractText


OK, I made a few corrections and added comments to your example.

Here it is:

#include <FileConstants.au3>
#include <MsgBoxConstants.au3>

Global $properties = False
Global $foundtext = ""
Global $sMessage = "Hold down Ctrl or Shift to choose multiple files."
Global $sFileOpenDialog = FileOpenDialog($sMessage, @ScriptDir & "\", "Text files (*.docx;*.doc;*.rtf;*.wri;*.txt)|Excel (*.xls;*.xlsx)", BitOR($FD_FILEMUSTEXIST, $FD_MULTISELECT))
If @error Then
  MsgBox($MB_SYSTEMMODAL, "", "No file(s) were selected.")
  Exit
EndIf
;FileChangeDir(@ScriptDir)
;It's not necessary to change the working dir here.
;Here we must know whether the user selected one or more files.
If Not StringInStr($sFileOpenDialog, "|") Then
  ;Only one file selected: FileOpenDialog returned its full path
  MsgBox($MB_SYSTEMMODAL, "", "You chose the following file:" & @CRLF & $sFileOpenDialog)
  ;so let's convert it here.
  $foundtext = _ExtractText($sFileOpenDialog)
  ;In this case the DLL is opened and closed during the function call.
  ;Properties are False by default, so you don't need to pass them if you want False.
  ;Show the result
  If Not @error Then ConsoleWrite($foundtext)
Else
  ;Multiple files: the directory is in $files[1] and the file names are in $files[2] .. $files[$files[0]],
  ;so we have to put the path and the file names together.
  Global $files = StringSplit($sFileOpenDialog, "|")
  Global $sList = ""
  For $i = 2 To $files[0]
    $sList &= $files[1] & "\" & $files[$i] & @CRLF
  Next
  ;Now you have the full paths of the selected files separated by @CRLF, and you can show them in a MsgBox.
  MsgBox($MB_SYSTEMMODAL, "", "You chose the following files:" & @CRLF & $sList)
  ;And here is the problem: you passed the whole set of files, separated by @CRLF, with the path to the directory only on the first line.
  ;So let them be converted and shown one by one, passing the handle of a previously opened DLL.
  Global $hXd2tx = DllOpen(@ScriptDir & "\xd2txlib.dll")
  If $hXd2tx = -1 Then
    MsgBox($MB_SYSTEMMODAL, "", "Could not open xd2txlib.dll.")
    Exit
  EndIf
  For $i = 2 To $files[0]
    $foundtext = _ExtractText($files[1] & "\" & $files[$i], $properties, $hXd2tx) ; $bProperties = False
    If Not @error Then ConsoleWrite($foundtext)
  Next
  ;Close the DLL
  DllClose($hXd2tx)
EndIf

; #FUNCTION# ====================================================================================================================
; Name ..........: _ExtractText
; Description ...: Extracts text from advanced document formats (DOC, DOCX, ODT, XLS, ...)
; Syntax ........: _ExtractText($sFilename[, $bProperties = False[, $hDll = 0]])
; Parameters ....: $sFilename           - a string value, the path of the document.
;                  $bProperties         - [optional] a boolean value. Default is False. If True, the document properties are returned instead of the text.
;                  $hDll                - [optional] a handle value. Default is 0. Handle to a previously opened xd2txlib.dll. By default, xd2txlib.dll (expected in @ScriptDir) is opened and closed during the function call.
; Return value .: String containing the text or the document properties, or an empty string with @error set as follows:
;1 - The file does not exist.
;2 - Error opening xd2txlib.dll.
;3 - No text returned.
;4 - DllCall failed (@extended contains the DllCall @error value).
; Author ........: Fenzik
; Modified ......:
; Remarks .......: Project site - http://ebstudio.info/home/xdoc2txt.html
; Related .......:
; Link ..........:
; Example .......: No
; ===============================================================================================================================
Func _ExtractText($sFilename, $bProperties = False, $hDll = 0)
  If Not FileExists($sFilename) Then Return SetError(1, 0, "")
  Local $bLoaded = False
  If $hDll = 0 Then
    $hDll = DllOpen(@ScriptDir & "\xd2txlib.dll")
    If $hDll = -1 Then Return SetError(2, 0, "")
    $bLoaded = True
  EndIf
  Local $aResult = DllCall($hDll, "int:cdecl", "ExtractText", "WSTR", $sFilename, "BOOL", $bProperties, "WSTR*", "")
  Local $iCallError = @error ; save it before DllClose() can reset @error
  ; Close the DLL on every return path, but only if it was opened here
  If $bLoaded Then DllClose($hDll)
  ; If DllCall failed, $aResult is not an array, so check before indexing it
  If $iCallError Then Return SetError(4, $iCallError, "")
  If $aResult[0] = 0 Then Return SetError(3, 0, "")
  Return $aResult[3]
EndFunc   ;==>_ExtractText


Edited by Fenzik
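The rule used in the corrected script (a single selection returns one full path; a multiple selection returns the directory followed by the file names, separated by "|") can also be wrapped in a small helper. This is a minimal sketch; _SplitFileOpenDialog is a hypothetical name, not a function from AutoIt's UDFs or from xdoc2txt. It returns a 1-based array of full paths with the count in element [0], so each path can be passed to _ExtractText one at a time.

```autoit
#include <StringConstants.au3>

; Hypothetical helper (not part of AutoIt's UDFs or xdoc2txt):
; converts the string returned by FileOpenDialog (with or without $FD_MULTISELECT)
; into a 1-based array of full paths, with the number of paths in element [0].
Func _SplitFileOpenDialog($sResult)
    Local $aParts = StringSplit($sResult, "|", $STR_NOCOUNT)
    ; Single selection: the result is already a full path
    If UBound($aParts) = 1 Then
        Local $aSingle[2] = [1, $sResult]
        Return $aSingle
    EndIf
    ; Multiple selection: $aParts[0] is the directory, the rest are file names
    Local $sDir = $aParts[0]
    If StringRight($sDir, 1) <> "\" Then $sDir &= "\"
    Local $aPaths[UBound($aParts)]
    $aPaths[0] = UBound($aParts) - 1
    For $i = 1 To UBound($aParts) - 1
        $aPaths[$i] = $sDir & $aParts[$i]
    Next
    Return $aPaths
EndFunc   ;==>_SplitFileOpenDialog
```

For example, _SplitFileOpenDialog("C:\Docs|a.docx|b.xls") returns an array whose element [0] is 2 and whose elements [1] and [2] are "C:\Docs\a.docx" and "C:\Docs\b.xls".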

I have the latest versions of:

-SciTE version 3.6.0
-AutoIt 3.3.14.5
-xdoc2txt 32-bit (x86) version in the script directory (on 32-bit / 64-bit Windows).

I also tested it with both the 32-bit (x86) and the 64-bit (x64) versions of xdoc2txt, and I installed the "Microsoft Visual C++ 2010 Redistributable Package (x86)" and the "Microsoft Visual C++ 2010 Redistributable Package (x64)" to rule out any dependency issue.

Even so, it keeps showing the same error: !>17:08:44 AutoIt3.exe ended.rc:-1073740940
+>17:08:44 AutoIt3Wrapper Finished.
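The rc value in the SciTE output is the exit code of AutoIt3.exe. Read as an unsigned 32-bit value, -1073740940 is 0xC0000374, the Windows NTSTATUS code STATUS_HEAP_CORRUPTION, which means the AutoIt process itself crashed rather than _ExtractText returning one of its error codes. A minimal sketch for decoding such rc values; _ExitCodeToHex is a hypothetical helper name, not part of AutoIt or xdoc2txt.

```autoit
; Hypothetical helper (not part of AutoIt or xdoc2txt): formats a signed
; 32-bit exit code, such as the "rc:" value printed by SciTE, as hex.
Func _ExitCodeToHex($iRc)
    ; Hex() prints the two's-complement bits of a negative 32-bit integer
    Return "0x" & Hex($iRc, 8)
EndFunc   ;==>_ExitCodeToHex

ConsoleWrite(_ExitCodeToHex(-1073740940) & @CRLF) ; 0xC0000374 (STATUS_HEAP_CORRUPTION)
```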


Friends, you can also use this code (it uses COM without registering the DLL):

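MsgBox(0, 'Result', _ExtractTextCOM("sample.docx", False))

Func _ExtractTextCOM($sFilename, $bProperties = False)
    ; Load xd2txcom.dll (xd2txcom_64.dll under x64 AutoIt) without registering it
    Local $hXd2tx = DllOpen(@ScriptDir & "\xd2txcom" & (@AutoItX64 ? '_64' : '') & ".dll")
    ; DllOpen returns -1 on failure; it does not set @error
    If $hXd2tx = -1 Then Return SetError(1, 0, '')
    ; Create the object from the loaded DLL (CLSID, IID, DLL handle)
    Local $oXd2tx = ObjCreate('{4ECE8E8A-BCC2-4709-BCAE-264210DF321B}', '{EB26F494-4E90-4432-9BA6-C6D9CDEE25C4}', $hXd2tx)
    If Not IsObj($oXd2tx) Then
        DllClose($hXd2tx)
        Return SetError(2, 0, '')
    EndIf
    Local $sText = $oXd2tx.ExtractText($sFilename, $bProperties)
    ; Release the object before unloading the DLL that implements it
    $oXd2tx = 0
    DllClose($hXd2tx)
    Return $sText
EndFunc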


Edited by moimon
Wrong Code

@jcpetu:

Unfortunately, I have no idea.

It works perfectly on my side.

What about the bitness of AutoIt3.exe?

The DLL is x86, so the script should be run with the x86 version of AutoIt or compiled as x86.

What about your SciTE settings? Do you have x64 AutoIt set as the preferred version there?

Good luck! Unfortunately, I have no other idea how to make it work in your environment.
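A minimal sketch of how a script could guard against loading the x86 xd2txlib.dll from the x64 interpreter, since a 32-bit DLL can only be loaded into a 32-bit process. #AutoIt3Wrapper_UseX64=n is an AutoIt3Wrapper directive, so it only takes effect when the script is run or compiled through AutoIt3Wrapper (for example, from SciTE); the @AutoItX64 check also covers other ways of starting the script. _IsX86AutoIt is a hypothetical helper name.

```autoit
#AutoIt3Wrapper_UseX64=n ; ask AutoIt3Wrapper to run/compile with the x86 (32-bit) AutoIt
#include <MsgBoxConstants.au3>

; Hypothetical helper: True when the script runs in a 32-bit AutoIt process,
; which is required to load a 32-bit DLL such as the x86 xd2txlib.dll.
Func _IsX86AutoIt()
    Return Not @AutoItX64
EndFunc   ;==>_IsX86AutoIt

If Not _IsX86AutoIt() Then
    MsgBox($MB_SYSTEMMODAL, "xd2txlib", "Please run this script with the x86 (32-bit) version of AutoIt.")
    Exit 1
EndIf
```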


@jcpetu:
I think I have solved your problem. :)
You probably don't have the Microsoft Visual C++ 2010 Redistributable Package (x86) installed, which the whole xdoc2txt project requires.
So try installing it, for example from here.
Then the x86 version of the DLL and the script should work, I hope.


  • Similar Content

    • By SkysLastChance
      I am having trouble finding a good way to click these "buttons" below.

      I only need to be able to click them when they have both yes/no. Otherwise I don't have to worry about them. For instance, if they looked like this, I would NOT have to worry about clicking them and could just ignore them altogether (picture below).

      The problem, as mentioned in the title, is that all of the IDs are dynamic (classes too).

      Here is what it looks like if yes is already selected.

      This is what I was using to select the button. However, I need to know whether the button has already been clicked/selected or not.
      _WD_LoadWait($sSession)
      $sElement = _WD_FindElement($sSession, $_WD_LOCATOR_ByXPath, "//span[text() = 'Offered access to electronic health information?']")
      Sleep(1000)
      _WD_ElementAction($sSession, $sElement, 'click')
      Sleep(500)
      _WD_Action($sSession, "actions", $sActionTab)
      Sleep(500)
      _WD_Action($sSession, "actions", $sActionEnter)
      Is there a way I can get the count of spans in the span class "s_636" by tabbing over to the button? I am hoping someone might have some ideas on what I can try.
      Unfortunately, the site is for work, so sharing it won't do any good.
    • By goku200
      I'm having an issue with my paginated HTML table. The script works as expected: it reads the HTML table and clicks the Download button. However, when it clicks on the next page, it doesn't iterate over the items. Instead, it goes to the next URL from the spreadsheet, then iterates through the HTML table clicking the Download button, and so on. I'm not sure why it's doing that. I want it to click the next page and continue iterating, and after it has reached the end of the pagination, go to the next URL in the spreadsheet and repeat the process. Below is my script. Any help is appreciated 🙂
    • By VinMe
      Dear all, 
      I am working on automating MS Word; my script runs with multiple Word documents at a time, but I want all of the Word instances to run in the background.
      Is there any UDF available? Please help.
      BR,
      VinMe
    • By walec
      Hello
      How can I export a sheet to pdf using the OOoCalc.au3 UDF?
      Thank you for any hints or possibly other solutions / functions.
    • By mLipok
      ; #INDEX# =======================================================================================================================
      ; Title .........: UDF for "Debenu Quick PDF Library"
      ; AutoIt Version : 3.3.10.2++
      ; Language ......: English
      ; Description ...: A collection of functions for Debenu Quick PDF Library
      ; Author(s) .....: mLipok
      ; Modified ......:
      ; ===============================================================================================================================
      Release note:
       
       
      Erratum v0.7:
       
      Forum link:
       
       