KickStarter15

Extract Specific Paragraph(s) in Word Document.

20 posts in this topic

#1 ·  Posted (edited)

Hi Experts,

While searching the forums for a way to read Word document files, I found this thread:

However, the code in that thread reads the entire content of the document and extracts it to a text file. My question is: using the code given in that thread (which is very useful to me and very well scripted ^_^), is it possible to get only specific paragraph(s) from the document? For example, I just want the first paragraph, not the whole content of the Word file. Would that be possible using this code:

Dim $file = FileOpenDialog("Choose .docx file", @DesktopDir, "Word docx file (*.docx)", 1)
If @error Then Exit

Dim $TxT = _ReadDocXContent($file)

MsgBox(0, "Extracted text", $TxT)
FileWriteLine(@ScriptDir & "\" & "Test.txt", $TxT)


Func _ReadDocXContent($ReadLocation)

    Local $extension = StringSplit($ReadLocation, ".", 1)
    $extension = $extension[$extension[0]]
    
    Local $hwnd = FileOpen($ReadLocation, 16)
    Local $header = FileRead($hwnd, 2)
    FileClose($hwnd)
    
    If $header <> '0x504B' Or $extension <> 'docx' Then Return SetError(1) ; not .docx file
    
    Local $Name, $UnZipName, $TempZipName

    Local $i, $f_name = "~TempDoc"
    Do
        $i += 1
        $Name = @TempDir & "\" & $f_name & $i & ".zip"
    Until Not FileExists($Name)

    FileCopy($ReadLocation, $Name, 9)

    Local $j
    Do
        $j += 1
        $UnZipName = @TempDir & "\~DocXdoc" & $j
    Until Not FileExists($UnZipName)

    DirCreate($UnZipName)

    Local $k
    Do
        $k += 1
        $TempZipName = @TempDir & "\Temporary Directory " & $k & " for " & $f_name & $i & ".zip"
    Until Not FileExists($TempZipName)

    Local $oApp = ObjCreate("Shell.Application")
    
    If Not IsObj($oApp) Then Return SetError(2) ; highly unlikely but could happen
    
    $oApp.NameSpace($UnZipName).CopyHere($oApp.NameSpace($Name & '\word' ).ParseName("document.xml"), 4)

    Local $Text = FileRead($UnZipName & "\document.xml")

    DirRemove($UnZipName, 1)
    FileDelete($Name)
    DirRemove($TempZipName, 1)

    $Text = StringReplace($Text, @CRLF, "")
    $Text = StringRegExpReplace($Text, "<w:body>(.*?)</w:body>", '$1', 0)
    $Text = StringReplace($Text, "</w:p>", @CRLF)
    $Text = StringReplace($Text, "<w:cr/>", @CRLF)
    $Text = StringReplace($Text, "<w:br/>", @CRLF)
    $Text = StringReplace($Text, "<w:tab/>", @TAB)

    $Text = StringRegExpReplace($Text, "<(.*?)>", "")
    
    $Text = StringReplace($Text, "&lt;", "<")
    $Text = StringReplace($Text, "&gt;", ">")
    $Text = StringReplace($Text, "&amp;", "&")

    $Text = StringReplace($Text, Chr(226) & Chr(130) & Chr(172), Chr(128))
    $Text = StringReplace($Text, Chr(194) & Chr(129), Chr(129))
    $Text = StringReplace($Text, Chr(226) & Chr(128) & Chr(154), Chr(130))
    $Text = StringReplace($Text, Chr(198) & Chr(146), Chr(131))
    $Text = StringReplace($Text, Chr(226) & Chr(128) & Chr(158), Chr(132))
    $Text = StringReplace($Text, Chr(226) & Chr(128) & Chr(166), Chr(133))
    $Text = StringReplace($Text, Chr(226) & Chr(128) & Chr(160), Chr(134))
    $Text = StringReplace($Text, Chr(226) & Chr(128) & Chr(161), Chr(135))
    $Text = StringReplace($Text, Chr(203) & Chr(134), Chr(136))
    $Text = StringReplace($Text, Chr(226) & Chr(128) & Chr(176), Chr(137))
    $Text = StringReplace($Text, Chr(197) & Chr(160), Chr(138))
    $Text = StringReplace($Text, Chr(226) & Chr(128) & Chr(185), Chr(139))
    $Text = StringReplace($Text, Chr(197) & Chr(146), Chr(140))
    $Text = StringReplace($Text, Chr(194) & Chr(141), Chr(141))
    $Text = StringReplace($Text, Chr(197) & Chr(189), Chr(142))
    $Text = StringReplace($Text, Chr(194) & Chr(143), Chr(143))
    $Text = StringReplace($Text, Chr(194) & Chr(144), Chr(144))
    $Text = StringReplace($Text, Chr(226) & Chr(128) & Chr(152), Chr(145))
    $Text = StringReplace($Text, Chr(226) & Chr(128) & Chr(153), Chr(146))
    $Text = StringReplace($Text, Chr(226) & Chr(128) & Chr(156), Chr(147))
    $Text = StringReplace($Text, Chr(226) & Chr(128) & Chr(157), Chr(148))
    $Text = StringReplace($Text, Chr(226) & Chr(128) & Chr(162), Chr(149))
    $Text = StringReplace($Text, Chr(226) & Chr(128) & Chr(147), Chr(150))
    $Text = StringReplace($Text, Chr(226) & Chr(128) & Chr(148), Chr(151))
    $Text = StringReplace($Text, Chr(203) & Chr(156), Chr(152))
    $Text = StringReplace($Text, Chr(226) & Chr(132) & Chr(162), Chr(153))
    $Text = StringReplace($Text, Chr(197) & Chr(161), Chr(154))
    $Text = StringReplace($Text, Chr(226) & Chr(128) & Chr(186), Chr(155))
    $Text = StringReplace($Text, Chr(197) & Chr(147), Chr(156))
    $Text = StringReplace($Text, Chr(194) & Chr(157), Chr(157))
    $Text = StringReplace($Text, Chr(197) & Chr(190), Chr(158))
    $Text = StringReplace($Text, Chr(197) & Chr(184), Chr(159))

    For $x = 160 To 191
        $Text = StringReplace($Text, Chr(194) & Chr($x), Chr($x))
    Next

    For $x = 192 To 255
        $Text = StringReplace($Text, Chr(195) & Chr($x - 64), Chr($x))
    Next

    Return $Text

EndFunc

Sorry for asking this and reviving a thread that is already about 8 years old, but I found it helpful for my current project. :sweating:

Hope this is okay...

 

Thank you in advance guys!

KS15

Edited by KickStarter15
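One possible way to answer the question with the posted code alone: _ReadDocXContent() turns every closing </w:p> tag into @CRLF, so the returned text can be split on @CRLF and indexed. The sketch below is only an illustration; _GetParagraph is a hypothetical helper that is not part of the original thread, and it assumes paragraphs contain no manual line breaks (<w:br/> and <w:cr/> also become @CRLF, so such paragraphs would be counted as several).

```autoit
; Hypothetical helper (not from the original thread): returns the paragraph at a
; 1-based index from text in which paragraphs are separated by @CRLF.
Func _GetParagraph($sText, $iIndex)
    ; Flag 1 makes StringSplit treat the whole @CRLF as one delimiter
    ; instead of splitting on @CR and @LF separately.
    Local $aParagraphs = StringSplit($sText, @CRLF, 1)
    If $iIndex < 1 Or $iIndex > $aParagraphs[0] Then Return SetError(1, 0, "")
    Return $aParagraphs[$iIndex]
EndFunc

; Usage with a literal sample string standing in for the extracted text
Local $sSample = "First paragraph" & @CRLF & "Second paragraph" & @CRLF
ConsoleWrite(_GetParagraph($sSample, 1) & @CRLF)
```

In the script above this would be called as _GetParagraph($TxT, 1) after _ReadDocXContent($file) has returned.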


#2 ·  Posted

If Word is installed on your computer you could use the Word UDF that comes with AutoIt.
Use _Word_DocRangeSet to set the start and end of the paragraph and then use the Copy method to copy the paragraph to the clipboard: https://msdn.microsoft.com/en-us/library/ff837718(v=office.14).aspx
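A minimal sketch of that suggestion, assuming the rewritten Word UDF (shipped with AutoIt 3.3.10.0 and later, as far as I know), an installed copy of Word, and a hypothetical Test.docx next to the script. The parameter order of _Word_DocRangeSet is taken from my reading of the help file, so check it against your own version. It reads the range text directly instead of going through the clipboard.

```autoit
#include <Word.au3>

Local $iParagraph = 1 ; 1-based index of the paragraph to extract

Local $oWord = _Word_Create()
If @error Then Exit MsgBox(48, "Error", "_Word_Create failed, @error = " & @error)

; Open the (hypothetical) document read-only
Local $oDoc = _Word_DocOpen($oWord, @ScriptDir & "\Test.docx", Default, Default, True)
If @error Then
    Local $iErr = @error ; save it, the next function call resets @error
    _Word_Quit($oWord)
    Exit MsgBox(48, "Error", "_Word_DocOpen failed, @error = " & $iErr)
EndIf

; -1 = start of the document; move the range start forward by ($iParagraph - 1)
; paragraphs, then extend the range end by one paragraph.
Local $oRange = _Word_DocRangeSet($oDoc, -1, $wdParagraph, $iParagraph - 1, $wdParagraph, 1)
If Not @error Then ConsoleWrite($oRange.Text & @CRLF)

_Word_DocClose($oDoc)
_Word_Quit($oWord)
```

Calling $oRange.Copy() followed by ClipGet() would give the clipboard variant described above.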


My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2017-04-18 - Version 1.4.8.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (NEW 2017-02-27 - Version 1.3.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2015-04-01 - Version 0.4.0.0) - Download - General Help & Support - Example Scripts
Excel - Example Scripts - Wiki
Word - Wiki
PowerPoint (2015-06-06 - Version 0.0.5.0) - Download - General Help & Support

Tutorials:
ADO - Wiki

 


#3 ·  Posted (edited)

Hi Water,

Thanks for your input, but my version doesn't support "_Word_DocRangeSet". Instead I tried this:

#include <Word.au3>
$oWordApp = _WordCreate(@ScriptDir & "\Test.doc")
$oDoc = _WordDocGetCollection($oWordApp, 0)
ConsoleWrite("paragraphs - " & _WordDocPropertyGet($oDoc, "paragraphs") & @CRLF)

This is the example from the help file. But it doesn't get the paragraph range I need either; it only gives me the total count of paragraphs found in the doc file.

Now, when I tried this:

#include <Word.au3>

Local $oWordApp = _WordCreate(@ScriptDir & "\Test.doc")
Local $oDoc = _WordDocGetCollection($oWordApp, 0)
For $i = 1 To 30
    ConsoleWrite("Property Index " & $i & " - " & _WordDocPropertyGet($oDoc, $i) & @CR)
Next

It gives me the error "--> Word.au3 Error from function _WordDocPropertyGet, $_WordStatus_InvalidValue (The specified property is not supported.)". I'm not sure where to start with all this confusing code. :sweating:

 

Edited by KickStarter15


#4 ·  Posted

Isn't upgrading to the latest version of AutoIt an option?



#5 ·  Posted

Here is an example, similar to @water's first link:

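#include <File.au3>

Local $sTextFile = @ScriptDir & "\TextFile.txt"
; List the *.docx files in the script folder, returned with full paths
Local $aWordFiles = _FileListToArrayRec(@ScriptDir, "*.docx", 1, 0, 0, 2)
If @error Then Exit ; no matching files found
Local $oWordDoc, $oWord = ObjCreate("Word.Application")
Local $hTextFile = FileOpen($sTextFile, 1) ; 1 = append mode
For $i = 1 To $aWordFiles[0]
    $oWordDoc = $oWord.Documents.Open($aWordFiles[$i])
    ; Copy the first paragraph to the clipboard and append it to the text file
    $oWordDoc.Paragraphs(1).Range.Copy()
    FileWrite($hTextFile, ClipGet() & @CRLF)
    $oWordDoc.Close()
Next
FileClose($hTextFile)
$oWord.Quit() ; otherwise the hidden Word process keeps running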

 


#6 ·  Posted (edited)

@water I can upgrade once the latest version is made available to me, but for now I only have this version. :D Maybe next month they can provide me with the version I need. Thanks anyway.

 

@Subz Is there a way to choose which paragraph gets extracted? Or, let's say, to pick the paragraph based on my input?

So far, in my testing of your code, I get this error:

D:\Program\Test.au3 (6) : ==> Unknown function name.:
Local $aWordFiles = _FileListToArrayRec(@ScriptDir, "Test.doc", 1, 0, 0, 2)
Local $aWordFiles = ^ ERROR

I think "_FileListToArrayRec" doesn't work in my version either, maybe? :>

I also tried fixing it, but below is the error I got.

D:\Program\Test.au3 (10) : ==> The requested action with this object has failed.:
$oWordDoc = $oWord.Documents.Open($aWordFiles[$i])
$oWordDoc = $oWord.Documents.Open($aWordFiles[$i])^ ERROR

Edited by KickStarter15


#7 ·  Posted

Do you include the needed UDF for _FileListToArray?



#8 ·  Posted

Quote

Is there a way to choose which paragraph gets extracted? Or, let's say, to pick the paragraph based on my input?

Sure. Modify this line and replace "1" with a variable.

$oWordDoc.Paragraphs(1).Range.Copy
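As a rough sketch of what "replace 1 with a variable" could look like, the script below asks for the paragraph number and checks it against Paragraphs.Count before using it. The file name Test.docx and the variable names are assumptions made for illustration; Word must be installed.

```autoit
; Assumption: a hypothetical Test.docx sits next to the script.
Local $sDocPath = @ScriptDir & "\Test.docx"
If Not FileExists($sDocPath) Then Exit MsgBox(48, "Error", "File not found: " & $sDocPath)

Local $sInput = InputBox("Paragraph", "Which paragraph number should be copied?", "1")
If @error Then Exit ; dialog was cancelled
Local $iParagraph = Int($sInput) ; non-numeric input becomes 0 and is rejected below

Local $oWord = ObjCreate("Word.Application")
If Not IsObj($oWord) Then Exit MsgBox(48, "Error", "Could not start Word")
Local $oWordDoc = $oWord.Documents.Open($sDocPath)

; Only use the index if it lies within 1..Paragraphs.Count
If $iParagraph >= 1 And $iParagraph <= $oWordDoc.Paragraphs.Count Then
    $oWordDoc.Paragraphs($iParagraph).Range.Copy()
    ConsoleWrite(ClipGet() & @CRLF)
Else
    ConsoleWrite("The document has " & $oWordDoc.Paragraphs.Count & " paragraph(s)." & @CRLF)
EndIf

$oWordDoc.Close()
$oWord.Quit()
```

The range check matters because asking Word for a paragraph index that does not exist raises a COM error, which by default stops the script.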

 



#9 ·  Posted

Your version probably doesn't include _FileListToArrayRec.  Do you have _FileListToArray()?

Here's a simple function that writes either the paragraph(s) in which a search text is found or the paragraph at a given index.

PS: Sorry about the header, I'm useless when it comes to documentation :)

#include <File.au3>

Local $sTextFile = @ScriptDir & "\TextFile.txt"
Local $hTextFile = FileOpen($sTextFile, 1)
Local $aWordFiles = _FileListToArrayRec(@ScriptDir, "*.docx", 1, 0, 0, 2)
Global $oWord = ObjCreate("Word.Application")

_WordSearch("The", 1, 1) ;~ Writes the first paragraph instance with the word "The"
_WordSearch("Title", 1, 0) ;~ Writes all paragraphs with the word "Title" in it
_WordSearch("", 7) ;~ Writes the seventh paragraph

$oWord.Quit() ; close the hidden Word instance, otherwise it keeps running
FileClose($hTextFile)


; #FUNCTION# ====================================================================================================================
; Name ..........: _WordSearch
; Description ...:
; Syntax ........: _WordSearch([$sSearch = ""[, $iParagraph = 1[, $iReturn = 1]]])
; Parameters ....: $sSearch             - [optional] Search Text value. Default is "".
;                  $iParagraph          - [optional] Paragraph Index. Default is 1.
;                  $iReturn             - [optional] Paragraph Result. Default is 1.
; Return values .: None
; Author ........: Subz
; Modified ......:
; Remarks .......: If $sSearch is not defined the function will write the paragraph based upon Index ($iParagraph)
; ...............: If $iParagraph is greater than the number of paragraphs in the document, nothing is written.
; ...............: If $sSearch is defined and $iReturn = 0 then will write all paragraphs with the defined "Search Text"
; ...............: within the paragraph. If $iReturn = 1 will write only the first matching paragraph.
; Related .......:
; Link ..........:
; Example .......: No
; ===============================================================================================================================
Func _WordSearch($sSearch = "", $iParagraph = 1, $iReturn = 1)
    Local $oWordDoc, $oRange, $sText
    For $i = 1 To $aWordFiles[0]
        $oWordDoc = $oWord.Documents.Open($aWordFiles[$i])
        If $sSearch = "" Then
            ; Index mode: write paragraph $iParagraph if the document has that many paragraphs
            If $iParagraph >= 1 And $oWordDoc.Paragraphs.Count >= $iParagraph Then
                $sText = $oWordDoc.Paragraphs($iParagraph).Range.Text
                If $sText <> "" Then FileWrite($hTextFile, $sText & @CRLF)
            EndIf
        Else
            ; Search mode: check each paragraph for the search text
            For $j = 1 To $oWordDoc.Paragraphs.Count
                $oRange = $oWordDoc.Paragraphs($j).Range
                $sText = $oRange.Text ; keep the full paragraph, Find shrinks the range to the match
                $oRange.Find.Text = $sSearch
                $oRange.Find.Execute()
                If $oRange.Find.Found Then
                    FileWrite($hTextFile, $sText & @CRLF)
                    ; Stop only after a match, so the first matching paragraph is not skipped
                    If $iReturn = 1 Then ExitLoop
                EndIf
            Next
        EndIf
        $oWordDoc.Close()
    Next
EndFunc

 


#10 ·  Posted

On 1.4.2017 at 0:46 PM, KickStarter15 said:

but my version doesn't support "_Word_DocRangeSet",

Which version of AutoIt are we talking about?



#11 ·  Posted

2 hours ago, water said:

Do you include the needed UDF for _FileListToArray?

 

1 hour ago, water said:

Which version of AutoIt are we talking about?

@water Yes, the UDF for _FileListToArray was included. Apologies, it's version 2. ^_^

 

1 hour ago, Subz said:

Your version probably doesn't include _FileListToArrayRec.  Do you have _FileListToArray()?

 

@Subz Yes, I only have "_FileListToArray()", but when I tried using it, it gives me the error below.

D:\Program\Test.au3 (38) : ==> The requested action with this object has failed.:
$oWordDoc = $oWord.Documents.Open($aWordFiles[$i])
$oWordDoc = $oWord.Documents.Open($aWordFiles[$i])^ ERROR

 

Peace, guys, I'm having a hard time with this one. :sweating:


#12 ·  Posted

3 minutes ago, KickStarter15 said:

Apologies, it's version 2. ^_^

There is no version 2 available.
What do you get when you run

MsgBox(0, "Version", @AutoItVersion)

 



#14 ·  Posted

11 minutes ago, water said:

There is no version 2 available.
What do you get when you run

MsgBox(0, "Version", @AutoItVersion)

 

Sorry guys, it's 3.3.8.1. My bad... :>

 

8 minutes ago, Subz said:

What does the following return?

#include <Array.au3>
#include <File.au3>

Local $aWordFiles = _FileListToArray(@ScriptDir, "*.docx", 1, True)
If @error Then MsgBox(48, "Error", "Error Id:" & @error)
_ArrayDisplay($aWordFiles)

 

@Subz I got no error, and a GUI popup listing the existing document found in @ScriptDir.


#16 ·  Posted

@Subz I really don't know what's wrong with this code or why it is flagged as an error in my testing. :(

$oWordDoc = $oWord.Documents.Open($aWordFiles[$i])

D:\Program\Test.au3 (42) : ==> The requested action with this object has failed.:
$oWordDoc = $oWord.Documents.Open($aWordFiles[$i])
$oWordDoc = $oWord.Documents.Open($aWordFiles[$i])^ ERROR

I have Word installed ("MS Word 2010") on my PC but still cannot get past this line. It gives me a headache :D.

23 hours ago, Subz said:

So can you replace

Local $aWordFiles = _FileListToArrayRec(@ScriptDir, "*.docx", 1, 0, 0, 2)

with

Local $aWordFiles = _FileListToArray(@ScriptDir, "*.docx", 1, True)

 

I did what you suggested above, but it returns an error.

D:\Program\Test.au3 (11) : ==> Incorrect number of parameters in function call.:
Local $aWordFiles = _FileListToArray(@ScriptDir, "Sample.docx", 1, True)
Local $aWordFiles = ^ ERROR

 


#17 ·  Posted

Can you post your whole script + can you post from your help file:

  • _WordCreate syntax
  • _FileListToArray syntax

The issue you're having is that I'm using the latest version of AutoIt, and most of the functions have been updated since your release, so the code I'm writing is meant for the current version.
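To illustrate the kind of adjustment an older release needs, here is a sketch under two assumptions about AutoIt 3.3.8.1 that should be verified in its help file: that _FileListToArray() there takes only path, filter and flag (which would explain the "Incorrect number of parameters" error), and that it returns bare file names rather than full paths (which would explain why Documents.Open fails, since Word resolves a bare name against its own working folder).

```autoit
#include <File.au3>

; Three-parameter call: path, filter, flag (1 = files only)
Local $aWordFiles = _FileListToArray(@ScriptDir, "*.docx", 1)
If @error Then Exit MsgBox(48, "Error", "No .docx files found, error id: " & @error)

Local $oWord = ObjCreate("Word.Application")
If Not IsObj($oWord) Then Exit MsgBox(48, "Error", "Could not start Word")

Local $oWordDoc
For $i = 1 To $aWordFiles[0]
    ; Prefix the folder so Word receives a full path instead of a bare file name
    $oWordDoc = $oWord.Documents.Open(@ScriptDir & "\" & $aWordFiles[$i])
    ConsoleWrite($aWordFiles[$i] & ": " & $oWordDoc.Paragraphs(1).Range.Text & @CRLF)
    $oWordDoc.Close()
Next
$oWord.Quit()
```

On a current release the extra path prefix is unnecessary if the function is asked to return full paths, as in the four-parameter call quoted earlier.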

 


#18 ·  Posted

@Subz and @water,

Guys, I need your help: where can I get the latest version of AutoIt as a zip package? I need it as a zip so that the admin-rights restrictions won't block my download. :>

I'm really getting nowhere with my old version; converting code written for the latest version back to the old one takes lots of adjustments.

BTW: is it allowed to ask for the latest version in our forum? I did not see anything about it in the forum rules. :D


#20 ·  Posted (edited)

@Floops Thanks, man, for sharing this with me. I appreciate your help. :D

@water The _Word_DocRangeSet works in the provided latest version. Thank you so much for your help.

@Subz I've encountered no errors with the latest version and your code works as expected. I just copied and pasted it into the latest version and it really rocks, man; it did the trick. :sweating: It takes a little more time to read the document, but I can manage that. Thank you!

 

Guys, thank you so much for your countless help. Now I can start real coding, and I'm excited to explore, read the basics and learn more about AutoIt. :drool:

 

Thanks,

KS15

 

Edited by KickStarter15
