Finding a string in a Word Document

anthonyjr2

I'm using the Word UDF for the first time, and I'm having some trouble with _Word_DocFind(). There isn't really much talk around the forums about this so it's hard to find any support on the issue I'm having. Here's my code:

#include <Word.au3>

$listPath = @ScriptDir & "\AMCH OFFSET 042617.docx"

$pWord = _Word_Create()
$oWord = _Word_DocOpen($pWord, $listPath)

Local $ctr = 0
Local $searchRange = _Word_DocFind($oWord, "Claim Number")
If Not @error Then
    $ctr += 1
EndIf
While ($searchRange <> 0)
    $searchRange = _Word_DocFind($oWord, "Claim Number", 0, $searchRange)
    If Not @error Then
        $ctr += 1
    EndIf
    $searchRange.Select
WEnd

My problem is that it doesn't seem to find a match of the string on any page after the second. When I run the script, it just loops indefinitely on the second page. I can't post an example of the word document because it is medical data, but every page is basically the same and every page has the string I am looking for on it. Also I tried checking @error after doing a find and it is never set, so I don't think that's the problem.


UHJvZmVzc2lvbmFsIENvbXB1dGVyZXI=

Subz

The following works for me:

#include <Word.au3>
$sListPath = @ScriptDir & "\AMCH OFFSET 042617.docx"

$oWord = _Word_Create()
$oDoc = _Word_DocOpen($oWord, $sListPath)

Local $vSearchRange = 0, $iSearchRange = 0
While 1
    ; First pass searches the whole document; later passes continue from the previous hit
    $vSearchRange = IsObj($vSearchRange) ? _Word_DocFind($oDoc, "Claim Number", 0, $vSearchRange) : _Word_DocFind($oDoc, "Claim Number")
    If @error Then ExitLoop
    $iSearchRange += 1
    $vSearchRange.Select
WEnd

 

water

You do not process the first hit after the first _Word_DocFind. You need something like this (my example file contains: "Claim Number x")

#include <Word.au3>

Global $iCounter = 0
Global $sFilePath = @ScriptDir & "\Test.docx"
Global $oWord = _Word_Create()
Global $oDoc = _Word_DocOpen($oWord, $sFilePath)
Global $oRangeFound = _Word_DocFind($oDoc, "Claim Number")
If Not @error Then
    While 1
        $oRangeLine = _Word_DocRangeSet($oDoc, $oRangeFound, Default, Default, $wdCharacter, 2) ; Extend the range 2 characters to the right 
        ConsoleWrite($oRangeLine.Text & @CRLF) ; Write the found text to the console
        $iCounter += 1
        $oRangeFound = _Word_DocFind($oDoc, "Claim Number", 0, $oRangeFound)
        If @error Then ExitLoop
    WEnd
EndIf
ConsoleWrite($iCounter & " matches found!" & @CRLF)

 


My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2018-06-01 - Version 1.4.9.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (2018-01-27 - Version 1.3.3.1) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2015-04-01 - Version 0.4.0.0) - Download - General Help & Support - Example Scripts
Excel - Example Scripts - Wiki
Word - Wiki
PowerPoint (2015-06-06 - Version 0.0.5.0) - Download - General Help & Support

Tutorials:
ADO - Wiki

 

anthonyjr2

Hey guys,

Thanks for the suggestions, but I still have the same issue. I tried both examples and they still don't progress past the second page of the document. I wasn't sure if it was something weird with the document but doing a CTRL + F in the document finds all occurrences of the string. I'll try creating a new document again to see if it works better.


water

Can you please post your test document so we can play with it?


anthonyjr2

I PMed you @water


Broadcastic

So did you find a solution? My search also keeps finding the same match and does not move on to the next one.

Subz

Can you post your document and the word you're searching for?

Broadcastic

I'm searching for all occurrences of the word "History" in an ebook. I can't post the ebook because it's copyrighted. The loop keeps finding the first occurrence, even though I pass the returned range object back in. _Word_DocFind correctly calculates the start of the next search, but each new iteration finds only the first occurrence again. Not sure if something changed in Office 2016 or Windows 10?

1094-1101
TEST1101 1199965ForwardTrue
1094-1101
TEST1101 1199965ForwardTrue
1094-1101
 

 

water

A simple reproducer document would be fine to play with.
There is another thread with the same problem, but the document is very complex.
I'm still running Office 2010 but in the near future will be able to test with Office 2016.


anthonyjr2

water and I have discussed what the problem might be, but he hasn't had time yet to look into it. It may be due to how the page breaks are laid out. For the record, I'm using Office 2007, which means the issue happens on 2007, 2010, and 2016.

water

Right now I'm sitting in front of my Windows PC, trying to find out what causes the problem.
I tested:

  • Section break (next page) - works as expected (finds all occurrences)
  • Section break (continuous)  - works as expected (finds all occurrences)
  • Column break  - works as expected (finds all occurrences)
  • To be continued

 

water

I removed all formatting by copying all content to an empty document and now it works as expected. The function finds all occurrences.
So it is not the function but the formatting that causes the problem.

I have now been playing with the document for more than an hour - to no avail.
In my opinion this is the "strangest" document I have ever laid my eyes upon.

Any chance to clean up the formatting of the document? For example, it doesn't use table borders but overlays the table with a picture containing a grid, and so on.
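
The "copy the content to an empty document" step described above could be scripted roughly as follows. This is only a sketch under assumptions: "Test.docx" is a placeholder file name, and transferring the text through the Content.Text property (with the table cell markers, Chr(7), stripped) is one guess at how to drop the formatting, not necessarily how the manual test was done.

```autoit
#include <Word.au3>

Local $oWord = _Word_Create()
If @error Then Exit 1
; Placeholder path: replace with the converted document
Local $oDoc = _Word_DocOpen($oWord, @ScriptDir & "\Test.docx")
If @error Then Exit 2

; Create a blank document and copy only the plain text into it.
; Assumption: Chr(7) cell markers from tables are not wanted in the copy.
Local $oPlain = _Word_DocAdd($oWord)
If @error Then Exit 3
$oPlain.Content.Text = StringReplace($oDoc.Content.Text, Chr(7), "")

; Same search loop as in the earlier example, run against the plain copy
Local $iCounter = 0
Local $oRangeFound = _Word_DocFind($oPlain, "Claim Number")
If Not @error Then
    While 1
        $iCounter += 1
        $oRangeFound = _Word_DocFind($oPlain, "Claim Number", 0, $oRangeFound)
        If @error Then ExitLoop
    WEnd
EndIf
ConsoleWrite($iCounter & " matches found!" & @CRLF)
```

The original file is left untouched, so the plain copy can be closed without saving once the values have been read.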


anthonyjr2

To be honest, the formatting of the document doesn't matter to me at all. The original document is a PDF file that I convert to a docx using Acrobat DC, since I couldn't find another easy way to do so. All I really need to do is grab some information, such as the claim number, from that document. Is that still possible if the formatting is removed?


water

You could write the text content of the document to a flat text file and then process this file:

#include <Word.au3>
#include <MsgBoxConstants.au3>
Global $oWord = _Word_Create()
Global $oDoc = _Word_DocOpen($oWord, @ScriptDir & "\TestSearchFullCompatibilityMode.docx")
Global $sText = $oDoc.Content.Text
FileWrite("C:\temp\tt.txt", $sText)
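
Once the text is available as a plain string, the values could be pulled out with a regular expression instead of _Word_DocFind. A minimal sketch, assuming the value follows the "Claim Number" label and consists of letters and digits; the real layout of the document is not known, so the pattern and the sample text are hypothetical:

```autoit
#include <StringConstants.au3>

; Hypothetical sample standing in for $oDoc.Content.Text
; (Word separates paragraphs in that string with @CR)
Local $sText = "Claim Number: A123" & @CR & "Some other line" & @CR & "Claim Number: B456" & @CR

; Capture the token that follows each "Claim Number" label (assumed layout)
Local $aClaims = StringRegExp($sText, "(?i)Claim Number\W*(\w+)", $STR_REGEXPARRAYGLOBALMATCH)
If @error Then
    ConsoleWrite("No matches found" & @CRLF)
Else
    For $i = 0 To UBound($aClaims) - 1
        ConsoleWrite($aClaims[$i] & @CRLF)
    Next
    ConsoleWrite(UBound($aClaims) & " matches found!" & @CRLF)
EndIf
```

With the global-match flag and a single capturing group, the returned array holds only the captured values, one entry per hit.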

 


anthonyjr2

That's what I originally tried to do, but it ends up writing things out in weird locations, which makes it hard to structure my find correctly. I figured that if the document had some structure, as a Word document does, it would be easier to keep track of things because it is separated by pages.
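
One way to keep the page association without relying on _Word_DocFind might be to walk the paragraphs and ask Word which page each one ends on. This is a sketch under assumptions: "Test.docx" is a placeholder, the constant is defined locally with the value of Word's wdActiveEndPageNumber, and text that the PDF conversion placed in text boxes or frames may not show up in the main Paragraphs collection.

```autoit
#include <Word.au3>
#include <StringConstants.au3>

; Value of wdActiveEndPageNumber in Word's WdInformation enumeration
Local Const $iActiveEndPageNumber = 3

Local $oWord = _Word_Create()
If @error Then Exit 1
; Placeholder path: replace with the converted document
Local $oDoc = _Word_DocOpen($oWord, @ScriptDir & "\Test.docx")
If @error Then Exit 2

Local $sLine, $iPage
For $oParagraph In $oDoc.Paragraphs
    $sLine = $oParagraph.Range.Text
    If StringInStr($sLine, "Claim Number") Then
        ; Page on which this paragraph ends
        $iPage = $oParagraph.Range.Information($iActiveEndPageNumber)
        ConsoleWrite("Page " & $iPage & ": " & StringStripWS($sLine, BitOR($STR_STRIPLEADING, $STR_STRIPTRAILING)) & @CRLF)
    EndIf
Next
```

On a long document this is slower than a Find loop because every paragraph is visited, but it does not depend on the search position advancing.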


SkysLastChance

I have Monarch and can run the PDF through it. I can get it into Excel for you if that is any help. :/


Life's simple. You make choices and you don't look back.

anthonyjr2
2 minutes ago, SkysLastChance said:

I can get it into Excel for you if that is any help

Nah, I'm not really trying to get it into Excel. All I need to do is grab some specific information from certain spots.

