Finding a string in a Word Document


I'm using the Word UDF for the first time, and I'm having some trouble with _Word_DocFind(). There isn't really much talk around the forums about this so it's hard to find any support on the issue I'm having. Here's my code:

#include <Word.au3>

$listPath = @ScriptDir & "\AMCH OFFSET 042617.docx"

$pWord = _Word_Create()
$oWord = _Word_DocOpen($pWord, $listPath)

Local $ctr = 0
Local $searchRange = _Word_DocFind($oWord, "Claim Number")
If Not @error Then
    $ctr += 1
EndIf
While ($searchRange <> 0)
    $searchRange = _Word_DocFind($oWord, "Claim Number", 0, $searchRange)
    If Not @error Then
        $ctr += 1
    EndIf
    $searchRange.Select
WEnd

My problem is that it doesn't seem to find a match of the string on any page after the second. When I run the script, it just loops indefinitely on the second page. I can't post an example of the word document because it is medical data, but every page is basically the same and every page has the string I am looking for on it. Also I tried checking @error after doing a find and it is never set, so I don't think that's the problem.

UHJvZmVzc2lvbmFsIENvbXB1dGVyZXI=


The following works for me:

#include <Word.au3>

Local $sListPath = @ScriptDir & "\AMCH OFFSET 042617.docx"

Local $oWord = _Word_Create()
Local $oDoc = _Word_DocOpen($oWord, $sListPath)

Local $vSearchRange = 0, $iSearchRange = 0 ; Last hit (0 = none yet) and hit counter
While 1
    ; First pass: search from the start. Later passes: continue after the previous hit
    $vSearchRange = IsObj($vSearchRange) ? _Word_DocFind($oDoc, "Claim Number", 0, $vSearchRange) : _Word_DocFind($oDoc, "Claim Number")
    If @error Then ExitLoop ; No further match (or the search failed)
    $iSearchRange += 1
    $vSearchRange.Select
WEnd

 


You do not process the first hit after the first _Word_DocFind. You need something like this (my example file contains "Claim Number x"):

#include <Word.au3>

Global $iCounter = 0
Global $sFilePath = @ScriptDir & "\Test.docx"
Global $oWord = _Word_Create()
Global $oDoc = _Word_DocOpen($oWord, $sFilePath)
Global $oRangeFound = _Word_DocFind($oDoc, "Claim Number")
If Not @error Then
    While 1
        $oRangeLine = _Word_DocRangeSet($oDoc, $oRangeFound, Default, Default, $wdCharacter, 2) ; Extend the range 2 characters to the right 
        ConsoleWrite($oRangeLine.Text & @CRLF) ; Write the found text to the console
        $iCounter += 1
        $oRangeFound = _Word_DocFind($oDoc, "Claim Number", 0, $oRangeFound)
        If @error Then ExitLoop
    WEnd
EndIf
ConsoleWrite($iCounter & " matches found!" & @CRLF)

 

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2021-11-10 - Version 1.6.0.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts
OutlookEX (NEW 2021-11-16 - Version 1.7.0.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX_GUI (2021-04-13 - Version 1.4.0.0) - Download
Outlook Tools (2019-07-22 - Version 0.6.0.0) - Download - General Help & Support - Wiki
PowerPoint (NEW 2021-08-31 - Version 1.5.0.0) - Download - General Help & Support - Example Scripts - Wiki
Task Scheduler (2019-12-03 - Version 1.5.1.0) - Download - General Help & Support - Wiki

Standard UDFs:
Excel - Example Scripts - Wiki
Word - Wiki

Tutorials:
ADO - Wiki
WebDriver - Wiki

 


Hey guys,

Thanks for the suggestions, but I still have the same issue. I tried both examples and they still don't progress past the second page of the document. I wasn't sure if it was something weird with the document but doing a CTRL + F in the document finds all occurrences of the string. I'll try creating a new document again to see if it works better.


Can you please post your test document so we can play with it?

  • 2 weeks later...

I'm searching for all occurrences of the word "History" in an ebook. I can't post the ebook because it's copyrighted. The loop keeps finding the first occurrence, even though I pass the returned range object back in. _Word_DocFind correctly calculates the start of the next search, but each new _Word_DocFind call still finds only the first occurrence. Not sure if something changed in Office 2016 or Windows 10? Debug output from the loop:

1094-1101
TEST1101 1199965ForwardTrue
1094-1101
TEST1101 1199965ForwardTrue
1094-1101
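
The output above shows the same range (1094-1101) coming back on every pass, which is what keeps the loop running forever. Until the cause is known, a guard that stops as soon as a hit fails to move past the previous one at least avoids the endless loop. A minimal sketch, not a fix for the underlying problem; _HitAdvanced and _CountHitsGuarded are made-up helper names and not part of the Word UDF:

```autoit
#include <Word.au3>

; True if a hit starting at $iStart lies at or beyond the end of the previous hit.
Func _HitAdvanced($iPrevEnd, $iStart)
    Return $iStart >= $iPrevEnd
EndFunc

; Count matches of $sText in $oDoc, but give up when the search stops advancing.
; Returns the number of distinct hits; @error = 1 means a hit repeated and the loop was aborted.
Func _CountHitsGuarded($oDoc, $sText)
    Local $iCount = 0, $iPrevEnd = -1
    Local $oHit = _Word_DocFind($oDoc, $sText)
    While IsObj($oHit) ; _Word_DocFind returns 0 (not an object) when nothing is found
        If Not _HitAdvanced($iPrevEnd, $oHit.Start) Then Return SetError(1, 0, $iCount)
        $iCount += 1
        $iPrevEnd = $oHit.End
        $oHit = _Word_DocFind($oDoc, $sText, 0, $oHit) ; Continue after the last hit
    WEnd
    Return $iCount
EndFunc
```

Called as _CountHitsGuarded($oDoc, "History"), this should return the total on a well-behaved document and come back with @error set on a document like the one described here, instead of hanging.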
 

 


A simple reproducer document would be fine to play with.
There is another thread with the same problem but the document is very complex.
I'm still running Office 2010 but in the near future will be able to test with Office 2016.


water and I have discussed what the problem might be, but he hasn't had time yet to look into it. It may be due to how the page breaks are laid out. For the record, I'm using Office 2007, which means the issue happens on 2007, 2010, and 2016.

Edited by anthonyjr2


Right now I'm sitting in front of my Windows PC, trying to find out what causes the problem.
I tested:

  • Section break (next page) - works as expected (finds all occurrences)
  • Section break (continuous) - works as expected (finds all occurrences)
  • Column break - works as expected (finds all occurrences)
  • To be continued

 

Edited by water


I removed all formatting by copying all content to an empty document and now it works as expected. The function finds all occurrences.
So it is not the function but the formatting that causes the problem.

I have now been playing with the document for more than an hour, to no avail.
In my opinion this is the "strangest" document I have ever laid my eyes upon.

Any chance to clean up the formatting of the document? For example, it doesn't use table borders but overlays the table with a picture containing a grid, etc.


To be honest the formatting of the document doesn't matter to me at all. The original document is a PDF file that I convert to a docx using Acrobat DC, since I couldn't find another easy way to do so. All I really need to do is grab some information, such as the claim number, from that document. Is that still possible if the formatting is removed?


You could write the text content of the document to a flat text file and then process this file:

#include <Word.au3>

Global $oWord = _Word_Create()
Global $oDoc = _Word_DocOpen($oWord, @ScriptDir & "\TestSearchFullCompatibilityMode.docx")
Global $sText = $oDoc.Content.Text ; Plain text of the whole document, without formatting
FileWrite("C:\temp\tt.txt", $sText) ; Appends if the file already exists; C:\temp must exist
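
Once the text is in a string, the values can be pulled out with plain string functions, without going through _Word_DocFind at all. A minimal sketch under assumptions: the two function names are made up for illustration, and the pattern assumes the value sits on the same line as the "Claim Number" label, optionally after ":" or "#". The real layout of the converted document may well differ, so the pattern will likely need adjusting:

```autoit
#include <StringConstants.au3>

; Count how often $sNeedle occurs in $sText (case-insensitive).
Func _CountOccurrences($sText, $sNeedle)
    StringReplace($sText, $sNeedle, "") ; Result is discarded, only the replacement count matters
    Return @extended ; StringReplace stores the number of replacements in @extended
EndFunc

; Return an array with the token that follows each "Claim Number" label.
; ASSUMPTION: label and value are on one line, e.g. "Claim Number: A-100".
; Sets @error = 1 and returns 0 if the label is not found.
Func _ExtractClaimNumbers($sText)
    Local $aHits = StringRegExp($sText, "(?i)Claim Number\h*[:#]?\h*(\w[\w-]*)", $STR_REGEXPARRAYGLOBALMATCH)
    If @error Then Return SetError(1, 0, 0)
    Return $aHits
EndFunc
```

With $sText = $oDoc.Content.Text, _CountOccurrences($sText, "Claim Number") gives the number of labels and _ExtractClaimNumbers($sText) the values that follow them.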

 


That's what I originally tried to do, but it ends up writing things out in weird locations, which makes it hard to structure my find correctly. I figured that if the document had some structure, such as a Word document, it would be easier to keep track of things because it is separated by pages.
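
Some page structure may survive the text export. Word represents manual page breaks and section breaks as form feeds (Chr(12)) in the document text, so if the converted file ends each page with such a break, the text can be split back into pages. A rough sketch under that assumption; it has not been tried on the Acrobat-converted file, automatic page breaks do not appear in the text, and the function names are made up for illustration:

```autoit
#include <StringConstants.au3>

; Split the document text into chunks at form feeds (Chr(12)).
; Returns a 0-based array without a count element.
Func _SplitAtPageBreaks($sText)
    Return StringSplit($sText, Chr(12), $STR_NOCOUNT)
EndFunc

; Return a comma-separated list of the 1-based chunk numbers that contain $sNeedle.
Func _ChunksContaining($sText, $sNeedle)
    Local $aPages = _SplitAtPageBreaks($sText), $sResult = ""
    For $i = 0 To UBound($aPages) - 1
        If StringInStr($aPages[$i], $sNeedle) Then
            If $sResult <> "" Then $sResult &= ","
            $sResult &= $i + 1
        EndIf
    Next
    Return $sResult
EndFunc
```

_ChunksContaining($oDoc.Content.Text, "Claim Number") would then list the pages (strictly speaking, the break-delimited chunks) on which the label appears.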

2 minutes ago, SkysLastChance said:

I can get it into Excel for you if that is any help

Nah, I'm not really trying to get it into Excel, all I need to do is grab some specific information from certain spots.
