anthonyjr2

Finding a string in a Word Document

19 posts in this topic

#1 ·  Posted

I'm using the Word UDF for the first time, and I'm having some trouble with _Word_DocFind(). There isn't really much talk around the forums about this so it's hard to find any support on the issue I'm having. Here's my code:

#include <Word.au3>

$listPath = @ScriptDir & "\AMCH OFFSET 042617.docx"

$pWord = _Word_Create()
$oWord = _Word_DocOpen($pWord, $listPath)

Local $ctr = 0
Local $searchRange = _Word_DocFind($oWord, "Claim Number")
If Not @error Then
    $ctr += 1
EndIf
While ($searchRange <> 0)
    $searchRange = _Word_DocFind($oWord, "Claim Number", 0, $searchRange)
    If Not @error Then
        $ctr += 1
    EndIf
    $searchRange.Select
WEnd

My problem is that it doesn't seem to find a match of the string on any page after the second. When I run the script, it just loops indefinitely on the second page. I can't post an example of the word document because it is medical data, but every page is basically the same and every page has the string I am looking for on it. Also I tried checking @error after doing a find and it is never set, so I don't think that's the problem.


UHJvZmVzc2lvbmFsIENvbXB1dGVyZXI=

#2 ·  Posted

The following works for me:

#include <Word.au3>
$sListPath = @ScriptDir & "\AMCH OFFSET 042617.docx"

$oWord = _Word_Create()
$oDoc = _Word_DocOpen($oWord, $sListPath)

Local $vSearchRange = 0, $iSearchRange = 0
While 1
    $vSearchRange = $vSearchRange = 0 ? _Word_DocFind($oDoc, "Claim Number") : _Word_DocFind($oDoc, "Claim Number", 0, $vSearchRange)
    If @error Then ExitLoop
    $iSearchRange += 1
    $vSearchRange.Select
WEnd

 

#3 ·  Posted

You do not process the first hit after the first _Word_DocFind. You need something like this (my example file contains "Claim Number x"):

#include <Word.au3>

Global $iCounter = 0
Global $sFilePath = @ScriptDir & "\Test.docx"
Global $oWord = _Word_Create()
Global $oDoc = _Word_DocOpen($oWord, $sFilePath)
Global $oRangeFound = _Word_DocFind($oDoc, "Claim Number")
If Not @error Then
    While 1
        $oRangeLine = _Word_DocRangeSet($oDoc, $oRangeFound, Default, Default, $wdCharacter, 2) ; Extend the range 2 characters to the right 
        ConsoleWrite($oRangeLine.Text & @CRLF) ; Write the found text to the console
        $iCounter += 1
        $oRangeFound = _Word_DocFind($oDoc, "Claim Number", 0, $oRangeFound)
        If @error Then ExitLoop
    WEnd
EndIf
ConsoleWrite($iCounter & " matches found!" & @CRLF)

 


My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2017-04-18 - Version 1.4.8.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (NEW 2017-02-27 - Version 1.3.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2015-04-01 - Version 0.4.0.0) - Download - General Help & Support - Example Scripts
Excel - Example Scripts - Wiki
Word - Wiki
PowerPoint (2015-06-06 - Version 0.0.5.0) - Download - General Help & Support

Tutorials:
ADO - Wiki

 

#4 ·  Posted

Hey guys,

Thanks for the suggestions, but I still have the same issue. I tried both examples and they still don't progress past the second page of the document. I wasn't sure if it was something weird with the document but doing a CTRL + F in the document finds all occurrences of the string. I'll try creating a new document again to see if it works better.


#5 ·  Posted

Can you please post your test document so we can play with it?


#6 ·  Posted

I PMed you @water


#7 ·  Posted

So did you find a solution? My search also keeps finding the same match and does not move on to the next one.

#8 ·  Posted

Can you post your document and the word you're searching for?

#9 ·  Posted

I'm searching for all occurrences of the word "History" in an ebook. I can't post the ebook because it's copyrighted. The loop keeps finding the first occurrence, even though I pass back the original range object. _Word_DocFind correctly calculates the start of the next search, but each new iteration still finds only the first occurrence. I'm not sure whether something changed in Office 2016 or Windows 10.

1094-1101
TEST1101 1199965ForwardTrue
1094-1101
TEST1101 1199965ForwardTrue
1094-1101
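
A loop like the one described above can at least be protected against running forever. The following is only a minimal sketch, not a fix for the underlying problem: it remembers where the previous hit started and stops as soon as _Word_DocFind returns a range that does not lie further down in the document. The file name and the variable names are made up for illustration.

```autoit
#include <Word.au3>

; Hypothetical file name and search term, used only for this sketch
Global $sFilePath = @ScriptDir & "\Test.docx"
Global $sFindText = "History"

Global $oWord = _Word_Create()
If @error Then Exit ConsoleWrite("Could not start Word, @error = " & @error & @CRLF)
Global $oDoc = _Word_DocOpen($oWord, $sFilePath)
If @error Then Exit ConsoleWrite("Could not open document, @error = " & @error & @CRLF)

Global $iCounter = 0, $iLastStart = -1
Global $oRangeFound = _Word_DocFind($oDoc, $sFindText)
Global $iError = @error ; save @error right away, later calls reset it
While $iError = 0
    ; Guard: a hit that does not start after the previous one means the search is stuck
    If $oRangeFound.Start <= $iLastStart Then
        ConsoleWrite("Search stalled at position " & $oRangeFound.Start & ", stopping." & @CRLF)
        ExitLoop
    EndIf
    $iLastStart = $oRangeFound.Start
    $iCounter += 1
    ; Continue the search after the last hit
    $oRangeFound = _Word_DocFind($oDoc, $sFindText, 0, $oRangeFound)
    $iError = @error
WEnd
ConsoleWrite($iCounter & " matches found before the loop ended" & @CRLF)
```

With output like the repeated "1094-1101" above, the guard would fire on the second iteration, so the script ends with a message instead of looping indefinitely.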
 

 

#10 ·  Posted

A simple reproducer document would be fine to play with.
There is another thread with the same problem but the document is very complex.
I'm still running Office 2010 but in the near future will be able to test with Office 2016.


#11 ·  Posted (edited)

water and I have discussed what the problem might be, but he hasn't had time to look into it yet. It may be due to how the page breaks are laid out. For the record, I'm using Office 2007, which means the issue happens on 2007, 2010, and 2016.

Edited by anthonyjr2

#12 ·  Posted (edited)

Right now I'm sitting in front of my Windows PC and trying to find out what causes the problem.
I tested:

  • Section break (next page) - works as expected (finds all occurrences)
  • Section break (continuous)  - works as expected (finds all occurrences)
  • Column break  - works as expected (finds all occurrences)
  • To be continued

 

Edited by water
#13 ·  Posted

I removed all formatting by copying all content to an empty document and now it works as expected. The function finds all occurrences.
So it is not the function but the formatting that causes the problem.

I have now been playing with the document for more than an hour - to no avail.
In my opinion this is the "strangest" document I have ever laid my eyes upon.

Any chance to clean up the formatting of the document? For example, it doesn't use table borders but overlays the table with a picture containing a grid, etc.


#14 ·  Posted

To be honest the formatting of the document doesn't matter to me at all. The original document is a PDF file that I actually convert to a docx using Acrobat DC, since I couldn't find another easy way to do so. All I really need to do is grab some information such as the claim number from that document. Is that still possible if the formatting is removed?


#15 ·  Posted

You could write the text content of the document to a flat text file and then process this file:

#include <Word.au3>
#include <MsgBoxConstants.au3>
Global $oWord = _Word_Create()
Global $oDoc = _Word_DocOpen($oWord, @ScriptDir & "\TestSearchFullCompatibilityMode.docx")
Global $sText = $oDoc.Content.Text
FileWrite("C:\temp\tt.txt", $sText)
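
The text can also be processed in memory without writing a file first. Here is a rough sketch of that idea using StringRegExp. The sample string stands in for $oDoc.Content.Text (Word separates paragraphs with @CR), and the "Claim Number: <digits>" layout is purely an assumption, since the real document was never posted; the pattern would have to be adapted to the actual layout.

```autoit
#include <StringConstants.au3>

; Sample text standing in for $oDoc.Content.Text.
; The "Claim Number: <digits>" layout is an assumption made for this sketch.
Local $sText = "Claim Number: 1001" & @CR & "Patient A" & @CR & "Claim Number: 1002" & @CR

; Capture the digits that follow each "Claim Number" label (case-insensitive)
Local $aClaims = StringRegExp($sText, "(?i)Claim Number:?\s*(\d+)", $STR_REGEXPARRAYGLOBALMATCH)
If @error Then
    ConsoleWrite("No claim numbers found" & @CRLF)
Else
    For $i = 0 To UBound($aClaims) - 1
        ConsoleWrite($aClaims[$i] & @CRLF)
    Next
    ConsoleWrite(UBound($aClaims) & " matches found" & @CRLF)
EndIf
```

Because this only looks at plain text, it does not depend on how the document is formatted, but it also knows nothing about pages.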

 


#16 ·  Posted

That's what I originally tried to do, but it ends up writing things out in weird locations which makes it hard to structure my find correctly. I figured if the document had some structure such as a word document it would be easier to keep track of things because it is separated by pages.
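
One possible way to keep the page information without relying on the find function is to walk the paragraphs of the document and ask Word for the page each one ends on. This is an untested sketch against a hypothetical Test.docx. The constant name is made up here; the value 3 is Word's wdActiveEndPageNumber. Text that sits in text boxes or other shapes is not part of the main paragraph collection, so a document converted from PDF may still hide some of its text from this loop.

```autoit
#include <Word.au3>

; Value of Word's WdInformation constant wdActiveEndPageNumber (the name is local to this sketch)
Global Const $iWdActiveEndPageNumber = 3

Global $oWord = _Word_Create()
If @error Then Exit ConsoleWrite("Could not start Word, @error = " & @error & @CRLF)
Global $oDoc = _Word_DocOpen($oWord, @ScriptDir & "\Test.docx") ; hypothetical file name
If @error Then Exit ConsoleWrite("Could not open document, @error = " & @error & @CRLF)

Global $sParaText, $iCounter = 0
For $oPara In $oDoc.Paragraphs
    $sParaText = $oPara.Range.Text
    If StringInStr($sParaText, "Claim Number") Then
        $iCounter += 1
        ; Print the page the paragraph ends on and the paragraph text without surrounding whitespace
        ConsoleWrite("Page " & $oPara.Range.Information($iWdActiveEndPageNumber) & ": " & StringStripWS($sParaText, 3) & @CRLF)
    EndIf
Next
ConsoleWrite($iCounter & " paragraphs contain the search text" & @CRLF)
```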


#17 ·  Posted

I have monarch with the PDF. I can get it in to excel for you if that is any help. :/


Life's simple. You make choices and you don't look back.

#18 ·  Posted (edited)

<double post>

Edited by SkysLastChance

#19 ·  Posted

2 minutes ago, SkysLastChance said:

I can get it in to excel for you if that is any help

Nah, I'm not really trying to get it into Excel, all I need to do is grab some specific information from certain spots.

