word find chars like regexp


Solved by mikell

Posted

I'm trying to find and replace some characters in a .docx file like this:

#include <Word.au3>

Local $oWord = _Word_Create()
Local $oDoc = _Word_DocOpen($oWord, "test.docx")
Local $oRangeFound, $oRangeText, $oSearchRange = _Word_DocRangeSet($oDoc, -1)

; find at least 2 spaces only after numbers: "wordA 123  wordB  wordC" --> "wordA 123 wordB  wordC"
$oRangeFound = _Word_DocFindReplace($oDoc, "([0-9]*) {2,}", "\1", 2, 0, 0,0,1)

It does not work...

How to do it?

 

Posted

_Word_DocFindReplace does not support Regular Expressions.
I'm not sure it is possible with Word at all.

Posted

The only way I can think, and I have never done this, is to open the entire document as an object and step through in portions. Run the Regex on a portion at a time...?
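A minimal sketch of that idea, not tested against a real document: walk the paragraphs, pull each paragraph's text into AutoIt, and let StringRegExpReplace (which does support full PCRE) do the matching. The file path, the paragraph-by-paragraph granularity and the replacement string are all assumptions made for illustration.

```autoit
#include <Word.au3>
#include <MsgBoxConstants.au3>

; Sketch only: path and per-paragraph strategy are assumptions, not part of the thread.
Local $oWord = _Word_Create()
If @error Then Exit MsgBox($MB_ICONERROR, "Error", "Could not start Word, @error = " & @error)
Local $oDoc = _Word_DocOpen($oWord, @ScriptDir & "\test.docx")
If @error Then Exit MsgBox($MB_ICONERROR, "Error", "Could not open the document, @error = " & @error)

Local Const $iWdCharacter = 1 ; numeric value of Word's wdCharacter unit
Local $oRange, $sOld, $sNew

For $oPara In $oDoc.Paragraphs
    $oRange = $oPara.Range
    $oRange.MoveEnd($iWdCharacter, -1) ; keep the paragraph mark out of the range
    $sOld = $oRange.Text
    ; PCRE: a digit followed by two or more spaces becomes the digit plus one space
    $sNew = StringRegExpReplace($sOld, "(\d) {2,}", "$1 ")
    ; Write back only when something changed (== is a case-sensitive string compare)
    If Not ($sNew == $sOld) Then $oRange.Text = $sNew
Next
```

On the sample string "wordA 123  wordB  wordC" the regular expression itself yields "wordA 123 wordB  wordC". Be aware that assigning to Range.Text may flatten mixed character formatting inside the paragraph, so this is more of a fallback than a replacement for Word's own Find.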

Posted
  On 11/9/2021 at 12:58 PM, frank10 said:

How to do it?

Word doesn’t seem to support “*” greedy character.

What happens if you just omit it:

$oRangeFound = _Word_DocFindReplace($oDoc, "([0-9]) {2,}", "\1", 2, 0, 0,0,1)


Posted (edited)
  On 11/9/2021 at 9:06 PM, JockoDundee said:

Word doesn’t seem to support “*” greedy character.

What happens if you just omit it:

$oRangeFound = _Word_DocFindReplace($oDoc, "([0-9]) {2,}", "\1", 2, 0, 0,0,1)

Doesn't work.

It's not the * that causes the problem; it seems to be the "{2,}".

In fact this works:

$oRangeFound = _Word_DocFindReplace($oDoc, "([0-9]*)[ ]*", "\1°", 2, 0, 0,0,1)

BUT since * means 0 or more, it also changes numbers followed by a single space... (strangely, it should also change a number followed by no space at all, yet it leaves that case alone...)

123word     --> no change
123 word    --> 123°word
123  word   --> 123° word



What I want is to change only runs of 2 or more spaces...

Of course I can do workarounds like:

$oRangeFound = _Word_DocFindReplace($oDoc, "([0-9]*)   ", "\1 ", 2, 0, 0,0,1)
$oRangeFound = _Word_DocFindReplace($oDoc, "([0-9]*)  ", "\1 ", 2, 0, 0,0,1)

But it would be much better to find a proper regexp-like pattern, also for future uses...
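The fixed-width workaround above could also be written as a loop, so that it does not depend on guessing the longest run of spaces. This is only a sketch: the file path, the replacement string "\1 " and the cap of 50 passes are assumptions, and it relies on _Word_DocFindReplace returning 0 (with @error set) once nothing matches.

```autoit
#include <Word.au3>

Local $oWord = _Word_Create()
Local $oDoc = _Word_DocOpen($oWord, @ScriptDir & "\test.docx") ; path is an assumption

; Each pass turns "digit + 2 spaces" into "digit + 1 space",
; so a run of N spaces after a digit needs N-1 passes.
Local $iPass = 0
While $iPass < 50 ; arbitrary safety cap
    ; 8th parameter True = use Word wildcards, needed for the ( ) group and \1
    If Not _Word_DocFindReplace($oDoc, "([0-9])  ", "\1 ", $WdReplaceAll, 0, False, False, True) Then ExitLoop
    $iPass += 1
WEnd
ConsoleWrite("Passes that replaced something: " & $iPass & @CRLF)
```

The loop also stops if the call fails for another reason, so checking @error after the loop would be a sensible addition in real code.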

Edited by frank10
Posted

Extended -2147352567 (decimal) is 0x80020009 (hex), i.e. DISP_E_EXCEPTION ("Exception occurred"), which is only a general error.
You need a full COM error handler to get more detailed information about the error.
Unfortunately this is a bit complex because of the way the Word UDF handles COM errors.

#include <Word.au3>
#include <MsgBoxConstants.au3>
#include <StringConstants.au3> ; defines $STR_STRIPTRAILING used in the error handler
Global $oError = ObjEvent("AutoIt.Error", "__Word_COMErrFuncEX")

; Add your code to find/replace text here by calling the modified _Word_DocFindReplaceEX function!

Exit

; #FUNCTION# ====================================================================================================================
; Author ........: water (based on the Word UDF written by Bob Anthony)
; Modified ......:
; ===============================================================================================================================
Func _Word_DocFindReplaceEX($oDoc, $sFindText = Default, $sReplaceWith = Default, $iReplace = Default, $vSearchRange = Default, $bMatchCase = Default, $bMatchWholeWord = Default, $bMatchWildcards = Default, $bMatchSoundsLike = Default, $bMatchAllWordForms = Default, $bForward = Default, $iWrap = Default, $bFormat = Default)
    If $sFindText = Default Then $sFindText = ""
    If $sReplaceWith = Default Then $sReplaceWith = ""
    If $iReplace = Default Then $iReplace = $WdReplaceAll
    If $vSearchRange = Default Then $vSearchRange = 0
    If $bMatchCase = Default Then $bMatchCase = False
    If $bMatchWholeWord = Default Then $bMatchWholeWord = False
    If $bMatchWildcards = Default Then $bMatchWildcards = False
    If $bMatchSoundsLike = Default Then $bMatchSoundsLike = False
    If $bMatchAllWordForms = Default Then $bMatchAllWordForms = False
    If $bForward = Default Then $bForward = True
    If $iWrap = Default Then $iWrap = $WdFindContinue
    If $bFormat = Default Then $bFormat = False
    If Not IsObj($oDoc) Then Return SetError(1, 0, 0)
    Switch $vSearchRange
        Case -1
            $vSearchRange = $oDoc.Application.Selection.Range
        Case 0
            $vSearchRange = $oDoc.Range()
        Case Else
            If Not IsObj($vSearchRange) Then Return SetError(2, 0, 0)
    EndSwitch
    Local $oFind = $vSearchRange.Find
    $oFind.ClearFormatting()
    $oFind.Replacement.ClearFormatting()
    Local $bReturn = $oFind.Execute($sFindText, $bMatchCase, $bMatchWholeWord, $bMatchWildcards, $bMatchSoundsLike, _
            $bMatchAllWordForms, $bForward, $iWrap, $bFormat, $sReplaceWith, $iReplace)
    If @error Or Not $bReturn Then Return SetError(3, @error, 0)
    Return 1
EndFunc   ;==>_Word_DocFindReplaceEX

Func __Word_COMErrFuncEX()
    Local $sHexNumber = Hex($oError.number, 8)
    Local $sError = "COM Error Encountered in " & @ScriptName & @CRLF & _
            "@AutoItVersion = " & @AutoItVersion & @CRLF & _
            "@AutoItX64 = " & @AutoItX64 & @CRLF & _
            "@Compiled = " & @Compiled & @CRLF & _
            "@OSArch = " & @OSArch & @CRLF & _
            "@OSVersion = " & @OSVersion & @CRLF & _
            "Scriptline = " & $oError.scriptline & @CRLF & _
            "NumberHex = 0x" & $sHexNumber & @CRLF & _
            "Number = " & $oError.number & @CRLF & _
            "WinDescription = " & StringStripWS($oError.WinDescription, $STR_STRIPTRAILING) & @CRLF & _
            "Description = " & StringStripWS($oError.description, $STR_STRIPTRAILING) & @CRLF & _
            "Source = " & $oError.Source & @CRLF & _
            "HelpFile = " & $oError.HelpFile & @CRLF & _
            "HelpContext = " & $oError.HelpContext & @CRLF & _
            "LastDllError = " & $oError.LastDllError
    MsgBox($MB_ICONERROR, "Debug Info", $sError)
EndFunc   ;==>__Word_COMErrFuncEX
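For illustration, the placeholder comment in the script above might be filled in roughly like this. It is a fragment, not a standalone script: it relies on the includes, the ObjEvent handler and the _Word_DocFindReplaceEX function defined above, and the file path is an assumption.

```autoit
; Hypothetical body for the "Add your code ..." placeholder in the script above
Local $oWord = _Word_Create()
Local $oDoc = _Word_DocOpen($oWord, @ScriptDir & "\test.docx") ; path is an assumption

; Same pattern as in the first post; the 8th parameter (True) enables wildcards
_Word_DocFindReplaceEX($oDoc, "([0-9]*) {2,}", "\1", $WdReplaceAll, 0, False, False, True)
If @error Then ConsoleWrite("_Word_DocFindReplaceEX failed: @error = " & @error & ", @extended = " & @extended & @CRLF)
```

If Word rejects the pattern, the registered handler should show the "Debug Info" box with the Description field before the ConsoleWrite line runs.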

 

Posted

Ok, that error says:

"Description = The Find What text contains a Pattern Match expression which is not valid."

But how should it be written?

([0-9]*)[ ]{2,}

According to the links above, it looks correct...

Posted

I get the impression that MS Word does not fully support Regular Expressions (https://vlasovstudio.com/regent/documentation/Microsoft-Word-Wildcards-as-Regular-Expressions.html).
Unfortunately I'm not familiar with the wildcards supported by MS Word. But I suggest trying the pattern in MS Word itself before using _Word_DocFindReplace.


  • Solution
Posted
  On 11/10/2021 at 8:09 AM, frank10 said:

Yes, they say that you can use {2,}, but in fact it doesn't work...

  On 11/10/2021 at 7:31 AM, frank10 said:

BUT since * means 0 or more, it also changes numbers followed by a single space... (strangely, it should also change a number followed by no space at all, yet it leaves that case alone...)

So did you try this?
"([0-9]*)[ ][ ]*"

Posted (edited)
  On 11/10/2021 at 1:07 PM, mikell said:

So did you try this?
"([0-9]*)[ ][ ]*"

Thank you mikell: good catch!

The only issue is that your pattern also gives:

123a                        --> no change
123 aa                      --> no change
123  aaa                    --> ok
123   aaa                   --> ok
wordA 123 wordB   wordC     --> NOT ok: it also changes wordB   wordC...

Instead with this:

"([0-9])[ ][ ]*"

it's perfect!

123a                        --> no change
123 aa                      --> no change
123  aaa                    --> ok
123   aaa                   --> ok
wordA 123 wordB   wordC     --> no change

 

Good workaround.
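Putting the thread together, a complete script using the pattern confirmed above might look like the following sketch. The file path and the replacement string "\1 " (digit plus a single space) are assumptions; the thread does not spell out the final replacement text.

```autoit
#include <Word.au3>
#include <MsgBoxConstants.au3>

Local $oWord = _Word_Create()
If @error Then Exit MsgBox($MB_ICONERROR, "Error", "Could not start Word, @error = " & @error)

Local $oDoc = _Word_DocOpen($oWord, @ScriptDir & "\test.docx") ; path is an assumption
If @error Then Exit MsgBox($MB_ICONERROR, "Error", "Could not open the document, @error = " & @error)

; Pattern from the last post; 8th parameter True = use Word wildcards.
; "\1 " (assumed) puts back the captured digit followed by one space.
_Word_DocFindReplace($oDoc, "([0-9])[ ][ ]*", "\1 ", $WdReplaceAll, 0, False, False, True)
If @error Then ConsoleWrite("Nothing replaced or call failed: @error = " & @error & @CRLF)
```

A possible explanation for the results in this thread, offered with caution: in Word's wildcard mode `*` stands for "any string of characters" rather than "zero or more of the previous item", and `@` is the "one or more" quantifier, so `[0-9]*` does not mean "any number of digits" there. The `{2,}` form is also reported to depend on the Windows list separator (`{2;}` on systems that use a semicolon); neither point was verified in this thread.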

Edited by frank10
