StringRegExp -- match more than one pattern


G'day everyone

On a line like this:

$foo = '<f1>This is the </f1><f2>BIGGEST</f2> one<s0/>!'

I can either do this:

$tags = StringRegExp ($foo, '(<[/]{0,1}[a-z]{0,3}[0-9]{0,5}[/]{0,1}>)', 3)

to get <f1>, </f1>, <f2>, </f2> and <s0/>

or I can do this:

$tags = StringRegExp ($foo, '([A-Z]+[a-z]*)', 3)

to get "This" and "BIGGEST"

But how can I get both?

I mean, how can I get the array being:

<f1>, This, </f1>, <f2>, BIGGEST, </f2> and <s0/>

If I do this:

$tags = StringRegExp ($foo, '(<[/]{0,1}[a-z]{0,3}[0-9]{0,5}[/]{0,1}>)([A-Z]+[a-z]*)', 3)

Then the array is only this:

<f1>, This, <f2>, BIGGEST

My question is how can I get both patterns evaluated at the same time?

Thanks


  • Moderators

Others will post code.. but to be honest, I'm not going to until I can see what exactly you're asking about.

Are you saying you don't want two elements? [0] = This and [1] = Biggest ... That you only want one ... [0] = This Biggest?

Try to be much more specific about what you want the output to be when posting regex questions... so others don't waste their time coming up with solutions that don't meet your needs.

Edited by SmOke_N

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.


On a line like this:

$foo = '<f1>This is the </f1><f2>BIGGEST</f2> one<s0/>!'

I've tinkered some more and found that this line works (almost):

$tags = StringRegExp ($foo, '(<[/]{0,1}[a-z]{0,3}[0-9]{0,5}[/]{0,1}>)+?|([A-Z]+[a-z]*)+?', 3)

...except that for some reason two blank array items are added at positions 2 and 6 (just before "This" and "BIGGEST").
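
A likely explanation for the blank items: with flag 3, each match returns its capturing groups in order, so whenever the second alternative matches, the unused first group comes back as an empty string in front of it. Below is a minimal sketch that wraps both alternatives in a single group instead. The variable names and the ConsoleWrite loop are only for illustration, and {0,1} has been shortened to ?.

```autoit
Local $foo = '<f1>This is the </f1><f2>BIGGEST</f2> one<s0/>!'

; One capturing group around both alternatives: every match fills group 1,
; so no empty placeholder is returned for an unused group.
Local $tags = StringRegExp($foo, '(<[/]?[a-z]{0,3}[0-9]{0,5}[/]?>|[A-Z]+[a-z]*)', 3)

If IsArray($tags) Then
    ; Print each element with its index to check the order
    For $i = 0 To UBound($tags) - 1
        ConsoleWrite($i & ': ' & $tags[$i] & @CRLF)
    Next
Else
    ConsoleWrite('No matches' & @CRLF)
EndIf
```

On the sample line this should list <f1>, This, </f1>, <f2>, BIGGEST, </f2> and <s0/> in that order, with no empty elements in between.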


  • Moderators

If you don't mind the array having more than one element, this should fit your needs:
"<f\d+>(\w+).*?</f\d+>"


Thanks for asking... I did not realise that my question was unclear. I did not want to post lots and lots of irrelevant code, see.

I'm working on a script that will read a line of XML-like text from a file, reduce it to the XML-like tags, and then enable the user to paste the XML-like tags into some text editor one after the other. The script that does this works fine (as long as the line of text actually contains tags). Here it is:

#cs

On my keyboard, the M, < and > are next to each other bottom right.  So I used
these three keys for selecting the tags and accepting it.  If you use the home
row, and you drop your hand one row, the < and > should be under your middle
and ring finger, and the M should be under your forefinger.  On the home row,
the K is under your middle finger.

Ctrl+< = previous tag
Ctrl+> = next tag
Ctrl+M = accept tag
Ctrl+K = accept tag

You can also accept the tag by pressing the left or right arrow, obviously, but
Ctrl+M (or Ctrl+K) will ensure that the cursor is directly after the tag and not one position
away from it.

If you use Ctrl <, >, M and K anywhere else, nothing will happen.

#ce

Global $foo
Global $tags
Global $len
Global $ben
Global $numtags
Global $j
Global $a

HotKeySet("^m", "Cancel")
HotKeySet("^k", "Cancel")
HotKeySet("^.", "NextTag")
HotKeySet("^,", "PrevTag")

$j = 0
$a = 1
$z = 1

$time1 = FileGetTime ("source.txt", 0, 1)
; FileClose needs the handle returned by FileOpen, not the file name
$fileopen = FileOpen ("source.txt", 128)
$foo = FileRead ($fileopen)
FileClose ($fileopen)

$tags = StringRegExp ($foo, '(<[/]{0,1}[a-z]{0,3}[0-9]{0,5}[/]{0,1}>)', 3)

$numtags = UBound($tags)

While 1
    Sleep(1000)

    $time2 = FileGetTime ("source.txt", 0, 1)

    ; Reload the tags only when source.txt has been modified
    If $time2 <> $time1 Then
        $fileopen = FileOpen ("source.txt", 128)
        $foo = FileRead ($fileopen)
        FileClose ($fileopen)
        $tags = StringRegExp ($foo, '(<[/]{0,1}[a-z]{0,3}[0-9]{0,5}[/]{0,1}>)', 3)
        $numtags = UBound($tags)
        $time1 = FileGetTime ("source.txt", 0, 1)
    EndIf

WEnd

; ================================

Func NextTag ()

If $a = 1 Then
If WinActive ("OmegaT", "") Then

If $j = $numtags - 1 Then
$j = 0
Else
$j = $j + 1
EndIf

ClipPut ($tags[$j])
$len = StringLen ($tags[$j])

Send ("^v")
Send ("{LEFT " & $len & "}")
Send ("+{RIGHT " & $len & "}")
Send ("{ESC}")
Send ("{ESC}")

EndIf
EndIf

EndFunc


Func PrevTag ()

If $a = 1 Then
If WinActive ("OmegaT", "") Then

If $j = 0 Then
$j = $numtags - 1
Else
$j = $j - 1
EndIf

ClipPut ($tags[$j])
$len = StringLen ($tags[$j])

Send ("^v")
Send ("{LEFT " & $len & "}")
Send ("+{RIGHT " & $len & "}")
Send ("{ESC}")
Send ("{ESC}")

EndIf
EndIf

EndFunc

Func Cancel ()

If WinActive ("OmegaT", "") Then

Send ("{LEFT}")
Send ("{RIGHT}")

EndIf

EndFunc

The line of text appears in source.txt, and the program in which the user wants to paste the XML-like tags is called OmegaT. This is actually a translators' tool and source.txt contains an exported version of the source text that the translator is translating.

But I want the script to be more fancy.

At present, if source.txt contains this:

<f1>This is the </f1><f2>BIGGEST</f2> one<s0/>!

then the script will enable the user to insert the following tags:

<f1>

</f1>

<f2>

</f2>

<s0/>

...which is useful enough. But wouldn't it be great if the script can also assist the user to insert all words that start with capital letters, and words that consist entirely of capital letters? Such words are usually proper names that don't need translating and that can be carried over from the source text directly to the translation.

So, I want the script to enable the user to insert any of the following:

<f1>

This

</f1>

<f2>

BIGGEST

</f2>

<s0/>

Is this clear enough?

PS I think my script above still has a few bugs, eg if the source.txt contains no tags then the script dies, so I still need to fix that.
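
One possible way to handle the no-tags case, sketched under the assumption that the crash comes from indexing $tags when StringRegExp found nothing (in which case it does not return an array). The sample string and the messages are made up for illustration.

```autoit
Local $foo = 'No tags in this line at all.'
Local $tags = StringRegExp($foo, '(<[/]?[a-z]{0,3}[0-9]{0,5}[/]?>)', 3)

; Count the tags only if an array actually came back
Local $numtags = 0
If IsArray($tags) Then $numtags = UBound($tags)

If $numtags = 0 Then
    ConsoleWrite('Nothing to paste' & @CRLF)
Else
    ConsoleWrite('First tag: ' & $tags[0] & @CRLF)
EndIf
```

In the script itself, the same idea would be a line such as "If $numtags = 0 Then Return" at the top of NextTag and PrevTag, before $tags[$j] is touched.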

Edited by leuce

If there is no solution to my problem, don't worry about it. I decided to map capital letters to a different set of hotkeys.

The final script is here:

http://leuce.com/tempfile/omtautoit/taggrabber.zip


maybe this Regex?

"(</?\w*/?>)|([A-Z]+(\w)*)"

Might need to tweak it a little bit... Basically it should grab any tags that contain word characters inside < and > or it will grab anything that begins with a capital letter. I know I'm missing something, if not a few things, so look over it carefully. But is this what you are aiming at?
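
One thing to watch for with that pattern: it has three capturing groups (the tag, the word, and the nested (\w)), so with flag 3 the result would probably contain extra elements such as empty strings and stray single letters. Here is a sketch of the same idea without any capturing groups, assuming flag 3 then returns each whole match. Note that \w also matches digits and underscores, so this is looser than the original character classes.

```autoit
Local $foo = '<f1>This is the </f1><f2>BIGGEST</f2> one<s0/>!'

; No capturing groups: each array element is one whole match,
; either a tag or a word starting with a capital letter.
Local $items = StringRegExp($foo, '</?\w*/?>|[A-Z]\w*', 3)

If IsArray($items) Then
    For $i = 0 To UBound($items) - 1
        ConsoleWrite($i & ': ' & $items[$i] & @CRLF)
    Next
EndIf
```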


If you want, you could also allow HTML-style markup such as <img src="\Smiles\lol.gif">lol</img>, or build the tree recursively, so that <html><something></something></html> is matched as a whole <html>...</html> with its contents as a separate unit, close to the way a browser parses a DOM.
