RichardL

Regular Expression to select text over multi-lines


RichardL

Text in a file, read into a variable with FileRead:

<>
<>
<>
<>
<
J please look
>
<>
<>
<>

Hi, 

I want a RegExp that selects the text around 'please', back to the previous < and forward to the next >. I can select the single line of text. When I add (?s), it selects the whole text. I think I need to make it non-greedy with (?U); that seems to make the part after 'please' ungreedy, but the match still includes all the previous lines.

$sPattern = "(?s)<.*please.*>"            ; 1
$sPattern = "(?s)<(?U).*please.*>"        ; 2
$sPattern = "(?s)<(?U).*please(?U).*>"    ; 3
$sAry = StringRegExp($sHTML, $sPattern, 3)

 

mikell
4 hours ago, RichardL said:

back to the previous < and forward to the next >

Taken literally, this means you get everything between them, including the newlines just after the < and just before the >.

$str = FileRead("1.txt")
$res = StringRegExpReplace($str, '(?s).*<(.*?please.*?)>.*', "$1")
Msgbox(0,"", $res)
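
A minimal sketch of why the greedy .* at the front matters, using an inline sample string in place of 1.txt (the sample text and variable names are assumptions for illustration). A lazy quantifier does not move the start of a match, so a pattern that begins with < still starts at the first < in the text. The greedy prefix instead runs to the end and backtracks to the last < that is still followed by 'please'.

```autoit
; Inline stand-in for the contents of 1.txt (assumed sample)
Local $sText = "<>" & @CRLF & "<>" & @CRLF & "<" & @CRLF & "J please look" & @CRLF & ">" & @CRLF & "<>"

; Lazy quantifiers only: the match still begins at the very first "<",
; so the earlier "<>" lines are included
Local $aLazy = StringRegExp($sText, '(?s)<.*?please.*?>', 3)
If IsArray($aLazy) Then ConsoleWrite("[" & $aLazy[0] & "]" & @CRLF)

; Greedy prefix: backtracks to the last "<" that is still followed by "please",
; so group 1 holds only the text between that "<" and the next ">"
Local $sInner = StringRegExpReplace($sText, '(?s).*<(.*?please.*?)>.*', "$1")
ConsoleWrite("[" & $sInner & "]" & @CRLF)
```

One consequence of this design: if the text holds several blocks containing 'please', the greedy prefix leaves only the last one.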

 

RichardL

Mikell, thanks for that. I've had a quick play with it and must finish for today (23:30 in the UK). It raises some questions:

1 - the .* at each end, outside the <> - what are they doing? I don't want anything outside the <>.

2 - I'm using your pattern in StringRegExp with flag 3, and the selected text doesn't include the enclosing <> (it has all the text up to them; I added a few characters to make sure). Why aren't the <> selected if they are in the pattern? (This agrees with what gets replaced using your StringRegExpReplace.)

Richard.

 

 

 

Malkey

This example allows for "please" being on the first line or the last line. It returns the line containing "please" together with the whole of the previous line and the whole of the next line, if they exist.

Note: In my example and Mikell's, the regular expression pattern matches all of the text in the "test" parameter of StringRegExpReplace(), so the only text returned is what the "replace" parameter produces, which is "$1". "$1" is a back-reference to the first capture group. The first capture group is the text matched between the first opening bracket, counting from left to right, and its matching closing bracket.

#cs
<>
<>
<>
<>
<
J please look
>
<>
<>
<>
#ce

;$str = FileRead("1.txt")
$str = StringRegExpReplace(FileRead(@ScriptFullPath), "^(?s).*#cs\s+(.+)\s+#ce.*$", "$1") ; Extract test string from this script.
;ConsoleWrite($str  & @CRLF)

$sFind = "please"
$res = StringRegExpReplace($str, '(?s).*?((\V*\v+)?\V*\Q' & $sFind & '\E\V*(\v+\V*)?).*', "$1")

ConsoleWrite($res & @CRLF)
MsgBox(0, "Results", $res)
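
A short sketch of the two edge cases mentioned above, with the pattern wrapped in a helper. The function name _GetLineWithNeighbours and the inline sample strings are hypothetical, added only for illustration.

```autoit
; Hypothetical helper wrapping the pattern from the example above
Func _GetLineWithNeighbours($sText, $sFind)
    ; \Q...\E treats the search term as literal text;
    ; \V is any non-vertical-whitespace character, \v any vertical whitespace
    Local $sPattern = '(?s).*?((\V*\v+)?\V*\Q' & $sFind & '\E\V*(\v+\V*)?).*'
    Return StringRegExpReplace($sText, $sPattern, "$1")
EndFunc

; "please" on the first line: there is no previous line to return
ConsoleWrite(_GetLineWithNeighbours("J please look" & @CRLF & ">" & @CRLF & "<>", "please") & @CRLF)

; "please" on the last line: there is no next line to return
ConsoleWrite(_GetLineWithNeighbours("<>" & @CRLF & "<" & @CRLF & "J please look", "please") & @CRLF)
```

If the search term does not occur at all, StringRegExpReplace makes no replacement and the helper returns the input text unchanged, so a caller may want to test for the term first.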

 

mikell
9 hours ago, RichardL said:

1 - the .* at each end, outside the <> - what are they doing? I don't want anything outside the <>.

The pattern matches the whole string, and the part to grab is put inside brackets (a capturing group). The whole text is therefore replaced by the content of this group, which is back-referenced as "$1", as Malkey explained.
 

9 hours ago, RichardL said:

2 -....  Why aren't the <> selected if they are in the pattern? 

Because they are outside the brackets. Just move the brackets to include < and > in the group and they will be grabbed too.

$str = FileRead("1.txt")
; get the wanted part
$res = StringRegExpReplace($str, '(?s).*(<.*?please.*?>).*', "$1")
; remove included newlines
$res = StringRegExpReplace($res, '\R', "")
Msgbox(0,"", $res)
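
The same rule explains the result seen with StringRegExp and flag 3 in question 2: when the pattern contains a capturing group, the returned array holds the group contents rather than the whole match. A minimal sketch, assuming an inline sample string instead of 1.txt:

```autoit
; Inline stand-in for the contents of 1.txt (assumed sample)
Local $sText = "<>" & @CRLF & "<" & @CRLF & "J please look" & @CRLF & ">" & @CRLF & "<>"

; Group excludes the delimiters, so only the inner text is returned
Local $aInner = StringRegExp($sText, '(?s).*<(.*?please.*?)>', 3)
; Group includes the delimiters
Local $aOuter = StringRegExp($sText, '(?s).*(<.*?please.*?>)', 3)

If IsArray($aInner) And IsArray($aOuter) Then
    ConsoleWrite("[" & StringRegExpReplace($aInner[0], '\R', "") & "]" & @CRLF) ; [J please look]
    ConsoleWrite("[" & StringRegExpReplace($aOuter[0], '\R', "") & "]" & @CRLF) ; [<J please look>]
EndIf
```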

Using StringRegExp with flag 3 and no capturing group is a little different: the whole match is returned, so the pattern itself must specify that the characters before 'please' are not < and the characters after it are not >, by using negated character classes.

$str = FileRead("1.txt")
; using StringRegExp w/ flag 3
$res = StringRegExp($str, '(?s)<[^<]*please[^>]*>', 3)
; remove newlines
$res[0] = StringRegExpReplace($res[0], '\R', "")
Msgbox(0,"", $res[0])
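
A hedged sketch of the same negated-class pattern on text with more than one matching block, with a check for the no-match case (the sample string is an assumption). A negated character class matches newlines by itself, so (?s) is left out here; it is harmless but not needed.

```autoit
; Assumed sample with two blocks that contain "please"
Local $sText = "<>" & @CRLF & "<" & @CRLF & "J please look" & @CRLF & ">" & @CRLF & "<>" & @CRLF & "<K please wait>"

; Flag 3 returns every match; @error is set when nothing matches
Local $aHits = StringRegExp($sText, '<[^<]*please[^>]*>', 3)
If @error Then
    ConsoleWrite("No match" & @CRLF)
Else
    For $i = 0 To UBound($aHits) - 1
        ; Strip the newlines inside each block before printing
        ConsoleWrite(StringRegExpReplace($aHits[$i], '\R', "") & @CRLF)
    Next
EndIf
; Prints:
; <J please look>
; <K please wait>
```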

Edit
Please note that there are several ways to skin this cat  :)

Edited by mikell

RichardL

It took me a few days to get back to this. Your patterns worked well on the example text. When I looked at the actual text again, the 'not include' part, which stops the match from starting at the first <P, needed to exclude the string <P rather than a single character as in [^<]. I did some Googling and it looked hard. Then I realised I could limit the selection to only the immediately surrounding tags by using .{1,90} instead of .* . Not a very precise way to skin the cat, but it's working. I've learned a few things, thanks.
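
One common way to exclude a multi-character string such as <P is a negative lookahead applied before each character, (?:(?!<P\b).)*, in place of a negated character class. The sketch below is built on assumptions: the actual text is not shown in the thread, so the sample HTML, the </P> end delimiter and the uppercase tag names are invented for illustration.

```autoit
; Assumed sample: the wanted paragraph contains another tag (<B>),
; so a negated class such as [^<] could not get past it
Local $sHTML = "<P>one</P>" & @CRLF & "<P>J <B>please</B>" & @CRLF & "look</P>" & @CRLF & "<P>three</P>"

; (?!<P\b). consumes one character only if a new "<P" does not start there,
; so the match cannot begin at an earlier paragraph
Local $aHits = StringRegExp($sHTML, '(?s)<P\b(?:(?!<P\b).)*?please.*?</P>', 3)
If @error Then
    ConsoleWrite("No match" & @CRLF)
Else
    ConsoleWrite(StringRegExpReplace($aHits[0], '\R', "") & @CRLF) ; <P>J <B>please</B>look</P>
EndIf
```

The pattern is case-sensitive as written; adding (?i) at the start would make it accept lowercase tags as well.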
