
How to find first and last word of sentence with Regular Expression

kcvinu

Hi all,

How can I find the first and last word in a sentence with a regular expression? The sentences are lines of code from SciTE. For example, if the sentence is

"For $i = 0 to 15"

I need to extract "For" from the sentence.

In the same way, I need the last word from a sentence:

"If Apple = 15 Then"

I need "Then" from the sentence.


My Contributions

UDF Link Viewer   --- A tool to visit the links of some most important UDFs 

 Includer_2  ----- A tool to type the #include statement automatically 

 Digits To Date  ----- date from 3 integer values

PrintList ----- prints arrays into console for testing.

 Alert  ------ An alternative for MsgBox 

 MousePosition ------- A simple tooltip display of mouse position

GRM Helper -------- A little tool to help writing code with GUIRegisterMsg function

Access_UDF  -------- An UDF for working with access database files. (.*accdb only)

 

water

The most important thing for a working solution is to define the meaning of "word".

You need to define what delimits a word. Space, comma, bracket etc.
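
To illustrate why the definition matters, here is a minimal sketch that applies two possible rules to the same line. The sample line and the variable names ($sLine, $aBySpace, $aByWordChar) are only assumptions for illustration; they are not from the original question.

```autoit
; Sketch: the same line gives different "first" and "last" words
; depending on which delimiter rule is chosen.
Local $sLine = "MsgBox(0, 'Title', 'Text')"

; Rule 1: a word is any run of non-whitespace characters.
Local $aBySpace = StringRegExp($sLine, "\S+", 3) ; 3 = return all matches

; Rule 2: a word is any run of word characters (letters, digits, underscore).
Local $aByWordChar = StringRegExp($sLine, "\w+", 3)

ConsoleWrite("Whitespace rule: " & $aBySpace[0] & " / " & $aBySpace[UBound($aBySpace) - 1] & @CRLF)
ConsoleWrite("Word-char rule:  " & $aByWordChar[0] & " / " & $aByWordChar[UBound($aByWordChar) - 1] & @CRLF)
```

With the whitespace rule the first word keeps the bracket and comma attached; with the word-character rule it is just the function name.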


My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2018-06-01 - Version 1.4.9.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (2018-01-27 - Version 1.3.3.1) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2015-04-01 - Version 0.4.0.0) - Download - General Help & Support - Example Scripts
Excel - Example Scripts - Wiki
Word - Wiki
PowerPoint (2015-06-06 - Version 0.0.5.0) - Download - General Help & Support

Tutorials:
ADO - Wiki

 

mpower


 

Give this a whirl:

$sentence = "For $i = 0 to 15"
$sentence2 = "If Apple = 15 Then"

$firstword = StringRegExp($sentence, "(?:^|(?:\.\s))(\w+)", 1)
$lastword = StringRegExp($sentence, "\s(\w+)$", 1)

ConsoleWrite($firstword[0] & @CRLF)
ConsoleWrite($lastword[0] & @CRLF)

$firstword = StringRegExp($sentence2, "(?:^|(?:\.\s))(\w+)", 1)
$lastword = StringRegExp($sentence2, "\s(\w+)$", 1)

ConsoleWrite($firstword[0] & @CRLF)
ConsoleWrite($lastword[0] & @CRLF)

Sources: http://lmgtfy.com/?q=regex+first+word+in+sentence    and     http://lmgtfy.com/?q=regex+last+word+in+sentence

iamtheky

$sentence = "For $i = 0 to 15"
$sentence2 = "If Apple = 15 Then"

MsgBox(0, '', _firstlast($sentence))
MsgBox(0, '', _firstlast($sentence2))

Func _firstlast($string)
    ; StringSplit stores the element count in [0]; the words start at [1]
    $aString = StringSplit($string, " ")
    Return $aString[1] & @CRLF & $aString[UBound($aString) - 1]
EndFunc



kcvinu

I have googled a lot and tried a few things with the RegExp_GUI. Thanks to AutoIt for giving me a nice program for testing regular expressions.

@water, in this case a word is separated by spaces only.

@boththose and @mpower, thanks. Let me try.


jguinch

#Include <Array.au3>

$sentence2 = "If Apple = 15 Then"

$words = StringRegExp($sentence2, "(?m)\W*(\w+).*?(\w+)\W*$", 1)
_ArrayDisplay($words)

Here is a short explanation:

(?m) : multiline mode. In this mode, ^ and $ operate on each line instead of the whole string

\W* : any non-word character, 0 or more times

(\w+) : capturing group: any word character, 1 or more times

.*? : any character (except a newline), 0 or more times. The ? makes the match lazy, so it takes the shortest possible match

$ : end of line or end of string
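
As a rough sketch of how the two captures could be used once the pattern has matched: with flag 1 the array holds the first capture in [0] and the second in [1]. The variable names ($sLine, $aWords) and the @error check are my own additions, not part of the post above.

```autoit
; Sketch: read the first and last word from the two capturing groups.
Local $sLine = "If Apple = 15 Then"
Local $aWords = StringRegExp($sLine, "(?m)\W*(\w+).*?(\w+)\W*$", 1)

If @error Then
    ; StringRegExp sets @error when the pattern does not match at all
    ConsoleWrite("No match" & @CRLF)
Else
    ConsoleWrite("First: " & $aWords[0] & @CRLF) ; group 1
    ConsoleWrite("Last:  " & $aWords[1] & @CRLF) ; group 2
EndIf
```

One thing to watch for: because both groups need at least one character, a line with a single word still matches, but the word is split between the two groups.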

kcvinu

@jguinch, thanks. There is more than one answer that deserves to be marked as solved. What should I do?

water

Mark the one which is easiest for you to understand.

kcvinu

@water,

Actually, regular expression code always looks horrible to me, so none of them is the easiest. I have googled a lot for what this "?" stands for, but I didn't get a proper answer. So I have downloaded O'Reilly's Regular Expression Nutshell and started reading.


mikell

If you feel uncomfortable with regex, why don't you use the code from boththose?

It works nicely if you just add a StringStripWS:

$sentence = "    If Apple = 15 Then  "

msgbox( 0 , "" , _firstlast($sentence))

Func _firstlast($string)
  $aString = stringsplit(StringStripWS($string, 3) , " ")
  return $aString[1] & @CRLF & $aString[$aString[0]]
EndFunc

Edit

BTW, in the case below all the code on this page will fail  :)

$sentence = "   If Apple = 15 Then   ; this is a condition "
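
For the non-regex route, one possible way to cope with such a trailing comment is to cut the line at the first ";" before splitting. This is only a sketch: the function name _FirstLastNoComment is hypothetical, and it assumes that the first ";" always starts a comment, which is not true if a ";" appears inside a string literal.

```autoit
; Sketch: drop a trailing comment, trim, then take the first and last word.
Func _FirstLastNoComment($sLine)
    ; Assumption: the first ";" starts a comment (";" inside quotes is not handled)
    Local $iPos = StringInStr($sLine, ";")
    If $iPos > 0 Then $sLine = StringLeft($sLine, $iPos - 1)

    ; 3 = strip leading (1) and trailing (2) whitespace
    $sLine = StringStripWS($sLine, 3)

    ; StringSplit stores the element count in [0]
    Local $aParts = StringSplit($sLine, " ")
    Return $aParts[1] & @CRLF & $aParts[$aParts[0]]
EndFunc

ConsoleWrite(_FirstLastNoComment("   If Apple = 15 Then   ; this is a condition ") & @CRLF)
```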

kcvinu

@mikell, I think RegExp code works faster, and I want to learn regular expressions. It may be harder, but I need to learn it.

mikell

So...

#Include <Array.au3>

$sentence = "   If Apple = 15 Then   ; this is a condition "

$words = StringRegExp($sentence, "\s*(\w+)[^;]+\s(\w+);?.*", 3)
_ArrayDisplay($words)

:)

kcvinu

@mikell, so either regular expressions are for humans or you are an alien.  :)

mikell

Not an alien (so thinks my wife) but I've been a regex student for a long time - and I still am, and always will be   :D

kcvinu

@mikell Then please suggest a good, free e-book or PDF for learning regular expressions. I am a complete beginner.


Melba23

kcvinu,

I always recommend this site. But the learning curve is steep and stays that way - when mikell says "I've been a regex student for a long time " he really means it! :D

M23


Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind.

My UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 

kcvinu

@Melba23, thank you. I know that.


mikell

I have 2 favorite sites : the one Melba mentioned, and this other one http://www.rexegg.com/

On both sites explanations are clear and understandable - very important condition when dealing with regex  :)

kcvinu

@Mikell, Thanks. :)

