kcvinu

How to find first and last word of sentence with Regular Expression

23 posts in this topic

Hi all,

How to find first and last word in a sentence with Regular expression ?. Sentence is the code from SciTE. For example if sentence is;

"For $i = 0 to 15" 

I need to extract "For from the sentence

And the same way i need last word from a sentence

"If Apple = 15 Then" 

I need "Then" from the sentence.


My Contributions

UDF Link Viewer   --- A tool to visit the links of some most important UDFs 

 Includer_2  ----- A tool to type the #include statement automatically 

 Digits To Date  ----- date from 3 integer values

PrintList ----- prints arrays into console for testing.

 Alert  ------ An alternative for MsgBox 

 MousePosition ------- A simple tooltip display of mouse position

GRM Helper -------- A littile tool to help writing code with GUIRegisterMsg function

Access_UDF  -------- An UDF for working with access database files. (.*accdb only)

 

Share this post


Link to post
Share on other sites



The most important thing for a working solution is to define the meaning of "word".

You need to define what delimits a word. Space, comma, bracket etc.

1 person likes this

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2017-04-18 - Version 1.4.8.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (NEW 2017-02-27 - Version 1.3.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2015-04-01 - Version 0.4.0.0) - Download - General Help & Support - Example Scripts
Excel - Example Scripts - Wiki
Word - Wiki
PowerPoint (2015-06-06 - Version 0.0.5.0) - Download - General Help & Support

Tutorials:
ADO - Wiki

 

Share this post


Link to post
Share on other sites

#3 ·  Posted (edited)

Hi all,

How to find first and last word in a sentence with Regular expression ?. Sentence is the code from SciTE. For example if sentence is;

"For $i = 0 to 15" 

I need to extract "For from the sentence

And the same way i need last word from a sentence

"If Apple = 15 Then" 

I need "Then" from the sentence.

 

Give this a whirl:

$sentence = "For $i = 0 to 15"
$sentence2 = "If Apple = 15 Then"

$firstword = StringRegExp($sentence, "(?:^|(?:\.\s))(\w+)", 1)
$lastword = StringRegExp($sentence, "\s(\w+)$", 1)

ConsoleWrite($firstword[0] & @CRLF)
ConsoleWrite($lastword[0] & @CRLF)

$firstword = StringRegExp($sentence2, "(?:^|(?:\.\s))(\w+)", 1)
$lastword = StringRegExp($sentence2, "\s(\w+)$", 1)

ConsoleWrite($firstword[0] & @CRLF)
ConsoleWrite($lastword[0] & @CRLF)

Sources: http://lmgtfy.com/?q=regex+first+word+in+sentence    and     http://lmgtfy.com/?q=regex+last+word+in+sentence

Edited by mpower

Share this post


Link to post
Share on other sites

$sentence = "For $i = 0 to 15"
$sentence2 = "If Apple = 15 Then"

msgbox( 0 , '' , _firstlast($sentence))
msgbox( 0 , '' , _firstlast($sentence2))



Func _firstlast($string)

$aString = stringsplit($string , " ")
return $aString[1] & @CRLF & $aString[ubound($aString) - 1]

EndFunc


,-. .--. ________ .-. .-. ,---. ,-. .-. .-. .-.
|(| / /\ \ |\ /| |__ __||| | | || .-' | |/ / \ \_/ )/
(_) / /__\ \ |(\ / | )| | | `-' | | `-. | | / __ \ (_)
| | | __ | (_)\/ | (_) | | .-. | | .-' | | \ |__| ) (
| | | | |)| | \ / | | | | | |)| | `--. | |) \ | |
`-' |_| (_) | |\/| | `-' /( (_)/( __.' |((_)-' /(_|
'-' '-' (__) (__) (_) (__)

Share this post


Link to post
Share on other sites

I have googled a lot and tried something with the RegExp_GUI. Thanks for AutoIt for giving me a nice program for testing regular expresions.

@Water, In this case a word which separated with space only.

@boththose and @mpower, thanks. Let me try.


My Contributions

UDF Link Viewer   --- A tool to visit the links of some most important UDFs 

 Includer_2  ----- A tool to type the #include statement automatically 

 Digits To Date  ----- date from 3 integer values

PrintList ----- prints arrays into console for testing.

 Alert  ------ An alternative for MsgBox 

 MousePosition ------- A simple tooltip display of mouse position

GRM Helper -------- A littile tool to help writing code with GUIRegisterMsg function

Access_UDF  -------- An UDF for working with access database files. (.*accdb only)

 

Share this post


Link to post
Share on other sites

#6 ·  Posted (edited)

#Include <Array.au3>

$sentence2 = "If Apple = 15 Then"

$words = StringRegExp($sentence2, "(?m)\W*(\w+).*?(\w+)\W*$", 1)
_ArrayDisplay($words)

Here is a small explanation :

(?m) : multiline mode. With this mode, ^ and $ operates on each line instead of the whole string

W* : any non-word character, 0 or more times

(w+) : capturing group. any word character, one or more times

.*? : anything, one or more times. ? takes the smallest occurence

$ : end of line or end of string

Edited by jguinch
1 person likes this

Share this post


Link to post
Share on other sites

@

jguinch, Thanks. There is more than one anser which needs to mark solved. What to do.

My Contributions

UDF Link Viewer   --- A tool to visit the links of some most important UDFs 

 Includer_2  ----- A tool to type the #include statement automatically 

 Digits To Date  ----- date from 3 integer values

PrintList ----- prints arrays into console for testing.

 Alert  ------ An alternative for MsgBox 

 MousePosition ------- A simple tooltip display of mouse position

GRM Helper -------- A littile tool to help writing code with GUIRegisterMsg function

Access_UDF  -------- An UDF for working with access database files. (.*accdb only)

 

Share this post


Link to post
Share on other sites

Mark the one which is easiest for you to underrstand.

1 person likes this

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2017-04-18 - Version 1.4.8.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (NEW 2017-02-27 - Version 1.3.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2015-04-01 - Version 0.4.0.0) - Download - General Help & Support - Example Scripts
Excel - Example Scripts - Wiki
Word - Wiki
PowerPoint (2015-06-06 - Version 0.0.5.0) - Download - General Help & Support

Tutorials:
ADO - Wiki

 

Share this post


Link to post
Share on other sites

@water,

Actually regular expression code is always horrible for me. So there is no easiest thing. I have googled a lot for what this "?" stands for . I didn't get any proper answer. So i have downloaded O'reilly's Regular expression Nutshell. And started reading. 


My Contributions

UDF Link Viewer   --- A tool to visit the links of some most important UDFs 

 Includer_2  ----- A tool to type the #include statement automatically 

 Digits To Date  ----- date from 3 integer values

PrintList ----- prints arrays into console for testing.

 Alert  ------ An alternative for MsgBox 

 MousePosition ------- A simple tooltip display of mouse position

GRM Helper -------- A littile tool to help writing code with GUIRegisterMsg function

Access_UDF  -------- An UDF for working with access database files. (.*accdb only)

 

Share this post


Link to post
Share on other sites

#11 ·  Posted (edited)

If you feel bad with regex, why don't you use the code from boththose ?

It works nice - if you just add a StringStripWS

$sentence = "    If Apple = 15 Then  "

msgbox( 0 , "" , _firstlast($sentence))

Func _firstlast($string)
  $aString = stringsplit(StringStripWS($string, 3) , " ")
  return $aString[1] & @CRLF & $aString[$aString[0]]
EndFunc

Edit

BTW in the case below all the codes on this page will fail  :)

$sentence = "   If Apple = 15 Then   ; this is a condition "
Edited by mikell
1 person likes this

Share this post


Link to post
Share on other sites

@

mikell, I think RegExp code works faster. And i had a wish to learn regular expression. May be it is harder. But i need to learn it. 

My Contributions

UDF Link Viewer   --- A tool to visit the links of some most important UDFs 

 Includer_2  ----- A tool to type the #include statement automatically 

 Digits To Date  ----- date from 3 integer values

PrintList ----- prints arrays into console for testing.

 Alert  ------ An alternative for MsgBox 

 MousePosition ------- A simple tooltip display of mouse position

GRM Helper -------- A littile tool to help writing code with GUIRegisterMsg function

Access_UDF  -------- An UDF for working with access database files. (.*accdb only)

 

Share this post


Link to post
Share on other sites

So...

#Include <Array.au3>

$sentence = "   If Apple = 15 Then   ; this is a condition "

$words = StringRegExp($sentence, "\s*(\w+)[^;]+\s(\w+);?.*", 3)
_ArrayDisplay($words)

:)

1 person likes this

Share this post


Link to post
Share on other sites

@

mikell, So either Regular expression is for humans or you are an alien.  :)

My Contributions

UDF Link Viewer   --- A tool to visit the links of some most important UDFs 

 Includer_2  ----- A tool to type the #include statement automatically 

 Digits To Date  ----- date from 3 integer values

PrintList ----- prints arrays into console for testing.

 Alert  ------ An alternative for MsgBox 

 MousePosition ------- A simple tooltip display of mouse position

GRM Helper -------- A littile tool to help writing code with GUIRegisterMsg function

Access_UDF  -------- An UDF for working with access database files. (.*accdb only)

 

Share this post


Link to post
Share on other sites

Not an alien (so thinks my wife) but I've been a regex student for a long time - and I still am, and always will be   :D

1 person likes this

Share this post


Link to post
Share on other sites

@mikell Then please suggest me a good and free e-book or pdf book to learn regular expression. I am complete beginner.


My Contributions

UDF Link Viewer   --- A tool to visit the links of some most important UDFs 

 Includer_2  ----- A tool to type the #include statement automatically 

 Digits To Date  ----- date from 3 integer values

PrintList ----- prints arrays into console for testing.

 Alert  ------ An alternative for MsgBox 

 MousePosition ------- A simple tooltip display of mouse position

GRM Helper -------- A littile tool to help writing code with GUIRegisterMsg function

Access_UDF  -------- An UDF for working with access database files. (.*accdb only)

 

Share this post


Link to post
Share on other sites

kcvinu,

I always recommend this site. But the learning curve is steep and stays that way - when mikell says "I've been a regex student for a long time " he really means it! :D

M23

1 person likes this

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind._______My UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 

Share this post


Link to post
Share on other sites

@Melba23 , Thank you. I know that. 


My Contributions

UDF Link Viewer   --- A tool to visit the links of some most important UDFs 

 Includer_2  ----- A tool to type the #include statement automatically 

 Digits To Date  ----- date from 3 integer values

PrintList ----- prints arrays into console for testing.

 Alert  ------ An alternative for MsgBox 

 MousePosition ------- A simple tooltip display of mouse position

GRM Helper -------- A littile tool to help writing code with GUIRegisterMsg function

Access_UDF  -------- An UDF for working with access database files. (.*accdb only)

 

Share this post


Link to post
Share on other sites

I have 2 favorite sites : the one Melba mentioned, and this other one http://www.rexegg.com/

On both sites explanations are clear and understandable - very important condition when dealing with regex  :)

1 person likes this

Share this post


Link to post
Share on other sites

@Mikell, Thanks. :)


My Contributions

UDF Link Viewer   --- A tool to visit the links of some most important UDFs 

 Includer_2  ----- A tool to type the #include statement automatically 

 Digits To Date  ----- date from 3 integer values

PrintList ----- prints arrays into console for testing.

 Alert  ------ An alternative for MsgBox 

 MousePosition ------- A simple tooltip display of mouse position

GRM Helper -------- A littile tool to help writing code with GUIRegisterMsg function

Access_UDF  -------- An UDF for working with access database files. (.*accdb only)

 

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now

  • Similar Content

    • Robinson1
      By Robinson1
      Well the plan is to use the power of regular expressions engine of AutoIT for patching binary data.
      Something like this: StringRegExp( $BinaryData,  "(?s)\x55\x8B.."
       
      <cut> ... Okay straight to question/problem
      ... certain bytes that are in the range from 0x80 to 0xA0 won't match.
      Hmm seem to be a char encoding problem. In detail these are 27 chars: 0x80, 0x82~8C, 0x8E, 0x91~9C, 0x9E,0x9F
      Here's a small code snippet to explore / explain this problem:
      #include "StringConstants.au3" $TestData = BinaryToString("0x7E7F808182") ;Okay $match = StringRegExp( $TestData ,'\x7E' ,$STR_REGEXPARRAYFULLMATCH) ConsoleWrite('@extended = ' & @extended & ' $match = ' & $match & @CRLF) ;Okay $match = StringRegExp( $TestData ,'\x7F' ,$STR_REGEXPARRAYFULLMATCH) ConsoleWrite('@extended = ' & @extended & ' $match = ' & $match & @CRLF) ;Error no match $match = StringRegExp( $TestData ,'\x80' ,$STR_REGEXPARRAYFULLMATCH) ConsoleWrite('@extended = ' & @extended & ' $match = ' & $match & @CRLF) ;Okay $match = StringRegExp( $TestData ,'\x81' ,$STR_REGEXPARRAYFULLMATCH) ConsoleWrite('@extended = ' & @extended & ' $match = ' & $match & @CRLF) ;Error no match $match = StringRegExp( $TestData ,'\x82' ,$STR_REGEXPARRAYFULLMATCH) ConsoleWrite('@extended = ' & @extended & ' $match = ' & $match & @CRLF) ;~ output: ;~ @extended = 2 $match = ;~ @extended = 3 $match = ;~ @extended = 0 $match = 1 ;~ @extended = 5 $match = ;~ @extended = 0 $match = 1 Hmm what to do? Go back and use the 'numberstring monster' implementation or just omit that range of 'unsafe bytes'. What is the root of this problem?
      Any idea how to fix this?
       
      Update: Okay I know a byte is not a character.
      But StringRegExp operates on String and so character level.
      Okay as long as you stay at Ansi encoding and only use /x00 - /X7F in the search pattern using  StringRegExp works well to search for binary data.
      What bytes can be matched that are in the range from /X7F - /xFF is also depending on the code page.
      So this avoid to search for bytes in the range from 0x80-0xa0 only applies to Germany.
      I just change this country setting:

      to Thai and now near all bytes from /X7F - /xFF fails to match.
    • RichardL
      By RichardL
      Text in a file, read into var with fileread:
      <> <> <> <> < J please look > <> <> <> Hi, 
      I want  a RegExp to select around 'please', back to the previous < and forward to the next >.  I can select the line of text.  Then I add in (?s) and it selects the whole text.  I think I want to make it not greedy, (?U) , that seems to make it ungreedy after, but it still selects all the previous lines.
      $sPattern = "(?s)<.*please.*>" ; 1 $sPattern = "(?s)<(?U).*please.*>" ; 2 $sPattern = "(?s)<(?U).*please(?U).*>" ; 3 $sAry = StringRegExp($sHTML, $sPattern, 3)  
    • JohnNash
      By JohnNash
      I want to rename every new instance of notepad to notepad(random number)
      If I use WinSetTitle ( "notepad", "", "notepad("&$randomnumber&")" )
      this will work pretty good, because if more windows match the search entry it will take the newest. But what if this code runs, but there is no new instance of notepad. It will rename one that was already assigned a number. So I would like to check whether it is already renamed. For example by excluding titles that contain a ")".
      How do I do that. 
      Read this, but that is pretty confusing: http://stackoverflow.com/questions/406230/regular-expression-to-match-line-that-doesnt-contain-a-word?rq=1
    • Mingre
      By Mingre
      #include <Array.au3> ; Script Start - Add your code below here Local $test = "<li>One<li>Inner<li>Innermost</li></li></li>" & _ "<li>Two</li> " $loob = StringRegExp($test, '\Q<li>\E(.*?)\Q</li>\E', 3) _ArrayDisplay($loob, "How to return the One... and Two?") Hello, can somebody help me:
      (1) How can I have the regexp matched the two outermost bullets? Such that:
       
      (2) How can I match the "Innermost" bullet?
      Thanks so much.
    • mLipok
      By mLipok
      #include <Array.au3> If @Compiled Then Exit Global Enum $FUNC_OUTER, $FUNC_NAME, $FUNC_PARAM, $FUNC_INNER _Example() Func _Example() Local $sIncludeDir = StringTrimRight(@AutoItExe, StringLen('AutoIt3.exe')) & 'Include\' Local $aOuterArray = _GetFunctionsToArray($sIncludeDir & 'Color.au3') If Not @error Then For $iOuter_idx = 0 To UBound($aOuterArray) - 1 _ArrayDisplay($aOuterArray[$iOuter_idx], ($aOuterArray[$iOuter_idx])[$FUNC_NAME]) Next EndIf EndFunc ;==>_Example Func _GetFunctionsToArray($sUDF_FileFullPath) Local $sUDFContent = FileRead($sUDF_FileFullPath) Local $aResult = StringRegExp($sUDFContent, '(?is)\RFunc (.*?)\((.*?)\)\v\R(.*?)\REndFunc', $STR_REGEXPARRAYGLOBALFULLMATCH) Return SetError(@error, @extended, $aResult) EndFunc ;==>_GetFunctionsToArray