qwert

Seeking better way to confirm proper punctuation


I'm parsing text to determine if each paragraph (terminated with a CRLF) ends with a proper punctuation mark.

As I've added to the set of what's "proper", my method (parse statement) has become unwieldy (and still doesn't cover all cases):

$proper = StringInStr($paragraph, '.' & @CRLF) + StringInStr($paragraph, '?' & @CRLF) + StringInStr($paragraph, '!' & @CRLF) + StringInStr($paragraph, '"' & @CRLF) + StringInStr($paragraph, ';' & @CRLF)

Can someone suggest a better approach?

Thanks in advance for any help.


I'm a bit confused: are you checking only the punctuation mark at the very end of the paragraph, or the punctuation at the end of each sentence within it? And when you say "proper punctuation", do you mean that it just needs to be some punctuation mark? Or do you expect it to know whether that mark should be a "?" or a "!"?


1 minute ago, JLogan3o13 said:

only the punctuation mark at the very end of the paragraph

Yes.  And "yes", it needs to be some punctuation mark (not just a symbol like ^ for example).  There will need to be 8 or 10 in the "dictionary" of proper marks.


@Nine: Well, that's cool: turn the algorithm around and look for a single isolated character in a string of choices.  I like it!

Thanks very much.
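
The idea described above (take the single last character and look it up in a string of accepted marks) can be sketched as follows. This is only a minimal illustration, not necessarily the code Nine posted: the function name `_EndsWithProperMark` and the default set of marks are assumptions, and the set should be extended to whatever 8 or 10 marks count as "proper".

```autoit
; Hypothetical helper (name and default marks are assumptions, not from the thread):
; returns True if $sParagraph ends with one of the characters in $sMarks.
Func _EndsWithProperMark($sParagraph, $sMarks = '.?!";')
    ; Flag 2 strips trailing whitespace, which includes the closing @CRLF.
    Local $sTrimmed = StringStripWS($sParagraph, 2)
    If $sTrimmed = '' Then Return False
    ; Turn the search around: look for the last character inside the list of marks.
    Return StringInStr($sMarks, StringRight($sTrimmed, 1)) > 0
EndFunc

ConsoleWrite(_EndsWithProperMark('Is this fine?' & @CRLF) & @CRLF)   ; True
ConsoleWrite(_EndsWithProperMark('No closing mark' & @CRLF) & @CRLF) ; False
```

Adding a mark then only means adding one character to the `$sMarks` string, instead of another `StringInStr` term.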


As I don't know the exact final purpose, this is just a simple attempt to show the concept:

#include <Array.au3>

$s = "AutoIt v3 is a freeware BASIC-like scripting language designed for automating the Windows GUI and general scripting" & @crlf & _
"It uses a combination of simulated keystrokes, mouse movement and window/control manipulation in order to automate tasks in a way not possible or reliable with other languages (e.g. VBScript and SendKeys);  " & @crlf & _
"AutoIt is also very small, self-contained and will run on all versions of Windows out-of-the-box with no annoying runtimes required"

; Strip trailing spaces/tabs before each line break and make sure the last line also ends with @CRLF
$s = StringRegExpReplace($s, '\h*\R|$', @crlf)
; Capture the last character of every line that does not end with . ? ! ; or "
$res = StringRegExp($s, '([^\.\?!;"])\r', 3)

_ArrayDisplay($res)

 


Hi all,
I tried a regexp approach, using a "negative lookbehind" assertion to find any newline sequence (\R matches @CRLF or a lone @CR or @LF) whose preceding character is not one of those indicated by qwert.
I typed ERROR HERE in the Replace Pattern tab. Here are the results, where a comma ending a line is detected:

[Screenshot: negative lookbehind test results]

Thanks to our regexp gurus for commenting, if the expression can be improved :)
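
Since the pattern itself only appears in the screenshot, here is a rough sketch of what a negative-lookbehind check could look like. The pattern and the sample text are assumptions, not the poster's exact expression. Note the `\r` inside the lookbehind class: without it, `\R` could match the lone @LF of a @CRLF pair (whose preceding character is @CR) and report a false hit on every line. The sketch also assumes the lines carry no trailing spaces.

```autoit
; Sample text (an assumption for illustration): only the second line lacks a proper end mark.
Local $sText = 'First line ends well.' & @CRLF & _
        'Second line ends with a comma,' & @CRLF & _
        'Third line asks a question?' & @CRLF

; ([^\r\n]*)       captures the content of a line
; (?<![.?!;"\r])   asserts that the character just before the line break is not an accepted mark
; \R               matches the line break itself (@CRLF, @CR or @LF)
; Flag 3 returns all captured lines in an array and sets @error if nothing matches.
Local $aBad = StringRegExp($sText, '([^\r\n]*)(?<![.?!;"\r])\R', 3)
If @error Then
    ConsoleWrite('Every line ends with an accepted mark.' & @CRLF)
Else
    For $sLine In $aBad
        ConsoleWrite('Missing end mark: ' & $sLine & @CRLF)
    Next
EndIf
```

With this sample only the second line should be reported. An empty line would also be reported, because the character before its line break is the previous line's @LF.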

