vinnyMS

extract a sentence containing word in text file

I need a script that can extract every sentence containing a word from a list. The result should be a text file of the extracted sentences, using the period as the sentence delimiter: a sentence is the text between one period and the next.

word list text file:

word 1

word 2

word 3

 

Extracted sentences written to the "sentence" text file:

this is word 1 sentence.

this is word 2 sentence.

this is word 3 sentence.


Just to understand better : Is this what you want ?

Sourcetext :

Sentence 1 without the searched term.Sentence 2 is word 1 sentence.Sentence 3 without the searched term.Sentence 4 without the searched term.Sentence 5 is word 2 sentence.Sentence 6 is word 3 sentence.Sentence 7 without the searched term.

Word list (as textfile) : word 1 , word 2 , word 3

Resulttext :

Sentence 2 is word 1 sentence.Sentence 5 is word 2 sentence.Sentence 6 is word 3 sentence.

By the way: It would be helpful if you could provide a source and the word list as text files. Only a few helpers have time and passion to create the files themselves ;).



"In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move."

55 minutes ago, Musashi said:

Only a few helpers have time and passion to create the files themselves

... but some are passionate guys who create something similar themselves so this allows a first try :D

#Include <Array.au3>

$p = "word 1|word 2|word 3"

$txt = " Sentence 1 without the searched term. Sentence 2 is word 1 sentence. " & @crlf & "Sentence 3 without the searched term. Sentence 4 without the searched term. Sentence 5, is word 2 sentence. " & @crlf & "Sentence 6 is word 3 sentence. Sentence 7 is otherword 3 sentence. Sentence 8 without the searched term. "

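; Flag 3 returns an array of all matches. Each match is one sentence:
; the text before a listed word, the word itself, and the text up to and including the next period.
$res = StringRegExp($txt, '(?s)\s*([^.]+\b(?|' & $p & ')\b[^.]+\.)', 3)
_ArrayDisplay($res)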

Edit
Waiting now for new requirements to come ;)

Edited by mikell

59 minutes ago, Alecsis1 said:

Btw, sorry for my bad English…

I doubt vimmy even cares about such things :)


Code hard, but don’t hard code...

3 hours ago, JockoDundee said:

I doubt vimmy even cares about such things :)

I doubt that too :lol:.

@Alecsis1 :

As far as I can tell from a quick test, your script also delivers the desired result. However, the RegEx variant from @mikell is much shorter (as usual ;)).

BTW : I would remove the following directive :

[...]
#pragma compile(UPX, True)
[...]

AV scanners react badly to UPX-compressed executables.

Use #AutoIt3Wrapper_UseUpx = N or #pragma compile(UPX, False) (which is the default) instead.

Edited by Musashi
typo


An hybrid solution maybe ?

#include <Constants.au3>

$p = "word 1|word 2|word 3"

$txt = "Sentence 1 without the searched term. Sentence 2 is word 1 sentence. " & @crlf & "Sentence 3 without the searched term. Sentence 4 without the searched term. Sentence 5, is word 2 sentence. " & @crlf & "Sentence 6 is word 3 sentence. Sentence 7 is otherword 3 sentence. Sentence 8 without the searched term."
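; Split on periods; $STR_NOCOUNT makes the array 0-based with no count element
$aSentence = StringSplit($txt, ".", $STR_NOCOUNT)
; Skip the last element: it is the leftover text after the final period
For $i = 0 to UBound($aSentence) - 2
  If StringRegExp($aSentence[$i], "\b(" & $p & ")\b") Then FileWriteLine("Result.txt", StringStripWS($aSentence[$i], $STR_STRIPLEADING+$STR_STRIPTRAILING))
Next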

 

4 hours ago, Musashi said:

delivers the desired result

I confess I omitted some details because it sounded a bit like spoon-feeding  :rolleyes:

#Include <Array.au3>

#cs
1.txt :
word 1
word 2
word 3
#ce

$p = StringReplace(StringStripWS(FileRead("1.txt"), 3), @crlf, "|")
;$p = "word 1|word 2|word 3"

#cs
2.txt :
 Sentence 1 without the searched term. Sentence 2 is word 1 sentence. 
Sentence 3 without the searched term. Sentence 4 without the searched term. Sentence 5, is word 2 sentence. 
Sentence 6 is word 3 sentence. Sentence 7 is otherword 3 sentence. Sentence 8 without the searched term. 
#ce

$txt = FileRead("2.txt")
;$txt = " Sentence 1 without the searched term. Sentence 2 is word 1 sentence. " & @crlf & "Sentence 3 without the searched term. Sentence 4 without the searched term. Sentence 5, is word 2 sentence. " & @crlf & "Sentence 6 is word 3 sentence. Sentence 7 is otherword 3 sentence. Sentence 8 without the searched term. "

$res = StringRegExp($txt, '(?s)\s*([^.]+\b(?|' & $p & ')\b[^.]+\.)', 3)
;_ArrayDisplay($res)
FileWrite("result.txt", _ArrayToString($res, @crlf))
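
The pattern above assumes that 1.txt uses CRLF line endings and has no empty lines. As a minimal sketch (the helper name `_BuildAlternation` is hypothetical and not part of the thread), the alternation could be built more defensively by normalizing line endings, skipping blank lines, and quoting each word with `\Q...\E`:

```autoit
#include <StringConstants.au3>

; Hypothetical helper (not from the thread): turns the raw contents of a
; word-list file into a regex alternation such as "\Qword 1\E|\Qword 2\E".
Func _BuildAlternation($sRaw)
    ; Normalize CRLF and lone CR to LF so any line-ending style works.
    $sRaw = StringReplace(StringReplace($sRaw, @CRLF, @LF), @CR, @LF)
    Local $aLines = StringSplit($sRaw, @LF, $STR_NOCOUNT)
    Local $sPattern = ""
    For $i = 0 To UBound($aLines) - 1
        Local $sWord = StringStripWS($aLines[$i], $STR_STRIPLEADING + $STR_STRIPTRAILING)
        If $sWord = "" Then ContinueLoop ; skip blank lines
        If $sPattern <> "" Then $sPattern &= "|"
        $sPattern &= "\Q" & $sWord & "\E" ; take regex metacharacters literally
    Next
    Return $sPattern
EndFunc

; Usage with an inline list that mixes line endings and contains a blank line.
ConsoleWrite(_BuildAlternation("word 1" & @CRLF & @CRLF & "word 2" & @LF & "word 3") & @CRLF)
```

If the list is empty the helper returns an empty string, and an empty alternation matches everywhere, so a caller would probably want to check for that before running the search.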

 

On 4/19/2021 at 9:49 AM, mikell said:

I confess I omitted some details because it sounded a bit like spoon-feeding  :rolleyes:


Thank you, it works, except that it adds the words from text file 1 to the end of result.txt.

 

Edited by vinnyMS


This ?

#include <Constants.au3>

; \Q...\E quotes each word so characters such as "(" and "/" are taken literally
$p = "\Q" & StringReplace(StringStripWS(FileRead("1.txt"), 3), @CRLF, "\E|\Q") & "\E"

$txt = "Sentence 1 without the searched term? Sentence 2 is (TCP/IP) sentence. " & @crlf & "Sentence 3 without the searched term! Sentence 4 without the searched term? Sentence 5, is word 2 sentence. " & @crlf & "Sentence 6 is word 3 sentence. Sentence 7 is otherword 3 sentence. Sentence 8 without the searched term ?"
;$txt = FileRead("2.txt")

; Each character in ".?!" acts as a separate delimiter
$aSentence = StringSplit($txt, ".?!", $STR_NOCOUNT)
; Loop over every split element instead of a hard-coded count
For $i = 0 to UBound($aSentence) - 1
  ; Group the alternation so \W applies to every word, not only the first and last
  If StringRegExp($aSentence[$i], "\W(?:" & $p & ")\W") Then FileWriteLine("Result.txt", StringStripWS($aSentence[$i], $STR_STRIPLEADING+$STR_STRIPTRAILING))
Next

 

Edited by Nine


This extracts a sentence up to the period, removes the period, then extracts the next sentence that does have the word in it, and saves a result containing all the extracted sentences, including ones I don't need.

2.txt

Transmission Control Protocol/Internet Protocol (TCP/IP) is a protocol system—a collection of protocols that supports network communications. The answer to the question What is a protocol? must begin with the question What is a network? 
Transmission Control Protocol/Internet Protocol (TCP/IP) is a protocol system—a collection of protocols that supports network communications. The answer to the question What is a protocol? must begin with the question What is a network? 
Transmission Control Protocol/Internet Protocol (TCP/IP) is a protocol system—a collection of protocols that supports network communications. The answer to the question What is a protocol? must begin with the question What is a network? 


result.txt

 

Transmission Control Protocol/Internet Protocol (TCP/IP) is a protocol system—a collection of protocols that supports network communications
The answer to the question What is a protocol? must begin with the question What is a network? 
Transmission Control Protocol/Internet Protocol (TCP/IP) is a protocol system—a collection of protocols that supports network communications
The answer to the question What is a protocol? must begin with the question What is a network? 
Transmission Control Protocol/Internet Protocol (TCP/IP) is a protocol system—a collection of protocols that supports network communications
 


I think I gave you enough tools to work with (as well as the others).  Adjust the code to fit your needs now.
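
One possible adjustment, as a rough sketch rather than a definitive answer: collect each period-terminated sentence with its period still attached, then keep only the sentences that contain a listed word. The helper name `_ExtractSentences` and the sample text are assumptions for illustration. Lookarounds replace `\b` here so that terms such as `(TCP/IP)` can still match.

```autoit
#include <StringConstants.au3>

; Hypothetical helper (not from the thread): returns the period-terminated
; sentences of $sText that contain one of the alternatives in $sWords,
; one per line, each keeping its final period.
Func _ExtractSentences($sText, $sWords)
    ; Global match: every run of non-period characters followed by a period.
    Local $aSentences = StringRegExp($sText, "[^.]+\.", $STR_REGEXPARRAYGLOBALMATCH)
    If @error Then Return "" ; no period-terminated sentence found
    Local $sOut = ""
    For $i = 0 To UBound($aSentences) - 1
        ; The word must not be glued to a word character on either side.
        If StringRegExp($aSentences[$i], "(?<!\w)(?:" & $sWords & ")(?!\w)") Then
            $sOut &= StringStripWS($aSentences[$i], $STR_STRIPLEADING + $STR_STRIPTRAILING) & @CRLF
        EndIf
    Next
    Return $sOut
EndFunc

; Usage: only the "(TCP/IP)" and "word 3" sentences should be printed;
; the "otherword 3" sentence is skipped.
Local $sText = "Sentence 1 without the term. Sentence 2 is (TCP/IP) sentence. " & @CRLF & _
        "Sentence 3 is otherword 3 sentence. Sentence 4 is word 3 sentence."
ConsoleWrite(_ExtractSentences($sText, "\Q(TCP/IP)\E|\Qword 3\E"))
```

Only the period ends a sentence in this sketch, as in the original request, so a question mark in the middle of a sentence does not split it. The returned string could be passed to FileWrite to produce result.txt.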

3 hours ago, Nine said:

An hybrid solution maybe ?

Did you use “An” because H is silent in French? :)


38 minutes ago, JockoDundee said:

Did you use “An” because H is silent in French?

So you are telling I should have used a hybrid ? :ermm:

3 minutes ago, Nine said:

So you are telling I should have used a hybrid

Yes.  We say “An hour”, but “A history”.  Or “An unsigned integer”, but “A Ulimit”.

It depends on whether there is a consonant sound that starts the word after the a or not.

 



Ahhh.  Always had a hard time with languages.  One of my profs once told me that I speak better Fortran than I speak French. :lol:

25 minutes ago, Nine said:

Always had a hard time with languages. 

No, you’re actually correct.  Because of your Demain comme jamais tag, whenever I read your posts, I can’t help but hear them (in my mind’s ear) in a thick French accent.

So I heard “An eye-brid solution”, which is perfect.

