extract a sentence containing word in text file


I need a script that extracts every sentence containing a word from a word list. The result should be a text file of the extracted sentences, with the period as the sentence delimiter: a sentence is the text between one period and the next.

word list text file:

word 1

word 2

word 3

 

extracted sentences, written to the "sentence" text file:

this is word 1 sentence.

this is word 2 sentence.

this is word 3 sentence.


Just to understand better : Is this what you want ?

Sourcetext :

Sentence 1 without the searched term.Sentence 2 is word 1 sentence.Sentence 3 without the searched term.Sentence 4 without the searched term.Sentence 5 is word 2 sentence.Sentence 6 is word 3 sentence.Sentence 7 without the searched term.

Word list (as textfile) : word 1 , word 2 , word 3

Resulttext :

Sentence 2 is word 1 sentence.Sentence 5 is word 2 sentence.Sentence 6 is word 3 sentence.

By the way: It would be helpful if you could provide a source and the word list as text files. Only a few helpers have time and passion to create the files themselves ;).


"In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move."


55 minutes ago, Musashi said:

Only a few helpers have time and passion to create the files themselves

... but some are passionate guys who create something similar themselves so this allows a first try :D

#Include <Array.au3>

$p = "word 1|word 2|word 3"

$txt = " Sentence 1 without the searched term. Sentence 2 is word 1 sentence. " & @crlf & "Sentence 3 without the searched term. Sentence 4 without the searched term. Sentence 5, is word 2 sentence. " & @crlf & "Sentence 6 is word 3 sentence. Sentence 7 is otherword 3 sentence. Sentence 8 without the searched term. "

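; Capture each sentence (non-period characters ending in a period) that contains one of the terms as a whole word.
; Flag 3 returns an array of all matches.
$res = StringRegExp($txt, '(?s)\s*([^.]+\b(?|' & $p & ')\b[^.]+\.)', 3)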
_ArrayDisplay($res)

Edit
Waiting now for new requirements to come ;)


3 hours ago, JockoDundee said:

I doubt vimmy even cares about such things :)

I doubt that too :lol:.

@Alecsis1 :

From a quick test, your script also delivers the desired result. However, the RegEx variant from @mikell is much shorter (as usual ;)).

BTW : I would remove the following directive :

[...]
#pragma compile(UPX, True)
[...]

AV scanners react badly to UPX-compressed executables.

Use #AutoIt3Wrapper_UseUpx = N or #pragma compile(UPX, False) (which is the default) instead.


An hybrid solution maybe ?

#include <Constants.au3>

$p = "word 1|word 2|word 3"

$txt = "Sentence 1 without the searched term. Sentence 2 is word 1 sentence. " & @crlf & "Sentence 3 without the searched term. Sentence 4 without the searched term. Sentence 5, is word 2 sentence. " & @crlf & "Sentence 6 is word 3 sentence. Sentence 7 is otherword 3 sentence. Sentence 8 without the searched term."
$aSentence = StringSplit($txt, ".", $STR_NOCOUNT)
For $i = 0 to UBound($aSentence) - 2
  If StringRegExp($aSentence[$i], "\b(" & $p & ")\b") Then FileWriteLine("Result.txt", StringStripWS($aSentence[$i], $STR_STRIPLEADING+$STR_STRIPTRAILING))
Next

 


4 hours ago, Musashi said:

delivers the desired result

I confess I omitted some details because it sounded a bit like spoon-feeding  :rolleyes:

#Include <Array.au3>

#cs
1.txt :
word 1
word 2
word 3
#ce

$p = StringReplace(StringStripWS(FileRead("1.txt"), 3), @crlf, "|")
;$p = "word 1|word 2|word 3"

#cs
2.txt :
 Sentence 1 without the searched term. Sentence 2 is word 1 sentence. 
Sentence 3 without the searched term. Sentence 4 without the searched term. Sentence 5, is word 2 sentence. 
Sentence 6 is word 3 sentence. Sentence 7 is otherword 3 sentence. Sentence 8 without the searched term. 
#ce

$txt = FileRead("2.txt")
;$txt = " Sentence 1 without the searched term. Sentence 2 is word 1 sentence. " & @crlf & "Sentence 3 without the searched term. Sentence 4 without the searched term. Sentence 5, is word 2 sentence. " & @crlf & "Sentence 6 is word 3 sentence. Sentence 7 is otherword 3 sentence. Sentence 8 without the searched term. "

$res = StringRegExp($txt, '(?s)\s*([^.]+\b(?|' & $p & ')\b[^.]+\.)', 3)
;_ArrayDisplay($res)
FileWrite("result.txt", _ArrayToString($res, @crlf))

 


On 4/19/2021 at 9:49 AM, mikell said:

I confess I omitted some details because it sounded a bit like spoon-feeding  :rolleyes:


Thank you, it works, except that it appends the words from 1.txt to the end of result.txt.

 


This ?

#include <Constants.au3>

$p = "\Q" & StringReplace(StringStripWS(FileRead("1.txt"), 3), @CRLF, "\E|\Q") & "\E"
$NUMBER_OF_LINES = 3

$txt = "Sentence 1 without the searched term? Sentence 2 is (TCP/IP) sentence. " & @crlf & "Sentence 3 without the searched term! Sentence 4 without the searched term? Sentence 5, is word 2 sentence. " & @crlf & "Sentence 6 is word 3 sentence. Sentence 7 is otherword 3 sentence. Sentence 8 without the searched term ?"
;$txt = FileRead("2.txt")

$aSentence = StringSplit($txt, ".?!", $STR_NOCOUNT)
For $i = 0 to $NUMBER_OF_LINES - 1
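  ; Group the alternation so that both \W checks apply to every term, not only the first and last one
  If StringRegExp($aSentence[$i], "\W(?:" & $p & ")\W") Then FileWriteLine("Result.txt", StringStripWS($aSentence[$i], $STR_STRIPLEADING+$STR_STRIPTRAILING))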
Next

 


This extracts a sentence up to the period and removes the period, but then it also extracts the next sentence, which doesn't have the word in it. The result is saved with all the sentences extracted, including the ones I don't need.

2.txt

Transmission Control Protocol/Internet Protocol (TCP/IP) is a protocol system—a collection of protocols that supports network communications. The answer to the question What is a protocol? must begin with the question What is a network? 
Transmission Control Protocol/Internet Protocol (TCP/IP) is a protocol system—a collection of protocols that supports network communications. The answer to the question What is a protocol? must begin with the question What is a network? 
Transmission Control Protocol/Internet Protocol (TCP/IP) is a protocol system—a collection of protocols that supports network communications. The answer to the question What is a protocol? must begin with the question What is a network? 


result.txt

 

Transmission Control Protocol/Internet Protocol (TCP/IP) is a protocol system—a collection of protocols that supports network communications
The answer to the question What is a protocol? must begin with the question What is a network? 
Transmission Control Protocol/Internet Protocol (TCP/IP) is a protocol system—a collection of protocols that supports network communications
The answer to the question What is a protocol? must begin with the question What is a network? 
Transmission Control Protocol/Internet Protocol (TCP/IP) is a protocol system—a collection of protocols that supports network communications
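
One way to keep only the matching sentences, with their end punctuation, might look like the sketch below. It is not code from the thread. It assumes the file names used above (1.txt for the word list, 2.txt for the source text), and it guesses at sentence boundaries with a heuristic: a ".", "?" or "!" ends a sentence only when followed by whitespace and an uppercase letter, or by the end of the text. Abbreviations such as "e.g. The" would fool it.

```autoit
#include <FileConstants.au3>
#include <StringConstants.au3>

; File names follow the thread (1.txt = word list, 2.txt = source text); adjust as needed.
Local $aWords = StringSplit(StringStripCR(FileRead("1.txt")), @LF, $STR_NOCOUNT)

; Join the non-empty lines into one alternation, quoting each term with \Q...\E
; so that characters such as "(" and "/" are matched literally.
Local $sTerms = "", $sWord
For $i = 0 To UBound($aWords) - 1
    $sWord = StringStripWS($aWords[$i], $STR_STRIPLEADING + $STR_STRIPTRAILING)
    If $sWord <> "" Then $sTerms &= "|\Q" & $sWord & "\E"
Next
If $sTerms = "" Then Exit 1 ; nothing to search for
$sTerms = StringTrimLeft($sTerms, 1) ; drop the leading "|"

; Heuristic (assumption): a sentence ends at ".", "?" or "!" when that mark is followed by
; whitespace plus an uppercase letter, or by the end of the text.
Local $aSentences = StringRegExp(FileRead("2.txt"), "(?s)\S.*?[.?!](?=\s+[A-Z]|\s*\z)", $STR_REGEXPARRAYGLOBALMATCH)
If Not IsArray($aSentences) Then Exit 2 ; no sentence found

; Lookarounds instead of \b, so a term that starts or ends with a non-word character still matches.
Local $sPattern = "(?<!\w)(?:" & $sTerms & ")(?!\w)"

Local $sOut = ""
For $i = 0 To UBound($aSentences) - 1
    If StringRegExp($aSentences[$i], $sPattern) Then $sOut &= $aSentences[$i] & @CRLF
Next

; Open in overwrite mode so results from earlier runs do not pile up in the file.
Local $hFile = FileOpen("result.txt", $FO_OVERWRITE)
If $hFile = -1 Then Exit 3
FileWrite($hFile, $sOut)
FileClose($hFile)
```

Blank lines in the word list are skipped on purpose: an empty term would turn into an empty alternative that matches every sentence. With the 2.txt shown above and a word list containing (TCP/IP), this should keep the "Transmission Control Protocol ..." sentences and drop the "The answer to the question ..." ones, because a "?" followed by a lowercase word is not treated as a sentence end.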
 


3 minutes ago, Nine said:

So you are telling I should have used a hybrid

Yes.  We say “An hour”, but “A history”.  Or “An unsigned integer”, but “A Ulimit”.

It depends on whether the word after the article starts with a consonant sound.

 

Code hard, but don’t hard code...


25 minutes ago, Nine said:

Always had a hard time with languages. 

No, you’re actually correct.  Because of your Demain comme jamais tag, whenever I read your posts, I can’t help but hear them (in my mind’s ear) in a thick French accent.

So I heard “An eye-brid solution”, which is perfect.
