Extract a sentence containing a word in a text file


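; Append the sentinel ".!?" after every sentence-ending character so that
; sentence boundaries can be located with StringInStr.
Local $sentences=".!? " & StringRegExpReplace(FileRead("2.txt"), "([\!\.\?])", "$1.!?")
; Two consecutive line feeds also count as a sentence boundary
; (this only matches LF-only line endings, not CRLF).
$sentences=StringRegExpReplace($sentences, "([\n][\n])", ".!?")
FileDelete("results.txt")
; 1.txt holds one search word per line.
For $word In StringSplit(FileRead("1.txt"), @CRLF, 3)
   If $word = "" Then ContinueLoop ; skip blank lines, e.g. a trailing line break in 1.txt
   FileWriteLine("results.txt",@CRLF & "WORD: "& $word)
   $iEPos=1
   ; Collect at most three sentences per word.
   For $n = 1 to 3
      ; Next occurrence of the word, starting after the previous sentence.
      $iWPos=StringInStr($sentences, $word, 0, 1, $iEpos)
      If $iWPos=0 Then ExitLoop
      ; Nearest sentinel before the word, then the first one after it.
      $iSPos=StringInStr($sentences, ".!?", 0, -1, $iwPos)
      $iEPos=StringInStr($sentences, ".!?", 0, 1, $iwPos)
      ; The text between the two sentinels is the sentence.
      $sentence=StringStripWS(StringMid($sentences, $iSPos + 3, $iEPos - $iSpos - 3),3)
      FileWriteLine("results.txt", $sentence)
   Next
Next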

 

Code hard, but don’t hard code...

10 hours ago, JockoDundee said:
10 hours ago, Nine said:

Never underestimate a MVP that speak french at 6h30 pm...

Thank you for translating from your native time for me...

[Completely off topic : Question to native speakers]

Could it not also be pronounced or written "Never underestimate an MVP that speak french at 6h30 pm..."? (I'm pretty sure I've already heard it this way from American sports commentators (m/f/d).)

Is this possibly one of those "hybrid things", that Jocko is so in love with ;)?

"In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move."

On 4/29/2021 at 7:19 PM, JockoDundee said:
[script from the first post, quoted in full]

This works only for the previous examples I posted (text files). If 2.txt is different, it doesn't give the right results.

2.txt

Edited by vinnyMS
I see we have upgraded to extended ASCII characters.

Post the entire 1.txt and 2.txt. It’s just some O’Reilly book, right?

P.S. The reason it works for “only the previous versions” that you’ve posted is that you introduce a new general case each time. So let’s put all the cards on the table, eh?

Edited by JockoDundee

Yo, vinny, that file is a Unicode zoo, with quite a few strange formatting rules.

But if you make it like your original files, no prob.

However, if you insist, here's one last go. Also, btw, @mikell made a new unremunerated funny; maybe an oversight on your part?

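; Append the sentinel ".!?" after every sentence-ending character.
Local $sentences=".!? " & StringRegExpReplace(FileRead("2.txt"), "([\!\.\?])", "$1.!?")
; A blank line (two consecutive CRLFs) also ends a sentence.
$sentences=StringReplace($sentences, @CRLF&@CRLF, ".!?" & @CRLF)
; Search copy: punctuation and line breaks become single spaces. Every replacement
; is one character for one character, so positions match those in $sentences.
Local $sentences2=StringRegExpReplace($sentences, "([\!\.\?\n\r\,\;\:\f\(\)])", " ")
FileDelete("results.txt")
; 1.txt holds one search word per line.
For $word In StringSplit(FileRead("1.txt"), @CRLF, 3)
   If $word = "" Then ContinueLoop ; skip blank lines, e.g. a trailing line break in 1.txt
   FileWriteLine("results.txt",@CRLF & "WORD: "& $word)
   $iEPos=1
   ; Collect at most three sentences per word.
   For $n = 1 to 3
      ; Match the word only when it is surrounded by spaces in the search copy.
      $iWPos=StringInStr($sentences2, " "&$word&" ", 0, 1, $iEpos)
      If $iWPos=0 Then ExitLoop
      ; Nearest sentinel before the word, then the first one after it.
      $iSPos=StringInStr($sentences, ".!?", 0, -1, $iwPos)
      $iEPos=StringInStr($sentences, ".!?", 0, 1, $iwPos)
      $sentence=StringStripWS(StringMid($sentences, $iSPos + 3, $iEPos - $iSpos - 3),3)
      ; Join sentences that were wrapped across several lines.
      $sentence=StringReplace($sentence, @CRLF, "")
      FileWriteLine("results.txt", $sentence)
   Next
Next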

 

As long as we're taking another shot, here's mine. (Might be cleaner.)

; The _Array* functions used below are defined in Array.au3.
#include <Array.au3>
#include <File.au3>

$file = '2.txt'

$word_MatchRepeat = 4

$sWords = "protocol|across|report|browser|closed"

$array = _a(FileRead($file), $sWords, $word_MatchRepeat)
If IsArray($array) Then
    _ArrayDisplay($array, "Results")
    _SaveToFile($array, "results.txt")
EndIf

Func _SaveToFile($arr, $sFileName)
    FileDelete($sFileName)
    FileWrite($sFileName, _ArrayToString($arr, Default, 1, Default, Default, 0, 0))
EndFunc   ;==>_SaveToFile

Func _a($s, $wPatt, $iWR)
    If $s = "" Then Return
    Local $aWords = StringSplit($wPatt, "|")
    Local $s1 = StringRegExpReplace(StringRegExpReplace($s, '\b[^.][^\R]\R', " "), '(\b[^.]+)\R(?=\b[^.]+\.)', "")
    Local $patt = "[^.]+(\b\Q " & StringReplace($wPatt, "|", "\E|\b\Q ") & '\E)+([.$]|[^.]+\.)(*SKIP)(*F)|[^.]+'

    $s1 = StringRegExpReplace($s1, $patt, "")
    Local $a = StringRegExp($s1, '\b[^.]+[.$]', 3)

    _ArrayColInsert($a, 1)
    ReDim $a[UBound($a)][$aWords[0] + 2]

    For $i = 1 To $aWords[0]
        For $j = 0 To UBound($a) - 1
            If StringInStr($a[$j][0], " " & $aWords[$i]) Then
                $a[$j][$i] = $aWords[$i]
                $a[$j][UBound($a, 2) - 1] += 1
            EndIf
        Next
    Next
    _ArraySort($a, 0, Default, Default, UBound($a, 2) - 1)
    _ArrayColInsert($aWords, 1)

    Local $aNew[1][2], $iRandom
    For $i = 1 To $aWords[0][0]
        $iRandom = Random(UBound($a) - 1 > 9 ? 10 : 1, UBound($a) - 1, 1)
        Do
            If $aWords[$i][1] >= $iWR Then ExitLoop
            $index = _ArraySearch($a, "\w+", $iRandom, Default, Default, 3, Default, $i)
            If $index < 0 Then
                If $iRandom > 1 Then
                    $iRandom -= Random(1, 4, 1)
                    ContinueLoop
                Else
                    ExitLoop
                EndIf
            EndIf

            _ArrayAdd($aNew, StringRegExpReplace($a[$index][0], "\R", "") & "|" & $aWords[$i][0], Default, "|", @CRLF)
            For $j = 1 To $aWords[0][0]
                If $j <> $i Then
                    If $a[$index][$j] Then
                        $aWords[$j][1] += 1
                        ;Delete additional repetition or do comment this section to retain the repetition.
                        ;---------------------------------------------------
                        If $aWords[$j][1] >= $iWR Then
                            Local $1 = _ArraySearch($aNew, $aWords[$j][0], 0, Default, 1, Default, Default, 1)
                            If $1 >= 0 Then
                                _ArrayDelete($aNew, $1)
                            EndIf
                        EndIf
                        ; --------------------------------------------------
                        $aNew[UBound($aNew) - 1][1] &= ", " & $aWords[$j][0]
                    EndIf
                Else
                    $aWords[$i][1] += 1
                EndIf
                $a[$index][$j] = ""
            Next
        Until @error
    Next
    $aNew[0][0] = UBound($aNew) - 1
    Return $aNew
EndFunc   ;==>_a

 

Edited by Deye
CleanUp
7 hours ago, JockoDundee said:

Oh no, @Deye, now he can cross-check :)

Has the counter-control already advanced?

I updated the script above: it was incorrectly matching "enclosed" where only the whole word "closed" should match.
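
To illustrate the "enclosed" versus "closed" problem: a plain substring search finds "closed" inside "enclosed". The scripts in this thread appear to guard against that by searching for the word with a space in front of it. The sketch below shows a different option, the regular-expression word boundary \b. The sample sentences and variable names are made up for illustration and are not taken from the posted files.

```autoit
; Sketch only: sample sentences and names are hypothetical.
Local $sWord = "closed"
Local $sA = "The letter was enclosed."
Local $sB = "The port was closed."

; (?i) ignores case, \Q...\E treats the word literally,
; \b demands a word boundary on both sides of it.
Local $sPattern = "(?i)\b\Q" & $sWord & "\E\b"

; Plain substring search: finds "closed" in both sentences,
; because "enclosed" contains it.
ConsoleWrite("Substring in A: " & (StringInStr($sA, $sWord) > 0) & @CRLF)
ConsoleWrite("Substring in B: " & (StringInStr($sB, $sWord) > 0) & @CRLF)

; Whole-word search: StringRegExp with the default flag returns 1 on a match, 0 otherwise.
; Only sentence B matches; in "enclosed" there is no boundary between "n" and "c".
ConsoleWrite("Whole word in A: " & StringRegExp($sA, $sPattern) & @CRLF)
ConsoleWrite("Whole word in B: " & StringRegExp($sB, $sPattern) & @CRLF)
```

Unlike the leading-space check, \b also matches a word at the very start of the text or directly after a quote or bracket, and it rejects longer words such as "closedown".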
 
