What functions can help me pull strings out of a paragraph?


Thanks so much Mikell!  I have to say, the times I've lurked here and posted from years back, I'm always amazed by how great this community is.  Mucho thanks.

One thing though is that I must be screwing something up.  I ran the code and the msgbox comes back blank.  I added the filewrite line (and made a results.txt file) and it comes up empty.  I copied and pasted the code twice now and not seeing where I am going wrong.  I named the file Mikell.au3, made sure test.txt is the file I uploaded here (downloaded it to double check). Could it be I'm running a different version of autoit?  Am I missing some #include?  I made sure the Mikell.au3, test.txt, and results.txt are all in the same directory.  But since the msgbox comes up blank I think something else is going on.

#include <array.au3>
#include <Misc.au3>

$sText = FileRead("test.txt")
$a = StringRegExp($sText, 'EDT\s+(.+?)\s{2,}?\R', 3)
_ArrayDisplay($a)
$res = ""
For $i = 0 to UBound($a)-1
   If StringStripWS($a[$i], 3) = "" Then ContinueLoop   ; this excludes empty lines
   $tmp = StringRegExp($a[$i], '([^\s,]+)', 3)
   For $j = 0 to UBound($tmp)-1
       $res &= $tmp[$j] & @crlf
   Next
Next
Msgbox(0,"", $res)
FileWrite("results.txt", $res)
Edited by handofthrawn

I installed the latest version of AutoIt and SciTE4AutoIt3 and bam, it works!

I could kiss you right now, this is amazing.  Thank you so much for the help!!!! 

Anything I could do to pay this favor forward a little (never could return in full).   Help in the general area with general questions?  Donation to the site? 


Mikell

I noticed your script of post#20 returns 933 items from "Text.txt" file.
My edited script of post#13 returns 1035 items.

Looking at the third captured item from the "Text.txt" file, "BONT" is followed by one tab.
Your RegExp, 'EDT\s+(.+?)\s{2,}?\R', contains "\s{2,}?", which will not match a single whitespace character.
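
To illustrate the quantifier point in isolation, here is a minimal sketch. The sample strings are made up for the demonstration and are not taken from the attached file. With the default flag, StringRegExp returns 1 for a match and 0 for no match.

```autoit
; {2,} needs at least two whitespace characters in a row; + accepts one or more.
Local $sOneTab = "BONT" & @TAB            ; hypothetical: ticker followed by one tab
Local $sTwoTabs = "BONT" & @TAB & @TAB    ; hypothetical: ticker followed by two tabs

ConsoleWrite(StringRegExp($sOneTab, '\s{2,}') & @CRLF)   ; 0: a single tab is not enough
ConsoleWrite(StringRegExp($sTwoTabs, '\s{2,}') & @CRLF)  ; 1: two tabs satisfy {2,}
ConsoleWrite(StringRegExp($sOneTab, '\s+') & @CRLF)      ; 1: + matches a single tab
```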


Wow how could I skip that ?  :>

Thanks for these remarks. Post #20 is edited with a correction of the main regex:

$a = StringRegExp($sText, 'EDT\h+([$.A-Z,' & Chr(32) & ']+)', 3)

handofthrawn I'm sooorry, please use Malkey's code or my corrected one in post #20  :sweating:

Edited by mikell

Well, I searched for a one-liner solution but did not succeed. :(

Here is my solution, which seems to work for your test.txt file:
 

#include <Array.au3>

Global $sExtracted, $aTokens, $sLine
$aTokens = StringRegExp(FileRead(@ScriptDir & "\Test.txt"), "(?i).*EDT" & Chr(09) & "{1,}(.+?)" & Chr(09) & "{1,}.*", 3)
For $i = 0 To UBound($aTokens) - 1
    $sLine = StringStripWS($aTokens[$i], 7)
    $sExtracted &= $sLine <> "" ? $sLine & @CRLF : ""
Next
ConsoleWrite($sExtracted & @CRLF)

Br,

UEZ

Edited by UEZ
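
The `? :` in the loop above is the ternary operator, which older AutoIt releases do not parse. To my knowledge it arrived with 3.3.10.0, but treat the exact version as an assumption. On an older install the same step can be sketched with a one-line If. The sample array below is hypothetical and stands in for the regex captures.

```autoit
; Sketch: build the output without the ternary operator.
Local $aTokens[3] = ["  STWD ", "", "RDS.A"]   ; hypothetical captures, one of them empty
Local $sExtracted = "", $sLine

For $i = 0 To UBound($aTokens) - 1
    $sLine = StringStripWS($aTokens[$i], 7)    ; strip leading, trailing and doubled whitespace
    ; Same effect as: $sExtracted &= $sLine <> "" ? $sLine & @CRLF : ""
    If $sLine <> "" Then $sExtracted &= $sLine & @CRLF
Next

ConsoleWrite($sExtracted)
```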


  • 2 weeks later...

I am revisiting this because I found an error in my current script.   I was trying to remove duplicates but I ran into issues when a word has "." in it or I have a single letter word.

The line I am trying to fix is:

dim $pattern = "(\b\w+ \B)"

This is supposed to grab a word but when I have a word like RDS.A (in my FlyNews.txt file), it separates it into two.  When I tried to use this line of code to fix it

dim $pattern = "(\b\w.+ \B)"

That caused me to lose single letter words like B.

If anyone has any suggestions I would greatly appreciate it.  Thanks.

#include <array.au3>
#include <Misc.au3>

$sText = FileRead("FlyNews.txt")
$a = StringRegExp($sText, 'EDT\h+([$.A-Z,' & Chr(32) & ']+)', 3)
$res = ""
For $i = 0 to UBound($a)-1
   If StringStripWS($a[$i], 3) = "" Then ContinueLoop   ; this excludes empty lines
   $tmp = StringRegExp($a[$i], '([^\s,]+)', 3)

   For $j = 0 to UBound($tmp)-1
       $res &= $tmp[$j] & @crlf
   Next
Next

$emptyfile = FileOpen("results.txt", 2)   ; mode 2 erases the previous contents
FileClose($emptyfile)                     ; FileClose expects the handle, not the file name
FileWrite("results.txt", $res)


$string = FileRead("results.txt")
$emptyfile = FileOpen("results.txt", 2)   ; mode 2 erases the previous contents
FileClose($emptyfile)                     ; FileClose expects the handle, not the file name
dim $pattern = "(\b\w+ \B)"
dim $return = StringRegExp($string, $pattern, 3)
dim $obj = ObjCreate("System.Collections.ArrayList")
For $i = 0 To UBound($return) -1
    If Not $obj.Contains($return[$i]) Then $obj.Add($return[$i])
Next
dim $cleared = ''
For $word In $obj
    $cleared &= $word & @crlf
Next
FileWrite("results.txt", $cleared)

 

p.s. UEZ, I tried your code but it's throwing a syntax error on the $sLine line, where the "?" is.

FlyNews.txt

results.txt
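
One possible way to keep a token such as RDS.A whole while still catching single-letter tokens is a character class that includes the dot. This is a sketch only: the sample text is hypothetical, and `[\w.]+` is an assumption about what counts as a word in this data, not the pattern used elsewhere in the thread.

```autoit
; Sketch: match runs of word characters and dots, so "RDS.A" stays in one piece.
Local $sSample = "STWD" & @CRLF & "RDS.A" & @CRLF & "B" & @CRLF   ; hypothetical results.txt content
Local $aWords = StringRegExp($sSample, '[\w.]+', 3)               ; flag 3: array of all matches

For $i = 0 To UBound($aWords) - 1
    ConsoleWrite($aWords[$i] & @CRLF)   ; STWD, RDS.A and B, one per line
Next
```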


Here is the solution of my post #7 again.

I added the conversion of tabs to spaces.

#include <array.au3>
$sText = FileRead(@ScriptFullPath & ".FlyNews.txt")
MsgBox(0, "Source", $sText)
$a1 = StringSplit(StringReplace(StringStripWS(StringReplace($sText,@TAB," "), 4), ", ", ",,"), "EDT ", 3)
_ArrayDelete($a1, 0) ; first entry is before EDT
$words = ""
For $i = 0 To UBound($a1) - 1
    $a2 = StringSplit($a1[$i], " ")
    $words &= StringReplace($a2[1], ",", " ") & " "
Next
$a2 = StringSplit(StringStripWS($words, 7), " ", 2)
_ArrayDisplay($a2, "Solution of EXIT")

Result:

Row|Col 0
[0]|STWD
[1]|RDS.A
[2]|COP
[3]|CVX
[4]|B
[5]|TOT
[6]|BP
[7]|theflyonthewall.com:
[8]|STWD
[9]|WAG

Is there anything wrong with the result?


Exit, that works but I'm screwing up your result because I need the format slightly changed.  I'm trying to get the result into Notepad without the [0], [1], [2] prefixes, so it looks just like this:

STWD

RDS.A

COP

CVX

B

...

And I'm also trying to remove the duplicates (exactly like the Excel duplicate removal feature, where it removes the duplicates, keeps the unique words, and closes up the list so there are no gaps in between).  My code above got close to achieving this but it fell flat on words with a "."

I've tried to copy some For loops and use _ArrayUnique but I'm messing up slightly and just can't nail the result.


handofthrawn,

With this script you don't really need a second pass to remove duplicates; just avoid adding duplicates while you build the output :)

#include <array.au3>
#include <Misc.au3>

$sText = FileRead("FlyNews.txt")
$a = StringRegExp($sText, 'EDT\h+([$.A-Z,' & Chr(32) & ']+)', 3)
$res = ""
For $i = 0 to UBound($a)-1
   If StringStripWS($a[$i], 3) = "" Then ContinueLoop   ; this excludes empty lines
   $tmp = StringRegExp($a[$i], '([^\s,]+)', 3)

   For $j = 0 to UBound($tmp)-1
       If NOT StringInStr(@CRLF & $res, @CRLF & $tmp[$j] & @CRLF) Then  ; this avoids duplicates (whole-token match)
           $res &= $tmp[$j] & @crlf
           Msgbox(0, "", $res)   ; use this to check how $res is built
       EndIf
   Next
Next

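$hFile = FileOpen("results.txt", 2)   ; mode 2: open for writing and erase previous contents
FileWrite($hFile, $res)
FileClose($hFile)                     ; FileClose expects the handle, not the file name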
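
One caveat with a substring test for duplicates: if only the end of the token is anchored (token followed by @CRLF), a token that is the tail of an earlier one, such as P after COP, is wrongly treated as already present. A minimal sketch that anchors both sides with @CRLF follows. The token list is made up for the illustration.

```autoit
; Sketch: whole-token duplicate check on a @CRLF-separated string.
Local $aTokens[4] = ["COP", "P", "COP", "RDS.A"]   ; hypothetical tokens
Local $sRes = ""

For $i = 0 To UBound($aTokens) - 1
    ; Prefix @CRLF on both sides so "P" is not found inside "COP".
    If Not StringInStr(@CRLF & $sRes, @CRLF & $aTokens[$i] & @CRLF) Then
        $sRes &= $aTokens[$i] & @CRLF
    EndIf
Next

ConsoleWrite($sRes)   ; COP, P and RDS.A, one per line
```

StringInStr is case-insensitive by default, which is harmless for all-uppercase tickers but worth keeping in mind for mixed-case data.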

 

Better?

#include <array.au3>
#include <file.au3>
$sText = FileRead(@ScriptFullPath & ".FlyNews.txt")
$sText &= $sText  ; just to force duplicates
;~ MsgBox(0, "Source", $sText)
$a1 = StringSplit(StringReplace(StringStripWS(StringReplace($sText, @TAB, " "), 4), ", ", ",,"), "EDT ", 3)
_ArrayDelete($a1, 0) ; first entry is before EDT
$words = ""
For $i = 0 To UBound($a1) - 1
    $a2 = StringSplit($a1[$i], " ")
    $words &= StringReplace($a2[1], ",", " ") & " "
Next
$a2 = StringSplit(StringStripWS($words, 7), " ", 2)
$a2 = _ArrayUnique($a2)
_ArrayDelete($a2, 0)
;~ _ArrayDisplay($a2, "Solution of EXIT")
_FileWriteFromArray(@ScriptFullPath & ".Result.txt",$a2)
ShellExecute(@ScriptFullPath & ".Result.txt")


Exit, that did it! Thanks so much. 

Mikell, thanks for the tip.  I knew there was a one-line solution out there to remove duplicates but I would never have guessed to use a NOT with StringInStr.  This stuff is slowly turning from gibberish to code I can read and for that I thank you.
