handofthrawn (Author) Posted July 27, 2014 (edited)

Thanks so much, Mikell! Having lurked here and posted over the years, I'm always amazed by how great this community is. Mucho thanks.

One thing, though: I must be screwing something up. I ran the code and the MsgBox comes back blank. I added the FileWrite line (and made a results.txt file), and that comes up empty too. I've copied and pasted the code twice now and can't see where I'm going wrong. I named the file Mikell.au3 and made sure test.txt is the file I uploaded here (I downloaded it to double-check). Could I be running a different version of AutoIt? Am I missing an #include? Mikell.au3, test.txt, and results.txt are all in the same directory. But since the MsgBox comes up blank, I think something else is going on.

#include <array.au3>
#include <Misc.au3>

$sText = FileRead("test.txt")
$a = StringRegExp($sText, 'EDT\s+(.+?)\s{2,}?\R', 3)
_ArrayDisplay($a)
$res = ""
For $i = 0 to UBound($a)-1
    If StringStripWS($a[$i], 3) = "" Then ContinueLoop ; this excludes empty lines
    $tmp = StringRegExp($a[$i], '([^\s,]+)', 3)
    For $j = 0 to UBound($tmp)-1
        $res &= $tmp[$j] & @crlf
    Next
Next
Msgbox(0,"", $res)
FileWrite("results.txt", $res)

Edited July 27, 2014 by handofthrawn
mikell Posted July 27, 2014

Does the _ArrayDisplay($a) display the array correctly?
handofthrawn (Author) Posted July 27, 2014

I installed the latest versions of AutoIt and SciTE4AutoIt3 and, bam, it works! I could kiss you right now, this is amazing. Thank you so much for the help! Is there anything I could do to pay this favor forward a little (I could never return it in full)? Helping out in the general forum with general questions? A donation to the site?
Malkey Posted July 28, 2014

Mikell, I noticed your script in post #20 returns 933 items from the "test.txt" file, while my edited script in post #13 returns 1035 items. Looking at the third captured item in "test.txt", "BONT" is followed by a single tab. Your regexp, 'EDT\s+(.+?)\s{2,}?\R', contains "\s{2,}?", which requires at least two whitespace characters and so will not match a single one.
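Malkey's point can be checked in isolation. The snippet below is a small sketch with made-up sample strings (not lines from test.txt): the trailing `?` only makes `\s{2,}` lazy, it does not lower the minimum of two whitespace characters, whereas `\h+` accepts a single tab.

```autoit
; Sketch: a lazy quantifier still has a minimum.
; \s{2,}? means "two or more whitespace characters, as few as possible",
; so a ticker followed by a single tab cannot satisfy it.
Local $sOneTab = "BONT" & @TAB & "headline"
Local $sTwoTabs = "BONT" & @TAB & @TAB & "headline"

; StringRegExp with flag 0 returns 1 on a match and 0 otherwise
Local $iOne = StringRegExp($sOneTab, 'BONT\s{2,}?headline', 0)
Local $iTwo = StringRegExp($sTwoTabs, 'BONT\s{2,}?headline', 0)
; \h+ (one or more horizontal whitespace characters) accepts the single tab
Local $iOneH = StringRegExp($sOneTab, 'BONT\h+headline', 0)

ConsoleWrite("one tab, \s{2,}? : " & $iOne & @CRLF)   ; 0
ConsoleWrite("two tabs, \s{2,}? : " & $iTwo & @CRLF)  ; 1
ConsoleWrite("one tab, \h+ : " & $iOneH & @CRLF)      ; 1
```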
mikell Posted July 28, 2014 (edited)

Wow, how could I miss that? Thanks for the remarks. Post #20 is edited with a correction to the main regex:

$a = StringRegExp($sText, 'EDT\h+([$.A-Z,' & Chr(32) & ']+)', 3)

handofthrawn, I'm sooorry, please use Malkey's code or my corrected one in post #20.

Edited July 28, 2014 by mikell
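To see the corrected pattern handle both a single tab and a double tab after "EDT", here is a minimal sketch on a hypothetical two-line sample (the headlines are invented; the real file layout may differ). The capture stops at the first character outside `[$.A-Z, ]`, here the tab before each headline, and each capture is then split on whitespace and commas with the same `([^\s,]+)` pattern used earlier in the thread.

```autoit
; Hypothetical sample: one line with a single tab after the ticker,
; one with two tabs and a comma-separated ticker list.
Local $sText = "10:15 EDT" & @TAB & "BONT" & @TAB & "Bon-Ton headline" & @CRLF & _
        "10:20 EDT" & @TAB & @TAB & "RDS.A, COP, CVX" & @TAB & @TAB & "Oil headline" & @CRLF

; Corrected pattern: \h+ accepts any run of tabs/spaces after EDT, then
; captures uppercase letters, $, ., commas and spaces (Chr(32))
Local $a = StringRegExp($sText, 'EDT\h+([$.A-Z,' & Chr(32) & ']+)', 3)

Local $sRes = "", $aTok
For $i = 0 To UBound($a) - 1
    ; split each capture on whitespace and commas
    $aTok = StringRegExp($a[$i], '([^\s,]+)', 3)
    For $j = 0 To UBound($aTok) - 1
        $sRes &= $aTok[$j] & @CRLF
    Next
Next
ConsoleWrite($sRes) ; BONT, RDS.A, COP, CVX, one per line
```

Note that the character class is case-sensitive, so a headline that starts right after a space (rather than a tab) with a capital letter would leak into the capture; the tab acts as the real terminator here.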
handofthrawn (Author) Posted July 28, 2014

Malkey, thanks for this. I was planning on doing the manual write-up this morning and double-checking it against the script's output. Looks like you beat me to it. Thanks for noticing.
UEZ Posted July 28, 2014 (edited)

Well, I searched for a one-liner solution but did not succeed. Here is my solution, which seems to work for your test.txt file:

#include <Array.au3>

Global $sExtracted, $aTokens, $sLine
$aTokens = StringRegExp(FileRead(@ScriptDir & "\Test.txt"), "(?i).*EDT" & Chr(09) & "{1,}(.+?)" & Chr(09) & "{1,}.*", 3)
For $i = 0 To UBound($aTokens) - 1
    $sLine = StringStripWS($aTokens[$i], 7)
    $sExtracted &= $sLine <> "" ? $sLine & @CRLF : ""
Next
ConsoleWrite($sExtracted & @CRLF)

Br,
UEZ

Edited July 28, 2014 by UEZ
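The `? :` ternary operator is not available in older AutoIt releases, so if an interpreter or syntax checker rejects that line, the same accumulation step can be written with a plain If statement. A sketch with a made-up `$aTokens` array standing in for the StringRegExp result:

```autoit
; Hypothetical stand-in for the array returned by StringRegExp(..., 3)
Local $aTokens[3] = ["  STWD ", "   ", "COP"]
Local $sExtracted = "", $sLine

For $i = 0 To UBound($aTokens) - 1
    ; flag 7 = strip leading, trailing and double whitespace
    $sLine = StringStripWS($aTokens[$i], 7)
    ; same effect as: $sExtracted &= $sLine <> "" ? $sLine & @CRLF : ""
    If $sLine <> "" Then $sExtracted &= $sLine & @CRLF
Next
ConsoleWrite($sExtracted) ; STWD and COP, one per line; the blank entry is skipped
```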
handofthrawn (Author) Posted August 6, 2014

I'm revisiting this because I found an error in my current script. I was trying to remove duplicates, but I ran into issues when a word has a "." in it or when I have a single-letter word. The line I am trying to fix is:

dim $pattern = "(bw+ B)"

This is supposed to grab a word, but when I have a word like RDS.A (in my FlyNews.txt file), it splits it into two. When I tried this line to fix it:

dim $pattern = "(bw.+ B)"

it caused me to lose single-letter words like B. If anyone has any suggestions, I would greatly appreciate it. Thanks.

#include <array.au3>
#include <Misc.au3>

$sText = FileRead("FlyNews.txt")
$a = StringRegExp($sText, 'EDT\h+([$.A-Z,' & Chr(32) & ']+)', 3)
$res = ""
For $i = 0 to UBound($a)-1
    If StringStripWS($a[$i], 3) = "" Then ContinueLoop ; this excludes empty lines
    $tmp = StringRegExp($a[$i], '([^\s,]+)', 3)
    For $j = 0 to UBound($tmp)-1
        $res &= $tmp[$j] & @crlf
    Next
Next
$emptyfile = FileOpen("results.txt", 2) ; mode 2 erases the existing contents
FileClose($emptyfile) ; FileClose takes the handle, not the file name
FileWrite("results.txt", $res)
$string = FileRead("results.txt")
$emptyfile = FileOpen("results.txt", 2)
FileClose($emptyfile)
dim $pattern = "(bw+ B)"
dim $return = StringRegExp($string, $pattern, 3)
dim $obj = ObjCreate("System.Collections.ArrayList")
For $i = 0 To UBound($return) -1
    If Not $obj.Contains($return[$i]) Then $obj.Add($return[$i])
Next
dim $cleared = ''
For $word In $obj
    $cleared &= $word & @crlf
Next
FileWrite("results.txt", $cleared)

P.S. UEZ, I tried your code, but it throws a syntax error on the $sLine line where the "?" is.
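The patterns quoted above are garbled, but if they were built on `\w`, that would explain both symptoms: `\w` does not include the dot, so RDS.A splits in two, and `\w.+` needs at least two characters, so B is dropped. One option, offered as a sketch rather than the thread's accepted answer, is a character class that adds the dot, `[\w.]+`, which needs only one character. The input string below is made up.

```autoit
#include <Array.au3>

; Hypothetical contents of results.txt: one ticker per line, with a duplicate
Local $sString = "STWD" & @CRLF & "RDS.A" & @CRLF & "B" & @CRLF & "STWD" & @CRLF

; [\w.]+ = one or more letters, digits, underscores or dots,
; so "RDS.A" stays whole and a one-letter ticker like "B" still matches
Local $aWords = StringRegExp($sString, '[\w.]+', 3)

Local $sJoined = _ArrayToString($aWords, "|")
ConsoleWrite($sJoined & @CRLF) ; STWD|RDS.A|B|STWD
```

If tickers can contain other symbols (the extraction regex also allows `$`), those characters would need to be added to the class, or a negated class such as `[^\s,]+` used instead.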
Exit Posted August 6, 2014

Here is the solution from my post #7 again. I added the conversion of tabs to spaces.

#include <array.au3>

$sText = FileRead(@ScriptFullPath & ".FlyNews.txt")
MsgBox(0, "Source", $sText)
$a1 = StringSplit(StringReplace(StringStripWS(StringReplace($sText,@TAB," "), 4), ", ", ",,"), "EDT ", 3)
_ArrayDelete($a1, 0) ; first entry is before EDT
$words = ""
For $i = 0 To UBound($a1) - 1
    $a2 = StringSplit($a1[$i], " ")
    $words &= StringReplace($a2[1], ",", " ") & " "
Next
$a2 = StringSplit(StringStripWS($words, 7), " ", 2)
_ArrayDisplay($a2, "Solution of EXIT")

Result:

Row|Col 0
[0]|STWD
[1]|RDS.A
[2]|COP
[3]|CVX
[4]|B
[5]|TOT
[6]|BP
[7]|theflyonthewall.com:
[8]|STWD
[9]|WAG

Is there anything wrong with the result?
handofthrawn (Author) Posted August 6, 2014

Exit, that works, but I'm screwing up your result because I need the format slightly changed. I'm trying to get the result into Notepad without the [0], [1], [2], so it looks just like this:

STWD
RDS.A
COP
CVX
B
...

I'm also trying to remove the duplicates, exactly like Excel's duplicate removal feature: it removes the duplicates, keeps the unique words, and closes up the list so there are no gaps in between. My code above got close to achieving this, but it fell flat on words with a ".". I've tried copying some For loops and using _ArrayUnique, but I'm messing up slightly and just can't nail the result.
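The behaviour described here (keep the first occurrence of each word, drop later repeats, leave no gaps) can be sketched with the Windows `Scripting.Dictionary` COM object, whose keys come back in insertion order. This is a hypothetical alternative, not code from the thread, and the token list is made up; note the dictionary compares keys case-sensitively by default.

```autoit
; Hypothetical token list standing in for the extracted tickers
Local $aTokens[6] = ["STWD", "RDS.A", "COP", "STWD", "B", "COP"]

; Scripting.Dictionary is a Windows COM object; its keys keep insertion order
Local $oSeen = ObjCreate("Scripting.Dictionary")
For $i = 0 To UBound($aTokens) - 1
    ; keep only the first occurrence of each token
    If Not $oSeen.Exists($aTokens[$i]) Then $oSeen.Add($aTokens[$i], 0)
Next

; one token per line: no [0]/[1] index column and no blank gaps
Local $aKeys = $oSeen.Keys()
Local $sOut = ""
For $i = 0 To UBound($aKeys) - 1
    $sOut &= $aKeys[$i] & @CRLF
Next
ConsoleWrite($sOut) ; STWD, RDS.A, COP, B, one per line
```

Writing `$sOut` with FileWrite to a freshly opened file gives the plain Notepad layout asked for above.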
mikell Posted August 6, 2014

handofthrawn,
With this script you don't really need a second pass to remove duplicates; just build the file while avoiding duplicates:

#include <array.au3>
#include <Misc.au3>

$sText = FileRead("FlyNews.txt")
$a = StringRegExp($sText, 'EDT\h+([$.A-Z,' & Chr(32) & ']+)', 3)
$res = ""
For $i = 0 to UBound($a)-1
    If StringStripWS($a[$i], 3) = "" Then ContinueLoop ; this excludes empty lines
    $tmp = StringRegExp($a[$i], '([^\s,]+)', 3)
    For $j = 0 to UBound($tmp)-1
        ; this avoids duplicates; the @crlf on both sides means only whole lines match
        If NOT StringInStr(@crlf & $res, @crlf & $tmp[$j] & @crlf) Then
            $res &= $tmp[$j] & @crlf
            Msgbox(0, "", $res) ; use this to check how $res is built
        EndIf
    Next
Next
$hFile = FileOpen("results.txt", 2) ; mode 2 = overwrite existing contents
FileWrite($hFile, $res)
FileClose($hFile)
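One caveat with building `$res` and checking it with StringInStr: a plain lookup such as `StringInStr($res, $tmp[$j] & @crlf)` can report a false duplicate when the new word is the tail of an earlier one. A small sketch with a hypothetical "RDS.B" entry shows the effect and the line-anchored check:

```autoit
; Hypothetical running result where "RDS.B" is already stored
Local $sRes = "RDS.B" & @CRLF

; Naive check: "B" & @CRLF is found inside "RDS.B" & @CRLF (at position 5),
; so "B" would wrongly be treated as a duplicate
Local $iNaive = StringInStr($sRes, "B" & @CRLF)

; Anchored check: wrap both sides in @CRLF so only whole lines can match
Local $iAnchored = StringInStr(@CRLF & $sRes, @CRLF & "B" & @CRLF)

ConsoleWrite("naive: " & $iNaive & ", anchored: " & $iAnchored & @CRLF) ; naive: 5, anchored: 0
```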
Exit Posted August 6, 2014

Better?

#include <array.au3>
#include <file.au3>

$sText = FileRead(@ScriptFullPath & ".FlyNews.txt")
$sText &= $sText ; just to force duplicates
;~ MsgBox(0, "Source", $sText)
$a1 = StringSplit(StringReplace(StringStripWS(StringReplace($sText, @TAB, " "), 4), ", ", ",,"), "EDT ", 3)
_ArrayDelete($a1, 0) ; first entry is before EDT
$words = ""
For $i = 0 To UBound($a1) - 1
    $a2 = StringSplit($a1[$i], " ")
    $words &= StringReplace($a2[1], ",", " ") & " "
Next
$a2 = StringSplit(StringStripWS($words, 7), " ", 2)
$a2 = _ArrayUnique($a2)
_ArrayDelete($a2, 0) ; remove the element count that _ArrayUnique stores in [0]
;~ _ArrayDisplay($a2, "Solution of EXIT")
_FileWriteFromArray(@ScriptFullPath & ".Result.txt", $a2)
ShellExecute(@ScriptFullPath & ".Result.txt")
handofthrawn (Author) Posted August 6, 2014

Exit, that did it! Thanks so much.

Mikell, thanks for the tip. I knew there was a one-line solution out there for removing duplicates, but I would never have guessed to use NOT with StringInStr. This stuff is slowly turning from gibberish into code I can read, and for that I thank you.