Sparrowlord

Parsing File

9 posts in this topic

Hello,

I'm trying to parse a txt file and change its contents. I want my script to read through the text file and remove all spaces and numbers. Here is an example of how the file looks:

394 Noel Charlene

395 Sam Nadia

396 Guillermo Keisha

397 Graham Blanca

398 Weston Celeste

399 Lewis Maya

400 Blaine Marisol

401 Rodolfo Katharine

402 Howard Kourtney

403 Julius Larissa

404 Earl Anita

405 Akeem Corinne

406 Stephan Kendall

and after running my script I want it to change the file contents to something like:

Noel

Charlene

Sam

Nadia

Guillermo

Keisha

Graham

Blanca

Weston

Celeste

Lewis

Maya

Blaine

Marisol

Rodolfo

Katharine

Howard

Kourtney

Julius

Larissa

Earl

Anita

Akeem

Corinne

Stephan

Kendall

However, the input isn't always going to look like that; it could be different. I want it to follow rules (I don't know if that's the correct word to use) like these:

Rules:

Single word per line

No spaces in a line

No Numbers or symbols in a line ( a-z only )

I just spent the last 6 hours messing with StringReplace, StringStripWS, StringStripCR and similar functions, and I have yet to get something working.

Could anyone please point me in the right direction for this script? I would really appreciate it.

Thanks.


You could use StringSplit and then split on spaces. Then use the second array item in this case.

Or you could use StringRegExp. But I personally am still not good with them...
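
A minimal sketch of the StringSplit idea, using one of the sample lines from the first post. The variable names ($sLine, $aParts, $sOut) are only illustrative. It assumes the default StringSplit flag, where element [0] holds the number of pieces, so the number lands in [1] and the two names in [2] and [3]:

```autoit
; Sketch only: split one sample line on spaces with StringSplit.
; $sLine, $aParts and $sOut are illustrative names, not from the original posts.
Local $sLine = "394 Noel Charlene"

; With the default flag, $aParts[0] is the count of pieces
; and the pieces themselves start at index 1.
Local $aParts = StringSplit($sLine, " ")

; Skip $aParts[1] (the number) and put each remaining word on its own line.
Local $sOut = ""
For $i = 2 To $aParts[0]
    $sOut &= $aParts[$i] & @CRLF
Next

ConsoleWrite($sOut)
```

This only handles lines shaped like the sample (a number, then words, separated by single spaces); anything messier probably needs a regular expression.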


My active project(s): A-maze-ing generator (generates a maze)

My archived project(s): Pong3 (Multi-pinger)


Hi,

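$sFile = @ScriptDir & "\some.txt"

; Read the whole file (mode 0 = read)
$FO = FileOpen($sFile, 0)
$FR = FileRead($FO)
FileClose($FO)

; Strip carriage returns, then split the text into lines
$aTmp = StringSplit(StringStripCR($FR), @LF)
$sTmp = ''
For $i = 1 To $aTmp[0]
    ; Drop everything up to and including the first space (the number),
    ; then turn each remaining space into a line break
    $sTmp &= StringReplace(StringTrimLeft($aTmp[$i], StringInStr($aTmp[$i], " ", 0, 1)), " ", @CRLF) & @CRLF
Next

; Write the result back (mode 2 = write, erasing the previous contents)
; and trim the trailing @CRLF
$FO = FileOpen($sFile, 2)
$FW = FileWrite($FO, StringTrimRight($sTmp, 2))
FileClose($FO)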

Cheers


My example does not read from a file; it just works on a line held in a variable.

It looks for one space, any number of characters, and then another space. It does not care how many characters come before or after that.

It also gives a warning when there is a special character in the match. Instead of showing this warning, you could, for example, call a function that replaces special characters with normal ones.

Try putting a special character in Noel, for example: Noél.

; Of course, declare the variables first
Dim $lname, $sname

; Input that could be read from a file; this is just an example of how to use StringRegExp.
$lname = "394 Noel Charlene"

; Pattern = <1 space> & <any number of any characters> & <1 space>
; Flag 1 returns an array of matches.
$sname = StringRegExp($lname, Chr(32) & "(.*)" & Chr(32), 1)

; Inside [], a leading ^ negates the set.
; The rest of the set covers a to z and A to Z (upper and lower case are different characters, so both ranges are listed).
; So the pattern means: any character that is NOT in a-z or A-Z
If StringRegExp($sname[0], "([^a-zA-Z])") = 1 Then MsgBox(48, "Warning!", "Special characters detected in:" & @CRLF & "-> " & $sname[0] & " <-")

; StringRegExp returned an array, so read element [0] of it. (There is only one captured group, so it is at index 0.)
MsgBox(0, "", "Outcome is: " & $sname[0])


Or use the StringRegExpReplace() function:

$sFile = @ScriptDir & "\some.txt"

$FR = FileRead($sFile)
$FO = FileOpen($sFile, 2)
$FW = FileWrite($FO, StringRegExpReplace(StringRegExpReplace($FR, '(\d' & Chr(32) & '{0,})', ""), '(' & Chr(32) & ')', @CRLF))
FileClose($FO)

Cheers

Hello,

I just tried this example and it works great, but there is now a tab in front of every word in the file. How can the tabs be removed? I tried to tinker with this myself, but to no avail; since you know how it works, I thought you might be able to help.

Thanks.

Edit: I did some more testing. Any time I copy something off the web that comes from a table, or that has tabs or numbers in front of it, and run it through this script, the output has a tab in front of each word.

When I went to http://en.wikipedia.org/wiki/Reporting, copied and pasted the whole page into my txt file, and ran it through the script, the output had commas and other symbols along with blank lines.

To explain what I mean: I want to be able to copy something off a website, run it through the script, and have the output follow these rules:

Rules:

Single word per line

No spaces in a line

No Numbers or symbols in a line ( a-z only )

At least two letters per line

So if something like:

Often implementation involves Extract, Transform and Load (See ETL) procedures into a reporting data warehouse and then use of one or more reporting tools. While reports can be distributed in print form or via email, they are typically accessed via a corporate intranet.

is run through the script, it would output:

Often

implementation

involves

Extract

Transform

and

Load

See

ETL

procedures

into

reporting

data

warehouse

and

then

use

of

one

or

more

reporting

tools

While

reports

can

be

distributed

in

print

form

or

via

email

they

are

typically

accessed

via

corporate

intranet

Edited by Sparrowlord


I just spent all day trying to figure this out, didn't even come close :)


I just spent all day trying to figure this out, didn't even come close :)

Post all your failed code attempts.

Post every possible string situation you could run into.

Post what you want the output to look like for each possible string situation.


Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.


I just spent all day trying to figure this out, didn't even come close :)

If this sloppy attempt works at all then you owe me 25 minutes ...lol

It still doesn't catch everything, but it catches a majority of the rules

$sFile = @ScriptDir & "\some.txt"

$Pat1 = '(\a|[\d]|\e|\f|\r|\t|\v|\.|\(|\)|,|>|<|\?|\\|/|"|:|;|\*|#|\^|!|\$|@|%|&|\-|=|\+|\||_|\]|\[|~|`|'')'
$Pat2 = '(\r\n\w\r\n)' 
$FR = StringRegExpReplace(StringReplace(StringStripWS(StringRegExpReplace(FileRead($sFile), $Pat1, ""), 7), " ", @CRLF), $Pat2, @CRLF) 
$FO = FileOpen($sFile, 2)
$FW = FileWrite($FO, $FR)
FileClose($FO)

I took the paragraph you posted earlier and added *cough* some extra garbage to the file to see how it filters.

some.txt:

Often I im1ple..menta?tion U inv<olv2es Ext~~ract, T>rans.4for

^m a^nd Lo`~ad (Se:e E;TL) procedu.res int"o a repor"ting da.ta war?eh

ouse and a th(en us)e o6768f o```ne or mo@re repo|rt+in'g t[ool]]s. W_hi-l=e re''p,orts, ca*n be di#strib:::uted in pr;\/i'n!t fo;rm 'or vi1a em\\ail, the!!567!y are typically acc

es$sed vi089a a c%orpo#ra7te intr&an```et.
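
Since the pattern above lists characters to remove and still doesn't catch everything, another option is to match only what should be kept. The sketch below is a swapped-in technique, not the method posted above. It uses StringRegExp with flag 3 (return all matches) and the pattern [A-Za-z]{2,} to pick out every run of two or more letters, so tabs, digits, symbols and single letters are all skipped. The variable names are illustrative, and the file name is the same placeholder used in this thread. Note one difference in behaviour: a word with junk in the middle (like the test file above) is split into separate pieces at the junk rather than rejoined.

```autoit
; Sketch only: keep runs of two or more letters, one per line.
; Variable names are illustrative; "some.txt" is the placeholder file used in this thread.
Local $sFile = @ScriptDir & "\some.txt"
Local $sText = FileRead($sFile)
Local $sOut = ""

; Flag 3 returns an array of all matches of the pattern.
; [A-Za-z]{2,} matches a run of at least two letters.
Local $aWords = StringRegExp($sText, "[A-Za-z]{2,}", 3)

If @error Then
    ; No match at all (or a bad pattern): leave the file untouched.
    ConsoleWrite("No words found." & @CRLF)
Else
    For $i = 0 To UBound($aWords) - 1
        $sOut &= $aWords[$i] & @CRLF
    Next

    ; Mode 2 opens the file for writing and erases the previous contents.
    Local $hFile = FileOpen($sFile, 2)
    FileWrite($hFile, StringTrimRight($sOut, 2)) ; drop the trailing @CRLF
    FileClose($hFile)
EndIf
```

Run against the Wikipedia paragraph from earlier in the thread, this should give one word per line and leave out the standalone "a", which matches the "at least two letters per line" rule.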

Cheers
