Knight

Clean up Text File

7 posts in this topic

I have a text file that is around 20,000 lines long and has a lot of symbols, and I want to remove every character that is not alphanumeric or standard punctuation. I have tried different approaches and it always messes up. Any ideas or help would be greatly appreciated.

-k


#2 ·  Posted (edited)


1. Determine the ASCII codes of the characters you want to keep.
2. Look in the manual for how to get the ASCII value of a character.
3. Read the file.
4. Parse the file one character at a time.
5. If the character's code is >= the lowest and <= the greatest desired code, write the character to the output file.

Repeat while there are still characters to process.

Refer to the manual for instructions on FileOpen, FileWrite, FileRead, and Select...Case...EndSelect.

The rest is left as an exercise for the student... :)
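
For readers who want to check their work, the steps above could be sketched roughly as follows. This is only one possible reading of them: the helper name _IsWanted, the file names, and the exact punctuation set are assumptions, not something the post specifies.

```autoit
; Hypothetical helper: returns True if the character should be kept.
Func _IsWanted($sChar)
    Local $iCode = Asc($sChar) ; ASCII value of the character
    Select
        Case $iCode >= 48 And $iCode <= 57 ; digits 0-9
            Return True
        Case $iCode >= 65 And $iCode <= 90 ; upper-case A-Z
            Return True
        Case $iCode >= 97 And $iCode <= 122 ; lower-case a-z
            Return True
        Case $iCode = 10 Or $iCode = 13 Or $iCode = 32 ; line feed, carriage return, space
            Return True
        Case StringInStr('.,;:!?''"()-', $sChar) > 0 ; assumed punctuation set
            Return True
    EndSelect
    Return False
EndFunc

Local $hIn = FileOpen('mytext.txt', 0) ; 0 = read mode
Local $hOut = FileOpen('mynew.txt', 2) ; 2 = write mode, erasing previous contents
If $hIn = -1 Or $hOut = -1 Then Exit ; FileOpen returns -1 on failure

While 1
    Local $sChar = FileRead($hIn, 1) ; read one character
    If @error Then ExitLoop ; @error is set at end of file
    If _IsWanted($sChar) Then FileWrite($hOut, $sChar)
WEnd

FileClose($hIn)
FileClose($hOut)
```

Reading and writing one character per call is slow on a 20,000-line file, but it mirrors the steps as written and makes it easy to adjust which ranges are kept.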

Edited by flyingboz


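$file1 = FileOpen('mytext.txt', 0) ; 0 = read mode
$file2 = FileOpen('mynew.txt', 2) ; 2 = write mode (erases previous contents)
While 1 ; loop so the whole file is processed, not just the first character
    $text = FileRead($file1, 1) ; read one character
    If @error Then Exit ; @error is set once the end of the file is reached
    ; StringIsAlNum is true only for letters and digits
    If StringIsAlNum($text) Then FileWrite($file2, $text)
WEnd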

Also, if you're going to use that, make sure to add the OnAutoItExit Func to close the files :)



Wow, is it really that simple? I was way off. Thanks, MSL.

@flying: Yeah, I know all those, but I still couldn't get it working.



Will you lose end-of-line characters this way? I haven't tested, so I'm not sure... If so, you may need to FileReadLine, parse the line character by character, and then FileWriteLine...

Dale
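
A minimal sketch of that line-by-line idea, for illustration only. The file names and the punctuation set are assumptions, and what counts as a letter or digit is whatever StringIsAlNum reports.

```autoit
Local $hIn = FileOpen('mytext.txt', 0) ; 0 = read mode
Local $hOut = FileOpen('mynew.txt', 2) ; 2 = write mode, erasing previous contents
If $hIn = -1 Or $hOut = -1 Then Exit ; FileOpen returns -1 on failure

While 1
    Local $sLine = FileReadLine($hIn) ; returns the next line without its line break
    If @error Then ExitLoop ; @error is set at end of file

    Local $sKept = ''
    For $i = 1 To StringLen($sLine)
        Local $sChar = StringMid($sLine, $i, 1)
        ; keep letters, digits, spaces and an assumed punctuation set
        If StringIsAlNum($sChar) Or StringInStr(' .,;:!?''"()-', $sChar) Then $sKept &= $sChar
    Next

    FileWriteLine($hOut, $sKept) ; FileWriteLine adds the line break back
WEnd

FileClose($hIn)
FileClose($hOut)
```

Because the line break is never part of the string being filtered, the line structure of the input survives even though line-break characters are not in the allowed set.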



Eh, it's not working at all. I modified it trying to fix it, and nothing yet.


Look under regular expressions:

; Remove everything except letters, digits, spaces, commas, periods and question marks
MsgBox(0, "Regular Expression Replace Test", _
        StringRegExpReplace("Where have all the flowers gone, " & _
        "long time passing? E#a%t^ m(y) {sh}o=rt\s", "[^a-zA-Z0-9 ,.?]", ""))
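
The same replacement could be applied to the whole file in one pass. The following is a sketch under assumptions: the file names are borrowed from earlier in the thread, and the set of punctuation to keep is a guess at what "standard punctuation" means here.

```autoit
Local $hIn = FileOpen('mytext.txt', 0) ; 0 = read mode
If $hIn = -1 Then Exit ; FileOpen returns -1 on failure
Local $sText = FileRead($hIn) ; with no count, FileRead returns the rest of the file
FileClose($hIn)

; Keep letters, digits, spaces, line breaks and an assumed punctuation set;
; everything else is replaced with an empty string.
Local $sClean = StringRegExpReplace($sText, '[^A-Za-z0-9 \r\n.,;:!?''"()-]', '')

Local $hOut = FileOpen('mynew.txt', 2) ; 2 = write mode, erasing previous contents
If $hOut = -1 Then Exit
FileWrite($hOut, $sClean)
FileClose($hOut)
```

Including \r and \n in the character class is what preserves the line breaks; drop or extend the punctuation characters to suit the file.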

