
Clean up Text File



I have a text file that is around 20,000 lines long and has a lot of symbols. I want to remove every character that is not alphanumeric or standard punctuation. I have tried different approaches, but something always goes wrong. Any ideas or help would be greatly appreciated.

-k


1. Determine the ASCII codes you want to keep.
2. Look in the manual to find out how to get the ASCII value of a character.
3. Read the file.
4. Parse the file one character at a time.
5. If the character's code is >= the lowest and <= the highest desired code, write that character to the output file.

Repeat while there are still characters to process.
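A minimal sketch of those steps, assuming "desired" means printable ASCII 32-126 (letters, digits, space and the usual punctuation) plus tab, CR and LF so the lines survive. `_FilterChars` is a hypothetical helper name and mytext.txt/mynew.txt are placeholder file names. It reads the whole file, then walks it one character at a time with AscW and Select...Case:

```autoit
; Hypothetical helper: keep printable ASCII (codes 32-126) plus tab, LF and CR.
Func _FilterChars($sText)
    Local $sOut = "", $sChar, $iCode
    For $i = 1 To StringLen($sText)
        $sChar = StringMid($sText, $i, 1)      ; parse one character at a time
        $iCode = AscW($sChar)                  ; numeric code of that character
        Select
            Case $iCode >= 32 And $iCode <= 126
                $sOut &= $sChar                ; letters, digits, space, punctuation
            Case $iCode = 9 Or $iCode = 10 Or $iCode = 13
                $sOut &= $sChar                ; tab, LF and CR keep the line structure
        EndSelect
    Next
    Return $sOut
EndFunc

; Placeholder file names; FileOpen returns -1 if the file cannot be opened.
Local $hIn = FileOpen("mytext.txt", 0)         ; 0 = read mode
If $hIn <> -1 Then
    Local $hOut = FileOpen("mynew.txt", 2)     ; 2 = write mode, erases existing contents
    FileWrite($hOut, _FilterChars(FileRead($hIn)))  ; FileRead with no count reads the whole file
    FileClose($hIn)
    FileClose($hOut)
EndIf
```

For a 20,000-line file, appending one character at a time with &= can be slow; a single StringRegExpReplace call does the same kind of filtering in one pass.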

Refer to the manual for FileOpen, FileWrite, FileRead and Select...Case...EndSelect.

The rest is left as an exercise for the student... :)

Edited by flyingboz

Reading the help file before you post... Not only will it make you look smarter, it will make you smarter.


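$file1 = FileOpen('mytext.txt', 0)   ; 0 = read mode
$file2 = FileOpen('mynew.txt', 2)    ; 2 = write mode (erases existing contents)
While 1
    $text = FileRead($file1, 1)      ; read the next single character
    If @error Then Exit              ; @error is set at end of file (or on a read error)
    If StringIsAlNum($text) Then FileWrite($file2, $text)
WEnd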

Also, if you're going to use that, make sure to add an OnAutoItExit function to close the files :)

Writing AutoIt scripts since _DateAdd("d", -2, _NowCalcDate())

Will you lose end-of-line characters this way? I haven't tested, so I'm not sure... If so, you may need to use FileReadLine, parse each line character by character, and then use FileWriteLine...
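A minimal sketch of that line-by-line idea. FileReadLine returns each line without its line break and FileWriteLine adds one back, so the line structure survives. Because StringIsAlNum on its own would also drop spaces and punctuation, the sketch additionally keeps characters from an allow-list; `$sKeepPunct` and `_CleanLine` are hypothetical names, and the exact punctuation set is an assumption to adjust.

```autoit
; Assumption: this allow-list stands in for "standard punctuation" (plus space).
Global Const $sKeepPunct = " .,;:!?'""-()"

; Hypothetical helper: keep alphanumerics and allow-listed characters only.
Func _CleanLine($sLine)
    Local $sOut = "", $sChar
    For $i = 1 To StringLen($sLine)
        $sChar = StringMid($sLine, $i, 1)
        ; StringIsAlNum alone would drop spaces and punctuation too,
        ; so the allow-list is checked as well (case-sensitive search).
        If StringIsAlNum($sChar) Or StringInStr($sKeepPunct, $sChar, 1) Then $sOut &= $sChar
    Next
    Return $sOut
EndFunc

Local $hIn = FileOpen("mytext.txt", 0)          ; placeholder input file, read mode
If $hIn <> -1 Then
    Local $hOut = FileOpen("mynew.txt", 2)      ; placeholder output file, write mode
    Local $sLine
    While 1
        $sLine = FileReadLine($hIn)             ; next line, without its line break
        If @error Then ExitLoop                 ; @error = -1 at end of file
        FileWriteLine($hOut, _CleanLine($sLine)); FileWriteLine appends @CRLF
    WEnd
    FileClose($hIn)
    FileClose($hOut)
EndIf
```

One side effect: FileWriteLine ends every output line with @CRLF, so a file that used bare LF line endings comes out with CRLF endings.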

Dale

Free Internet Tools: DebugBar, AutoIt IE Builder, HTTP UDF, MODIV2, IE Developer Toolbar, IEDocMon, Fiddler, HTML Validator, WGet, curl

MSDN docs: InternetExplorer Object, Document Object, Overviews and Tutorials, DHTML Objects, DHTML Events, WinHttpRequest, XmlHttpRequest, Cross-Frame Scripting, Office object model

Automate input type=file (Related)

Alternative to _IECreateEmbedded? better: _IECreatePseudoEmbedded  Better Better?

IE.au3 issues with Vista - Workarounds

SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y

"Doesn't work" needs to be ripped out of the troubleshooting lexicon. It means that what you tried did not produce the results you expected. It raises three questions: 1) What did you try? 2) What did you expect? 3) What happened instead?

Reproducer: a small (the smallest?) piece of stand-alone code that demonstrates your trouble


Look under regular expressions (StringRegExpReplace) in the help file:

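; [^...] matches any character that is NOT a letter, digit, space, comma or period; each match is replaced with ""
MsgBox(0, "Regular Expression Replace Test", _
        StringRegExpReplace("Where have all the flowers gone, " & _
        "long time passing? E#a%t^ m(y) {sh}o=rt\s", "[^a-zA-Z0-9 ,.]", ""))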
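The same idea scales to the whole file: read it into one string, strip it with a single StringRegExpReplace call, and write the result. This sketch assumes the kept set is letters, digits, space, tab, CR, LF and . , ; : ! ? ' " ( ) - (a hypothetical choice of "standard punctuation"); keeping \r and \n in the character class is what preserves the line breaks. `_StripUnwanted` is a hypothetical helper name and the file names are placeholders.

```autoit
; Hypothetical helper: remove every character outside the assumed allowed set.
Func _StripUnwanted($sText)
    ; [^...] matches any character NOT listed; "" inside an AutoIt string is one double quote,
    ; \t \r \n are regex escapes for tab, CR and LF, and \- is a literal hyphen.
    Return StringRegExpReplace($sText, "[^a-zA-Z0-9 \t\r\n.,;:!?'""()\-]", "")
EndFunc

Local $hIn = FileOpen("mytext.txt", 0)          ; placeholder input file, read mode
If $hIn <> -1 Then
    Local $sAll = FileRead($hIn)                ; whole file in one string
    FileClose($hIn)
    Local $hOut = FileOpen("mynew.txt", 2)      ; placeholder output file, write mode
    FileWrite($hOut, _StripUnwanted($sAll))
    FileClose($hOut)
EndIf
```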

AutoIt3, the MACGYVER Pocket Knife for computers.
