Processing massive txt file


Jewtus

I have a tab-separated TXT file that contains blocks of information I'm trying to grab so I can parse them. The only way I can open the files is with WinVi, and I'm trying to parse them with AutoIt. I'm hoping I can extract the files a few hundred lines at a time, but when I use FileReadLine, I don't get the first line of the actual file (it looks like a bunch of junk characters). The smallest file I have to work with is 600 MB and the files can be up to 4 GB, and the machine that will run the script only has 4 GB of RAM, so I know I need to process the file in segments rather than load it all at once.

This is the way I've been trying to parse it (and it gives junk output):

FileOpen($file, 0)
For $i = 1 to _FileCountLines($file)
    $line = FileReadLine($file, $i)
    msgbox(0,'','the line ' & $i & ' is ' & $line)
Next
FileClose($file)

Does anyone have suggestions?


@Jewtus Junk characters? Does the file contain foreign characters? This happens when AutoIt reads a text file using the wrong encoding.

EasyCodeIt - A cross-platform AutoIt implementation - Fund the development! (GitHub will double your donations for a limited time)

DcodingTheWeb Forum - Follow for updates and Join for discussion


You use FileOpen on the file, but you never use the handle returned by the function in your FileReadLine.

Is the file a standard ASCII text file, or is it a Unicode file? If it is Unicode, pass the appropriate Unicode flag in the mode parameter of FileOpen and use the handle it returns in your FileReadLine calls. Also get rid of the For loop with the line number: without the file handle it is going to be VERY slow for large files, because the file is reopened on every call.

#include <FileConstants.au3>
#include <MsgBoxConstants.au3>

Example("C:\path\to\file.txt") ; placeholder path, replace with your own file

Func Example($sFilePath)
    ; Open the file once and keep the handle for every read.
    Local $hFileOpen = FileOpen($sFilePath, $FO_READ)
    If $hFileOpen = -1 Then
        MsgBox($MB_SYSTEMMODAL, "", "An error occurred when opening the file.")
        Return False
    EndIf

    Local $sFileRead
    While 1
        $sFileRead = FileReadLine($hFileOpen) ; no line number: reads the next line
        If @error Then ExitLoop ; -1 = end of file, 1 = read error
        ConsoleWrite($sFileRead & @CRLF)
    WEnd

    FileClose($hFileOpen)
    Return True
EndFunc   ;==>Example
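
Since the original question asks about working through the file a few hundred lines at a time, here is a minimal sketch of that idea built on the same handle-based loop. The path, the batch size of 500, and the helper names _ProcessInBatches and _HandleBatch are all made up for illustration, and the sketch assumes the file has already been opened with an encoding that produces readable lines. Only one batch of lines is held in memory at any moment, which is the point on a machine with 4 GB of RAM.

```autoit
#include <FileConstants.au3>

; Hypothetical path and batch size; adjust both for the real data.
_ProcessInBatches("C:\data\export.txt", 500)

Func _ProcessInBatches($sFilePath, $iBatchSize)
    Local $hFile = FileOpen($sFilePath, $FO_READ)
    If $hFile = -1 Then Return SetError(1, 0, 0)

    Local $aBatch[$iBatchSize], $iCount = 0, $iTotal = 0, $sLine
    While 1
        $sLine = FileReadLine($hFile) ; sequential read through the handle
        If @error Then ExitLoop ; -1 = end of file, 1 = read error
        $aBatch[$iCount] = $sLine
        $iCount += 1
        If $iCount = $iBatchSize Then
            _HandleBatch($aBatch, $iCount)
            $iTotal += $iCount
            $iCount = 0 ; reuse the same array for the next batch
        EndIf
    WEnd

    ; Flush the final, possibly partial, batch.
    If $iCount > 0 Then
        _HandleBatch($aBatch, $iCount)
        $iTotal += $iCount
    EndIf

    FileClose($hFile)
    Return $iTotal ; number of lines processed
EndFunc   ;==>_ProcessInBatches

Func _HandleBatch(ByRef $aLines, $iCount)
    Local $aFields
    For $i = 0 To $iCount - 1
        ; Split the tab-separated line; $aFields[0] holds the field count.
        $aFields = StringSplit($aLines[$i], @TAB)
        ; ... real parsing of $aFields would go here ...
    Next
    ConsoleWrite("Processed a batch of " & $iCount & " lines" & @CRLF)
EndFunc   ;==>_HandleBatch
```

The sketch never asks for a line by number and never counts the lines up front, so the file is read exactly once from start to finish.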

 

Edited by BrewManNH

If I posted any code, assume that code was written using the latest release version unless stated otherwise. Also, if it doesn't work on XP I can't help with that because I don't have access to XP, and I'm not going to.
Give a programmer the correct code and he can do his work for a day. Teach a programmer to debug and he can do his work for a lifetime - by Chirag Gude
How to ask questions the smart way!

I hereby grant any person the right to use any code I post, that I am the original author of, on the autoitscript.com forums, unless I've specifically stated otherwise in the code or the thread post. If you do use my code all I ask, as a courtesy, is to make note of where you got it from.

Back up and restore Windows user files _Array.au3 - Modified array functions that include support for 2D arrays.  -  ColorChooser - An add-on for SciTE that pops up a color dialog so you can select and paste a color code into a script.  -  Customizable Splashscreen GUI w/Progress Bar - Create a custom "splash screen" GUI with a progress bar and custom label.  -  _FileGetProperty - Retrieve the properties of a file  -  SciTE Toolbar - A toolbar demo for use with the SciTE editor  -  GUIRegisterMsg demo - Demo script to show how to use the Windows messages to interact with controls and your GUI.  -   Latin Square password generator


@TheDcoder That is exactly what happens... it looks like Chinese. How would I change the encoding?

 

@BrewManNH I tried that code block and it starts writing the content but it isn't what I expect (same kinda junk characters):

????????????????†?????????????????????††††††††††††††††?????????††††††††††††††††????????††††††††††††††††††††††?????????????????†††††††††???????????????

I'm not sure if it's ASCII or Unicode and I'm not exactly sure how to find out... Someone just handed me the file.

Edited by Jewtus

You should read the documentation for FileOpen and concentrate on the mode parameter (the 2nd parameter). Make sure that you use the file handle this time! :P

 

TD :)
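
To make the point about the mode parameter concrete, here is a rough sketch that asks AutoIt to guess the encoding with FileGetEncoding and then passes that value back to FileOpen as the mode. The path is a placeholder. The second argument of 2 is, as far as I recall from the help file, the $FE_PARTIALFIRST_UTF8 detection mode that only inspects the first part of the file, which seems preferable for multi-gigabyte files; treat that as an assumption and check the FileGetEncoding page. When the file has no BOM the result is only a guess.

```autoit
#include <FileConstants.au3>

Local $sFilePath = "C:\data\export.txt" ; hypothetical path

; 2 = assumed to be $FE_PARTIALFIRST_UTF8: check only the first part of the file.
Local $iEncoding = FileGetEncoding($sFilePath, 2)
If $iEncoding = -1 Then
    ConsoleWrite("Could not determine the encoding." & @CRLF)
Else
    ; The value uses the same numbering as the FileOpen encoding flags,
    ; e.g. 32 = $FO_UTF16_LE, 64 = $FO_UTF16_BE, 128 = $FO_UTF8, 512 = $FO_ANSI.
    ConsoleWrite("Detected encoding flag: " & $iEncoding & @CRLF)

    Local $hFile = FileOpen($sFilePath, $FO_READ + $iEncoding)
    If $hFile <> -1 Then
        ConsoleWrite("First line: " & FileReadLine($hFile) & @CRLF)
        FileClose($hFile)
    EndIf
EndIf
```

If the guess still yields junk, the flags can also be tried by hand, for example FileOpen($sFilePath, $FO_READ + $FO_UTF16_LE), always reading through the returned handle.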


OK, I tried every mode option, and with each of them the script just ends (no message and no console write). Only FO_READ gives me any results, and they aren't the first line of the file.

(I was using BrewManNH's example since it already catches the case where the file fails to load.)

 

 


Any possibility of posting one of the files you're trying to work with?


@BrewManNH I think he said that his files range from 600 MB to 4 GB :P. I think it's possible to compress them heavily with a reliable compressor (like 7-Zip).

Edited by TheDcoder


@czardas A program could do that :P. A flat file, maybe...

Well, a DBMS would be a better option in that case.


Well, you could look for a BOM (byte order mark). Please run the following code on one of the files (replace test.txt with the path to that file) and post the result.

Local $hFile = FileOpen("test.txt", 16) ; 16 = $FO_BINARY
Local $dBinary = FileRead($hFile, 5) ; read the first 5 bytes
FileClose($hFile) ; close by handle, not by file name
ConsoleWrite($dBinary & @LF)
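
For reading the result, here is a small sketch that maps the leading bytes to the common byte order marks: EF BB BF for UTF-8, FF FE for UTF-16 little endian, and FE FF for UTF-16 big endian. The function name _DescribeBOM is made up for this example, and the sample value passed to it at the end is invented test data, not output from the actual file.

```autoit
; $dBinary is expected to be the binary value returned by FileRead()
; on a handle that was opened in binary mode.
Func _DescribeBOM($dBinary)
    Local $sHex = Hex($dBinary) ; hex digits of the bytes, without the "0x" prefix
    If StringLeft($sHex, 6) = "EFBBBF" Then Return "UTF-8 with BOM"
    If StringLeft($sHex, 4) = "FFFE" Then Return "UTF-16 LE with BOM"
    If StringLeft($sHex, 4) = "FEFF" Then Return "UTF-16 BE with BOM"
    Return "No BOM found (could be ANSI, UTF-8 without BOM, or something else)"
EndFunc   ;==>_DescribeBOM

; Invented sample: a UTF-16 LE BOM followed by the letter "A".
ConsoleWrite(_DescribeBOM(Binary("0xFFFE4100")) & @CRLF)
```

A missing BOM does not prove the file is plain ASCII. It only means the encoding has to be worked out some other way.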

 

Edited by czardas

I was hoping that a portion of the file would be enough, I don't want the whole 600MB text file just to see what format it's in.
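
One way to get such a portion without opening the whole file is to copy just the first few kilobytes in binary mode. This is only a sketch: the paths, the 4096-byte size, and the function name _CopyFileHead are assumptions for illustration, and any sample taken this way would still need to be checked for sensitive data before being shared.

```autoit
#include <FileConstants.au3>

; Hypothetical paths; 4096 bytes is an arbitrary sample size.
_CopyFileHead("C:\data\export.txt", "C:\data\export_sample.bin", 4096)

Func _CopyFileHead($sSource, $sTarget, $iBytes)
    ; Binary mode returns the raw bytes, so no encoding guess is involved.
    Local $hIn = FileOpen($sSource, $FO_READ + $FO_BINARY)
    If $hIn = -1 Then Return SetError(1, 0, False)
    Local $dHead = FileRead($hIn, $iBytes) ; reads at most $iBytes bytes
    FileClose($hIn)

    Local $hOut = FileOpen($sTarget, $FO_OVERWRITE + $FO_BINARY)
    If $hOut = -1 Then Return SetError(2, 0, False)
    FileWrite($hOut, $dHead)
    FileClose($hOut)
    Return True
EndFunc   ;==>_CopyFileHead
```

Because only the requested number of bytes is read, this works the same on a 600 MB file as on a 4 GB one.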


Thanks for the responses, all, and I agree... they should have a better way of getting this data. Unfortunately I cannot share the data as it's personal customer data.

@czardas I tried that script and nothing was printed on the console.

Here is a little background:

The IT group at my current company really doesn't understand technology or databases. The information is available in a database, but the problem is that they won't grant anyone access to the DB. In addition to that, they have no services or APIs for requesting data, so the way they deal with any request for data is to create file outputs (depending on who you ask, you get a different format... there is no standardization... everyone gets to do their own thing).

 

I was asked to provide a stopgap that would allow the team to not have to process the files manually (they literally open the file and copy and paste data for hours... there is an entire team that does this and this alone all day). 

 

It's stupid, but I gotta work with the tools at my disposal (I requested a database and it's been 4 months and they are still trying to sort out how to set it up and get me access... it should have taken 2 days tops, and that is if they build the actual machine, but they just use VMs).

 

The really weird thing about the file is that it will open when I create a linked file in MS Access (that's what they are currently using to process some of the data).

Edited by Jewtus

Could this be an Access database file with a text file extension?


Doubtful, it's an output from the mainframe. I'm going to just do this with MS Access. I've spent too much time on something that we shouldn't even be doing. Thanks for all the help, everyone.
