
Does FileReadToArray() function correctly?



Some crazy manipulations to the input file in this thread got me looking at FileReadToArray() and questioning whether it behaves properly. What seems to be implied in that thread is that the input file is line-delimited by @CRLF, but that individual lines may contain @CR characters.

FileReadToArray(), as presently written, tries to identify line delimiters as being either @CRLF, @CR or @LF. Its process flow is one of the following routes:

1. If a @CRLF or @LF is present, split on @CRLF and remove all @CR's from the file.

2. If a @CR is present, split on @CR

Should it remove stray @CR's when it has determined to break on @CRLF?

It does not bother removing stray @LF's. I wonder if it might be written more like this, which does not remove other characters when it splits on @CRLF, and also happens to be measurably faster:

Func _FileReadToArray($sFilePath, ByRef $aArray)
    Local $aFile = FileRead($sFilePath)
    If @error Then Return SetError(1, 0, 0);; unable to open the file
    Select
     Case StringInStr($aFile, @CRLF)
         If StringRight($aFile, 2) = @CRLF Then $aFile = StringTrimRight($aFile, 2)
         $aArray = StringSplit($aFile, @CRLF, 1)
     Case StringInStr($aFile, @CR)
         If StringRight($aFile, 1) = @CR Then $aFile = StringTrimRight($aFile, 1)
         $aArray = StringSplit($aFile, @CR)
     Case StringInStr($aFile, @LF)
         If StringRight($aFile, 1) = @LF Then $aFile = StringTrimRight($aFile, 1)
         $aArray = StringSplit($aFile, @LF)
     Case StringLen($aFile)
         Dim $aArray[2] = [1, $aFile]
     Case Else
         Return SetError(2, 0, 0) ; empty file
    EndSelect
    Return 1
EndFunc ;==>_FileReadToArray

(a part of the speed increase is due to the removal of a FileGetSize() on the FileRead() statement which seems superfluous. Maybe it is not?)

Edit: Another question might be how to handle a file with both lone @CR's and lone @LF's. The production version gives preference to @LF; the version above gives it to @CR (although that could easily be changed). You'd likely need to pass another function parameter to let the user override that hierarchy by specifying a preferred delimiter.
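
To make the "preferred delimiter" idea concrete, here is a minimal sketch. The function name _SplitLinesPreferred and the $sPreferred parameter are hypothetical, chosen only for illustration; nothing like them exists in File.au3. It works on a string rather than a file to keep the example short.

```autoit
; Hypothetical sketch: _SplitLinesPreferred and $sPreferred are illustrative names,
; not part of File.au3.
Func _SplitLinesPreferred($sText, $sPreferred = @LF)
    ; @CRLF pairs always win when present (flag 1 = use the whole delimiter string)
    If StringInStr($sText, @CRLF) Then Return StringSplit($sText, @CRLF, 1)
    ; otherwise try the caller's preferred lone delimiter first, then the other one
    Local $sOther = @CR
    If $sPreferred == @CR Then $sOther = @LF
    If StringInStr($sText, $sPreferred) Then Return StringSplit($sText, $sPreferred)
    If StringInStr($sText, $sOther) Then Return StringSplit($sText, $sOther)
    ; no delimiter at all: StringSplit returns the whole string as one element
    Return StringSplit($sText, @LF)
EndFunc   ;==>_SplitLinesPreferred

; Both a lone @CR and a lone @LF are present; the caller asks for @CR to win
Local $aLines = _SplitLinesPreferred("a" & @CR & "b" & @LF & "c", @CR)
ConsoleWrite($aLines[0] & " lines" & @CRLF)
```

With @CR preferred, the lone @LF stays inside the second element instead of being treated as a break, which is exactly the kind of choice the extra parameter would hand to the caller.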

Edited by Spiff59

Why should it remove stray @CR's when it has determined to break on @CRLF?

The code doesn't remove all @CRLF, @CR or @LF. It only removes the rightmost to cut off empty lines at the end of the file.

The code has been discussed in trac ticket #198.

Edited by water


What _FileReadToArray does is it checks to see if there are any LF in the file, if there are, it strips all CRs from the file then splits the lines by the LF characters. If there aren't any LFs in the file it splits on the CR only.
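
As a rough sketch of the behaviour just described (this is not the actual File.au3 source, and _SplitAsDescribed is a hypothetical name), the logic amounts to something like this:

```autoit
; Sketch of the behaviour described above, not the actual File.au3 source.
; _SplitAsDescribed is a hypothetical name used only for this example.
Func _SplitAsDescribed($sText)
    If StringInStr($sText, @LF) Then
        ; any @LF present: drop every @CR (paired or stray), then split on @LF
        Return StringSplit(StringStripCR($sText), @LF)
    EndIf
    ; no @LF anywhere: treat @CR as the line delimiter
    Return StringSplit($sText, @CR)
EndFunc   ;==>_SplitAsDescribed

; The stray @CR between "a" and "b" is removed, so they end up on the same line
Local $aLines = _SplitAsDescribed("a" & @CR & "b" & @CRLF & "c")
ConsoleWrite($aLines[0] & " lines, first = " & $aLines[1] & @CRLF)
```

The sample input shows the effect being questioned in this thread: the lone @CR is neither kept nor treated as a line break, it simply disappears.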


Spiff59, et al.,

The following code demonstrates what you are finding, and what Valik pointed out 5 years ago in the trac ticket. Func _my_FileReadToArray is one way that I might choose to fix the problem.

I added a parm to trim trailing blank lines.

;
;
;
#include <array.au3>
#include <file.au3>
$str =  'line 1' & @cr & _
  'line 2' & @lf & _
  'line 3' & @crlf & _
  'line 4' & @cr & _
  'line 5' & @lf & _
  'line 6' & @crlf & _
  'line 7' & @crlf & _
  'line 8' & @crlf & @crlf & @lf & @cr
if fileexists('k:temptest010.txt') then filedelete('k:temptest010.txt')
filewrite("k:temptest010.txt", $str)
local $a10
_filereadtoarray("k:temptest010.txt",$a10)
_arraydisplay($a10)
_my_filereadtoarray("k:temptest010.txt",$a10,true)
_arraydisplay($a10)
Func _my_FileReadToArray($sFilePath, ByRef $aArray, $trim_trailing_blank_lines = false)
 Local $aFile = FileRead($sFilePath)
 ; change all @crlf and @cr to @lf
 $aFile = stringregexpreplace($aFile,@crlf, @lf)
 $aFile = stringregexpreplace($aFile,@cr, @lf)
 ; trim trailing blank lines if required (only line breaks are removed, never text)
 if $trim_trailing_blank_lines then
  while stringright($aFile,1) = @lf
   $aFile = stringtrimright($aFile,1)
  wend
 endif
 ; stringsplit always returns an array, with the line count in element 0
 $aArray = stringsplit($aFile,@lf)
 return $aArray
EndFunc   ;==>_my_FileReadToArray

I am sure that there are better ways to do the stringregexpreplace.
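
One possible single-pass alternative to the two replace calls, offered as a sketch rather than a tested drop-in: the pattern "\r\n?" matches a CR optionally followed by an LF, so both @CRLF and lone @CR are turned into @LF in one call, while lone @LF characters are left as they are. The variable names are illustrative only.

```autoit
; Sketch: normalize every line ending to @LF with one regular expression.
; "\r\n?" = a carriage return, optionally followed by a linefeed.
Local $sText = "1" & @CRLF & "2" & @CR & "3" & @LF & "4"
Local $sNorm = StringRegExpReplace($sText, "\r\n?", @LF)
Local $aLines = StringSplit($sNorm, @LF)
ConsoleWrite($aLines[0] & @CRLF) ; number of lines found
```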

kylomas


That's probably the oldest Trac ticket I've ever looked over!

I'm not referring to the behavior regarding the last line of the file.

The ticket is useful in giving an idea of what sort of files are out there, when it states:

@CRLF ; Windows

@LF ; Unix

@CR ; Mac

My questions are:

Is it proper that when the function detects a file that is either @LF or @CRLF delimited that it also removes all single @CR's?

If a file contains both single @CR's and single @LF's, which should take precedence?

(there is also the question whether files with these character combinations even exist)

Edited by Spiff59

Hi,

I use this RegEx to force all line endings into @CRLF format when I need to be sure:

#include <Array.au3>

$sBase_Text = "1" & @CRLF & "2" & @CR & "3" & @LF & "4" & @CRLF & "5"
$aArray = StringSplit($sBase_Text, @CRLF, 1)
_ArrayDisplay($aArray, "Not all @CRLF")

$sNew_Text = StringRegExpReplace($sBase_Text, "((?<!\x0d)\x0a|\x0d(?!\x0a))", @CRLF)

$aArray = StringSplit($sNew_Text, @CRLF, 1)
_ArrayDisplay($aArray, "Now all @CRLF")

It basically looks for single @CR or @LF characters not in a pair with the other. :)

M23


Interesting regular expression Melba23.


guinness,

It certainly took a while to get it to work properly! :D

M23


What about this as an improvement?

#include <Array.au3>
#include <FileConstants.au3>

Local $aArray = 0
_FileReadToArray(@ScriptFullPath, $aArray, Default)
_ArrayDisplay($aArray)

; #FUNCTION# ====================================================================================================================
; Name...........: _FileReadToArray
; Description ...: Reads the specified file into an array.
; Syntax.........: _FileReadToArray($sFilePath, ByRef $aArray [, $iFlag = 1])
; Parameters ....: $sFilePath           - Path and filename of the file to be read.
;                  $aArray              - The array to store the contents of the file.
;                  $iFlag               - [optional] 1 - Return the array count in the 0th index or 0 - Don't return the array count. Default is 1.
; Return values .: Success - Returns a 1
;                  Failure - Returns a 0
;                  @error  - 0 = No error.
;                  |1 = Error opening specified file
;                  |2 = Unable to Split the file
; Author ........: Jonathan Bennett <jon at hiddensoft dot com>, Valik - Support Windows Unix and Mac line separator
; Modified.......: Jpm - fixed empty line at the end, Gary Fixed file contains only 1 line, guinness - Optional flag to return the array count.
; Remarks .......: $aArray[0] will contain the number of records read into the array.
; Related .......: _FileWriteFromArray
; Link ..........:
; Example .......: Yes
; ===============================================================================================================================
Func _FileReadToArray($sFilePath, ByRef $aArray, $iFlag = 1)
    Local $hFileOpen = FileOpen($sFilePath, $FO_READ)
    If $hFileOpen = -1 Then Return SetError(1, 0, 0)
    Local $sFileRead = FileRead($hFileOpen)
    FileClose($hFileOpen)

    ; Check if to return the array count in the 0th index
    If $iFlag = Default Or $iFlag = 1 Then
        $iFlag = 1
    Else
        $iFlag = 3
    EndIf

    $sFileRead = StringRegExpReplace($sFileRead, "((?<!\x0d)\x0a|\x0d(?!\x0a))", @CRLF)
    $aArray = StringSplit($sFileRead, @CRLF, $iFlag)
    If @error Then
        If StringLen($sFileRead) Then
            Local $aReturn[2] = [1, $sFileRead]
            $aArray = $aReturn
        Else
            Return SetError(2, 0, 0)
        EndIf
    EndIf
    Return 1 ; success, as documented in the header
EndFunc   ;==>_FileReadToArray


Hi guinness, this does not seem to make sense: if you use StringRegExpReplace (passing over the text twice), then why not use StringRegExp (a single pass, much faster)?

#include <Array.au3>
#include <FileConstants.au3>

Local $aArray = 0
_FileReadToArray(@ScriptFullPath, $aArray, Default)
_ArrayDisplay($aArray)

; #FUNCTION# ====================================================================================================================
; Name...........: _FileReadToArray
; Description ...: Reads the specified file into an array.
; Syntax.........: _FileReadToArray($sFilePath, ByRef $aArray [, $iFlag = 1])
; Parameters ....: $sFilePath          - Path and filename of the file to be read.
;                 $aArray             - The array to store the contents of the file.
;                 $iFlag               - [optional] 1 - Return the array count in the 0th index or 0 - Don't return the array count. Default is 1.
; Return values .: Success - Returns a 1
;                 Failure - Returns a 0
;                 @error  - 0 = No error.
;                 |1 = Error opening specified file
;                 |2 = Unable to Split the file
; Author ........: Jonathan Bennett <jon at hiddensoft dot com>, Valik - Support Windows Unix and Mac line separator
; Modified.......: Jpm - fixed empty line at the end, Gary Fixed file contains only 1 line, guinness - Optional flag to return the array count.
; Remarks .......: $aArray[0] will contain the number of records read into the array.
; Related .......: _FileWriteFromArray
; Link ..........:
; Example .......: Yes
; ===============================================================================================================================
Func _FileReadToArray($sFilePath, ByRef $aArray, $iFlag)
    Local $hFileOpen = FileOpen($sFilePath, $FO_READ)
    If $hFileOpen = -1 Then Return SetError(1, 0, 0)
    Local $sFileRead = FileRead($hFileOpen)
    FileClose($hFileOpen)

    ; Check if to return the array count in the 0th index
    If $iFlag = Default Or $iFlag = 1 Then
        $iFlag = @LF
    Else
        $iFlag = ""
    EndIf
    ; or $iFlag optional, so _FileReadToArray($sFilePath, ByRef $aArray, $iFlag = "")
    ; If $iFlag <> "" Then $iFlag = @LF
   
    $aArray = StringRegExp($iFlag & $sFileRead & @LF, "([^\r\n]*)(?:\r\n|\n|\r)", 3)
    If @error Then
        If StringLen($sFileRead) Then
            Local $aReturn[2] = [1, $sFileRead]
            $aArray = $aReturn
        Else
            Return SetError(2, 0, 0)
        EndIf
    Else
        If $iFlag Then $aArray[0] = UBound($aArray) - 1
    EndIf
EndFunc   ;==>_FileReadToArray

This is about 200% faster, or about as fast as the StringSplit in the default function.

Ciao.

Edited by DXRW4E


DXRW4E,

You make a valid point. Nice function by the way.


Spiff59,

The point of what I posted was to normalize end of line characters, as M23 and DXRW4E have done. As I alluded to, my regexpreplace was weak and has since been improved. I put in the trailing blank line option to avoid the old, tired discussions about the validity and handling of trailing blanks (there was recently a long, tedious thread devoted to this).

kylomas


I might have been involved in that old, tired discussion, if it was the one about FileCountLines(), where the function would strip all the blank lines off the end of the file whether there was 1 or 1000. It bothered me that FileCountLines() rarely agreed with any editor on the planet as to how many lines were in a file.

I think these are all good examples. The point of this thread was that it seemed odd to me that in some cases FileReadToArray() could simply strip bytes from a file. They aren't treated as line delimiters. Just poof! And they're gone! It seems to be the consensus that, regardless of the combination of @CRLF's, @CR's and @LF's in a file, they should all be treated as line delimiters. That does seem a more standardized rule than the existing logic, which can cause some @CR's to simply vanish.

Edited by Spiff59
Link to comment
Share on other sites

Nice function by the way.

Thank You

DXRW4E,

Nice, I missed the whole "byref" thing!

I was editing the post because, once you use a regexp, the options are endless. For example:

; #FUNCTION# ====================================================================================================================
; Name...........: _FileReadToArray
; Description ...: Reads the specified file into an array.
; Syntax.........: _FileReadToArray($sFilePath, ByRef $aArray [, $iFlag = 1])
; Parameters ....: $sFilePath         - Path and filename of the file to be read.
;                $aArray             - The array to store the contents of the file.
;                $iFlag            - Optional: (add the flags together for multiple operations):
;                |$iFlag = 0 Return the array count in the 0th index (Default)
;                |$iFlag = 1 Don't return the array count
;                |$iFlag = 2 Don't return empty lines (ignores @CRLF & @CRLF & @CRLF etc.)
;                |$iFlag = 4 Don't return lines that are empty or contain only whitespace characters (ignores @CRLF & " " & @CRLF & @Tab & @CRLF etc.)
;                |$iFlag = 8 Strip Line leading white space
; Return values .: Success - Returns a 1
;                Failure - Returns a 0
;                @error  - 0 = No error.
;                |1 = Error opening specified file
;                |2 = Unable to Split the file
; Author ........: Jonathan Bennett <jon at hiddensoft dot com>, Valik - Support Windows Unix and Mac line separator
; Modified.......: Jpm - fixed empty line at the end, Gary Fixed file contains only 1 line, guinness & DXRW4E - Add Optional flag, see $iFlag.
; Remarks .......: $aArray[0] will contain the number of records read into the array.
; Related .......: _FileWriteFromArray
; Link ..........:
; Example .......: Yes
; ===============================================================================================================================
Func _FileReadToArray($sFilePath, ByRef $aArray, $iFlag = 0)
    Local $ArrayCount, $RegExp = "(?:\r\n|\n|\r)([^\r\n]*)", $hFileOpen = FileOpen($sFilePath, $FO_READ)
    If $hFileOpen = -1 Then Return SetError(1, 0, 0)
    Local $sFileRead = FileRead($hFileOpen)
    FileClose($hFileOpen)

    ; Check Optional $iFlag
    If Not BitAND($iFlag, 1) Then $ArrayCount = "ArrayCount" & @LF
    If BitAND($iFlag, 2) Then $RegExp = "(?:\r\n|\n|\r)([^\r\n]+)"
    If BitAND($iFlag, 4) Then $RegExp = "\s*(?:\r\n|\n|\r)([^\r\n]+)"
    If BitAND($iFlag, 8) Then $RegExp = StringReplace($RegExp, ")", ")\h*", 1, 1)

    $aArray = StringRegExp(@LF & $ArrayCount & $sFileRead, $RegExp, 3)
    If @error Then
        If StringLen($sFileRead) Then
            Local $aReturn[2] = [1, $sFileRead]
            $aArray = $aReturn
        Else
            Return SetError(2, 0, 0)
        EndIf
    ElseIf $ArrayCount Then
        $aArray[0] = UBound($aArray) - 1
    EndIf
    Return 1
EndFunc   ;==>_FileReadToArray

Tested with an INF file of about 42 MB (approximately 1,000,000 lines): this version returned in approximately 6802.4406 seconds, while the default UDF that uses StringSplit returned in approximately 6620.1655 seconds, so StringRegExp and StringSplit have more or less the same speed.
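
For anyone who wants to repeat this kind of comparison, a timing harness could look roughly like the sketch below. The file path is a placeholder (not the INF file used above) and the variable names are illustrative; swap in whichever version of the function you want to measure.

```autoit
#include <File.au3>

; Hypothetical timing harness; the path below is a placeholder, not the file used above.
Local $sPath = @ScriptDir & "\sample.inf"
Local $aLines, $hTimer = TimerInit()
_FileReadToArray($sPath, $aLines)
; TimerDiff reports the elapsed time in milliseconds
ConsoleWrite("_FileReadToArray: " & Round(TimerDiff($hTimer), 1) & " ms" & @CRLF)
```

Running each candidate several times on the same file and comparing the averages gives a fairer picture than a single run, since disk caching affects the first read.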

sorry again for my english

Ciao.

Edited by DXRW4E


DXRW4E,

Certainly a lot more advanced than the UDF version. Thanks.


Not diving in the gory details of this thread, let me add a sidenote that some regexp fans/users could use some day.

At the start of a PCRE pattern, you can force your own line break character or combination, thusly:

(*CR) carriage return

(*LF) linefeed

(*CRLF) carriage return, followed by linefeed

(*ANYCRLF) any of the three above

Note that this last flag doesn't work with PCRE as compiled into AutoIt (would need a compile-time option)

(*ANY) all Unicode newline sequences

Current AutoIt PCRE defaults to the equivalent of (*CRLF).

It's also possible to change the line break sequence in the middle of a pattern.
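
A small sketch of how such a prefix might be used, assuming the (*LF) verb is honoured by the PCRE build in your AutoIt version (the sample text and variable names are illustrative only):

```autoit
; Sketch, assuming the (*LF) verb is honoured by the PCRE build in your AutoIt version.
; With (*LF), only a linefeed counts as a line break, so "." also matches @CR
; and "$" in multiline mode matches only before an @LF or at the end of the subject.
Local $sText = "one" & @CR & "still one" & @LF & "two"
Local $aLines = StringRegExp($sText, "(*LF)(?m)^.+$", 3)
If Not @error Then ConsoleWrite(UBound($aLines) & " lines" & @CRLF)
```

The point of the example is that the lone @CR in the sample text is not treated as a break under (*LF), so it stays inside the first match instead of starting a new line.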

Edited by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


I'm not sure how many people will actually have files with varying line endings unless they're doing something like this:

copy filea.txt + fileb.txt filec.txt

That would be an extreme edge case, I would think. Normally you'll have CRLF (Windows and numerous other systems), CR (normally only older Apples and a few other older computer systems), or LF (Unix/Linux/OS X and a few smaller OSs). A file that mixed them would be strange indeed.

The way that _FileReadToArray works now seems fine to me. If there are LF characters, it uses one method; if there are none, it uses the other. No need to over-complicate things. If someone has an issue with a strangely formatted text file, I would suggest that they fix the file before using FRTA, rather than have FRTA carry the overhead of normalizing line endings before splitting the file. I'm sure that on large files, reformatting line endings would waste considerable time for an insignificant gain on an insignificant number of files.
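As a rough sketch of that two-branch logic (this is not the UDF's actual source; _SplitLinesSketch is a hypothetical helper that works on a string already in memory rather than on a file):

```autoit
; Hypothetical helper, not the UDF's source: splits an in-memory string
; using the "LF present / no LF present" decision described above.
Func _SplitLinesSketch($sData)
    ; Drop one trailing line break so the last array element is not empty.
    If StringRight($sData, 1) = @LF Then $sData = StringTrimRight($sData, 1)
    If StringRight($sData, 1) = @CR Then $sData = StringTrimRight($sData, 1)

    If StringInStr($sData, @LF) Then
        ; LF present: strip every @CR, then split on @LF (covers CRLF and LF files).
        Return StringSplit(StringStripCR($sData), @LF)
    ElseIf StringInStr($sData, @CR) Then
        ; No LF at all: treat a lone @CR as the delimiter.
        Return StringSplit($sData, @CR)
    EndIf

    ; No line break found: StringSplit returns the whole string as element 1.
    Return StringSplit($sData, @LF)
EndFunc   ;==>_SplitLinesSketch

Local $aLines = _SplitLinesSketch("one" & @CRLF & "two" & @CRLF)
ConsoleWrite($aLines[0] & " lines" & @CRLF)
```

Note that the first branch is where stray @CR characters inside a line get removed, which is the behaviour being questioned at the top of the thread.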

Edited by BrewManNH

If I posted any code, assume that code was written using the latest release version unless stated otherwise. Also, if it doesn't work on XP I can't help with that because I don't have access to XP, and I'm not going to.
Give a programmer the correct code and he can do his work for a day. Teach a programmer to debug and he can do his work for a lifetime - by Chirag Gude
How to ask questions the smart way!

I hereby grant any person the right to use any code I post, that I am the original author of, on the autoitscript.com forums, unless I've specifically stated otherwise in the code or the thread post. If you do use my code all I ask, as a courtesy, is to make note of where you got it from.

Back up and restore Windows user files _Array.au3 - Modified array functions that include support for 2D arrays.  -  ColorChooser - An add-on for SciTE that pops up a color dialog so you can select and paste a color code into a script.  -  Customizable Splashscreen GUI w/Progress Bar - Create a custom "splash screen" GUI with a progress bar and custom label.  -  _FileGetProperty - Retrieve the properties of a file  -  SciTE Toolbar - A toolbar demo for use with the SciTE editor  -  GUIRegisterMsg demo - Demo script to show how to use the Windows messages to interact with controls and your GUI.  -   Latin Square password generator


That's my opinion too. I simply mentioned the PCRE line break modifiers as an aside, since they closely relate to this subject.

