cherdeg

Howto strip NUL character from text file

13 posts in this topic

#1 ·  Posted (edited)

Hello all,

I can't seem to solve the following problem: I have to create a summary from a source text file. To do that, I read the file into a source array and search it for the word "fail!". All lines containing that word are copied to a destination array. The problem is that the file contains NUL characters. At first I had to modify _FileCountLines() to get the correct number of lines in the file; in its original state it stopped at the line containing the NUL character. I got around that by letting _MyFileCountLines() count only the CRs. But now the line copying from the source to the destination array stops, again at the line containing the NUL character.

For analysis I have attached source.txt and dest-correct.txt, which shows how the output should look if created correctly. EDIT: I have also attached a screenshot from Notepad++ using the option "View > Non-printable characters > Show all characters" (or similar; my Notepad++ is in German). Scroll to the right part of the picture to see the problem.

Here's the code:

#include <Array.au3>
#include <File.au3>

$s_SourceFile = "source.txt"
$s_DestFile = "dest.txt"
$s_SearchValue = "fail!"
$i_SearchHits = CountOccurances($s_SourceFile, $s_SearchValue)
MsgBox("", "Search Hits", $i_SearchHits) ; DEBUGLINE

_CopyAuditHits($s_SourceFile, $s_SearchValue, $i_SearchHits, $s_DestFile)

; The function _CopyAuditHits() searches the Results file for audit hits
; ==================================================================================================
Func _CopyAuditHits($s_SourceFile, $s_SearchValue, $i_SearchHits, $s_DestFile)
    
    $i_LineCount = _MyFileCountLines($s_SourceFile)
    If ($i_LineCount = 0) Then
        Exit
    Else
        MsgBox("", "LineCount", $i_LineCount) ; DEBUGLINE
        Local $a_SourceArray[$i_LineCount]
    EndIf
    If (_FileReadToArray($s_SourceFile, $a_SourceArray) = 0) Then Exit

    Local $a_DestArray[$i_SearchHits]
    Local $i_j = 0

    For $i_i = 0 To $i_LineCount - 1
        If StringInStr($a_SourceArray[$i_i], $s_SearchValue) <> 0 Then
            $a_DestArray[$i_j] = StringReplace($a_SourceArray[$i_i],  Binary(Chr(0)) & @CR, @CR)
            _ArrayDisplay($a_DestArray) ; DEBUGLINE
            $i_j += 1
        EndIf
    Next
    
    _FileWriteFromArray($s_DestFile, $a_DestArray)
    
EndFunc


; The function CountOccurances() searches the file for audit hits and counts their number
; ==================================================================================================
Func CountOccurances($s_FileName, $s_SearchValue)
    $i_SearchHits = 0
    $s_var = FileRead($s_FileName)
    StringReplace($s_var, $s_SearchValue, "*")
    $i_SearchHits = @extended
    Return $i_SearchHits
EndFunc


; The function _MyFileCountLines() counts the number of lines in a file
; ==================================================================================================
Func _MyFileCountLines($sFilePath)
    Local $hFile, $sFileContent, $aTmp
    $hFile = FileOpen($sFilePath, 0)
    If $hFile = -1 Then Return SetError(1, 0, 0)
    $sFileContent = StringStripWS(FileRead($hFile), 2)
    FileClose($hFile)

    If StringInStr($sFileContent, @CR) Then
        $aTmp = StringSplit($sFileContent, @CR)
    Else
        If StringLen($sFileContent) Then
            Return 1
        Else
            Return SetError(2, 0, 0)
        EndIf
    EndIf

    Return $aTmp[0] + 1
EndFunc   ;==>_FileCountLines

How would I get this working? I think that as a first step I would have to remove the NUL character from source.txt. But how do I do that?

Any kind of help is greatly appreciated...

Best Regards,

Chris

source.txt

dest-correct.txt

post-35371-12489595998677_thumb.png


For your string split operation, instead of "StringSplit($sFileContent, @CR)" try "StringSplit($sFileContent, @CRLF, 3)". That will give you a 0-based array with one line per element, and it should remove the NUL chars.

Or you can use 1 instead of 3 to get a 1-based array with the number of elements at index 0.
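To make the difference between the two flag values concrete, here is a minimal sketch. The sample string and variable names are made up for illustration; only StringSplit() and its flags come from the post above.

```autoit
; Hypothetical sample text, three lines separated by CRLF.
Local $sText = "one" & @CRLF & "two" & @CRLF & "three"

; Flag 3 (1 + 2): treat @CRLF as one delimiter and omit the count element,
; so the array is 0-based and UBound() is the number of lines.
Local $aZeroBased = StringSplit($sText, @CRLF, 3)
ConsoleWrite(UBound($aZeroBased) & " elements, first = " & $aZeroBased[0] & @CRLF)

; Flag 1: treat @CRLF as one delimiter; element [0] holds the count,
; so the lines themselves start at index 1.
Local $aOneBased = StringSplit($sText, @CRLF, 1)
ConsoleWrite($aOneBased[0] & " lines, first = " & $aOneBased[1] & @CRLF)
```

Note that with the default flag (0) every character of the delimiter string splits on its own, so splitting on @CRLF without flag 1 produces an extra empty element between each CR and LF.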


"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." - Brian Kernighan


#3 ·  Posted (edited)

Thanks for your answer, nekkutta, but nope, that's not it. Did you try the code? It stops at the same line with an array fault like before. I modified my function to match your advice:

; The function _MyFileCountLines() counts the number of lines in a file
; ==================================================================================================
Func _MyFileCountLines($sFilePath)
    Local $hFile, $sFileContent, $aTmp
    $hFile = FileOpen($sFilePath, 0)
    If $hFile = -1 Then Return SetError(1, 0, 0)
    $sFileContent = StringStripWS(FileRead($hFile), 2)
    FileClose($hFile)

    If StringInStr($sFileContent, @CR) Then
        $aTmp = StringSplit($sFileContent, @CRLF, 3)
    Else
        If StringLen($sFileContent) Then
            Return 1
        Else
            Return SetError(2, 0, 0)
        EndIf
    EndIf

    Return UBound($aTmp) + 1
EndFunc   ;==>_FileCountLines

...but that doesn't help. Next try?


#4 ·  Posted (edited)

Tried the code; now I know what you are talking about: the blank space at the beginning of the lines. That is simple enough to resolve; look up StringStripWS() in the help file.

You should use it before you add the string to your output array, with flag 1 (strip leading whitespace). Hope that helps.

; The function _CopyAuditHits() searches the Results file for audit hits
; ==================================================================================================
Func _CopyAuditHits($s_SourceFile, $s_SearchValue, $i_SearchHits, $s_DestFile)

    $i_LineCount = _MyFileCountLines($s_SourceFile)
    If ($i_LineCount = 0) Then
        Exit
    Else
        MsgBox("", "LineCount", $i_LineCount) ; DEBUGLINE
        Local $a_SourceArray[$i_LineCount]
    EndIf
    If (_FileReadToArray($s_SourceFile, $a_SourceArray) = 0) Then Exit

    Local $a_DestArray[$i_SearchHits]
    Local $i_j = 0

    For $i_i = 0 To $i_LineCount - 1
        If StringInStr($a_SourceArray[$i_i], $s_SearchValue) <> 0 Then
            $a_DestArray[$i_j] = StringStripWS($a_SourceArray[$i_i], 1 )
            ;$a_DestArray[$i_j] = StringReplace($a_SourceArray[$i_i],  Binary(Chr(0)) & @CR, @CR)
            _ArrayDisplay($a_DestArray) ; DEBUGLINE
            $i_j += 1
        EndIf
    Next

    _FileWriteFromArray($s_DestFile, $a_DestArray)

EndFunc

Edit: New Function


#5 ·  Posted (edited)

StringStripWS()

I already tried that at the beginning; it doesn't help either. EDIT: Thanks for the reworked function; it is modified exactly the way I tried it.

BTW: you still don't seem to understand what the central problem is and where it occurs. Please download the two text files from my attachment to a directory, save a copy of my posted code as "test.au3" in the same directory, and test the whole thing.

If you have other ideas, maybe you could test them before posting... please just run my code from SciTE.

EDIT: If you open source.txt in Notepad++, switch on non-printable characters, and delete both NUL characters from the end of line #65, everything works perfectly.


I tried it after I added the StringStripWS() call and it worked. Unless I am really missing something, it returned all of the failed lines.

Slightly confused.. o.O



#7 ·  Posted (edited)

I tried it after I added the StringStripWS() call and it worked. Unless I am really missing something, it returned all of the failed lines.

Here it doesn't. Did you check in Notepad++ whether the NUL chars are still present in source.txt, as you can see them in the screenshot?

The error AutoIt reports is in line 29 of my code, not in line 30 (which is the one you modified in your posting above).


Never mind, now I see the problem. I downloaded the source file directly instead of copying it from my browser, and now I get the error. OK, Chr(32) is your NUL char; you should be able to do a quick "StringReplace($string, Chr(32), "")" to get rid of it. Sorry for the mix-up.



#9 ·  Posted (edited)

Okay, so now I use:

If StringInStr(StringReplace($a_SourceArray[$i_i], Chr(32), " "), $s_SearchValue) <> 0 Then

...which, according to you, should remove the character. But it doesn't:

C:\Documents and Settings\Administrator\Desktop\Scripting\NTcontrol v2\shit1.au3 (29) : ==> Array variable has incorrect number of subscripts or subscript dimension range exceeded.:

If StringInStr(StringReplace($a_SourceArray[$i_i], Chr(32), " "), $s_SearchValue) <> 0 Then

If StringInStr(StringReplace(^ ERROR

BTW: If I download source.txt by right-clicking and choosing "Save target as", it arrives complete with the NUL characters.
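Two things are worth noting about the attempt above. Chr(32) is an ordinary space; the NUL character is Chr(0). And the error message complains about an array subscript that is out of range, which suggests the array filled by _FileReadToArray() has fewer elements than the separately computed line count. The following is only a sketch of how the loop could be bounded by the array itself. It assumes the default behaviour of _FileReadToArray(), where element [0] holds the number of lines read, and an AutoIt build whose strings can contain Chr(0); the variable names are illustrative.

```autoit
#include <File.au3>

; Sketch only: loop over what _FileReadToArray() actually returned,
; not over a line count computed by a different function.
Local $aLines
If Not _FileReadToArray("source.txt", $aLines) Then Exit 1

; With the default settings, $aLines[0] is the number of lines read
; and the lines themselves start at index 1.
For $i = 1 To $aLines[0]
    If StringInStr($aLines[$i], "fail!") Then
        ; Chr(0) is NUL; Chr(32) would only remove spaces.
        ConsoleWrite(StringReplace($aLines[$i], Chr(0), "") & @CRLF)
    EndIf
Next
```

This avoids the subscript error regardless of how many lines were read, but it does not by itself guarantee that lines after the NUL characters are read; that depends on how the file is read.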


OK, I give up. Here is some code that will accomplish what you want, although the output may still contain the NUL chars. Blame Microsoft for throwing that junk in a report file; with your source.txt, though, it didn't return any NUL in dest.txt.

$s_SourceFile = "source.txt"
$s_DestFile = "dest.txt"
$s_SearchValue = "fail!"
$s_Output = ''
$SourceRaw = FileRead($s_SourceFile)
$sourceArray = StringSplit($SourceRaw,@CRLF,3)
For $i In $sourceArray
    If Not StringInStr($i,$s_SearchValue) Then ContinueLoop
    $s_Output &= StringStripWS($i,1) & @CRLF
Next

FileWrite($s_DestFile, $s_Output)


here you go >_< works just fine :(

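#include <Array.au3>
#include <File.au3>

; Open in binary mode (16) so the NUL bytes are read as data
$s = FileOpen(@ScriptDir & "\source.txt", 16)
$ss = FileRead($s)
FileClose($s)

; Walk through the bytes and cut out every 0x00
For $i = 1 To BinaryLen($ss)
    If BinaryMid($ss, $i, 1) = Binary(BinaryMid(0, 1, 1)) Then
        $ss = BinaryMid($ss, 1, $i - 1) & BinaryMid($ss, $i + 1)
        $i -= 1 ; re-check the byte that moved into this position
    EndIf
Next

$data = BinaryToString($ss)
$split = StringSplit($data, @CRLF, 1)
_ArrayDisplay($split)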
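The byte-by-byte loop above can probably be shortened. The following is a minimal sketch, not a tested replacement. It assumes an AutoIt build whose strings may contain Chr(0), so that StringReplace() can drop the NULs in one call. The file names and the search word come from this thread; the variable names are illustrative.

```autoit
; Sketch only, assuming AutoIt strings may contain Chr(0).
Local $hFile = FileOpen(@ScriptDir & "\source.txt", 16) ; 16 = binary mode
If $hFile = -1 Then Exit 1
Local $sText = BinaryToString(FileRead($hFile))
FileClose($hFile)

; Remove every NUL; @extended then holds the number of replacements made.
$sText = StringReplace($sText, Chr(0), "")
ConsoleWrite("NULs removed: " & @extended & @CRLF)

; Split on whole CRLF pairs (flag 1) without a count element (flag 2).
Local $aLines = StringSplit($sText, @CRLF, 3)
Local $sOut = ""
For $sLine In $aLines
    ; Keep only the lines that contain the search word, minus leading whitespace.
    If StringInStr($sLine, "fail!") Then $sOut &= StringStripWS($sLine, 1) & @CRLF
Next

; FileWrite() with a file name appends if dest.txt already exists.
FileWrite(@ScriptDir & "\dest.txt", $sOut)
```

Reading in binary mode keeps the read from depending on how text mode treats NUL bytes, which is the same reason the post above opens the file with flag 16.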

Only two things are infinite, the universe and human stupidity, and I'm not sure about the former. - Albert Einstein
Practice makes perfect! But nobody's perfect, so why practice at all?
http://forum.ambrozie.ro


OK, I give up. Here is some code that will accomplish what you want, although the output may still contain the NUL chars. Blame Microsoft for throwing that junk in a report file; with your source.txt, though, it didn't return any NUL in dest.txt.

Whoa!!!

This is f****g great! Why didn't you just point me in this direction? By the way, it wasn't M$; it's a Perl script written by a US colleague of mine who is currently on vacation. So, for once, it's not M$ who's to blame.

Thank you so much, YOU made my day! >_<


It's no problem. I guess I really don't like heavily correcting other people's code; it seems kind of rude to me, so I try to resolve problems within the original code before attempting a major rewrite.

Darn, and I wanted to blame M$ >_<


