I need to read log files into an array so I can search them for errors. However, when I display the array I get garbage, or "Chinese characters". Our developers say they are writing UTF-8, but FileGetEncoding returns 2048 for the logs, which the encoding codes in FileOpen() list as $FO_UTF16_BE_NOBOM (2048) = Use Unicode UTF16 Big Endian (without BOM).

There is an app called Detenc that detects the encoding used by files. You have to guess the encoding yourself, but it reports a match when I set the encoder to UTF-8. I understand that encoding detection is not set in stone, but according to the HxD hex editor the first character of the file is a capital B.

I even have another topic here about running PowerShell to re-encode the file so that AutoIt stores it properly in the array.

So I am trying to figure out why AutoIt thinks my logs are not UTF-8.

Here is sample code:

#include <Array.au3>
#include <File.au3>

Local $aRetArrayFile
_FileReadToArray("C:\Logs\Myplayer1.log", $aRetArrayFile)
_ArrayDisplay($aRetArrayFile) ; the garbage shows up here

I won't post the results, as they are illegible, but I have attached a screenshot of the _ArrayDisplay output. This is the first line of the log file:

BANNER 10/10/2017 15:56:00 ======================================================================

And the Hex from the beginning of the file:

42 41 4E 4E 45 52 20 31 30 2F 31 30 2F 32 30 31 37 20 31 34 3A 33 31 3A 33 35 20 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 3D 0D 0A 42 41 4E 4E 45 52 20
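The same check can be made from within AutoIt instead of a hex editor. This is only a sketch: it reuses the path from the sample code above, and the variable names are made up for illustration. It opens the file in binary mode, so no text decoding is applied, and prints the first few raw bytes next to the value FileGetEncoding reports.

```autoit
#include <FileConstants.au3>

; Path reused from the sample above; adjust as needed.
Local $sLogPath = "C:\Logs\Myplayer1.log"

; Binary mode returns the raw bytes with no text decoding applied.
Local $hLog = FileOpen($sLogPath, $FO_READ + $FO_BINARY)
If $hLog = -1 Then
    ConsoleWrite("Could not open " & $sLogPath & @CRLF)
    Exit 1
EndIf

Local $dHead = FileRead($hLog, 4)
FileClose($hLog)

; A UTF-8 BOM would start with EF BB BF, a UTF-16 BE BOM with FE FF,
; and a UTF-16 LE BOM with FF FE. The hex dump above starts with 42 41 4E 4E,
; which is none of these, so the file appears to have no BOM at all.
ConsoleWrite("First bytes:     " & String($dHead) & @CRLF)
ConsoleWrite("FileGetEncoding: " & FileGetEncoding($sLogPath) & @CRLF)
```

If the first bytes are plain text and not a BOM, then FileGetEncoding is presumably guessing the encoding from the content, and a guess can be wrong.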

So I don't understand why AutoIt thinks the file is UTF16 BE.

If I can get the PowerShell script running, I have a workaround.

BTW none of my other arrays display as garbage, just the log files.


Rereading my post, what seems to be missing is the question. I guess my question is, does anyone know why these logs are being displayed incorrectly?
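One thing that might be worth trying, assuming the hex dump is representative and the logs really are plain ASCII/UTF-8 without a BOM: open the file with an explicit encoding flag so that AutoIt does not have to guess. This is an untested sketch against my logs, not a confirmed fix. It uses the built-in FileReadToArray with a handle from FileOpen in place of the _FileReadToArray UDF, and the variable names are only for illustration.

```autoit
#include <Array.au3>
#include <FileConstants.au3>

; Path reused from the sample above; adjust as needed.
Local $sLogPath = "C:\Logs\Myplayer1.log"

; Open read-only and state the encoding explicitly (UTF-8 without BOM)
; rather than relying on automatic detection.
Local $hLog = FileOpen($sLogPath, $FO_READ + $FO_UTF8_NOBOM)
If $hLog = -1 Then
    MsgBox(16, "Error", "Could not open " & $sLogPath)
    Exit 1
EndIf

; The built-in FileReadToArray accepts a file handle and returns
; a 0-based array with one element per line.
Local $aLines = FileReadToArray($hLog)
Local $iError = @error ; capture before any other call resets it
FileClose($hLog)

If $iError Then
    MsgBox(16, "Error", "FileReadToArray failed, @error = " & $iError)
    Exit 1
EndIf

_ArrayDisplay($aLines)
```

If the lines display correctly this way, that would suggest the file itself is fine and only the automatic encoding detection is misfiring.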



