Jump to content
mLipok

Encoding problem - after saving blob to SQL DB

Recommended Posts

Posted (edited)

I have an issue which I describe in this following story:

I have a function which was using:

Local $dStream = StringToBinary($sContent)

instead of this I should use

Local $dStream = StringToBinary($sContent, $iEncoding)

And preset desire $iEncoding as default was $SB_ANSI but I should use $SB_UTF8

This function was using $dStream to put that content to SQL DB as BLOB.
Unfortunately was stored as $SB_ANSI not as $SB_UTF8
 

 When I retrive data from SQL DB I  always use:

$sTXT_Document_content = BinaryToString($dStream, $SB_UTF8)

 

......some time later........

I had to processing this content
Retrive date:

$sTXT_Document_content = BinaryToString($dStream, $SB_UTF8)

procesing (simple @CR > @CRLF) ........ and store data:

$dStream = StringToBinary($sTXT_Document_content, $SB_UTF8)

 

ISSUE short description:
Data was saved as $SB_ANSI .... Was retrived as $SB_UTF8 ... processed (simple @CR > @CRLF) .... stored as $SB_UTF8

Data which should be stored like:

Quote

tytułu:

are displayed as:

Quote

tytu�u:


Request for help:
I know how to get data and store data from/to SQL DB, but I was trying few ways and I can not revert encoding problem.
Help please.

mLipok

 

Edited by mLipok
quoted example fixed

Signature beginning:   Wondering who uses AutoIt and what it can be used for ?
* GHAPI UDF - modest beginning - communication with GitHub REST API Forum Rules *
Include Dependency Tree (Tool for analyzing script relations)
ADO.au3 UDF     POP3.au3 UDF     XML.au3 UDF    How to use IE.au3  UDF with  AutoIt v3.3.14.x  for other useful stuff click the following button

Spoiler

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind. 

My contribution (my own projects): * Debenu Quick PDF Library - UDF * Debenu PDF Viewer SDK - UDF * Acrobat Reader - ActiveX Viewer * UDF for PDFCreator v1.x.x * XZip - UDF * AppCompatFlags UDF * CrowdinAPI UDF * _WinMergeCompare2Files() * _JavaExceptionAdd() * _IsBeta() * Writing DPI Awareness App - workaround * _AutoIt_RequiredVersion() * Chilkatsoft.au3 UDF * TeamViewer.au3 UDF * JavaManagement UDF * VIES over SOAP * WinSCP UDF * GHAPI UDF - modest begining - comunication with GitHub REST APIErrorLog.au3 UDF - A logging Library *

My contribution to others projects or UDF based on  others projects: * _sql.au3 UDF  * POP3.au3 UDF *  RTF Printer - UDF * XML.au3 UDF * ADO.au3 UDF SMTP Mailer UDF * Dual Monitor resolution detection * * 2GUI on Dual Monitor System * _SciLexer.au3 UDF * SciTE - Lexer for console pane

Useful links: * Forum Rules * Forum etiquette *  Forum Information and FAQs * How to post code on the forum * AutoIt Online Documentation * AutoIt Online Beta Documentation * SciTE4AutoIt3 getting started * Convert text blocks to AutoIt code * Games made in Autoit * Programming related sites * Polish AutoIt Tutorial * DllCall Code Generator * 

Wiki: Expand your knowledge - AutoIt Wiki * Collection of User Defined Functions * How to use HelpFile * Good coding practices in AutoIt * 

OpenOffice/LibreOffice/XLS Related: WriterDemo.au3 * XLS/MDB from scratch with ADOX

IE Related:  * How to use IE.au3  UDF with  AutoIt v3.3.14.x * Why isn't Autoit able to click a Javascript Dialog? * Clicking javascript button with no ID * IE document >> save as MHT file * IETab Switcher (by LarsJ ) * HTML Entities * _IEquerySelectorAll() (by uncommon) * IE in TaskSchedulerIE Embedded Control Versioning (use IE9+ and HTML5 in a GUI) *

PDF Related:How to get reference to PDF object embeded in IE *

I encourage you to read: * Global Vars * Best Coding Practices * Please explain code used in Help file for several File functions * OOP-like approach in AutoIt * UDF-Spec Questions *  EXAMPLE: How To Catch ConsoleWrite() output to a file or to CMD *

I also encourage you to check awesome @trancexx code:  * Create COM objects from modules without any demand on user to register anything. * Another COM object registering stuffOnHungApp handlerAvoid "AutoIt Error" message box in unknown errors  * HTML editor

"Homo sum; humani nil a me alienum puto" - Publius Terentius Afer
"Program are meant to be read by humans and only incidentally for computers and execute" - Donald Knuth, "The Art of Computer Programming"
:naughty:  :ranting:, be  :) and       \\//_.

Anticipating Errors :  "Any program that accepts data from a user must include code to validate that data before sending it to the data store. You cannot rely on the data store, ...., or even your programming language to notify you of problems. You must check every byte entered by your users, making sure that data is the correct type for its field and that required fields are not empty."

Signature last update: 2021-01-03

Share this post


Link to post
Share on other sites

@mLipok
What do you get when you run this script on your PC?

#include <MsgBoxConstants.au3>
#include <StringConstants.au3>

Global $strANSI = "tytułu:", _
       $bstrANSI = Binary($strANSI), _
       $strUTF8 = BinaryToString($bstrANSI, $SB_UTF8)

MsgBox($MB_ICONINFORMATION, "Encondings:", "$strANSI = " & $strANSI & @CRLF & _
                                           "$bstrANSI = " & $bstrANSI & @CRLF & _
                                           "$strUTF8 = " & $strUTF8)

This is what I see:

image.png.256ccae5256996c16975edc0cc944bfc.png

 


Click here to see my signature:

Spoiler

ALWAYS GOOD TO READ:

 

Share this post


Link to post
Share on other sites

Here is example what was done/what happend:

#include <MsgBoxConstants.au3>
#include <StringConstants.au3>


Global $FAKE_DATA_BASE

_Example()

Func _Example()
    Local $DATA0 = "tytułu:"

    _DB_Store_Wrong_Way($DATA0)
    MsgBox(0, @ScriptLineNumber, $FAKE_DATA_BASE)
    Local $DATA1 = _DB_GetData_Proper_Way()
    MsgBox(0, @ScriptLineNumber, $DATA1)
    _DB_Store_Proper_Way($DATA1)

    _DB_Imp()
    Local $DATA2 = _DB_GetData_Proper_Way()
    MsgBox(0, @ScriptLineNumber, $DATA2)

    _DB_FIX()

    Local $DATA3 = _DB_GetData_Proper_Way()

    # RESULT SHOULD BE "True"
    MsgBox(0, 'TEST', $DATA0 == $DATA3)

EndFunc   ;==>_Example

Func _DB_Store_Wrong_Way($DATA)
    $FAKE_DATA_BASE = StringToBinary($DATA)
EndFunc   ;==>_DB_Store_Wrong_Way

Func _DB_GetData_Proper_Way()
    Return BinaryToString($FAKE_DATA_BASE, $SB_UTF8)
EndFunc   ;==>_DB_GetData_Proper_Way

Func _DB_Store_Proper_Way($DATA)
    $FAKE_DATA_BASE = StringToBinary($DATA, $SB_UTF8)
EndFunc   ;==>_DB_Store_Proper_Way

Func _DB_Imp()
    Local $DATA = $FAKE_DATA_BASE
    $DATA = BinaryToString($DATA)
    _DB_Store_Proper_Way($DATA)
EndFunc   ;==>_DB_Imp

Func _DB_FIX()
    # HERE should be done some stuff to fix $FAKE_DATA_BASE content
EndFunc   ;==>_DB_FIX

I need to find out what to do inside:  _DB_FIX() to get result True
will back to the problem in next 24h.

 

Edited by mLipok

Signature beginning:   Wondering who uses AutoIt and what it can be used for ?
* GHAPI UDF - modest beginning - communication with GitHub REST API Forum Rules *
Include Dependency Tree (Tool for analyzing script relations)
ADO.au3 UDF     POP3.au3 UDF     XML.au3 UDF    How to use IE.au3  UDF with  AutoIt v3.3.14.x  for other useful stuff click the following button

Spoiler

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind. 

My contribution (my own projects): * Debenu Quick PDF Library - UDF * Debenu PDF Viewer SDK - UDF * Acrobat Reader - ActiveX Viewer * UDF for PDFCreator v1.x.x * XZip - UDF * AppCompatFlags UDF * CrowdinAPI UDF * _WinMergeCompare2Files() * _JavaExceptionAdd() * _IsBeta() * Writing DPI Awareness App - workaround * _AutoIt_RequiredVersion() * Chilkatsoft.au3 UDF * TeamViewer.au3 UDF * JavaManagement UDF * VIES over SOAP * WinSCP UDF * GHAPI UDF - modest begining - comunication with GitHub REST APIErrorLog.au3 UDF - A logging Library *

My contribution to others projects or UDF based on  others projects: * _sql.au3 UDF  * POP3.au3 UDF *  RTF Printer - UDF * XML.au3 UDF * ADO.au3 UDF SMTP Mailer UDF * Dual Monitor resolution detection * * 2GUI on Dual Monitor System * _SciLexer.au3 UDF * SciTE - Lexer for console pane

Useful links: * Forum Rules * Forum etiquette *  Forum Information and FAQs * How to post code on the forum * AutoIt Online Documentation * AutoIt Online Beta Documentation * SciTE4AutoIt3 getting started * Convert text blocks to AutoIt code * Games made in Autoit * Programming related sites * Polish AutoIt Tutorial * DllCall Code Generator * 

Wiki: Expand your knowledge - AutoIt Wiki * Collection of User Defined Functions * How to use HelpFile * Good coding practices in AutoIt * 

OpenOffice/LibreOffice/XLS Related: WriterDemo.au3 * XLS/MDB from scratch with ADOX

IE Related:  * How to use IE.au3  UDF with  AutoIt v3.3.14.x * Why isn't Autoit able to click a Javascript Dialog? * Clicking javascript button with no ID * IE document >> save as MHT file * IETab Switcher (by LarsJ ) * HTML Entities * _IEquerySelectorAll() (by uncommon) * IE in TaskSchedulerIE Embedded Control Versioning (use IE9+ and HTML5 in a GUI) *

PDF Related:How to get reference to PDF object embeded in IE *

I encourage you to read: * Global Vars * Best Coding Practices * Please explain code used in Help file for several File functions * OOP-like approach in AutoIt * UDF-Spec Questions *  EXAMPLE: How To Catch ConsoleWrite() output to a file or to CMD *

I also encourage you to check awesome @trancexx code:  * Create COM objects from modules without any demand on user to register anything. * Another COM object registering stuffOnHungApp handlerAvoid "AutoIt Error" message box in unknown errors  * HTML editor

"Homo sum; humani nil a me alienum puto" - Publius Terentius Afer
"Program are meant to be read by humans and only incidentally for computers and execute" - Donald Knuth, "The Art of Computer Programming"
:naughty:  :ranting:, be  :) and       \\//_.

Anticipating Errors :  "Any program that accepts data from a user must include code to validate that data before sending it to the data store. You cannot rely on the data store, ...., or even your programming language to notify you of problems. You must check every byte entered by your users, making sure that data is the correct type for its field and that required fields are not empty."

Signature last update: 2021-01-03

Share this post


Link to post
Share on other sites

Hi mLipok :)
I wonder if it's possible to revert the encoding, please have a look at this :

The letter  ł  (l with stroke) in $DATA0 has a Unicode code of 322 (decimal) as returned by AscW("ł")
When it was wrongly encoded using the ANSI flag, then its Ascii code was used, which is 108 (0x6C) as returned by Asc("ł")
Which means that "ł" has become an "l" (small letter l) and its Ascii code is displayed in MsgBox at line 13 of your script: $FAKE_DATA_BASE = 0x747974756C753A

At this point, it seems impossible to reverse anything because the unicode code 322 doesn't exist anymore in the binary data.

There is something I didn't understand when you say that you have this kind of display on your computer :
tytu�u:

The only obscure display I can reproduce on my computer happens when I change Scite encoding properties (menu File => Encoding) from UTF8 to Code Page Property (ANSI) then "tytułu:" is displayed "tytuÅ‚u:" in the script. Explanation :
Å is character Ascii 197 (0xC5) and ‚ is character ascii 130 (0x82) which is "a single low quotation mark"
This corresponds to an encoding of "0x74797475C582753A" which is the correct binary data when encoding is done using UTF8 flag (as ł unicode's code 322 will be correctly encoded to 0xC582 in UTF8)

Maybe our locales are different and that could explain the tytu�u: display on your computer ?
I hope you'll have other suggestions that will help you.

Share this post


Link to post
Share on other sites

@mLipok I have the exact same questions as @pixelsearch in his good answer above.

When you say it displays as tytu�u: you don't tell us what exactly you look at: a DOS display (which codepage?), a Windows display (which codepage?), a SciTE display (which codepage?)!

I understand your initial issue and I have the same here where an old program I use 100 times a day shows "ANSIfied UTF8 text" as the program doesn't cope with UTF8. I've written a little AutoIt program to convert the clipboard content from " UTF16 string containing UTF8 data" back to UCS2. I didn't bother to handle codepoints beyond UCS2 because I never get such content.

I insisted to handle the low-level conversion on a "UTF8-sequence" basis over 2 or 3 bytes. I had to first replace sequences " " ( then 0x20) by " " ( then 0xA0), and "à " (à then 0x20) by "à" (à then 0xA0) and also double single quotes to make them transparent to AutoIt string processing, hence the 3 nested StringReplace().

Mabe you can adapt the idea to your use case.

HotKeySet("!w", _unUTFme)
HotKeySet("^!w", _Exit)

While 1
    Sleep(250)
WEnd

Func _Exit()
    Exit
EndFunc

Func _unUTFme()
    ClipPut(Execute("'" & StringRegExpReplace(StringReplace(StringReplace(StringReplace(ClipGet(), "Â ", "Â "), "Ã ", "Ã "), "'", "''"), "([\xC0-\xDF].|[\xE0-\xEF]..)", "' & _UnUTF8('$1') & '") & "'"))
EndFunc

Func _UnUTF8($s)
    Local $a = StringToASCIIArray($s)
    Local $c
    If UBound($a) = 2 Then
        $c = BitOR(BitShift(BitAND($a[0], 0x1F), -6), BitAND($a[1], 0x3F))
    Else
        $c = BitOR(BitShift(BitAND($a[0], 0x3F), -12), BitOR(BitShift(BitAND($a[1], 0x3F), -6), BitAND($a[2], 0x3F)))
    EndIf
    Return ChrW($c)
EndFunc

 

Edited by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Share this post


Link to post
Share on other sites

Here's another possible solution.  Since I don't know your language and the rules for how characters are translated from ANSI to UTF-8, I just used my own rules as an example.  Also since I don't know your language, I'm not sure if trying to do it this way is even possible.

Other than adding logic to the _DB_Fix() function and writing output to a GUI console instead of MsgBoxes, the script is exactly the same as your original script.

Spoiler
#AutoIt3Wrapper_AU3Check_Parameters=-w 3 -w 4 -w 5 -w 6 -d

#include <Constants.au3>
#include <GUIConstantsEx.au3>
#include <WindowsConstants.au3>
#include <EditConstants.au3>

#Region GUI Console
Const $CONSOLE_BACK_COLOR = 0xFFFFFF
Const $CONSOLE_FORE_COLOR = 0x000000
Const $CONSOLE_WIDTH      = 800
Const $CONSOLE_HEIGHT     = 600
Global $frmConsole = GUICreate("DB Fix Console (Press ESC to Close)", $CONSOLE_WIDTH, $CONSOLE_HEIGHT, 302, 218, BitOR($GUI_SS_DEFAULT_GUI,$WS_MAXIMIZEBOX,$WS_SIZEBOX,$WS_THICKFRAME,$WS_TABSTOP))
GUISetBkColor($CONSOLE_BACK_COLOR, $frmConsole)
Global $ctlEditBox = GUICtrlCreateEdit("", 0, 0, $CONSOLE_WIDTH, $CONSOLE_HEIGHT, BitOR($ES_AUTOVSCROLL,$ES_AUTOHSCROLL,$ES_WANTRETURN,$ES_READONLY,$WS_VSCROLL,$WS_HSCROLL))
GUICtrlSetResizing($ctlEditBox, $GUI_DOCKLEFT+$GUI_DOCKRIGHT+$GUI_DOCKTOP+$GUI_DOCKBOTTOM)
GUICtrlSetFont($ctlEditBox, 11, 500, 0, "Consolas")
GUICtrlSetColor($ctlEditBox, $CONSOLE_FORE_COLOR)
GUICtrlSetBkColor($ctlEditBox, $CONSOLE_BACK_COLOR)
GUISetState(@SW_SHOW, $frmConsole)
#EndRegion GUI Console

Global $FAKE_DATA_BASE


_example()

While 1
    If GUIGetMsg() = $GUI_EVENT_CLOSE Then ExitLoop
WEnd


Func _example()
    Local $DATA0 = "Tytuł: to jest treść testowa"

    write_console_line(@ScriptLineNumber & ": " & $DATA0 & " ($DATA0)")
    write_console_line("    " & StringToBinary($DATA0, $SB_UTF8) & @CRLF)

    _DB_Store_Wrong_Way($DATA0)

    write_console_line(@ScriptLineNumber & ": " & $FAKE_DATA_BASE & " ($FAKE_DATA_BASE)" & @CRLF)

    Local $DATA1 = _DB_GetData_Proper_Way()

    write_console_line(@ScriptLineNumber & ": " & $DATA1 & " ($DATA1)")
    write_console_line("    " & StringToBinary($DATA1, $SB_UTF8) & @CRLF)

    _DB_Store_Proper_Way($DATA1)
    _DB_Imp()

    Local $DATA2 = _DB_GetData_Proper_Way()

    write_console_line(@ScriptLineNumber & ": " & $DATA2 & " ($DATA2)")
    write_console_line("    " & StringToBinary($DATA2, $SB_UTF8) & @CRLF)

    _DB_FIX()

    Local $DATA3 = _DB_GetData_Proper_Way()

    write_console_line("=====  After DB Fix  =====" & @CRLF)
    write_console_line("$DATA0             = " & $DATA0)
    write_console_line("                   = " & StringToBinary($DATA0, $SB_UTF8) & @CRLF)
    write_console_line("$DATA3             = " & $DATA3)
    write_console_line("                   = " & StringToBinary($DATA0, $SB_UTF8) & @CRLF)
    write_console_line("($DATA0 == $DATA3) = " & ($DATA0 == $DATA3))

EndFunc

Func _DB_Store_Wrong_Way($DATA)
    $FAKE_DATA_BASE = StringToBinary($DATA)
EndFunc   ;==>_DB_Store_Wrong_Way

Func _DB_GetData_Proper_Way()
    Return BinaryToString($FAKE_DATA_BASE, $SB_UTF8)
EndFunc   ;==>_DB_GetData_Proper_Way

Func _DB_Store_Proper_Way($DATA)
    $FAKE_DATA_BASE = StringToBinary($DATA, $SB_UTF8)
EndFunc   ;==>_DB_Store_Proper_Way

Func _DB_Imp()
    Local $DATA = $FAKE_DATA_BASE
    $DATA = BinaryToString($DATA)
    _DB_Store_Proper_Way($DATA)
EndFunc   ;==>_DB_Imp

Func _DB_FIX()
    Local $sData = _DB_GetData_Proper_Way()

    ;Since I don't know the language, this is just an example that translates
    ;the characters back to their original format based on some sort of
    ;user-defined language-specific rules.
    $sData = StringReplace($sData, "ul", "uł", 0, True)
    $sData = StringReplace($sData, "sc", "ść", 0, True)

    _DB_Store_Proper_Way($sData)
EndFunc   ;==>_DB_FIX

Func write_console_line($sMsg = "")
    GUICtrlSetData($ctlEditBox, $sMsg & @CRLF, 1)
EndFunc

 

Output:

36: Tytuł: to jest treść testowa ($DATA0)
    0x54797475C5823A20746F206A65737420747265C59BC48720746573746F7761

41: 0x547974756C3A20746F206A65737420747265736320746573746F7761 ($FAKE_DATA_BASE)

45: Tytul: to jest tresc testowa ($DATA1)
    0x547974756C3A20746F206A65737420747265736320746573746F7761

53: Tytul: to jest tresc testowa ($DATA2)
    0x547974756C3A20746F206A65737420747265736320746573746F7761

=====  After DB Fix  =====

$DATA0             = Tytuł: to jest treść testowa
                   = 0x54797475C5823A20746F206A65737420747265C59BC48720746573746F7761

$DATA3             = Tytuł: to jest treść testowa
                   = 0x54797475C5823A20746F206A65737420747265C59BC48720746573746F7761

($DATA0 == $DATA3) = True

 

Edited by TheXman

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Recently Browsing   0 members

    No registered users viewing this page.

×
×
  • Create New...