Remove Unicode (BOM) from beginning of string?


Hi all,

I'm reading the file https://msedgedriver.azureedge.net/LATEST_RELEASE_83 using WinHTTP, and the string I get back looks like this when I view it in SciTE --

xEFxBFxBF83.0.478.54

Here's what I've come up with as a means to strip off the Unicode BOM from the beginning of the string --

If Asc($sDriverLatest) = 255 Then
    ConsoleWrite("Unicode detected!" & @CRLF)
    $sDriverLatest = BinaryToString($sDriverLatest, $SB_UTF16LE)
    If Asc($sDriverLatest) = 63 Then $sDriverLatest = StringMid($sDriverLatest,2) ; skip ?
EndIf
ConsoleWrite("Latest=" & $sDriverLatest & @CRLF)

Is there a more reliable way to eliminate these Unicode characters?

Dan


This is more or less what I do in AutoIt3Wrapper to detect the different BOM types, so that the BOM can be stripped from the input file and added back later.

Local $hTest_UTF = FileOpen("filename", 16) ; 16 = binary mode, so the raw bytes are read
Local $Test_UTF = FileRead($hTest_UTF, 4)   ; the first 4 bytes are enough to identify any BOM
Local $UTFtype
FileClose($hTest_UTF)
;~ 00 00 FE FF UTF-32, big-endian
;~ FF FE 00 00 UTF-32, little-endian
;~ FE FF UTF-16, big-endian
;~ FF FE UTF-16, little-endian
;~ EF BB BF UTF-8
Select
    Case BinaryMid($Test_UTF, 1, 4) = '0x0000FEFF'                   ; UTF-32 BE
        $UTFtype = '32BE'
    Case BinaryMid($Test_UTF, 1, 4) = '0xFFFE0000'                   ; UTF-32 LE
        $UTFtype = '32LE'
    Case BinaryMid($Test_UTF, 1, 2) = '0xFEFF'                       ; UTF-16 BE
        $UTFtype = '16BE'
    Case BinaryMid($Test_UTF, 1, 2) = '0xFFFE'                       ; UTF-16 LE
        $UTFtype = '16LE'
    Case BinaryMid($Test_UTF, 1, 3) = '0xEFBBBF'                     ; UTF-8
        $UTFtype = '8'
    Case Else
        $UTFtype = ''
EndSelect
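
The stripping step itself is not shown above. Here is a minimal sketch of how it could look, assuming the data is already held in a binary variable. `_BOM_Length` is a made-up name for illustration only, not a function from AutoIt3Wrapper, and it reuses the same comparisons as the Select block.

```autoit
; Hypothetical helper: returns the length of the BOM in bytes, or 0 if none is found.
Func _BOM_Length($dData)
    Select
        Case BinaryMid($dData, 1, 4) = '0x0000FEFF' ; UTF-32 BE
            Return 4
        Case BinaryMid($dData, 1, 4) = '0xFFFE0000' ; UTF-32 LE (tested before UTF-16 LE, which shares FF FE)
            Return 4
        Case BinaryMid($dData, 1, 2) = '0xFEFF'     ; UTF-16 BE
            Return 2
        Case BinaryMid($dData, 1, 2) = '0xFFFE'     ; UTF-16 LE
            Return 2
        Case BinaryMid($dData, 1, 3) = '0xEFBBBF'   ; UTF-8
            Return 3
    EndSelect
    Return 0
EndFunc   ;==>_BOM_Length

; Example: UTF-8 BOM followed by the bytes of "83"
Local $dRaw = Binary('0xEFBBBF3833')
Local $iBOM = _BOM_Length($dRaw)
Local $dBOM = BinaryMid($dRaw, 1, $iBOM)   ; keep the BOM so it can be written back later
Local $dBody = BinaryMid($dRaw, $iBOM + 1) ; payload without the BOM
ConsoleWrite("BOM=" & $dBOM & " Body=" & $dBody & @CRLF)
```

Keeping the BOM bytes in their own variable is what makes it possible to put them back in front of the processed data afterwards.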

Jos


Thanks @Jos. This is what I ended up with (subset of _WD_UpdateDriver) --

            Case 'msedge'
                $sVersionShort = StringLeft($sBrowserVersion, StringInStr($sBrowserVersion, ".") - 1)
                $sDriverLatest = __WD_Get('https://msedgedriver.azureedge.net/LATEST_RELEASE_' & $sVersionShort, 2) ; 2 = binary data

                If @error = $_WD_ERROR_Success Then
                    Select
                        Case BinaryMid($sDriverLatest, 1, 4) = '0x0000FEFF'                   ; UTF-32 BE
                            $iStartPos = 5
                            $iConversion = $SB_UTF16LE ; placeholder: BinaryToString has no UTF-32 flag, so UTF-32 data will not decode correctly
                        Case BinaryMid($sDriverLatest, 1, 4) = '0xFFFE0000'                   ; UTF-32 LE
                            $iStartPos = 5
                            $iConversion = $SB_UTF16LE ; placeholder: BinaryToString has no UTF-32 flag, so UTF-32 data will not decode correctly
                        Case BinaryMid($sDriverLatest, 1, 2) = '0xFEFF'                       ; UTF-16 BE
                            $iStartPos = 3
                            $iConversion = $SB_UTF16BE
                        Case BinaryMid($sDriverLatest, 1, 2) = '0xFFFE'                       ; UTF-16 LE
                            $iStartPos = 3
                            $iConversion = $SB_UTF16LE
                        Case BinaryMid($sDriverLatest, 1, 3) = '0xEFBBBF'                     ; UTF-8
                            $iStartPos = 4
                            $iConversion = $SB_UTF8
                        Case Else
                            $iStartPos = 1
                            $iConversion = $SB_ANSI
                    EndSelect

                    $sDriverLatest = StringStripWS(BinaryToString(BinaryMid($sDriverLatest, $iStartPos), $iConversion), $STR_STRIPTRAILING)
                    $sURLNewDriver = "https://msedgedriver.azureedge.net/" & $sDriverLatest & "/edgedriver_"
                    $sURLNewDriver &= ($lFlag64) ? "win64.zip" : "win32.zip"
                Else
                    $iErr = $_WD_ERROR_GeneralError
                EndIf
        EndSwitch

It appears to work for the current file, but honestly I'm not sure that I've got the $iConversion values correct.
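
On the $iConversion question: BinaryToString only accepts $SB_ANSI, $SB_UTF16LE, $SB_UTF16BE and $SB_UTF8, so the UTF-16 and UTF-8 branches line up with a flag, while the two UTF-32 branches have no matching flag. A minimal sketch of one way to handle that, assuming it is acceptable to report UTF-32 as an error rather than decode it. `_BOM_ToString` is a hypothetical name and is not part of the WebDriver UDF.

```autoit
#include <StringConstants.au3>

; Hypothetical helper: decodes binary data according to its BOM.
; Sets @error = 1 and returns "" for UTF-32, which BinaryToString cannot decode.
Func _BOM_ToString($dData)
    Local $iStartPos = 1, $iConversion = $SB_ANSI ; no BOM: treat as ANSI, as in the code above
    Select
        Case BinaryMid($dData, 1, 4) = '0x0000FEFF' Or BinaryMid($dData, 1, 4) = '0xFFFE0000' ; UTF-32 BE / LE
            Return SetError(1, 0, "")
        Case BinaryMid($dData, 1, 2) = '0xFEFF' ; UTF-16 BE
            $iStartPos = 3
            $iConversion = $SB_UTF16BE
        Case BinaryMid($dData, 1, 2) = '0xFFFE' ; UTF-16 LE
            $iStartPos = 3
            $iConversion = $SB_UTF16LE
        Case BinaryMid($dData, 1, 3) = '0xEFBBBF' ; UTF-8
            $iStartPos = 4
            $iConversion = $SB_UTF8
    EndSelect
    Return BinaryToString(BinaryMid($dData, $iStartPos), $iConversion)
EndFunc   ;==>_BOM_ToString

; Example: UTF-16 LE BOM followed by the bytes of "83.0"
Local $sVersion = _BOM_ToString(Binary('0xFFFE380033002E003000'))
If @error Then
    ConsoleWrite("Unsupported encoding (UTF-32)" & @CRLF)
Else
    ConsoleWrite("Latest=" & $sVersion & @CRLF)
EndIf
```

Returning an error for UTF-32 keeps a wrong flag from silently producing garbage, at the cost of not handling an encoding that this kind of version file is unlikely to use anyway.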
