Jump to content
ShawnW

Need help with File Encoding

Recommended Posts

ShawnW

I have a script that takes a large excel file, pulls out and reorganizes certain information I need, and spits out a trimmed down csv file which I uses to upload the information on my website. Some of this information contains characters with accents or em dashes. By default it would create a csv file in ANSI which I then uploaded but had to tell my website import system it was windows-1252 in order for it to look correct.

This was all working fine except now I need to add in a non-breaking space and non-breaking hyphen into parts of my output. At first I tried using ChrW(0xA0) and ChrW(0x2011) as replacements. A quick test in the console looked correct, however opening the csv output in notepad++ showed the space correctly but a ? for the hyphen and the file was still encoded as ANSI. I tried to view it as UTF-8 instead but this just made the space appear as xAO and also other characters appeared that way like my em dashes appeared as x97 and another symbol as xA7 etc.

If I instead do a convert to UTF-8 from notepad++ then those problems go away except the hyphen still displays as ?. I then noticed on the page I linked for the non-breaking hyphen it lists the UTF-8 hex as 0xE2 0x80 0x91 (e28091). I was unsure how to enter this in autoit but several things i tried all failed to get the hyphen inserted.

I need a way to get both the space and hyphen added correctly as either ANSI or UTF-8, but if it is UTF-8 then I need a way to convert all of the other data I extracted from the excel file.

I've included a test excel file with a single line and test script to create a csv demonstrating the problem.

test.xlsx

test.au3

Share this post


Link to post
Share on other sites
Subz

What about:

#include <Array.au3>
#include <Excel.au3>

$excelFile = FileOpenDialog("Select test file", @DesktopDir, "Excel files (*.xls*)")

$oExcel = _Excel_Open(False, False, False, False, True)
$oWorkbook = _Excel_BookOpen($oExcel, $excelFile, True, False)
$table = _Excel_RangeRead($oWorkbook, Default, Default, 1, True)
_Excel_BookClose($oWorkbook)

$table[0][0] &= ". pp." & ChrW(0xA0) & $table[0][1] & ChrW(0x2014) & $table[0][2]
_ArrayColDelete($table, 2)
_ArrayColDelete($table, 1)

$sTable = _ArrayToString($table, ",")
$hFileOpen = FileOpen(@DesktopDir & "\test.csv", 258)
FileWrite($hFileOpen, $sTable)
FileClose($hFileOpen)

 

Share this post


Link to post
Share on other sites
ShawnW

Thanks for the suggestion.

The first problem there is that 0x2014 is the Em Dash that although present in the excel file data, is not what i need when I add the hyphen in that line. I need the shorter dash like on the keyboard except one that is treated as part of the word when wrapping. I'm thinking it is 0x2011 but I cannot get that to display. In researching the dashes I should honestly be using an En Dash (0x2013) for showing a range of numbers, but that breaks in wrapping, and I didn't get to set the requirements for this, I just have to come up with the solution.

Secondly, FileWrite on the array does add commas for the csv but that is all it does. It does not add quotes around cells that contain a comma already, and if it did, it wouldn't escape existing quotes either. I didn't have a line like this in my example because I did not consider a different way of creating the csv, but I do have lines that would require a more accurate csv creation.

Share this post


Link to post
Share on other sites
Subz

What happens if you use the following after Excel has created the file?  Not sure about the hyphen, if I get a chance will look into it further.

$hFileOpen = FileOpen(@DesktopDir & "\test.csv", 257)
FileClose($hFileOpen)

 

Share this post


Link to post
Share on other sites
ShawnW

That didn't seem to change the encoding. I did play with it some more though based on this idea. It seems neither that nor writing to the file in that method changed it to UTF8. I did find that if i read it, then open it in overwrite mode it and write it then it changes to UTF8.

$str = FileRead(@DesktopDir & "\test.csv")
$hFileOpen = FileOpen(@DesktopDir & "\test.csv",130)
FileWrite($hFileOpen, $str)
FileClose($hFileOpen)

But that still leaves the problem of the hyphen.

Share this post


Link to post
Share on other sites
Subz

Sorry was in a rush to get out the door this morning, have you tried uploading the file using ChrW(0x2011)?  If you open it in Excel the spreadsheet renders it correctly, in Notepad it will show special characters because not all fonts support ChrW(0x2011) for example Sans-Serif.

Share this post


Link to post
Share on other sites
jchd

I highly recommend DejaVu Sans Mono to be used in Notepad++ (and Scite as well). It covers a large range of Unicode and is particularly readable for programming and general text edit. Both mentionned codepoints show correctly, among many others.


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Share this post


Link to post
Share on other sites
ShawnW
15 hours ago, Subz said:

Sorry was in a rush to get out the door this morning, have you tried uploading the file using ChrW(0x2011)?  If you open it in Excel the spreadsheet renders it correctly, in Notepad it will show special characters because not all fonts support ChrW(0x2011) for example Sans-Serif.

If try and upload it still shows as '?'. Also, I don't get the same result after opening in Excel 2016. Running on test file and opening in Excel shows the '?' on both the UTF8 and ANSI versions.

I tried something simpler and copied the actual hyphen and just added it in string form. It looks correct in the scite editor, and if I output it to the console when running it shows correctly there as well. Yet after Excel writes it to the csv it breaks, and no matter where I open it, or what font I use, it shows as a '?'.

It seems the problem is in the way excel makes the csv, I may just do a more complex csv manual creation building on your 1st example. It seems the display problem i had with the hyphen after writing the file without excel was indeed the font problem you refer to. If I can't do it that way though I may just have to fight every OCD bone in my body and add it as the html code &#x2011; since it is only ever going to be displayed on a webpage that way.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Similar Content

    • yasha
      By yasha
      i want am trying to select a nimber to run a program and then select where to save the excel result at before hand
      the problem is that it does not save in the folder i want but the folder before any solutions
      #.................
      $sFolder = ""
          ; Create a constant variable in Local scope of the message to display in FileSelectFolder.
          Local Const $sMessage = "Select a folder"
          ; Display an open dialog to select a file.
          $sFileSelectFolder = FileSelectFolder($sMessage, $sFolder)
          If @error Then
              ; Display the error message.
              MsgBox($MB_SYSTEMMODAL, "", "No folder was selected.")
          Else
              ; Display the selected folder.
              MsgBox($MB_SYSTEMMODAL, "", "You chose the following folder:" & @CRLF & $sFileSelectFolder)
          EndIf
      .......
      ........
      $oExcel = ObjCreate("Excel.Application")                   ; Create an Excel Object
      $oExcel.Visible = 1                                        ; Let Excel show itself
      $oExcel.Workbooks.Open("J:\OPS\OPS_Share\Planners\2 - Weekly Reports\Auto download\"& $YY & $MM & $DD & " ORDER.xls",0)
      $oExcel.ActiveWorkbook.Saveas ( $sFileSelectFolder,""& $YY & $MM & $DD & " ORDER.xlsx", 1)
      $oExcel.ActiveWorkBook.Close
      $oExcel.Quit
      i only want to save it as ddmmyy order inside documents but it saves in libraries as documents ddmmyy order.
    • PiyushJhawar
      By PiyushJhawar
      I am part of QA team of an analytics application. We support third party tools like Excel , Tableue .
      I have to write automation script that connect Excel to our analytics application. In short i want below to automate
      > Open Excel
      > Click on "Data" option available in header and then click on "From Other Services " then click on "From Analysis Services"
      > It will open pop up and then need to write username password there.
      I am new in this tool . Can any one please provide me link of any document that help me to create above script
    • nooneclose
      By nooneclose
      How to use _Excel_RangeSort to sort my excel file by three different headers Column A1, B1, and C1 have headers on which I want to sort by. The headers on which I want to sort are department, employee type, and name.
      I still really new to AutoIt so I do not actually know how to properly start this line or lines of code, to be honest. The example code is the best I can do.
      _Excel_RangeSort($OpenWorkbook, Default, "A1:C1", "1:1", $xlDescending, Default, $xlYes, Default, $xlSortRows) I just need to sort by those three headers in that order of department, employee type, and name, plus in descending order.
       
      any and all help would be greatly appreciated.  Thank you!
    • nooneclose
      By nooneclose
      My program has to first search for names in Column D that do not match up with column C. I got that search to work using arrays. It was slow and I could not figure out how to delete them so I just manually put coded the names that do not belong. I found their cell location but I do not know how to store that location and delete it.
      This is what I have so far.
      Local $NameToDelete1[6]  = _Excel_RangeFind($OpenWorkbook, "Smith, Bill") _ArrayDisplay($NameToDelete1, "Excel UDF: _Excel_RangeFind Example 1", "", 0, "|", "Sheet|Name|Cell|Value|Formula|Comment") _Excel_RangeDelete($OpenWorkbook.ActiveSheet, $NameToDelete1[2], $xlShiftUp)  
      Please help, I wanted to have this program done yesterday but I did not see this problem until yesterday. 
    • nooneclose
      By nooneclose
      How do I properly convert this to Autoit? This is a VBA macro that I recorded in Excel.
       ActiveSheet.Outline.ShowLevels RowLevels:=2 I need this to close my subtotal once it is finished. 
      any help will be greatly appreciated. 
×