Jump to content
ShawnW

Need help with File Encoding

Recommended Posts

I have a script that takes a large excel file, pulls out and reorganizes certain information I need, and spits out a trimmed down csv file which I uses to upload the information on my website. Some of this information contains characters with accents or em dashes. By default it would create a csv file in ANSI which I then uploaded but had to tell my website import system it was windows-1252 in order for it to look correct.

This was all working fine except now I need to add in a non-breaking space and non-breaking hyphen into parts of my output. At first I tried using ChrW(0xA0) and ChrW(0x2011) as replacements. A quick test in the console looked correct, however opening the csv output in notepad++ showed the space correctly but a ? for the hyphen and the file was still encoded as ANSI. I tried to view it as UTF-8 instead but this just made the space appear as xAO and also other characters appeared that way like my em dashes appeared as x97 and another symbol as xA7 etc.

If I instead do a convert to UTF-8 from notepad++ then those problems go away except the hyphen still displays as ?. I then noticed on the page I linked for the non-breaking hyphen it lists the UTF-8 hex as 0xE2 0x80 0x91 (e28091). I was unsure how to enter this in autoit but several things i tried all failed to get the hyphen inserted.

I need a way to get both the space and hyphen added correctly as either ANSI or UTF-8, but if it is UTF-8 then I need a way to convert all of the other data I extracted from the excel file.

I've included a test excel file with a single line and test script to create a csv demonstrating the problem.

test.xlsx

test.au3

Share this post


Link to post
Share on other sites

What about:

#include <Array.au3>
#include <Excel.au3>

$excelFile = FileOpenDialog("Select test file", @DesktopDir, "Excel files (*.xls*)")

$oExcel = _Excel_Open(False, False, False, False, True)
$oWorkbook = _Excel_BookOpen($oExcel, $excelFile, True, False)
$table = _Excel_RangeRead($oWorkbook, Default, Default, 1, True)
_Excel_BookClose($oWorkbook)

$table[0][0] &= ". pp." & ChrW(0xA0) & $table[0][1] & ChrW(0x2014) & $table[0][2]
_ArrayColDelete($table, 2)
_ArrayColDelete($table, 1)

$sTable = _ArrayToString($table, ",")
$hFileOpen = FileOpen(@DesktopDir & "\test.csv", 258)
FileWrite($hFileOpen, $sTable)
FileClose($hFileOpen)

 

Share this post


Link to post
Share on other sites

Thanks for the suggestion.

The first problem there is that 0x2014 is the Em Dash that although present in the excel file data, is not what i need when I add the hyphen in that line. I need the shorter dash like on the keyboard except one that is treated as part of the word when wrapping. I'm thinking it is 0x2011 but I cannot get that to display. In researching the dashes I should honestly be using an En Dash (0x2013) for showing a range of numbers, but that breaks in wrapping, and I didn't get to set the requirements for this, I just have to come up with the solution.

Secondly, FileWrite on the array does add commas for the csv but that is all it does. It does not add quotes around cells that contain a comma already, and if it did, it wouldn't escape existing quotes either. I didn't have a line like this in my example because I did not consider a different way of creating the csv, but I do have lines that would require a more accurate csv creation.

Share this post


Link to post
Share on other sites

That didn't seem to change the encoding. I did play with it some more though based on this idea. It seems neither that nor writing to the file in that method changed it to UTF8. I did find that if i read it, then open it in overwrite mode it and write it then it changes to UTF8.

$str = FileRead(@DesktopDir & "\test.csv")
$hFileOpen = FileOpen(@DesktopDir & "\test.csv",130)
FileWrite($hFileOpen, $str)
FileClose($hFileOpen)

But that still leaves the problem of the hyphen.

Share this post


Link to post
Share on other sites

Sorry was in a rush to get out the door this morning, have you tried uploading the file using ChrW(0x2011)?  If you open it in Excel the spreadsheet renders it correctly, in Notepad it will show special characters because not all fonts support ChrW(0x2011) for example Sans-Serif.

Share this post


Link to post
Share on other sites

I highly recommend DejaVu Sans Mono to be used in Notepad++ (and Scite as well). It covers a large range of Unicode and is particularly readable for programming and general text edit. Both mentionned codepoints show correctly, among many others.


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Share this post


Link to post
Share on other sites
15 hours ago, Subz said:

Sorry was in a rush to get out the door this morning, have you tried uploading the file using ChrW(0x2011)?  If you open it in Excel the spreadsheet renders it correctly, in Notepad it will show special characters because not all fonts support ChrW(0x2011) for example Sans-Serif.

If try and upload it still shows as '?'. Also, I don't get the same result after opening in Excel 2016. Running on test file and opening in Excel shows the '?' on both the UTF8 and ANSI versions.

I tried something simpler and copied the actual hyphen and just added it in string form. It looks correct in the scite editor, and if I output it to the console when running it shows correctly there as well. Yet after Excel writes it to the csv it breaks, and no matter where I open it, or what font I use, it shows as a '?'.

It seems the problem is in the way excel makes the csv, I may just do a more complex csv manual creation building on your 1st example. It seems the display problem i had with the hyphen after writing the file without excel was indeed the font problem you refer to. If I can't do it that way though I may just have to fight every OCD bone in my body and add it as the html code &#x2011; since it is only ever going to be displayed on a webpage that way.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Similar Content

    • By Rskm
      Hi, I am using excel as input media for my program. The excel file (i tried with .xls, .xlsx and .xlsm format) has inputs which the autoit script reads during the run and performs few calculations. Some times (not always), after the run, when i try to open the excel file manually, the file doesnt open at all in excel. see the screenshot attached. However, if the execute the autoit script, the scripts still reads the existing data from that excel and performs the calcs. I copied the excel file to another computer and there too, it doesnt open.  So, after this, i cannot edit the excel forever (if i need to change any inputs). It is only this particular file that got affected. other excel files works normal.  What could be the problem here.  please help as this is a new challenge for me during my program development. 

    • By Taxyo
      Hi,
       
      I've been trying to automate modification of an excel file and the last thing I am stuck on is deleting all the rows where the value of Column 13 is 0. 
      I believe the error is due to me not fully understanding the syntax so this is where I'm stuck: 
       
      Func Hotkey2() Global $aUsedRange = _Excel_RangeRead($oWorkbook, 1) _ArrayDisplay($aUsedRange) For $iRow = UBound($aUsedRange) - 1 to 3 Step -1 If $aUsedRange[$iRow][13] = 0 Then _Excel_RangeDelete($oWorkbook.Worksheets(1), $aUsedRange[$iRow] & ":" & $aUsedRange[$iRow], default, 1) If @error Then Exit MsgBox($MB_SYSTEMMODAL, "Excel UDF: _Excel_RangeDelete Example 2", "Error deleting rows." & @CRLF & "@error = " & @error & ", @extended = " & @extended) EndIf Next EndFunc  
      While my script properly locates the row which contains value 0 in Column 13, I am not sure how to set it to the corresponding row in the excel workbook?  My above experiment gives me $vRange error and I've been toying around with it to no avail. The only way I get the Script to delete a row is by actually specifying "4:4" or "6:8" etc. 
      Where am I going wrong?
       
      Thanks! 
    • By Most
      #include <Array.au3> #include <Excel.au3> #include <MsgBoxConstants.au3> ; Create application object and open an example workbook Local $oExcel = _Excel_Open() If @error Then Exit MsgBox($MB_SYSTEMMODAL, "Excel UDF: _Excel_RangeRead Example", "Error creating the Excel application object." & @CRLF & "@error = " & @error & ", @extended = " & @extended) Local $oWorkbook = _Excel_BookOpen($oExcel, @ScriptDir & "\trans.xlsx") If @error Then MsgBox($MB_SYSTEMMODAL, "Excel UDF: _Excel_RangeRead Example", "Error opening workbook '" & @ScriptDir & "\trans.xlsx'." & @CRLF & "@error = " & @error & ", @extended = " & @extended) _Excel_Close($oExcel) Exit EndIf ; ***************************************************************************** ; Read data from a single cell on the active sheet of the specified workbook ; ***************************************************************************** Local $sResult = _Excel_RangeRead($oWorkbook, Default, "A1") If @error Then Exit MsgBox($MB_SYSTEMMODAL, "Excel UDF: _Excel_RangeRead Example 1", "Error reading from workbook." & @CRLF & "@error = " & @error & ", @extended = " & @extended) MsgBox($MB_SYSTEMMODAL, "Excel UDF: _Excel_RangeRead Example 1", "Data successfully read." & @CRLF & "Value of cell A1: " & $sResult) Hi, all.
      Ok, here is the deal. I have simple excel file called trans.xlsx. It's located in the directory of script. In general i don't care where to store it. 
      What i do need is to open excel file and copy one by one numbers from cells. I've tried different ways, examples. But i only get error, says: error = 3, extended = 1. I saw different posts from different years. I even tried to use simple example from manual file. But always get error.

      In general my goal get numbers one by one and post it to let's say search filed in my PC one by one. Or to notepad (but one by one, in kind of loop). 
      I've learned how to copy or show in message box some info from other apps. But with excel i'm stuck. 

      I'm able to open needed window based on "title" of excel. But i don't succeed of copying info from cells. 

      Would be appreciate for any help. 
      So, in this code i'm trying at least to read from cell A1. Doesn't matter what Sheet. 

      I use Windows 10, Excel for Office 365. 
      Thank you in advance. 
    • By 3Jake
      Hello Smart People!
      I have a multi-column .CSV that I would like to draw information from, in order to populate (send) that info to some fields in an Oracle form.  This would be for account-creation in Oracle.

      I have the basic script to navigate (tab) through the fields in the Oracle form, but I am SUPER-new to AutoIT and don’t quite see how to get the script to import and use variables from a .CSV

      For example:

      My script just waits for the page to be active, then enters data and tabs between fields like this:

      ; Wait for the window to be active WinWaitActive("Oracle is Cool - E-Business is the Best") ; User Name Send("jsmith") Send("{TAB}") ; Password Send("Password1") Send("{TAB}") Send("Password1") Send("{TAB}") ; Description Send("John Smith") Send("{TAB}") (etc. etc. etc.…)

      The winning solution would take one row at a time, feed it into fields on the Oracle form, save, and then start in on the next row

      It looks like I’d need a combination of “FileReadToArray” ( or“FileReadLine”?) and “StringSplit” to loop through the values and set them to variables, which would then replace my current "hard-coded" values to be typed them into the form?

      I found the following example in the “Help” snippets for stringsplit()… this seems like it’s in the ballpark but I’m having some trouble wrapping this around what I’ve put together, since each entry in the array would have multiple elements per line instead of just “Day”:

      Func Example() Local $aDays = StringSplit("Mon,Tues,Wed,Thur,Fri,Sat,Sun", ",") ;Split the string of days using the delimiter "," and the default flag value. #cs The array returned will contain the following values: $aDays[1] = "Mon" $aDays[2] = "Tues" $aDays[3] = "Wed" ... $aDays[7] = "Sun" #ce For $i = 1 To $aDays[0] ; Loop through the array returned by StringSplit to display the individual values. MsgBox($MB_SYSTEMMODAL, "", "$aDays[" & $i & "] - " & $aDays[$i]) Next EndFunc ;==>Example

      This post also seems like a similar example, but I don’t need it to be so selective RE: one specific column:
      https://www.autoitscript.com/forum/topic/166261-how-to-read-csv-specific-row-and-columns


      Any advice would be appreciated!  Thanks for taking a minute to look!  
    • By VinMe
      Dear all, i am unable to open a xml file to excel in the "xml table format" Please help me out in where i am missing
      Local $strFileToOpen = _WinAPI_OpenFileDlg('Select xml file', @WorkingDir, 'All Files(*.*)', 1, '', '', BitOR($OFN_PATHMUSTEXIST, $OFN_FILEMUSTEXIST, $OFN_HIDEREADONLY)) Global $xlXmlLoadImportToList = 2 ; Places the contents of the XML data file in an XML table $oExcel = _Excel_Open() $oWorkbook1=$oExcel.Workbooks.OpenXML($strFileToOpen, "", $xlXmlLoadImportToList) If $strFileToOpen <> False Then     Local $oWorkbook1 = _Excel_BookOpen($oExcel, $strFileToOpen) EndIf Error i am getting is:
      ......\81e_Compare_v1.au3" (46) : ==> The requested action with this object has failed.:
      $oWorkbook1=$oExcel.Workbooks.OpenXML($strFileToOpen, "", $xlXmlLoadImportToList)
      $oWorkbook1=$oExcel.Workbooks^ ERROR
      >Exit code: 1    Time: 7.338
×
×
  • Create New...