Jump to content
ShawnW

Need help with File Encoding

Recommended Posts

I have a script that takes a large excel file, pulls out and reorganizes certain information I need, and spits out a trimmed down csv file which I uses to upload the information on my website. Some of this information contains characters with accents or em dashes. By default it would create a csv file in ANSI which I then uploaded but had to tell my website import system it was windows-1252 in order for it to look correct.

This was all working fine except now I need to add in a non-breaking space and non-breaking hyphen into parts of my output. At first I tried using ChrW(0xA0) and ChrW(0x2011) as replacements. A quick test in the console looked correct, however opening the csv output in notepad++ showed the space correctly but a ? for the hyphen and the file was still encoded as ANSI. I tried to view it as UTF-8 instead but this just made the space appear as xAO and also other characters appeared that way like my em dashes appeared as x97 and another symbol as xA7 etc.

If I instead do a convert to UTF-8 from notepad++ then those problems go away except the hyphen still displays as ?. I then noticed on the page I linked for the non-breaking hyphen it lists the UTF-8 hex as 0xE2 0x80 0x91 (e28091). I was unsure how to enter this in autoit but several things i tried all failed to get the hyphen inserted.

I need a way to get both the space and hyphen added correctly as either ANSI or UTF-8, but if it is UTF-8 then I need a way to convert all of the other data I extracted from the excel file.

I've included a test excel file with a single line and test script to create a csv demonstrating the problem.

test.xlsx

test.au3

Share this post


Link to post
Share on other sites

What about:

#include <Array.au3>
#include <Excel.au3>

$excelFile = FileOpenDialog("Select test file", @DesktopDir, "Excel files (*.xls*)")

$oExcel = _Excel_Open(False, False, False, False, True)
$oWorkbook = _Excel_BookOpen($oExcel, $excelFile, True, False)
$table = _Excel_RangeRead($oWorkbook, Default, Default, 1, True)
_Excel_BookClose($oWorkbook)

$table[0][0] &= ". pp." & ChrW(0xA0) & $table[0][1] & ChrW(0x2014) & $table[0][2]
_ArrayColDelete($table, 2)
_ArrayColDelete($table, 1)

$sTable = _ArrayToString($table, ",")
$hFileOpen = FileOpen(@DesktopDir & "\test.csv", 258)
FileWrite($hFileOpen, $sTable)
FileClose($hFileOpen)

 

Share this post


Link to post
Share on other sites

Thanks for the suggestion.

The first problem there is that 0x2014 is the Em Dash that although present in the excel file data, is not what i need when I add the hyphen in that line. I need the shorter dash like on the keyboard except one that is treated as part of the word when wrapping. I'm thinking it is 0x2011 but I cannot get that to display. In researching the dashes I should honestly be using an En Dash (0x2013) for showing a range of numbers, but that breaks in wrapping, and I didn't get to set the requirements for this, I just have to come up with the solution.

Secondly, FileWrite on the array does add commas for the csv but that is all it does. It does not add quotes around cells that contain a comma already, and if it did, it wouldn't escape existing quotes either. I didn't have a line like this in my example because I did not consider a different way of creating the csv, but I do have lines that would require a more accurate csv creation.

Share this post


Link to post
Share on other sites

That didn't seem to change the encoding. I did play with it some more though based on this idea. It seems neither that nor writing to the file in that method changed it to UTF8. I did find that if i read it, then open it in overwrite mode it and write it then it changes to UTF8.

$str = FileRead(@DesktopDir & "\test.csv")
$hFileOpen = FileOpen(@DesktopDir & "\test.csv",130)
FileWrite($hFileOpen, $str)
FileClose($hFileOpen)

But that still leaves the problem of the hyphen.

Share this post


Link to post
Share on other sites

Sorry was in a rush to get out the door this morning, have you tried uploading the file using ChrW(0x2011)?  If you open it in Excel the spreadsheet renders it correctly, in Notepad it will show special characters because not all fonts support ChrW(0x2011) for example Sans-Serif.

Share this post


Link to post
Share on other sites

I highly recommend DejaVu Sans Mono to be used in Notepad++ (and Scite as well). It covers a large range of Unicode and is particularly readable for programming and general text edit. Both mentionned codepoints show correctly, among many others.


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Share this post


Link to post
Share on other sites
15 hours ago, Subz said:

Sorry was in a rush to get out the door this morning, have you tried uploading the file using ChrW(0x2011)?  If you open it in Excel the spreadsheet renders it correctly, in Notepad it will show special characters because not all fonts support ChrW(0x2011) for example Sans-Serif.

If try and upload it still shows as '?'. Also, I don't get the same result after opening in Excel 2016. Running on test file and opening in Excel shows the '?' on both the UTF8 and ANSI versions.

I tried something simpler and copied the actual hyphen and just added it in string form. It looks correct in the scite editor, and if I output it to the console when running it shows correctly there as well. Yet after Excel writes it to the csv it breaks, and no matter where I open it, or what font I use, it shows as a '?'.

It seems the problem is in the way excel makes the csv, I may just do a more complex csv manual creation building on your 1st example. It seems the display problem i had with the hyphen after writing the file without excel was indeed the font problem you refer to. If I can't do it that way though I may just have to fight every OCD bone in my body and add it as the html code &#x2011; since it is only ever going to be displayed on a webpage that way.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Similar Content

    • By TheDcoder
      Hello everyone, I am working on a project which requires reading a few values from Excel, the catch is that I need it to be very fast... unfortunatley I found out that read operations using the supplied Excel UDF are very slow, more than 150 ms for each operation on average
      Here is my testing setup that I made:
      #include <Excel.au3> #include <MsgBoxConstants.au3> Global $iTotalTime = 0 Test() Func Test() Local $oExcel = _Excel_Open() Local $oBook = _Excel_BookAttach("Test.xlsx", "FileName", $oExcel) Local $sSheet = "Sheet1" If @error Then Return MsgBox($MB_ICONERROR, "Excel Failed", "Failed to attach to Excel") Local $iNum For $iRow = 1 To 6 Time() Local $iNum = Number(_Excel_RangeRead($oBook, $sSheet, "A" & $iRow)) If ($iNum = 1) Then ConsoleWrite("Row " & $iRow & " is 1 and value of column B is " & _Excel_RangeRead($oBook, $sSheet, "B" & $iRow)) Else ConsoleWrite("Row " & $iRow & " is not 1") EndIf ConsoleWrite(". Reading took: ") Time() Next ConsoleWrite("The whole operation took " & $iTotalTime & " milliseconds." & @CRLF) EndFunc Func Time() Local Static $hTimer Local Static $bRunning = False If $bRunning Then Local $iTime = Round(TimerDiff($hTimer), 2) $iTotalTime += $iTime ConsoleWrite($iTime & @CRLF) Else $hTimer = TimerInit() EndIf $bRunning = Not $bRunning EndFunc And Test.xlsx in CSV format:
      1,-1 -1,1 1,-1 1,1 -1,-1 1,1 Here is the actual xlsx but it should expire in a week: https://we.tl/t-EVkxGp1kc6
      And finally output from my script:
      Row 1 is 1 and value of column B is -1. Reading took: 276.06 Row 2 is not 1. Reading took: 163.36 Row 3 is 1 and value of column B is -1. Reading took: 302.58 Row 4 is 1 and value of column B is 1. Reading took: 294.65 Row 5 is not 1. Reading took: 152.33 Row 6 is 1 and value of column B is 1. Reading took: 284.92 The whole operation took 1473.9 milliseconds.  
      Taking ~1.5 seconds for reading 6 rows of data is bad for my script, which needs to run as fast as possible . It would be nice if I can bring this down to 100 ms somehow, I am not very experienced working with MS office so I thought about asking you folks for help and advice on how I can optimize my script to squeeze out every bit of performance that I can get from this script
       
      Thanks for the help in advance!
    • By SlackerAl
      I have been using some AutoIt scripts to manipulate Excel for a few weeks now. Today they stopped working. I have rebooted the PC and I'm not aware of any other significant changes. I can start and use Excel conventionally without a problem, but any attempt to create an excel object from AutoIt fails. E.g
      #include <MsgBoxConstants.au3> #include <Excel.au3> Local $oExcel = _Excel_Open() If @error Then Exit MsgBox($MB_SYSTEMMODAL, "Excel UDF: _Excel_Open Example 1", "Error creating the Excel application object." & @CRLF & "@error = " & @error & ", @extended = " & @extended) Produces:
      @error = 1
      @extended = -2147221005
      I can't find that COM error listed anywhere... Anyone have any ideas?
    • By Eggie6
      Hi,
      I'm trying to write an formula to the excel, but it actually does not write anything into the cell, any ideas?
      $formula1="=MID(B"&$utakmice&",SEARCH("-",B"&$utakmice&")-1,1)" _Excel_RangeWrite($oWorkbook, $oWorkbook.Activesheet, $formula1, "C"&$smth)  
    • By XinYoung
      Hello all,
      Preface: Column A is full of courses. Column B is full of usernames. If course (A1) exists, we check to see if username (B1) is enrolled. If user is found, the user is un-enrolled. Then Loop.
      I am working on a script that loops through an Excel file and pastes the content from A1 into a textbox in IE, does some stuff, then pastes the content from B1 into a different textbox. Then it loops around until all used rows in columns A and B have been accounted for.
      For some reason, column A loops properly but column B doesn't. B1 is pasted over and over again. So, as it loops, B1 is constantly being pasted, first accompanying A1's loop, then A2, and so on.
      The OpenExcel() func opens the Excel file the user specifies in an earlier function. It's supposed to gather the entire used range of columns A and B. The SearchCourse() func only uses column A, pasting its content into a Search tool in IE. This seems to be working fine. SearchResult() puts "Course Not Found" into column C if the search fails. If the search is successful, however, we move onto... EnterCourse(). This simply gets us to the place where column B's content comes into play. UnenrollNow(). Here, we paste the variable $_userName into a textbox. I don't know why it's always B1  Func OpenExcel() If Not WinExists($hWnd) Then MsgBox(16, "Window closed", "The Internet Explorer window was closed. Please start over.") _Exit() Else $oExcel = _Excel_Open() Global $oWorkbook = _Excel_BookOpen($oExcel, $ChosenFileName, Default, True, True) $oExcel.Sheets("CopyCourses").Activate ;~ Get all used cells in columns A and B $aSearchItems = _Excel_RangeRead($oWorkbook, 1, $oWorkbook.Sheets("CopyCourses").Usedrange.Columns("A:B")) ;~ Create the $aSearchResult array ReDim $aSearchResult[UBound($aSearchItems)] ;~ Loop through the array starting at 0 until the end of the array which is (Ubound($aSearchItems) - 1) For $i = 0 To UBound($aSearchItems) - 1 ;~ Column 0 ConsoleWrite($aSearchItems[$i][0] & @CRLF) ;~ Column 1 ConsoleWrite($aSearchItems[$i][1] & @CRLF) $aSearchResult[$i] = SearchCourse($aSearchItems[$i][0], $aSearchItems[$i][1]) Next _Excel_RangeWrite($oWorkbook, Default, $aSearchResult, "C1") ;~ When the loop is complete, run the Finished function. Finished() EndIf EndFunc ;==>OpenExcel Func SearchCourse($_sSearchResult, $_userName) If Not WinExists($hWnd) Then MsgBox(16, "Window closed", "The Internet Explorer window was closed. Please start over.") _Exit() Else _IENavigate($oIE, $urlBBCourseSearch) _IELoadWait($oIE) Local $oForm = _IEGetObjByName($oIE, "courseManagerFormSearch") Local $oSearchString = _IEFormElementGetObjByName($oForm, "courseInfoSearchText") _IEFormElementSetValue($oSearchString, $_sSearchResult) _IEFormSubmit($oForm) _IELoadWait($oIE) Local $oBBTable = _IETableGetCollection($oIE, 2) $aBBTableData = _IETableWriteToArray($oBBTable) Return SearchResult() EndIf EndFunc ;==>SearchCourse Func SearchResult() If Not WinExists($hWnd) Then MsgBox(16, "Window closed", "The Internet Explorer window was closed. Please start over.") _Exit() Else Local $sResult $iSearchIndex = _ArraySearch($aBBTableData, "Course ID", 0, 0, 0, 1, 1, 0) If $iSearchIndex = -1 Then $sResult = "Course Not Found" _Excel_RangeWrite($oWorkbook, Default, $aSearchResult, "C1") Else EnterCourse() $sResult = "UnEnrolled!" _Excel_RangeWrite($oWorkbook, Default, $aSearchResult, "C1") EndIf Return $sResult EndIf EndFunc ;==>SearchResult Func EnterCourse() If Not WinExists($hWnd) Then MsgBox(16, "Window closed", "The Internet Explorer window was closed. Please start over.") _Exit() Else Local $clickFail = "DATE CREATED" _IELinkClickByIndex($oIE, 34) _IELoadWait($oIE) $sourceCode = _IEBodyReadHTML($oIE) If StringInStr($sourceCode, $clickFail) <> 0 Then _IELinkClickByIndex($oIE, 35) _IELoadWait($oIE) EndIf $oLink = _IEGetObjById($oIE, "controlpanel.users.and.groups_groupExpanderLink") _IEAction($oLink, "click") Sleep(500) _IELinkClickByText($oIE, "Users") _IELoadWait($oIE) $aSearchResult[$i] = UnenrollNow($aSearchItems[$i][0], $aSearchItems[$i][1]) EndIf EndFunc ;==>EnterCourse Func UnenrollNow($_sourceCourseId, $_userName) If Not WinExists($hWnd) Then MsgBox(16, "Window closed", "The Internet Explorer window was closed. Please start over.") _Exit() Else Local $UserError = "No users found" Local $sResult $criteriaUsername = _IEGetObjById($oIE, "userInfoSearchKeyString") _IEFormElementOptionSelect($criteriaUsername, 0, 1, 'byIndex') $criteriaUsername = _IEGetObjById($oIE, "userInfoSearchOperatorString") _IEFormElementOptionSelect($criteriaUsername, 0, 1, 'byIndex') ;Paste whats copied from column B into the Username text box. Local $oForm = _IEGetObjByName($oIE, "userManagerSearchForm") Local $oSearchString = _IEFormElementGetObjByName($oForm, "userInfoSearchText") ;PROBLEM HERE... _IEFormElementSetValue($oSearchString, $_userName) ;^^^^^^ WHY IS $_userName ALWAYS B1 ??? Sleep(1000) _IEFormSubmit($oForm) _IELoadWait($oIE) $sourceCode = _IEBodyReadHTML($oIE) If StringInStr($sourceCode, $UserError) <> 0 Then $sResult = "User Not Found" _Excel_RangeWrite($oWorkbook, Default, $aSearchResult, "C1") Return $sResult Else MsgBox(0, "Unenrollment READY!", "We're ready to unenroll foreal") $sResult = "DUN" _Excel_RangeWrite($oWorkbook, Default, $aSearchResult, "C1") Return $sResult EndIf EndIf EndFunc ;==>UnenrollNow Please let me know if any further information is needed.
      If you see other problems or redundancies in my code, please let me know.
      Thank you!
       
    • By SlackerAl
      I have an issue when starting Excel with the following code
      #include <Excel.au3> #include <GUIConstantsEx.au3> Opt("GUIOnEventMode", 1) GUICreate("Excel Test", 600, 440) GUISetOnEvent($GUI_EVENT_CLOSE, "MenuExit") GUISetState(@SW_SHOW) ; Create application object Local $oExcel = _Excel_Open() If @error Then Exit MsgBox(0, "Excel UDF: _Excel_RangeFind Example", "Error creating the Excel application object." & @CRLF & "@error = " & @error & ", @extended = " & @extended) ; sit here forever with an option to react every 10ms While 1 Sleep(10) WEnd Exit Func MenuExit() GUIDelete() Exit EndFunc If the Excel is a standard install, everything is OK. If Excel has the Kutools add-in (https://www.extendoffice.com/product/kutools-for-excel.html) installed and active the excel process runs (and is visible in task manager until killed), but it never displays. Disabling the add-in restores normal functionality.
      If I add some additional code to interact with the excel application then still nothing happens if the add-in is active. However, if Excel is started first and then the AutoIt code is run, it is able to interact with the Excel session as normal.
      Summary: The Excel add-in Kutools prevents excel being started with the _Excel_Open() command from AutoIt. Any AutoIt side work-arounds for this?
×
×
  • Create New...