I have a script that takes a large Excel file, pulls out and reorganizes the information I need, and writes a trimmed-down CSV file which I use to upload that information to my website. Some of this information contains accented characters or em dashes. By default the script created the CSV file as ANSI; I uploaded that and had to tell my website's import system it was windows-1252 for it to display correctly.
This was all working fine, except now I need to add a non-breaking space and a non-breaking hyphen to parts of my output. At first I tried using ChrW(0xA0) and ChrW(0x2011) as replacements. A quick test in the console looked correct; however, opening the CSV output in Notepad++ showed the space correctly but a ? for the hyphen, and the file was still encoded as ANSI. I tried viewing it as UTF-8 instead, but that just made the space appear as xA0, and other characters broke the same way: my em dashes appeared as x97, another symbol as xA7, etc.
If I instead do a convert to UTF-8 from Notepad++, those problems go away, except the hyphen still displays as ?. I then noticed that the page I linked for the non-breaking hyphen lists the UTF-8 hex as 0xE2 0x80 0x91 (e28091). I was unsure how to enter this in AutoIt, and several things I tried all failed to get the hyphen inserted.
I need a way to get both the space and the hyphen added correctly as either ANSI or UTF-8, but if it is UTF-8 then I also need a way to convert all of the other data I extracted from the Excel file.
I've included a test Excel file with a single line and a test script that creates a CSV demonstrating the problem.
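One possible direction, offered as a minimal sketch and not as a confirmed fix: windows-1252 has no code point for the non-breaking hyphen (U+2011), so an ANSI file cannot hold it and a ? is written instead. AutoIt strings are Unicode internally, so opening the output file in UTF-8 mode should encode the hyphen, the space, and the data pulled from Excel in one go. The file name and sample text below are made up for illustration, and the website import would then presumably have to be told the file is UTF-8.

```autoit
#include <FileConstants.au3>

; Hypothetical sample values, only for illustration.
Local $sNbsp = ChrW(0x00A0)     ; non-breaking space
Local $sNbHyphen = ChrW(0x2011) ; non-breaking hyphen (not available in windows-1252)
Local $sLine = "10" & $sNbsp & "mm,self" & $sNbHyphen & "sealing"

; $FO_UTF8 writes UTF-8 with a BOM; $FO_UTF8_NOBOM would omit the BOM.
Local $hFile = FileOpen(@ScriptDir & "\test_utf8.csv", $FO_OVERWRITE + $FO_UTF8)
If $hFile = -1 Then Exit MsgBox(16, "Error", "Could not open the output file.")

; Everything written through this handle is encoded as UTF-8,
; so the hyphen becomes the bytes E2 80 91 in the file.
FileWriteLine($hFile, $sLine)
FileClose($hFile)
```

If the existing script writes by file name instead of by handle, the same idea would mean opening a handle once with FileOpen and passing it to every FileWrite or FileWriteLine call.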
Good morning guys
I'd like to know if there is a way to convert a PDF to CSV or, failing that, to TXT, so that I can read from it like a database...
I have a PDF, and I think this could be done with AutoIt (I didn't search the forum much), but I'd like to work with the data Excel-style...
Does anyone know a good program that converts PDF to CSV?
PS: the PDF file is 5 MB, and it contains 439 pages...
Thanks everyone for the help
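As a rough sketch of one way this could go, assuming the pdftotext command-line tool (shipped with Xpdf and Poppler) is installed and on the PATH: AutoIt can run it to produce a TXT file and then read that file line by line. The file names are placeholders, and turning the text lines into CSV columns depends entirely on how the PDF is laid out, so that step is left out.

```autoit
#include <File.au3>

; Placeholder paths, adjust to the real files.
Local $sPdf = @ScriptDir & "\input.pdf"
Local $sTxt = @ScriptDir & "\output.txt"

; -layout asks pdftotext to keep the physical layout, which helps with tabular data.
Local $iExit = RunWait('pdftotext -layout "' & $sPdf & '" "' & $sTxt & '"', @ScriptDir, @SW_HIDE)
If @error Or $iExit <> 0 Then Exit MsgBox(16, "Error", "pdftotext could not be run or reported a failure.")

; Read the extracted text into an array; element [0] holds the line count.
Local $aLines
If Not _FileReadToArray($sTxt, $aLines) Then Exit MsgBox(16, "Error", "Could not read " & $sTxt)
ConsoleWrite("Lines read: " & $aLines[0] & @CRLF)
```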
I'd like to write a small tool that would let me update a certain field in our Active Directory from a comma-separated CSV file composed like this:
This file, automatically generated, can hold more than 10k lines.
Thus, I need column A to be in one variable, column B in a second one and column C in a third one.
This is the part I'm really missing, as updating the AD is fairly easy once the three variables are populated.
I see things like this:
Here is my attempt so far:

    #include <File.au3>
    #include <Array.au3>

    Global $csv_file = @DesktopDir & "\Book1.csv"
    Global $aRecords

    If Not _FileReadToArray($csv_file, $aRecords) Then
        MsgBox(4096, "Error", " Error reading log to Array error:" & @error)
        Exit
    EndIf

    For $x = 1 To $aRecords[0] ; $aRecords[0] holds the number of lines read
        MsgBox(0, 'Record:' & $x, $aRecords[$x]) ; Shows the line that was read from file
        $csv_line_values = StringSplit($aRecords[$x], ",", 1) ; Splits the line into 2 or more variables and puts them in an array
        ; _ArrayDisplay($csv_line_values) ; Shows what's in the array you just created.
        ; $csv_line_values[0] holds the number of elements in array
        ; $csv_line_values[1] holds the value
        ; $csv_line_values[2] holds the value
        ; etc
        MsgBox(0, 0, $csv_line_values[1])
    Next

Any help on this, please?
Thanks in advance
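A minimal sketch of one way to get columns A, B and C into three variables per row, assuming every line has exactly three comma-separated fields and no field contains a quoted comma. It relies on the delimiter parameter of _FileReadToArray (available in recent AutoIt versions), which returns a 2D array; the variable names are illustrative only, and the AD update itself is left as a comment.

```autoit
#include <File.au3>

Local $sCsvFile = @DesktopDir & "\Book1.csv"
Local $aData

; With a delimiter, _FileReadToArray returns a 2D array.
; $FRTA_COUNT puts the row count in [0][0] and the column count in [0][1].
If Not _FileReadToArray($sCsvFile, $aData, $FRTA_COUNT, ",") Then
    MsgBox(16, "Error", "Could not read the CSV, @error = " & @error)
    Exit
EndIf

Local $sColA, $sColB, $sColC
For $i = 1 To $aData[0][0]
    $sColA = $aData[$i][0]
    $sColB = $aData[$i][1]
    $sColC = $aData[$i][2]
    ; The Active Directory update for this row would go here.
    ConsoleWrite($sColA & " | " & $sColB & " | " & $sColC & @CRLF)
Next
```

Reading the whole file once and looping over the array avoids a MsgBox per line, which matters with more than 10k lines.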
I've had issues in the past dealing with Excel, so I decided to cut out the middle man and build a script that takes any file that opens in Excel (csv, xml, xls, etc.) and converts it into an array so I can handle the raw data in a cleaner way. I used czardas' CSV parser to do this and added a simple Excel "save as CSV" step to produce the file to parse.
    Func _CreateCSV($fnImportFile)
        Local $oExcel = ObjCreate("Excel.Application")
        $oExcel.Visible = False
        Local $oBook = $oExcel.Workbooks.Open($fnImportFile)
        Local $oSheet = $oBook.ActiveSheet
        ; Build a random temporary file name for the CSV export
        Local $fnMaster = @TempDir & "\" & Chr(Round(Random(0, 24), 0) + 64) & Random(0, 24) & Chr(Round(Random(0, 24), 0) + 64) & Random(0, 24) & Chr(Round(Random(0, 24), 0) + 64) & ".csv"
        ConsoleWrite($fnMaster & @CRLF)
        $oSheet.SaveAs($fnMaster, 6) ; 6 = xlCSV
        $oBook.Close(False)
        $oExcel.Quit
        Local $aReturnArray = _CSVSplit(FileRead($fnMaster))
        Local $iError = @error ; Save @error before FileDelete resets it
        FileDelete($fnMaster)
        If Not $iError Then
            Return $aReturnArray
        Else
            Return -1
        EndIf
    EndFunc

    ; #FUNCTION# ====================================================================================================================
    ; Name...........: _CSVSplit
    ; Description ...: Converts a string in CSV format to a two dimensional array (see comments)
    ; Syntax.........: _CSVSplit ( $string [, $sDelim ] )
    ; Parameters ....: $string - The CSV string to convert
    ;                  $sDelim - Optional - Delimiter set to comma by default (see 2nd comment)
    ; Return values .: Success - Returns a two dimensional array or a one dimensional array (see 1st comment)
    ;                  Failure - Sets @error to:
    ;                  |@error = 1 - First parameter is not a valid string
    ;                  |@error = 2 - Second parameter is not a valid string
    ;                  |@error = 3 - Could not find suitable delimiter replacements
    ; Author ........: czardas
    ; Comments ......; Returns a one dimensional array if the input string does not contain the delimiter string
    ;                ; Some CSV formats use semicolon as a delimiter instead of a comma
    ;                ; Set the second parameter to @TAB To convert to TSV
    ; ===============================================================================================================================
    Func _CSVSplit($string, $sDelim = ",") ; Parses csv string input and returns a one or two dimensional array
        If Not IsString($string) Or $string = "" Then Return SetError(1, 0, 0) ; Invalid string
        If Not IsString($sDelim) Or $sDelim = "" Then Return SetError(2, 0, 0) ; Invalid string

        $string = StringRegExpReplace($string, "[\r\n]+\z", "") ; [Line Added] Remove trailing breaks
        Local $iOverride = 63743, $asDelim[3] ; $asDelim => replacements for comma, new line and double quote
        For $i = 0 To 2
            $asDelim[$i] = __GetSubstitute($string, $iOverride) ; Choose a suitable substitution character
            If @error Then Return SetError(3, 0, 0) ; String contains too many unsuitable characters
        Next
        $iOverride = 0

        Local $aArray = StringRegExp($string, '\A[^"]+|("+[^"]+)|"+\z', 3) ; Split string using double quotes delim - largest match
        $string = ""

        Local $iBound = UBound($aArray)
        For $i = 0 To $iBound - 1
            $iOverride += StringInStr($aArray[$i], '"', 0, -1) ; Increment by the number of adjacent double quotes per element
            If Mod($iOverride + 2, 2) = 0 Then ; Acts as an on/off switch
                $aArray[$i] = StringReplace($aArray[$i], $sDelim, $asDelim[0]) ; Replace comma delimiters
                $aArray[$i] = StringRegExpReplace($aArray[$i], "(\r\n)|[\r\n]", $asDelim[1]) ; Replace new line delimiters
            EndIf
            $aArray[$i] = StringReplace($aArray[$i], '""', $asDelim[2]) ; Replace double quote pairs
            $aArray[$i] = StringReplace($aArray[$i], '"', '') ; Delete enclosing double quotes - not paired
            $aArray[$i] = StringReplace($aArray[$i], $asDelim[2], '"') ; Reintroduce double quote pairs as single characters
            $string &= $aArray[$i] ; Rebuild the string, which includes two different delimiters
        Next
        $iOverride = 0

        $aArray = StringSplit($string, $asDelim[1], 2) ; Split to get rows
        $iBound = UBound($aArray)
        Local $aCSV[$iBound][2], $aTemp
        For $i = 0 To $iBound - 1
            $aTemp = StringSplit($aArray[$i], $asDelim[0]) ; Split to get row items
            If Not @error Then
                If $aTemp[0] > $iOverride Then
                    $iOverride = $aTemp[0]
                    ReDim $aCSV[$iBound][$iOverride] ; Add columns to accommodate more items
                EndIf
            EndIf
            For $j = 1 To $aTemp[0]
                If StringLen($aTemp[$j]) Then
                    If Not StringRegExp($aTemp[$j], '[^"]') Then ; Field only contains double quotes
                        $aTemp[$j] = StringTrimLeft($aTemp[$j], 1) ; Delete enclosing double quote single char
                    EndIf
                    $aCSV[$i][$j - 1] = $aTemp[$j] ; Populate each row
                EndIf
            Next
        Next

        If $iOverride > 1 Then
            Return $aCSV ; Multiple Columns
        Else
            For $i = 0 To $iBound - 1
                If StringLen($aArray[$i]) And (Not StringRegExp($aArray[$i], '[^"]')) Then ; Only contains double quotes
                    $aArray[$i] = StringTrimLeft($aArray[$i], 1) ; Delete enclosing double quote single char
                EndIf
            Next
            Return $aArray ; Single column
        EndIf
    EndFunc ;==> _CSVSplit

    ; #INTERNAL_USE_ONLY# ===========================================================================================================
    ; Name...........: __GetSubstitute
    ; Description ...: Searches for a character to be used for substitution, ie one not contained within the input string
    ; Syntax.........: __GetSubstitute($string, ByRef $iCountdown)
    ; Parameters ....: $string - The string of characters to avoid
    ;                  $iCountdown - The first code point to begin checking
    ; Return values .: Success - Returns a suitable substitution character not found within the first parameter
    ;                  Failure - Sets @error to 1 => No substitution character available
    ; Author ........: czardas
    ; Comments ......; This function is connected to the function _CSVSplit and was not intended for general use
    ;                ; $iCountdown is returned ByRef to avoid selecting the same character on subsequent calls to this function
    ;                ; Initially $iCountdown should be passed with a value = 63743
    ; ===============================================================================================================================
    Func __GetSubstitute($string, ByRef $iCountdown)
        If $iCountdown < 57344 Then Return SetError(1, 0, "") ; Out of options
        Local $sTestChar
        For $i = $iCountdown To 57344 Step -1
            $sTestChar = ChrW($i)
            $iCountdown -= 1
            If Not StringInStr($string, $sTestChar) Then
                Return $sTestChar
            EndIf
        Next
        Return SetError(1, 0, "") ; Out of options
    EndFunc ;==> __GetSubstitute
The code above is pretty junk (my part at least), so I wanted to improve it. This version returns an array of arrays, one per worksheet in the Excel file:
    #include <Array.au3>

    Func _GetExcelArrays($fnImportFile)
        $oExcel = ObjCreate("Excel.Application")
        $oExcel.Visible = False
        $oBook = $oExcel.Workbooks.Open($fnImportFile)
        $sheetCount = $oBook.Worksheets.Count
        Local $aReturnArray[$sheetCount]
        For $x = 1 To $sheetCount
            $oSheet = $oBook.Worksheets($x)
            $oSheet.Activate
            $fnMaster = @TempDir & "\" & Chr(Round(Random(0, 24), 0) + 64) & Random(0, 24) & Chr(Round(Random(0, 24), 0) + 64) & Random(0, 24) & Chr(Round(Random(0, 24), 0) + 64) & ".csv"
            $oSheet.SaveAs($fnMaster, 6)
            $aReturnArray[$x - 1] = _CSVSplit(FileRead($fnMaster))
            FileDelete($fnMaster)
        Next
        $oBook.Close(False)
        $oExcel.Quit
        Return $aReturnArray
    EndFunc
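For comparison, here is a hedged sketch of an alternative that skips the temporary CSV files and reads each worksheet's used range through the Excel UDF that ships with recent AutoIt versions. This is a swapped-in technique, not the approach used above, and the function name _ReadSheetsDirect is made up for illustration. Results are not guaranteed to match the CSV route exactly, for example for formatted values, or for a sheet with a single cell, which may come back as a plain value and not as an array.

```autoit
#include <Excel.au3>

; Hypothetical helper: returns an array with one entry per worksheet.
Func _ReadSheetsDirect($sWorkbookPath)
    Local $oExcel = _Excel_Open(False) ; False = keep Excel invisible
    If @error Then Return SetError(1, @error, 0)

    Local $oBook = _Excel_BookOpen($oExcel, $sWorkbookPath, True) ; True = open read-only
    If @error Then
        Local $iErr = @error
        _Excel_Close($oExcel, False)
        Return SetError(2, $iErr, 0)
    EndIf

    Local $iSheets = $oBook.Worksheets.Count
    Local $aResult[$iSheets]
    For $i = 1 To $iSheets
        ; With no range given, _Excel_RangeRead reads the used range of the sheet.
        $aResult[$i - 1] = _Excel_RangeRead($oBook, $i)
    Next

    _Excel_BookClose($oBook, False) ; False = do not save
    _Excel_Close($oExcel, False)
    Return $aResult
EndFunc
```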
I have searched and found many posts on READING CSV. What I need to do is WRITE CSV.
Typically I have a Listview, I use @guinness's excellent _GUICtrlListView_CreateArray to read that LV to Array. Currently I use _FileWriteFromArray to write the output. It is fast and it is easy.
The benefit of using these Array functions is their inherent flexibility. The code is portable. Plug it into an LV report and you can write to file instantly. No formatting issues.
To be sure, my problem is not with the Listview or Array. My problem is with the file write to CSV: it should be fast, accurate and efficient.
Part of the problem is Microsoft Excel's insistence that my CSVs are not what they appear to be...
I am inspired by the simplicity of SQLITE3.EXE's dot command options:
    .headers on
    .mode csv
    .once output.csv
The problem is that I want to export each text field enclosed in a quote character, such as ' " ', with fields delimited by another character, such as ' , '.
The current method using _FileWriteFromArray outputs data like this:

    12-04-2016| 0.00| 131.00|131|Description
    12-04-2016| 0.00| 132.00|132|Description 132 rece
    12-04-2016| 998.00| 0.00|998|Description Receipt
    12-04-2016| 0.00| 900.00|900|Description

What I want to achieve is this:

    "12-04-2016"," 0.00"," 131.00","131","Description"
    "12-04-2016"," 0.00"," 132.00","132","Description 132 rece"
    "12-04-2016"," 998.00"," 0.00","998","Description Receipt"
    "12-04-2016"," 0.00"," 900.00","900","Description"

I have done this. The long way. I have written the output one $aResult[$i] item at a time, spaced with the $text & $field markers. I have placed this in an array with additional columns just to fit in the formatting. I have also played with the idea of inserting the array into a sqlite db, then run sqlite3's dot commands on that.
Is there a different method? Is there an efficient method to specify FileWriteFromArraytoCSV? Does anyone have ideas? I am sure I will know how to code this, I just need advice on the most efficient way of going about it. If I missed any threads, please enlighten me.
Thanks for reading
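One approach that might fit, sketched under the assumption that the ListView data is already in a 2D array: build the whole quoted output as one string and write it with a single FileWrite call. The function name _ArrayToQuotedCsv and its parameters are hypothetical, not an existing UDF. Quote characters inside a field are doubled, which is the usual CSV convention.

```autoit
#include <FileConstants.au3>

; Hypothetical helper: writes a 2D array as a quoted, delimited text file.
Func _ArrayToQuotedCsv(Const ByRef $aData, $sFile, $sDelim = ",", $sQuote = '"')
    If UBound($aData, 0) <> 2 Then Return SetError(1, 0, False) ; Not a 2D array

    Local $iRows = UBound($aData, 1), $iCols = UBound($aData, 2)
    Local $sOut = "", $sField
    For $i = 0 To $iRows - 1
        For $j = 0 To $iCols - 1
            ; Double any quote character inside the field, then wrap the field in quotes.
            $sField = StringReplace($aData[$i][$j], $sQuote, $sQuote & $sQuote)
            $sOut &= $sQuote & $sField & $sQuote
            If $j < $iCols - 1 Then $sOut &= $sDelim
        Next
        $sOut &= @CRLF
    Next

    Local $hFile = FileOpen($sFile, $FO_OVERWRITE)
    If $hFile = -1 Then Return SetError(2, 0, False) ; Could not open the file
    FileWrite($hFile, $sOut) ; One write for the whole file
    FileClose($hFile)
    Return True
EndFunc

; Usage with made-up sample rows:
Local $aDemo[2][5] = [["12-04-2016", "0.00", "131.00", "131", "Description"], _
        ["12-04-2016", "998.00", "0.00", "998", "Description Receipt"]]
_ArrayToQuotedCsv($aDemo, @ScriptDir & "\output.csv")
```

Keeping the delimiter and quote character as parameters preserves the plug-in flexibility of the array functions, and a single write avoids the extra formatting columns.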