Jump to content

read and write xlsx files without Excel


Recommended Posts

Hi,

I discovered this udf on the German forum where you posted different evolutions, I was already working on a similar udf _XLSXReadToArray in my program. And now i use xlsxNative.
Congratulations, i love it, It's an excellent work, thank you very much because it's really very convenient to get a array without having to open excel.
It would be nice for a future evolution to obtain the BookLists, it will be great for me in order to work on the good workbook 🙂

Great Job !

Eric

Link to post
Share on other sites
1 minute ago, ermar said:

It would be nice for a future evolution to obtain the BookLists

A good suggestion - it goes on my to-do list.

First and foremost, I'm not 100% sure about the compatibility with as many xlsx subformats as possible.
The internal structure can be very different, so I can't guarantee that the UDF can handle all of them.
At least it has been able to handle all the files I have fed it so far.

Link to post
Share on other sites

I tested with

#include <xlsxNative.au3>

; create 2D array
Global $A[][] = [[1, 2, 3, 4, 5], ["", "6", 7, "", "8"], [], [9, "", "10", 11, True]]
_ArrayDisplay($A, "source array")
$iStart = TimerInit()
; convert the Array into a xlsx-file:
_xlsx_WriteFromArray(@ScriptDir & "\Text.xlsx", $A)
$iDiff = TimerDiff($iStart)
ConsoleWrite($iDiff & 'ms. for saving ' & @CRLF)
; read this xlsx-file into a 2D-Array:
$aSheet = _xlsx_2Array(@ScriptDir & "\Text.xlsx")
ConsoleWrite('Error: ' & @error & ' | ' & @extended & @CRLF)
$iDiff2 = TimerDiff($iStart)
ConsoleWrite($iDiff2-$iDiff & 'ms. for loading ' & @CRLF)
ConsoleWrite($iDiff2 & 'ms. total ' & @CRLF)
_ArrayDisplay($aSheet, "imported data from xlsx")

Console output:

11563.3009ms. for saving 
Error: 0 | 0
3713.4193ms. for loading 
15276.7202ms. total 
+>17:49:52 AutoIt3.exe ended.rc:0
+>17:49:52 AutoIt3Wrapper Finished.
>Exit code: 0    Time: 73.55

 

Edited by AutoBert
Link to post
Share on other sites

And now put a 7za.exe in the folder and see how long this takes.

For comparison - these are my results with your script when the 7za.exe is in the @Scriptdir:

180.0047ms. for saving 
Error: 0 | 0
99.362ms. for loading 
279.3667ms. total

 

Edited by AspirinJunkie
Link to post
Share on other sites
On 1/19/2023 at 6:16 PM, ermar said:

It would be nice for a future evolution to obtain the BookLists, it will be great for me in order to work on the good workbook 🙂

I'll just interpret that by "workbook" you meant "worksheets" instead - right?
As I understand it, there is only one workbook in an xlsx file - am I right?

So I added a new function _xlsx_getWorkSheets() which returns a list of the existing worksheets.

This is being used as follows: 
 

#include "xlsxNative.au3"

Local $sFile = @ScriptDir & "\Test.xlsx"

; determine the worksheets of a file
Local $aSheets = _xlsx_getWorkSheets($sFile)
_ArrayDisplay($aSheets, "Sheet List", "", 64 + 32 , Default, "ID|Name")

 

Edited by AspirinJunkie
Link to post
Share on other sites
16 hours ago, AspirinJunkie said:

For comparison - these are my results with your script when the 7za.exe is in the @Scriptdir

Consoleoutput after downloading 7za.exe to @scriptdir: 

>Running:(3.3.16.0):C:\Program Files (x86)\AutoIt3\autoit3_x64.exe "C:\Users\(len)Bert\AutoIt\Downloaded\AspirinJunkey\xlsxNativeTEST.au3"    
+>Setting Hotkeys...--> Press Ctrl+Alt+Break to Restart or Ctrl+BREAK to Stop.
1119.4133ms. for saving 
Error: 3 | 2
177.7074ms. for loading 
1297.1207ms. total 
+>15:16:56 AutoIt3.exe ended.rc:0

Saving ~ 1/10 time, but error while loading. I think it's MS defender which blocks writing. xlsxWork is created, but nothing in.

Changing to

Local $pthWorkDir = @ScriptDir & "\xlsxWork\" ;original @tempdir

brings throws same error: 3 | 2.

Edited by AutoBert
Link to post
Share on other sites
7 minutes ago, AutoBert said:

Error: 3 | 2

[...]

but error while loading. 

This combination of @error and @extended should only be able to occur when writing with _xlsx_WriteFromArray().
Are you sure that it occurs when loading (with _xlsx_2Array() )?
In case of _xlsx_WriteFromArray() this combination would mean that an error occurred during the FileWrite of the file `_rels\.rels`.

I cannot reproduce this for now, so it is unclear what is causing this.
I have adjusted the @error handling - especially of DirCreate and uploaded a new version.
I have - if it would be a rights problem - the folder creation in suspicion.

Link to post
Share on other sites
8 minutes ago, AspirinJunkie said:

Are you sure that it occurs when loading (with _xlsx_2Array() )?

Yes, i changed my testscript to

#include <xlsxNative.au3>

; create 2D array
Global $A[][] = [[1, 2, 3, 4, 5], ["", "6", 7, "", "8"], [], [9, "", "10", 11, True]]
_ArrayDisplay($A, "source array")
$iStart = TimerInit()
; convert the Array into a xlsx-file:
_xlsx_WriteFromArray(@ScriptDir & "\Text.xlsx", $A)
ConsoleWrite('_xlsx_WriteFromArray Error: ' & @error & ' | ' & @extended & @CRLF)
$iDiff = TimerDiff($iStart)
ConsoleWrite($iDiff & 'ms. for saving ' & @CRLF)
; read this xlsx-file into a 2D-Array:
$aSheet = _xlsx_2Array(@ScriptDir & "\Text.xlsx")
ConsoleWrite('_xlsx_2Array Error: ' & @error & ' | ' & @extended & @CRLF)
$iDiff2 = TimerDiff($iStart)
ConsoleWrite($iDiff2-$iDiff & 'ms. for loading ' & @CRLF)
ConsoleWrite($iDiff2 & 'ms. total ' & @CRLF)
_ArrayDisplay($aSheet, "imported data from xlsx")

resulting console:

+>Setting Hotkeys...--> Press Ctrl+Alt+Break to Restart or Ctrl+BREAK to Stop.
_xlsx_WriteFromArray Error: 0 | 0
950.3584ms. for saving 
_xlsx_2Array Error: 3 | 2
187.4876ms. for loading 
1137.846ms. total 
+>16:00:27 AutoIt3.exe ended.rc:0

as XLS-Format isn't supported (~1k) , i have no real need. For the view Files (<5) in XLSx-Format i can live with 10* slower time.

Link to post
Share on other sites

Like it !, now, some error checking would be nice:

; function to share one single xmldom-object over the functions but without beeing a global variable
Func __xlsx_getXMLObject()
    Local Static $c = 0
    Local Static $oX = ObjCreate("Microsoft.XMLDOM")
    If @error Then Return SetError(@error, @extended, 0)
.......
Func __xlsx_getSubFiles($pthWorkDir = @TempDir & "\xlsxWork\")
    Local $oXML = __xlsx_getXMLObject()
    If @error Then Return SetError(@error, @extended, 0)
.......

...and I believe that a local static is like a global internally for AutoIt, so is just a coding preference. ( I just wanted to say it for some reason ? ) :) 

Link to post
Share on other sites

The error occurs when calling 7za.exe.
Can you run the following manually from the command line from your script folder and see if this manual call works?:

7za.exe x "Text.xlsx" -o"%TEMP%\xlsxWork" shared*.xml sheet.xml sheet1.xml *.rels -r -tzip -bd -bb0 -aoa

 

Link to post
Share on other sites
1 minute ago, argumentum said:

now, some error checking would be nice:

Deal - new version is up now.

8 minutes ago, argumentum said:

...and I believe that a local static is like a global internally for AutoIt, so is just a coding preference.

It is a value which remains in memory throughout the entire program run.
Whether the variable belongs global or to another scope, however, refers purely to the identifier of the variable.

If I can address a variable by its name from anywhere - then it is global.
In the concrete example this is not the case - therefore the variable is also not global.

Link to post
Share on other sites
C:\Users\(len)Bert\AutoIt\Downloaded\AspirinJunkey>7za.exe x "Text.xlsx" -o "%TEMP%\xlsxWork" shared*.xml sheet.xml sheet1.xml *.rels -r -tzip -bd -bb0 -aoa

Resulting

Error:
Incorrect command line

7za.exe Just analyzed: https://www.virustotal.com/gui/file/4e15b455930cf0f0dd0732aee62a1482e90f88838eac2e38d0d248bae5934b20/details

Edited by AutoBert
Link to post
Share on other sites

Where did you dig up this ancient version of 7za.exe? Are you an archeologist?
The version you are using is from 2010 - so it is already 13 years old!

So I can imagine that there are some commands in it that were not there at that time.

In the opening post I have a link to a current 7za.exe - so your errors should no longer appear. 

Link to post
Share on other sites

Your linked file results with:

---------------------------
Nicht unterstützte 16 Bit-Anwendung
---------------------------
Das Programm bzw. das Feature "\??\C:\Users\(len)Bert\AutoIt\Downloaded\AspirinJunkey\7za.exe" kann aufgrund einer Inkompatibilität mit 64 Bit-Versionen von Windows nicht gestartet bzw. ausgeführt werden. Wenden Sie sich an den Softwarehersteller, um zu erfahren, ob eine mit 64 Bit-Windows kompatible Version verfügbar ist.


---------------------------
OK   
---------------------------

so i downloaded the https://7-zip.org/a/7z2201-x64.exe

and after installed i copied 7z.exe as 7za.exe to @scritdir. And yet it works:

+>Setting Hotkeys...--> Press Ctrl+Alt+Break to Restart or Ctrl+BREAK to Stop.
_xlsx_WriteFromArray Error: 0 | 0
1134.811ms. for saving 
_xlsx_2Array Error: 0 | 0
595.9215ms. for loading 
1730.7325ms. total

 

Link to post
Share on other sites
3 minutes ago, AutoBert said:

Your linked file results with:

My linked file is not the 7za.exe - it is a 7z-archive which contains the 7za.exe. You have to unpack the file before.

4 minutes ago, argumentum said:

...I meant internally, in the stub at the C++ level, not script wise

AutoIt manages its variables in a table which is divided in lists. 
In the old AutoIt source code these were two different lists - one for the globals and one for the locals.
In the lists themselves all variables are equal. The assignment to a scope results from the affiliation to a certain list.
If the variable would be part of the global list, it would be globally accessible.

In the concrete case the variable remains a local variable and is not accessible for scopes outside (except just by the return). 
It must handled a bit differently in the AutoIt interpreter, because it is static (we don't know how exactly, because the source code for this is not available) but it doesn't end up in the global list.

A global variable on C++ level would also mean that it can be addressed by C++ via its identifier.
AutoIt variables, however, do not get a C++ identifier at all but get their name via their list entry as structure _VarNode with the attribute szName. 

The static persistence has nothing to do with scope for now. 
It is still not a global variable - not even on C++ level.

Link to post
Share on other sites

Hi AspirinJunkie,

Thanks for the new function _xlsx_getWorkSheets().
Exactly, workbook is for me Worksheets tab, sorry i'm french user and i'm not comfortable with the US version of excel.
With my xlsx file, there are several worksheets tab. I tested the function with a file test and i want provide further clarification :

The function gives the all worksheets, it's perfect 🙂 but the first column "ID" 's values does not necessarily correspond to the order of the worksheets that we see when i open the file.

image.png.dd05184464c7a97c8857596ddf7fc3bb.png

For example, I moved the fourth worksheet and if I use_xlsx_2Array and I want to read the first worksheet ($sSheetNr=1), I can't use this identification value because the value is "4" in the array, while the expected value should be "1".

Great Job, 🤗

Would it be possible to provide an additional functionality with _xlsx_2Array ? We can select the numbers of rows to extract but could we choose some columns to extract ?

it would be great ! Thank you.

 

 

Link to post
Share on other sites
On 1/23/2023 at 2:23 PM, ermar said:

The function gives the all worksheets, it's perfect 🙂 but the first column "ID" 's values does not necessarily correspond to the order of the worksheets that we see when i open the file.

The fact that the IDs no longer correspond to the order is not that bad.
Worse, the IDs don't match the internal sheet file numbering, so you can't use them in the xlsx_2Array() to select the sheet.

I have therefore adjusted it a bit so that the correct IDs appear first.
However, the sheet determination is currently still very quick and dirty and works with many files but possibly not with all.
When I get some time, I will therefore redo everything so that the workbook.xml.rels is evaluated properly so that all the assignments are evaluated more cleanly.
Edit: Done!

  

On 1/23/2023 at 2:23 PM, ermar said:

Would it be possible to provide an additional functionality with _xlsx_2Array ? We can select the numbers of rows to extract but could we choose some columns to extract ?

Yes sure - i can do that.

Edited by AspirinJunkie
Link to post
Share on other sites
On 1/23/2023 at 2:23 PM, ermar said:

Would it be possible to provide an additional functionality with _xlsx_2Array ? We can select the numbers of rows to extract but could we choose some columns to extract ?

it would be great ! Thank you.

Done.
The _xlsx_2Array()-function now has two additional parameters to restrict to specific columns.

Besides that, the extraction of the relevant files has been completely rewritten and should now be more stable and thus cover more shapes.

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
  • Recently Browsing   0 members

    No registered users viewing this page.

×
×
  • Create New...