
HighMem - Maximise your RAM access on x64


RTFC

@RTFC.  Very interesting.  Is the problem that you describe in your intro inherent to 64-bit systems in general, or is this solely an issue when attempting to use the 64-bit version of AutoIt?  The same question applies to your script's use: can I use it if I am running a 64-bit system but the 32-bit version of AutoIt, or would I need the 64-bit version of AutoIt in order to get any use out of this script?


@MattHiggs: Hi, thanks. The problem I'm describing (running out of memory when trying to allocate multiple large chunks of data) occurs solely when running AutoIt scripts as 32-bit (even though I'm on an x64 machine + x64 OS). But it would of course also occur on a 32-bit machine+OS. I haven't really tested this library in partial 32-bit environments, but as far as I can tell, you would need all three aspects (machine/OS/AutoIt) to be running in x64-mode for this to work. I presume that if you were to use the directive "#AutoIt3Wrapper_UseX64=Y" in a 32-bit environment (machine and/or OS), it would fail, and @AutoItX64 would still return False (0). If the OS is not x64, then how is AutoIt going to receive 64-bit memory pointers from it (which are needed to address the entire virtual memory space, in my case 128 TB (with 16 GB physical RAM))? Ditto for machine. Since my environment only required me to get the AutoIt part up to x64 standard (and my own Eigen4AutoIt.dll to be compiled for x64), that's what I did.

It's because I deal with large datasets in large computations all the time, and so far my E4A library couldn't handle these in full, so I either had to write complicated segmented computational codes:x or develop a computation in E4A on a small dummy dataset, iron out all the kinks, and then translate that into a separate (C++) Eigen code (which is quite painful to debug, hence using E4A first, which is much more user-friendly). Using HighMem, that's all history now. (Well, except for the initial debugging, that is.;))

Another advantage of using this memory mapping (that I hadn't mentioned yet) is that it can be shared between processes, so conceivably parallelised processing becomes possible (even networked, if #RequireAdmin can be added to the script). So for me it was the next step. Not sure it can be of any use to others (such as yourself), but it's a safe bet to assume that in x86-mode (i.e., 32-bit), there's no point in using it as your mappable address space would be restricted to 2-4 GB, which is the standard Windows-allotted virtual RAM per process anyway. Does that answer your question?:)
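The x64 requirement described above can be checked at the top of a script. This is a minimal sketch using only built-in AutoIt macros; it is not taken from HighMem itself, and the exit code and message are arbitrary choices.

```autoit
#AutoIt3Wrapper_UseX64=Y    ; ask the wrapper to run/compile the script as 64-bit

; Sketch only: stop early if the script is not actually running as x64.
If Not @AutoItX64 Then
    ConsoleWrite("Running as 32-bit; address space is limited to 2-4 GB." & @CRLF)
    Exit 1
EndIf

; @OSArch reports the OS architecture (e.g. "X64"), independent of the AutoIt build.
; A pointer is 8 bytes wide in an x64 process and 4 bytes in an x86 process.
ConsoleWrite("OS architecture: " & @OSArch & @CRLF)
ConsoleWrite("Pointer size: " & DllStructGetSize(DllStructCreate("ptr")) & " bytes" & @CRLF)
```

Note that the `#AutoIt3Wrapper_UseX64` directive only takes effect when the script is launched or compiled through the SciTE wrapper, which is why the runtime check on `@AutoItX64` is still worth having.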

Edited by RTFC
clarification

9 minutes ago, RTFC said:

in my case 128 TB

:blink:  Damn!  Dude.  Do you have the entire internet stored on this machine or what?  I cannot even comprehend having that amount of storage, much less a scenario where it would be necessary.  And I thought my 6 TB  file server was hot shit.  Now I feel emasculated..:>

As for the rest of your explanation, if I am understanding your explanation correctly, this udf essentially removes 32 bit operating system max memory allocation and allows operations which require a large amount of memory (more than what a 32-bit system is capable of supporting normally) to be performed on 32 bit systems.  That correct?

Edited by MattHiggs

This is virtual memory, not physical memory.:lol: And I don't use all of it, I just need more than 4 GB in individual computations. Used memory pages get swapped in and out of physical RAM as needed. Explanation here.

RE. your second question, not quite. This library is purely for x64 environments. It's a RAM allocation manager that pre-allocates a named space of desired size, in which you can then allocate smaller chunks that can be larger than 4 GB each (identified by their base address) for your personal use. If you're mapping AutoIt structs to these (see second provided test script), these will still be restricted to 2 GB (-1), but my E4A x64 dll is native-x64, so larger-sized matrices are also supported there. And when you subsequently release some of these chunks, they can be re-assigned for new allocations, just like a file manager would do for a hard drive. I cannot use AutoIt structs for matrices >2 GB; that's why HighMem provides an alternative for larger chunks (but only in x64).


Edited by RTFC
typos

2 minutes ago, RTFC said:

This is virtual memory, not physical memory

As in the "pagefile.sys" for Windows operating systems?  As for the rest, I believe I am starting to understand.  It is specifically for 64-bit systems, but, especially for memory-intensive processes, there was no way to control the size of the chunks of memory that a process would use while running (at least until this UDF). That size would normally be determined by the operating system, which might not make the best decision, and that could lead to the errors included in the first post (the OS might think 1 GB of available memory is plenty to allocate to a process requesting access to RAM, without realizing that the chunk of memory the process will be using is 1.5 GB).  So a more effective memory management tool.  Is that closer?


Yap. Although it's not so much the OS that's the bottleneck for me as it is the maximum size of AutoIt structs. I used to stuff my matrices in structs, which is fine in 32-bit up to 2 GB. But now I need larger containers, so this library provides me with an easy way to manage these (on x64), without either creating or mapping (AutoIt) structs. Of course, you could also stick a large file in there (e.g. to compute a hash, as trancexx has shown here), or use it as a large graphics buffer for animations, for example.

Edited by RTFC

4 minutes ago, RTFC said:

Yap. Although it's not so much the OS that's the bottleneck for me as it is the maximum size of AutoIt structs. I used to stuff my matrices in structs, which is fine in 32-bit up to 2 GB. But now I need larger containers, so this library provides me with an easy way to manage these (on x64), without either creating or mapping (AutoIt) structs. Of course, you could also stick a large file in there (e.g. to compute a hash, as trancexx has shown here), or use it as a large graphics buffer for animations.

I see.  Ok.  That answers my questions.  Very cool udf.  Thank you sir


Since I did a less than stellar job of explaining yesterday:baby:, I'll try to correct myself here.:> Although AutoIt_x64 will not generate an @error when attempting to allocate a struct >2GB, only that maximum amount will be allocated, as far as I can tell (at least, I am unable to read/write data from/to addresses higher than that). Here's an example (for users on an x64 machine + x64 OS):

;#AutoIt3Wrapper_UseX64=N   ; always FAILS, even on x64 machine + x64 OS
#AutoIt3Wrapper_UseX64=Y    ; works on x64 machine + x64 OS

Global $GB=1024^3
Global $GBtoAllocate=6
Global $structs[$GBtoAllocate+1]
Global $lastbyte=$GB * 1
Global $testvalue=123       ; use 0-255 only (byte)

ConsoleWrite("Attempting to allocate " & $GBtoAllocate & " 1 GB chunks of memory..." & @CRLF)
For $cc=1 To $GBtoAllocate
    $structs[$cc]=DllStructCreate("byte[" & 1 * $GB & "]")  ; should work on x64
    if @error Then
        ConsoleWrite("Creating struct " & $cc & " has failed; aborting." & @CRLF)
        Exit(-1)
    EndIf
    ConsoleWrite("Setting last value: " & ((DllStructSetData($structs[$cc],1,$testvalue,$lastbyte)=$testvalue)?("works"):("FAILS!")) & @CRLF)
Next

ConsoleWrite(@CRLF & "Attempting to allocate a SINGLE chunk of " & $GBtoAllocate & " GB now..." & @CRLF)
$cc=0
$lastbyte=$GB*$GBtoAllocate     ; already declared Global above; now the index of the last byte of the single large struct
$structs[$cc]=DllStructCreate("byte[" & $lastbyte & "]")    ; partial failure (silent)
if @error Then
    ConsoleWrite("Creating struct " & $cc & " has failed; aborting." & @CRLF)
    Exit(-1)
EndIf
ConsoleWrite("Setting last value: " & ((DllStructSetData($structs[$cc],1,$testvalue,$lastbyte)=$testvalue)?("works"):("FAILS!")) & @CRLF)

So if you just need to allocate lots of small structs that together exceed 4 GB, you don't need HighMem (on x64 machine + OS that is, as long as you set the UseX64 directive; on x86 machine+OS you're up the proverbial creek). But if any single memory region you need exceeds 2 GB, HighMem may be quite useful on x64. Another advantage is control; you decide in advance how much total space you need, and within that space, HighMem will spatially optimise allocations of various sizes automatically, and keep track of additional free space when memory regions are released again. The drawback is that you'll still have to figure out yourself how to access that space. One way would be to map it with consecutive 1 GB structs. The WinAPI provides other ways. My E4A wrapper library now uses DllCalls for all matrix memory I/O.
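The idea of covering one large region with consecutive 1 GB structs can be sketched as follows. This is an illustration only: the region is obtained with `_MemVirtualAlloc` from the standard Memory.au3 UDF as a stand-in for a HighMem allocation, and the variable names are made up for the example. It needs an x64 run and roughly 3 GB of free commit.

```autoit
#AutoIt3Wrapper_UseX64=Y
#include <Memory.au3>

Global Const $iGB = 1073741824      ; 1 GB in bytes
Global Const $iChunks = 3           ; total region: 3 GB, i.e. larger than one struct can cover

If Not @AutoItX64 Then Exit 1

; Stand-in for a HighMem allocation: one contiguous, committed 3 GB region.
Global $pBase = _MemVirtualAlloc(0, $iChunks * $iGB, BitOR($MEM_RESERVE, $MEM_COMMIT), $PAGE_READWRITE)
If @error Or $pBase = 0 Then Exit 2

; Overlay the region with consecutive 1 GB byte structs.
; Passing a pointer to DllStructCreate maps existing memory; nothing new is allocated.
Global $aWindow[$iChunks]
For $i = 0 To $iChunks - 1
    $aWindow[$i] = DllStructCreate("byte[" & $iGB & "]", Ptr($pBase + $i * $iGB))
Next

; Write the very last byte of the region through the last window, then read it back.
DllStructSetData($aWindow[$iChunks - 1], 1, 123, $iGB)
ConsoleWrite("Last byte reads back as: " & DllStructGetData($aWindow[$iChunks - 1], 1, $iGB) & @CRLF)

_MemVirtualFree($pBase, 0, $MEM_RELEASE)
```

A byte at absolute offset `n` (0-based) in the region then lives in window `Int(n / $iGB)` at 1-based element index `Mod(n, $iGB) + 1`.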

And regarding that TB-sized virtual memory, of course I don't have a pagefile that size. Virtual memory is a joint address space shared by, and accessible to all processes on your machine. Only a tiny fraction thereof will be in use, and only a proportion of that will get swapped to disk if/when physical RAM is unable to cope. It's like the Dewey library book topic classification; just because it spans thousands of sub-classes does not mean that every library using the system will have all existing books on each possible subtopic on its shelves.:D
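To see the difference between physical RAM, pagefile, and virtual address space on a given machine, the built-in `MemGetStats()` function can be queried. A minimal sketch (the array indices follow the AutoIt help file; values are reported in kilobytes):

```autoit
; MemGetStats() returns a 7-element array; sizes are in kilobytes.
Local $aMem = MemGetStats()
Local Const $KB_PER_GB = 1024 * 1024

ConsoleWrite("Physical RAM (total):  " & Round($aMem[1] / $KB_PER_GB, 1) & " GB" & @CRLF)
ConsoleWrite("Pagefile (total):      " & Round($aMem[3] / $KB_PER_GB, 1) & " GB" & @CRLF)
ConsoleWrite("Virtual space (total): " & Round($aMem[5] / $KB_PER_GB, 1) & " GB" & @CRLF)
```

Run as x64, the last figure should be vastly larger than the first two; run as x86, it should be in the 2-4 GB range mentioned earlier in the thread.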

I hope the above goes some way towards clearing the fog left by my earlier ramblings.:think:

Edited by RTFC

Beta version 0.9 released.:dance:

This is a major upgrade, enabling HighMem allocations to be shared between multiple processes (after passing their relative offset through any means of IPC). An example script (#20) with mutex negotiation will be included in the next release (4.1) of my matrix computing environment, coming soon(ish).

Edited by RTFC

Hi RTFC

I'm especially interested in using it for sharing memory between processes.

Even still without mutexes, could you give a simple example that:

  1. create an allocation (any size)
  2. we can map au3 variables there? For example, an array that I can use plain in my Au3 code? Like Global  $ARR [1000][1000] .... $ARR [$M][$N] = 1234
  3. If not, could you give examples on how to write/read a variable there? And, that would be efficient?
  4. how we will use (or declare) that allocation at the other process?

Thanks:  Jose

Edited by joseLB

Hi Jose.:) My first impulse was to just refer you to my Eigen4AutoIt environment (v4.1), which contains a working example (test #20 in the EIgenTest subdirectory) of a matrix (struct) being created by one process, and subsequently written to in alternating fashion by a second process and the first (both filling it with their PID). That example illustrates mutex usage as well, as you'll see each process failing to obtain the mutex until the other one releases it (after which a successful write occurs).

My second impulse was to refer you to the _WinAPI_CreateFileMapping example in the AutoIt Help, which shows how to share a joint memory allocation between two processes without mutex. But of course, that one doesn't involve the HighMem environment, which takes care of all the nitty gritty of keeping your memory allocations organised.

So my third impulse, followed here, is to share two tiny test scripts from my earliest testing phase of _HighMem. Both use a struct as a string buffer for one process to store user-defined text in and the other process to read it back, just like in the CreateFileMapping example (but using the HighMem environment). Obviously, you can stick anything you like into a struct, including an array (e.g., convert to string and store in struct at one end, and do the reverse at the other end). If you need more extensive examples of how to use structs as generic temporary containers for any type of variable, I would suggest you study my Pool IPC environment (link in my sig below).

Start this script first (the HighMem equivalent of the AutoIt Help file example referred to above), but don't fill the inputbox yet...

#AutoIt3Wrapper_UseX64=Y
#NoTrayIcon

#include <MsgBoxConstants.au3>
#include <WinAPI.au3>
#include <WinAPIFiles.au3>
#include "D:\Autoit\HighMem\HighMem.au3"
$_HighMem_verbose=True

Opt('WinWaitDelay', 0)

Global Const $g_sTitle = 'TestHighMem'
Global $sText
Global $mapobjectname=$_HighMem_NamePrefix & '123'  ; fake PID

_HighMem_StartUp(1)
_HighMem_MapExternalMemory(123, False)  ; fake PID

Global $tData = DllStructCreate('wchar[1024]', $_HighMem_ExternalBaseOffset[1])  ; $sText is already declared above

While WinWait($g_sTitle, '', 1)
    Sleep(200)
    $sText = DllStructGetData($tData, 1)
    DllStructSetData($tData, 1, '')
    If $sText Then MsgBox(BitOR($MB_ICONINFORMATION, $MB_SYSTEMMODAL), $g_sTitle & " (receiver)", "                                               " & @CRLF & $sText)
WEnd
_HighMem_CleanUp()

Then start the second one below, then type something and press Ok.

#AutoIt3Wrapper_UseX64=Y
#NoTrayIcon

#include <MsgBoxConstants.au3>
#include <WinAPI.au3>
#include <WinAPIFiles.au3>
#include "D:\Autoit\HighMem\HighMem.au3"
$_HighMem_verbose=True

Opt('WinWaitDelay', 0)

Global Const $g_sTitle = 'TestHighMem'
Global $sText
Global $mapobjectname=$_HighMem_NamePrefix & '123'

_HighMem_StartUp(1,"GB",$mapobjectname)

Global $tData = DllStructCreate('wchar[1024]', $_HighMem_BaseOffset)  ; $sText is already declared above

While WinWaitClose($g_sTitle)
    $sText = StringStripWS(InputBox($g_sTitle & " (sender)", 'Type some text', '', '', -1, 171), 3)
    If Not $sText Then ExitLoop

    DllStructSetData($tData, 1, $sText)
    If Not WinWait($g_sTitle, '', 1) Then ExitLoop
WEnd
_HighMem_CleanUp()

IMPORTANT: this simple example works for a single allocation only, because it maps the struct directly to the shared memory base pointer (the start of the total pre-allocated region). But obviously, only the first allocation has an offset of zero. Any allocation made thereafter by script 1 will have a relative offset larger than zero. So you have to start passing those offsets to your second (and third, fourth, etc) script(s) using some form of IPC (for example, using my Pool environment :shifty:). And the second script then calls _HighMem_AllocateExternal ( $PID, $offset, $size, $unit="B"), after first calling _HighMem_MapExternalMemory( $PID ). Maybe I can provide an example of that some other time.
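Why a relative offset (rather than an absolute pointer) has to be exchanged can be illustrated without HighMem, using only the standard file-mapping functions from WinAPIFiles.au3. This is a sketch under stated assumptions: the mapping name and offset are made up, and both "sides" run in one process for brevity, each with its own view of the same shared memory.

```autoit
#include <WinAPI.au3>
#include <WinAPIFiles.au3>

Global Const $sMapName = "OffsetDemoMapping"    ; hypothetical mapping name
Global Const $iOffset = 4096                    ; relative offset that would be sent via IPC

; "Owner" side: create a 64 KB pagefile-backed mapping and write at the offset.
Global $hMapA = _WinAPI_CreateFileMapping(-1, 65536, $sMapName)
Global $pViewA = _WinAPI_MapViewOfFile($hMapA)
Global $tA = DllStructCreate("wchar[64]", Ptr($pViewA + $iOffset))
DllStructSetData($tA, 1, "hello from the owner")

; "Client" side: open the mapping by name and map a second, independent view.
Global $hMapB = _WinAPI_OpenFileMapping($sMapName)
Global $pViewB = _WinAPI_MapViewOfFile($hMapB)
Global $tB = DllStructCreate("wchar[64]", Ptr($pViewB + $iOffset))

; The two base addresses normally differ, but base + offset reaches the same data.
ConsoleWrite("View A base: " & $pViewA & ", view B base: " & $pViewB & @CRLF)
ConsoleWrite("Read via view B: " & DllStructGetData($tB, 1) & @CRLF)

_WinAPI_UnmapViewOfFile($pViewB)
_WinAPI_UnmapViewOfFile($pViewA)
_WinAPI_CloseHandle($hMapB)
_WinAPI_CloseHandle($hMapA)
```

Each process gets its own base address for the shared region, so a pointer from one process is meaningless in another; only the offset from the base is portable.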

I'm not sure what you mean by "efficient" in this context. Using a shared resource and just communicating a pointer once is certainly less data traffic than moving a 1000x1000 cell array back and forth every time, if that's what you mean. And yes, these allocations can be any size that fits into your virtual memory. Does that clarify things for you?

Edited by RTFC

Thanks RTFC, yes, helped a lot

My problem was a lack of knowledge on dllstruct and similar ones. But now that is solved !

I also checked _WinAPI_CreateFileMapping and seems that it can work too.

About performance, later I will run some loops and compare against direct read/write of array cells. I suppose it is much slower than that, but much faster than, for example, using a file to exchange data between processes.

But I think that if AU3 variables had an "AT" keyword, that would be great.... For example Global/Local AT $ptr  $var1, $var2[1000][1000], etc.

Some years ago I suggested that in MikroPascal, and happily they implemented it. Today we can map a variable over another, and over PIC microcontroller register bits, etc. => Var  XXX: byte AT portB; Var ZZZ: sbit at portB.1; .....

 

Edited by joseLB

That example I posted yesterday was a bit poor, as the second process does not use my dedicated PID-based HighMem functions to map another process's shared memory and individual allocations. When I have some spare time I'll post a proper example.

Secondly, I failed yesterday to emphasise the difference between using shared mem for exchanging data between processes (IPC), and for creating a joint workspace where multiple processes can act simultaneously or (with mutex handling) consecutively (as I do in E4A). If you need to send large arrays back and forth all the time (including string conversion and dllstruct I/O), then esp. those string conversions will be slow. Instead, I would suggest you use the joint workspace approach, with one process creating a database-like struct, and other processes performing repeated data I/O there. As for speed, as long as your contiguous allocations fit in physical RAM (as opposed to virtual RAM, which is far bigger, but may swap to your pagefile), access times should be about three orders of magnitude faster than using regular file I/O (assuming the file data are not already cached in RAM).

Of course, if your data are numeric only, then you should really look into using Eigen4AutoIt (E4A, link in my sig), which already integrates HighMem in x64, and has extensive (numeric) data manipulation functionality (copy, swap, sort, reverse, cell-wise arithmetic, conditional replace, etc). If you're using arrays that contain variable-length strings, however, then you should probably adopt the struct approach, pre-allocating fixed-width fields within the struct (structs allow indexed access of their individual members, which can be different vartypes). That's basically your requested "AT" feature, if I understand your description correctly (haven't used Pascal since secondary school). Check DllStructCreate in the help file for more info.
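The struct approach with fixed-width fields can be sketched like this. The record layout, names, and helper function are hypothetical, chosen only to show how a struct can be placed "at" an address inside a larger buffer; here the buffer is an ordinary local struct, but the same pattern would apply to a pointer into shared memory.

```autoit
; Hypothetical record layout: fixed-width fields, so every record has the same size.
Global Const $tagRECORD = "int id; wchar name[32]; double value"
Global Const $iRecords = 4

; Work out the size of one record, then reserve a buffer for all of them.
Global $iRecSize = DllStructGetSize(DllStructCreate($tagRECORD))
Global $tBuffer = DllStructCreate("byte[" & $iRecSize * $iRecords & "]")
Global $pBuffer = DllStructGetPtr($tBuffer)

Global $tRec = _RecordAt($pBuffer, 2)
DllStructSetData($tRec, "id", 42)
DllStructSetData($tRec, "name", "third record")
DllStructSetData($tRec, "value", 3.14)

; A second struct created over the same address sees the same data.
Global $tSame = _RecordAt($pBuffer, 2)
ConsoleWrite(DllStructGetData($tSame, "id") & " / " & DllStructGetData($tSame, "name") & @CRLF)

; Returns a struct placed at record $iIndex (0-based) inside the buffer at $pBase.
Func _RecordAt($pBase, $iIndex)
    Return DllStructCreate($tagRECORD, Ptr($pBase + $iIndex * $iRecSize))
EndFunc
```

Strings longer than the field width are not stored intact, which is the price of fixed-width fields; the benefit is that record `i` can be located by simple arithmetic, with no string conversion of the whole array.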

Finally, re. mutexes (mutices?), I think I mentioned elsewhere that these are just system-wide flags that multiple processes can query, set, and reset. They do not block access in any way. You don't need them if you're just reading, but I would suggest you do use them if multiple processes can write to the same location and use the information therein for other actions.
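A minimal sketch of that cooperative pattern, using the standard named-mutex functions from the WinAPI UDFs (this is not the E4A test #20 script; the mutex name and timeout are arbitrary):

```autoit
#include <WinAPI.au3>
#include <WinAPIProc.au3>

; Hypothetical mutex name; every cooperating process must use the same one.
Global Const $sMutexName = "HighMemDemoMutex"

; Create the named mutex (or open it if it already exists) without owning it initially.
Global $hMutex = _WinAPI_CreateMutex($sMutexName, False)
If Not $hMutex Then Exit 1

; Wait up to 5 seconds for ownership; a return value of 0 (WAIT_OBJECT_0) means we got it.
If _WinAPI_WaitForSingleObject($hMutex, 5000) = 0 Then
    ; ... write to the shared allocation here ...
    _WinAPI_ReleaseMutex($hMutex)
Else
    ConsoleWrite("Could not obtain the mutex; skipping the write." & @CRLF)
EndIf

_WinAPI_CloseHandle($hMutex)
```

Nothing stops a process that ignores the mutex from writing to the shared memory anyway, so the protection only holds if every writer follows the same wait/write/release sequence.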

Edited by RTFC

Thanks a lot for all your explanations RTFC 

I understand mutexes, etc. In my case, it's more like what you described: creating a joint workspace where multiple processes can act simultaneously.

That's the reason I suggested the "AT". AU3, like any other language, has to allocate space for its variables, no matter if they are dynamically typed as in AU3.

Anyway, when you declare a variable, an area must be allocated to it (or when you assign a value to it).

So, if we could tell AU3 "use the area starting at this point" for the variables created in this Global or Local declaration line, that would make sharing data between processes much simpler, and as fast as plain variables. While we do not have "AT", it seems the best thing for me is to take a deeper look at your E4A to see how you handle this common workspace, and use DllStructCreate, .... Later I will worry about mutexes. Use of mutexes depends largely on how you code and how specific or generic your code is. For example, Windows has a bunch of them with names and forms like fork trees, queues, etc., etc.

I believe that AU3 manages just one "memory allocation area" for its dynamic variable creation. To implement "AT", it should manage distinct "memory allocation areas". While this does not seem so complicated, it is a deep change and probably would not justify the effort given the number of people interested in it. But, on the other hand, with parallel computing and so many processors available in any PC, it could be an improvement.


6 hours ago, LarsJ said:

joseLB, If your final goal is faster array manipulations you should also take a look at Accessing AutoIt Variables.

Thanks a lot Lars, it is one of my goals, together with having these variables live in a shared mem area. I just had a quick look at your Accessing AutoIt Variables, as I'm at work now. It seems very promising. I will go for it later, for sure!  Maybe I can change the pointer of an array variant created in AU3 to the shared area or vice versa, I mean, create an area that mimics an array and make AU3 think that's a plain AU3 array? Who knows... I need to read more of your post and the UDF...

It´s a class in AU3 internals...

Edited by joseLB

AutoIt arrays are internally stored as safearrays of variants.

All AutoIt variables are variants. An AutoIt variable that represents an array is a variant where the value of the vt field is VT_ARRAY + VT_VARIANT = 0x200C and the value of the data field is a pointer to the safearray.
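The vt value quoted above is simply two COM variant type flags combined. A tiny check, with constant names invented for this example (the numeric values are the standard VARENUM ones):

```autoit
; COM variant type flags (values from the Windows VARENUM enumeration).
Global Const $iVT_VARIANT = 0x000C   ; element type: VARIANT
Global Const $iVT_ARRAY = 0x2000     ; modifier: the data field points to a safearray

ConsoleWrite("vt = 0x" & Hex(BitOR($iVT_ARRAY, $iVT_VARIANT), 4) & @CRLF)   ; prints vt = 0x200C
```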

The essence of the example is to get a pointer to a by-value copy of the safearray. Through that pointer you can manipulate the safearray with real compiled code.

 

With AutoItObject it's possible to register a system global object that represents an AutoIt array. You can find an example here.


  • Developers

@joseLB & @LarsJ: Please continue your dialog in another thread and let's get this one back on topic.

Thanks, Jos


