Beege

MS Classic Bounce Multithreading Example with 128 threads


That's kinda cool code :thumbsup: Thanks for sharing.

 

After a long break from assembler, I'm currently trying some ASM code to speed up my TGA loader. At the moment I'm stuck on the x64 ASM code...


Please don't send me any personal message and ask for support! I will not reply!

Selection of finest graphical examples at Codepen.io

The own fart smells best!
Her 'sikim hıyar' diyene bir avuç tuz alıp koşma!
¯\_(ツ)_/¯  ٩(●̮̮̃•̃)۶ ٩(-̮̮̃-̃)۶ૐ


Thanks UEZ!

I took a look at your loader and have been catching up on the TGA format. I can't believe I had never heard of it. I love the simple header. Replacing those For loops would definitely speed things up. The parts I always get stuck on are the ones dealing with floats. Have you got the ASM code working for 32-bit yet?
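To illustrate the "simple header" remark, here is a minimal Python sketch that unpacks the fixed 18-byte TGA header. The field order follows the Truevision TGA layout as I understand it; the Python names (`TGA_HEADER`, `parse_tga_header`, the field labels) are my own and are not taken from UEZ's loader.

```python
import struct

# 18-byte little-endian TGA header. Field labels are illustrative names,
# not identifiers from the loader discussed in this thread.
TGA_HEADER = struct.Struct("<BBBHHBHHHHBB")
TGA_FIELDS = (
    "id_length", "color_map_type", "image_type",
    "cmap_first_entry", "cmap_length", "cmap_entry_size",
    "x_origin", "y_origin", "width", "height",
    "pixel_depth", "image_descriptor",
)


def parse_tga_header(data: bytes) -> dict:
    """Unpack the first 18 bytes of a TGA file into a dict."""
    if len(data) < TGA_HEADER.size:
        raise ValueError("need at least 18 bytes")
    return dict(zip(TGA_FIELDS, TGA_HEADER.unpack_from(data)))


# Hand-built header: uncompressed true-color image (type 2),
# 2789 x 3500 pixels, 16 bits per pixel.
sample = TGA_HEADER.pack(0, 0, 2, 0, 0, 0, 0, 0, 2789, 3500, 16, 0)
header = parse_tga_header(sample)
print(header["width"], header["height"], header["pixel_depth"])
```

The pixel data starts after the header plus `id_length` bytes and any color map, which is why a loader only needs these few fields to find and size the image.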

Well, for the TGA loader, loading a 32-bit image should be relatively fast, as it is a 1D loop. Only the 2D loops take a very long time for larger images.

Currently I've done the x86 ASM code for 15/16-bit images, but for x64 the code doesn't work. E.g. loading a 15-bit image with dimensions 2789x3500 takes more than 134992 ms on my machine using native AutoIt, and 37 ms with ASM -> about 3,648x faster!

This applies only to 8/15/16/24-bit images whose width is not divisible by 4.
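A small Python sketch of why the width matters. It assumes the destination bitmap pads each row to a 4-byte boundary (the usual Windows bitmap convention) while the TGA rows are tightly packed; that would match the call below passing `$iW * 2` as the source stride and a separate `$stride` for the destination. The function name `padded_stride` is hypothetical.

```python
def padded_stride(width: int, bytes_per_pixel: int, align: int = 4) -> int:
    """Bytes per row after rounding up to a multiple of `align`."""
    row_bytes = width * bytes_per_pixel
    return (row_bytes + align - 1) // align * align


# 15/16-bit pixels occupy 2 bytes, 24-bit 3 bytes, 32-bit 4 bytes.
for bytes_per_pixel in (1, 2, 3, 4):
    for width in (2788, 2789):
        packed = width * bytes_per_pixel
        padded = padded_stride(width, bytes_per_pixel)
        note = "memcpy-able" if packed == padded else "needs per-row copy"
        print(bytes_per_pixel, width, packed, padded, note)
```

When packed and padded row sizes are equal, the whole pixel block can be copied in one go; otherwise each row has to be placed at its own padded offset. With 4 bytes per pixel the two are always equal, which fits 32-bit images being the fast case.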

Here is the part of the UDF that can be replaced with the non-pro ASM code:

Case 15, 16, 24, 32 ;15/16/24/32-bit, as the bitmap format is the same we can use memcpy to copy the pixel data directly to the memory.
                            ;Exceptions are 15/16/24-bit images whose width is not divisible by 4!
            If BitOR($iPxDepth = 15, $iPxDepth = 16, $iPxDepth = 24) And Mod($iW, 4) Then
                Switch $iPxDepth
                    Case 15, 16
                        Local Const $bBinASM1516_x86 = Binary("0x5589E58B5D188B4D1C89C8F7651089452489C8F765148945285389D8D1E08B552801C203550C8B752401C60375200375088B066689024B83FB0075DE5B4983F90075C65DC22400")
                        Local $tBinASM1516_x86 = DllStructCreate("byte asm[" & BinaryLen($bBinASM1516_x86) & "]")
                        $tBinASM1516_x86.asm = $bBinASM1516_x86
                        Local $tMemVar1 = DllStructCreate("dword var"), $tMemVar2 = DllStructCreate("dword var")
                        DllCallAddress("none", DllStructGetPtr($tBinASM1516_x86), _
                                       "ptr", DllStructGetPtr($tSrcBmp), "ptr", DllStructGetPtr($tDestBmp), _
                                       "dword", $iW * 2, "dword", $stride, "dword", $iW - 1, "dword", $iH - 1, "dword", $pitch, _
                                       "ptr", DllStructGetPtr($tMemVar1), "ptr", DllStructGetPtr($tMemVar2))
                    Case 24

The ASM code:

#cs _ASM1516_x86
    use32
    ;pushad

    define tSrcBmp  dword[ebp + 08]
    define tDestBmp dword[ebp + 12]
    define strideS  dword[ebp + 16]
    define strideD  dword[ebp + 20]
    define width    dword[ebp + 24]
    define height   dword[ebp + 28]
    define pitch    dword[ebp + 32]
    define tMemVar1 dword[ebp + 36]
    define tMemVar2 dword[ebp + 40]

    push ebp
    mov ebp, esp

    mov ebx, width ;ebx = w - 1
    mov ecx, height ;ecx = h - 1

;~  _ASMDBG_()

    _y:
        mov eax, ecx
        mul strideS
        mov tMemVar1, eax

        mov eax, ecx
        mul strideD
        mov tMemVar2, eax
        push ebx
        _x:
            mov eax, ebx
            shl eax, 1

            mov edx, tMemVar2
            add edx, eax
            add edx, tDestBmp

            mov esi, tMemVar1
            add esi, eax
            add esi, pitch
            add esi, tSrcBmp

            mov eax, [esi]
            mov word[edx], ax

            dec ebx
            cmp ebx, 0
            jne _x
        pop ebx
        dec ecx
        cmp ecx, 0
        jne _y

    pop ebp
    ;popad

    ret 36
#ce _ASM1516_x86

Any idea how to convert it to a working x64 version?
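When porting, a slow reference model can help check the output of any new version byte for byte. The following Python sketch models the row-by-row 16-bit copy with separate source and destination strides; it is not a translation of the x86 routine, and `src_offset` is my reading of what the `pitch` argument does (a byte offset added to the source address).

```python
def copy_rows_16bpp(src: bytes, dst: bytearray, width: int, height: int,
                    stride_src: int, stride_dst: int,
                    src_offset: int = 0) -> None:
    """Copy 2-byte pixels row by row between buffers with different strides."""
    row_len = 2 * width
    for y in range(height):
        s = src_offset + y * stride_src
        d = y * stride_dst
        row = src[s:s + row_len]
        if len(row) != row_len or d + row_len > len(dst):
            raise ValueError("row %d falls outside a buffer" % y)
        dst[d:d + row_len] = row  # padding bytes in dst stay untouched


# Tiny example: 3 x 2 pixels, packed source rows (6 bytes),
# destination rows padded to 8 bytes.
src = bytes(range(12))
dst = bytearray(16)
copy_rows_16bpp(src, dst, width=3, height=2, stride_src=6, stride_dst=8)
print(dst.hex(" "))
```

Two things worth checking against this model: as I read the posted loops, the counters are tested after `dec` and the loops exit at zero, so x = 0 and y = 0 are never copied, whereas the model covers the full range. And for x64, as far as I know the Microsoft calling convention passes the first four arguments in RCX, RDX, R8 and R9, puts the rest on the stack above a 32-byte shadow area, and leaves stack cleanup to the caller, so the `[ebp + 8]`-style offsets and `ret 36` would not carry over unchanged.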

Edited by UEZ


  • Similar Content

    • By Beege
      Here is the latest assembly engine from Tomasz Grysztar, flat assembler g, as a DLL which I compiled using the original fasm engine. He doesn't have it compiled in the download package, but it was as easy as compiling an exe in AutoIt if you ever have to do it yourself. Just open up the file in the fasm editor and press F5.
      You can read about what makes fasmg different from the original fasm HERE if you want. The minimum you should understand is that this engine is bare bones and by itself not capable of very much. The macro engine is the major difference: it now uses macros for basically everything, including implementing the x86 instructions and formats. All of these macros are located in the include folder, and you should keep that folder in its original form.
      When I first got the DLL compiled I couldn't get it to generate code in flat binary format. It was working, but the output was over 300 bytes no matter what the assembly code was, so I could tell it was emitting a format other than binary. Eventually I figured out that the primary "include\win32ax.inc" executes a macro "format PE GUI 4.0" if x86 has not been defined. I stress the word macro there because at first I (wasted shit loads of time because I) didn't realize it was a macro (adding a bunch of other includes), since in version 1 the statement "format binary" was the default if not specified, and it specifically means add nothing extra to the code. Long story short, the part I was missing was including the CPU type and extensions from the include\cpu folder. By default I add the x64 type and SSE4 extension includes. Note that x64 here is not about what mode we are running in; it is about which instructions your CPU supports. If you are running on some really old hardware, that may need to be adjusted, and if you are using more advanced instructions like the AVX extensions, you may have to add those includes to your source.
      Differences from previous dll function
      I like the error reporting much better in this one. With the last one we had a ton of error codes and a return structure that varied depending on the kind of error. I even had an example showing which kinds of error would give you correct line numbers and which wouldn't. With this one, stdout is passed to the DLL function and it simply prints the line and details it had a problem with to the console. The return value is the number of errors counted.
      It also handles its own memory needs automatically now. If the output region is not big enough, it will VirtualAlloc a new one and VirtualFree the previous one.
      Differences in Code
      Earlier this year I showed some examples of how to use the macros to make writing assembly a little more familiar. Almost all the same functionality exists here, but a couple of syntax sugar items are gone and there are slight changes in other areas.
      What's gone is FIX and PTR. Both are syntax sugar that doesn't really matter.
      There are a couple of changes to structures as well, but these are for the better. One is that unnamed elements are allowed now, but an element without a name cannot be initialized during creation, because elements can only be initialized via the syntax name:value. Previously you initialized the elements by specifying values in a comma-separated list in a specific order, like value1,value2,etc. That was a problem because it expected commas even when the elements were just padding for alignment, so having to specify the name works out better and removes the need for the _FasmFixInit function. "<" and ">" are no longer used in the initializers either.
      OLD:
      $sTag = 'byte x;short y;char sNote[13];long odd[5];word w;dword p;char ext[3];word finish'
      _(_FasmAu3StructDef('AU3TEST', $sTag)) ;convert and add definition to source
      _(' tTest AU3TEST ' & _FasmFixInit('1,222,<"AutoItFASM",0>,<41,43,43,44,45>,6,7,"au3",12345', $sTag)) ;create and initialize
      New:
      $sTag = 'byte x;short y;char sNote[13];long odd[5];word w;dword p;char ext[3];word finish'
      _(_fasmg_Au3StructDef('AU3TEST', $sTag)) ;convert and add definition to source
      _(' tTest AU3TEST x:11,y:22,sNote:"AutoItFASM",odd:41,odd+4:42,odd+8:43,w:6,p:7,ext:"au3",finish:12345') ;create and initialize
      Extra Includes
      I created an includeEx folder for the extra macros I wrote or found on the forums. Most of them are written by Tomasz, so they may eventually end up in the standard library.
       Align.inc, Nop.inc, Listing.inc
      The Align and Nop macros work together to align the next statement to whatever boundary you specify, using multibyte NOP codes to fill the space. Filling the space with NOPs is the default, but you can also specify a fill value if you want. Align.assume is another macro in align.inc that tells the engine that a certain starting point is assumed to be at a certain boundary alignment, and it will do its align calculations based on that value.
      Listing is a macro that is great for seeing where and what opcodes are generated from each line of assembly code. Below is an example of the source and the output you would see printed to the console during assembly. I picked this slightly longer example because it best shows the use of align and nop, and then the use of listing to verify the align/nop code. NOP codes are instructions that do nothing, and one use for them is as space fillers when you want a certain portion of your code to land on a specific boundary offset. I don't know all the best practices here (if you do, please post!), but it's a type of optimization for the CPU. Because NOPs do nothing, I can't just run the code and confirm it's correct because it didn't crash. I need to look at what opcodes the align statements actually produced, and listing made that easy.
      source example:
      _('procf _main stdcall, pAdd')
      _(' mov eax, [pAdd]')
      _(' mov dword[eax], _crc32')
      _(' mov dword[eax+4], _strlen')
      _(' mov dword[eax+8], _strcmp')
      _(' mov dword[eax+12], _strstr')
      _(' ret')
      _('endp')
      _('EQUAL_ORDERED = 1100b')
      _('EQUAL_ANY = 0000b')
      _('EQUAL_EACH = 1000b')
      _('RANGES = 0100b')
      _('NEGATIVE_POLARITY = 010000b')
      _('BYTE_MASK = 1000000b')
      _('align 8')
      _('proc _crc32 uses ebx ecx esi, pStr')
      _(' mov esi, [pStr]')
      _(' xor ebx, ebx')
      _(' not ebx')
      _(' stdcall _strlen, esi')
      _(' .while eax >= 4')
      _(' crc32 ebx, dword[esi]')
      _(' add esi, 4')
      _(' sub eax, 4')
      _(' .endw')
      _(' .while eax')
      _(' crc32 ebx, byte[esi]')
      _(' inc esi')
      _(' dec eax')
      _(' .endw')
      _(' not ebx')
      _(' mov eax, ebx')
      _(' ret')
      _('endp')
      _('align 8, 0xCC') ; fill with 0xCC instead of NOP
      _('proc _strlen uses ecx edx, pStr')
      _(' mov ecx, [pStr]')
      _(' mov edx, ecx')
      _(' mov eax, -16')
      _(' pxor xmm0, xmm0')
      _(' .repeat')
      _(' add eax, 16')
      _(' pcmpistri xmm0, dqword[edx + eax], 1000b') ;EQUAL_EACH
      _(' .until ZERO?') ; repeat loop until Zero flag (ZF) is set
      _(' add eax, ecx') ; add remainder
      _(' ret')
      _('endp')
      _('align 8')
      _('proc _strcmp uses ebx ecx edx, pStr1, pStr2') ; ecx = string1, edx = string2
      _(' mov ecx, [pStr1]') ; ecx = start address of str1
      _(' mov edx, [pStr2]') ; edx = start address of str2
      _(' mov eax, ecx') ; eax = start address of str1
      _(' sub eax, edx') ; eax = ecx - edx | eax = start address of str1 - start address of str2
      _(' sub edx, 16')
      _(' mov ebx, -16')
      _(' STRCMP_LOOP:')
      _(' add ebx, 16')
      _(' add edx, 16')
      _(' movdqu xmm0, dqword[edx]')
      _(' pcmpistri xmm0, dqword[edx + eax], EQUAL_EACH + NEGATIVE_POLARITY') ; find the first *different* bytes, hence negative polarity
      _(' ja STRCMP_LOOP') ;a CF or ZF = 0 above
      _(' jc STRCMP_DIFF') ;c cf=1 carry
      _(' xor eax, eax') ; the strings are equal
      _(' ret')
      _(' STRCMP_DIFF:')
      _(' mov eax, ebx')
      _(' add eax, ecx')
      _(' ret')
      _('endp')
      _('align 8')
      _('proc _strstr uses ecx edx edi esi, sStrToSearch, sStrToFind')
      _(' mov ecx, [sStrToSearch]')
      _(' mov edx, [sStrToFind]')
      _(' pxor xmm2, xmm2')
      _(' movdqu xmm2, dqword[edx]') ; load the first 16 bytes of needle
      _(' pxor xmm3, xmm3')
      _(' lea eax, [ecx - 16]')
      _(' STRSTR_MAIN_LOOP:') ; find the first possible match of 16-byte fragment in haystack
      _(' add eax, 16')
      _(' pcmpistri xmm2, dqword[eax], EQUAL_ORDERED')
      _(' ja STRSTR_MAIN_LOOP')
      _(' jnc STRSTR_NOT_FOUND')
      _(' add eax, ecx ') ; save the possible match start
      _(' mov edi, edx')
      _(' mov esi, eax')
      _(' sub edi, esi')
      _(' sub esi, 16')
      _(' @@:') ; compare the strings
      _(' add esi, 16')
      _(' movdqu xmm1, dqword[esi + edi]')
      _(' pcmpistrm xmm3, xmm1, EQUAL_EACH + NEGATIVE_POLARITY + BYTE_MASK') ; mask out invalid bytes in the haystack
      _(' movdqu xmm4, dqword[esi]')
      _(' pand xmm4, xmm0')
      _(' pcmpistri xmm1, xmm4, EQUAL_EACH + NEGATIVE_POLARITY')
      _(' ja @b')
      _(' jnc STRSTR_FOUND')
      _(' sub eax, 15') ;continue searching from the next byte
      _(' jmp STRSTR_MAIN_LOOP')
      _(' STRSTR_NOT_FOUND:')
      _(' xor eax, eax')
      _(' ret')
      _(' STRSTR_FOUND:')
      _(' sub eax, [sStrToSearch]')
      _(' inc eax')
      _(' ret')
      _('endp')
      Listing Output:
      00000000: use32
      00000000: 55 89 E5 procf _main stdcall, pAdd
      00000003: 8B 45 08 mov eax, [pAdd]
      00000006: C7 00 28 00 00 00 mov dword[eax], _crc32
      0000000C: C7 40 04 68 00 00 00 mov dword[eax+4], _strlen
      00000013: C7 40 08 90 00 00 00 mov dword[eax+8], _strcmp
      0000001A: C7 40 0C D8 00 00 00 mov dword[eax+12], _strstr
      00000021: C9 C2 04 00 ret
      00000025: localbytes = current
      00000025: purge ret?,locals?,endl?,proclocal?
      00000025: end namespace
      00000025: purge endp?
      00000025: EQUAL_ORDERED = 1100b
      00000025: EQUAL_ANY = 0000b
      00000025: EQUAL_EACH = 1000b
      00000025: RANGES = 0100b
      00000025: NEGATIVE_POLARITY = 010000b
      00000025: BYTE_MASK = 1000000b
      00000025: 0F 1F 00 align 8
      00000028: 55 89 E5 53 51 56 proc _crc32 uses ebx ecx esi, pStr
      0000002E: 8B 75 08 mov esi, [pStr]
      00000031: 31 DB xor ebx, ebx
      00000033: F7 D3 not ebx
      00000035: 56 E8 2D 00 00 00 stdcall _strlen, esi
      0000003B: 83 F8 04 72 0D .while eax >= 4
      00000040: F2 0F 38 F1 1E crc32 ebx, dword[esi]
      00000045: 83 C6 04 add esi, 4
      00000048: 83 E8 04 sub eax, 4
      0000004B: EB EE .endw
      0000004D: 85 C0 74 09 .while eax
      00000051: F2 0F 38 F0 1E crc32 ebx, byte[esi]
      00000056: 46 inc esi
      00000057: 48 dec eax
      00000058: EB F3 .endw
      0000005A: F7 D3 not ebx
      0000005C: 89 D8 mov eax, ebx
      0000005E: 5E 59 5B C9 C2 04 00 ret
      00000065: localbytes = current
      00000065: purge ret?,locals?,endl?,proclocal?
      00000065: end namespace
      00000065: purge endp?
      00000065: CC CC CC align 8, 0xCC
      00000068: 55 89 E5 51 52 proc _strlen uses ecx edx, pStr
      0000006D: 8B 4D 08 mov ecx, [pStr]
      00000070: 89 CA mov edx, ecx
      00000072: B8 F0 FF FF FF mov eax, -16
      00000077: 66 0F EF C0 pxor xmm0, xmm0
      0000007B: .repeat
      0000007B: 83 C0 10 add eax, 16
      0000007E: 66 0F 3A 63 04 02 08 pcmpistri xmm0, dqword[edx + eax], 1000b
      00000085: 75 F4 .until ZERO?
      00000087: 01 C8 add eax, ecx
      00000089: 5A 59 C9 C2 04 00 ret
      0000008F: localbytes = current
      0000008F: purge ret?,locals?,endl?,proclocal?
      0000008F: end namespace
      0000008F: purge endp?
      0000008F: 90 align 8
      00000090: 55 89 E5 53 51 52 proc _strcmp uses ebx ecx edx, pStr1, pStr2
      00000096: 8B 4D 08 mov ecx, [pStr1]
      00000099: 8B 55 0C mov edx, [pStr2]
      0000009C: 89 C8 mov eax, ecx
      0000009E: 29 D0 sub eax, edx
      000000A0: 83 EA 10 sub edx, 16
      000000A3: BB F0 FF FF FF mov ebx, -16
      000000A8: STRCMP_LOOP:
      000000A8: 83 C3 10 add ebx, 16
      000000AB: 83 C2 10 add edx, 16
      000000AE: F3 0F 6F 02 movdqu xmm0, dqword[edx]
      000000B2: 66 0F 3A 63 04 02 18 pcmpistri xmm0, dqword[edx + eax], EQUAL_EACH + NEGATIVE_POLARITY
      000000B9: 77 ED ja STRCMP_LOOP
      000000BB: 72 09 jc STRCMP_DIFF
      000000BD: 31 C0 xor eax, eax
      000000BF: 5A 59 5B C9 C2 08 00 ret
      000000C6: STRCMP_DIFF:
      000000C6: 89 D8 mov eax, ebx
      000000C8: 01 C8 add eax, ecx
      000000CA: 5A 59 5B C9 C2 08 00 ret
      000000D1: localbytes = current
      000000D1: purge ret?,locals?,endl?,proclocal?
      000000D1: end namespace
      000000D1: purge endp?
      000000D1: 0F 1F 80 00 00 00 00 align 8
      000000D8: 55 89 E5 51 52 57 56 proc _strstr uses ecx edx edi esi, sStrToSearch, sStrToFind
      000000DF: 8B 4D 08 mov ecx, [sStrToSearch]
      000000E2: 8B 55 0C mov edx, [sStrToFind]
      000000E5: 66 0F EF D2 pxor xmm2, xmm2
      000000E9: F3 0F 6F 12 movdqu xmm2, dqword[edx]
      000000ED: 66 0F EF DB pxor xmm3, xmm3
      000000F1: 8D 41 F0 lea eax, [ecx - 16]
      000000F4: STRSTR_MAIN_LOOP:
      000000F4: 83 C0 10 add eax, 16
      000000F7: 66 0F 3A 63 10 0C pcmpistri xmm2, dqword[eax], EQUAL_ORDERED
      000000FD: 77 F5 ja STRSTR_MAIN_LOOP
      000000FF: 73 30 jnc STRSTR_NOT_FOUND
      00000101: 01 C8 add eax, ecx
      00000103: 89 D7 mov edi, edx
      00000105: 89 C6 mov esi, eax
      00000107: 29 F7 sub edi, esi
      00000109: 83 EE 10 sub esi, 16
      0000010C: @@:
      0000010C: 83 C6 10 add esi, 16
      0000010F: F3 0F 6F 0C 3E movdqu xmm1, dqword[esi + edi]
      00000114: 66 0F 3A 62 D9 58 pcmpistrm xmm3, xmm1, EQUAL_EACH + NEGATIVE_POLARITY + BYTE_MASK
      0000011A: F3 0F 6F 26 movdqu xmm4, dqword[esi]
      0000011E: 66 0F DB E0 pand xmm4, xmm0
      00000122: 66 0F 3A 63 CC 18 pcmpistri xmm1, xmm4, EQUAL_EACH + NEGATIVE_POLARITY
      00000128: 77 E2 ja @b
      0000012A: 73 0F jnc STRSTR_FOUND
      0000012C: 83 E8 0F sub eax, 15
      0000012F: EB C3 jmp STRSTR_MAIN_LOOP
      00000131: STRSTR_NOT_FOUND:
      00000131: 31 C0 xor eax, eax
      00000133: 5E 5F 5A 59 C9 C2 08 00 ret
      0000013B: STRSTR_FOUND:
      0000013B: 2B 45 08 sub eax, [sStrToSearch]
      0000013E: 40 inc eax
      0000013F: 5E 5F 5A 59 C9 C2 08 00 ret
      00000147: localbytes = current
      00000147: purge ret?,locals?,endl?,proclocal?
      00000147: end namespace
      00000147: purge endp?
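The fill sizes in the listing can be cross-checked with a little arithmetic. This Python sketch computes how many padding bytes are needed to reach the next boundary from a given offset; the offsets are the ones at which each `align 8` appears in the listing, and the function name `align_padding` is my own.

```python
def align_padding(offset: int, boundary: int) -> int:
    """Number of fill bytes needed so that offset + padding is a multiple of boundary."""
    return (-offset) % boundary


# Offsets where `align 8` appears in the listing, with the fill bytes shown there.
listing = {
    0x25: "0F 1F 00",              # 3-byte NOP
    0x65: "CC CC CC",              # align 8, 0xCC
    0x8F: "90",                    # 1-byte NOP
    0xD1: "0F 1F 80 00 00 00 00",  # 7-byte NOP
}
for offset, fill in listing.items():
    emitted = len(fill.split())
    print(hex(offset), align_padding(offset, 8), emitted)
```

In each case the computed padding equals the number of fill bytes emitted, and the next procedure starts on an 8-byte boundary (0x28, 0x68, 0x90, 0xD8).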
      procf and forcea macros
      In my previous post I spoke about the force macro and why it is needed. I added two more macros (procf and forcea) that combine the two and also set align.assume to the same function. As clarified in the previous post, you should only have to use these macros for the first procedure being defined (since nothing calls that procedure). And since it's the first function, it should be at the starting memory address, which is a good place to initially set the align.assume address.
      The attached package should include everything needed and has all the previous examples I posted, updated. Let me know if I missed something or you have any issues running the examples, and thanks for looking.
      fasmg 10-26-2019.zip
    • By Beege
      Here's a function for searching for a bitmap within another bitmap. The heart of it is written in assembly (source included) and works pretty quickly, I feel. I have included an example which is pretty basic and should make it easy enough for anyone to get the concept.
      You will be given a small blue window that will take a screencapture of that size:

       
      It will then take a full screenshot and highlight all locations that it found

      Please let me know if you have any issues or questions. Thanks!
       
      Update 8/5/2019:
      Rewrote for fasmg. Added full source with everything needed to modify
      BmpSearch_8-5-2019.7z
      BmpSearch.zip
       
      GAMERS - Asking for help with ANY kind of game automation is against the forum rules. DON'T DO IT.
    • By Beege
      Years ago I tried to put some functionality together to do some of this here. I started off in the right direction, but it ended up getting out of control. Any new thing I learned along the way (as I was creating it), I kept trying to add in, and it all became a mess. One of my primary goals was to make sure the code could always be pre-compiled and still run. That part did work and I was able to create a couple of good projects with it, but there are still a lot of parts I wouldn't consider correct now, and certainly not manageable.
      Here is a redo of what I was going for there, only this time I'm not going to be generating any of the assembly code. That's all going to be done using the macro engine already built into fasm.dll and the macros written by Tomasz Grysztar (creator of fasm), so this time I don't have to worry about any of the code that gets generated. I'm not going to touch the source at all. In fact, there is not even going to be a _fasmadd or global variables tracking anything. None of that is needed with the added basic and extended headers, which you can read more about in the fasm documentation. You can use almost everything in the documentation sections on basic/extended headers, but ignore the parts about imports, exports, resources, and text encoding. They don't really apply here.
      Here are examples I came up with that cover a lot of the core functionality needed to write assembly code in a manner you already know: if/while with multiple conditional logic statements, multiple functions, local variables, global variables, structures, COM interfaces, strings as parameters, and nested function calls. These are all things you don't even have to think about when you're doing it in AutoIt, and I'm hoping this helps bring some of that same comfort to fasm.
      These 3 simple callback functions will be used throughout the examples:
      Global $gConsoleWriteCB = DllCallbackRegister('_ConsoleWriteCB', 'dword', 'str;dword'), $gpConsoleWriteCB = DllCallbackGetPtr($gConsoleWriteCB)
      Global $gDisplayStructCB = DllCallbackRegister('_DisplayStructCB', 'dword', 'ptr;str'), $gpDisplayStructCB = DllCallbackGetPtr($gDisplayStructCB)
      Global $gSleepCB = DllCallbackRegister('_SleepCB', 'dword', 'dword'), $gpSleepCB = DllCallbackGetPtr($gSleepCB)

      Func _ConsoleWriteCB($sMsg, $iVal)
          ConsoleWrite($sMsg & $iVal & @CRLF)
      EndFunc ;==>_ConsoleWriteCB

      Func _DisplayStructCB($pStruct, $sStr)
          _WinAPI_DisplayStruct(DllStructCreate($sStr, $pStruct), $sStr, 'def=' & $sStr)
      EndFunc ;==>_DisplayStructCB

      Func _SleepCB($iSleep)
          Sleep($iSleep)
      EndFunc ;==>_SleepCB
      proc/endp - like Func and EndFunc with some extra options. The "uses" statement will preserve the registers specified. stdcall is the default call type if not specified. DWORD is the default parameter size if not specified. The ret value is also handled for you, so you don't have to worry about adjusting a number every time you add an extra parameter. In fact, you never have to specify or touch ebp/esp at all with these macros. See Basic headers -> procedures for a full description.
      force - just a macro I added that creates an anonymous label for the first/primary function to ensure the code gets generated. The problem we are getting around is this: in our example, _main is never actually called anywhere within the fasm code, and the fasm engine detects that and thinks the code is doing nothing. Because of that, it wants to skip generating that code and all code called by it, leaving you with nothing. This is actually a great feature, but we obviously want to make an exception for our main/initial/primary function that starts it all off, so that's all this does.
      Func _Ex_Proc()
          $g_sFasm = ''
          _('force _main')
          _('proc _main uses ebx, parm1, parm2') ; _('proc _main stdcall uses ebx, parm1:DWORD, parm2:DWORD'); full statement
          _(' mov ebx, [parm1]')
          _(' add ebx, [parm2]')
          _(' mov eax, ebx')
          _(' ret')
          _('endp')

          Local $tBinary = _FasmAssemble($g_sFasm)
          If @error Then Exit (ConsoleWrite($tBinary & @CRLF))

          Local $iAdd = DllCallAddress('dword', DllStructGetPtr($tBinary), 'dword', 5, 'dword', 5)
          ConsoleWrite('Parm1+Parm2=' & $iAdd[0] & @CRLF)
      EndFunc ;==>_Ex_Proc
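The "ret value is handled for you" point refers to the number a stdcall function hands to `ret` so that it pops its own parameters. Here is a small Python sketch of that arithmetic, assuming 4-byte (DWORD) parameters on x86; the helper names are hypothetical and this is not how the proc macro itself is implemented.

```python
def stdcall_ret_bytes(num_params: int, param_size: int = 4) -> int:
    """Bytes a stdcall callee pops on return: one slot per parameter."""
    return num_params * param_size


def encode_ret(pop_bytes: int) -> bytes:
    """x86 near return: C3 for plain ret, C2 imm16 (little-endian) for ret n."""
    if pop_bytes == 0:
        return b"\xC3"
    return b"\xC2" + pop_bytes.to_bytes(2, "little")


for params in (1, 2, 9):
    n = stdcall_ret_bytes(params)
    print(params, n, encode_ret(n).hex(" ").upper())
```

These match bytes quoted elsewhere on this page: the one-parameter procedures in the listing end in C2 04 00, the two-parameter ones in C2 08 00, and the hand-written nine-parameter TGA routine ends in `ret 36` (C2 24 00).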
          Here I'm showing you a call to the _ConsoleWriteCB AutoIt function we set up as a callback. It's how you would call any AutoIt function from fasm.
          Strings - Notice I'm creating and passing the "edx = " string to the function on the fly. So helpful!
          invoke - same as a stdcall with brackets []. Use this when calling AutoIt functions.
       
      Func _Ex_Callback()
          $g_sFasm = ''
          _('force _main')
          _('proc _main, pConsoleWriteCB, parm1, parm2')
          _(' mov edx, [parm1]')
          _(' add edx, [parm2]')
          _(' invoke pConsoleWriteCB, "edx = ", edx') ;
          ;~ _(' stdcall [pConsoleWriteCB], "edx = ", edx') ; same as invoke
          _(' ret')
          _('endp')

          Local $tBinary = _FasmAssemble($g_sFasm)
          If @error Then Exit (ConsoleWrite($tBinary & @CRLF))

          DllCallAddress('ptr', DllStructGetPtr($tBinary), 'ptr', $gpConsoleWriteCB, 'dword', 5, 'dword', 5)
      EndFunc ;==>_Ex_Callback
      Showing .while/.endw and .if/.elseif/.else/.endif usage. .repeat/.until are also macros you can use. See Extended Headers -> Structuring the source. Ignore .code, .data, .end - those are more for a full exe.
      invokepcd/invokepd - these are macros I added that are the same as invoke but preserve (push/pop) ECX, or both ECX and EDX, during the call. Below is also a good example of what can happen when you don't preserve registers that are caller-saved (us calling the function) vs callee-saved (us creating the function). EAX, ECX, and EDX are all caller-saved, so when we call another function like the AutoIt callback _ConsoleWriteCB, those registers could hold very different values from what was in them before the call. The function below should do at least two loops, but it doesn't (at least on my PC) without preserving ECX, because ECX is no longer zero when the function returns.
      Keep the same thought in mind for registers EBX, ESI, and EDI when you are creating assembly functions (callee-saved). If your function uses those registers, you need to preserve and restore them before your code returns to AutoIt, or else you could cause a similar effect in AutoIt. "Trashing" registers is a term I've seen used a lot when referring to these kinds of mistakes.
      Func _Ex_IfElseWhile()
          $g_sFasm = ''
          _('force _main')
          _('proc _main uses ebx, pConsoleWriteCB')
          _(' xor edx, edx') ; edx=0
          _(' mov eax, 99') ;
          _(' mov ebx, 10')
          _(' xor ecx, ecx') ; ecx=0
          _(' .while ecx = 0')
          _(' .if eax<=100 & ( ecx | edx )') ; not true on first loop
          _(' inc ebx')
          _(' invokepcd pConsoleWriteCB, "Something True - ebx=", ebx')
          _(' ret')
          _(' .elseif eax < 99') ; Just showing you the elseif statement
          _(' inc ebx')
          _(' .else')
          ;~ _(' invokepcd pConsoleWriteCB, "Nothing True - ebx=", ebx') ; comment this and uncomment the line below
          _(' invoke pConsoleWriteCB, "Nothing True - ebx=", ebx')
          _(' inc edx') ; this will make next loop true
          _(' .endif')
          _(' .endw')
          _(' ret')
          _('endp')

          Local $tBinary = _FasmAssemble($g_sFasm)
          If @error Then Exit (ConsoleWrite($tBinary & @CRLF))

          DllCallAddress('dword', DllStructGetPtr($tBinary), 'ptr', $gpConsoleWriteCB)
      EndFunc ;==>_Ex_IfElseWhile
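The caller-saved problem can be shown with a toy model in Python. This is only an illustration under loud assumptions: registers are dict entries, the "callback" deliberately overwrites EAX, ECX and EDX the way an external function is allowed to, and `invoke_preserving` stands in for the push/pop that a preserving invoke would add. None of these names come from the fasm macros.

```python
def clobbering_callback(regs: dict) -> None:
    """Stand-in for an external call that is free to overwrite EAX, ECX, EDX."""
    regs["eax"] = regs["ecx"] = regs["edx"] = 0xDEADBEEF


def invoke(regs: dict, func) -> None:
    """Plain call: whatever the callee does to the registers sticks."""
    func(regs)


def invoke_preserving(regs: dict, func, saved=("ecx", "edx")) -> None:
    """Call with push/pop around it, restoring the saved registers afterwards."""
    stack = [regs[name] for name in saved]   # push ecx / push edx
    func(regs)
    for name in reversed(saved):             # pop edx / pop ecx
        regs[name] = stack.pop()


def run_loop(call) -> int:
    """Simplified loop: keep going while ecx is 0, stop once edx is set."""
    regs = {"eax": 0, "ecx": 0, "edx": 0}
    passes = 0
    while regs["ecx"] == 0:                  # .while ecx = 0
        passes += 1
        if regs["edx"]:                      # true on the second pass
            break
        call(regs, clobbering_callback)
        regs["edx"] += 1                     # inc edx
    return passes


print("plain invoke:", run_loop(invoke))
print("preserving invoke:", run_loop(invoke_preserving))
```

With the plain call the loop ends after one pass because ECX comes back non-zero; with the preserving call it reaches the second pass, which is the behaviour described above.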
          Sub functions: You already understand this. Not really "sub", it's just another function you call. And those functions call other functions, and so on.
          fix: syntax sugar - Look how easy it was to replace the invoke statement with our actual AutoIt function name.
          ptr: more sugar - same thing as using brackets [parm1].
          Nesting: In subfunc1 we pass the results of two function calls to the same function we are calling.
      Func _Ex_SubProc()
          $g_sFasm = ''
          ;replace all '_ConsoleWriteCB' statements with 'invoke pConsoleWriteCB' before* assembly
          _('_ConsoleWriteCB fix invoke pConsoleWriteCB')
          _('force _main')
          _('proc _main uses ebx, pConsoleWriteCB, parm1, parm2')
          _(' mov ebx, [parm1]')
          _(' add ebx, [parm2]')
          _(' _ConsoleWriteCB, "ebx start = ", ebx')
          _(' stdcall _subfunc1, [pConsoleWriteCB], [parm1], [parm2]')
          _(' _ConsoleWriteCB, "ebx end = ", ebx')
          _(' ret')
          _('endp')
          ;
          _('proc _subfunc1 uses ebx, pConsoleWriteCB, parm1, parm2')
          _(' mov ebx, [parm1]')
          _(' _ConsoleWriteCB, " subfunc1 ebx start = ", ebx')
          _(' stdcall _SubfuncAdd, <stdcall _SubfuncAdd, [parm1], [parm2]>, <stdcall _SubfuncAdd, ptr parm1, ptr parm2>') ; Nesting functions
          _(' _ConsoleWriteCB, " _SubfuncAdd nested <5+5><5+5> = ", eax')
          _(' _ConsoleWriteCB, " subfunc1 ebx end = ", ebx')
          _(' ret')
          _('endp')
          ;
          _('proc _SubfuncAdd uses ebx, parm1, parm2')
          _(' mov ebx, [parm1]')
          _(' add ebx, [parm2]')
          _(' mov eax, ebx')
          _(' ret')
          _('endp')

          Local $tBinary = _FasmAssemble($g_sFasm)
          If @error Then Exit (ConsoleWrite($tBinary & @CRLF))

          DllCallAddress('dword', DllStructGetPtr($tBinary), 'ptr', $gpConsoleWriteCB, 'dword', 5, 'dword', 5)
      EndFunc ;==>_Ex_SubProc
      This demonstrates the struct macro. See Basic headers -> Structures for more info.
      _FasmAu3StructDef will create an equivalently formatted structure definition. All elements already have a sizeof.#name created internally. So in this example sizeof.AUTSTRUCT.x would equal 4 (one dword) and sizeof.AUTSTRUCT.z would equal 16 (2*8). I have added an additional one, sot.#name (size of type), for any array that gets created. Below is the source of what gets generated from 'dword x;dword y;short z[8]'. Also, don't get confused: in fasm data definitions, the d is for data, as in db (data byte) or dw (data word), not double as in AutoIt's dword (double word). See intro -> assembly syntax -> data definitions.
         
      struct AUTSTRUCT
          x dd ?
          y dd ?
          z dw 8 dup ?
      ends
      define sot.AUTSTRUCT.z 2

      Func _Ex_AutDllStruct()
          $g_sFasm = ''
          Local Const $sTag = 'dword x;dword y;short z[8]'
          _(_FasmAu3StructDef('AUTSTRUCT', $sTag))
          _('force _main')
          _('proc _main uses ebx, pDisplayStructCB, pAutStruct')
          _(' mov ebx, [pAutStruct]') ; place address of autoit structure in ebx
          _(' mov [ebx+AUTSTRUCT.x], 1234')
          _(' mov [ebx+AUTSTRUCT.y], 4321')
          _(' xor edx, edx')
          _(' mov ecx, 5') ; setup ecx for loop instruction
          _(' Next_Z_Index:') ; set elements 1-6 (0-5 here in fasm)
          _(' mov [ebx+AUTSTRUCT.z+(sot.AUTSTRUCT.z*ecx)], cx') ; cx
          _(' loop Next_Z_Index')
          _(' invoke pDisplayStructCB, [pAutStruct], "' & $sTag & '"')
          _(' mov [ebx+AUTSTRUCT.z+(sot.AUTSTRUCT.z*6)], 666')
          _(' mov [ebx+AUTSTRUCT.z+(sot.AUTSTRUCT.z*7)], 777')
          _(' ret')
          _('endp')

          Local $tBinary = _FasmAssemble($g_sFasm)
          If @error Then Exit (ConsoleWrite($tBinary & @CRLF))

          Local $tAutStruct = DllStructCreate($sTag)
          DllCallAddress('ptr', DllStructGetPtr($tBinary), 'ptr', $gpDisplayStructCB, 'struct*', $tAutStruct)
          _WinAPI_DisplayStruct($tAutStruct, $sTag)
      EndFunc ;==>_Ex_AutDllStruct
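The offset arithmetic in `[ebx+AUTSTRUCT.z+(sot.AUTSTRUCT.z*6)]` can be mirrored with Python's ctypes. This is a sketch under the assumption that 'dword x;dword y;short z[8]' maps to two 32-bit fields followed by eight 16-bit fields with no padding; the class name `AutStruct` and the helper `z_element_offset` are mine.

```python
import ctypes
import struct


class AutStruct(ctypes.Structure):
    # Assumed equivalent of the tag 'dword x;dword y;short z[8]'.
    _fields_ = [
        ("x", ctypes.c_uint32),
        ("y", ctypes.c_uint32),
        ("z", ctypes.c_int16 * 8),
    ]


SIZEOF_X = AutStruct.x.size               # counterpart of sizeof.AUTSTRUCT.x
SIZEOF_Z = AutStruct.z.size               # counterpart of sizeof.AUTSTRUCT.z
SOT_Z = ctypes.sizeof(ctypes.c_int16)     # counterpart of sot.AUTSTRUCT.z


def z_element_offset(index: int) -> int:
    """Byte offset of z[index] from the start of the structure."""
    return AutStruct.z.offset + SOT_Z * index


# Write z[6] through raw offsets, then read it back through the typed view.
buf = bytearray(ctypes.sizeof(AutStruct))
struct.pack_into("=h", buf, z_element_offset(6), 666)
view = AutStruct.from_buffer(buf)
print(SIZEOF_X, SIZEOF_Z, SOT_Z, z_element_offset(6), view.z[6])
```

The base-plus-offset-plus-index-times-element-size form is the same calculation the assembly performs by hand.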
      Here I show the locals/endl macros for creating local variables. See Basic headers -> procedures. We create a local string and the same DLL structure as above. Notice that you can initialize all the values of the structure on creation. There is a catch to this, though, that I will show you in the next example.
      addr macro - This will perform the LEA instruction in EDX and then push the address onto the stack. This is awesome; just remember it uses EDX to do that and does not preserve it. You'll pretty much want to use it for any local variables you are passing around.
      Edit: I shouldn't say things like that so casually. Use the addr macro as much as you want, but remember that it adds a couple of extra instructions each time you use it, so if you're calling invoke within a loop and ultimate performance is one of your goals, you should probably perform the LEA instruction before the loop and save the pointer to a separate variable that you would then use in the loop.
      Func _Ex_LocalVarsStruct()
          $g_sFasm = ''
          Local Const $sTag = 'dword x;dword y;short z[8]'
          _(_FasmAu3StructDef('POINT', $sTag))
          _('force _main')
          _('proc _main, pDisplayStructCB')
          _(' locals')
          _(' sTAG db "' & $sTag & '", 0') ; define local string. the ', 0' at the end is to terminate the string.
          _(' tPoint POINT 1,2,<0,1,2,3,4,5,6,7>') ; initialize values in struct
          _(' endl')
          _(' invoke pDisplayStructCB, addr tPoint, addr sTAG')
          _(' mov [tPoint+POINT.x], 4321')
          _(' mov [tPoint+POINT.z+sot.POINT.z*2], 678')
          _(' invoke pDisplayStructCB, addr tPoint, addr sTAG')
          _(' ret')
          _('endp')

          Local $tBinary = _FasmAssemble($g_sFasm)
          If @error Then Exit (ConsoleWrite($tBinary & @CRLF))

          Local $ret = DllCallAddress('ptr', DllStructGetPtr($tBinary), 'ptr', $gpDisplayStructCB)
      EndFunc ;==>_Ex_LocalVarsStruct
      Back to the catch. Alignment is the problem here, but only with the initializers. I'm handling all the alignment, so you don't have to worry about it when creating structures that need alignment, only when you use the one-line initializer in locals. The problem comes from the extra padding that is defined to handle the alignment: fasm doesn't know it's just padding, so without extra commas in the initializer statement, your data ends up in the padding or it simply fails. _FasmFixInit will throw in the extra commas needed to skip the padding.
      Func _Ex_LocalVarStructEx()
          $g_sFasm = ''
          $sTag = 'byte x;short y;char sNote[13];long odd[5];word w;dword p;char ext[3];word finish'
          _(_FasmAu3StructDef('POINT', $sTag))
          _('force _main')
          _('proc _main, pDisplayStructCB')
          _(' locals')
          _(' tPoint POINT ' & _FasmFixInit('1,222,<"AutoItFASM",0>,<41,43,43,44,45>,6,7,"au3",12345', $sTag))
          _(' endl')
          _(' invoke pDisplayStructCB, addr tPoint, "' & $sTag & '"')
          _(' ret')
          _('endp')
          Local $tBinary = _FasmAssemble($g_sFasm)
          If @error Then Exit (ConsoleWrite($tBinary & @CRLF))
          DllCallAddress('dword', DllStructGetPtr($tBinary), 'ptr', $gpDisplayStructCB)
      EndFunc ;==>_Ex_LocalVarStructEx
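      To see where the padding mentioned above actually falls, here is a minimal sketch that prints the byte offset of every member of the same tag string. It uses only AutoIt built-ins and no fasm at all, so it is not part of the UDF. The member list is typed out by hand to match the tag, and it assumes AutoIt's default structure alignment. Any gap between the end of one member and the offset of the next is padding that an initializer has to skip.

```autoit
; Sketch only: shows the alignment gaps in the tag used by _Ex_LocalVarStructEx.
; Assumption: default AutoIt struct alignment (no "align" keyword in the tag).
Local Const $sTag = 'byte x;short y;char sNote[13];long odd[5];word w;dword p;char ext[3];word finish'
Local $tStruct = DllStructCreate($sTag)
Local $pBase = DllStructGetPtr($tStruct)

; Member names copied by hand from $sTag; flag 2 returns a 0-based array without a count element.
Local $aNames = StringSplit('x|y|sNote|odd|w|p|ext|finish', '|', 2)

For $sName In $aNames
    ; Offset of a member = address of the member - address of the struct.
    Local $iOffset = Number(DllStructGetPtr($tStruct, $sName) - $pBase)
    ConsoleWrite(StringFormat('%-8s offset %d', $sName, $iOffset) & @CRLF)
Next

; The total size can also be larger than the sum of the members because of trailing padding.
ConsoleWrite('Total size: ' & DllStructGetSize($tStruct) & ' bytes' & @CRLF)
```

      Comparing these offsets with the member sizes should show the same gaps that _FasmFixInit has to account for.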
      I love this one, and it is really not even that hard to explain. We have multiple functions and want to be able to call them individually. Here I simply use the primary function to tell me where all the functions are. I load the offset (byte distance from the start of the code) of each function into a DllStruct; then, once it is passed back to AutoIt, I adjust all the offsets by where the code is actually located in memory (the pointer to the DllStruct holding it). From there you can call each individual function as shown previously. Full code is in the zip.
      The string functions came from the link below. I ended up modifying strcmp to get a value I understand. The CRC32 func is all mine. Being able to call _strlen and then use while statements like I normally would made it so easy. https://www.strchr.com/strcmp_and_strlen_using_sse_4.2
      Func _Ex_SSE4_Library()
          $g_sFasm = ''
          _('force _main')
          _('proc _main stdcall, pAdd')
          _(' mov eax, [pAdd]')
          _(' mov dword[eax], _crc32')
          _(' mov dword[eax+4], _strlen')
          _(' mov dword[eax+8], _strcmp')
          _(' mov dword[eax+12], _strstr')
          _(' ret')
          _('endp')
          _('proc _crc32 uses ebx ecx esi, pStr')
          ;
          _('endp')
          _('proc _strlen uses ecx edx, pStr')
          ;
          _('endp')
          _('proc _strcmp uses ebx ecx edx, pStr1, pStr2') ; ecx = string1, edx = string2'
          ;
          _('endp')
          _('proc _strstr uses ecx edx edi esi, sStrToSearch, sStrToFind')
          ;
          _('endp')
          Local $tBinary = _FasmAssemble($g_sFasm)
          If @error Then Exit (ConsoleWrite($tBinary & @CRLF))
          Local $pBinary = DllStructGetPtr($tBinary)
          Local $sFunction_Offsets = 'dword crc32;dword strlen;dword strcmp;dword strstr'
          $tSSE42 = DllStructCreate($sFunction_Offsets)
          $ret = DllCallAddress('ptr', $pBinary, 'struct*', $tSSE42)
          _WinAPI_DisplayStruct($tSSE42, $sFunction_Offsets, 'Function Offsets')
          ;Correct all addresses
          $tSSE42.crc32 += $pBinary
          $tSSE42.strlen += $pBinary
          $tSSE42.strcmp += $pBinary
          $tSSE42.strstr += $pBinary
          $sTestStr = 'This is a test string!'
          ConsoleWrite('$sTestStr = ' & $sTestStr & @CRLF)
          $iCRC = DllCallAddress('int', $tSSE42.crc32, 'str', $sTestStr)
          ConsoleWrite('CRC32 = ' & Hex($iCRC[0]) & @CRLF)
          $aLen = DllCallAddress('int', $tSSE42.strlen, 'str', $sTestStr)
          ConsoleWrite('string len = ' & $aLen[0] & ' :1:' & @CRLF)
          $aFind = DllCallAddress('int', $tSSE42.strcmp, 'str', $sTestStr, 'str', 'This iXs a test')
          ConsoleWrite('+strcmp = ' & $aFind[0] & @CRLF)
          $aStr = DllCallAddress('int', $tSSE42.strstr, 'str', 'This is a test string!', 'str', 'test')
          ConsoleWrite('Strstr = ' & $aStr[0] & @CRLF)
      EndFunc ;==>_Ex_SSE4_Library
      I'm extremely happy I got a COM interface example working. I AM. That being said, I'm pretty fucking annoyed I can't find the original pointer when using the built-in ObjCreateInterface. I've tried more than just what's commented out. If anyone has any input (I know someone here does!), that would be great. Using the __ptr__ from _AutoItObject works below. The example will delete the tab a couple of times.
      Edit: Got that part figured out. Thanks again trancexx!
      Func _Ex_ComObjInterface()
          $g_sFasm = ''
          ;~ _AutoItObject_StartUp()
          ;~ Local Const $sTagITaskbarList = "QueryInterface long(ptr;ptr;ptr);AddRef ulong();Release ulong(); HrInit hresult(); AddTab hresult(hwnd); DeleteTab hresult(hwnd); ActivateTab hresult(hwnd); SetActiveAlt hresult(hwnd);"
          ;~ Local $oList = _AutoItObject_ObjCreate($sCLSID_TaskbarList, $sIID_ITaskbarList, $sTagITaskbarList)
          Local Const $sCLSID_TaskbarList = "{56FDF344-FD6D-11D0-958A-006097C9A090}", $sIID_ITaskbarList = "{56FDF342-FD6D-11D0-958A-006097C9A090}"
          Local Const $sTagITaskbarList = "HrInit hresult(); AddTab hresult(hwnd); DeleteTab hresult(hwnd); ActivateTab hresult(hwnd); SetActiveAlt hresult(hwnd);"
          Local $oList = ObjCreateInterface($sCLSID_TaskbarList, $sIID_ITaskbarList, $sTagITaskbarList)
          _('interface ITaskBarList,QueryInterface,AddRef,Release,HrInit,AddTab,DeleteTab,ActivateTab,SetActiveAlt')
          ;
          _('force _main')
          _('proc _main uses ebx, pSleepCB, oList, pGUIHwnd')
          _(' comcall [oList],ITaskBarList,HrInit')
          _(' xor ebx, ebx')
          _(' .repeat')
          _(' invoke pSleepCB, 500') ; wait
          _(' comcall [oList],ITaskBarList,DeleteTab,[pGUIHwnd]') ; delete
          _(' invoke pSleepCB, 500') ; wait
          _(' comcall [oList],ITaskBarList,AddTab,[pGUIHwnd]') ; add back
          _(' comcall [oList],ITaskBarList,ActivateTab,[pGUIHwnd]') ; activate
          _(' inc ebx')
          _(' .until ebx=4')
          _(' ret')
          _('endp')
          Local $tBinary = _FasmAssemble($g_sFasm)
          If @error Then Exit (ConsoleWrite($tBinary & @CRLF))
          Local $GUI = GUICreate("_Ex_ComObjInterface ------ DeleteTab")
          GUISetState()
          ;~ DllCallAddress('ptr', DllStructGetPtr($tBinary), 'ptr', $gpSleepCB, 'ptr', $oList.__ptr__, 'dword', Number($GUI))
          DllCallAddress('ptr', DllStructGetPtr($tBinary), 'ptr', $gpSleepCB, 'ptr', $oList(), 'dword', Number($GUI))
      EndFunc ;==>_Ex_ComObjInterface
      Lastly, here is an example of how to use a global variable. Without using the org statement, this value is just an offset, like the functions in the library example. For your code to know that location, it needs to know where the real starting address is, so we have to pass that to our functions. Once you have it, if you write your code properly and preserve registers correctly, you can just leave it in EBX. From what I understand, if all functions follow stdcall rules, that register shouldn't change unless you change it. Something cool and important to remember is that these variables will hold whatever values were left in them until you wipe the memory (DLL structure) holding your code. Keep that in mind if you made your DLL structure with the Static keyword. If that's the case, treat them like static variables.
      Func _Ex_GlobalVars()
          $g_sFasm = ''
          _('_ConsoleWriteCB fix invoke pConsoleWriteCB')
          ;
          _('force _main')
          _('proc _main uses ebx, pMem, pConsoleWriteCB, parm1')
          _(' mov ebx, [pMem]') ; This is where our code starts in memory.
          _(' mov [ebx + g_Var1], 111')
          _(' add [ebx + g_Var1], 222')
          _(' _ConsoleWriteCB, "g_Var1 = ", [ebx + g_Var1]')
          _(' stdcall subfunc1, [pMem], [pConsoleWriteCB], [parm1]')
          _(' mov eax, g_Var1')
          _(' ret')
          _('endp')
          ;
          _('proc subfunc1 uses ebx, pMem, pConsoleWriteCB, parm1')
          _(' mov ebx, [pMem]')
          _(' mov [ebx + g_Var1], 333')
          _(' _ConsoleWriteCB, "g_Var1 from subfunc1= ", [ebx + g_Var1]')
          _(' stdcall subfunc2, [pConsoleWriteCB], [parm1]') ; no memory ptr passed. ebx should be callee-saved
          _(' _ConsoleWriteCB, "g_Var1 from subfunc1= ", [ebx + g_Var1]')
          _(' stdcall subfunc2, [pConsoleWriteCB], [parm1]')
          _(' ret')
          _('endp')
          ;
          _('proc subfunc2, pConsoleWriteCB, parm1')
          _(' add [ebx + g_Var1], 321')
          _(' _ConsoleWriteCB, "g_Var1 from subfunc2= ", [ebx + g_Var1]')
          _(' ret')
          _('endp')
          ;
          _('g_Var1 dd ?') ; <--------- Global Var
          Local $tBinary = _FasmAssemble($g_sFasm)
          If @error Then Exit (ConsoleWrite($tBinary & @CRLF))
          Local $iOffset = DllCallAddress('dword', DllStructGetPtr($tBinary), 'struct*', $tBinary, 'ptr', $gpConsoleWriteCB, 'dword', 55)[0]
          ConsoleWrite('$iOffset = ' & $iOffset & @CRLF)
          Local $tGVar = DllStructCreate('dword g_Var1', DllStructGetPtr($tBinary) + $iOffset)
          ConsoleWrite('Directly access g_Var1 -> ' & $tGVar.g_Var1 & @CRLF) ; direct access
      EndFunc ;==>_Ex_GlobalVars
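      The point about values surviving as long as the memory does can be seen without any assembly. Below is a rough sketch using only AutoIt built-ins. The function name _Ex_StaticMemDemo is made up for illustration, and the struct stands in for the memory that would hold assembled code with g_Var1 inside it. Because the struct is declared Static, it is created once and whatever was written on an earlier call is still there on the next one.

```autoit
; Sketch only: a Static DllStruct keeps its contents between calls,
; which is the same behaviour a global inside a Static code buffer would have.
; _Ex_StaticMemDemo is a hypothetical name, not part of the UDF.
Func _Ex_StaticMemDemo()
    ; The initializer runs only on the first call; later calls reuse the same memory.
    Static $tMem = DllStructCreate('dword g_Var1')
    $tMem.g_Var1 += 1 ; value left over from the previous call is still there
    Return $tMem.g_Var1
EndFunc   ;==>_Ex_StaticMemDemo

; Each call should report a larger value than the one before it.
For $i = 1 To 3
    ConsoleWrite('call ' & $i & ' -> g_Var1 = ' & _Ex_StaticMemDemo() & @CRLF)
Next
```

      If the buffer were a plain Local instead, it would be recreated on every call and the value would start over each time.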
      FasmEx.zip