MS Classic Bounce Multithreading Example with 128 threads


Beege
That's kinda cool code :thumbsup: Thanks for sharing.

 

After a long break from assembler, I'm currently trying some ASM code to speed up my TGA loader. At the moment I'm stuck on the x64 ASM code...

Please don't send me any personal message and ask for support! I will not reply!

Selection of finest graphical examples at Codepen.io

The own fart smells best!
Her 'sikim hıyar' diyene bir avuç tuz alıp koşma!
¯\_(ツ)_/¯  ٩(●̮̮̃•̃)۶ ٩(-̮̮̃-̃)۶ૐ


Thanks UEZ!

I took a look at your loader and have been catching up on the TGA format. I can't believe I had never heard of it. I love the simple header. Replacing those For loops would definitely speed things up. The part I always get stuck on is dealing with floats. Have you got the ASM code working for 32-bit yet?


33 minutes ago, Beege said:

Thanks UEZ!

I took a look at your loader and have been catching up on the TGA format. I can't believe I had never heard of it. I love the simple header. Replacing those For loops would definitely speed things up. The part I always get stuck on is dealing with floats. Have you got the ASM code working for 32-bit yet?

Well, for the TGA loader, loading a 32-bit image should be relatively fast because it is a 1D loop. Only the 2D loops take a very long time for larger images.

So far I've written the x86 ASM code for 15/16-bit images, but the x64 version doesn't work. E.g. loading a 15-bit image with dimensions 2789x3500 takes more than 134992 ms on my machine using native AutoIt, but only 37 ms with ASM -> about 3,648x faster!

This applies only to 8/15/16/24-bit images whose width is not a multiple of 4.
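The width condition is most likely about row padding. As a rough sketch in C, assuming (this is not stated in the post) that the destination bitmap pads every row to a multiple of 4 bytes while the TGA pixel data is tightly packed, the two row sizes can be compared like this. The helper names `packed_row_bytes` and `padded_stride` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per row when pixels are stored back to back (assumed TGA layout).
   15-bit pixels occupy 2 bytes, just like 16-bit ones. */
static uint32_t packed_row_bytes(uint32_t width, uint32_t bits_per_pixel)
{
    return width * ((bits_per_pixel + 7u) / 8u);
}

/* Bytes per row when each row is rounded up to a multiple of 4
   (assumed destination bitmap layout). */
static uint32_t padded_stride(uint32_t width, uint32_t bits_per_pixel)
{
    return (packed_row_bytes(width, bits_per_pixel) + 3u) & ~3u;
}
```

Under that assumption, a 2789 pixel wide 15/16-bit image has 5578 packed bytes per row but a 5580 byte padded stride, so a single memcpy over the whole image would shear the rows. When the width is a multiple of 4 both values match and the plain memcpy path works.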

Here is the part of the UDF that can be replaced with my (non-professional) ASM code:

Case 15, 16, 24, 32 ;15/16/24/32-bit: the bitmap format is the same, so memcpy can copy the pixel data directly to memory.
                            ;Exceptions are 15/16/24-bit images whose width is not a multiple of 4!
            If BitOR($iPxDepth = 15, $iPxDepth = 16, $iPxDepth = 24) And Mod($iW, 4) Then
                Switch $iPxDepth
                    Case 15, 16
                        Local Const $bBinASM1516_x86 = Binary("0x5589E58B5D188B4D1C89C8F7651089452489C8F765148945285389D8D1E08B552801C203550C8B752401C60375200375088B066689024B83FB0075DE5B4983F90075C65DC22400")
                        Local $tBinASM1516_x86 = DllStructCreate("byte asm[" & BinaryLen($bBinASM1516_x86) & "]")
                        $tBinASM1516_x86.asm = $bBinASM1516_x86
                        Local $tMemVar1 = DllStructCreate("dword var"), $tMemVar2 = DllStructCreate("dword var")
                        DllCallAddress("none", DllStructGetPtr($tBinASM1516_x86), _
                                       "ptr", DllStructGetPtr($tSrcBmp), "ptr", DllStructGetPtr($tDestBmp), _
                                       "dword", $iW * 2, "dword", $stride, "dword", $iW - 1, "dword", $iH - 1, "dword", $pitch, _
                                       "ptr", DllStructGetPtr($tMemVar1), "ptr", DllStructGetPtr($tMemVar2))
                    Case 24

The ASM code:

#cs _ASM1516_x86
    use32
    ;pushad

    define tSrcBmp  dword[ebp + 08]
    define tDestBmp dword[ebp + 12]
    define strideS  dword[ebp + 16]
    define strideD  dword[ebp + 20]
    define width    dword[ebp + 24]
    define height   dword[ebp + 28]
    define pitch    dword[ebp + 32]
    define tMemVar1 dword[ebp + 36]
    define tMemVar2 dword[ebp + 40]

    push ebp
    mov ebp, esp

    mov ebx, width ;ebx = w - 1
    mov ecx, height ;ecx = h - 1

;~  _ASMDBG_()

    _y:
        mov eax, ecx
        mul strideS
        mov tMemVar1, eax

        mov eax, ecx
        mul strideD
        mov tMemVar2, eax
        push ebx
        _x:
            mov eax, ebx
            shl eax, 1

            mov edx, tMemVar2
            add edx, eax
            add edx, tDestBmp

            mov esi, tMemVar1
            add esi, eax
            add esi, pitch
            add esi, tSrcBmp

            mov eax, [esi]
            mov word[edx], ax

            dec ebx
            cmp ebx, 0
            jne _x
        pop ebx
        dec ecx
        cmp ecx, 0
        jne _y

    pop ebp
    ;popad

    ret 36
#ce _ASM1516_x86
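For comparison, here is a minimal C sketch of the copy that the routine above is built around: each source row is copied into a destination row that may have a different stride. It is not a translation of the binary blob. It copies a whole row with memcpy instead of pixel by pixel, it covers every row and column starting at index 0, and the function name `copy_rows_16bpp` and the parameter `src_offset` (standing in for the byte offset the ASM adds as `pitch`) are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `height` rows of 15/16-bit pixels (2 bytes each) from src to dst.
   src_stride / dst_stride are the byte distances between row starts.
   src_offset is a byte offset added to the source address (assumed to play
   the role of `pitch` in the ASM version). */
static void copy_rows_16bpp(const uint8_t *src, uint8_t *dst,
                            size_t src_stride, size_t dst_stride,
                            size_t width, size_t height, size_t src_offset)
{
    for (size_t y = 0; y < height; ++y) {
        /* Only the pixel bytes are copied; destination padding is untouched. */
        memcpy(dst + y * dst_stride,
               src + src_offset + y * src_stride,
               width * 2);
    }
}
```

Because the sizes and pointers use `size_t` and plain pointer types, the same source should build for both 32-bit and 64-bit targets, which might be a useful reference when checking a hand-written x64 version.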

Any idea how to convert it to a working x64 version?

Edited by UEZ
