Base64 Converter


mikeytown2

I thought it would be useful if for example you were encoding multiple JPGs into a web page - you'd want a total progress bar for the page, and then child progress bars to show the progress for each individual image encoding. You might even want a parent progress bar if you are encoding multiple pages. My Progress Bar UDF was designed for that type of nesting. I just added another post to try to explain that better.

I was looking at your code and it appears that it only works if the progress bar is one that you made. Try this to see what I'm saying... (Everyone Else: this code requires this udf)

#include <MWR_Progress.au3>

ProgressOn("MWR_Progress UDF Demo", "The Big One")
_ProgressCreate('Looping $i... 0% Done', '', 'MWR_Progress UDF Demo')
For $i = 1 to 3
    _ProgressCreate('Looping $j... 0% Done')
    For $j = 1 to 4
        _ProgressCreate('Looping $k... 0% Done')
        For $k = 1 to 5
            _ProgressCreate('Looping $l... 0% Done')
            For $l = 1 to 20
                sleep(10)
                _ProgressUpdate($l * 5, $l, 'Looping $l... ' & Round($l * 100 / 20) & '% Done')
            Next
            _ProgressDelete()
            _ProgressUpdate($k * 20, '', 'Looping $k... ' & Round($k * 100 / 5) & '% Done')
        Next
        _ProgressDelete()
        _ProgressUpdate($j * 25, '', 'Looping $j... ' & Round($j * 100 / 4) & '% Done')
    Next
    _ProgressDelete()
    ProgressSet($i * 33)
    _ProgressUpdate($i * 33, '', 'Looping $i... ' & Round($i * 100 / 3) & '% Done')
Next
sleep(1000)
_ProgressDelete()
ProgressOff()
If your code could grab an already-created ProgressOn window, or grab the ProgressOn window when it gets created, then it would be perfect. Also, everything comes to a halt if you click or move your GUI, which is not a good thing. Don't get me wrong, it's some cool code, but as is I can't use it. If you know how to pass the percentage out of the function while it's running so that the bar can be updated (multi-threading?), then that would work, but I don't see an easy way of doing this. If you want to make a separate Base64 file with examples that includes your progress bar, by all means please do. But that version of this code will be rejected as a UDF because it will call your own UDF. My main interest is making a UDF that will get accepted.

A minor issue in your example: you write to the files in append mode, so each time you run it they just get bigger. Also, you never compare the original file against the decoded file to prove that the encode/decode round trip was flawless.

I did a quick search and found this; it works with binary now, so I used it as a base even though the code is trivial: Compare Files. I also set the write mode to 2 for the encode/decode example and left the HTML example in append mode. Thanks for the tips! It makes testing this a lot easier now. Get the new _Base64_Example file in the second post.
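For anyone following along, here is a rough sketch of the two fixes discussed above: opening the output in overwrite mode and checking the round trip byte for byte. It is not the linked Compare Files script; the function name _FilesMatch and the file paths are made up for illustration, and it assumes an AutoIt version where FileOpen accepts the binary flag (16).

```autoit
; Hypothetical helper (not from the thread's UDF): True if two files hold identical bytes
Func _FilesMatch($sFileA, $sFileB)
    ; Different sizes can never match, so skip reading in that case
    If FileGetSize($sFileA) <> FileGetSize($sFileB) Then Return False

    Local $hA = FileOpen($sFileA, 16) ; 16 = read in binary mode
    Local $hB = FileOpen($sFileB, 16)
    If $hA = -1 Or $hB = -1 Then
        If $hA <> -1 Then FileClose($hA)
        If $hB <> -1 Then FileClose($hB)
        Return SetError(1, 0, False)
    EndIf

    ; String() turns binary data into its "0x..." hex text; == compares exactly
    Local $bSame = (String(FileRead($hA)) == String(FileRead($hB)))
    FileClose($hA)
    FileClose($hB)
    Return $bSame
EndFunc

; Mode 2 erases the previous contents; mode 1 would append and grow the file on every run
Local $sDemoFile = @TempDir & "\b64_demo_out.tmp" ; hypothetical path
Local $hOut = FileOpen($sDemoFile, 2)
FileWrite($hOut, "decoded data goes here")
FileClose($hOut)
```

In the example script you would call _FilesMatch on the original picture and the decoded copy after the decode step, and report a failure if it returns False.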

Thank you for your input. If you or anyone else has more thoughts, ideas, or bugs, I would like to hear them.

Link to comment
Share on other sites

I was looking around and remembered that AutoIt now supports the &= operator. Long story short: HUGE SPEED BOOSTS! The encoder seems to get the biggest boost. A file that used to take 177 seconds to encode now takes 15 seconds. The bigger the file, the bigger the change compared to the old code. The decoder is quicker as well, but not as much as I was hoping for.
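If you want to see the difference on your own machine, a small timing sketch like the one below compares the two ways of growing a string. The round count is arbitrary and the numbers will depend on your AutoIt version and hardware, so treat it as a way to measure, not as proof of any particular speed-up.

```autoit
Local $iRounds = 20000 ; arbitrary count picked for the demo

; Old style: build a new string and assign it back every pass
Local $hTimer = TimerInit()
Local $sSlow = ""
For $i = 1 To $iRounds
    $sSlow = $sSlow & "AAAA"
Next
Local $fSlow = TimerDiff($hTimer)

; New style: append in place with &=
$hTimer = TimerInit()
Local $sFast = ""
For $i = 1 To $iRounds
    $sFast &= "AAAA"
Next
Local $fFast = TimerDiff($hTimer)

ConsoleWrite("Reassign: " & Round($fSlow) & " ms, &=: " & Round($fFast) & " ms" & @CRLF)
```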

Go grab the new _Base64 file from the second post!

Link to comment
Share on other sites

OK, I made a version that uses my multi-line progress bars. Ugh, I never realized how slow they are, especially when you update them within a tight loop. I'll have to optimize them, or more than likely just make them run in another thread or something. To try to speed it up, I only update every 76 characters instead of every 4 (like the newlines), but it's still slow - about 60% slower. The benefits are a more detailed progress display and a cancellation feature.

I'll throw the zip file up here, if anyone cares to see it. I also put 2 pics in there - sorry about the content, I just went to my pictures folder and those were the 2 smallest pics I had. If you want to try your own pics, uncomment the FileOpenDialog line near the beginning of the script. The whole thing takes about 25 seconds to run on my machine (P4 2.2GHz)

Enjoy!

(Oh yeah, and I used the &= trick mikeytown2 mentioned, and also did some line-trimming to make the code a little more readable)

20060303_MWR_Base64.zip

Edited by blindwig
Link to comment
Share on other sites

Awesome script! I would set an error if you press the cancel button. I have an AMD 2600+ and it took 12 seconds... didn't expect that big of a difference. Thanks for coding it.
Link to comment
Share on other sites

I made some more changes, and some good progress!

Encoder:

25% faster when using linewrap and/or progressbars. I got this by removing the mod() function and using a nested loop to tell when to break a line or update the progress bar. This also saves a little math (don't have to multiply everything by 3 anymore)
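A minimal sketch of that idea, not the code from the zip: the inner loop emits one full output line, so the line break happens when the inner loop ends and no Mod() test is needed per group. The function name _WrapByGroups and the default of 19 groups (19 * 4 = 76 characters) are assumptions for illustration.

```autoit
; Hypothetical helper: break $sText into lines of $iGroupsPerLine groups of 4 characters
Func _WrapByGroups($sText, $iGroupsPerLine = 19)
    Local $sOut = "", $iPos = 1, $iLen = StringLen($sText)
    While $iPos <= $iLen
        ; Inner loop = one output line; its counter replaces the Mod() check
        For $i = 1 To $iGroupsPerLine
            If $iPos > $iLen Then ExitLoop
            $sOut &= StringMid($sText, $iPos, 4)
            $iPos += 4
        Next
        ; Only add a line break if more text follows
        If $iPos <= $iLen Then $sOut &= @CRLF
    WEnd
    Return $sOut
EndFunc

ConsoleWrite(_WrapByGroups("AAAABBBBCCCC", 2) & @CRLF)
```

The same spot at the end of the inner loop is also a natural place to update a progress bar once per line instead of once per group.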

Decoder:

60% faster - a big improvement! Decoding is almost as fast as encoding now. I removed the decoding sub-function and put in a hard-coded look-up table (like what we did for the encoding routine back at the beginning of this thread). It's much faster now!

I tried this on several different sized files, so hopefully no bugs this time.

20060306_MWR_Base64.zip

Link to comment
Share on other sites

Awesome Code! It looks good to me.

Now that we are processing larger files, the preprocessing for the decoder is taking about 30% of the time. It's a noticeable difference now, so I brought it back as an option again. The decode function is now as fast as the encode function if you calculate the speed based on the number of bytes processed. I also made the file look more like an official UDF file. As such, I kept your change of removing the bin and dec functions. They can be grabbed here if anyone needs them.

I think we have hit the limit here. I'm out of ideas on how to make this faster other than hard-coding every possible calculation, which would be a huge task and wouldn't be worth it.

Do you have any more ideas on how to speed this up? I put in my 2 cents over here Ideas

_Base64.au3 and _Base64_Example.au3 Updated. Grab from my Second Post

Link to comment
Share on other sites

Now that we are processing larger files, the preprocessing for the decoder is taking about 30% of the time. It's a noticeable difference now, so I brought it back as an option again.

You could auto-detect whether or not the data needs to be scrubbed by checking it with a regular expression before doing the replace. Or use StringInStr - I don't know which one is faster. Just look for any character that doesn't belong.
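Something along these lines, as a sketch only: the function name _Base64Scrub is made up, and it assumes the standard Base64 alphabet plus the '=' padding character counts as "belonging".

```autoit
; Hypothetical helper: strip anything outside the Base64 alphabet, but only if needed
Func _Base64Scrub($sInput)
    ; Cheap test first: returns 1 as soon as one foreign character is found
    If StringRegExp($sInput, "[^A-Za-z0-9+/=]") Then
        Return StringRegExpReplace($sInput, "[^A-Za-z0-9+/=]", "")
    EndIf
    ; Already clean, so skip the replace entirely
    Return $sInput
EndFunc

ConsoleWrite(_Base64Scrub("QUJD" & @CRLF & "REVG") & @CRLF)
```

Whether the extra check pays off depends on how often the input is already clean, so it is worth timing both ways.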

I think we have hit the limit here. I'm out of ideas on how to make this faster other than hard-coding every possible calculation, which would be a huge task and wouldn't be worth it.

Hmm, well a 3byte -> 4byte lookup would be over 100MB in just data (no overhead included), and a 4byte->3byte would be over 30GB. So you're right, I don't think that would be a good idea.

However, maybe if we cut those tables in half, and build a 12bit->16bit lookup, and do 2 lookups and bitshift them together... I'll try that next time I find some time to work on this...
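For the curious, here is one way to arrive at figures in that range. This is only a guess at the reasoning: it assumes every entry stores both the input bytes and the output bytes (7 bytes per entry), which happens to reproduce "over 100MB" and "over 30GB". The 12-bit table is counted as 2 output characters per entry.

```autoit
; Assumption: each entry holds key + value, so 3 + 4 = 7 bytes
Local $f3to4 = 2 ^ 24 * (3 + 4) ; every possible 3-byte input
Local $f4to3 = 2 ^ 32 * (4 + 3) ; every possible 4-byte input
; 12-bit table: 64 * 64 index values, 2 output characters each
Local $f12to16 = 2 ^ 12 * 2

ConsoleWrite("3 -> 4 byte table:  " & Round($f3to4 / 10 ^ 6) & " MB" & @CRLF)
ConsoleWrite("4 -> 3 byte table:  " & Round($f4to3 / 10 ^ 9) & " GB" & @CRLF)
ConsoleWrite("12 -> 16 bit table: " & $f12to16 & " bytes" & @CRLF)
```

Actual memory use in AutoIt would be higher because of per-element overhead, which these numbers leave out.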

Link to comment
Share on other sites

For the Encoder:

For every 3 bytes, the current encoder has to do 4 look-ups in an 8-bit table and a total of 8 bit-operations for the look-ups.

I wrote a new routine that requires only 2 look-ups in a 12-bit table and only 4 bit-operations for the look-ups.

Speed up by about 30%
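To illustrate the 12-bit idea (a sketch, not the routine from the zip): three input bytes are 24 bits, which split into two 12-bit indices, and each index maps straight to two output characters. The names $g_asDemoTable12 and _Encode3 are made up here, and this version does not try to match the exact bit-operation count mentioned above.

```autoit
; Build the 12-bit table: index = two 6-bit values, entry = two Base64 characters
Local $acDemoB64 = StringSplit("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/", "", 2) ; 2 = 0-based array
Global $g_asDemoTable12[4096]
For $i = 0 To 63
    For $j = 0 To 63
        $g_asDemoTable12[$i * 64 + $j] = $acDemoB64[$i] & $acDemoB64[$j]
    Next
Next

; Hypothetical helper: encode exactly 3 byte values (0-255) into 4 characters
Func _Encode3($b0, $b1, $b2)
    ; First 12 bits: all of byte 0 plus the high 4 bits of byte 1
    Local $iHi = BitOR(BitShift($b0, -4), BitShift($b1, 4))
    ; Last 12 bits: the low 4 bits of byte 1 plus all of byte 2
    Local $iLo = BitOR(BitShift(BitAND($b1, 15), -8), $b2)
    Return $g_asDemoTable12[$iHi] & $g_asDemoTable12[$iLo]
EndFunc

ConsoleWrite(_Encode3(Asc("M"), Asc("a"), Asc("n")) & @CRLF)
```

Input whose length is not a multiple of 3 still needs the usual '=' padding step, which is left out here.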

For the Decoder:

I put in a routine I was working on to save time when using the progress bar. Speed gained is about 2-3%

I made it always strip CR/LF from the source.

I made it check for non-base64 characters before filtering for them.

I made the progress bar more informative to help find where slow-downs are.

I started working on a 16bit->12bit reverse look-up, but it got messy and I had to revert to a working copy. I'll try again later.

Mikeytown2 is going to hate me because I put all my new stuff in global constants, but it's just easier for me to develop that way.

Building the 12-bit table takes longer than building the 8-bit table (mathematically speaking, 16 times longer) so I put a progress bar on it for slower machines.

EDIT

Fix attachment

20060307_MWR_Base64.zip

Edited by blindwig
Link to comment
Share on other sites

Cool stuff! The encoder is a little slower for very small files, but it's not noticeable. For big files this is great! I took your hint and made the progress bar more informative just about everywhere. I took out the optional filter because StringRegExp has almost no effect on speed. Give the decoder any input and it should not fail.

I got rid of all the global constants and the functions that used them. Oh, and I don't mind fixing the global constants; it's not that hard.

Grab the latest files from my second post.

Once again, thanks blindwig! Let me know when you think this is done so it can be submitted as a UDF.

Link to comment
Share on other sites

I guess I still don't understand why you want to rebuild the static tables every time you call your routine. It doesn't make sense to have to rebuild something that never changes - that's the whole point of having pre-calculated look-up tables. But if that's the way you want it, you might as well do it faster:

;Build a 6-bit lookup table
Local $ac_B64[64]=['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P', _
                   'Q','R','S','T','U','V','W','X','Y','Z','a','b','c','d','e','f', _
                   'g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v', _
                   'w','x','y','z','0','1','2','3','4','5','6','7','8','9','+','/']

;12bit -> 16bit lookup table
Local $as_Base64Table12[4096] ;64 * 64 entries, declared here so the snippet runs on its own
For $i_Count = 0 To 63
    For $i_Count2 = 0 To 63
        $as_Base64Table12[$i_Count * 64 + $i_Count2] = $ac_B64[$i_Count] & $ac_B64[$i_Count2]
    Next
Next

On my test file, this changed the time from about 1.8 seconds to 1.6 seconds.

And with the progress meter:

;Build a 6-bit lookup table
    If $s_ProgressTitle Then ProgressSet(18, 'Pre-Processing Input: Building Base64 Table 1')
    Local $ac_B64[64]=['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P', _
                       'Q','R','S','T','U','V','W','X','Y','Z','a','b','c','d','e','f', _
                       'g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v', _
                       'w','x','y','z','0','1','2','3','4','5','6','7','8','9','+','/']
    
;12bit -> 16bit lookup table
    If $s_ProgressTitle Then 
        ProgressSet(36, 'Pre-Processing Input: Building Base64 Table 2')
        For $i_Count = 0 To 63
            For $i_Count2 = 0 To 63
                $as_Base64Table12[$i_Count * 64 + $i_Count2] = $ac_B64[$i_Count] & $ac_B64[$i_Count2]
            Next
            ProgressSet(36 + $i_Count)
        Next
    Else
        For $i_Count = 0 To 63
            For $i_Count2 = 0 To 63
                $as_Base64Table12[$i_Count * 64 + $i_Count2] = $ac_B64[$i_Count] & $ac_B64[$i_Count2]
            Next
        Next
    EndIf
Edited by blindwig
Link to comment
Share on other sites

And until (if) you do a 16-bit reverse in the decode routine, here is the 8-bit reverse hard-coded:

Local $ai_Base64Reverse[256]=[-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, _
                                  -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,62,-1,-1,-1,63,52,53,54,55,56,57,58,59,60,61,-1,-1,-1,-1,-1,-1, _
                                  -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,-1,-1,-1,-1,-1, _
                                  -1,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,-1,-1,-1,-1,-1, _
                                  -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, _
                                  -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, _
                                  -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, _
                                  -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1]
Link to comment
Share on other sites

You do have a good point there. I'm assuming that someone will use this script once or twice in their own script and that's it. But if they're doing 10+ encodes, then I should make one global variable to hold the big table. The encoder then checks the table before it runs, and if the table isn't built yet, it builds it. I'll work on that. I'll make decode use a global as well.
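A build-on-first-use version could look roughly like this. It is a sketch of the plan described above, not the code that ended up in the UDF, and the names $g_asLazyTable12 and _Base64EnsureTable12 are invented for the example.

```autoit
Global $g_asLazyTable12 = 0 ; stays a plain number until the table is first needed

; Hypothetical helper: build the 12-bit table once, do nothing on later calls
Func _Base64EnsureTable12()
    If IsArray($g_asLazyTable12) Then Return ; already built

    Local $acB64 = StringSplit("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/", "", 2)
    Local $asTable[4096]
    For $i = 0 To 63
        For $j = 0 To 63
            $asTable[$i * 64 + $j] = $acB64[$i] & $acB64[$j]
        Next
    Next
    $g_asLazyTable12 = $asTable
EndFunc

; An encode routine would call this first, then read $g_asLazyTable12 directly
_Base64EnsureTable12()
_Base64EnsureTable12() ; second call returns immediately
ConsoleWrite($g_asLazyTable12[0] & " ... " & $g_asLazyTable12[4095] & @CRLF)
```

Scripts that encode once pay the build cost once, and scripts that encode many times never pay it again.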

Link to comment
Share on other sites

blindwig, how long will it take to build a 16-bit table? I mean, that's 65536 iterations. Would it even be worth it?

I'm working now on building a 14-bit table. I didn't bother with a 16-bit table, since on the reverse the highest ASCII value you have to deal with is 'z', which is 122 and can be represented with only 7 bits. So two 7-bit characters can be used for a 14-bit look-up.

The thing is, whether I make a 16-bit or a 14-bit, the actual data is only 12 bits (4096 iterations) so it wouldn't be any slower than the 12->16 bit table we're building now.
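Here is a small sketch of how such a table could be filled, to show why only 4096 of the 16384 slots ever get written. The names are made up for the example; it is not the table-building code from the zip.

```autoit
Local $acRevAlpha = StringSplit("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/", "", 2)
Global $g_aiDemoRev14[16384] ; 2^14 slots, indexed by two 7-bit character codes
Local $iFilled = 0, $iMaxIndex = 0, $iIndex

; Only the 64 * 64 valid character pairs are visited, so this is 4096 iterations
For $i = 0 To 63
    For $j = 0 To 63
        ; Index: first character code shifted left 7 bits, plus the second character code
        $iIndex = BitShift(Asc($acRevAlpha[$i]), -7) + Asc($acRevAlpha[$j])
        $g_aiDemoRev14[$iIndex] = $i * 64 + $j ; the 12-bit value for this pair
        $iFilled += 1
        If $iIndex > $iMaxIndex Then $iMaxIndex = $iIndex
    Next
Next

ConsoleWrite("Slots filled: " & $iFilled & ", highest index used: " & $iMaxIndex & @CRLF)
```

The highest index comes from the pair "zz", which stays below 16384, so a 14-bit table is big enough.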

I'm working now on a 14-bit lookup table for the decode routine. My first attempt was actually slower, because all the setup involved in doing the lookup offset the time gained during the lookup. I'm looking into another idea now that would save time by eliminating as many conversions (Chr and Asc) as possible.

Link to comment
Share on other sites

I just discovered something cool -

You can use the BinaryString() function to convert a 32-bit number to a 4-byte string. This is much better than using 4 Chr() calls along with all the BitShift() and BitAND() functions. I'm working on using this trick to speed up the decoding routine (I don't know whether it would be useful in the encoding routine, but I'll have to think about that), and so far speed is increased by about 50%. I'll post the final version here when I get it done.

Link to comment
Share on other sites

OK, a problem with BinaryString() - it needs the bits set up differently than my function aligns them, so I have to do some math to move the bits around, and that costs time. I'm down from 50% increase to about 10% because of the extra math.

I'm also noticing that assigning and checking variables in AutoIt is really slow, compared with just nesting functions and returns, so I'm working on taking out variables that I don't need. Increase so far on the decoder is back up to 40%, and I'm not done yet.

Link to comment
Share on other sites

OK, here's my latest. Both Encoder and Decoder are up by about 50%. I moved the tables back out to global variables, and stopped passing the lookup to _Base64Conv because even passing it ByRef was costing time. Now _Base64Conv pulls from the global table directly.

We're back to the optimal rate, where the decoder is about 33% slower than the encoder, which makes sense since it has 33% more work to do.

I took out the array of bytes in the decoder and put all the functions on 2 lines. It's hard to read, but it's a lot faster! I left the loose code in a comment block so that you can see how it used to work. Uncomment that and you'll see how much slower it is - the exact same code, just using more variables, and it slows down that much.

Wow! We sure have come a long way since the first iteration! I'm running out of ideas on how to tighten it down any further...

Have a look, enjoy!

20060308_MWR_Base64.zip

Link to comment
Share on other sites

All I can say is wow. This is huge! I'll look through it and keep the speed, yet get rid of some of the variables outside of the functions. I'll post back when it's done. Oh, and now I'm completely lost on the decoder, but it works, so good job!

Link to comment
Share on other sites

Here is the comment block that blindwig was talking about. I took it out of the Base64 file, which is why it's posted here. This is for the decoder:

#cs
;read 4 characters
    $ai_Bytes[0] = Asc($as_CypherText[$i_Count + 0])
    $ai_Bytes[1] = Asc($as_CypherText[$i_Count + 1])
    $ai_Bytes[2] = Asc($as_CypherText[$i_Count + 2])
    $ai_Bytes[3] = Asc($as_CypherText[$i_Count + 3])
    
;Setup for the lookup
    $ai_Bytes[4] = BitShift($ai_Bytes[0],-7) + $ai_Bytes[1]
    $ai_Bytes[5] = BitShift($ai_Bytes[2],-7) + $ai_Bytes[3]
    
;Do the lookup, setup for the BinaryString conversion
    $ai_Bytes[6] = BitShift($gcai_Base64Table14[$ai_Bytes[4]], -12) + $gcai_Base64Table14[$ai_Bytes[5]]
    $ai_Bytes[7] = BitShift($ai_Bytes[6], 16) + BitAND($ai_Bytes[6], 65280) + BitShift(BitAND($ai_Bytes[6], 255),-16)
    
;Write the BinaryString
    $s_Out &= StringLeft(BinaryString($ai_Bytes[7]),3)
#ce

$i_Temp = BitShift($gcai_Base64Table14[BitShift(Asc($as_CypherText[$i_Count + 0]), -7) + Asc($as_CypherText[$i_Count + 1]) ], -12) _
         + $gcai_Base64Table14[BitShift(Asc($as_CypherText[$i_Count + 2]), -7) + Asc($as_CypherText[$i_Count + 3]) ]
$s_Out &= StringLeft(BinaryString(BitShift($i_Temp, 16) + BitAND($i_Temp, 65280) + BitShift(BitAND($i_Temp, 255), -16)), 3)
Updates include:

Only 2 global variables - the other 2 take .001 seconds to make

New decoder - a lot faster. Thanks blindwig!

More UDF-friendly code - I think the Base64 stuff would go in the string.au3 file

Grab the new code from my second post

EDIT

Give props to blindwig

Edited by mikeytown2
Link to comment
Share on other sites

Oh and now i'm completely lost on the decoder but it works so good job!

Let me see if I can walk you through it:

First we need to get the values of the 4 bytes we want to decode (just like before):

;read 4 characters
    $ai_Bytes[0] = Asc($as_CypherText[$i_Count + 0])
    $ai_Bytes[1] = Asc($as_CypherText[$i_Count + 1])
    $ai_Bytes[2] = Asc($as_CypherText[$i_Count + 2])
    $ai_Bytes[3] = Asc($as_CypherText[$i_Count + 3])

Now, to do a 16-bit look-up I would combine the bytes into pairs and look them up. But since I only want to do a 14-bit look-up, I have to combine the bytes as 7-bit bytes:

;Setup for the lookup
    $ai_Bytes[4] = BitShift(BitAND($ai_Bytes[0],127),-7) + BitAND($ai_Bytes[1],127)
    $ai_Bytes[5] = BitShift(BitAND($ai_Bytes[2],127),-7) + BitAND($ai_Bytes[3],127)

But since I know that these bytes have already been filtered against the base64 alphabet, and I know that the alphabet is only 7-bits maximum, I can eliminate the BitAND() calls to force the 8-bits into 7-bits. So I write the code like this:

;Setup for the lookup
    $ai_Bytes[4] = BitShift($ai_Bytes[0],-7) + $ai_Bytes[1]
    $ai_Bytes[5] = BitShift($ai_Bytes[2],-7) + $ai_Bytes[3]

Now to do the 14-bit lookups:

$gcai_Base64Table14[$ai_Bytes[4]]
$gcai_Base64Table14[$ai_Bytes[5]]

Each of these look-ups will return a 12-bit number, so I might as well combine them into a single 24-bit number:

;Do the lookup, setup for the BinaryString conversion
    $ai_Bytes[6] = BitShift($gcai_Base64Table14[$ai_Bytes[4]], -12) + $gcai_Base64Table14[$ai_Bytes[5]]

It would be nice if I could pass this 24-bit number to a function that would print it as 3 8-bit bytes. BinaryString() does this, except that it takes 4 bytes ABCD and returns them as DCBA. And it always returns a multiple of 4, so if I pass it ABC it will return CBA0. So I need to swap A and C and then strip off the 0.

First I swap the 8 highest bits with the 8 lowest bits:

$ai_Bytes[7] = BitShift($ai_Bytes[6], 16) + BitAND($ai_Bytes[6], 65280) + BitShift(BitAND($ai_Bytes[6], 255),-16)
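To see that swap on its own, here is the same expression wrapped in a throwaway function (the name _SwapOuterBytes is made up for the demo). The middle byte stays put while the top and bottom bytes of the 24-bit value trade places.

```autoit
; Hypothetical helper: swap the highest and lowest byte of a 24-bit value
Func _SwapOuterBytes($i24)
    Return BitShift($i24, 16) _                 ; top byte moves down to the bottom
            + BitAND($i24, 65280) _             ; middle byte (mask 0xFF00) is kept in place
            + BitShift(BitAND($i24, 255), -16)  ; bottom byte moves up to the top
EndFunc

; 0x41, 0x42, 0x43 are the codes for "A", "B", "C"
ConsoleWrite(Hex(0x414243, 6) & " -> " & Hex(_SwapOuterBytes(0x414243), 6) & @CRLF)
```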

Then I use StringLeft() to grab the first 3 bytes (effectively trimming the 4th byte, the '0'):

;Write the BinaryString
    $s_Out &= StringLeft(BinaryString($ai_Bytes[7]),3)

And I'm done!

But something I discovered is that AutoIt takes a long time to assign and call all these variables, so I combined as many functions as I could and ended up writing the whole thing in just 2 lines:

$i_Temp = BitShift($gcai_Base64Table14[BitShift(Asc($as_CypherText[$i_Count + 0]), -7) + Asc($as_CypherText[$i_Count + 1]) ], -12) _
         + $gcai_Base64Table14[BitShift(Asc($as_CypherText[$i_Count + 2]), -7) + Asc($as_CypherText[$i_Count + 3]) ]
$s_Out &= StringLeft(BinaryString(BitShift($i_Temp, 16) + BitAND($i_Temp, 65280) + BitShift(BitAND($i_Temp, 255), -16)), 3)
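If you want to try the walkthrough end to end, here is a self-contained sketch that decodes one group of 4 characters with a 14-bit table. Two things differ from the code above and are assumptions on my part: it writes the 3 bytes with Chr() instead of BinaryString(), since BinaryString() may not exist in the AutoIt version you have, and the names $g_aiDemoTable14 and _DecodeQuad are made up. It also ignores '=' padding.

```autoit
; Build the 14-bit reverse table from the 64 * 64 valid character pairs
Local $acDecAlpha = StringSplit("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/", "", 2)
Global $g_aiDemoTable14[16384]
For $i = 0 To 63
    For $j = 0 To 63
        $g_aiDemoTable14[BitShift(Asc($acDecAlpha[$i]), -7) + Asc($acDecAlpha[$j])] = $i * 64 + $j
    Next
Next

; Hypothetical helper: decode exactly 4 Base64 characters (no padding) into 3 characters
Func _DecodeQuad($sQuad)
    Local $ac = StringSplit($sQuad, "", 2)
    ; Two 14-bit look-ups, each giving 12 bits, combined into one 24-bit number
    Local $i24 = BitShift($g_aiDemoTable14[BitShift(Asc($ac[0]), -7) + Asc($ac[1])], -12) _
            + $g_aiDemoTable14[BitShift(Asc($ac[2]), -7) + Asc($ac[3])]
    ; Chr() stand-in for the BinaryString() step: emit high, middle, low byte in order
    Return Chr(BitShift($i24, 16)) & Chr(BitAND(BitShift($i24, 8), 255)) & Chr(BitAND($i24, 255))
EndFunc

ConsoleWrite(_DecodeQuad("TWFu") & @CRLF)
```

Because the bytes are written in order here, the A/C swap from the walkthrough is not needed; that swap only exists to suit the byte order BinaryString() returns.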
Link to comment
Share on other sites
