
Complex ToBase conversion...


KaFu

Hiho,

I guess I've been out of school too long :idea:. I'm looking for a UDF to enumerate all valid variable and function names, with the resulting expression as short as possible (this is key!).

I think the enumeration of the valid var names can be done with this UDF by defining a corresponding 37-element $_BCaddlNums array.

But what kills me is the change of base within the function names (27 valid first characters, 37 for all following ones). Does anyone have a clue about this?

Vars

Valid characters (37)

  • 0-9
  • _
  • a-z
Funcs

Valid first character (27)

  • _
  • a-z
Valid following characters (37)

  • 0-9
  • _
  • a-z

I would expect the following enumeration for Vars:

...34=x, 35=y, 36=z, 37=00, 38=01, 39=02,... zy, zz, 000, 001...

And for Funcs something like this:

...24=x, 25=y, 26=z, 27=_0, 28=_1, 29=_2,... zy, zz, _00, _01...

And all that in a function :), for something like this:

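For $i = 0 To 1000
    ConsoleWrite(_valid_var_name($i) & @CRLF)
    ConsoleWrite(_valid_func_name($i) & @CRLF)
Next

Func _valid_var_name($i)
    ; to do: return the $i-th valid variable name, shortest names first
EndFunc

Func _valid_func_name($i)
    ; to do: return the $i-th valid function name, shortest names first
EndFunc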
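The enumeration asked for here counts by length first (every name of length n comes before any name of length n + 1, and "00" is a different name from "0"), and within one length it is a mixed-radix number: for function names the first character has 27 choices and every following character 37. A minimal sketch of the two functions, written in Python rather than AutoIt for brevity; the function names mirror the hypothetical _valid_var_name/_valid_func_name stubs above, and, as an assumption, reserved words such as "if" or "and" are not filtered out:

```python
import string

# Alphabets from the post: 0-9, _, a-z (37 symbols) for every variable
# character and for the non-first characters of a function name;
# only _ and a-z (27 symbols) for the first character of a function name.
VAR_CHARS = string.digits + "_" + string.ascii_lowercase
FUNC_FIRST = "_" + string.ascii_lowercase


def nth_name(i, first, rest):
    """Return the i-th name (0-based) when all names are listed shortest
    first and, within one length, in the order of the alphabets."""
    if i < 0:
        raise ValueError("index must be non-negative")
    length = 1
    block = len(first)               # how many names have the current length
    while i >= block:                # skip all names of the current length
        i -= block
        length += 1
        block *= len(rest)
    # Within its length, i is a mixed-radix number: the leading digit has
    # radix len(first), each of the remaining length - 1 digits radix len(rest).
    chars = []
    for _ in range(length - 1):
        i, d = divmod(i, len(rest))
        chars.append(rest[d])
    chars.append(first[i])
    return "".join(reversed(chars))


def valid_var_name(i):
    return nth_name(i, VAR_CHARS, VAR_CHARS)


def valid_func_name(i):
    return nth_name(i, FUNC_FIRST, VAR_CHARS)


if __name__ == "__main__":
    for i in (0, 26, 27, 36, 37, 1406):
        print(i, "$" + valid_var_name(i), valid_func_name(i))
```

Because all shorter names are used up before a longer one is handed out, index i always maps to a name of the smallest possible length, which is the "as short as possible" requirement.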

Best Regards


Thanks for the tip, will do now :)... I just found a dirty workaround :idea: (for the conversion in one direction only; no clue how to convert back, but that's sufficient for me)...

#cs ----------------------------------------------------------------------------

 AutoIt Version: 3.3.6.0
 Author:         KaFu

 Script Function:
    TurboBooster for your scripts...
    ... or an invitation to help me to improve it...

#ce ----------------------------------------------------------------------------

Global $_BCaddlNums[37] = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, "_", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
Dim $aFuncs_Skip[24]=["do", "in","if","or","to","abs","and","asc","chr","cos","dec","dim","exp","for","hex","int","log","mod","not","opt","ptr","run","sin","tan"]

$sOutput = ""

; Valid Var Names
For $i = 0 To 30000
    $sFuncsNew = "$" & _BasetoBase($i +1 , 10, 37)
    $sOutput &= $sFuncsNew & "= 1" & @crlf
    if mod($i,1000) = 0 then ConsoleWrite("Var " & $i & @crlf)
    ;ConsoleWrite("Var " & $i & @tab & @tab & $sFuncsNew & @crlf)
Next

; Valid Func Names
$var_offset = 1
For $i = 10 To 30000
    $sFuncsNew = _BasetoBase($i + $var_offset, 10, 37)
    $sLeft = StringLeft($sFuncsNew,1)
    If  Asc($sLeft) < 58 Or _StringInArray($sFuncsNew,$aFuncs_Skip) Then
        $i -= 1
        $var_offset += 1
        ContinueLoop
    EndIf
    if mod($i,1000) = 0 then ConsoleWrite("Func " & $i & @crlf)
    $sOutput &= "Func " & $sFuncsNew & "()" & @crlf & "EndFunc" & @crlf
    ;ConsoleWrite("Func " & $i & @tab & @tab & $sFuncsNew & @crlf)
Next

ClipPut($sOutput)

; =======================
; StringInArray
; =======================
Func _StringInArray($string, $array)
    If IsArray($array) Then
        For $i = 0 To UBound($array) - 1
            If $array[$i] = $string Then Return True
        Next
    EndIf
    Return False
EndFunc   ;==>_StringInArray

; =======================
; ToBase Functions
; =======================

; http://www.autoitscript.com/forum/index.php?showtopic=81189
; james3mg

;   Function:   _ToDec()
;   Author:     james3mg
;   Summary:    Converts a number represented in a string from any positive base less than base 63 into decimal (base 10)
;               Numbers greater than 9 should be represented by English letters in ascending alphabetical order; capitals first.  CASE SENSITIVE!
;               See the $_BCaddlNums array at the top of this file for an ordered ranking of digits.
;               Negative bases are not supported; they confuse me too much in practice, though the theory is elegant.
;               Fractional bases (i.e. base 10.25) are correctly supported, but I can't imagine a useful application of them.
;   Arguments:
;               $_BCnum     The string to convert to decimal (base 10) (required)
;               $_BCbase    The base to convert $_BCnum from (optional: default is 16)
;   Return value:
;               Success:    A string representing the number expressed in decimal (base 10)
;               Failure:    Returns a blank string ("") and sets @error as follows:
;                           @error=1    number string contains non-alphanumeric digits
;                           @error=2    invalid base number provided
;                           @error=3    a digit provided was not in the base provided (for instance a 2 occurred in an allegedly binary string)

Func _ToDec($_BCnum, $_BCbase = 16);converts the string from any positive base less than 63 into decimal (base 10)
    If $_BCbase = -1 Or $_BCbase = Default Then $_BCbase = 16
    If Not IsNumber($_BCbase) Or $_BCbase > 62 Or $_BCbase < 1 Then Return SetError(2)
    Local $_BCsign = ""
    If StringLeft($_BCnum, 1) = "-" Then
        $_BCsign = "-"
        $_BCnum = StringTrimLeft($_BCnum, 1)
    EndIf
    Local $_SplitNum = StringSplit(String($_BCnum), "."), $_SplitNumInt = StringSplit($_SplitNum[1], ""), $_i, $_n, $_BCreturnVal = ""
    If Not StringIsAlNum($_SplitNum[1]) Then Return SetError(1);number contains non-alphanumeric digits (to the left of the decimal)
    If $_SplitNum[0] <> 1 Then
        If $_SplitNum[0] <> 2 Then Return SetError(1);number contains non-alphanumeric digits (in this particular case, too many decimal points)
        If Not StringIsAlNum($_SplitNum[2]) Then Return SetError(1);number contains non-alphanumeric digits (to the right of the decimal)
        Local $_SplitNumDec = StringSplit($_SplitNum[2], "")
    EndIf
    ;from here, we can guarantee the number passed the function could be a number in some base
    For $_i = 1 To $_SplitNumInt[0]
        For $_n = 10 To $_BCbase - 1
            If $_SplitNumInt[$_i] == $_BCaddlNums[$_n] Then
                $_SplitNumInt[$_i] = $_n
                ExitLoop
            EndIf
        Next
        If Not StringIsInt($_SplitNumInt[$_i]) Or $_SplitNumInt[$_i] >= $_BCbase Then Return SetError(3);digit out of base range (for instance, 3 included in string given as binary)
    Next
    If $_SplitNum[0] = 2 Then;if there was a decimal point
        For $_i = 1 To $_SplitNumDec[0]
            For $_n = 10 To $_BCbase - 1
                If $_SplitNumDec[$_i] == $_BCaddlNums[$_n] Then
                    $_SplitNumDec[$_i] = $_n
                    ExitLoop
                EndIf
            Next
            If Not StringIsInt($_SplitNumDec[$_i]) Or $_SplitNumDec[$_i] >= $_BCbase Then Return SetError(3);digit out of base range
        Next
    EndIf

    For $_i = 1 To $_SplitNumInt[0]
        $_BCreturnVal += $_BCbase ^ ($_i - 1) * $_SplitNumInt[$_SplitNumInt[0] - ($_i - 1)]
    Next
    If $_SplitNum[0] = 2 Then;if there was a decimal point
        For $_i = 1 To $_SplitNumDec[0]
            $_BCreturnVal += $_BCbase ^ (0 - $_i) * $_SplitNumDec[$_i]
        Next
    EndIf
    Return SetError(0, 0, $_BCsign & $_BCreturnVal)
EndFunc   ;==>_ToDec

;   Function:   _ToBase()
;   Author:     james3mg
;   Summary:    Converts a decimal (base 10) number, given as a number or a string, into any positive base less than base 63
;               Numbers greater than 9 are represented by English letters in ascending alphabetical order; capitals first.  CASE SENSITIVE!
;               See the $_BCaddlNums array at the top of this file for an ordered ranking of digits.
;               Decimals are supported, but remember that 1.1 in binary is represented by 1.5 in decimal, since .1 is HALF of a whole number in binary! (Doubling .1 in binary will give you 1.0)
;   !!!         NOTE: the output of this function is a STRING!  Do not expect AutoIt to know how to perform mathematical operations on the result in the correct base!
;               If you want to perform such operations, convert into decimal, run the operation, and convert back.
;               Negative bases are not supported; they confuse me too much in practice, though the theory is elegant.
;               Fractional bases (i.e. base 10.25) are correctly supported, but I can't imagine a useful application of them.
;   Arguments:
;               $_BCnum     The string to convert to the specified base (required)
;               $_BCbase    The base to convert $_BCnum into (optional: default is 16)
;               $_BClimit   The maximum number of decimal points returned (numbers to the RIGHT of the decimal point).  (optional: default is 12)  Note this has no effect on the number of digits to the left of the point (or if there is no point).
;   Return value:
;               Success:    A string representing the number expressed in the base requested
;               Failure:    Returns a blank string ("") and sets @error as follows:
;                           @error=1    string is not a decimal (base 10) number
;                           @error=2    invalid base number provided

Func _ToBase($_BCnum, $_BCbase = 16, $_BClimit = 12);converts the decimal (base 10) string into any positive base less than 63
    If $_BCbase = -1 Or $_BCbase = Default Then $_BCbase = 16
    If $_BClimit = -1 Or $_BClimit = Default Then $_BClimit = 12
    If Not IsNumber($_BCbase) Or $_BCbase > 62 Or $_BCbase < 1 Then Return SetError(2)
    Local $_BCsign = ""
    If StringLeft($_BCnum, 1) = "-" Then
        $_BCsign = "-"
        $_BCnum = StringTrimLeft($_BCnum, 1)
    EndIf
    Local $_SplitNum = StringSplit(String($_BCnum), "."), $_i, $_BCreturnVal = ""
    If Not StringIsDigit($_SplitNum[1]) Then Return SetError(1);not a decimal number
    If $_SplitNum[0] <> 1 Then
        If $_SplitNum[0] <> 2 Then Return SetError(1);not a decimal number (too many decimal points)
        If Not StringIsDigit($_SplitNum[2]) Then Return SetError(1);not a decimal number right of the decimal point
    EndIf
    ;now we can assume the number is valid
    Local $_MaxExp = Floor(Log($_BCnum) / Log($_BCbase));find out the digit length in destination base (actual digit length is 1 greater than this number, since 1 has a length of 1 and is 10^0)
    For $_i = $_MaxExp To 0 Step -1;loop through each digit to find the highest multiplier without going over
        $_BCreturnVal &= $_BCaddlNums[Floor($_BCnum / ($_BCbase ^ $_i))]
        $_BCnum -= Floor($_BCnum / ($_BCbase ^ $_i)) * $_BCbase ^ $_i
    Next
    If $_SplitNum[0] = 1 Then Return SetError(0, 0, $_BCsign & $_BCreturnVal)
    ;if you get here, there's still decimal points to tally
    $_BCreturnVal &= "."
    For $_i = -1 To $_BClimit * - 1 Step -1
        $_BCreturnVal &= $_BCaddlNums[Floor($_BCnum / ($_BCbase ^ $_i))]
        $_BCnum -= Floor($_BCnum / ($_BCbase ^ $_i)) * $_BCbase ^ $_i
        If $_BCnum = 0 Then ExitLoop;don't add trailing zeros if an exact conversion is found
    Next
    Return SetError(0, 0, $_BCsign & $_BCreturnVal)
EndFunc   ;==>_ToBase

;   Function:   _BaseToBase()
;   Author:     james3mg
;   Summary:    Converts a number represented in a string from any positive base less than base 63 into any other positive base less than 63
;               Numbers greater than 9 are represented by English letters in ascending alphabetical order; capitals first.  CASE SENSITIVE!
;               See the $_BCaddlNums array at the top of this file for an ordered ranking of digits.
;               Decimals are supported, but remember that 1.1 in binary is represented by 1.5 in decimal, since .1 is HALF of a whole number in binary!
;   !!!         NOTE: the output of this function is a STRING!  Do not expect AutoIt to know how to perform mathematical operations on the result in the correct base!
;               If you want to perform such operations, convert into decimal, run the operation, and convert back.
;               Negative bases are not supported; they confuse me too much in practice, though the theory is elegant.
;               Fractional bases (i.e. base 10.25) are correctly supported, but I can't imagine a useful application of them.
;   Arguments:
;               $_BCnum         The string to convert to the specified base (required)
;               $_BCbaseOrig    The base to convert $_BCnum from (optional: default is 16)
;               $_BCbaseFinal   The base to convert $_BCnum into (optional: default is 2)
;               $_BClimit       The maximum number of decimal points returned (numbers to the RIGHT of the decimal point).  (optional: default is 12)  Note this has no effect on the number of digits to the left of the point (or if there is no point).
;   Return value:
;               Success:        A string representing the number expressed in the base requested
;               Failure:        Returns a blank string ("") and sets @error as follows:
;                               @error=1    number string contains non-alphanumeric digits
;                               @error=2    invalid input base number provided
;                               @error=3    a digit provided was not in the base provided (for instance a 2 occurred in an allegedly binary string)
;                               @error=4    unknown error (this should never happen; it means that _ToDec misfired somehow)
;                               @error=5    invalid output base number provided

Func _BaseToBase($_BCnum, $_BCbaseOrig = 16, $_BCbaseFinal = 2, $_BClimit = 12);errors 1-3 match errors 1-3 in _ToDec(); errors 4-5 match errors 1-2 in _ToBase()
    Local $_BCconv = _ToDec($_BCnum, $_BCbaseOrig)
    If @error Then Return SetError(@error)
    $_BCconv = _ToBase($_BCconv, $_BCbaseFinal, $_BClimit)
    If @error Then Return SetError(@error + 3)
    Return SetError(0, 0, $_BCconv)
EndFunc   ;==>_BaseToBase
Edited by KaFu

Hi KaFu,

You won't be able to go very far this way. Enumerating _all_ variables/functions is not possible in practice, since the namespace is essentially unbounded AFAWK. Anyway, using a map between the variable (or function) namespace in lexicographic order and signed 64-bit integers will limit you to variables having 11 (and some having 12) characters ($ aside) and functions having 12 (and some having 13) characters. FP errors in the base conversion could even cause errors well before reaching that point.
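To illustrate the FP concern: _ToBase finds the number of digits with Floor(Log($_BCnum) / Log($_BCbase)), and a floating-point quotient that lands just under an integer makes that estimate one too small (with IEEE-754 doubles, log(1000) / log(10) typically evaluates to 2.9999999999999996 rather than 3). A Python sketch, not AutoIt, with made-up helper names, that compares the log-based estimate with an integer-only digit count just below and at exact powers of the base, where the estimate is most fragile:

```python
import math


def digits_via_log(n, base):
    # The estimate _ToBase relies on: Floor(Log(n) / Log(base)) + 1 digits.
    return math.floor(math.log(n) / math.log(base)) + 1


def digits_exact(n, base):
    # Integer-only digit count: divide by the base until one digit is left.
    digits = 1
    while n >= base:
        n //= base
        digits += 1
    return digits


def log_mismatches(base, max_exp):
    """Values base**k - 1 and base**k (k = 1..max_exp) where the two counts differ."""
    bad = []
    for k in range(1, max_exp + 1):
        for n in (base ** k - 1, base ** k):
            if digits_via_log(n, base) != digits_exact(n, base):
                bad.append(n)
    return bad


if __name__ == "__main__":
    for base in (10, 37):
        print(base, log_mismatches(base, 12))
```

Whatever the printed lists contain on a given machine, an undershoot in the AutoIt version would make the first digit index Floor($_BCnum / $_BCbase ^ $_i) reach the base itself, i.e. pick a wrong symbol or index past the end of $_BCaddlNums.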

Then I question the advantage of such mapping. Since you're after "a resulting expression as short as possible", it isn't obvious to me that even the internal 64-bit representation of variables gains much over the natural name string.

Finally, your functions miss at least one variable and one function. Also the function-name is offset by 10 unduly.
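The missing names can be checked with a Python sketch that mimics KaFu's variable loop (an illustration, not his code): _BaseToBase($i + 1, 10, 37) is ordinary positional base 37, which starts at 1 and never writes a leading 0, so it skips $0 and every longer name that begins with 0. The sketch stops at two characters to keep the sets small:

```python
import string

# Same digit order as KaFu's $_BCaddlNums array: 0-9, _, a-z.
DIGITS = string.digits + "_" + string.ascii_lowercase


def to_base37(n):
    """Positional base-37 representation of an integer n >= 1."""
    out = ""
    while n > 0:
        n, d = divmod(n, 37)
        out = DIGITS[d] + out
    return out


# Variable loop: $i = 0, 1, ... converted as $i + 1; keep results of at most 2 chars.
produced = {to_base37(i + 1) for i in range(37 * 37 - 1)}
# Every valid variable name of one or two characters.
every_name = set(DIGITS) | {a + b for a in DIGITS for b in DIGITS}
missed = sorted(every_name - produced)

print(len(missed), missed[:3], "...")   # "0" plus the 37 names "00" .. "0z"
# The function loop starts at 10 + 1 = 11, which converts to "a", so "_" (10) is skipped.
print(to_base37(10), to_base37(11))
```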

All in all, I'd say you can stay on your chair as there's no need to take a walk: you're already out of yourself :idea:

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


Hi KaFu,

You won't be able to go very far this way.

In theory you're right of course, but in practice? I updated the example above slightly by adding a short array, $aFuncs_Skip, of function names to exclude, as these names are reserved for AutoIt's keywords and built-in functions. Otherwise it produces a namespace of up to 30,000 var and func names. That should be sufficient for 99.99% of all scripts :idea:. It's meant to preprocess a script before final compilation (see my TurboBooster example).

Then I question the advantage of such mapping. Since you're after "a resulting expression as short as possible", it isn't obvious to me that even the internal 64-bit representation of variables gains much over the natural name string.

My goal is to shorten the source in bytes as much as possible. As au3 is a scripting language and every line of source is parsed during runtime, I assumed this should also reduce the runtime. Hmmm, I'm not quite grasping what you're referring to with the 64-bit representation :), maybe you could elaborate?

Finally, your functions miss at least one variable and one function. Also the function-name is offset by 10 unduly.

Which ones? The offset of 10 in the func names is intentional, as the first 10 elements of the namespace (0-9) are not allowed as the first character of a func name.
Edited by KaFu

Ah, I was thinking of a completely different usage.

Using 0..9 _ a..z is quickly more compact than the decimal mapping.

Local $_ = 'abc'    ; $_ is a valid variable name

$_ = $_ & 'ghi'
ConsoleWrite($_ & @LF)

Local $_ = 'def'

Func _($_)          ; _ is a valid function name, and $_ a valid parameter name
    $_ += 1
EndFunc

Oops, sorry, I meant $0 = 'abc' (missed as well).

And about compactness of names?

My goal is to shorten the source in bytes as much as possible. As au3 is a scripting language and every line of source is parsed during runtime, I assumed this should also reduce the runtime.

I think you've left the rails already. The AutoIt interpreter does not read the actual variable and function name strings during execution. There is a one-time conversion to byte code before execution actually begins.

I don't think you can make any measurable change in script performance based on this.

:idea:

Valuater's AutoIt 1-2-3, Class... Is now in Session!
For those who want somebody to write the script for them: RentACoder
"Any technology distinguishable from magic is insufficiently advanced." -- Geek's corollary to Clarke's law

$aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa = 100
$bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb = 100
$a = 100
$b = 100

For $y = 0 To 10
    $timer = TimerInit()
    For $iiiiiiiiiiiiiiiiiiiiiiiiii = 0 To 1000000
        $ccccccccccccccccccccccccccccccccc = $aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa ^ $bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
    Next
    ConsoleWrite("Long Vars" & @TAB & TimerDiff($timer) & @CRLF)

    $timer = TimerInit()
    For $i = 0 To 1000000
        $c = $a ^ $b
    Next
    ConsoleWrite("Short Vars" & @TAB & TimerDiff($timer) & @CRLF)
Next

And it decreases the exe size too.


Well... rats. :)

I was a little surprised that made a difference in SciTE, and a lot surprised it survived compiling the script.

:idea:


That's to be expected, and I agree with KaFu even though I didn't run any tests. I believe the optimization is a bit of a last resort anyway. By the time you have your optimizer certified running in all cases, you could afford a 5% faster CPU... which would be enough to void the offset.

But what I really question is this: the decimal expansions of the integers mapped to lexicographically increasing variable names quickly grow longer than the natural ASCII names and never gain anything (in script length):

1-letter variables map to 0 to 36; the decimal expansion has 2 characters for 10 to 36, so it is longer for 27 variables

2-letter variables map to 37 to 1368; the decimal expansion is longer for 1368 - 99 = 1269 variables

3-letter variables map to 1369 to 50652; the decimal expansion is longer for all 50652 - 1368 = 49284 variables

4-letter variables map to 50653 to 1874160; the decimal expansion is longer for every variable in this range as well

All longer variable names likewise map to much longer decimal expansions.

In short, using decimal mapping is _never_ shorter than corresponding ASCII names. The initial $ is present everywhere so it can be ignored in this comparison.
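The counts above follow from simple arithmetic on the positional mapping described here (name = base-37 digits of n). A Python sketch that recomputes them and also counts the integers whose decimal expansion would be shorter than the name; the helper name compare_lengths is made up for this illustration:

```python
def compare_lengths(max_len, base=37):
    """For each name length L, take the integers whose positional base-37 form
    has L digits (for L = 1 that includes 0) and count how many of them have a
    decimal expansion longer, or shorter, than L characters."""
    rows = []
    for length in range(1, max_len + 1):
        lo = 0 if length == 1 else base ** (length - 1)   # first integer of this length
        hi = base ** length - 1                            # last integer of this length
        # decimal form longer than `length` chars  <=>  n >= 10**length
        longer = max(0, hi - max(lo, 10 ** length) + 1)
        # decimal form shorter than `length` chars <=>  n < 10**(length - 1); 0 has one digit
        shorter = 0 if length == 1 else max(0, min(hi, 10 ** (length - 1) - 1) - lo + 1)
        rows.append((length, lo, hi, longer, shorter))
    return rows


if __name__ == "__main__":
    print("chars  first int  last int   longer  shorter")
    for row in compare_lengths(5):
        print("%5d %10d %9d %8d %8d" % row)
```

The "shorter" column stays at zero for every length, which is the "_never_ shorter" claim; from three characters on, every single name loses.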

So if you insist on getting the (last) drop of sideways optimization at the level of source script compactness, you should instead scan the source file(s), make a count of use of every variable, dissociate global and local variables (so you can re-use $0, $1 [short names] locally as much as possible), then reassign global variables with another run of $0, $1, .. $z, $0a, ..., by assigning shortest names to the most used variables. Then do the same for function names, taking usage count as assignment order.
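A rough Python sketch of the frequency-based renaming for variables; rename_variables is a made-up name for this example. It assumes every $name token really is a variable, so it does not skip string literals or comments, names used through Eval/Assign, or the global/local split described above. AutoIt variable names are case-insensitive, so usages are counted case-insensitively:

```python
import re
import string
from collections import Counter
from itertools import count, product

NAME_CHARS = string.digits + "_" + string.ascii_lowercase
VAR_RE = re.compile(r"\$(\w+)")


def short_names():
    """Yield every valid variable name, shortest first: 0, 1, ..., z, 00, 01, ..."""
    for length in count(1):
        for combo in product(NAME_CHARS, repeat=length):
            yield "".join(combo)


def rename_variables(source):
    """Give the most frequently used variables the shortest names."""
    counts = Counter(name.lower() for name in VAR_RE.findall(source))
    # Most used first; ties are broken alphabetically to keep the output stable.
    ranked = sorted(counts, key=lambda name: (-counts[name], name))
    mapping = dict(zip(ranked, short_names()))
    # One substitution pass, so a new name can never be renamed a second time.
    return VAR_RE.sub(lambda m: "$" + mapping[m.group(1).lower()], source)


if __name__ == "__main__":
    script = "$counter = 0\nFor $i = 1 To 10\n    $counter += $i\nNext\n$Message = $COUNTER"
    print(rename_variables(script))
```

Function names could be handled the same way with a generator whose first character is limited to _ and a-z and which skips AutoIt's reserved words (the $aFuncs_Skip list).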

Edit: I goofed pitifully the first time I copied the comparison figures, but the point remains. I also wanted to see how much it gains by running your script and shortening long variables to a more reasonable length. I agree it makes a difference, but will it be worth it in practice? Do we use that much longish names?

Edited by jchd


By the time you have your optimizer certified running in all cases, you could afford a 5% faster CPU... which would be enough to void the offset.

True, I consider it something more like an intellectual challenge :(.

scan the source file(s), make a count of use of every variable

I thought about that yesterday already. I think it's even better to multiply each var's usage count by its name length, to get the characters used by this var in the overall script, and then replace vars in descending order of that value.
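One caveat on weighting by count × length, if I read the idea right: after renaming, the old length no longer appears anywhere, so the space these names take is the sum of count × new length, and by the rearrangement inequality that sum is smallest when the shortest names go to the highest counts. Count × length measures how much there is to save, not which name to hand out first. A small Python check with two made-up variables and a pool of free names of lengths 1 and 2:

```python
from itertools import permutations


def total_chars(counts, name_lengths):
    """Characters taken by the renamed variables: sum of use count * new name length."""
    return sum(c * n for c, n in zip(counts, name_lengths))


# Made-up example: name -> (use count, current name length).
variables = {"A": (10, 2), "B": (5, 10)}
free_name_lengths = [1, 2]          # shortest free names first

by_count = sorted(variables, key=lambda v: -variables[v][0])
by_weight = sorted(variables, key=lambda v: -(variables[v][0] * variables[v][1]))

cost_by_count = total_chars([variables[v][0] for v in by_count], free_name_lengths)
cost_by_weight = total_chars([variables[v][0] for v in by_weight], free_name_lengths)
best = min(total_chars(p, free_name_lengths)
           for p in permutations([c for c, _ in variables.values()]))

print("by count:", cost_by_count, "by count x length:", cost_by_weight, "best:", best)
```

Here ordering by count gives 20 characters, ordering by count × length gives 25, and the brute-force minimum over all assignments is 20.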

dissociate global and local variables (so you can re-use $0, $1 [short names] locally as much as possible), then reassign global variables with another run of $0, $1, .. $z, $0a, ..., by assigning shortest names to the most used variables.

That's an intriguing new idea :idea:, thanks! Will do some research on that one too.
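The global/local split could be planned along these lines; a Python sketch with made-up input structures and a hypothetical plan_names helper, not a parser. Globals get the shortest names by total use, then each function restarts the short-name sequence for its own locals and skips only the new names of globals it actually references, because a Local of the same name would hide the global inside that function. It assumes every local is declared with Local or is a parameter: in AutoIt an undeclared assignment inside a function uses an existing global of that name, so such a variable would have to be treated as a global here.

```python
import string
from itertools import count, product

NAME_CHARS = string.digits + "_" + string.ascii_lowercase


def short_names(taken=frozenset()):
    """Yield valid variable names shortest first, skipping the ones in `taken`."""
    for length in count(1):
        for combo in product(NAME_CHARS, repeat=length):
            name = "".join(combo)
            if name not in taken:
                yield name


def by_use(counts):
    """Names ordered by descending use count, ties alphabetically."""
    return sorted(counts, key=lambda name: (-counts[name], name))


def plan_names(global_counts, functions):
    """global_counts: {global_name: uses}
    functions: {func_name: ({local_name: uses}, {globals the function references})}
    Returns (global_map, {func_name: local_map})."""
    global_map = dict(zip(by_use(global_counts), short_names()))
    local_maps = {}
    for func, (local_counts, used_globals) in functions.items():
        # A local must not take the new name of a global this function uses.
        taken = {global_map[g] for g in used_globals}
        local_maps[func] = dict(zip(by_use(local_counts), short_names(taken)))
    return global_map, local_maps


if __name__ == "__main__":
    globals_ = {"config": 5, "count": 2}
    funcs = {
        "Add": ({"a": 3, "b": 3, "total": 2}, {"config"}),
        "Log": ({"msg": 4}, set()),
    }
    print(plan_names(globals_, funcs))
```

Locals of different functions freely share $0, $1, ...; only a function that reads the global now called $0 has to start its locals at $1.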

Do we use that much longish names?

:) The longer and more complex the scripts, the longer and more complex the var and func names, at least in my code. I agree it doesn't make a huge difference in short scripts; there I tend to use i's, a's and b's anyhow :)...