Float to integer - but integers get too big

Simpel · February 6, 2018

Hi.

I have to filter audio samples. The coefficients are very small but have many decimal places. Because of rounding errors when calculating with floating point numbers, I want to convert the whole calculation to integers:

; Example values only: the current sample, the two previous samples and the two previous results
Local $xn = 32767, $xn_1 = 3276, $xn_2 = 327, $yn_1 = 3276, $yn_2 = 327 

Local $yn = $xn - 2 * $xn_1 + $xn_2 + 1.99004745483398 * $yn_1 - 0.99007225036621 * $yn_2
ConsoleWrite($yn & @CRLF)

$yn = ($xn - 2 * $xn_1 + $xn_2 + 199004745483398 * $yn_1 - 99007225036621 * $yn_2) / 1e+028
ConsoleWrite($yn & @CRLF)

Console output is:

32737.6418361664
6.19564183616663e-011

The upper one seems correct (apart from the usual floating point rounding errors in AutoIt). The lower one cannot be right, I guess.

Does anyone have a hint on how to calculate this better? Regards, Conrad

P.S. How big is the calculation error with floats in this case?

Zedna · February 6, 2018

Look here to see AutoIt's limits

https://www.autoitscript.com/autoit3/docs/intro/lang_datatypes.htm

https://www.autoitscript.com/autoit3/docs/appendix/LimitsDefaults.htm

If you need more data types (double, ...), look here:

https://www.autoitscript.com/autoit3/docs/functions/DllStructCreate.htm

You could use DllStructCreate + DllStructSetData and then maybe call some API math function via DllCall, passing these structures to get more complicated math operations. This is only a rough idea, so take it easy :-)

Mat · February 8, 2018

@Simpel - your maths is squiffy. You were dividing everything by 1e28 (which should have been 1e14) including the terms that weren't scaled up.

The two possible ways to do it are to scale everything up to be 14 orders of magnitude higher, or to only divide the parts that have been scaled up. I'd actually suggest using the 3rd version below, which leaves the value as an integer, scaled up by 1e14, as then no precision is lost in the conversion back to floating point.

This style of calculation is termed "fixed point" as we keep all the values scaled up by a fixed amount. This gives you a smaller range than floating point calculations, but greater precision (as we don't need to use bits to store the exponent).

Local $xn = 32767, $xn_1 = 3276, $xn_2 = 327, $yn_1 = 3276, $yn_2 = 327

Local $yn = $xn - 2 * $xn_1 + $xn_2 + 1.99004745483398 * $yn_1 - 0.99007225036621 * $yn_2
ConsoleWrite($yn & @CRLF)

$yn = ( ($xn - 2 * $xn_1)*100000000000000 + $xn_2 + 199004745483398 * $yn_1 - 99007225036621 * $yn_2) / 100000000000000
ConsoleWrite($yn & @CRLF)

$yn = $xn - 2 * $xn_1 + ($xn_2 + 199004745483398 * $yn_1 - 99007225036621 * $yn_2) / 100000000000000
ConsoleWrite($yn & @CRLF)

$yn = ($xn - 2 * $xn_1)*100000000000000 + ($xn_2 + 199004745483398 * $yn_1 - 99007225036621 * $yn_2)
ConsoleWrite($yn & @CRLF)

ConsoleWrite(Log($yn)/Log(2) & @LF)

You aren't that far away from the limits in this case, about 62 bits of precision (the last line of code above is the bit calculation). The limit will be hit depending on the value of $yn_1, if it was 2x higher then you might be in trouble.

Tbh, I'm surprised to see such a big error in the floating point calculation.

Simpel · February 8, 2018

Thanks @Mat,

you're right. I got muddled with the factors. Your solution works well. But as you stated, it stops working once the values get bigger. Since audio can come with 16, 24 or 32 bits, the values can increase tremendously. My example used values from a 16-bit version. I put the 16-bit values in a different order, and even then it no longer works. The second example now uses 24-bit audio sample values, and the problem gets even worse. In addition, there is a second filter stage that needs the large scaling factor on five coefficients:

;~ Local $xn = 327, $xn_1 = 3276, $xn_2 = 32767, $yn_1 = 3276, $yn_2 = 32767 ; 16bit
Local $xn = 83886, $xn_1 = 838860, $xn_2 = 8388607, $yn_1 = 838860, $yn_2 = 8388607 ; 24bit

ConsoleWrite(@CRLF & "filter stage 1" & @CRLF)
Local $yn = 1.53512485958697 * $xn - 2.69169618940638 * $xn_1 + 1.19839281085285 * $xn_2 + 1.69065929318241 * $yn_1 - 0.73248077421585 * $yn_2
ConsoleWrite($yn & @CRLF)

$yn = (153512485958697 * $xn - 269169618940638 * $xn_1 + 119839281085285 * $xn_2 + 169065929318241 * $yn_1 - 73248077421585 * $yn_2) / 100000000000000
ConsoleWrite($yn & @CRLF)

ConsoleWrite(@CRLF & "filter stage 2" & @CRLF)
$yn = $xn - 2 * $xn_1 + $xn_2 + 1.99004745483398 * $yn_1 - 0.99007225036621 * $yn_2
ConsoleWrite($yn & @CRLF)

$yn = $xn - 2 * $xn_1 + ($xn_2 + 199004745483398 * $yn_1 - 99007225036621 * $yn_2) / 100000000000000
ConsoleWrite($yn & @CRLF & @CRLF)

Is there another solution? Regards, Conrad

Edited February 8, 2018 by Simpel
added second filter stage

Mat · February 9, 2018

That depends on the range of the outputs.

The way to analyse this properly is to look at extremes. For each variable, look at what the largest and smallest possible value is (the domain of the equation). Then from there you can work out what combination of values get you the largest and smallest result (and which intermediate steps are likely to overflow).

If the range of possible outputs exceeds 2^64 (i.e. the range of answers ends up as zero to 2^80 or something), then the options are either a bignum UDF (with reduced performance), or a loss of accuracy (either floating point, or fixed point with a smaller factor).
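A minimal sketch of such a range check for the "1, -2, 1" filter stage, done entirely in floating point so the check itself cannot overflow. It assumes (this is not stated in the thread) that every input and every previous output has a magnitude of at most 2^(bits-1); real filter outputs may exceed that, so treat the result as a rough estimate only.

```autoit
; Sketch: worst-case size of the scaled sum for the "1, -2, 1" filter stage.
; Assumption: all samples and previous results lie within the signed sample range.
Local $iBits = 24
Local $fMaxSample = 2 ^ ($iBits - 1)
Local $fScale = 1e14

; Sum of the absolute values of all coefficients after scaling by 1e14.
Local $fCoefSum = (1 + 2 + 1) * $fScale + 199004745483398 + 99007225036621

; Largest magnitude the scaled result can reach under the assumption above.
Local $fWorst = $fMaxSample * $fCoefSum

ConsoleWrite("Worst case needs about " & Round(Log($fWorst) / Log(2), 1) & " bits" & @CRLF)
ConsoleWrite("Fits into a signed 64-bit integer: " & ($fWorst < 2 ^ 63) & @CRLF)
```

Under this assumption the estimate lands above 63 bits for 24-bit samples, and also when $iBits is set to 16, which is consistent with the overflow reported for the reordered 16-bit values.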

If the range of outputs is less than 2^64, but just happens to be at the top end of that, then you can work on offsetting it back into range. There is also clever stuff you can do with the numbers to stop you from going in and out of range during intermediate steps, just by forcing the order of operations. For example, let's say we were working on unsigned 8 bit values to try and do the sum "17x-15y" and we know the answer will fit into an 8 bit value. It's obvious that certain values of x and y will cause the operation to overflow on 17x, then come back into range when we subtract 15y. If we know that x>y, we can rewrite the operation as "15(x-y)+2x". It's a trivial example, but that's the kind of way you have to approach the problem. Like I said, the first step is looking at the domain of the equation. With that information, look at where you're overflowing and then we can look at how to correct that.
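The "17x-15y" reordering can be sketched as follows. AutoIt has no 8-bit integer type, so this only prints the intermediate values to show which ones would leave the 0..255 range; the values of $x and $y are made up for illustration.

```autoit
; Sketch of the "17x - 15y" reordering, pretending values must stay in 0..255.
; $x and $y are made-up values with x > y and a final result that fits.
Local $x = 20, $y = 19

; Naive order: the first intermediate already leaves the 8-bit range.
Local $iNaiveStep = 17 * $x                 ; 340, larger than 255
Local $iNaive = $iNaiveStep - 15 * $y       ; 340 - 285 = 55

; Reordered: 17x - 15y = 15(x - y) + 2x, every intermediate stays small.
Local $iDiff = 15 * ($x - $y)               ; 15
Local $iTwice = 2 * $x                      ; 40
Local $iReordered = $iDiff + $iTwice        ; 55

ConsoleWrite("naive intermediate: " & $iNaiveStep & @CRLF)
ConsoleWrite("naive result:       " & $iNaive & @CRLF)
ConsoleWrite("reordered result:   " & $iReordered & @CRLF)
```

Because x > y makes both 15(x-y) and 2x non-negative, neither can exceed the final sum, so if the answer fits into 8 bits, every intermediate does too.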

LarsJ · February 10, 2018

#AutoIt3Wrapper_Au3Check_Parameters=-d -w- 1 -w 2 -w 3 -w 4 -w 5 -w 6
;#AutoIt3Wrapper_UseX64=y

#include "Variant.au3"

Opt( "MustDeclareVars", 1 )

Example1()
Example2()

Func Example1()
  Local $xn = 32767, $xn_1 = 3276, $xn_2 = 327, $yn_1 = 3276, $yn_2 = 327
  Local $yn = $xn - 2 * $xn_1 + $xn_2 + 1.99004745483398 * $yn_1 - 0.99007225036621 * $yn_2
  ConsoleWrite( $yn & @CRLF )
EndFunc

Func Example2()
  Local $xn = "32767", $xn_1 = "3276", $xn_2 = "327", $yn_1 = "3276", $yn_2 = "327", $c0 = "2", $c1 = "1.99004745483398", $c2 = "0.99007225036621", $yn = 0, $r = 0
  Local $txn   = StrToDec( $xn ),   $pxn   = DllStructGetPtr( $txn )
  Local $txn_1 = StrToDec( $xn_1 ), $pxn_1 = DllStructGetPtr( $txn_1 )
  Local $txn_2 = StrToDec( $xn_2 ), $pxn_2 = DllStructGetPtr( $txn_2 )
  Local $tyn_1 = StrToDec( $yn_1 ), $pyn_1 = DllStructGetPtr( $tyn_1 )
  Local $tyn_2 = StrToDec( $yn_2 ), $pyn_2 = DllStructGetPtr( $tyn_2 )
  Local $tc0   = StrToDec( $c0 ),   $pc0   = DllStructGetPtr( $tc0 )
  Local $tc1   = StrToDec( $c1 ),   $pc1   = DllStructGetPtr( $tc1 )
  Local $tc2   = StrToDec( $c2 ),   $pc2   = DllStructGetPtr( $tc2 )
  Local $tyn   = StrToDec( $yn ),   $pyn   = DllStructGetPtr( $tyn )
  Local $tr    = StrToDec( $r ),    $pr    = DllStructGetPtr( $tr )

  ;Local $yn = $xn - 2   * $xn_1 + $xn_2 + 1.99004745483398 * $yn_1 - 0.99007225036621 * $yn_2
  ;Local $yn = $xn - $c0 * $xn_1 + $xn_2 + $c1              * $yn_1 - $c2              * $yn_2
  VarDecMul( $pc2,   $pyn_2, $pr )  ; $r  = $c2   * $yn_2
  VarDecMul( $pc1,   $pyn_1, $pyn ) ; $yn = $c1   * $yn_1
  VarDecSub( $pyn,   $pr,    $pyn ) ; $yn = $yn   - $r    =                             $c1 * $yn_1 - $c2 * $yn_2
  VarDecAdd( $pxn_2, $pyn,   $pyn ) ; $yn = $xn_2 + $yn   =                     $xn_2 + $c1 * $yn_1 - $c2 * $yn_2
  VarDecMul( $pc0,   $pxn_1, $pr )  ; $r  = $c0   * $xn_1
  VarDecSub( $pxn,   $pr,    $pr )  ; $r  = $xn   - $r    = $xn - $c0 * $xn_1
  VarDecAdd( $pr,    $pyn,   $pyn ) ; $yn = $r    + $yn   = $xn - $c0 * $xn_1 + $xn_2 + $c1 * $yn_1 - $c2 * $yn_2

  $yn = DecToStr( $pyn )
  ConsoleWrite( $yn & @CRLF )
EndFunc

Func StrToDec( $str )
  Local $tStr = DllStructCreate( $tagVARIANT )
  Local $pStr = DllStructGetPtr( $tStr )
  $tStr.vt   = $VT_BSTR
  $tStr.data = SysAllocString( $str )
  VariantChangeType( $pStr, $pStr, 0, $VT_DECIMAL )
  Local $tDec = DllStructCreate( $tagDEC, $pStr )
  Local $l = StringInStr( $str, "." )
  $tDec.scale = $l ? StringLen( $str ) - $l : 0
  Return $tStr
EndFunc

Func VarDecAdd( $pDecLeft, $pDecRight, $pDecResult )
  Local $aRet = DllCall( "OleAut32.dll", "long", "VarDecAdd", "ptr", $pDecLeft, "ptr", $pDecRight, "ptr", $pDecResult )
  If @error Then Return SetError(1,0,1)
  Return $aRet[0]
EndFunc

Func VarDecDiv( $pDecLeft, $pDecRight, $pDecResult )
  Local $aRet = DllCall( "OleAut32.dll", "long", "VarDecDiv", "ptr", $pDecLeft, "ptr", $pDecRight, "ptr", $pDecResult )
  If @error Then Return SetError(1,0,1)
  Return $aRet[0]
EndFunc

Func VarDecMul( $pDecLeft, $pDecRight, $pDecResult )
  Local $aRet = DllCall( "OleAut32.dll", "long", "VarDecMul", "ptr", $pDecLeft, "ptr", $pDecRight, "ptr", $pDecResult )
  If @error Then Return SetError(1,0,1)
  Return $aRet[0]
EndFunc

Func VarDecSub( $pDecLeft, $pDecRight, $pDecResult )
  Local $aRet = DllCall( "OleAut32.dll", "long", "VarDecSub", "ptr", $pDecLeft, "ptr", $pDecRight, "ptr", $pDecResult )
  If @error Then Return SetError(1,0,1)
  Return $aRet[0]
EndFunc

Func DecToStr( $pDec )
  Local $tStr = DllStructCreate( $tagVARIANT )
  VariantChangeType( DllStructGetPtr( $tStr ), $pDec, 0, $VT_BSTR )
  Return SysReadString( $tStr.data )
EndFunc

Variant.7z

Edited February 10, 2018 by LarsJ

Simpel · February 10, 2018

Hi @Mat,

I found out that there isn't such a big error in the floating point operation. It's an issue with your example. I believe you made a mistake with the variables too: you didn't treat $xn_2 correctly. Your example should look like this:

Local $xn = 32767, $xn_1 = 3276, $xn_2 = 327, $yn_1 = 3276, $yn_2 = 327

Local $yn = $xn - 2 * $xn_1 + $xn_2 + 1.99004745483398 * $yn_1 - 0.99007225036621 * $yn_2
ConsoleWrite($yn & @CRLF)

$yn = ( ($xn - 2 * $xn_1 + $xn_2) * 100000000000000 + 199004745483398 * $yn_1 - 99007225036621 * $yn_2) / 100000000000000
ConsoleWrite($yn & @CRLF)

$yn = $xn - 2 * $xn_1 + $xn_2 + (199004745483398 * $yn_1 - 99007225036621 * $yn_2) / 100000000000000
ConsoleWrite($yn & @CRLF)

$yn = ($xn - 2 * $xn_1 + $xn_2) * 100000000000000 + (199004745483398 * $yn_1 - 99007225036621 * $yn_2)
ConsoleWrite($yn & @CRLF)

And then all results look the same.

@LarsJ: I tested your solution. I had to generalize it a bit because of filter stage 2 with 5 coefficients. Then I ran a speed test, looping it 100,000 times. The normal calculation took 666 ms and your solution (after generalizing) took 47848 ms, even with every possible Local declaration changed to Global. But thank you anyway.

As I now see that the error from calculating with these decimals is not a big problem, I will stay with floating point. In particular, I have to do this calculation for 48000 samples per second of audio, so speed matters most.
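A minimal sketch of how such a timing could be taken with AutoIt's TimerInit and TimerDiff. The sample values and the loop count are taken from the thread; this is not the exact test script, and the measured time depends on the machine.

```autoit
; Sketch: time the plain floating point filter step over many iterations.
; Values are placeholders from the thread, not real audio data.
Local $xn = 32767, $xn_1 = 3276, $xn_2 = 327, $yn_1 = 3276, $yn_2 = 327, $yn

Local $hTimer = TimerInit()
For $i = 1 To 100000
    $yn = $xn - 2 * $xn_1 + $xn_2 + 1.99004745483398 * $yn_1 - 0.99007225036621 * $yn_2
Next

; TimerDiff returns the elapsed time in milliseconds.
ConsoleWrite("100000 iterations: " & Round(TimerDiff($hTimer)) & " ms" & @CRLF)
ConsoleWrite("last result: " & $yn & @CRLF)
```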

Thanks all, Conrad

Edited February 10, 2018 by Simpel
No line break possible anymore - so sending and editing for the rest of the post

czardas · February 10, 2018

This is just a test. The last couple of digits vary from @Mat's example: probably because of operator64 internal corrections, but finding out why would require further analysis. It looks as though no overflow occurred, but some adjustments were still needed. This result should be precise for all 19 digits (printed at the end).

#include 'operator64.au3'

; Example values only: the current sample, the two previous samples and the two previous results
Local $xn = 32767, $xn_1 = 3276, $xn_2 = 327, $yn_1 = 3276, $yn_2 = 327

$xn = _Multiply64($xn, 100000000000000)
$xn_1 = _Multiply64($xn_1, 100000000000000)
$xn_2 = _Multiply64($xn_2, 100000000000000)

$yn = _Divide64(_Subtract64(_Add64(_Subtract64($xn, _Multiply64(2, $xn_1)), _Multiply64(199004745483398, $yn_1)), _Multiply64(99007225036621, $yn_2)), 1e+014)
ConsoleWrite($yn & @CRLF) ; ==> 32410.6418361664
                          ; OR ==> 3.241064183616636781 * 10^4

~~The discrepancy with the same calculation using floats is disturbing.~~

Edited February 12, 2018 by czardas

Simpel · February 11, 2018

Hi @czardas,

you forgot $xn_2:

; Example values only: the current sample, the two previous samples and the two previous results
Local $xn = 32767, $xn_1 = 3276, $xn_2 = 327, $yn_1 = 3276, $yn_2 = 327, $yn

$xn = _Multiply64($xn, 100000000000000)
$xn_1 = _Multiply64($xn_1, 100000000000000)
$xn_2 = _Multiply64($xn_2, 100000000000000)

$yn = _Divide64(_Subtract64(_Add64(_Add64(_Subtract64($xn, _Multiply64(2, $xn_1)), $xn_2), _Multiply64(199004745483398, $yn_1)), _Multiply64(99007225036621, $yn_2)), 1e+014)
ConsoleWrite("Operator64: " & $yn & @CRLF) ; ==> 32410.6418361664
                          ; OR ==> 3.241064183616636781 * 10^4

And now the results are identical to floating point. Thanks for the hint. So I can handle my calculations with floating point, since the error is small.

Conrad

czardas · February 12, 2018

16 hours ago, Simpel said:

you forgot $xn_2:

Ah, so I did. My result was so close to Mat's that I wrongly assumed it was correct. Well spotted. :thumbsup:
Operator64 allows you to freely switch to floats from int-64: handling overflow on int-64 internally and correcting for small errors. It slows down calculations though.

Edited February 12, 2018 by czardas
