#1
"99(4^34019)99 palind"
Nov 2016
(P^81993)SZ base 36
2^5×5×23 Posts
Double: 64 bits
* bit 63: sign (0 = positive, 1 = negative)
* bit 62 to bit 52: exponent (bit 62 = e_10, bit 61 = e_9, bit 60 = e_8, ..., bit 53 = e_1, bit 52 = e_0). The exponent uses two's complement representation, so its range is -1024 (e_10 e_9 e_8 ... e_1 e_0 = 10000000000) through +1023 (e_10 e_9 e_8 ... e_1 e_0 = 01111111111).
* bit 51 to bit 0: significand (bit 51 = s_51, bit 50 = s_50, bit 49 = s_49, ..., bit 1 = s_1, bit 0 = s_0)

The value of the number is

$$(-1)^{\mathrm{sign}} \times (1.s_{51}s_{50}s_{49}\ldots s_1 s_0)_2 \times 2^{(e_{10}e_9e_8\ldots e_1 e_0)_2}$$

with the exponent read as an 11-bit two's complement integer. In decimal "scientific notation" the integer part can be 1, 2, 3, ..., 8 or 9, but in binary it must be 1, so this leading 1 does not need to be stored.

Special cases:
* sign = 0, e_10...e_0 = 01111111111 (+1023), s_51...s_0 all 1 --> +∞
* sign = 1, e_10...e_0 = 01111111111 (+1023), s_51...s_0 all 1 --> -∞
* sign = 0, e_10...e_0 = 10000000000 (-1024), s_51...s_0 all 0 --> 0
* sign = 1, e_10...e_0 = 10000000000 (-1024), s_51...s_0 all 0 --> NaN

(I think this double floating-point format is better than the standard one, since it is bijective: each of the 2^64 bit patterns stands for exactly one value, and each value has exactly one bit pattern.)

Last fiddled with by sweety439 on 2022-05-03 at 03:34
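As a sketch of how the layout above decodes, here is a small Python function using exact rationals. The name `decode_proposed_double` and the choice to return a `Fraction` for finite values and the strings `"+inf"`, `"-inf"` and `"nan"` for the special patterns are assumptions for illustration; the post only defines the bit layout.

```python
from fractions import Fraction

EXP_BITS = 11   # exponent field width (two's complement)
FRAC_BITS = 52  # stored significand bits (leading 1 is implicit)


def decode_proposed_double(bits: int):
    """Decode a 64-bit pattern under the proposed layout (hypothetical helper).

    Returns a Fraction for finite values (including 0), or one of the
    strings "+inf", "-inf", "nan" for the special patterns.
    """
    if not 0 <= bits < 1 << 64:
        raise ValueError("expected a 64-bit unsigned integer")
    sign = bits >> 63
    raw_exp = (bits >> FRAC_BITS) & ((1 << EXP_BITS) - 1)
    frac = bits & ((1 << FRAC_BITS) - 1)
    # Two's complement: 0b10000000000 -> -1024, 0b01111111111 -> +1023.
    exp = raw_exp - (1 << EXP_BITS) if raw_exp >> (EXP_BITS - 1) else raw_exp

    if exp == (1 << (EXP_BITS - 1)) - 1 and frac == (1 << FRAC_BITS) - 1:
        return "-inf" if sign else "+inf"
    if exp == -(1 << (EXP_BITS - 1)) and frac == 0:
        return "nan" if sign else Fraction(0)

    # value = (-1)^sign * (1.frac)_2 * 2^exp
    significand = Fraction((1 << FRAC_BITS) + frac, 1 << FRAC_BITS)
    value = significand * Fraction(2) ** exp
    return -value if sign else value
```

Every bit pattern goes down exactly one branch, which is the bijectivity the post argues for.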
#2
"99(4^34019)99 palind"
Nov 2016
(P^81993)SZ base 36
2^5×5×23 Posts
Examples:
```
0000 0000 0000 0000 = 1
8010 0000 0000 0000 = −2
0057 8000 0000 0000 = 47
8085 5000 0000 0000 = −341
3fff ffff ffff fffe = 2^1024 − 2^972     (max double)
4000 0000 0000 0001 = 2^−1024 + 2^−1076  (min positive double)
7ff0 0000 0000 0000 = 1/2
7fe5 5555 5555 5555 ≈ 1/3
4000 0000 0000 0000 = 0
c000 0000 0000 0000 = NaN
3fff ffff ffff ffff = +∞
bfff ffff ffff ffff = −∞
```
Last fiddled with by sweety439 on 2022-05-03 at 03:22
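The examples can also be checked in the encoding direction. This is a sketch under assumptions the post does not spell out for the 64-bit format: exact rationals are rounded to 53 significant bits with round half to even (the rule post #5 uses for super precision), results with exponent above +1023 go to ±∞, and magnitudes that round to 2^−1024 or below are flushed to 0, since there are no subnormals and the negative of the zero pattern is NaN. `encode_proposed_double` is a hypothetical name.

```python
from fractions import Fraction

# Hypothetical encoder for the proposed 64-bit layout (not from the post).
FRAC_BITS = 52                      # stored significand bits
P = FRAC_BITS + 1                   # significant bits, incl. the implicit 1
EMIN, EMAX = -1024, 1023            # 11-bit two's-complement exponent range
ZERO = 0x4000_0000_0000_0000
POS_INF = 0x3FFF_FFFF_FFFF_FFFF
NEG_INF = 0xBFFF_FFFF_FFFF_FFFF


def encode_proposed_double(x) -> int:
    """Round an exact rational x to the proposed layout, ties to even."""
    x = Fraction(x)
    if x == 0:
        return ZERO
    sign = 1 if x < 0 else 0
    mag = abs(x)
    # Find e with 2^e <= mag < 2^(e+1).
    e = mag.numerator.bit_length() - mag.denominator.bit_length()
    if Fraction(2) ** e > mag:
        e -= 1
    # Scale so the P kept bits form the integer part: 2^(P-1) <= scaled < 2^P.
    scaled = mag / Fraction(2) ** (e - (P - 1))
    q, r = divmod(scaled.numerator, scaled.denominator)
    # Round half to even: compare the remainder r/den with 1/2.
    if 2 * r > scaled.denominator or (2 * r == scaled.denominator and q & 1):
        q += 1
    if q == 1 << P:                 # carry out of the top bit: 1.11...1 -> 10.0...0
        q >>= 1
        e += 1
    if e > EMAX:
        return NEG_INF if sign else POS_INF
    # No subnormals: 2^-1024 itself is the zero pattern (its negative is NaN),
    # so anything that rounds to 2^-1024 or below is flushed to 0.
    if e < EMIN or (e == EMIN and q == 1 << (P - 1)):
        return ZERO
    frac = q - (1 << (P - 1))       # drop the implicit leading 1
    raw_exp = e & 0x7FF             # 11-bit two's-complement exponent field
    # An all-ones significand at e = +1023 is the +/-inf pattern itself.
    return (sign << 63) | (raw_exp << FRAC_BITS) | frac
```

One consequence of the layout shows up here: 2^1024 − 2^971 is exactly representable with 53 significant bits, but its bit pattern is `3fff ffff ffff ffff`, so under this sketch it encodes as +∞.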
#3
"99(4^34019)99 palind"
Nov 2016
(P^81993)SZ base 36
2^5×5×23 Posts
single: 32 bits, including 1 sign bit, 8 exponent bits, 23 significand precision bits
double: 64 bits, including 1 sign bit, 11 exponent bits, 52 significand precision bits
long double: 80 bits, including 1 sign bit, 16 exponent bits, 63 significand precision bits
quadruple: 128 bits, including 1 sign bit, 15 exponent bits, 112 significand precision bits
octuple: 256 bits, including 1 sign bit, 16 exponent bits, 239 significand precision bits
(I think this octuple layout is better than the current octuple precision, which has 1 sign bit, 19 exponent bits and 236 significand precision bits. 239 significand precision bits means 239 + 1 = 240 significant bits, and 240 is a highly composite number, i.e. it has many divisors. 240 bits ≈ 72 decimal digits, so there are 72 significant decimal digits, and 72 also has many divisors (72 is the smallest Achilles number).)
super: 65536 bits, including 1 sign bit, 256 exponent bits, 65279 significand precision bits

Using my two's complement exponent convention, with exactly one bit pattern for each of +∞, -∞, 0 and NaN, super precision has 65279 + 1 = 65280 significant bits (≈19651 decimal digits). Its largest finite number is 2^(2^255) - 2^(2^255-65279), and its smallest positive number is 2^(-2^255) + 2^(-2^255-65279).

Last fiddled with by sweety439 on 2022-05-03 at 03:32
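The field widths and limits in the list can be sanity-checked with a few lines of Python. This sketch assumes every layout uses the two's-complement exponent and implicit leading 1 of post #1 (the post states this explicitly only for super precision); `describe_layout` is a hypothetical helper. Limits are returned as exponent pairs because 2^(2^255) is far too large to materialize.

```python
import math


def describe_layout(total_bits: int, exp_bits: int, frac_bits: int) -> dict:
    """Summarize a layout with a two's-complement exponent and implicit leading 1."""
    if 1 + exp_bits + frac_bits != total_bits:
        raise ValueError("sign + exponent + significand must fill the word")
    p = frac_bits + 1                 # significant bits, incl. the implicit 1
    top = 2 ** (exp_bits - 1)         # largest exponent is top - 1
    return {
        "significant_bits": p,
        "decimal_digits": math.floor(p * math.log10(2)),
        # Largest finite: significand 1.11...10 at exponent top - 1 (the
        # all-ones significand is +inf), i.e. 2^top - 2^(top - frac_bits).
        "max": (top, top - frac_bits),
        # Smallest positive: significand 1.00...01 at exponent -top,
        # i.e. 2^-top + 2^(-top - frac_bits).
        "min_positive": (-top, -top - frac_bits),
    }
```

For double this gives the pairs (1024, 972) and (−1024, −1076) from post #2. Calling it with (65536, 256, 65280) raises, since 1 + 256 + 65280 = 65537 bits, which is why super precision has 65279 stored significand bits.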
#4
Undefined
"The unspeakable one"
Jun 2006
My evil lair
61×109 Posts
The lack of negative zero will be a problem for some algorithms.
And if NaN always, or never, faults, then it could be cumbersome. The existing QNaN vs. SNaN distinction allows for some nice efficiencies, and the extra bits in the QNaN/SNaN encoding provide good debugging opportunities. I don't care about denormals, so whatever. But -0 == NaN? Why? Extra circuitry/code for what gain?
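For comparison, the IEEE 754 binary64 features mentioned above (negative zero, quiet vs. signaling NaN, denormals) can be inspected with Python's standard `struct` module. This sketch classifies a raw 64-bit pattern; it assumes the IEEE 754-2008 convention, used by x86 and ARM, that the top significand bit (bit 51) marks a quiet NaN. `ieee_bits` and `classify_ieee_double` are illustrative names.

```python
import math
import struct


def ieee_bits(x: float) -> int:
    """Return the IEEE 754 binary64 bit pattern of a Python float."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def classify_ieee_double(bits: int) -> str:
    """Classify a raw binary64 pattern without converting it to a float."""
    sign = bits >> 63
    exp = (bits >> 52) & 0x7FF
    frac = bits & ((1 << 52) - 1)
    if exp == 0x7FF:
        if frac == 0:
            return "-inf" if sign else "+inf"
        # Bit 51 set marks a quiet NaN; the remaining 51 bits are payload.
        return "qnan" if frac >> 51 else "snan"
    if exp == 0:
        if frac == 0:
            return "-0" if sign else "+0"
        return "subnormal"            # a.k.a. denormal
    return "normal"
```

In IEEE 754, `-0.0 == 0.0` is true, but the sign is still observable, e.g. `math.copysign(1.0, -0.0)` is `-1.0`. Binary64 has 2 × (2^52 − 1) NaN patterns and two zeros; that redundancy is what the proposal in post #1 removes.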
#5
"99(4^34019)99 palind"
Nov 2016
(P^81993)SZ base 36
2^5×5×23 Posts
In super-precision:

Numbers are rounded to 65280 binary significant digits using Gaussian rounding (round half to even). In each pattern below, the first digit shown is the 65280th binary significant digit (the last one kept) and the second is the 65281st:
* ...00... --> ...0
* ...01...1... (some digit after the 65281st is 1) --> ...1
* ...01000... (the only 1 after the 65280th digit is the 65281st; all digits after it are 0) --> ...0
* ...11... --> ...(+1)0
* ...10...0... (some digit after the 65281st is 0) --> ...1
* ...10111... (the only 0 after the 65280th digit is the 65281st; all digits after it are 1) --> ...(+1)0 (remember: 0.999... = 1)

These calculations return "+∞":
* the result is >= 2^(2^255) (in fact, >= 2^(2^255)-2^(2^255-65281): because the result is rounded to 65280 binary significant digits with Gaussian rounding, 2^(2^255)-2^(2^255-65281), which has 65281 consecutive 1's after the "0" in the 2^(2^255) bit, rounds up to 2^(2^255) and becomes +∞, since its 2^(2^255-65280) digit is 1 and the digits after it are exactly a half)
* (+∞) + (x) (except x = -∞ and x = NaN)
* (+∞) - (x) (except x = +∞ and x = NaN)
* (x) - (-∞) (except x = -∞ and x = NaN)
* (+∞) * (x) when x > 0 (including x = +∞)
* (-∞) * (x) when x < 0 (including x = -∞)
* (+∞) / (x) when x >= 0 (except x = +∞)
* (-∞) / (x) when x < 0 (except x = -∞)
* (x) / (0) when x > 0 (including x = +∞)
* (+∞) ^ (x) when x > 0 (including x = +∞)
* (x) ^ (+∞) when x > 1 (including x = +∞)
* (x) ^ (-∞) when 0 <= x < 1

These calculations return "0":
* the result is between -2^(-2^255) and 2^(-2^255) inclusive (in fact, between -2^(-2^255)-2^(-2^255-65280) and 2^(-2^255)+2^(-2^255-65280) inclusive: because the result is rounded to 65280 binary significant digits with Gaussian rounding, 2^(-2^255)+2^(-2^255-65280), which has 65279 consecutive 0's after the "1" in the 2^(-2^255) bit, rounds down to 2^(-2^255) and becomes 0, since its 2^(-2^255-65279) digit is 0 and the digits after it are exactly a half)
* (x) / (+∞)
* (x) / (-∞)
* (x) ^ (+∞) when 0 <= x < 1
* (x) ^ (-∞) when x > 1 (including x = +∞)

These calculations return "NaN":
* at least one operand is NaN
* the result is a complex number, e.g. (-1)^(1/2)
* (+∞) + (-∞)
* (+∞) - (+∞)
* (+∞) * (0)
* (+∞) / (+∞)
* (0) / (0)
* (0) ^ (0)
* (+∞) ^ (0)
* 1 ^ (+∞)

Last fiddled with by sweety439 on 2022-05-03 at 03:34
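The rounding rule can be sketched on plain Python integers, which are arbitrary precision, so 65281-bit significands are no problem. This is a minimal sketch: `round_half_even` is a hypothetical helper that rounds a positive integer to p significant bits with ties to even, and the scale factors 2^(2^255−65281) and 2^(−2^255−65280) are left implicit because those powers cannot be held directly. Only finite expansions are handled; the 0.999... = 1 case corresponds to passing the exact value instead.

```python
def round_half_even(n: int, p: int) -> int:
    """Round a positive integer n to p significant bits, ties to even.

    The result keeps the same scale (low bits zeroed), so a carry can
    make it one bit longer, e.g. 0b111|1 -> 0b10000.
    """
    if n <= 0 or p <= 0:
        raise ValueError("n and p must be positive")
    shift = n.bit_length() - p
    if shift <= 0:
        return n                          # already fits in p bits
    q, r = n >> shift, n & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    # Above half: round up. Exactly half: round up only if q is odd.
    if r > half or (r == half and q & 1):
        q += 1
    return q << shift


P = 65280  # significant bits in super precision

# 2^(2^255) - 2^(2^255-65281) = 2^(2^255-65281) * (2^65281 - 1): 65281 ones.
top = (1 << 65281) - 1
# 2^(-2^255) + 2^(-2^255-65280) = 2^(-2^255-65280) * (2^65280 + 1).
bottom = (1 << 65280) + 1
```

With p = 4, the inputs 0b10001, 0b10011, 0b100011 and 0b100101 exercise the "...01000...", "...11...", "...01...1..." and "...10...0..." patterns. `top` rounds up to 2^65281 (i.e. 2^(2^255), hence +∞), and `bottom` rounds down to 2^65280 (i.e. 2^(−2^255), hence 0).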
#6
"99(4^34019)99 palind"
Nov 2016
(P^81993)SZ base 36
2^5×5×23 Posts
Quote:
#7
Undefined
"The unspeakable one"
Jun 2006
My evil lair
61×109 Posts
Quote:
You need something to manipulate the bits to do the computations. If you make the bit patterns hard to deal with, then it is no fun to use. Too many special cases in the code or the circuitry.