From Bugzilla Helper:
User-Agent: Mozilla/4.77 [en] (X11; U; Linux 2.4.2-2 i586)
Description of problem:
The value of (e^x - 1) is representable as a long double when x <=
11356.5234, but expm1l(x) overflows to infinity when x > 11356.1768
(approximately).
How reproducible:
Always
Steps to Reproduce:
Compile and run the following program:
#include <stdio.h>
#include <math.h>
main ()
{
long double a, b;
const long double x = 11356.35L;
a = expl (x) - 1.0L;
b = expm1l (x);
if (a == b) {
printf ("e^%.2Lf - 1 = %Lg\n", b);
return 0;
} else {
printf ("e^%.2Lf - 1 = %Lg; but expm1l(%.2Lf) returned %Lg\n",
x, a, x, b);
return 1;
}
}
Actual Results: e^11356.35 - 1 = 1.00032e+4932; but expm1l(11356.35)
returned inf
Expected Results: e^11356.35 - 1 = 1.00032e+4932
Additional info:
This bug probably isn't even worth reporting, except for completeness.
expm1l is designed to compute e^x-1 more accurately when x is negative, so
[at least the i386 version] uses a slightly different algorithm than expl.
This algorithm overflows for large x where expl would not. But then, expm1
really isn't intended for large x anyway.
In my own port of the math library, I circumvent the problem by calling
__ieee754_expl when x >= 64, because at that magnitude, -1 is too small to
affect the result. This is probably overkill, but it works.

> except for completeness.
> expm1l is designed to compute e^x-1 more accurately when x is negative, so
> [at least the i386 version] uses a slightly different algorithm than expl.
Yes, true. If you compute expm1 the "normal" way you use a different series
than for computing exp. But on x86 you'd not want to stop using the FPU
instructions and there is none available allows the processor internally to
use the different series. It's just more drawback of the Intel FPU.
If you want to change it intercept the +Inf path and test for the arguments
which lead to incorrect results. I don't consider this important enough to
do myself right now.

gcc-2.96 is too old. Its release cycle was finished long ago.
Therefore I am closing the case. The bug was fixed at least in 3.2
(rhel3). If it is still important, the customer could reopen the case.

Since this bug is so old, I decided to re-confirm the bug. It is
still present in glibc-2.3.2-27.9.7 (RedHat 9).
But it is still just as non-urgent. ;^)

Red Hat apologizes that these issues have not been resolved yet. We do want to
make sure that no important bugs slip through the cracks.
Red Hat Linux 7.3 and Red Hat Linux 9 are no longer supported by Red Hat, Inc.
They are maintained by the Fedora Legacy project (http://www.fedoralegacy.org/)
for security updates only. If this is a security issue, please reassign to the
'Fedora Legacy' product in bugzilla. Please note that Legacy security update
support for these products will stop on December 31st, 2006.
If this is not a security issue, please check if this issue is still present
in a current Fedora Core release. If so, please change the product and version
to match, and check the box indicating that the requested information has been
provided.
If you are currently still running Red Hat Linux 7.3 or 9, please note that
Fedora Legacy security update support for these products will stop on December
31st, 2006. You are strongly advised to upgrade to a current Fedora Core release
or Red Hat Enterprise Linux or comparable. Some information on which option may
be right for you is available at http://www.redhat.com/rhel/migrate/redhatlinux/.
Any bug still open against Red Hat Linux 7.3 or 9 at the end of 2006 will be
closed 'CANTFIX'. Again, if this bug still exists in a current release, or is a
security issue, please change the product as necessary. We thank you for your
help, and apologize again that we haven't handled these issues to this point.

I have confirmed that this bug is present in Fedora Core 5. This is not
surprising, since I had listed the bug as unimportant. Feel free to close it as
'WONTFIX' if you don't think it's important either.

I created a testcase of new algorithm, will attach it now.
Extended explanation (I do hope bugzilla wont destroy text alignment):
http://ex-cs.sist.ac.jp/~tkouya/cgi-bin/try_mpfr.cgi - 8192-bit online calculator
x86 long double has 1 sign bit, 15-bit biased exponent, 64-bit mantissa
with explicit msb. Example: 128.0 has bit value (hex) of
4006 8000 0000 0000 0000,
2.0 = 4000 8000 0000 0000 0000,
1.0 = 3fff 8000 0000 0000 0000,
0.0 = 0000 0000 0000 0000 0000,
-1.0 = bfff 8000 0000 0000 0000 etc.
ln(2^64) = 44.3614195558364998034.
Thus e^x with x >= 45 is bigger than 2^64 and therefore subtracting 1
(e^x - 1) does not change it if mantissa has only 64 bits.
Testing for x >= 45 is not very easy, it's easier to test for x >= 64
(just test for binary exponent >= 6 (in biased form: >= 0x4005).
However, long doubles have explicit msb in mantissa, this even if
exponent > 5, mantissa still can be 0.000...0001 (binary).
Thus, we'll test for binary exponent >= 7: generally it means
that x >= 128, and even denormalized long doubles are > 64 in this case.
Asm code is added at the beginning of expm1:
movw 8+8(%rsp), %ax // load sign bit and 15-bit exponent
xorb $0x80, %ah // invert sign bit (now 1 is "positive")
cmpw $0xc006, %ax // is num positive and exp >= 6 (number is >=
128.0)?
jae expl // (if num is denormal, it is at least > 64.0)
Testing results:
./a.out 127.999999999:
...
Ok, last checked:
1.27999999999999999993e+02: 3.88770840599459506441e+55
3.88770840599459506441e+55
(3.88770840599459506521e+55)
expm1(1.27999999999999999993e+02) = 3.887708405994595065008714846...e55
[expm1() is obtained using 8192-bit online calculator, see URL above]
Thus, all values up to largest possible x < 128 are the same as old algorithm.
./a.out 128:
Different!
1.28000000000000000000e+02: 3.88770840599459510163e+55
3.88770840599459509206e+55 (3.88770840599459509206e+55)
Value: 3.88770840599459510163e+55 Bit value: 40b7
caf2 a62e ea10 bc1e (new)
Value: 3.88770840599459509206e+55 Bit value: 40b7
caf2 a62e ea10 bbfa (current)
expm1(128) = 3.887708405994595092222673688...e55
New algorightm gives different result, which is closer to "exact" one
than result of current algorightm.
Let's see how smooth the transition is:
For x < 128, new and current algorithms are the same:
new current
exp(x)
1.27999999999999999965e+02: 3.88770840599459495222e+55
3.88770840599459495222e+55 (3.88770840599459495727e+55)
1.27999999999999999972e+02: 3.88770840599459498944e+55
3.88770840599459498944e+55 (3.88770840599459498439e+55)
1.27999999999999999979e+02: 3.88770840599459502692e+55
3.88770840599459502692e+55 (3.88770840599459501124e+55)
1.27999999999999999986e+02: 3.88770840599459506441e+55
3.88770840599459506441e+55 (3.88770840599459503836e+55)
1.27999999999999999993e+02: 3.88770840599459506441e+55
3.88770840599459506441e+55 (3.88770840599459506521e+55)
[1.27999999999999999993e+02 = 4005 ffff ffff ffff ffff]
3.88770840599459509206e+55 - expm1(128)
approximated as exp(128) by new algorithm
3.88770840599459510163e+55 - expm1(128) result via
current algorithm
Thus, we see that around x = 128 incrementing mantissa by 1 lsb
makes expm1(x) grow by ~0.000000000000000037e+55
New algorithm does not exhibit "jump" here, transition
from 4005 ffff ffff ffff ffff to 4006 8000 0000 0000 0000
adds ~0.000000000000000028e+55 to "new" expm1(x).
I did many runs for x < 128 (including 0, negative numbers, etc)
and it gives identical results as expected.
Upper bound where result goes to +inf (./a.out 11356.52340629414):
1.13565234062941435678e+04: 1.18973149535677761008e+4932
1.13565234062941435687e+04: 1.18973149535677866678e+4932
1.13565234062941435695e+04: 1.18973149535677972347e+4932
[online calculator says 1.1897314953567796767173743362...e4932]
1.13565234062941435704e+04: inf
1.13565234062941435713e+04: inf
1.13565234062941435722e+04: inf

Created attachment 312979[details]
New algorithm and testcase for it
This testcase shows that approximating expm1l(x) with expl(x)
for x >= 128.0 makes expm1l(x) work correctly for x up to
11356.5234062941435695, provides more precise results for these large x,
and does not exhibit worsened precision for x < 128.0.
It also does not exhibit bad behavior at x = 128 (the function is
still ascending and df is not significantly different there
from nearby non-transition points).

Slight error in explanation:
-------------
./a.out 128:
Different!
1.28000000000000000000e+02: 3.88770840599459510163e+55
3.88770840599459509206e+55 (3.88770840599459509206e+55)
Value: 3.88770840599459510163e+55 (new)
Value: 3.88770840599459509206e+55 (current)
expm1(128) = 3.887708405994595092222673688...e55
New algorightm gives different result, which is closer to "exact" one
than result of current algorightm.
----------
(new) and (current) above should be swapped.

Your patch is very weird. It's x86-64 code but this is supposed to be for x86. If it is meant to be for x86-64, then it is wrong because on that platform the floating point parameters are passed in the SSE registers.
About the change itself, you jump to expl in the double implementation. You should jump to exp instead (no l suffix). Furthermore, you probably shouldn't jujp to exp since this would duplicate the error handling wrapper. Jump to __ieee754_exp. This also takes care of the problem of jumping to a symbol which is exported and hence would have to go through the PLT.
About the code itself: all the additional text like those ASM_TYPE_DIRECTIVE are not optional. You cannot just remove it. Leave it all on place.

> Your patch is very weird. It's x86-64 code but this is supposed to be for x86.
I do not understand how you reached this conclusion. What exactly is weird?
> If it is meant to be for x86-64, then it is wrong because on that platform the floating point parameters are passed in the SSE registers.
The bug is about expm1l - note 'l'! It's a *long double* function.
I doubt very much you can pass 80-bit long doubles in 64-bit SSE registers.
Therefore this function has to use legacy x86 FP registers and as a result,
the function uses no x86_64 specific instructions. It's the same for x86_64
and x86_32.
> About the change itself, you jump to expl in the double implementation. You should jump to exp instead (no l suffix).
See above. It's exp1ml, not exp1m. Therefore I have to jump to expl.
Had I jumped to exp, my testing would instantly catch it. Do you think
I lied about my test results?
> Furthermore, you probably shouldn't jujp to exp since this would
duplicate the error handling wrapper. Jump to __ieee754_exp.
This also takes care of the problem of jumping to a symbol
which is exported and hence would have to go through the PLT.
You are correct here, in a glibc patch I would need to do that.
What I did attach is not a patch which can be applied to glibc tree,
I attached a test program which is meant to show that the proposed
change is working correctly. Making a glibc patch based
on this program is easy.

`> except for completeness. > expm1l is designed to compute e^x-1 more accurately when x is negative, so > [at least the i386 version] uses a slightly different algorithm than expl. Yes, true. If you compute expm1 the "normal" way you use a different series than for computing exp. But on x86 you'd not want to stop using the FPU instructions and there is none available allows the processor internally to use the different series. It's just more drawback of the Intel FPU. If you want to change it intercept the +Inf path and test for the arguments which lead to incorrect results. I don't consider this important enough to do myself right now.`

`Created attachment 312979 [details] New algorithm and testcase for it This testcase shows that approximating expm1l(x) with expl(x) for x >= 128.0 makes expm1l(x) work correctly for x up to 11356.5234062941435695, provides more precise results for these large x, and does not exhibit worsened precision for x < 128.0. It also does not exhibit bad behavior at x = 128 (the function is still ascending and df is not significantly different there from nearby non-transition points).`

`Created attachment 312981 [details] Speed measurement program for the testcase Drop test_speed.c in the same directory as testcase, build: gcc -Os -lm test_speed.c s_expm1l.S s_expm1l_fixed.S and run to compare execution speeds. Each iteration is 64 calls of __glibc27_expm1l[_fixed]. Test is done 9 times in order to show variability. On my machine: # cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz stepping : 11 cpu MHz : 800.000 cache size : 4096 KB physical id : 0 siblings : 2 core id : 0 cpu cores : 2 fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm ida bogomips : 4394.30 # ./a.out 1 Value: 1.00000000000000000000e+00 Bit value: 3fff 8000 0000 0000 0000 Testing at: 1.00000000000000000000e+00: 1.71828182845904523543e+00 1.71828182845904523543e+00 (2.71828182845904523543e+00) Current algorithm, iterations in 0.1 sec: 9009 11411 11923 11931 11933 11973 11932 11918 11917 New algorithm, iterations in 0.1 sec: 11510 11787 11797 11798 11794 11804 11810 11816 11808 # ./a.out 127 Value: 1.27000000000000000000e+02 Bit value: 4005 fe00 0000 0000 0000 Testing at: 1.27000000000000000000e+02: 1.43020799583481045627e+55 1.43020799583481045627e+55 (1.43020799583481044630e+55) Current algorithm, iterations in 0.1 sec: 10781 11621 12022 12017 12022 12002 12000 12022 12021 New algorithm, iterations in 0.1 sec: 11753 11819 11835 11812 11829 11826 11835 11829 11849 # ./a.out 128 Value: 1.28000000000000000000e+02 Bit value: 4006 8000 0000 0000 0000 Testing at: 1.28000000000000000000e+02: 3.88770840599459510163e+55 3.88770840599459509206e+55 (3.88770840599459509206e+55) Current algorithm, iterations in 0.1 sec: 9741 11620 11910 11929 11741 11947 11956 11940 11938 New algorithm, iterations in 0.1 sec: 11188 11272 11310 11320 11315 11302 11292 11286 11293 # ./a.out 11356 Value: 1.13560000000000000000e+04 Bit value: 400c b170 0000 0000 0000 Testing at: 1.13560000000000000000e+04: 7.04914579998566566372e+4931 7.04914579998566243830e+4931 (7.04914579998566243830e+4931) Current algorithm, iterations in 0.1 sec: 11142 11612 12007 12017 12018 12018 11748 12012 11801 New algorithm, iterations in 0.1 sec: 11117 11223 11230 11241 11239 11239 11211 11223 11227`