Bug 43354
Summary: | expm1l(x) overflows when 11356.1768 < x <= 11356.5234 | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Trevin Beattie <trevin> | ||||||
Component: | glibc | Assignee: | Jakub Jelinek <jakub> | ||||||
Status: | CLOSED RAWHIDE | QA Contact: | |||||||
Severity: | low | Docs Contact: | |||||||
Priority: | low | ||||||||
Version: | 8 | CC: | drepper, dvlasenk, mattdm, notting, petrosyan | ||||||
Target Milestone: | --- | ||||||||
Target Release: | --- | ||||||||
Hardware: | All | ||||||||
OS: | Linux | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2008-09-17 13:03:59 UTC | Type: | --- | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Attachments: |
|
Description
Trevin Beattie
2001-06-03 18:44:26 UTC
Ulrich, what do you think about this? > except for completeness.
> expm1l is designed to compute e^x-1 more accurately when x is negative, so
> [at least the i386 version] uses a slightly different algorithm than expl.
Yes, true. If you compute expm1 the "normal" way you use a different series
than for computing exp. But on x86 you'd not want to stop using the FPU
instructions and there is none available allows the processor internally to
use the different series. It's just more drawback of the Intel FPU.
If you want to change it intercept the +Inf path and test for the arguments
which lead to incorrect results. I don't consider this important enough to
do myself right now.
gcc-2.96 is too old. Its release cycle was finished long ago. Therefore I am closing the case. The bug was fixed at least in 3.2 (rhel3). If it is still important, the customer could reopen the case. Sorry, occasionally I modified a wrong case. I am reopening the bug. Since this bug is so old, I decided to re-confirm the bug. It is still present in glibc-2.3.2-27.9.7 (RedHat 9). But it is still just as non-urgent. ;^) Red Hat apologizes that these issues have not been resolved yet. We do want to make sure that no important bugs slip through the cracks. Red Hat Linux 7.3 and Red Hat Linux 9 are no longer supported by Red Hat, Inc. They are maintained by the Fedora Legacy project (http://www.fedoralegacy.org/) for security updates only. If this is a security issue, please reassign to the 'Fedora Legacy' product in bugzilla. Please note that Legacy security update support for these products will stop on December 31st, 2006. If this is not a security issue, please check if this issue is still present in a current Fedora Core release. If so, please change the product and version to match, and check the box indicating that the requested information has been provided. If you are currently still running Red Hat Linux 7.3 or 9, please note that Fedora Legacy security update support for these products will stop on December 31st, 2006. You are strongly advised to upgrade to a current Fedora Core release or Red Hat Enterprise Linux or comparable. Some information on which option may be right for you is available at http://www.redhat.com/rhel/migrate/redhatlinux/. Any bug still open against Red Hat Linux 7.3 or 9 at the end of 2006 will be closed 'CANTFIX'. Again, if this bug still exists in a current release, or is a security issue, please change the product as necessary. We thank you for your help, and apologize again that we haven't handled these issues to this point. I have confirmed that this bug is present in Fedora Core 5. This is not surprising, since I had listed the bug as unimportant. Feel free to close it as 'WONTFIX' if you don't think it's important either. VERIFIED is the wrong state. It means a patch is verified. This bug is still present in Fedora 8: gcc-4.1.2-33 glibc-2.7-2 this bug is also present on x86_64. Filed upstream http://sourceware.org/bugzilla/show_bug.cgi?id=5794 I created a testcase of new algorithm, will attach it now. Extended explanation (I do hope bugzilla wont destroy text alignment): http://ex-cs.sist.ac.jp/~tkouya/cgi-bin/try_mpfr.cgi - 8192-bit online calculator x86 long double has 1 sign bit, 15-bit biased exponent, 64-bit mantissa with explicit msb. Example: 128.0 has bit value (hex) of 4006 8000 0000 0000 0000, 2.0 = 4000 8000 0000 0000 0000, 1.0 = 3fff 8000 0000 0000 0000, 0.0 = 0000 0000 0000 0000 0000, -1.0 = bfff 8000 0000 0000 0000 etc. ln(2^64) = 44.3614195558364998034. Thus e^x with x >= 45 is bigger than 2^64 and therefore subtracting 1 (e^x - 1) does not change it if mantissa has only 64 bits. Testing for x >= 45 is not very easy, it's easier to test for x >= 64 (just test for binary exponent >= 6 (in biased form: >= 0x4005). However, long doubles have explicit msb in mantissa, this even if exponent > 5, mantissa still can be 0.000...0001 (binary). Thus, we'll test for binary exponent >= 7: generally it means that x >= 128, and even denormalized long doubles are > 64 in this case. Asm code is added at the beginning of expm1: movw 8+8(%rsp), %ax // load sign bit and 15-bit exponent xorb $0x80, %ah // invert sign bit (now 1 is "positive") cmpw $0xc006, %ax // is num positive and exp >= 6 (number is >= 128.0)? jae expl // (if num is denormal, it is at least > 64.0) Testing results: ./a.out 127.999999999: ... Ok, last checked: 1.27999999999999999993e+02: 3.88770840599459506441e+55 3.88770840599459506441e+55 (3.88770840599459506521e+55) expm1(1.27999999999999999993e+02) = 3.887708405994595065008714846...e55 [expm1() is obtained using 8192-bit online calculator, see URL above] Thus, all values up to largest possible x < 128 are the same as old algorithm. ./a.out 128: Different! 1.28000000000000000000e+02: 3.88770840599459510163e+55 3.88770840599459509206e+55 (3.88770840599459509206e+55) Value: 3.88770840599459510163e+55 Bit value: 40b7 caf2 a62e ea10 bc1e (new) Value: 3.88770840599459509206e+55 Bit value: 40b7 caf2 a62e ea10 bbfa (current) expm1(128) = 3.887708405994595092222673688...e55 New algorightm gives different result, which is closer to "exact" one than result of current algorightm. Let's see how smooth the transition is: For x < 128, new and current algorithms are the same: new current exp(x) 1.27999999999999999965e+02: 3.88770840599459495222e+55 3.88770840599459495222e+55 (3.88770840599459495727e+55) 1.27999999999999999972e+02: 3.88770840599459498944e+55 3.88770840599459498944e+55 (3.88770840599459498439e+55) 1.27999999999999999979e+02: 3.88770840599459502692e+55 3.88770840599459502692e+55 (3.88770840599459501124e+55) 1.27999999999999999986e+02: 3.88770840599459506441e+55 3.88770840599459506441e+55 (3.88770840599459503836e+55) 1.27999999999999999993e+02: 3.88770840599459506441e+55 3.88770840599459506441e+55 (3.88770840599459506521e+55) [1.27999999999999999993e+02 = 4005 ffff ffff ffff ffff] 3.88770840599459509206e+55 - expm1(128) approximated as exp(128) by new algorithm 3.88770840599459510163e+55 - expm1(128) result via current algorithm Thus, we see that around x = 128 incrementing mantissa by 1 lsb makes expm1(x) grow by ~0.000000000000000037e+55 New algorithm does not exhibit "jump" here, transition from 4005 ffff ffff ffff ffff to 4006 8000 0000 0000 0000 adds ~0.000000000000000028e+55 to "new" expm1(x). I did many runs for x < 128 (including 0, negative numbers, etc) and it gives identical results as expected. Upper bound where result goes to +inf (./a.out 11356.52340629414): 1.13565234062941435678e+04: 1.18973149535677761008e+4932 1.13565234062941435687e+04: 1.18973149535677866678e+4932 1.13565234062941435695e+04: 1.18973149535677972347e+4932 [online calculator says 1.1897314953567796767173743362...e4932] 1.13565234062941435704e+04: inf 1.13565234062941435713e+04: inf 1.13565234062941435722e+04: inf Created attachment 312979 [details]
New algorithm and testcase for it
This testcase shows that approximating expm1l(x) with expl(x)
for x >= 128.0 makes expm1l(x) work correctly for x up to
11356.5234062941435695, provides more precise results for these large x,
and does not exhibit worsened precision for x < 128.0.
It also does not exhibit bad behavior at x = 128 (the function is
still ascending and df is not significantly different there
from nearby non-transition points).
Created attachment 312981 [details]
Speed measurement program for the testcase
Drop test_speed.c in the same directory as testcase, build:
gcc -Os -lm test_speed.c s_expm1l.S s_expm1l_fixed.S
and run to compare execution speeds.
Each iteration is 64 calls of __glibc27_expm1l[_fixed]. Test is done 9 times in
order to show variability.
On my machine:
# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz
stepping : 11
cpu MHz : 800.000
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2
ssse3 cx16 xtpr lahf_lm ida
bogomips : 4394.30
# ./a.out 1
Value: 1.00000000000000000000e+00 Bit value: 3fff 8000 0000 0000 0000
Testing at:
1.00000000000000000000e+00: 1.71828182845904523543e+00
1.71828182845904523543e+00 (2.71828182845904523543e+00)
Current algorithm, iterations in 0.1 sec: 9009 11411 11923 11931 11933
11973 11932 11918 11917
New algorithm, iterations in 0.1 sec: 11510 11787 11797 11798 11794
11804 11810 11816 11808
# ./a.out 127
Value: 1.27000000000000000000e+02 Bit value: 4005 fe00 0000 0000 0000
Testing at:
1.27000000000000000000e+02: 1.43020799583481045627e+55
1.43020799583481045627e+55 (1.43020799583481044630e+55)
Current algorithm, iterations in 0.1 sec: 10781 11621 12022 12017 12022
12002 12000 12022 12021
New algorithm, iterations in 0.1 sec: 11753 11819 11835 11812 11829
11826 11835 11829 11849
# ./a.out 128
Value: 1.28000000000000000000e+02 Bit value: 4006 8000 0000 0000 0000
Testing at:
1.28000000000000000000e+02: 3.88770840599459510163e+55
3.88770840599459509206e+55 (3.88770840599459509206e+55)
Current algorithm, iterations in 0.1 sec: 9741 11620 11910 11929 11741
11947 11956 11940 11938
New algorithm, iterations in 0.1 sec: 11188 11272 11310 11320 11315
11302 11292 11286 11293
# ./a.out 11356
Value: 1.13560000000000000000e+04 Bit value: 400c b170 0000 0000 0000
Testing at:
1.13560000000000000000e+04: 7.04914579998566566372e+4931
7.04914579998566243830e+4931 (7.04914579998566243830e+4931)
Current algorithm, iterations in 0.1 sec: 11142 11612 12007 12017 12018
12018 11748 12012 11801
New algorithm, iterations in 0.1 sec: 11117 11223 11230 11241 11239
11239 11211 11223 11227
Slight error in explanation: ------------- ./a.out 128: Different! 1.28000000000000000000e+02: 3.88770840599459510163e+55 3.88770840599459509206e+55 (3.88770840599459509206e+55) Value: 3.88770840599459510163e+55 (new) Value: 3.88770840599459509206e+55 (current) expm1(128) = 3.887708405994595092222673688...e55 New algorightm gives different result, which is closer to "exact" one than result of current algorightm. ---------- (new) and (current) above should be swapped. Your patch is very weird. It's x86-64 code but this is supposed to be for x86. If it is meant to be for x86-64, then it is wrong because on that platform the floating point parameters are passed in the SSE registers. About the change itself, you jump to expl in the double implementation. You should jump to exp instead (no l suffix). Furthermore, you probably shouldn't jujp to exp since this would duplicate the error handling wrapper. Jump to __ieee754_exp. This also takes care of the problem of jumping to a symbol which is exported and hence would have to go through the PLT. About the code itself: all the additional text like those ASM_TYPE_DIRECTIVE are not optional. You cannot just remove it. Leave it all on place. > Your patch is very weird. It's x86-64 code but this is supposed to be for x86. I do not understand how you reached this conclusion. What exactly is weird? > If it is meant to be for x86-64, then it is wrong because on that platform the floating point parameters are passed in the SSE registers. The bug is about expm1l - note 'l'! It's a *long double* function. I doubt very much you can pass 80-bit long doubles in 64-bit SSE registers. Therefore this function has to use legacy x86 FP registers and as a result, the function uses no x86_64 specific instructions. It's the same for x86_64 and x86_32. > About the change itself, you jump to expl in the double implementation. You should jump to exp instead (no l suffix). See above. It's exp1ml, not exp1m. Therefore I have to jump to expl. Had I jumped to exp, my testing would instantly catch it. Do you think I lied about my test results? > Furthermore, you probably shouldn't jujp to exp since this would duplicate the error handling wrapper. Jump to __ieee754_exp. This also takes care of the problem of jumping to a symbol which is exported and hence would have to go through the PLT. You are correct here, in a glibc patch I would need to do that. What I did attach is not a patch which can be applied to glibc tree, I attached a test program which is meant to show that the proposed change is working correctly. Making a glibc patch based on this program is easy. I brought the changes into shape to be applied to glibc cvs. The next rawhide build will have the fixed code. Thanks Ulrich, should I close this bug now? Fixed in rawhide. Might or might not be backported to older Fedora releases. It'st important enough to bother. |