Bug 616105
Summary: | problems with 64b division on 32b platforms. | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Issue Tracker <tao> | ||||||||||
Component: | kernel | Assignee: | Oleg Nesterov <onestero> | ||||||||||
Status: | CLOSED ERRATA | QA Contact: | Petr Beňas <pbenas> | ||||||||||
Severity: | medium | Docs Contact: | |||||||||||
Priority: | medium | ||||||||||||
Version: | 6.0 | CC: | cww, pbenas, pstehlik, tao | ||||||||||
Target Milestone: | rc | ||||||||||||
Target Release: | --- | ||||||||||||
Hardware: | All | ||||||||||||
OS: | Linux | ||||||||||||
Whiteboard: | |||||||||||||
Fixed In Version: | kernel-2.6.32-91.el6 | Doc Type: | Bug Fix | ||||||||||
Doc Text: | Story Points: | --- | |||||||||||
Clone Of: | Environment: | ||||||||||||
Last Closed: | 2011-05-23 20:42:07 UTC | Type: | --- | ||||||||||
Regression: | --- | Mount Type: | --- | ||||||||||
Documentation: | --- | CRM: | |||||||||||
Verified Versions: | Category: | --- | |||||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||||
Embargoed: | |||||||||||||
Attachments: |
|
Description
Issue Tracker
2010-07-19 16:47:10 UTC
Event posted on 07-13-2010 03:10pm EDT by woodard From: Brian Behlendorf <behlendorf1> To: Ben Woodard <bwoodard> Cc: Mark Grondona <mgrondona> Subject: div64_u64() incorrect Date: 07/13/2010 12:01:24 PM Hi Ben, While implementing signed 64-bit division in the kernel for 32-bit platforms (for zfs) I discovered something very disturbing. The regression tests I added to make sure I got it right instead showed that the kernels div64_u64() implementation is just wrong for certain cases. I've attached a trivial reproducer which runs ~2700 interesting divisions and flags the ones which result in the wrong answer. On a 64-bit platforms everythings passes correctly, on a 32-bit platform 17 tests fails. One of the failures cases is as follows: div64_u64(7000000080000000, 7000000080000001) = 0000000000000001 I dug in to it a bit and it turns out the div64_u64() implementation is just wrong for certain cases. The answer will be close to right but not exactly right, perhaps this was done to simplify the code? If so there should at least be a HUGE comment saying that this is an approximation! Can you ask around redhat if there's a good reason this is wrong. There are correct implementations of this algorithm floating around so it looks like I'll just need to adopt one of them and drop the kernels. It won't be optimized but at least it will be right! Also, have you heard anything back on my btrfs questions? Where do that stand? Thanks, Brian This event sent from IssueTracker by kbaxley [LLNL (HPC)] issue 1136073 Event posted on 07-13-2010 03:10pm EDT by woodard File uploaded: div64.tgz This event sent from IssueTracker by kbaxley [LLNL (HPC)] issue 1136073 it_file 857373 Event posted on 07-13-2010 03:26pm EDT by woodard same in upstream kernels up to 2.6.34.1 woodard assigned to issue for LLNL (HPC). Status set to: Waiting on Tech This event sent from IssueTracker by kbaxley [LLNL (HPC)] issue 1136073 Created attachment 432959 [details]
reproducer
NOTABUG, I think. Please look at div64_u64(), it doesn't even try to pretend it can do the 64bit division precisely, u64 div64_u64(u64 dividend, u64 divisor) { u32 high, d; high = divisor >> 32; if (high) { unsigned int shift = fls(high); d = divisor >> shift; dividend >>= shift; } else d = divisor; return dividend / d; } If there is something in the high word of divisor, div64_u64() just shifts both arguments and throws out the low bits. IOW. Consider u64 a = (1uul << 33); u64 b = (1uul << 33) + 1; In this case div64_u64(a, b) should return 1 which is obviously not correct since a < b. But since div64_u64 does ">> 1" it can't "see" the difference in the lower bit. I believe this is by design. Event posted on 2010-08-02 06:57 PDT by woodard I think that for customer satisfaction we are going to need to have a bit more info than that. Can we point to a commit log or some lkml discussion regarding it. Something that indicates that it is by design rather than just a sloppy implementation. The evidence that you present just confirms that there are bugs not that it was intentional. This event sent from IssueTracker by woodard issue 1136073 (In reply to comment #8) > > I think that for customer satisfaction we are going to need to have a bit > more info than that. Can we point to a commit log or some lkml discussion > regarding it. The commit log says nothing. 3927f2e8f9afa3424bb51ca81f7abac01ffd0005 just consolidates the multiple definitions. I didn't find any commit which introduces something like this helper alone. > Something that indicates that it is by design rather than > just a sloppy implementation. I just noticed that div64_u64() has a comment which says "dynamic precision". Perhaps this is enough? > The evidence that you present just confirms > that there are bugs not that it was intentional. I bet it was. Surely the code writer understood it is not possible to throw out the lower bits without loosing the accuracy ;) (In reply to comment #9) > (In reply to comment #8) > > > > The evidence that you present just confirms > > that there are bugs not that it was intentional. > > I bet it was. Surely the code writer understood it is not possible > to throw out the lower bits without loosing the accuracy ;) I sent the trivial patch which adds the comment upstream. Event posted on 2010-08-02 10:20 PDT by woodard From: Brian Behlendorf <behlendorf1> To: woodard Subject: 64-bit division implementation Date: 08/02/2010 12:01:01 PM Ben, Attached is my slightly modified version of a public domain implementation hosted at Hacker Delight. My version is basically just adjusted to be kernel friendly and similiarly implements __udivdi3/__divdi3. These of course would be called div64_u64/div64_s64 respectively if added to the kernel. It's worth looking at the original however since it has a nice proof explaining why this is right. http://www.hackersdelight.org/HDcode/newCode/divDouble.c $ ./build.sh make: Entering directory `/usr/src/kernels/2.6.32.12-115.fc12.i686' LD /home/behlendo/src/git/div64/built-in.o CC [M] /home/behlendo/src/git/div64/div64.o Building modules, stage 2. MODPOST 1 modules CC /home/behlendo/src/git/div64/div64.mod.o LD [M] /home/behlendo/src/git/div64/div64.ko make: Leaving directory `/usr/src/kernels/2.6.32.12-115.fc12.i686' $ sudo /sbin/insmod ./div64.ko $ dmesg | tail -4 Testing unsigned 64-bit division. Passed all 2756 tests Testing signed 64-bit division. Passed all 2450 tests -- Thanks, Brian This event sent from IssueTracker by woodard issue 1136073 it_file 923203 Event posted on 2010-08-09 09:59 PDT by woodard From: Brian Behlendorf <behlendorf1> To: Andrew Morton <akpm> Cc: Oleg Nesterov <oleg>, Ben Woodard <bwoodard>, Jeremy Fitzhardinge <jeremy>, Mark Grondona <mgrondona>, linux-kernel.org <linux-kernel.org> Subject: [PATCH] Make div64_u64() precise on 32bit platforms Date: 08/09/2010 11:30:52 AM > On Mon, 2 Aug 2010 18:09:51 +0200 > > Oleg Nesterov <oleg> wrote: > > We have a bugreport which blames div64_u64() on 32bit platforms. > > > > However, the code obviously doesn't even try to pretend it can do > > the 64bit division precisely. If there is something in the high > > word of divisor, div64_u64() just shifts both arguments and throws > > out the low bits. > > Well that was a bit lazy of us - I wonder how hard it is to fix. > > At present people will test their code on 64-bit only to find out later > that it doesn't work correctly on 32-bit. Bad. Perhaps we should > similarly break the 64-bit version :) Here's an even crazier idea, let's just fix the 32-bit version. :) The attached patch fully implements div64_u64() such that it will return precisely the right quotient even when the divisor exceeds 32-bits. The patch also adds a div64_s64() function to fully support signed 64-bit division. Because this fix is non-obvious I have also included a unsigned and signed regression test to verify the correctness of the patch. Using a vanilla 2.6.35 kernel the unsigned regression tests fails on 32-bit platforms. With the proposed patch applied both the unsigned and signed tests pass. -- Thanks, Brian This event sent from IssueTracker by woodard issue 1136073 Created attachment 437649 [details]
[PATCH] Fix div64_u64 for 32bit platforms
The current implementation of div64_u64 for 32bit systems returns
an approximately correct result when the divisor exceeds 32bits.
Since doing 64bit division using 32bit hardware is a long since
solved problem we just use one of the existing proven methods.
Additionally, add a div64_s64 function to correctly handle doing
signed 64bit division.
Created attachment 437650 [details]
unsigned test program
Created attachment 437651 [details]
signed test module
The last (simplified) version of Brian's patch was taken into -mm. I think, most probably it will be merged during the current 2.6.37 window, I'll forward this patch to rhkl then. This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release. [RHEL6.1 PATCH] bz616105: div64_u64(): improve precision on 32bit platforms http://post-office.corp.redhat.com/archives/rhkernel-list/2010-November/msg01360.html (In reply to comment #24) > > [RHEL6.1 PATCH] bz616105: div64_u64(): improve precision on 32bit platforms > http://post-office.corp.redhat.com/archives/rhkernel-list/2010-November/msg01360.html Forgot to attach the patch... [RHEL6.1 PATCH v2 1/1] bz616105: div64_u64(): improve precision on 32bit platforms http://post-office.corp.redhat.com/archives/rhkernel-list/2010-November/msg01363.html Patch(es) available on kernel-2.6.32-91.el6 Reproduced in 2.6.32-90.el6 and verified in 2.6.32-91.el6. An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0542.html |