Bug 1668164 - linux kernel 64-bit division implementation contains error.
Summary: linux kernel 64-bit division implementation contains error.
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: Unspecified
OS: Linux
unspecified
low
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-01-22 05:48 UTC by Siarhei Volkau
Modified: 2022-07-03 21:58 UTC (History)
23 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:


Attachments (Terms of Use)
simple driver code to showcase problem (1.27 KB, application/gzip)
2019-01-22 19:38 UTC, Siarhei Volkau
no flags Details
proposed patch for div64.c (937 bytes, patch)
2019-01-24 14:27 UTC, Steve
no flags Details | Diff

Description Siarhei Volkau 2019-01-22 05:48:36 UTC
1. Please describe the problem:
Far time ago in the linux kernel was introduced helper
function named div64_u64_rem (https://elixir.bootlin.com/linux/latest/source/lib/div64.c#L102).
As i understand no one test this function accurately.

Today i found that this function gets wrong result for some input.
Example: try to divide 0x8000000000000000ULL by 4299651251ULL via this helper function. As you can see dividend should be relatively big and divisor should have ones on LSB bits then problem occurs.


2. What is the Version-Release number of the kernel:
3.12.x - present (pre-5.0)

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :


4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
4.0. instal 32-bit linux to VM
4.1. create simple kernel module
4.2. divide 0x8000000000000000ULL by 4299651251ULL via div64_u64_rem().
4.3. print quotient and remainder in the log.
4.4. compare with expected results.
 in other case (on 64-bit system) you can:
4.5. create simple c program
4.6. copy div64_u64_rem helper and its dependencies into this program.
4.7. divide 0x8000000000000000ULL by 4299651251ULL via div64_u64_rem().
4.8. print quotient and remainder in the log.
4.9. compare with expected results.


5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:
Assume yes.

6. Are you running any modules that not shipped with directly Fedora's kernel?:
Not relevant

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Comment 1 Steve 2019-01-22 17:58:42 UTC
(In reply to Siarhei Volkau from comment #0)
...
> 4.1. create simple kernel module
....
> 4.5. create simple c program
...

Could you attach the source code and makefile for your test cases?

Comment 2 Steve 2019-01-22 18:40:02 UTC
(In reply to Siarhei Volkau from comment #0)
> 1. Please describe the problem:
> Far time ago in the linux kernel was introduced helper
> function named div64_u64_rem
> (https://elixir.bootlin.com/linux/latest/source/lib/div64.c#L102).
...

The function div64_u64_rem was added to the kernel in this commit:

math64: New separate div64_u64_rem helper
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/lib/div64.c?id=eb18cba78c2b9250663021e17e1e9cc34630e92a

This is the git log for lib/div64.c:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/lib/div64.c

NB: The log shows an earlier implementation of div64_u64_rem that was then reverted:

math64: New div64_u64_rem helper
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/lib/div64.c?id=f792685006274a850e6cc0ea9ade275ccdfc90bc

Comment 3 Siarhei Volkau 2019-01-22 19:38:22 UTC
Created attachment 1522500 [details]
simple driver code to showcase problem

usable with 32-bit kernel only

Steps to reporduce:
1. unpack archive
2. make
3. sudo insmod ./div64_bug.ko
4. see dmesg for driver output

Comment 4 Siarhei Volkau 2019-01-22 19:47:34 UTC
(In reply to Steve from comment #2)
> 
> The function div64_u64_rem was added to the kernel in this commit:
> 

The problem persists in all implementations you mentioned IMHO (not tested).

Since this line:
quot = div_u64(dividend >> n, divisor >> n);
present in all of them. 

Reference implementation from
http://www.hackersdelight.org/hdcodetxt/divDouble.c.txt
works for this test case and other numbers as expected.

Comment 5 Hans de Goede 2019-01-22 19:51:38 UTC
Hi Siarhei,

Note: I've not tried to reproduce the bug.

If you really believe that there is an issue with the div_u64 implementation in the kernel, then Fedora's bugzilla is not the best place to discuss this.

It would be much better if you directly report this bug to the mainline kernel developers by sending an email to: linux-kernel.org about this bug.

Regards,

Hans

Comment 6 Siarhei Volkau 2019-01-22 20:16:51 UTC
(In reply to Hans de Goede from comment #5)

> Hi Siarhei,
Hi Hans.

> issue with the div_u64
No, only with the div64_u64_rem helper.

> It would be much better if you directly report this bug to the mainline
> kernel developers by sending an email to: linux-kernel.org about
> this bug.

Sorry, i'm not have experience and time to take this.
Also i have no idea how to fix that bug.

Its not so hard to reproduce bug. Please, do it and report by itself.

Thank you.

Comment 7 Hans de Goede 2019-01-22 20:34:54 UTC
(In reply to Siarhei Volkau from comment #6)
> > It would be much better if you directly report this bug to the mainline
> > kernel developers by sending an email to: linux-kernel.org about
> > this bug.
> 
> Sorry, i'm not have experience and time to take this.
> Also i have no idea how to fix that bug.

It is really not that hard to send a single mail to the linux-kernel.org mailinglist and the people there always welcome well worded bug-reports, especially when they come with a reproducer such as your bug report.

I'm sorry but I do not have the time to follow-up on this on your behalf, I have 100s of bugs which need my attention.

Comment 8 Steve 2019-01-22 20:43:01 UTC
For the record, this is the complete commit (which has changes to math64.h and div64.c):

math64: New separate div64_u64_rem helper
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=eb18cba78c2b9250663021e17e1e9cc34630e92a

IIUC, this is only a problem with 32-bit kernels, so I suggest being more specific in the bug summary:

"32-bit kernel implementation of div64_u64_rem contains error"

See:
#elif BITS_PER_LONG == 32
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/math64.h?id=eb18cba78c2b9250663021e17e1e9cc34630e92a#n58

BTW, in addition to what Hans suggests, upstream kernel bugs can be reported here:
https://bugzilla.kernel.org/

Comment 9 Steve 2019-01-22 21:00:20 UTC
(In reply to Siarhei Volkau from comment #3)
> Created attachment 1522500 [details]
> simple driver code to showcase problem
...

Thanks. For completeness, could you post the "pr_warn" output from a test run?

Comment 10 Siarhei Volkau 2019-01-23 06:54:23 UTC
(In reply to Steve from comment #9)
> (In reply to Siarhei Volkau from comment #3)
> > Created attachment 1522500 [details]
> > simple driver code to showcase problem
> ...
> 
> Thanks. For completeness, could you post the "pr_warn" output from a test
> run?

There is it:
[  420.526425] div64_bug: loading out-of-tree module taints kernel.
[  420.526462] div64_bug: module verification failed: signature and/or required key missing - tainting kernel
[  420.528173] div64_u64_rem_bug showtime:
[  420.528178] div64_u64_rem_bug calculates 9223372036854775808 / 4299651251 via div64_u64_rem() helper
[  420.528180] div64_u64_rem_bug expect receive:
[  420.528182] quotient: 2145144221 and remainder: 3456705337
[  420.528184] but got
[  420.528185] quotient: 2145144223 and remainder: 18446744068566954451
[  420.528187] div64_u64_rem_bug ends.

Comment 11 Siarhei Volkau 2019-01-23 07:22:32 UTC
https://bugzilla.kernel.org/show_bug.cgi?id=202391

Comment 12 Steve 2019-01-23 08:38:32 UTC
(In reply to Siarhei Volkau from comment #10)
...
> [  420.528178] div64_u64_rem_bug calculates 9223372036854775808 / 4299651251
> via div64_u64_rem() helper
> [  420.528180] div64_u64_rem_bug expect receive:
> [  420.528182] quotient: 2145144221 and remainder: 3456705337
> [  420.528184] but got
> [  420.528185] quotient: 2145144223 and remainder: 18446744068566954451
...

Thanks, Siarhei. Python returns the same expected results:

$ python
...
>>> 9223372036854775808 / 4299651251
2145144221L
>>> 9223372036854775808 % 4299651251
3456705337L

And thanks for opening and linking to the upstream bug.

Comment 13 Steve 2019-01-23 09:57:02 UTC
After looking at the code, I wonder if it is dividing by zero, but not raising an exception. Could you try that?

9223372036854775808 / 0

Note that the reference implementation explicitly skips that case:
...
   static unsigned long long tabu[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
...
      for (j = 1; j < n; j++) {         // Skip tabu[0], which is 0.
...

Comment 14 Steve 2019-01-23 12:06:17 UTC
(In reply to Steve from comment #13)
> After looking at the code, I wonder if it is dividing by zero, but not
> raising an exception. Could you try that?

That's not it. Here's the computation using python:

>>> dividend = 9223372036854775808L
>>> divisor = 4299651251L
>>> n = 2
>>> quot = (dividend >> n) / (divisor >> n)
>>> quot
2145144223L
>>> quot1 = quot - 1
>>> quot1
2145144222L
>>> remainder = dividend - quot1 * divisor
>>> remainder
-842945914L

The problem is that quot1 is off-by-one:

>>> quot2 = quot1 - 1
>>> quot2
2145144221L
>>> remainder2 = dividend - quot2 * divisor
>>> remainder2
3456705337L

And that is the correct remainder.

Variable names are based on the source code:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/lib/div64.c?h=v4.19.17#n90

NB: I've omitted some details -- notably what "fls()" does ...

Comment 15 Steve 2019-01-23 23:05:36 UTC
The algorithm for computing "quot" in div64_u64_rem is the same as in div64_u64. That algorithm appears to have been introduced in this commit:

div64_u64(): improve precision on 32bit platforms
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/lib/div64.c?id=658716d19f8f155c67d4677ba68034b8e492dfbe

Note, in particular, this change:
...
-		unsigned int shift = fls(high);
...
+		int n = 1 + fls(high);
...

NB: The side-by-side view of the diffs is easier to read, because the order of the conditional branches was reversed.

Comment 16 Steve 2019-01-24 14:27:25 UTC
Created attachment 1523149 [details]
proposed patch for div64.c

This is a proposed patch for div64.c that attempts to replicate the reference implementation more closely.

Disclaimer: This has not been compiled or tested.

Comment 17 Steve 2019-01-27 18:04:14 UTC
The div64 kernel code should be subject to a unit test.

Is there a way to extract specific functions from an executable kernel and convert them into a library that could be linked with a unit test driver?

BTW, kernel unit tests are a subject of current interest:

Unit Testing in the Linux Kernel
by Zack Brown
January 3, 2019
https://www.linuxjournal.com/content/unit-testing-linux-kernel

Comment 18 James 2021-01-04 17:16:02 UTC Comment hidden (spam)
Comment 19 Mary Rax 2021-07-01 08:16:11 UTC Comment hidden (spam)
Comment 20 neville shayon 2021-07-01 11:31:34 UTC Comment hidden (spam)
Comment 21 Doug 2021-07-22 08:12:29 UTC Comment hidden (spam)
Comment 22 Dennis Olson 2022-07-03 21:58:08 UTC Comment hidden (spam)

Note You need to log in before you can comment on or make changes to this bug.