Bug 340161

Summary: system crash on kjournald with div64_64
Product: [Fedora] Fedora
Reporter: Robert Hoekstra <redhat>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 7
Hardware: i386
OS: Linux
Fixed In Version: 2.6.23.8-34.fc7
Doc Type: Bug Fix
Last Closed: 2007-12-07 23:35:57 UTC
Attachments:
Description                                          Flags
crash 1, crashed on md1_raid1, not kjournald         none
This crash is on kjournald, not sure if it matters   none
crash on kjournald, kernel 2.6.23.1-10.fc7           none
The dmesg of the machine                             none

Description Robert Hoekstra 2007-10-19 17:35:25 UTC
Description of problem:
After a couple of hours (or, when lucky, days) the system hangs with a dump of a
process on screen. It doesn't show a 'kernel panic' and does not restart after 5
seconds (though I have that set in sysctl.conf).

This has been happening for about a month, at increasingly short intervals.

Version-Release number of selected component (if applicable):
I am running an up-to-date Fedora 7 system with kernel-2.6.22-9.91.fc7. I have
seen this problem across several kernel versions, though, so this version is not
necessarily the cause.

How reproducible:
Unknown. The most recent change to my system was adding a Promise SATA card,
with a SATA disk attached, to this Pentium III machine.

Steps to Reproduce:
1. Boot up.
2. Wait. (Sorry, I have no way to actually reproduce it.)

Actual results:
system hang

Expected results:
No system hang.

Additional info:
I will attach a couple of pictures I took of my system when it crashed. You will
notice that the crash occurs sometimes in the md1_raid1 process and sometimes in
kjournald.

I have run a memory test to verify that my memory is functioning properly.

Comment 1 Robert Hoekstra 2007-10-19 17:35:25 UTC
Created attachment 232821 [details]
crash 1, crashed on md1_raid1, not kjournald

Comment 2 Robert Hoekstra 2007-10-19 17:36:26 UTC
Created attachment 232831 [details]
This crash is on kjournald, not sure if it matters

Comment 3 Robert Hoekstra 2007-10-19 17:38:01 UTC
I forgot:

My lspci output:
---------------------------------------
00:00.0 Host bridge: VIA Technologies, Inc. VT8605 [ProSavage PM133] (rev 81)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8605 [PM133 AGP]
00:04.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 1b)
00:04.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:04.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 0e)
00:04.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 0e)
00:04.4 Host bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 20)
00:0e.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
00:0f.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
00:10.0 Mass storage controller: Promise Technology, Inc. PDC40775 (SATA 300 TX2plus) (rev 02)
00:11.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05)
01:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G400/G450 (rev 85)
---------------------------------------

For what it's worth, my cpuinfo:
---------------------------------------
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 8
model name      : Pentium III (Coppermine)
stepping        : 6
cpu MHz         : 803.448
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 mtrr pge mca cmov pat pse36 mmx fxsr sse up
bogomips        : 1608.22
clflush size    : 32
---------------------------------------

Comment 4 Robert Hoekstra 2007-11-07 19:16:59 UTC
Created attachment 250651 [details]
crash on kjournald, kernel 2.6.23.1-10.fc7

The newest kernel, 2.6.23.1-10.fc7, also crashes; here is a new crash dump.

Comment 5 Robert Hoekstra 2007-11-07 19:17:47 UTC
Created attachment 250661 [details]
The dmesg of the machine

Comment 6 Chuck Ebbert 2007-11-07 23:05:44 UTC
You can work around this problem in the kernel by adding this to /etc/sysctl.conf:

kernel.sched_features = 21

This will disable precise CPU time accounting, a feature that has already been
removed in 2.6.24.


Comment 7 Chuck Ebbert 2007-11-09 19:51:58 UTC
Now disabled by default. A user could re-enable it, but that is not very likely.

Comment 8 Robert Hoekstra 2007-11-09 19:56:42 UTC
Thank you for the response. I have applied the suggested change. Unfortunately,
only time will tell whether this is the solution, but I am hopeful. I will report
back with the result within two or three weeks and, if that's okay, close the bug
then.

Comment 9 Chuck Ebbert 2007-11-13 22:17:31 UTC
Included in 2.6.23.1-28.

Comment 10 Jan Gutter 2007-11-23 08:17:43 UTC
Although I'm not a Red Hat or Fedora user, I've been bitten by the selfsame bug.
This patch definitely fixes the problem.

Just a small query: because of this change, the likely/unlikely hint in sched.c is
now inverted. Would it be prudent to amend it as follows, or is that too invasive
for "stable"?

--- ./kernel/sched.c~   2007-11-19 10:37:44.000000000 +0200
+++ ./kernel/sched.c    2007-11-19 10:37:44.000000000 +0200
@@ -1988,7 +1988,7 @@
        int i, scale;

        this_rq->nr_load_updates++;
-       if (unlikely(!(sysctl_sched_features & SCHED_FEAT_PRECISE_CPU_LOAD)))
+       if (likely(!(sysctl_sched_features & SCHED_FEAT_PRECISE_CPU_LOAD)))
                goto do_avg;

        /* Update delta_fair/delta_exec fields first */