Red Hat Bugzilla – Bug 340161
system crash on kjournald with div64_64
Last modified: 2007-12-07 18:35:57 EST
Description of problem:
After a couple of hours (or, when lucky, days) the system hangs with a dump of a
process on screen. It doesn't show a 'kernel panic' and does not restart after 5
seconds (though I have that set in sysctl.conf).
This has been occurring for the past month, at increasingly short intervals.
Version-Release number of selected component (if applicable):
I am running an up-to-date Fedora 7 system with kernel-2.6.22-9.91.fc7. I have
seen this with a number of earlier kernels though, so this version is not explicitly the
Unknown. The last change I made to my system was adding a Promise SATA card to my P3
system, with a SATA disk attached.
Steps to Reproduce:
2. wait... (sorry, no way to actually reproduce)
... no system hang ?
I will attach a couple of pictures I took of my system when it crashed. You will
notice that the crash occurs in either the md1_raid1 process or kjournald.
I have run a memory test to verify that my memory is functioning properly.
Created attachment 232821
crash 1, crashed on md1_raid1, not kjournald
Created attachment 232831
This crash is on kjournald, not sure if it matters
00:00.0 Host bridge: VIA Technologies, Inc. VT8605 [ProSavage PM133] (rev 81)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8605 [PM133 AGP]
00:04.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 1b)
00:04.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:04.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller
00:04.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller
00:04.4 Host bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 20)
00:0e.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
00:0f.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
00:10.0 Mass storage controller: Promise Technology, Inc. PDC40775 (SATA 300 TX2plus) (rev 02)
00:11.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05)
01:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G400/G450 (rev 85)
For what it's worth, my cpuinfo:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 6
cpu MHz : 803.448
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 mtrr pge mca cmov pat pse36 mmx fxsr sse up
bogomips : 1608.22
clflush size : 32
Created attachment 250651
crash on kjournald, kernel 126.96.36.199-10.fc7
The newest kernel, 188.8.131.52-10.fc7, also crashes; here is a new crash dump.
Created attachment 250661
The dmesg of the machine
You can work around this problem in the kernel by adding this to /etc/sysctl.conf:
kernel.sched_features = 21
This will disable precise CPU time accounting, a feature that is already removed
Now disabled by default. It could be re-enabled by a user, but that is not very likely.
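For anyone wondering why the value is 21: sched_features is a bitmask, and the workaround amounts to clearing the precise CPU load bit while leaving the other bits alone. The sketch below only illustrates that arithmetic. It assumes that SCHED_FEAT_PRECISE_CPU_LOAD has the value 8 and that the default mask is 29; neither number is stated in this report, so check kernel/sched.c in your own tree before relying on them. The helper names are made up for the example.

```python
# Assumption: SCHED_FEAT_PRECISE_CPU_LOAD is bit value 8 in this kernel series.
# Assumption: the default sched_features mask is 29. Neither value comes from
# this bug report; verify both against kernel/sched.c for your kernel.
PRECISE_CPU_LOAD = 8
ASSUMED_DEFAULT = 29


def clear_flag(features: int, flag: int) -> int:
    """Return the feature mask with the given flag bit switched off."""
    return features & ~flag


def has_flag(features: int, flag: int) -> bool:
    """Report whether the given flag bit is set in the feature mask."""
    return bool(features & flag)


if __name__ == "__main__":
    workaround = clear_flag(ASSUMED_DEFAULT, PRECISE_CPU_LOAD)
    print("kernel.sched_features =", workaround)
    print("precise CPU load enabled:", has_flag(workaround, PRECISE_CPU_LOAD))
```

Under those assumptions, clearing the bit in the default mask gives the 21 suggested above, and a mask of 21 no longer has the precise CPU load bit set.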
Thank you for the response. I have applied the suggested change. Unfortunately,
only time will tell me if this is the solution. I do have good hopes though. I
will respond within two or three weeks with my result, with which I will close
the bug if that's okay.
Although I'm not a Red Hat or Fedora user, I've been bitten by the selfsame bug.
This patch definitely fixes the problem.
Just a small query: the likely/unlikely logic flips around because of this in
sched.c. Would it be prudent to amend it, or is that too invasive for "stable":
--- ./kernel/sched.c~ 2007-11-19 10:37:44.000000000 +0200
+++ ./kernel/sched.c 2007-11-19 10:37:44.000000000 +0200
@@ -1988,7 +1988,7 @@
 int i, scale;
- if (unlikely(!(sysctl_sched_features & SCHED_FEAT_PRECISE_CPU_LOAD)))
+ if (likely(!(sysctl_sched_features & SCHED_FEAT_PRECISE_CPU_LOAD)))
 /* Update delta_fair/delta_exec fields first */