Description of problem:
After a couple of hours (or, when lucky, days) the system hangs with a dump of a process on screen. It doesn't show a 'kernel panic' and does not restart after 5 seconds (though I have that set in sysctl.conf). This has been happening for the past month, at shortening intervals.

Version-Release number of selected component (if applicable):
I am running an up-to-date Fedora 7 system with kernel-2.6.22-9.91.fc7. I have seen this across a number of kernels though, so this version is not specifically the cause.

How reproducible:
Unknown. The last change to my system was adding a Promise SATA card to my P3 system, with a SATA disk attached.

Steps to Reproduce:
1. Boot up
2. Wait (sorry, no way to actually reproduce)
3.

Actual results:
System hang

Expected results:
No system hang

Additional info:
I will attach a couple of pictures I took of my system when it crashed. You will notice that the crash varies between the md1_raid1 process and kjournald. I have run a memory test to verify that my memory is functioning properly.
Created attachment 232821 [details] crash 1, crashed on md1_raid1, not kjournald
Created attachment 232831 [details] This crash is on kjournald, not sure if it matters
I forgot: my lspci:
---------------------------------------
00:00.0 Host bridge: VIA Technologies, Inc. VT8605 [ProSavage PM133] (rev 81)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8605 [PM133 AGP]
00:04.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 1b)
00:04.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:04.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 0e)
00:04.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 0e)
00:04.4 Host bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 20)
00:0e.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
00:0f.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
00:10.0 Mass storage controller: Promise Technology, Inc. PDC40775 (SATA 300 TX2plus) (rev 02)
00:11.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05)
01:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G400/G450 (rev 85)
---------------------------------------
For what it matters, my cpuinfo:
---------------------------------------
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 6
cpu MHz : 803.448
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 mtrr pge mca cmov pat pse36 mmx fxsr sse up
bogomips : 1608.22
clflush size : 32
---------------------------------------
Created attachment 250651 [details] crash on kjournald, kernel 2.6.23.1-10.fc7
The newest kernel, 2.6.23.1-10.fc7, also crashes. Here is a new crash dump.
Created attachment 250661 [details] The dmesg of the machine
You can work around this problem in the kernel by adding this to /etc/sysctl.conf:

kernel.sched_features = 21

This will disable precise CPU time accounting, a feature that has already been removed in 2.6.24.
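For anyone wondering where the value 21 comes from, here is a rough sketch of the bitmask arithmetic. The flag values and the default of 29 below are assumptions, written from memory of kernel/sched.c in 2.6.23 and not taken from this report, so please check them against your own source tree. The helper names clear_feature and enabled_features are made up for illustration. Only SCHED_FEAT_PRECISE_CPU_LOAD is a name that appears elsewhere in this bug.

```python
# Assumed scheduler feature bits for 2.6.23 (from memory; verify against
# kernel/sched.c in the tree you are actually running).
SCHED_FEATURES = {
    "FAIR_SLEEPERS": 1,
    "SLEEPER_AVG": 2,
    "SLEEPER_LOAD_AVG": 4,
    "PRECISE_CPU_LOAD": 8,
    "START_DEBIT": 16,
    "SKIP_INITIAL": 32,
}

# Assumed default mask: every feature above except SLEEPER_AVG and SKIP_INITIAL.
ASSUMED_DEFAULT = 29


def clear_feature(mask, flag):
    """Return mask with the given feature bit switched off."""
    return mask & ~flag


def enabled_features(mask):
    """List the names of the feature bits that are set in mask."""
    return [name for name, bit in SCHED_FEATURES.items() if mask & bit]


if __name__ == "__main__":
    workaround = clear_feature(ASSUMED_DEFAULT, SCHED_FEATURES["PRECISE_CPU_LOAD"])
    print("default   :", ASSUMED_DEFAULT, enabled_features(ASSUMED_DEFAULT))
    print("workaround:", workaround, enabled_features(workaround))
```

Under these assumptions, clearing the PRECISE_CPU_LOAD bit from the default mask gives 21, which matches the suggested setting. If your kernel uses different bit values, the number to put in sysctl.conf will differ.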
Now disabled by default. Could be re-enabled by a user but that's not very likely.
Thank you for the response. I have applied the suggested change. Unfortunately, only time will tell whether this is the solution, but I am hopeful. I will report back with my results within two or three weeks and, if that's okay, close the bug then.
In 2.6.23.1-28
Although I'm not a Red Hat or Fedora user, I've been bitten by the selfsame bug. This patch definitely fixes the problem. Just a small query: the likely/unlikely logic flips around because of this in sched.c. Would it be prudent to amend it, or is that too invasive for "stable":

--- ./kernel/sched.c~ 2007-11-19 10:37:44.000000000 +0200
+++ ./kernel/sched.c 2007-11-19 10:37:44.000000000 +0200
@@ -1988,7 +1988,7 @@
 	int i, scale;
 
 	this_rq->nr_load_updates++;
-	if (unlikely(!(sysctl_sched_features & SCHED_FEAT_PRECISE_CPU_LOAD)))
+	if (likely(!(sysctl_sched_features & SCHED_FEAT_PRECISE_CPU_LOAD)))
 		goto do_avg;
 
 	/* Update delta_fair/delta_exec fields first */