Red Hat Bugzilla – Bug 109497
Default Fedora Core 1 SMP Kernel Hang on Dual Xeon System
Last modified: 2015-01-04 17:03:49 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031016
Description of problem:
I've installed Fedora Core 1 in a new Server System having the
following hardware configuration:
Motherboard: ASUS PR-DLS533/2GBL/SCSI1030
CPUs: 2 Intel Xeon @ 2.4GHz (with Hyper-Threading enabled)
RAM: 512MBx2 ECC Memory (1024 MB Total)
For a full detailed MotherBoard hardware list:
The system boots cleanly and seems to work correctly, but after some
minutes or hours (at random) it crashes completely. The server's network
IP address becomes unreachable. At the server console the keyboard still
seems to respond, but pressing Caps Lock, Num Lock or Scroll Lock does
not light the LED. I can log in, but as soon as I launch a program
(like top) the system hangs completely, hard reboot is
If I boot the system using kernel-2.4.22-1.2115.nptl (the non-SMP
version), the system has no stability problems.
Before posting this bug report I tried several things:
I installed kernel 2.6.0-test9 in SMP mode and used it for hours
without a crash or any strange behaviour.
Then I tried the vanilla 2.4.22 kernel compiled with SMP support (it
shows 4 Hyper-Threading CPUs).
I ran the stress test for 2 hours without any crash, the system runs
The stress command line used was:
# stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M --hdd 2 --timeout 120m
Stress is a tool downloadable from:
I hope these descriptions help to solve this bug.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Boot using kernel-smp-2.4.22-1.2115.nptl
2. Use the system
3. After some minutes the system hangs completely
We've had similar freezes on 4 different SMP machines. These machines
have either dual Pentium III and a VIA 82C686 chipset, or dual Pentium 4
Xeon and an Intel E7500 chipset. The freeze typically occurs within
half a day, unrelated to any particular activity. All machines were
running stably under various Red Hat 9 kernels.
Here, 2 P4 Xeon machines - one CPU each, with hyperthreading - freeze
after some hours using Fedora Core 1 final.
One of these machines - now with hyperthreading disabled - has been
running for 20 hours without problems (including 5 hours of "stress"-ing).
I encountered such a random freeze with Fedora beta2 on one similar
machine. But with Fedora beta3 everything ran fine on 5 of these machines
for about 3 weeks!
Just before I leave for the weekend:
One hyperthreading host has been running for more than 4 hours
with "stress" nearly all the time ....
Maybe the kernel switch "noapic" has caused this ...?
$ cat /proc/cmdline
ro root=LABEL=/ hdd=ide-scsi rhgb noapic
$ uname -r
Dual P3 here, with 2 GB RAM, Tyan board.
SMP enabled, noapic not present, and the system hangs in about 2 days.
It hangs especially when accessing Samba mountpoints exported from
other systems to this one.
Back from the weekend, and after just 5 minutes of load the machine hangs:
Thus: "noapic" does not help
12:19:32 up 1 day, 15:58, 6 users, load average: 1.40, 1.54, 1.64
ro root=/dev/hda2 noapic
And still up... probably will hang after sending this mail :(
Will keep reporting
I'm seeing the same problem as well with the nptl SMP kernel; it just
crashed about half an hour ago. Dual Pentium III 650MHz. I thought it
seemed to be X related, since almost all of the hangs happened when
logging in under GNOME. I switched to runlevel 3 last night, but did have
VNC running. It hung right after I connected from work.
I will switch to non-smp tonight and do some testing to see if it helps.
I've found this on the fedora mailing-list:
On Tue, 2003-11-18 at 04:43, Joseph M Bironas wrote:
> I'm not sure. I know that I'm getting two processors to boot now. Top
> reports both processors, and I get two little penguins on FB boot
> -always an important indicator.
I think this could be the CONFIG_NR_CPUS patch that's causing this.
It makes the assumption that APICs are contiguously numbered,
so if you have them sparsely numbered, you could end up with some
of them being unused.
This would explain several similar bugs in bugzilla too.
For those that want to test, setting this option to 32 should
restore to the previous behaviour.
My system hung again, 10 minutes after mounting a network filesystem.
It had been running for almost 4 days at 90% load.
I've built a stock 2.4.23 kernel with SMP support.
This was done on 11/30/2003, and the system was booted from this
kernel around 12:00pm. Have several SMB mounts and also have X and
Xvnc running. The system has been stable now for over 33 hours, which
is a first for an SMP kernel. I ran stress today for quite a while,
and built a complete new kernel (in an xterm, under Xvnc) to give
the system a workout. This was done with both X and Xvnc running as
well as the SMB mounts active. Also tried several times stopping /
starting X which would usually cause the freeze, but has not done so yet.
I've probably just jinxed myself.
Hmmm the problem is really at a netfs/kernel support level.
I'm only running local filesystems and the system seems rock solid
20:32:25 up 6 days, 4:35, 5 users, load average: 2.16, 2.86, 2.34
Installed the new kernel-smp-2.4.22-1.2129.nptl yesterday, in the
morning the machine was frozen as usual.
A problem with nfs seems likely to me too, we've had lots of freezes
overnight when the machine is not doing anything except for an
occasional mount/umount. Typically the last syslog message is from the
Created attachment 96334 [details]
serial console logfile
Here's a me too. Single 3.2GHz P4 w/hyperthreading enabled. If I boot the
UP kernel, system runs for days. If I boot the smp kernel, system doesn't
last more than 1 day. When it locks up, I can still ping it, but I can't do
anything else. I attached a serial console, no error messages or anything.
I'll attach the output of SysRq's for showMem, showPc and ShowTasks, hopefully
P.S. This is with both the 2115 and 2129 kernels.
My system hangs totally after I change NetVault Server settings.
I also have NFS, and I'm not sure if it was running SMP, but that is
first in the boot list (the machine is at a customer's site).
After updating to the new kernel, the problem still remains...
If I don't have any netfs mounts, the system runs like Gibraltar... a
It does appear to be netfs related, and in the RedHat specific
kernels. I switched to a stock 2.4.23 smp kernel 8 days ago and the
system has been up continuously since then. I use it to backup a
couple of other W2k boxes over SMB shares daily, so there has been
considerable network related file activity. I have no NFS mounts.
I believe the problem is that Samba/NFS mounts have a timeout
If a Samba mount has been inactive for too long and you try to access
it, whether with df or anything else, the kernel will send a
connection retry. (You can see this using dmesg after the df in that
I don't know why, but after a while the kernel is no longer able to
send this retry, and the system hangs in a kernel panic, as it
would if a physical disk failed.
I have a similar bug, filed as #111527, which after careful reading
of this bug is likely the same bug. System is a Dell PowerEdge 1650
with dual 1.4GHz Pentium III CPUs. Crashes happen on mount/unmount
during boot and shutdown. Local filesystems only. I'll try some
additional stressing of mount/unmount after business hours today.
System has been rock stable on the non-SMP kernel.
I can duplicate this problem on a Dell PowerEdge 2650 Dual Xeon 2.4GHz. The
system hangs after 'Probing Modules' during bootup when using the smp kernels (
2.4.22-1.2115.nptlsmp and 2.4.22-1.2129.nptlsmp ).
The non-smp kernels ( 2.4.22-1.2115.nptl and 2.4.22-1.2129.nptl ) seem to be
Same problem here with a fresh install of Fedora Core 1 + all updates.
System is a Dell PowerEdge 2650 with dual 3.06 GHz Xeon processors and
2 gig of RAM. Any attempt to boot SMP will freeze solid, usually at
"Mounting local filesystems" or "Enabling file system quotas."
This is a dev system so if there is a test kernel to try I will be
happy to give it a shot. When I get a chance I'll also see if I can
get some debug info using serial console.
Some descriptions of this bug indicate it may also be related to
#109962, in which SMP kernel hangs during unmounting of filesystems in
Tried the 2332 and 2335 kernels from updates-testing.
2332 ran almost two days before hanging;
2335 oopses (attempting to kill idle task) on boot.
Ok the problem is definitely USB, at least in my case. I've also
reproduced this on a much older Dell PowerEdge with dual 600 MHz P3
processors. The common link between the two machines is the USB chipset:
00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev 04) (prog-if 10 [OHCI])
        Subsystem: ServerWorks OSB4/CSB5 OHCI USB Controller
        Flags: bus master, medium devsel, latency 32, IRQ 11
        Memory at feb00000 (32-bit, non-prefetchable) [size=4K]
On both machines on which I've had the boot lockups the problem goes
away if I comment out all the usb controllers in /etc/modules.conf. So
it seems that on ServerWorks chipsets the OHCI driver is not SMP safe.
I've been running almost for 19 days.... and no hangs... but no nfs on
any mount point....
Another me too (2x PIII 700 on a Tyan mobo). And yes, it seems
definitely SMB and/or NFS related
(the hang happens with both services)
Another "me too" and like others I suspect NFS. System is dual Xeon
2.4GHz, 2GB RAM, M/B is SuperMicro X5DAE. Home directories are
mounted via NFS from elsewhere. Samba is running, but nothing is
using it at present. Nothing that looks like a "crash," i.e., no
"oopses" even when it hangs at the console. Most recent hang was on
reboot; it hung while shutting down autofs.
I've just installed kernel 2135 and will see what happens now.
Oh, I've also usually had setiathome running, and one hang took place
within minutes of it being launched. The setiathome binaries live in
my home directory which is mounted via NFS.
Just added the new kernel 2135 and will see what happens now.
The system is now fully loaded... if anybody has any problems please
let me know... I don't want the cursed machine crashing while I'm 50
miles from it....
kernel 2135 has the same problem :-(
FWIW, I have another SMP machine that is mostly idle but which also
has *one* network mount (NFS) which was mounted manually. It's in the
process of being set up as a new firewall. Given that it is totally
idle, it's hard to really compare, but it *has* been up for 19 days.
Last item related to network mounts... I have had a large number of NFS
timeouts since installing FC1. I get the message "nfs server foo not
responding", then an OK message. The NFS server is a RH8.0 host. I
never saw these with RH9 either.
kernel 2140 freezes too
Dell 2550 - dual PIII, NFS, autofs and Samba in use. The last message in
the log is always the expiry of an NFS mount that Samba initiated.
Update - moved my Samba server to a Dell 2650 - dual P4 Xeon - and it hung
within ten minutes using 2140. Both of these servers have been running
NFS and autofs. It seems to be related to Samba using NFS.
I have switched the 2650 to the single CPU 2140 kernel and will see
how things go.
Both of these machines have dozens of NFS exports which serve a bunch
of SGI workstations without problems (except when the server freezes)
It seems to happen when a Samba share uses an NFS/autofs-mounted
filesystem: when the mount times out and Samba then tries to access the
share again, things go south.
As I said before the last message in my logs is always a mount expiring.
Dual Pentium (old MMX from 1996) with 2135 smp kernel (compiled from
src rpm, no modifications to configuration) also fails. A few points here:
1. I've got USB UHCI so it is not OHCI driver as someone suggested.
2. I do not have any nfs or samba mounts.
3. I do have one NFS exported directory, but according to the log it was
umounted over 90 minutes before the machine froze.
4. 2 setiathome processes were running and probably were not yet
ready to send work results (over 10% work left which on this
machine means over 4hrs).
5. After freeze machine was fully responsive to pings (ping flood on
local 100Mbit network didn't drop a single packet out of well over
6. Initial tcp connection sequence seemed ok, the problem probably
kicked in when application (httpd, sendmail or ssh) was supposed
to do any work.
The machine survived over 9 days, but it is not stressed at all
(occasional email/www traffic and nfs access every few days). I don't
know if the UP kernel causes any problems as I haven't really tried it.
I also don't know if console access was possible (no console
connected and server far away from here).
Hope this helps.
Just tried the 2.4.22-1.2149.nptl kernel on my dual Xeon Dell
Precision 650n and the kernel hangs at boot time when loading modules.
It has done this consistently for all the Fedora kernels. I am
currently running the non-smp kernel and it works just fine.
Max, is this with acpi=on, or left at default ?
It'd be interesting to know if it's the same module each time causing
the problem. Moving /sbin/modprobe to /sbin/modprobe.real and hacking
up a /sbin/modprobe script to..
echo loading $1
might give us the culprit..
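A minimal sketch of the wrapper suggested above, assuming the real binary has already been moved to /sbin/modprobe.real. The helper name write_modprobe_wrapper is only illustrative; it writes the wrapper to a path of your choice so it can be reviewed before it is copied into place. It prints all arguments rather than just $1, since modprobe is often called with options ahead of the module name.

```shell
#!/bin/sh
# Write a logging modprobe wrapper to the path given as $1.
# Assumption: the real binary has been moved to /sbin/modprobe.real.
write_modprobe_wrapper() {
    cat > "$1" <<'EOF'
#!/bin/sh
# Announce the request first, so the last line on the console before a
# hang shows which modprobe call was in progress.
echo "loading $*"
exec /sbin/modprobe.real "$@"
EOF
    chmod 755 "$1"
}

# Example (as root, after reviewing the generated file):
#   write_modprobe_wrapper /tmp/modprobe.wrapper
#   mv /sbin/modprobe /sbin/modprobe.real
#   cp /tmp/modprobe.wrapper /sbin/modprobe
```

Remember to move /sbin/modprobe.real back once the test is done.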
I have a dual 2.2GHz Xeon (hyperthreading disabled) with FC1, kernel 2115.
It worked for over 25 days with NFS and SMB. It finally hung last
night after I disconnected the cable to the NFS server and ran df.
Message was posted to the Fedora mailing list. I am upgrading to 2140
kernel and will re-test (and will use Stress). Please CC me on
Dave, with the modprobe script and acpi=on, the SMP kernel
consistently hung at "Initializing USB Interface." With the standard
modprobe executable it failed a few lines down at "Finding module
dependencies." In the past it has hung at various other places as
well. SMP works fine on this machine with Redhat 9.0, however, I'm
sticking with Fedora and currently can only run on one processor with
the non-SMP kernel (2.4.22-1.2149.nptl).
Well, it's not Samba related (no surprise): another server running 2140
hung over the weekend. My other 2140 machine has been running for four
days now with the single processor kernel (with Samba :-) )
Another thing, the load does not appear to matter. Since we are all
just coming back from the holiday here at school our server load
values are at or below 1.0 and yet we are still getting the freeze.
Also, hyperthreading does not seem to matter -- on or off.
perhaps #109463 gives a workaround? (disabling USB-Support via
I tried disabling usb. First tried disabling via modules.conf and then
in the BIOS and it still froze. ;-(
My problem seems to have something to do with NFS or autofs. Is anyone
listening in who is using these (not someone with one export) and NOT
having the system freeze problem? I ask because it seems that there are
at least two different problems being described here.
When these hangs occur, is it totally dead (as in even pushing numlock
doesn't change the keyboard LED) ? If it isn't getting backtraces will
1) Turn on Magic System Request Keys....
(i.e. echo 1 > /proc/sys/kernel/sysrq)
2) Alt-SysRq-p to show where the processors are
3) Alt-SysRq-t to show processes states
Additionally, ctrl-scrolllock will do backtraces, and shift scrolllock
will show current memory states
Oh, one other thing, the backtraces will produce a lot of output,
which won't fit on the screen. alt-sysrq-s will sync the drives
(to make sure it's in the logs), alt-sysrq-u will umount the partitions
and alt-sysrq-b will then reboot.
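As a rough sketch of step 1 above, the helper below turns the SysRq keys on and reports the resulting value. The function name enable_sysrq and the optional path argument are illustrative assumptions; the argument exists only so the function can be tried against a scratch file before touching /proc.

```shell
#!/bin/sh
# Enable the Magic SysRq keys by writing 1 to the control file.
# $1 is optional and defaults to /proc/sys/kernel/sysrq.
enable_sysrq() {
    ctl="${1:-/proc/sys/kernel/sysrq}"
    if [ ! -w "$ctl" ]; then
        echo "cannot write $ctl (not root, or SysRq not available?)" >&2
        return 1
    fi
    echo 1 > "$ctl"
    echo "sysrq is now: $(cat "$ctl")"
}

# Once enabled, the key sequence from this comment is:
#   Alt-SysRq-p  where the processors are
#   Alt-SysRq-t  process states
#   Alt-SysRq-s  sync, Alt-SysRq-u umount, Alt-SysRq-b reboot
```

Run it as root with no argument before the machine hangs; the setting does not survive a reboot unless kernel.sysrq = 1 is also set in /etc/sysctl.conf.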
Okay, just restarted a couple of machines that have been running the
single processor 2140 kernel without problems. Well, for four days anyway.
Dell-2550 2xPIII w/Fedora-2140smp
Dell-2650 2xP4 Xeon w/HT enabled w/Fedora-2140smp
SysRq enabled on both; both are NFS clients and servers, both
using autofs (version 3). The 2650 has some Samba shares as well that
are actually NFS automounts.
booting with nmi_watchdog=1 may also be useful if the machines really
are stuck when they hang.
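Several reports in this thread depend on which boot parameters were actually in effect (noapic, nmi_watchdog=1). A small sketch for checking that is below; the function name cmdline_has and its optional file argument are illustrative assumptions, not an existing tool.

```shell
#!/bin/sh
# Succeed if boot parameter $1 (e.g. noapic or nmi_watchdog=1) appears
# as a whole word on the kernel command line.
# $2 is optional and defaults to /proc/cmdline.
cmdline_has() {
    tr -s ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Example:
#   cmdline_has nmi_watchdog=1 && echo "watchdog requested at boot"
#   grep NMI /proc/interrupts    # count should keep rising if it is active
```

The match is on whole words, so asking for "apic" will not be satisfied by "noapic".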
SOLUTION!! -- I believe I have found the solution...
Fedora Core 1
Dual Xeon 2.4GHz
I found that the issue was in the advanced power management for the
processors. Leads on this path were made previously in this forum
discussion, but I have taken the full step. There are two areas where
APIC (the advanced power management system) must be disabled. The
first and foremost is in the BIOS. The second is in the kernel load
To disable APIC in BIOS, restart your system and hit whatever key
brings you into BIOS. Then, just look around for APIC in all of the
settings and turn it off.
Next, startup a working non-smp kernel and edit your grub config
file (/boot/grub/grub.conf). On the smp enabled kernel, add noapic to
the end of the kernel line. Example: kernel
/vmlinuz-2.4.22-1.2149.nptlsmp ro root=LABEL=/ hdd=ide-scsi rhgb noapic
This repaired all issues for me. I certainly hope that this will be
the solution for many others to come until they make whatever needs to
be compatible with APIC compatible.
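For anyone who wants to try the grub.conf change described above without hand-editing, here is a cautious sketch. The function name add_kernel_param is illustrative; it only prints the modified file to stdout, so the result can be compared with the original before anything is replaced.

```shell
#!/bin/sh
# Print grub.conf ($1) with parameter $3 appended to every "kernel" line
# that mentions version string $2 and does not already carry it.
add_kernel_param() {
    awk -v ver="$2" -v param="$3" '
        $1 == "kernel" && index($0, ver) {
            present = 0
            for (i = 2; i <= NF; i++) if ($i == param) present = 1
            if (!present) $0 = $0 " " param
        }
        { print }
    ' "$1"
}

# Example:
#   add_kernel_param /boot/grub/grub.conf 2.4.22-1.2149.nptlsmp noapic \
#       > /tmp/grub.conf.new
#   diff /boot/grub/grub.conf /tmp/grub.conf.new
```

Note that the version string is matched as a substring, so pass the full nptlsmp name to avoid touching the non-SMP entry as well.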
Sorry, but there is something I don't understand. APIC stands for
Advanced Programmable Interrupt Controller as far as I know, opposed
to ACPI which is Advanced Configuration and Power Interface. Which one
do you mean? And I am quite sure you can't disable APIC in BIOS...
For the record - I have Pentium WITHOUT ACPI (but with APIC) and still
One of my servers died within four hours using 2140smp. SysRq reports
Restarted using single processor kernel.
Then, when restarting my other test machine to go back to single
processor it HUNG also!! It stopped during the Stopping automounter
step of the shutdown sequence. The SysRq keys showed the hang in the
SAME PLACE as above.
The other thing was that the second server, which did not hang last
night (but did during the reboot), had a load average of over 200.
Normally it hovers around 1.
Hope this helps debug this particular problem.
Also, disk sync, unmount and reboot magic keys did not work after the
hang. Both machines had to be powered off.
Update on my system (dual xeon 2.2G, 2G RAM ATAPI soft RAID). I
updated to a stock 2.4.24 kernel (made with gcc 3.3) and it has not
crashed, yet. I am re-making the kernel using gcc32 (recommended in
What steps should those of us following this bug take to try to gather
more data on it? Should we each report our MB, processor, RAM,
kernel, etc., or does it seem to be a more generic bug? Is anyone on
the linux kernel mailing list following this? Does this bug seem
specific only to Fedora users? Is it specific to only Fedora kernels
and not stock 2.4.20-24 or 2.6 kernels?
Note: I also posted this to bug 113148 which I recommended be marked
as a duplicate of this bug.
There has been something similar upstream, which could be the same
*** Bug 113148 has been marked as a duplicate of this bug. ***
Do you think it would be useful/productive to try the proposed patch
discussed in that thread?
If so, what would be the best way to proceed - apply to 2140?
Created attachment 96992 [details]
Please try this version.
Probably obvious...but attachment will not patch against the inode.c
that is part of the 2140 kernel.
ah, it relies upon the recent refile_inodes change in 2.4.25pre.
which in turn, depends on the -aa VM changes.. needs rediffing.
A little more legwork than I have time for. I think I am going to sit
tight on the single processor kernel until an updated kernel package
Sorry about the confusion,
ACPI must be disabled in BIOS and APIC must be disabled in the
bootloader config. Disabling ACPI in BIOS made it so that the system
only saw two processors like it should instead of four like it was
doing. Disabling APIC stopped the system from hanging.
I tried kernels 2115 and 2129, and now 2140.
2140 has not hung yet.
I have 2 almost identical systems; only the motherboard is different.
Both are ASUS boards from the same series, but one has the addition of integrated
On both I run D-Link DFE-530TX network cards, and no Gigabit.
The machine without Gigabit is still running 2115 and has NEVER HUNG;
it is rock solid, and it runs as an NFS SERVER.
This machine is always 99%-100% idle with almost no load.
(Both machines are NFS servers and serve two HP-UX machines in 2 different
The Gigabit machine always has 0% idle and always 1.00 load or
If I close every service and only run text mode with no network, some
"top" shows this:
CPU states: cpu user nice system irq softirq iowait
total 0.0% 0.0% 0.0% 33.3% 33.4% 33.1% 0.0%
I have been watching this bug since it started.
In one way it points to NFS, but then why do I have NO problem with
HP-UX on one Linux machine and problems on the other?
Sometimes it hangs (the Gigabit one) when I start or stop NetVault,
never on the other.
To me it points to something connected to SMP functionality, even
inside a single-processor kernel, or something with USB.
A question: for those who have the problem, what does top say?
Because this is the only place where I can see any difference.
[root@PBKSE-BS16 root]# uname -a
Linux PBKSE-BS16 2.4.22-1.2140.nptl #1 Tue Jan 6 20:20:43 EST 2004
[root@js_volga root]# uname -a
Linux js_volga 2.4.22-1.2115.nptl #1 Wed Oct 29 15:42:51 EST 2003
Disregard my patch. It fixes a problem in a 2.4.24 patch which isn't
actually included in the FC1 kernel anyway.
I also have been having major problems with the Fedora SMP kernel.
I used to use RedHat 6.2 for many years and it was rock solid and
stupid me decided it was time to upgrade since I had the machine down
to fix some failed CPU fans and a hard drive anyways.
I have an AMI Goliath board with quad 200 MHz 256k L2 PPro processors
installed. It uses dual Orion chipsets. The system has 1 gig of ECC EDO
DRAM. There is an AMI MegaRAID Express 300 card running in native mode
(I2O mode crashes Fedora on bootup), an Intel Dual EEPro Server Net
adapter, and an N9 Imagine Series 2 video card. So it's not just new
fancy machines with this issue. I have no USB, APM, ACPI, or any of
the latest interfaces. So those couldn't be the cause of this problem.
Yes its an old system, but with all 4 CPUs going, it is very
responsive for my needs. Until I installed Fedora that is. Now I
cannot seem to keep the machine up. It will always boot and work for
awhile and then some random time later I find the machine completely
locked up. No display, no keyboard response, unable to connect over
the network. Its completely frozen. None of the system logs show
anything that hints what happened.
I did try running the single CPU kernel and everything has been
stable for 2 days now, albeit quite a bit slower having only one CPU
to run everything on. This machine is useless until this bug gets
Forgot to add one thing. I did find a way to make the bug surface
extremely quickly on my system. Have an FTP server running (in this case
vsftpd), log into it from a remote machine, and proceed to upload a
huge file. The SMP kernel locks up completely, in exactly the same way I
have found my machine randomly locked up in the past. No video, keyboard
lights all off, Num Lock no longer works. The difference is that I can
get it to lock up within a few seconds, repeatably, every time I
upload a large file through FTP.
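For others who want to try the same reproduction, a sketch of the client side follows, to be run on the remote machine. The function names, the file size, and the host, path and credentials in the example are all placeholders; the original report does not say which FTP client or file size was used, so curl is only one possible choice.

```shell
#!/bin/sh
# Create a test file of $2 megabytes at path $1, filled with zeros.
make_big_file() {
    dd if=/dev/zero of="$1" bs=1048576 count="$2" 2>/dev/null
}

# Upload file $1 to FTP URL $2 as user:password $3 using curl.
upload_big_file() {
    curl -T "$1" --user "$3" "$2"
}

# Example (placeholder host and credentials):
#   make_big_file /tmp/big.bin 1024
#   upload_big_file /tmp/big.bin ftp://server.example/incoming/ user:password
```

Have a serial console or SysRq ready on the server first, since the lockup is reported to happen within seconds.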
The latest smp kernel version 2140 also crashes on my machine.
For those looking for quick fix - you may try the latest errata kernel
for RedHat 9 (2.4.20-28.9smp). Works for me for over 72hrs now (but
as I mentioned earlier - the machine is not very stressed).
Could you please turn on the Magic System Request Keys
as described in Comments #39 and #40 of this thread.
Then post the output of Alt-SysRq-p and Alt-SysRq-t commands
when the system hangs.
Hate to add a me too but I have 3 dual Xeon machines and they reach
the totally dead state within minutes of boot..... even the numlock
Hang occurs on a single Xeon 2.4GHz Dell 2650 with hyperthreading
enabled and SMP kernel. Non-SMP kernel runs fine.
I read Jerry's comment about HUGE files on SMP.
I run a single CPU, but the newer motherboard (or its CPU) has some type of
SMP functionality, because the installer wants to install the SMP kernel.
I also have HUGE files: via NetVault I run a Virtual Tape Library
where every file is 10GB; I have around 25, and 6 can be loaded at the
same time. Also, around 20-50GB of data is transferred as a quick backup
of an HP-UX system over NFS (around 120-150 files). With the .2115
kernel it hangs randomly; with .2129 it hangs roughly every 10th
time I restart the NetVault service. With .2140 I don't know; it has
not run for 2 days yet. I'll know a few weeks from now.
And as I wrote before, I have an exact duplicate system with only a 6-month
older motherboard that has no integrated gigabit card, and that is
rock solid with .2115
BTW I only run the single-CPU kernel and not SMP, but it seems to be connected to
So far so good... My single Xeon 2.4GHz Dell 2650 (ServerWorks
chipset) has been running kernel-2.4.22-1.2149.nptlsmp with USB
disabled for 15 hours without problems. See comment #23.
I would capture the SysRq backtraces if I could, but my system is
in a hard locked state. The keyboard no longer works, so I cannot press
the keys. I wish there were some way of capturing the system state
when it crashes, but I have found no way of doing this.
Tried the 2149 kernel, it crashes too.
I just found a copy of the new kernel ...
Has anyone tried it?
...2149 is buggy on the following server:
HP Proliant DL 580
4 2.5 GHz Xeon
2 Gig RAM
4 72 Gig HD
Problems with X Windows (erratic behavior) and unable to
I have also disabled the ACPI in the services applet and still have
There's a large number of changes in the -testing kernel (2163) which
may fix this. It's something of a sledgehammer to crack a walnut, but
that kernel puts us back in sync with mainline VM, and includes all
recent fixes there too.
Will try it starting tonight - we have a long weekend ahead here to test.
Just tried 2163 with my FTP test. Locked up immediately as before.
It looks like I had some keyboard control this time before it
completely went to a blank screen. Tomorrow when I have more time,
I'll try to get some backtraces.
Created attachment 97075 [details]
messages log from crash and using alt-sysrq-p/alt-sysrq-t keys
Looks to me like the trace is not complete. Saw more info pop up on my terminal
display than got recorded into the messages log file. I'm guessing the extra
info got lost in a disk cache somewhere and never made it to the messages file.
New development here. It appears that the problem has also been
reported when using removable storage devices.
I have several Iomega Jaz devices which lock up the same way (on an
attempt to access the device after it has timed out).
So it now appears that the problem may not be directly connected to
NFS-style mountpoints, but to any removable or non-local storage
Bad news, 2163 hung in the same spot as before. That is SysRq<p>
reports that cpu1 is sitting in .text.lock.inode
The task list (that I can see) shows umount running(heh)
here is a call stack I managed to copy
Back to single processor kernel for now...
More bad news ..... 2163 is just as bad. My test systems were all
hung this morning when I came in. They also still exhibit many of
the behaviors as before. <sigh>
What can I do to better determine the issues with the kernel? Can
anyone e-mail some procedures I can do to try to narrow these issues
did you try the nmi watchdog as mentioned in #42 ?
Thanks for the hard work you fellows are putting in on this problem
but 2163 is still failing for me on a Dell Precision 650N (dual Xeon).
On boot, it now gets past the USB init but either hangs on module
dependencies or setting local filesystem quotas. On my single
processor Dell XPS system (Xeon), the SMP kernel fails with a DMA
timeout for my SATA drive. Both systems work well with the non-SMP
email@example.com wrote about problems with any mounts.
And it seems to come back to this again and again (plus some USB).
But I get the same with the single kernel too, and I run a totally single-CPU
I have, as I said before, 2 systems running NetVault.
Both mount logical devices.
I have a single kernel, but experience the same as you all when you all
On the machine that is stable as a rock when it finally starts runs
The new one hung today when I restarted NetVault for the second time;
this machine runs 2149.
Both machines are ASUS P800 Deluxe, but the one with 2149 is newer.
How can I get the same situation, with a TOTAL key lock or sometimes
blinking Caps Lock and Scroll Lock lights, with the single kernel?
If I try the SMP kernel, it goes dead immediately.
It could also be the driver for the Adaptec SATA RAID, but then why
does the other (2115) not hang?.....
Sadly, I too am having this problem, even with the 2163 kernel. I have
a dual-proc Xeon 2.0GHz machine, hyperthreading disabled. I have
several traces made with sysrq-t while the machine is frozen; I'll
attach them here.
I've found that the best way for me to reproduce them is to run the
system monitor applet in my GNOME panel. It often seems like that's the
process sitting in .text.lock.inode (it's called multiload-applet).
For the record, we're having similar freezes on RedHat 7.1 machines
running a more recent 2.4.20 kernel--I have no idea if the cause is
related, but the symptoms are very similar. I don't have a sysrq-t
trace for one of those yet.
thanks for all of the work you're doing!
Created attachment 97201 [details]
sysrq-t trace of a frozen machine #1
with 2129 kernel
Created attachment 97202 [details]
sysrq-t trace of a frozen machine #2
with 2129 kernel
Created attachment 97203 [details]
sysrq-t trace of a frozen machine #3
with 2163 kernel
I upgraded to 2163 on my SuperMicro dual XEON (HT enabled) and it
seemed stable for nearly a week. However, when I took a box to a
customer's to be integrated, after 1 day it promptly locked up (looks
Do you think a stock 2.4.24 kernel would be better at this time? (I am
resisting moving to another distro this weekend.)
Another hang... again with removable media. Does anybody have the same
problem with removable media? The media had a journaling filesystem.
I will remove the journaling filesystem, format the media with a
non-journaling filesystem, and see what happens.
Ah, by the way, it hangs with NFS too ;)
I have dual Xeon 2.8 4G RAM on Intel SE7501HG2 board. Fedora Core 1
with LTSP 4.
I tried all Fedora kernels up to 2.4.22-1.2149.nptlsmp. All of them
gave up without ANY error message after 6-12h uptime. Server load
was not important: it can hang under heavy load and without any load.
I tried all suggestions from this thread (noapic, acpi, nousb, etc.)
with no success.
With a self-compiled 2.4.23 kernel my server had 6 days of uptime.
Now I am trying a self-compiled 2.6.1 SMP kernel. So far 21 hours of uptime.
Better than the original kernel.
Over the weekend I tried to re-create the error that locks up my
machine with removable disks. If there is no logging then the machine
will not lock up. So it would appear that the removable /
not-on-machine storage module in the kernel is having problems.
It would appear that the kernel is not able to distinguish between
non-removable and removable storage, and for that reason locks up, as it
would if a physical disk went bad.
I updated to version 2149 on a dual SMP Dell PE2650 2.8GHz with HT, and now
we have no freezes; the USB settings were disabled in the BIOS. More
The PCI boot logs say:
ACPI: RSDP (v000 DELL ) @
ACPI: RSDT (v001 DELL PE2650 0x00000001 MSFT 0x0100000a) @
ACPI: FADT (v001 DELL PE2650 0x00000001 MSFT 0x0100000a) @
ACPI: MADT (v001 DELL PE2650 0x00000001 MSFT 0x0100000a) @
ACPI: SPCR (v001 DELL PE2650 0x00000001 MSFT 0x0100000a) @
ACPI: DSDT (v001 DELL PE2650 0x00000001 MSFT 0x0100000a) @
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 Pentium 4(tm) XEON(tm) APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x06] enabled)
Processor #6 Pentium 4(tm) XEON(tm) APIC version 20
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x01] enabled)
Processor #1 Pentium 4(tm) XEON(tm) APIC version 20
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x07] enabled)
Processor #7 Pentium 4(tm) XEON(tm) APIC version 20
ACPI: LAPIC_NMI (acpi_id[0x01] polarity[0x1] trigger[0x1] lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] polarity[0x1] trigger[0x1] lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] polarity[0x1] trigger[0x1] lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x04] polarity[0x1] trigger[0x1] lint[0x1])
Using ACPI for processor (LAPIC) configuration information
Intel MultiProcessor Specification v1.4
Virtual Wire compatibility mode.
OEM ID: DELL Product ID: PE 0121 APIC at: 0xFEE00000
I/O APIC #8 Version 17 at 0xFEC00000.
I/O APIC #9 Version 17 at 0xFEC01000.
I/O APIC #10 Version 17 at 0xFEC02000.
Just tried 2163 for the first time on my dual PIII. Crashed overnight
as usual. alt-sysrq-p shows something like
comm: ypserv CPU: 0 EIP at .text.lock.read_write
alt-sysrq-t shows as last entry
Call Trace: invalidate_inode_buffers invalidate_list nfs_fs_type
invalidate_inodes nfs_sops kill_super sys_umount ...
Same here, on 3 different machines (all SMP) with all Fedora kernels
up to 2163. Same with custom 2.4.24 (gcc33).
Was ok with Redhat 7.3. Seems to be ok with Debian.
Kernel 2.6 doesn't boot.
This is just a "ME TOO" note.
I've been getting this hang in text.lock.inode on our systems since I
upgraded to FC1. From what I can tell, the hang occurs when either the
amd automounter would make an umount call or when the system is
shutting down and attempts to unmount a filesystem.
My systems include a 4 processor Xeon and a 2 processor Xeon.
Also I have a 16 node beowulf cluster of single processor AMDs. These
also lock up intermittently, or when restarted, in text.lock.inode.
These were running the smp kernel. I've now started the nodes with a
non-smp kernel to see what happens.
The latest kernel I'm using is 2.4.22-1.2149.nptlsmp but the hang has
happened on all the Fedora kernels I've tried since I started using
FC1 in Dec last year.
Perhaps a silly question, but are the upstream kernel developers aware
that a umount seems to be the most reliable way of triggering this
hang? I don't get that impression from the kernel thread mentioned above.
I now have a system that I can reliably hang. I've got:
Quad processor Pentium III (550MHz), 2 Gig of memory
kernel /vmlinuz-2.4.22-1.2166.nptlsmp ro root=LABEL=/ nmi_watchdog=1
/proc/sys/kernel/sysrq = 1
If I run this:
while [ true ]; do
mount /dev/sdh1 /mnt
echo -n '['
umount /mnt
echo -n ']'
done
The system will hang in less than a minute.
alt-sysrq P shows the processors in text.lock.inode, text.lock.namei,
nmi_watchdog does not seem to give any oops.
I will attempt to do a serial console tomorrow and get some more
If anyone has some suggestions, I'll be willing to try them.
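For anyone comparing kernels, it may help to count how many mount/umount cycles a machine survives before it hangs. The following is only a sketch along the lines of the loop in this comment: the function name `mount_cycle` and the `MOUNT`/`UMOUNT` override variables are made up for illustration, and the device and mount point should be whatever spare filesystem you have available.

```shell
#!/bin/sh
# Sketch of a mount/umount stress loop with an iteration counter.
# mount_cycle, MOUNT and UMOUNT are illustrative names, not from the thread.
# Usage: mount_cycle DEVICE MOUNTPOINT COUNT   (run as root)
mount_cycle() {
    dev=$1
    mnt=$2
    limit=$3
    i=0
    while [ "$i" -lt "$limit" ]; do
        # MOUNT/UMOUNT can be overridden for a dry run; default to the real tools.
        "${MOUNT:-mount}" "$dev" "$mnt" || return 1
        "${UMOUNT:-umount}" "$mnt" || return 1
        i=$((i + 1))
        # Progress goes to stderr, so the last number left on a (serial)
        # console shows how far the loop got before a hang.
        printf '\r%d' "$i" >&2
    done
    printf '\n' >&2
    # The completed cycle count goes to stdout.
    echo "$i"
}
```

For example, `mount_cycle /dev/sdh1 /mnt 1000` would attempt 1000 cycles on the device used above. Setting `MOUNT=true UMOUNT=true` first runs the loop without touching any filesystem.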
I have 4 days uptime with self-compiled 2.6.1 kernel.
Some weeks ago I tried 2.6.0 but I couldn't compile it. I don't know
why, because the same .config file compiles perfectly with 2.6.1 now.
All FC1 kernels crashed within a day.
The self-compiled 2.4.23 lasted 6 days.
to: Philippe - don't forget mkinitrd if your ext3 filesystem is not
compiled into kernel :)
The script in comment #90 reliably crashes my machine. It only happens
when the machine is booted with an SMP kernel and the noapic boot flag
Created attachment 97450
Created attachment 97451
Created attachment 97452
Created attachment 97453
The script in comment #90 is not as reliable as I stated. Once I got
my serial console going, it seemed to take a lot longer than before.
It did work after several minutes however. There was probably an amd
umount while the script was running that caused the hang (though I'm
Anyway, I've attached the sysrq details. These ones had the
nmi_watchdog enabled which I don't think has happened for any of the
other sysrq messages posted here and on Bug #109962
The kernel grub entry looks like:
title Fedora Core (2.4.22-1.2166.nptlsmp)
kernel /vmlinuz-2.4.22-1.2166.nptlsmp ro root=LABEL=/ console=tty0 console=ttyS0,9600n81 panic=60 nmi_watchdog=1
Created attachment 97454
Previous sysrq-P was a sysrq-T, doh!
I've had enough... I just compiled a kernel.org 2.4.24 kernel and the
system has been running smoothly for almost 3 days.
I've tried to recreate the problem with the removable disks, and the
system didn't hang... Let's see over the next few days.
I am using NFS mounts and CIFS mounts... so if there is a problem, it
will show up really soon.
kernel-2.4.22-1.2149.nptlsmp has been running happily here for the
last 26 days with USB disabled, but I can't test
kernel-smp-2.4.22-1.2166.nptl.i686.rpm until sometime next week.
Can anyone confirm (or deny) that this fixes the problems for them?
Disabling USB does *NOT* fix the problem most reported here. System
will still hang in text.lock.inode etc.
Have been running 2163 for 20 days now using single processor kernel
I have been experiencing the same issue on ~30 machines with FC 1
installed (hardware is single Pentium 4 with hyperthreading, 1 GB RAM,
SATA WD HDD, ASUS p4p800 MB). 2.4.22-1.2166.nptlsmp does not seem to
help, as I have still seen them hang over the last day since it has
been installed. I have *not* seen the crashes/hangs when booted into
the UP kernel. By the way, the script posted in comment #90 seems to
reliably crash all the machines within a minute or two with the new
Also experiencing same problems on "true" dual-CPU systems (dual Xeon)
when using the smp kernel.
The problem is with the stock 2.4.22 kernel built for Fedora.
Both machines are running Fedora, one with the stock kernel and the other
with the same version but compiled from source.
The machine with the compiled kernel is running smoothly....
The system with the stock kernel hangs about every two hours......
Has anyone tried another kernel, besides the stock mess?
I see the same problem on my dual-cpu Dell Precision 650. SMP kernel
locks hard (required me to pull the plug). Non-smp kernel runs fine.
I should add this is with the smp version of 2.4.22-1.2166.nptl.
Using the 2.4.24 kernel from kernel.org has not crashed in over a day...
Linux XXXX.XXX.XXX 2.4.24 #43 SMP Wed Feb 11 09:26:01 WET 2004 i686
i686 i386 GNU/Linux
22:02:26 up 2 days, 12:30, 8 users, load average: 1.58, 2.29, 2.39
With a LOT of smb and samba mounts.....
I'm still working on properly testing this idea now, but it looks to
me like it might be related to MRTG running as a cron job.
If I stop cron, the problem goes away. If I start cron, but remove
/etc/cron.d/mrtg, the problem goes away. I have had this system
running now for 4 days since I took away /etc/cron.d/mrtg.
I did try removing all the anacron entries, but this seemed to make no
I know this isn't an answer, but maybe it'll give someone a clue where
to look :)
> might be related to MRTG running as a cron job.
Wow, is that confirmed? At least that's the best idea I've heard so far.
I have MRTG running as a cron job, too.
No, it's probably not the problem. Because 1/ I don't have MRTG
running, and 2/ the crash test in comment #90 has nothing to do with MRTG.
At first I thought the problem was caused by my "slocate" cron job, but
actually any filesystem I/O could crash the system.
Status update: All of our SMP machines (>12, various mainboards and
processors) crash with all Fedora kernels. The highest uptime achieved
with a Fedora kernel was 5 days; machines with more NFS traffic hang
earlier. alt-sysrq-t (when possible) shows an NFS umount.
All these machines run autofs and are both nfs clients and servers.
From ruptime it seems that many crashes happen around 4am when daily
cron jobs are started; apart from the usual, we run autoupdate or yum
and on some machines a dsm backup (no mrtg)
Vanilla 2.4.24 seems to run stably on these same machines (e.g. 12
days uptime on my desktop).
I just got my Dell Precision 610 (Dual PIII Xeon 550MHz) booted up
with the 2.4.22-1.2166 nptlsmp kernel downloaded from the Fedora
updates. I booted the kernel by passing the options:
In addition, I had all power management support turned off in the
BIOS. I hope this helps out anyone who may be having issues. As a
general note, the option acpi=off can be extremely helpful in
resolving issues with systems. I find it fixes problems many times.
Linux XXX.XXX.XXX 2.4.24 #43 SMP Wed Feb 11 09:26:01 WET 2004 i686
i686 i386 GNU/Linux
11:06:05 up 7 days, 1:33, 8 users, load average: 3.04, 2.60, 2.31
18 NFS Remote mounts... 3 CIFS/SMB mounts.... and no crashes... The
problem is with the stock kernel from fedora.
Here is something new that works for me. I have an FC1 installation
with an RHEL kernel installed. You can get these from Whitebox Linux at:
I installed kernel-smp-2.4.21-9.0.1.EL and
I have run the test script at
for over 20,000 mount/umounts while doing an updatedb for slocate. Any
FC1 SMP kernel I have tried will hang before 500 mount/umounts. Most
times well before that.
You need to install the kernel packages with:
rpm --oldpackage -hiv kernel...
Wrong link to test script in last message. Should be:
Norman's script works for me as well. SMP machines running Fedora
kernels hang within 1000 iterations. Vanilla 2.4.24 has no problems.
I have a situation that looks to be the same or related to this bug.
Platform: IBM Intellistation Z-PRO (MSI Board) Dual 2.8 GHz Xeon.
NVIDIA Quadro Pro 980XL, running in generic VGA mode (no NVIDIA driver)
OS: Fedora core 1 clean install
Symptom: On console login, keyboard becomes unresponsive in a cyclic
pattern; (~10 seconds dead/~10 seconds active ...). With rapid
keyboard activity system will eventually become unresponsive; no ping,
no keyboard response.
This is a lab system that is frequently re-installed and has shown no
such symptoms under RH 8.0, RH 9, RH ES 3.0, SUSE, or kernel.org
2.4.24 kernels. Symptoms appear immediately upon installation of a
fresh Fedora image.
Duplicated on 2.4.22-1.2115.nptlsmp and 2.4.22-1.2174.nptlsmp.
Corresponding Fedora non-smp kernels do not exhibit the symptom.
Telnet/ssh sessions do not exhibit this problem; only local console
(tty1-tty6) sessions do.
I have a system that exhibits the same problems. It's a Dell Precision
530. It hangs after a random amount of time with nothing in the logs.
The nmi_watchdog doesn't seem to help.
Just installing the smp kernel causes the non-smp kernel to oops when
loading the firewire driver. Adding the "nofirewire" kernel option
causes the system to hang elsewhere during boot. Removing the smp
kernel restores "normal" behavior in the non-smp kernel. This is on a
Dell Precision 620 (dual Xeon). It's simpler for the time being just to
run the non-smp kernel.
So is this the new business model? Break Fedora on high-end hardware
and force people to the enterprise products?
(More likely you guys are too busy with FC2 and 2.6 to track this
down. Hopefully, things will be better with the newer kernels...)
The script from comment #119 causes a hang here as well on several
different SMP systems. No problem when booting to the UP kernel.
Using 2.4.22-1.2174.nptl kernel.
Any chance of a fix for this?
I can also confirm this bug (or something that looks like it) on the
x86_64 release of FC1. I'm running dual Opteron 242s on a Tyan 2885
mb. Standard FC1 with updates from up2date (except I hand-updated
the XFree86 exe and the radeon drivers to 4.4.0 to fix another
problem). I have no NFS or samba mounts or shares, and no removable
media except cd/dvd drives which i don't use much. All files are on
storage attached to a 3Ware 8500-8. I'm running LVM (1.x), and /home
is about 1.5TB. I think I have an automounter running, but it does not
do anything at this stage. Crashes occur usually when idle (I also
suspected cron at first, and have not totally ruled it out yet). It
took me ages to find anything about this one because I was looking
for an x86_64 problem rather than an SMP one! Looks to me like it is
time to try a hand-rolled kernel. Thanks to everyone here for some
ideas on how to proceed as I had almost run out. I can post more
details tomorrow if anyone else is having this problem with x86_64.
I can't ssh to the box tonight as it hung hours ago :-(
People have asked if this is fixed in later kernels - I have run the
script from comment 114 with a recent FC2 devel 2.6.3 kernel on a dual
P3 box and had no crashes.
Once FC2-test2 is released, I will repeat with NFS (I just got the
hardware allocated for that test and I don't have the time to do a
complete install for this when the test2 isos are going to be released
Dual Athlon Tyan S2466 AMD 2400+MP. Same issues, NFS crossmount via
AUTOFS which is the most likely cause of crash. Built vanilla 2.4.25,
no crashes since.
I'm experiencing the same problems with an HP/Compaq ML370 dual
processor machine. We usually get a lockup within 15 days of a reboot.
It's running the redhat 9.0 Distro with the following kernel.
Linux xxx.nerc-bas.ac.uk 2.4.20-20.9smp #1 SMP Mon Aug 18 11:32:15 EDT
2003 i686 i686 i386 GNU/Linux
We also have a DL360/G2 with two processors. It is not experiencing
any problems and it is running Redhat 8.0 with the following kernel.
Linux xxx.nerc-bas.ac.uk 2.4.18-14smp #1 SMP Wed Sep 4 12:34:47 EDT
2002 i686 i686 i386 GNU/Linux
I'm planning on creating a stock SMP kernel from source and testing
that on the ML370.
*** Bug 118990 has been marked as a duplicate of this bug. ***
I had almost given up hope that this bug will be fixed before we all
move to fedora 2, but it seems the new 2.4.22-1.2179.nptlsmp does it.
I now have six SMP machines with uptimes over 3 days, and the script
from comment #115 runs for hours without effect.
2179 dropped the low-latency patch which may have been related. Is
anyone here still seeing it with 2179 or later?
No problems here with 2188; everything that used to trigger the
problem works fine now.
I have FC1 2188 installed on PE 2650 dual processor with aacraid.
SMP kernel hangs consistently during boot, UP kernel runs fine.
Tried suggested "noapic", "noapic acpi=off" solutions - no joy.
Sounds like this may have been two bugs - NFS mount problem fixed?
However boot problem still there for me.
I think the boot one is in fact different - want to open a new bug for
it and just reference this bug in the description ?
Sure, can do. Found that disabling USB in BIOS and modules.conf allowed
boot with SMP kernel on two servers, when SMP kernel hung 100% before.
Using "noapic acpi=off" corrected "cat /proc/cpuinfo", which was
showing 4 cpus. I am up and running now on SMP. Woo-hoo!
Experiencing intermittent system hang with FC1 2118. Server is Dell
1750 with dual Xeon 3.2GHz. I haven't had any problems with the UP
I tried all the suggestions (except the BIOS settings, because I'm far
away from the server), including the latest kernel version (FC1 2188) as
Alan suggested, but my Dell PE 1750 dual Xeon 2.4 still hangs.
For me, the only solution for the moment seems to be a non SMP kernel.
No problems here, on a dual Xeon, since 2179 & 2188. The system used
to hang almost every quarter of an hour before the update.
Have just tried smp kernel 2188 on a Dual 2.4G Dell PowerEdge 2650
with embedded raid and had it hang. (same as with 2179 and below).
Have tried the following after reading Brian Hanna's comments (#129 &
USB BIOS setting originally 'on with BIOS support':
plain boot - hung
Appending 'noapic acpi=off' - still hung
Disabling BIOS USB (off) and appending 'noapic acpi=off' - booted
Disabling BIOS USB (off), not appending 'noapic acpi=off' - booted
Changed USB BIOS setting to 'on with no BIOS support', nothing
appended - booted !!!
I don't know if this is going to help anyone, or whether it will give
any more hints about the problem.
That helps a great deal in some ways. The fact that it works with USB
disabled says the problem is either the BIOS USB emulation or our USB
drivers. The fact that USB works without BIOS magic pretty much points
the finger at the BIOS firmware (the stuff doing 'make the USB keyboard
appear to be a PS/2 keyboard, work in DOS, BIOS etc').
2.6 kernels are much cleverer about how they handle the PS/2 keyboard,
so may be tripping a bug in the BIOS - or doing something naughty that
the BIOS trips over. Hard to be sure which of the two. I'll ping a
Dell guy but you might want to check for BIOS updates.
We've been facing this issue here on our 2550 dual processor machine.
Finally we appear to have solved it by disabling USB in the BIOS
(A09) and booting the 2.4.22-1.2188.nptlsmp kernel. We'll need
repeatable successful reboots and more uptime under normal load to be
Still using the same kernel as in comment #113. Uptime is now 45 days...
Does RH want to make everyone migrate to its paid Linux with this
The problem is in the stock kernel. Using a kernel from kernel.org the
system is rock solid.
Just a quick update on my comment (#135): don't take the 'usb with no
bios support' as fully working - I've since rebooted with this
option and had the 2188 smp kernel hang on 1 machine.
Dell Bios version is A10, and as I have a couple more Dual Xeon DELL
PE 2650's I'll check what happens with them.
OK, after a morning of testing, here are some more results:
BIOS version A10
Cold boot, USB no BIOS support - hangs
Warm boot, no USB - boots
Cold boot, no USB - boots
Warm boot, USB no BIOS support - hangs
BIOS updated to A17
Warm boot, no USB - boots
Warm boot, USB no BIOS support - hangs
Cold boot, USB no BIOS support - hangs
Unfortunately I can't verify how I managed to get the system to boot
with 'usb no bios support' yesterday.
I disabled usb in bios and in modules.conf in an attempt to fix the
problem in comment #132 and had no success. Multiple startups and
shutdowns of an Oracle database installed on the system will
consistently cause the system to hang.
Re: comment #138 all kernels have bugs. Fedora happened to have one
that people hit in the low latency stuff. But you don't actually need
to buy RHEL to play with the RHEL kernel - you can download the source
rpm from ftp.redhat.com.
I installed 2188 SMP on my Quad PPro 200 last night, this morning it
was hung again. There is still a problem with this latest kernel.
Is this ever going to get fixed? I'm guessing I'll be using Fedora
Core 2 before this kernel is ever gonna get fixed.
Is there a confirmed working kernel ?
I have trouble with the 2.4.22-1.2188.nptlsmp kernel and switched
back to the 2.4.22-1.2179.nptlsmp because I am quite sure that I did
not have this problem a couple of weeks ago.
Running FC1 on a dual Xeon Dell PowerEdge 2650.
Regarding comment #132 and #141:
The following is dumped out to the console when the system hangs:
Uhhuh. NMI received for unknown reason 21 on CPU 0.
Dazed and confused, but trying to continue.
Do you have a strange power saving mode enabled?
NMI usually indicates a system problem, memory parity error or the
like. Dell may be able to tell you what NMI code 21 is. I'd normally
guess at bad memory - but it's very odd that your box is reliable with
one CPU only if so.
The "unknown reason" code is just what the system read from I/O port
0x61 when the NMI occurred. This is pretty much useless. The bits
mean things like "speaker clock" (the output of the counter used to
drive the speaker), "refresh detect" (a bit that toggles every time
the memory is refreshed), some NMI enable bits (0x21 would mean that
NMI is enabled from the two sources IOCHK and SERR), "speaker
data", "speaker enable", and one bit that indicates if an NMI has
occurred because of a PCI PERR or SERR (bit 7--no PERR or SERR).
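To make the bit layout described above concrete, here is a small sketch that decodes the "reason" byte from the kernel message. It assumes the conventional PC/AT layout of port 0x61 as I understand it (bit 7 SERR, bit 6 IOCHK, bit 5 timer 2 output, bit 4 refresh toggle, bits 3 and 2 NMI source masks, bit 1 speaker data, bit 0 timer 2 gate); the function name and the flag labels are my own, so check the chipset datasheet before relying on them.

```shell
#!/bin/sh
# Decode the hex "reason" byte from "NMI received for unknown reason XX".
# Bit names assume the conventional PC/AT port 0x61 layout; the labels
# are illustrative and not taken from the kernel source.
decode_nmi_reason() {
    reason=$(( 0x$1 ))    # the kernel prints the byte in hex without "0x"
    out=""
    [ $(( reason & 0x80 )) -ne 0 ] && out="$out serr"              # PCI SERR/parity NMI pending
    [ $(( reason & 0x40 )) -ne 0 ] && out="$out iochk"             # IOCHK NMI pending
    [ $(( reason & 0x20 )) -ne 0 ] && out="$out timer2_out"        # speaker clock
    [ $(( reason & 0x10 )) -ne 0 ] && out="$out refresh_toggle"    # refresh detect
    [ $(( reason & 0x08 )) -ne 0 ] && out="$out iochk_nmi_masked"  # 0 means the source is enabled
    [ $(( reason & 0x04 )) -ne 0 ] && out="$out serr_nmi_masked"   # 0 means the source is enabled
    [ $(( reason & 0x02 )) -ne 0 ] && out="$out speaker_data"
    [ $(( reason & 0x01 )) -ne 0 ] && out="$out timer2_gate"       # speaker enable
    echo "${out# }"
}
```

Running `decode_nmi_reason 21` prints `timer2_out timer2_gate`: neither mask bit is set (so both NMI sources are enabled) and neither pending bit is set, which is why the kernel cannot name a cause.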
I have been following this for rather a long time, since December I think.
I sometimes have problems at startup too; the Fedora 2 prerelease
seems to work better.
A question: does RHEL really differ so much from Fedora?
And if it works in RHEL, why does it not work in Fedora?
There have been NFS, USB, and a good lot of other things.
I saw that people had problems with Oracle; I had BIG problems with
BakBone's NetVault (backup system): almost every time I ran "service
netvault stop" the kernel hung...
I apologize for the following opinion:
Was this bug planted in Fedora by RHEL, to make people with dual
systems give up and go to RHEL instead?
I can't even remember this fault in the kernel with the same number from
And again, sorry for my opinion, to all of you who were offended by
it. I know you all work hard to solve this, but perhaps we
should just let it go and upgrade to a newer Fedora kernel.
And again, I'm sorry.
Regarding comment #132, it looks like we had a bad processor. Dell
diagnostic utilities didn't catch it, but we tried to reinstall
Windows on the box and it choked during startup. The CPU has been
replaced, so I'll need to start testing again from the beginning...
Something worth trying for people running FC1 who can regularly
experience this bug. Install the RHEL 3 kernels either from RHEL3 or
from centos-3.1 and see if it goes away. The kernels should install
w/o any pain on an FC1 system.
For anyone who wants to try Seth's recommendation in comment 150, make
sure to use RPM's "--oldpackage" option. You'll need it since RPM
considers 2.4.21 to be older than 2.4.22.
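As a rough illustration of why the option is needed, the sketch below compares two dotted version strings field by field, which is approximately how the version part of a package is ordered. It is a simplification and not RPM's actual comparison routine (it ignores the release field and non-numeric segments), and the function name `older_than` is made up for this example.

```shell
#!/bin/sh
# Simplified dotted-version comparison; NOT RPM's real algorithm.
# older_than A B returns success (0) when A sorts before B.
older_than() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}    # leading numeric field of each version
        y=${b%%.*}
        [ "${x:-0}" -lt "${y:-0}" ] && return 0
        [ "${x:-0}" -gt "${y:-0}" ] && return 1
        # Drop the field just compared; stop when no dots remain.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 1    # versions are equal
}

if older_than 2.4.21 2.4.22; then
    echo "2.4.21 sorts before 2.4.22, so a plain install is refused"
fi
```

Since the third field decides the ordering, a 2.4.21-based kernel counts as a downgrade on a system that already has a 2.4.22 package installed.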
Is this bug still present in Fedora Core 2?
FC2 is a totally different kernel (2.6). I've certainly not seen any
evidence of matching problems, although since this bug is a composite
of about half a dozen things, most of which are fixed, it's hard to be
definitive. If you find any FC2 problems - please open a *new* bug for
Has anybody tried the "noht" flag for the kernel? This should disable
Installed RHEL3 entirely on this system. Hot springy death after about
8 hours of running on a poweredge 1750. Appears to be the same error
and it is immediately preceded by an autofs mount expiring.
Just ANOTHER datapoint of pain.
Seth, is that RHEL 3 update 2 or just the original RHEL 3? (i.e. is
that with kernel 2.4.21-15.EL or later?)
BTW, on every machine I've encountered that had this problem with
Fedora Core 1, installing a Fedora Core 2 kernel has fixed it.
RHEL3U2. I really don't want to install a 2.6 kernel if I can avoid
it. I installed rhel so I wouldn't have to play with kernels for a while.
I figured dell equipment should get along well with rhel.
This bug has turned into a complete mish-mash of a number of problems
& reports for what turned out to be several different bugs.
Trying to pick through it, and find out the bits that are still
causing problems, and still unfixed is a nightmare.
I'm going to close this bug as fixed (as backing out the low-latency
patch did fix this problem for a number of users). If you are still
seeing problems with kernels past 2188, please open a new bug instead
of adding to this one, even if your bug sounds identical to someone
else's. (If it is, I can mark it as a duplicate easily -- if it isn't
it adds to the noise, and we end up with bugs like this monster).
If you are still seeing it, include as much info as possible
(Do *not* put 'See 109497', as that defeats what I'm trying to do with
this exercise). Try out some of the suggestions mentioned above
(acpi=on, nmi_watchdog=1, noapic, other such options..)
I don't care if it means I end up with another 150 bugs. It's easier
to weed out duplicates that way than it is trying to make sense of the
situation with this bug.
With Fedora Core 1 only having a finite amount of shelflife left
before it gets handed off to the fedora-legacy folks, it'd be good to
get a better idea of what's going wrong, both for Red Hat folks still
working on this such as myself, and for the legacy folks that'll pick
up when we're done.
If you're seeing this problem in RHEL3 / FC2 or whatever else, mark it
as such in the new bugs. "similar" problems to this bug frequently
aren't as the comments above show.