Red Hat Bugzilla – Bug 469107
Fedora 8/9 >= kernels 2.6.26.x fail to shutdown properly with Disabled IRQ# message.
Last modified: 2009-07-14 13:06:39 EDT
Description of problem:
First seen with Fedora 8 and now with Fedora 9.
With any kernel from 2.6.26.x onward (2.6.25.x and earlier are not affected), shutting down the system fails like clockwork: once the initscripts get near the end of the shutdown procedure, a spurious "Disabling IRQ #18" message appears and the system hangs there indefinitely.
Version-Release number of selected component (if applicable):
Fedora 8/9 starting with all kernel versions 2.6.26.x and above.
How reproducible:
Always, as long as a 2.6.26.x kernel is in use; reverting to 2.6.25.x avoids the problem.
Steps to Reproduce:
1. Update to one of the 2.6.26.x kernels.
2. Attempt to shutdown system...
3. Watch as the system never fully shuts down and has to be hard rebooted.
Actual results:
System never completes the shutdown procedure. This leaves the filesystems uncleanly unmounted, and the system never powers off or reboots as expected.
Expected results:
System shuts down/reboots as commanded.
Additional info:
The only clue as to what happens is the spurious "Disabling IRQ #18" message that shows up during the initscripts shutdown procedure. Running "lspci -v" on my system shows that IRQ #18 is used by two devices: my Intel PRO/1000 GT Gigabit ethernet adapter and the onboard SiI 3112 SATA controller for my hard drive. Once that message shows up, the hard drive appears to go dead (no more activity, despite the system still "barely" responding to keystrokes). This leads me to believe the hard drive is somehow disabled or shut down inadvertently, which would explain why there are no entries in the logs and why the filesystems are never cleanly unmounted. This usually happens around the unmounting filesystems or shutting off swap portion of the shutdown initscripts.
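As a cross-check on the "lspci -v" observation, the handlers registered on a given IRQ can also be read from /proc/interrupts. The following is only a minimal sketch: the helper name devices_on_irq is hypothetical, and it assumes the 2.6-era layout in which each row is the IRQ number, one counter per CPU, a single controller column, and then a comma-separated list of handler names.

```python
def devices_on_irq(interrupts_text, irq):
    """Return the handler names registered on one IRQ line.

    Assumes rows of the form:
      " 18:   2000   IO-APIC-fasteoi   sata_sil, eth0"
    (IRQ number, per-CPU counters, one controller column, handlers).
    """
    prefix = f"{irq}:"
    for line in interrupts_text.splitlines():
        fields = line.split()
        if not fields or fields[0] != prefix:
            continue
        rest = fields[1:]
        # Skip the per-CPU interrupt counters.
        while rest and rest[0].isdigit():
            rest = rest[1:]
        # Skip the controller column, then split the handler list on commas.
        names = " ".join(rest[1:])
        return [name.strip() for name in names.split(",") if name.strip()]
    return []
```

To use it on a live system, pass in the contents of /proc/interrupts and the IRQ number from the message (18 here). More than one name in the result means the line is shared.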
Decided to do some additional testing and verified it most definitely has something to do with the Intel PRO/1000 GT Gigabit ethernet adapter I have in a PCI slot of my Asus A7N8X-Deluxe. It shares the IRQ with my onboard SATA controller, which would explain why the system appears to hang. The system crashes when a "#service network stop" is issued or when changing runlevels (say, by issuing an "#init 1" command). The system boots up just fine and appears to run indefinitely, and network access also appears to work flawlessly. But again, unloading/reloading the network service/module seems to crash the system (most likely because it shares the IRQ with my SATA controller). If I don't load the network services (and presumably the e1000 module) at boot time, I CAN successfully shut the system down.
Any ideas? It just occurred to me to include a copy of the output of "#lspci -v". I will try to include that as soon as possible. In the meantime I'm going to see if moving the Gigabit card to another PCI slot changes which IRQ it uses, so it doesn't share one with the SATA controller, as no setting in the BIOS appears to fix that.
The kernel option "noirqdebug" will stop the interrupt from being disabled. You can add it to the end of the kernel line in /etc/grub.conf .
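For anyone who would rather script that edit than make it by hand, here is a rough sketch of appending the option to every "kernel" line of a GRUB legacy config. The function name add_kernel_option is hypothetical, and the sketch only transforms text; reading and writing /etc/grub.conf (and keeping a backup first) is left to the caller.

```python
def add_kernel_option(grub_conf_text, option="noirqdebug"):
    """Append `option` to each 'kernel' line that does not already have it."""
    out = []
    for line in grub_conf_text.splitlines():
        words = line.split()
        # GRUB legacy boot entries put the kernel path and its arguments
        # on a line whose first word is "kernel".
        if words[:1] == ["kernel"] and option not in words:
            line = line.rstrip() + " " + option
        out.append(line)
    trailing_newline = "\n" if grub_conf_text.endswith("\n") else ""
    return "\n".join(out) + trailing_newline
```

The function is idempotent, so running it twice does not add the option a second time. After rebooting, /proc/cmdline should show the option if it took effect.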
Thanks, I'm gonna go ahead and try that option tonight. I tried one suggested by the errors that showed up in the syslog one time I was able to manage to get it displayed before the machine inevitably crashed (though it was never committed to disk for the obvious reasons)...it mentioned to use something like "irqpoll" or something like that. Needless to say it didn't help. I'll post back as quickly as possible the results.
Oh my GOD, you are my savior! That did the trick. I've been putting up with this for 2+ months now, always hoping the next kernel update would solve the issue. Eventually some update to the X.org drivers made booting from the last safe 2.6.25.x kernel I have installed no longer feasible (it would apparently cause X to crash on load; must be some special new requirement for the Radeons). I will still post my "lspci -v", and I'll try to grab a copy of what gets output to the syslog on a normal kernel oops before the system fully crashes after prolonged disconnect from the hard drive (maybe I can have a mounted USB flash drive ready to go and/or just pipe the output from syslog directly to a file on said USB flash drive). We'll see...again thanks SOOO much!
Created attachment 322248
lspci -v of my Asus A7N8X-Deluxe on 11-02-08
Here's that "lspci -v" I promised, in case it's of interest to anyone. Next up is the output of the syslog during a normal crash; that's gonna be tricky, though, since it doesn't actually get saved to the hard drive.
Created attachment 322252
last few lines of syslog output as IRQ #18 is Disabled (nobody cared) 11-02-08
I guess that trick of using a mounted and ready to go USB drive with the syslog contents piped to a file on that drive worked...here are the last few lines from the syslog as I issued a "#service NetworkManager stop" to induce the error before the system totally crashed from not being able to access the hard drive. Hope this helps, it just looks like gibberish to me :S
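For others who need to capture the same output, the USB-drive trick can be approximated with a small script that copies each new syslog line to a second file and forces it onto the device straight away. This is only a sketch: mirror_log and its parameters are hypothetical names, and the source path (typically /var/log/messages on Fedora) and the USB mount point are whatever applies on the machine in question.

```python
import os
import time


def mirror_log(src_path, dst_path, poll_seconds=0.5, max_polls=None,
               from_start=False):
    """Copy lines appended to src_path into dst_path, syncing each one.

    Runs until interrupted unless max_polls limits the number of idle polls.
    """
    polls = 0
    with open(src_path, "r", errors="replace") as src, \
            open(dst_path, "a") as dst:
        if not from_start:
            # Only copy what is written from now on.
            src.seek(0, os.SEEK_END)
        while max_polls is None or polls < max_polls:
            line = src.readline()
            if line:
                dst.write(line)
                dst.flush()
                # Push the data to the device now, since the machine may
                # hang before the kernel would write it out on its own.
                os.fsync(dst.fileno())
            else:
                polls += 1
                time.sleep(poll_seconds)
```

Start it as root before triggering the failure, with the destination on a drive that does not depend on the affected IRQ. Syncing after every line is slow, but whatever made it into the file before the hang should survive.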
Just to add another data point, this also happens on F10 (184.108.40.206-117.fc10.i686)
when the machine is being shut down (or when NetworkManager is stopped and shuts down eth0). Also using the e1000 driver:
01:0c.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02)
	Subsystem: Dell Optiplex GX270
	Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 18
	Memory at feae0000 (32-bit, non-prefetchable) [size=128K]
	I/O ports at df40 [size=64]
	Capabilities: [dc] Power Management version 2
	Capabilities: [e4] PCI-X non-bridge device
	Capabilities: [f0] Message Signalled Interrupts: Mask- 64bit+ Count=1/1
	Kernel driver in use: e1000
	Kernel modules: e1000
Using "noirqdebug" does make the problem go away.
Thanks Gabriel. While that doesn't make me too enthusiastic to upgrade to F10 on my desktop (just barely been testing it in a VM since I was away from home for a month), I guess I'll have to keep that in mind. Many thanks. I'll report what I find as well when I find some time to upgrade and do some testing.
Same problem - also on Dell GX270
Man, I'm slacking. I needed to test this on Fedora 10 on this machine and I still haven't upgraded from 9 yet (on this one machine out of all the ones I have, and it's only affecting this one). I will at least test the latest F9 kernel without the "noirqdebug" option to see if it still happens and report back here.
I still experience this problem with the latest F10 kernel
I have very similar problems on Dell GX260 with the e1000 driver for Intel PRO/1000 Gigabit onboard NIC. In my case, IRQ #18 is disabled upon ifdown normally (with little side effects, so I don't usually care), but lately it has been doing it on ifup (VERY VERY VERY BAD, b/c I have no network!).
Sometimes I get a stack trace along with the disable message, and sometimes I get only the "Disabling IRQ #18" message. I've tried everything in the BIOS, but this problem only started happening recently for me, and there hasn't been a BIOS update for this board since 2005, so I think it's not the problem.
'sudo ethtool eth0' shows no link detected, unless I run the tool in the very brief moment during dhclient where I get an assigned IP address, in which case it shows there is a link. In fact, DHCP always successfully assigns an address, but the network goes down and IRQ 18 is disabled immediately afterwards.
The noirqdebug option does not fix the problem, and indeed reduces performance significantly. I don't get the disable message, but it also doesn't work.
(In reply to comment #12)
> I have very similar problems...
In my case, the problem was fixed by resorting to NetworkManager (which loads after other startup services... perhaps that has an effect). The network service still causes this problem when the NetworkManager is uninstalled for me.
(In reply to comment #13)
> (In reply to comment #12)
> > I have very similar problems...
> In my case, the problem was fixed...
Oh, and the IRQ disable message still appears on shutdown, but at least I get network now.
It's been a while since my last update, but the original machine I was experiencing this problem with was temporarily decommissioned a few months ago when I bought parts to build a new primary machine.
As soon as I get a new hard drive in the original old machine with the symptoms, I will install the new Fedora 11 (should have a drive by the time it comes out) and test to see if the same conditions occur.
This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '9'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 9's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 9 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
The process we are following is described here:
Fedora 9 changed to end-of-life (EOL) status on 2009-07-10. Fedora 9 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.
Thank you for reporting this bug and we are sorry it could not be fixed.