Bug 469107

Summary: Fedora 8/9 with kernels >= 2.6.26.x fail to shut down properly with Disabled IRQ# message.
Product: [Fedora] Fedora
Reporter: Reilly Hall <sly.midnight>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: medium
Version: 9
CC: al.dunsmuir, ctubbsii, kernel-maint, leo, quintela, somlo
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-07-14 17:06:39 UTC
Attachments:
- lspci -v of my Asus A7N8X-Deluxe on 11-02-08 (flags: none)
- last few lines of syslog output as IRQ #18 is Disabled (nobody cared) 11-02-08 (flags: none)

Description Reilly Hall 2008-10-29 21:53:20 UTC
Description of problem:
First seen with Fedora 8 and now with Fedora 9.
With any kernel from 2.6.26.x onward (2.6.25.x and earlier are not affected), shutting down the system fails like clockwork: once the initscripts reach the end of the shutdown procedure, a spurious "Disabling IRQ #18" message appears and the system just hangs there indefinitely.


Version-Release number of selected component (if applicable):
Fedora 8/9 starting with all kernel versions 2.6.26.x and above.


How reproducible:
Always, as long as a 2.6.26.x kernel is in use; reverting to 2.6.25.x avoids the problem.


Steps to Reproduce:
1. Update to one of the 2.6.26.x kernels.
2. Attempt to shut down the system...
3. Watch as the system never fully shuts down and has to be hard rebooted.
  
Actual results:
System never completes the shutdown procedure.  This leaves filesystems that were not cleanly unmounted, and the system never powers off or reboots as expected.


Expected results:
System shuts down/reboots as commanded.

Additional info:
The only clue as to what happens is that spurious "Disabling IRQ #18" message that shows up during the initscripts shutdown procedure.  Running "lspci -v" on my system shows that IRQ #18 is being used by two devices: my Intel PRO/1000 GT Gigabit ethernet adapter and the onboard SiI 3112 SATA controller for my hard drive.  Once that message shows up, the hard drive appears to go dead (no more activity, despite the system still "barely" responding to keystrokes).  This leads me to believe the hard drive is somehow disabled or shut down inadvertently, which would explain why there are no entries in the logs and why the file systems are never cleanly unmounted.  This usually happens around the "unmounting filesystems" or "shutting off swap" portion of the shutdown initscripts.
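
For anyone wanting to check for the same kind of IRQ sharing, /proc/interrupts lists the devices attached to each interrupt line, and devices that share an IRQ appear together on one line. The following is only a rough sketch: the SAMPLE text is made up for illustration (it is not output from the machine in this report), the function name shared_irqs is hypothetical, and the parsing is a heuristic that assumes the device names on a shared line are separated by ", ".

```python
# Illustrative sample in the style of /proc/interrupts; NOT taken from this report.
SAMPLE = """\
           CPU0
  0:     123456   IO-APIC-edge      timer
 18:      98765   IO-APIC-fasteoi   sata_sil, eth0
NMI:          0   Non-maskable interrupts
"""


def shared_irqs(interrupts_text):
    """Return {irq_number: [device, ...]} for IRQs listed with 2+ devices."""
    shared = {}
    for line in interrupts_text.splitlines():
        head, sep, rest = line.partition(":")
        # Skip the CPU header and non-numeric rows such as NMI.
        if not sep or not head.strip().isdigit():
            continue
        parts = [p.strip() for p in rest.split(", ")]
        if len(parts) < 2 or not parts[0]:
            continue
        # The first piece still carries the counters and controller name;
        # the device name is assumed to be its last whitespace-separated token.
        first_device = parts[0].split()[-1]
        shared[int(head)] = [first_device] + parts[1:]
    return shared


if __name__ == "__main__":
    # On a live system one could pass open("/proc/interrupts").read() instead.
    for irq, devices in shared_irqs(SAMPLE).items():
        print(irq, devices)
```

A shared line is not a fault in itself; it only means that when the kernel disables that IRQ, every device on it stops receiving interrupts.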

Comment 1 Reilly Hall 2008-10-29 23:54:36 UTC
Decided to do some additional testing and verified it has most definitely something to do with the Intel PRO/1000 GT Gigabit ethernet adapter I have in a PCI slot of my Asus A7N8X-Deluxe.  It shares the IRQ with my onboard SATA controller which would explain why the system appears to hang.  It crashes the system when a #service network stop is issued or if changing runlevels (say issuing an #init 1 command).  The system boots up just fine and appears to run indefinitely, network access also appears to work flawlessly.  But again, unloading/reloading the network service/module seems to crash the system (most likely due to it sharing the IRQ with my SATA controller).  If I don't load the network services (and presumably the e1000 module) at boot time, I CAN successfully shut the system down.

Any ideas?  It just occurred to me to include a copy of the output of #lspci -v.  I will try to include that as soon as possible.  In the mean time I'm going to see if moving the Gigabit card to another PCI slot changes which IRQ it uses so it doesn't share one with the SATA controller as no setting in the BIOS would appear to fix that.

Comment 2 Chuck Ebbert 2008-11-02 00:31:19 UTC
The kernel option "noirqdebug" will stop the interrupt from being disabled. You can add it to the end of the kernel line in /etc/grub.conf.
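
As a rough illustration of that edit, the sketch below appends an option to every "kernel" line of a GRUB legacy style configuration. It works on a string only; the SAMPLE_GRUB_CONF stanza, its file names, and the helper name add_kernel_option are all made up for the example and are not taken from any machine in this report.

```python
# Hypothetical grub.conf stanza; titles, paths and root device are placeholders.
SAMPLE_GRUB_CONF = """\
default=0
timeout=5
title Fedora (example)
\troot (hd0,0)
\tkernel /vmlinuz-example ro root=/dev/example rhgb quiet
\tinitrd /initrd-example.img
"""


def add_kernel_option(grub_conf_text, option="noirqdebug"):
    """Append `option` to each 'kernel' line that does not already have it."""
    out = []
    for line in grub_conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("kernel ") and option not in stripped.split():
            line = line.rstrip() + " " + option
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(add_kernel_option(SAMPLE_GRUB_CONF), end="")
```

The function returns new text rather than writing to /etc/grub.conf, so the result can be reviewed before anything on disk is changed; the option only takes effect on the next boot.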

Comment 3 Reilly Hall 2008-11-02 21:48:29 UTC
Thanks, I'm gonna go ahead and try that option tonight.  I tried one suggested by the errors that showed up in the syslog one time I was able to manage to get it displayed before the machine inevitably crashed (though it was never committed to disk for the obvious reasons)...it mentioned to use something like "irqpoll" or something like that.  Needless to say it didn't help.  I'll post back as quickly as possible the results.

Comment 4 Reilly Hall 2008-11-02 23:20:48 UTC
Oh my GOD, you are my savior!  That did the trick.  Been putting up with this for like 2+ months now, always hoping the next kernel update would solve that issue.  Eventually some update to the X.org drivers made booting from the last safe 2.6.25.x kernel I have installed not feasible (it would cause X to crash on load apparently, must be some special new requirement for the Radeons).  I will still post my "lspci -v" and, if I can manage it, a copy of what gets output to the syslog on a normal kernel oops prior to the system fully crashing after prolonged disconnect from the hard drive (maybe I can have a mounted USB flash drive ready to go and/or just pipe the output from syslog directly to a file on said USB flash drive).  We'll see...again thanks SOOO much!

Comment 5 Reilly Hall 2008-11-02 23:24:58 UTC
Created attachment 322248 [details]
lspci -v of my Asus A7N8X-Deluxe on 11-02-08

Here's that "lspci -v" I promised in case it's of interest to anyone.  Next up is the output of the syslog during a normal crash; that's gonna be tricky though, since it doesn't actually get saved to the hard drive.

Comment 6 Reilly Hall 2008-11-02 23:57:40 UTC
Created attachment 322252 [details]
last few lines of syslog output as IRQ #18 is Disabled (nobody cared) 11-02-08

I guess that trick of using a mounted and ready to go USB drive with the syslog contents piped to a file on that drive worked...here are the last few lines from the syslog as I issued a "#service NetworkManager stop" to induce the error before the system totally crashed from not being able to access the hard drive.  Hope this helps, it just looks like gibberish to me :S

Comment 7 Gabriel Somlo 2008-12-03 16:04:44 UTC
Just to add another data point, this also happens on F10 (2.6.27.5-117.fc10.i686)
when the machine is being shut down (or when NetworkManager is stopped and shuts down eth0). Also using the e1000 driver:

01:0c.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02)
        Subsystem: Dell Optiplex GX270
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 18
        Memory at feae0000 (32-bit, non-prefetchable) [size=128K]
        I/O ports at df40 [size=64]
        Capabilities: [dc] Power Management version 2
        Capabilities: [e4] PCI-X non-bridge device
        Capabilities: [f0] Message Signalled Interrupts: Mask- 64bit+ Count=1/1 Enable-
        Kernel driver in use: e1000
        Kernel modules: e1000

Using "noirqdebug" does make the problem go away.
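
For comparing reports like the ones in this bug, the IRQ and driver can be pulled out of "lspci -v" output with a few lines of code. This is a minimal sketch only: SAMPLE_LSPCI is a shortened copy of the listing above, the function name parse_lspci_v is hypothetical, and the code assumes devices are separated by blank lines with the slot address as the first word of each block.

```python
import re

# Shortened copy of the lspci -v listing quoted in this comment.
SAMPLE_LSPCI = """\
01:0c.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02)
        Subsystem: Dell Optiplex GX270
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 18
        Kernel driver in use: e1000
        Kernel modules: e1000
"""


def parse_lspci_v(text):
    """Return one dict per device with its slot, IRQ (or None) and driver (or None)."""
    devices = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if not lines:
            continue
        slot, _, description = lines[0].partition(" ")
        irq = re.search(r"\bIRQ (\d+)", block)
        driver = re.search(r"Kernel driver in use: (\S+)", block)
        devices.append({
            "slot": slot,
            "description": description,
            "irq": int(irq.group(1)) if irq else None,
            "driver": driver.group(1) if driver else None,
        })
    return devices


if __name__ == "__main__":
    for dev in parse_lspci_v(SAMPLE_LSPCI):
        print(dev["slot"], dev["irq"], dev["driver"])
```

Grouping the resulting records by their "irq" value would show which devices share a line with the e1000 adapter.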

Comment 8 Reilly Hall 2008-12-06 16:00:02 UTC
Thanks Gabriel.  While that doesn't make me too enthusiastic to upgrade to F10 on my desktop (just barely been testing it in a VM since I was away from home for a month), I guess I'll have to keep that in mind.  Many thanks.  I'll report what I find as well when I find some time to upgrade and do some testing.

Comment 9 Al Dunsmuir 2009-02-02 00:36:45 UTC
Same problem - also on Dell GX270

Comment 10 Reilly Hall 2009-02-02 19:49:13 UTC
Man, I'm slacking. I needed to test this on Fedora 10 on this machine and I still haven't upgraded from 9 yet (on this one machine out of all the ones I have, and it's only affecting just this one).  I will at least test out the latest F9 kernel without the "noirqdebug" option to see if it still happens and report back here.

Comment 11 Al Dunsmuir 2009-02-27 23:56:56 UTC
I still experience this problem with the latest F10 kernel
- 2.6.27.15-170.2.24.fc10.i686

Comment 12 Christopher Tubbs 2009-03-28 08:19:01 UTC
I have very similar problems on a Dell GX260 with the e1000 driver for the onboard Intel PRO/1000 Gigabit NIC. In my case, IRQ #18 is normally disabled upon ifdown (with few side effects, so I don't usually care), but lately it has been happening on ifup (VERY VERY VERY BAD, b/c I have no network!).

Sometimes I get a stack trace along with the disable message, and sometimes I get only the "Disabling IRQ #18" message. I've tried everything in the BIOS, but this problem only started happening recently for me, and there hasn't been a BIOS update for this board since 2005, so I think it's not the problem.

'sudo ethtool eth0' shows no link detected, unless I run the tool in the very brief moment during dhclient where I get an assigned IP address, in which case it shows there is a link. In fact, DHCP always successfully assigns an address, but the network goes down and IRQ 18 is disabled immediately afterwards.

The noirqdebug option does not fix the problem, and indeed reduces performance significantly. I don't get the disable message, but it also doesn't work.

Comment 13 Christopher Tubbs 2009-03-29 02:43:05 UTC
(In reply to comment #12)
> I have very similar problems...

In my case, the problem was fixed by resorting to NetworkManager (which loads after other startup services... perhaps that has an effect). The network service still causes this problem when the NetworkManager is uninstalled for me.

Comment 14 Christopher Tubbs 2009-03-29 02:43:49 UTC
(In reply to comment #13)
> (In reply to comment #12)
> > I have very similar problems...
> 
> In my case, the problem was fixed...

Oh, and the IRQ disable message still appears on shutdown, but at least I get network now.

Comment 15 Reilly Hall 2009-05-28 14:40:57 UTC
It's been a while since an update, but the original machine I was experiencing this problem with was temporarily decommissioned a few months ago when I bought parts to build a new primary machine.

As soon as I get a new hard drive in the original old machine with the symptoms, I will install the new Fedora 11 (should have a drive by the time it comes out) and test to see if the same conditions occur.

Comment 16 Bug Zapper 2009-06-10 03:07:00 UTC
This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '9'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 9's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 9 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 17 Bug Zapper 2009-07-14 17:06:39 UTC
Fedora 9 changed to end-of-life (EOL) status on 2009-07-10. Fedora 9 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.