Bug 358571

Summary: Lockups during install with Disabling IRQ #31 message
Product: [Fedora] Fedora Reporter: IBM Bug Proxy <bugproxy>
Component: kernel Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: 8   
Target Milestone: ---   
Target Release: ---   
Hardware: other   
OS: All   
Whiteboard:
Fixed In Version: 2.6.23.1-49 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-12-05 20:46:42 UTC Type: ---

Description IBM Bug Proxy 2007-10-30 16:45:19 UTC
Problem description:
Attempted to install F8 test 3 on a QS20 blade. The blade netboots and gets
partway into an NFS install, then locks up with the message
"Disabling IRQ #31" on the console.

Changing the install method to FTP allowed the install to complete. This looks
like a missing patch (detailed inline below) for the spidernet driver.

After the system had been up for a little while, the console again showed
"Disabling IRQ #31" and the system lost network connectivity.

ifconfig
eth0      Link encap:Ethernet  HWaddr 00:14:5E:49:03:D6
          inet6 addr: fe80::214:5eff:fe49:3d6/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3392000 errors:0 dropped:0 overruns:0 frame:0
          TX packets:847774 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3956081093 (3.6 GiB)  TX bytes:133118370 (126.9 MiB)
          Interrupt:31 Memory:20004000-20005000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:713092 errors:0 dropped:0 overruns:0 frame:0
          TX packets:713092 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:3084802191 (2.8 GiB)  TX bytes:3084802191 (2.8 GiB)


Attempts to restart network services result in the same message:
ifup eth0

Determining IP information for eth0...Disabling IRQ #31

Message from syslogd@qdc179 at Oct 17 11:22:56 ...
 kernel: Disabling IRQ #31
 failed.

Network driver is:
 ethtool -i eth0
driver: spidernet
version: 2.0 B
firmware-version: no information
bus-info: 0001:00:03.0

uname output
Linux qdc179.austin.ibm.com 2.6.23-0.214.rc8.git2.fc8 #1 SMP Fri Sep 28 17:14:38 EDT 2007 ppc64 ppc64 ppc64 GNU/Linux

system type:
QS20 Cell Blade


Additional log info:
From /var/log/messages during the restart
Oct 17 11:20:17 qdc179 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 4
Oct 17 11:20:17 qdc179 kernel: eth0: link is down trying to bring it up
Oct 17 11:20:21 qdc179 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 4
Oct 17 11:20:21 qdc179 kernel: eth0: link is down trying to bring it up
Oct 17 11:20:25 qdc179 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 9
Oct 17 11:20:25 qdc179 kernel: eth0: link is down trying to bring it up
Oct 17 11:20:34 qdc179 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 10
Oct 17 11:20:44 qdc179 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 14
Oct 17 11:20:58 qdc179 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 15
Oct 17 11:21:13 qdc179 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 2
Oct 17 11:21:15 qdc179 dhclient: No DHCPOFFERS received.

While the client reports that no offers were received, the DHCP server log shows:
Oct 17 16:23:49 qdc134 dhcpd: DHCPOFFER on 9.3.84.179 to 00:14:5e:49:03:d6 via eth0



[root@qdc179 ~]# lspci -v
0000:00:0a.0 IDE interface: Silicon Image, Inc. PCI0680 Ultra ATA-133 Host Controller (rev 02) (prog-if 85 [Master SecO PriO])
        Subsystem: Silicon Image, Inc. PCI0680 Ultra ATA-133 Host Controller
        Flags: bus master, medium devsel, latency 0, IRQ 51
        I/O ports at 01f0 [size=8]
        I/O ports at 03f0 [size=4]
        I/O ports at 01f8 [size=8]
        I/O ports at 0370 [size=4]
        I/O ports at 06f0 [size=16]
        Memory at 24070000000 (32-bit, non-prefetchable) [size=256]
        [virtual] Expansion ROM at 24000000000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 2

0001:00:03.0 Ethernet controller: Toshiba America Unknown device 01b3 (rev 02)
        Flags: bus master, fast devsel, latency 0, IRQ 31
        Memory at 24020004000 (32-bit, non-prefetchable) [size=4K]

0002:00:03.0 Ethernet controller: Toshiba America Unknown device 01b3 (rev 02)
        Flags: bus master, fast devsel, latency 0, IRQ 32
        Memory at 34020004000 (32-bit, non-prefetchable) [size=4K]


It looks like we are missing this patch:
-------------------------------
http://patchwork.ozlabs.org/cbe-oss-dev/patch?id=13049


We must not call netif_poll_enable after enabling interrupts,
because an interrupt might come in and set the __LINK_STATE_RX_SCHED
bit before we get to clear that bit again. If that happens,
the next call to the ->poll() function will oops.

Signed-off-by: Arnd Bergmann <arnd.bergmann.com>
Signed-off-by: Kou Ishizaki <kou.ishizaki.jp>
---

I refreshed Arnd-san's patch.

Patch

Index: linux-powerpc-git/drivers/net/spider_net.c
===================================================================
--- linux-powerpc-git.orig/drivers/net/spider_net.c	2007-08-21 16:58:44.000000000 +0900
+++ linux-powerpc-git/drivers/net/spider_net.c	2007-08-21 17:11:07.000000000 +0900
@@ -2030,6 +2030,7 @@  spider_net_open(struct net_device *netde
 	/* further enhancement: setup hw vlan, if needed */
 
 	result = -EBUSY;
+	netif_poll_enable(netdev);
 	if (request_irq(netdev->irq, spider_net_interrupt,
 			     IRQF_SHARED, netdev->name, netdev))
 		goto register_int_failed;
@@ -2038,13 +2039,13 @@  spider_net_open(struct net_device *netde
 
 	netif_start_queue(netdev);
 	netif_carrier_on(netdev);
-	netif_poll_enable(netdev);
 
 	spider_net_enable_interrupts(card);
 
 	return 0;
 
 register_int_failed:
+	netif_poll_disable(netdev);
 	spider_net_free_rx_chain_contents(card);
 alloc_skbs_failed:
 	spider_net_free_chain(card, &card->rx_chain);



---------------------------------
end of inline patch
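
The ordering problem from the patch description above can be modelled in user
space. The sketch below is only an illustration, not kernel code: struct
model_dev and every model_* function are hypothetical names invented here, the
rx_sched flag stands in for __LINK_STATE_RX_SCHED, and the model assumes the
flag starts out clear when open is called.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_dev {
    bool rx_sched;        /* stands in for __LINK_STATE_RX_SCHED */
    int  poll_list_refs;  /* times the device is queued for ->poll() */
};

/* Models netif_poll_enable(): unconditionally clears the scheduled flag. */
static void model_poll_enable(struct model_dev *d)
{
    d->rx_sched = false;
}

/* Models the RX interrupt path: set the flag and queue only if it was clear. */
static void model_rx_interrupt(struct model_dev *d)
{
    if (!d->rx_sched) {
        d->rx_sched = true;
        d->poll_list_refs++;
    }
}

/*
 * Models the end of ->poll(). Returns false when the device was queued but
 * the flag is no longer set, which is the inconsistent state the patch
 * description says makes ->poll() oops.
 */
static bool model_poll_complete(struct model_dev *d)
{
    if (!d->rx_sched)
        return false;
    d->rx_sched = false;
    d->poll_list_refs--;
    return true;
}

/* Old order: an interrupt arrives before the poll-enable call. */
static bool model_open_old_order(void)
{
    struct model_dev d = { .rx_sched = false, .poll_list_refs = 0 };

    model_rx_interrupt(&d);  /* sets the flag, queues the device */
    model_poll_enable(&d);   /* clears the flag the interrupt just set */
    return model_poll_complete(&d);
}

/* Patched order: poll-enable runs before any interrupt can arrive. */
static bool model_open_patched_order(void)
{
    struct model_dev d = { .rx_sched = false, .poll_list_refs = 0 };

    model_poll_enable(&d);
    model_rx_interrupt(&d);
    return model_poll_complete(&d);
}
```

In this model, model_open_old_order() reports the inconsistent state and
model_open_patched_order() does not, which matches the reasoning for moving
netif_poll_enable() ahead of request_irq() in the patch.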


current drivers/net/spider_net.c file in Fedora 8 - test 2:
......

        /* further enhancement: setup hw vlan, if needed */

        result = -EBUSY;
        if (request_irq(netdev->irq, spider_net_interrupt,
                             IRQF_SHARED, netdev->name, netdev))
                goto register_int_failed;

        spider_net_enable_card(card);

        netif_start_queue(netdev);
        netif_carrier_on(netdev);
        netif_poll_enable(netdev);

        spider_net_enable_interrupts(card);

        return 0;
...

Comment 1 Arnd Bergmann 2007-10-31 15:06:32 UTC
The mainline kernel in 2.6.24-rc1 now has a better fix for this, see commit 
7a627558214664f0e071b2652fc37e4d7d3dce32. If you intend to fix this for Fedora 
8, the smaller patch referenced above would be more appropriate, because it is 
less invasive.

Comment 2 Dave Jones 2007-10-31 21:32:13 UTC
A day too late now for F8 sadly :(


Comment 3 Chuck Ebbert 2007-11-06 23:19:36 UTC
(In reply to comment #1)
> The mainline kernel in 2.6.24-rc1 now has a better fix for this, see commit 
> 7a627558214664f0e071b2652fc37e4d7d3dce32. If you intend to fix this for Fedora 
> 8, the smaller patch referenced above would be more appropriate, because it is 
> less invasive.

That patch is already in.


Comment 4 Chuck Ebbert 2007-11-07 00:41:02 UTC
Fix from comment 0 went into kernel 2.6.23.1-47. It did not make Fedora 8 release.