From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030922

Description of problem:
I have RHEL3 installed on an IBM x305 machine with an Olympic NIC, configured as tr0. While the media is connected there is no problem running rmmod/insmod on the olympic module. If the media is disconnected while the NIC is up, the driver recognizes it and writes a message to the screen. Up to this point everything is OK. But if you then try to restart the network service in this state, it gets as far as bringing tr0 up and causes a kernel panic. I know there was a problem with the previous 4.0.1.EL kernels, but it is supposed to be fixed in the 9.EL kernel, which is currently installed on the machine.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Configure tr0 - make it up and working
2. Disconnect network media to tr0.
3. Issue "service network restart"

Actual Results:
KERNEL PANIC

Expected Results:
An error message for tr0, such as an error determining the network configuration, like the one shown when eth0 fails to reach a DHCP server.

Additional info:
Created attachment 102469 [details] patch to correct oops on lobe failure/ring removal in olympic tokenring cards
The problem turned out to be the attempted freeing of an irq from within the interrupt handler of that irq. I'm not sure why this driver frees the interrupt on fatal error at all, but since I didn't have a hardware data sheet/user's guide handy to explain it to me, I tried to be as non-disruptive as possible. This patch corrects the failure by setting up a tasklet to call free_irq after the handler has completed. Also, while I was in there, I noted that the adapter was not resetting properly on ifdown/ifup, so I added a call to olympic_init from olympic_open to correct that.
Created attachment 102490 [details] new patch to perform same function using keventd, rather than tasklet
This patch has some corrections in it. It adds a spinlock to the linked list, and uses keventd to free the irq rather than a tasklet, so that the free occurs in a process context.
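To illustrate the idea behind the keventd approach, here is a minimal user-space sketch, not the code from the attachment. It only models the ordering: the handler records a request, and a later worker (standing in for keventd) releases the irq outside the handler. Every name in it (`model_dev`, `model_interrupt`, `model_run_deferred`, `model_free_irq`) is hypothetical, and the spinlock-protected list from the patch is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; these names are not from the olympic driver. */
struct model_dev {
    bool irq_registered;  /* the modeled IRQ line is currently requested */
    bool in_interrupt;    /* true only while the modeled handler runs */
    bool free_pending;    /* handler asked for the IRQ to be released */
};

/* Stand-in for free_irq(): must never run from the handler itself. */
static void model_free_irq(struct model_dev *dev)
{
    assert(!dev->in_interrupt);
    dev->irq_registered = false;
}

/* Modeled interrupt handler: on a fatal error it only records a request. */
static void model_interrupt(struct model_dev *dev, bool fatal_error)
{
    dev->in_interrupt = true;
    if (fatal_error)
        dev->free_pending = true;
    dev->in_interrupt = false;
}

/* Modeled worker, run later outside the handler (the keventd role). */
static void model_run_deferred(struct model_dev *dev)
{
    if (!dev->free_pending)
        return;
    dev->free_pending = false;
    if (dev->irq_registered)
        model_free_irq(dev);
}
```

Calling `model_interrupt()` with a fatal error leaves the irq registered; only the later `model_run_deferred()` call releases it, which is the property the patch is after.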
Created attachment 102535 [details] new patch for olympic issue
This version of the patch replaces the deferred irq de-registration mechanism with direct disabling of the affected interrupts on the card. This is more in line with the original design of the driver, but much safer than calling free_irq from an interrupt handler. It also fixes a subsequent oops I hit, caused by an skbuff double free that happens when you try to ifdown an interface that has been closed by the adapter.
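As a rough picture of what disabling the interrupts on the card means, the sketch below models an interrupt mask register in user space. The bit layout and all names (`model_card`, `MODEL_INT_*`, `model_handle_fatal`) are assumptions made for illustration and do not correspond to the real Olympic registers or to the attached patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit layout; not the real Olympic register definitions. */
#define MODEL_INT_RX    (1u << 0)
#define MODEL_INT_TX    (1u << 1)
#define MODEL_INT_FATAL (1u << 2)

struct model_card {
    uint32_t int_mask;     /* a set bit means that source is enabled */
    uint32_t int_pending;  /* sources the card currently wants to raise */
};

/* Clear the enable bits for the given sources on the modeled card. */
static void model_mask_sources(struct model_card *card, uint32_t sources)
{
    assert(card != NULL);
    card->int_mask &= ~sources;
}

/* The line is asserted only for sources that are pending and enabled. */
static bool model_irq_asserted(const struct model_card *card)
{
    return (card->int_pending & card->int_mask) != 0;
}

/* Fatal-error path: silence the card instead of releasing the IRQ. */
static void model_handle_fatal(struct model_card *card)
{
    model_mask_sources(card, MODEL_INT_RX | MODEL_INT_TX | MODEL_INT_FATAL);
}
```

After `model_handle_fatal()` the modeled card can no longer assert its line, while the irq registration itself is left alone for the normal close path to release.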
Created attachment 102624 [details] more clean up on my last patch
Cleaning up a few details on my last patch.
Created attachment 102664 [details] 3rd pass at fixing this bug.
This patch solves the same problem by removing the free_irq calls from the interrupt path entirely, allowing olympic_close to do that work. All requisite memory leak/cleanup problems are addressed (as far as I could see), and the card re-init code is kept in place.
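The shape of this final approach can be sketched in user space as follows. This is an illustrative model under assumptions, not the committed patch: the fatal-error handler only marks the adapter as closed, and the close path releases the irq and the receive buffers in a way that stays safe if it runs twice. All names (`model_adapter`, `model_open`, `model_fatal_interrupt`, `model_close`) and the ring and buffer sizes are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_RING_SIZE 4    /* hypothetical ring length */
#define MODEL_BUF_BYTES 64   /* hypothetical buffer size */

/* Hypothetical model of the adapter state; not the driver's structures. */
struct model_adapter {
    bool irq_registered;
    bool closed_by_adapter;          /* set by the fatal-error handler */
    void *rx_buf[MODEL_RING_SIZE];
    int irq_free_count;              /* how often the IRQ was released */
};

/* Free each buffer once; NULLing the slot makes a repeat call harmless. */
static void model_release_buffers(struct model_adapter *a)
{
    for (size_t i = 0; i < MODEL_RING_SIZE; i++) {
        free(a->rx_buf[i]);
        a->rx_buf[i] = NULL;
    }
}

/* Modeled open: allocate the ring and register the IRQ. */
static int model_open(struct model_adapter *a)
{
    assert(a != NULL);
    for (size_t i = 0; i < MODEL_RING_SIZE; i++)
        a->rx_buf[i] = NULL;
    for (size_t i = 0; i < MODEL_RING_SIZE; i++) {
        a->rx_buf[i] = malloc(MODEL_BUF_BYTES);
        if (a->rx_buf[i] == NULL) {
            model_release_buffers(a);
            return -1;
        }
    }
    a->closed_by_adapter = false;
    a->irq_registered = true;
    return 0;
}

/* Modeled fatal interrupt: record the state, release nothing here. */
static void model_fatal_interrupt(struct model_adapter *a)
{
    a->closed_by_adapter = true;
}

/* Modeled close: the only place the IRQ and buffers are released. */
static void model_close(struct model_adapter *a)
{
    model_release_buffers(a);
    if (a->irq_registered) {
        a->irq_registered = false;
        a->irq_free_count++;
    }
}
```

In this model a second `model_close()` after the adapter has already shut itself down releases nothing twice, which mirrors the double-free and irq cleanup concerns described in the earlier comments.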
A fix for this problem has just been committed to the RHEL3 U4 patch pool this evening (in kernel version 2.4.21-20.3.EL).
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2004-550.html