Description of problem: HW ..... TX200S5 (16-cores) LAN .... Intel 82574L (Hartwell)(8086:10d3) BIOS ... R1.03 iRMC ... 3.61A OS ..... RHEL4.8 GA Snapshot 1 (kernel-2.6.9-84.ELlargesmp) x86_64 For detail, please find the sosreport as attached. I am using largesmp kernel with TX200S5 (16-cores), and RHEL4.8 e1000e native driver for Hartwell. I tried netdump 12 times in total by '# echo c > /proc/sysrq-trigger' and hangup 7 times out of 12 tries. The symptom is very similar to igb deadlock issue IT#163914. How reproducible: sometimes (ca. 50%) Steps to Reproduce: trigger netdump using alt-sysrq-c on e1000e system Actual results: system hangs Expected results: netdump is successfully written Additional info: Customers driver experts told me that the e1000e driver hangup at disable_irq(). 4600 static void e1000_netpoll(struct net_device *netdev) 4601 { 4602 struct e1000_adapter *adapter = netdev_priv(netdev); 4603 4604 disable_irq(adapter->pdev->irq); <---- !!! never return from here !!! 4605 e1000_intr(adapter->pdev->irq, netdev, NULL); 4606 4607 enable_irq(adapter->pdev->irq); 4608 } This looks like bz#435886. The following patch has been suggested: --- linux-2.6.9-85.EL/drivers/net/e1000e/netdev.c.orig 2009-04-07 12:47:55.039860485 +0200 +++ linux-2.6.9-85.EL/drivers/net/e1000e/netdev.c 2009-04-07 12:48:57.181231271 +0200 @@ -4601,7 +4601,10 @@ static void e1000_netpoll(struct net_dev { struct e1000_adapter *adapter = netdev_priv(netdev); - disable_irq(adapter->pdev->irq); + if (likely(!netdump_mode)) + disable_irq(adapter->pdev->irq); + else + disable_irq_nosync(adapter->pdev->irq); e1000_intr(adapter->pdev->irq, netdev, NULL); enable_irq(adapter->pdev->irq); I'll attach another patch using the same approach used in bz#435886.
Created attachment 338583 [details] Patch fixing netpoll deadlock using the same approach used in bz#435886
Hello, Could you give a try on the attached patch and report back your results? thanks, Flavio
Copy & Paste of testing feedback --- <> --- Many thanks for the patch. Here is a feedback from my colleague. e1000e-netdump.diff .......... NG (deadlock still occurs, as we learned from igb deadlock) e1000e-netpoll-deadlock-fix.patch.txt ... NG (no deadlock, but handshaking never complete) My driver experts also thought the patch from Flavio Leitner in Comment #1 should work fine based on their experiences with igb driver deadlock case. With the patch, I can see also <handshaking> message as same as the driver without patch. Instead of the deadlock at disable_irq() in e1000_netpoll(), it seems that e1000_netpoll() is repeatedly called perhaps endless, as the result, the <handshaking> never complete.... Do you have any idea about this strange behaviour ? --- <> ---
This patch doesn't reduce the chances or deadlock, it just delays the chances for deadlock until after the vmcore has been dumped. Deadlock still occurs with the e1000e-netdump.diff patch because netdump_mode changes from 1 to 0 during the netdump process. After the dump is complete and the vmcore is copied, netdump_mode is set to 0 when the SHOW_STATE command comes down and for all the commands after it that show the tasks, semaphores, etc. As for the other patch, I'm not too surprised that it doesn't work correctly. When the handshaking never completes it's an indication that all of the devices quota has been exhausted and the device is never scheduled again. The igb driver disables interrupts on the hardware (not the OS) and then calls some of it's own internal routines to do the polling. The e1000e driver disables interrupts at the OS (not hardware) level and then calls the routine that handles interrupts (rather than other internal routines). If you switch and disable interrupts at the hardware level then call e1000_intr, e1000_intr will exit before calling netif_rx_schedule since there is a check to see if the interrupt actually occurred. With interrupts being disabled at the hardware level netif_rx_schedule will never get called. We need to find some other way to fix this. I'm not sure I like the idea of something like this: @@ -4601,7 +4601,7 @@ static void e1000_netpoll(struct net_device *netdev) { struct e1000_adapter *adapter = netdev_priv(netdev); - disable_irq(adapter->pdev->irq); + disable_irq_nosync(adapter->pdev->irq); e1000_intr(adapter->pdev->irq, netdev, NULL); enable_irq(adapter->pdev->irq); but it might be the only option.
Created attachment 338926 [details] Another suggestion Hi Andy, perhaps this one will do? Admittedly not elegant, but judging from your analysis it should work. If I'm understanding right, when netdump_mode goes from 1 to 0, all that remains to do is log some last messages and reboot.
Something like that could work, but I question whether we need to sync at all? What is the impact of making this a non-syncing interrupt disabling action?
Created attachment 339062 [details] Patch calling disable_irq_nosync instead of disable_irq Andy, Martin, let me apologize one of my previous report is not correct, that makes both of you so confused, and the followings are right result. e1000e-netdump.diff ..................... NG (deadlock at disable_irq()) e1000e-netpoll-deadlock-fix.patch.txt ... NG (handshaking never completes) e1000e-netpoll-nosync.patch.txt ......... NG (handshaking never completes) e1000e-netdump-2.diff.txt ............... NG (handshaking never completes)
Fujitsu requesting this be a fixed in 4.8. Requesting blocker status. On Tue, 2009-04-07 at 22:06 -0400, Kei Tokunaga wrote: > Hi Linda, Larry, > > I'm sending this to grab your attention on BZ494688. While > Fujitsu Siemens were doing test on RHEL4.8 Snapshot 1, they > found a bug in the e1000e driver. IT is 283053. A Red Hat > engineer, Flavio Leitner, posted a proposal patch in the bug, > and Fujitsu started testing on it. Fujitsu/Siemens would like > to get it included in 4.8. Also setting fix for bug 452287 as pre-req for this bug fix.
(In reply to comment #11) > Created an attachment (id=339062) [details] > Patch calling disable_irq_nosync instead of disable_irq > > Andy, Martin, let me apologize one of my previous report is not correct, > that makes both of you so confused, and the followings are right result. > > e1000e-netdump.diff ..................... NG (deadlock at disable_irq()) > e1000e-netpoll-deadlock-fix.patch.txt ... NG (handshaking never completes) > e1000e-netpoll-nosync.patch.txt ......... NG (handshaking never completes) > e1000e-netdump-2.diff.txt ............... NG (handshaking never completes) So there is currently *no* patch that works? I don't see how we can expect to have this working in time to make 4.8.
Created attachment 339340 [details] sosreport
Created attachment 339417 [details] Patch nosync with direct clean Here is a patch. Our driver experts believe the patch resolves both of the deadlock and the endless handshaking. Your feedback for the patch would be highly appreciated. With the patch, 90% of netdump is successfully completed, but I still have 10% netdump failure with a new different symptom with this patch, the system is rebooted with the following message before handshaking. poll_lock is locked, unable to take a netdump! Our netdump expert is now investigating the failure.
Fujitsu, unfortunately, it's definitely too late for this one to make it into 4.8. Once there is a patch available that resolve the issues described, discuss with you partner manager what options are available.
I've been looking over this, and so far, I agree that we should be using disable_irq_nosync, but I'm concerned as to why changing from the use of e1000_intr to the subrouintes of the napi method helps. We shouldn't need to do that. It implies that the interrupt routine is deciding there are no frames to schedule reception for erroneously. If you just change the e1000_netpoll routine to use disable_irq_nosync, and instrument e1000_clean, can you see it get called on the console of this system? Hmm, I wonder if this in e1000_clean: /* * IMS will not auto-mask if INT_ASSERTED is not set, and if it is * not set, then the adapter didn't send an interrupt */ if (!(icr & E1000_ICR_INT_ASSERTED)) return IRQ_NONE; Isn't preventing the poll_controllers 'simulated interrupt' from doing its job, although one would think if the hardware got a frame, this would still get asserted, even if the irq was disabled in the 8259/ioapic as we've done via disable_irq)
Created attachment 340072 [details] patch to force scheduling of napi during poll_controller Please give this patch a try. I like it better than the direct calling of the clean subroutines, since I'm not sure if any of the hardware access in e1000_clean is required to prevent the NIC from stalling in some way. That said, this will likely _not_ prevent the poll_lock issue you ran into. From what I can see that is the result of an unfortunate incident in which a cpu is halted while in the middle of a loop in net_rx_action. I'll need to address that with a separate patch.
Created attachment 340350 [details] follow on patch to correct poll_lock issue heres a follow on patch to my previous one. This patch should avoid rebooting when the poll_lock is held at the time of a netdump. I've looked it over, and in the event the poll_lock is held, there is very little we can do save for resetting the lock and hoping for the best. Please try this patch in conjunction with my last patch and let me know if all the issues are fixed up. Thanks!
Do we have a test kernel for partner to verify the fix? If not, could someone build this through brew on top of the latest 4.8 kernel. I can put it on my people.page if necessary. Partner will need this in a 4.8.z day0/early errata.
I'd like to look further at the subsequent deadlock. The fact that you're getting those last two messages indicates your seeing a new problem. Please send a tcpdump of that deadlock/infinite loop from the 16 core test with both the patches from comments 19 and 20 in place. That will tell us if we're dealing with a real deadlock or an infinite loop
No, they've not replied to my request in comment 24. I'd like to see the tcpdump from a a failed netdump please.
I'm looking at these logs, and by all rights, it looks like the dump is proceding normally, requests come in for subsequent memory pages, and netdump responds. Ceratinly the netdump handshake has completed on all of these, which stands in opposition to whats commented on in comment #31. Do you have corresponding tcpdumps with these? As it currently stands the logs you've attached show a working netdump session.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Committed in 89.4.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/
Some errors/confusions while adding the bz to errata tool. Returning bz to MODIFIED state so that it can be added to errata.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: Using netdump may have caused a kernel deadlock on some systems.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0263.html