Bug 494688

Summary: e1000e: sporadic hang in netdump
Product: Red Hat Enterprise Linux 4
Reporter: Flavio Leitner <fleitner>
Component: kernel
Assignee: Neil Horman <nhorman>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 4.8
CC: agospoda, cward, dhoward, jolsa, jskrabal, ktokunag, ltroan, mjenner, nhorman, takeshi.suzuki, tak, tao, vgoyal
Target Milestone: rc
Keywords: OtherQA, ZStream
Target Release: 4.9
Hardware: All
OS: Linux
Fixed In Version:
Doc Type: Bug Fix
Doc Text: Using netdump may have caused a kernel deadlock on some systems.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-02-16 10:21:48 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On: 452287
Bug Blocks: 504565, 505643
Attachments:
- Patch fixing netpoll deadlock using the same approach used in bz#435886
- Another suggestion
- Patch calling disable_irq_nosync instead of disable_irq
- Patch nosync with direct clean
- Patch to force scheduling of napi during poll_controller
- Follow-on patch to correct poll_lock issue

Description Flavio Leitner 2009-04-07 15:14:31 EDT
Description of problem:

HW ..... TX200S5 (16-cores)
LAN .... Intel 82574L (Hartwell)(8086:10d3)
BIOS ... R1.03
iRMC ... 3.61A
OS ..... RHEL4.8 GA Snapshot 1 (kernel-2.6.9-84.ELlargesmp) x86_64

For details, please see the attached sosreport.

I am using the largesmp kernel with the TX200S5 (16 cores) and the
RHEL4.8 native e1000e driver for Hartwell.
I tried netdump 12 times in total with '# echo c > /proc/sysrq-trigger',
and the system hung on 7 of the 12 tries.
The symptom is very similar to the igb deadlock issue IT#163914.

How reproducible:
sometimes (ca. 50%)

Steps to Reproduce:
trigger netdump using alt-sysrq-c on e1000e system

Actual results:
system hangs

Expected results:
netdump is successfully written

Additional info:

The customer's driver experts told me that the e1000e driver hangs at disable_irq().

4600  static void e1000_netpoll(struct net_device *netdev)
4601  {
4602      struct e1000_adapter *adapter = netdev_priv(netdev);
4603
4604      disable_irq(adapter->pdev->irq);    /* <---- never returns from here */
4605      e1000_intr(adapter->pdev->irq, netdev, NULL);
4606
4607      enable_irq(adapter->pdev->irq);
4608  }

This looks like bz#435886.

The following patch has been suggested:
--- linux-2.6.9-85.EL/drivers/net/e1000e/netdev.c.orig  2009-04-07 12:47:55.039860485 +0200
+++ linux-2.6.9-85.EL/drivers/net/e1000e/netdev.c       2009-04-07 12:48:57.181231271 +0200
@@ -4601,7 +4601,10 @@ static void e1000_netpoll(struct net_dev
        struct e1000_adapter *adapter = netdev_priv(netdev);

-       disable_irq(adapter->pdev->irq);
+       if (likely(!netdump_mode))
+               disable_irq(adapter->pdev->irq);
+       else
+               disable_irq_nosync(adapter->pdev->irq);
        e1000_intr(adapter->pdev->irq, netdev, NULL);


I'll attach another patch using the same approach used in bz#435886.
Comment 1 Flavio Leitner 2009-04-07 15:30:20 EDT
Created attachment 338583 [details]
Patch fixing netpoll deadlock using the same approach used in bz#435886
Comment 2 Flavio Leitner 2009-04-07 15:35:49 EDT

Could you try the attached patch and report back your results?
Comment 4 Flavio Leitner 2009-04-08 09:26:58 EDT
Copy & paste of testing feedback:
--- <> ---
Many thanks for the patch. Here is feedback from my colleague.

e1000e-netdump.diff .................... NG (deadlock still occurs, as we saw with the igb deadlock)
e1000e-netpoll-deadlock-fix.patch.txt .. NG (no deadlock, but handshaking never completes)

My driver experts also thought the patch from Flavio Leitner in comment #1
should work, based on their experience with the igb driver deadlock case.

With the patch, I can also see the <handshaking> message, the same as with the
unpatched driver. Instead of the deadlock at disable_irq() in e1000_netpoll(),
e1000_netpoll() seems to be called repeatedly, perhaps endlessly, and as a
result the <handshaking> never completes.

Do you have any idea about this strange behaviour?
--- <> ---
Comment 5 Andy Gospodarek 2009-04-08 13:59:09 EDT
This patch doesn't reduce the chances of deadlock; it just delays the deadlock until after the vmcore has been dumped.  Deadlock still occurs with the e1000e-netdump.diff patch because netdump_mode changes from 1 to 0 during the netdump process.  After the dump is complete and the vmcore is copied, netdump_mode is set to 0 when the SHOW_STATE command comes down, and for all the commands after it that show the tasks, semaphores, etc.

As for the other patch, I'm not too surprised that it doesn't work correctly.  When the handshaking never completes, it's an indication that all of the device's quota has been exhausted and the device is never scheduled again.

The igb driver disables interrupts on the hardware (not the OS) and then calls some of its own internal routines to do the polling.  The e1000e driver disables interrupts at the OS (not hardware) level and then calls the routine that handles interrupts (rather than other internal routines).

If you switch to disabling interrupts at the hardware level and then call e1000_intr, e1000_intr will exit before calling netif_rx_schedule, since there is a check to see whether the interrupt actually occurred.  With interrupts disabled at the hardware level, netif_rx_schedule will never get called.

We need to find some other way to fix this.  I'm not sure I like the idea of something like this:

@@ -4601,7 +4601,7 @@ static void e1000_netpoll(struct net_device *netdev)
        struct e1000_adapter *adapter = netdev_priv(netdev);

-       disable_irq(adapter->pdev->irq);
+       disable_irq_nosync(adapter->pdev->irq);
        e1000_intr(adapter->pdev->irq, netdev, NULL);


but it might be the only option.
Comment 9 Martin Wilck 2009-04-09 10:57:14 EDT
Created attachment 338926 [details]
Another suggestion

Hi Andy, perhaps this one will do? Admittedly it's not elegant, but judging from your analysis it should work. If I understand correctly, when netdump_mode goes from 1 to 0, all that remains to be done is to log some last messages and reboot.
Comment 10 Andy Gospodarek 2009-04-09 11:28:40 EDT
Something like that could work, but I question whether we need to sync at all.  What is the impact of making this a non-syncing interrupt-disabling action?
Comment 11 Takeshi Suzuki 2009-04-10 01:26:30 EDT
Created attachment 339062 [details]
Patch calling disable_irq_nosync instead of disable_irq

Andy, Martin, my apologies: one of my previous reports was not correct
and confused you both.  The following are the right results.

e1000e-netdump.diff ..................... NG (deadlock at disable_irq())
e1000e-netpoll-deadlock-fix.patch.txt ... NG (handshaking never completes)
e1000e-netpoll-nosync.patch.txt ......... NG (handshaking never completes)
e1000e-netdump-2.diff.txt ............... NG (handshaking never completes)
Comment 12 Larry Troan 2009-04-10 16:07:38 EDT
Fujitsu is requesting that this be fixed in 4.8. Requesting blocker status.
On Tue, 2009-04-07 at 22:06 -0400, Kei Tokunaga wrote:
> Hi Linda, Larry,
> I'm sending this to grab your attention on BZ494688.  While
> Fujitsu Siemens was testing RHEL4.8 Snapshot 1, they found
> a bug in the e1000e driver.  The IT is 283053.  A Red Hat
> engineer, Flavio Leitner, posted a proposed patch in the bug,
> and Fujitsu started testing it.  Fujitsu/Siemens would like
> to get it included in 4.8.
Also setting the fix for bug 452287 as a prerequisite for this bug fix.
Comment 13 Andy Gospodarek 2009-04-13 11:02:13 EDT
(In reply to comment #11)
> Created an attachment (id=339062) [details]
> Patch calling disable_irq_nosync instead of disable_irq
> Andy, Martin, my apologies: one of my previous reports was not correct
> and confused you both.  The following are the right results.
> e1000e-netdump.diff ..................... NG (deadlock at disable_irq())
> e1000e-netpoll-deadlock-fix.patch.txt ... NG (handshaking never completes)
> e1000e-netpoll-nosync.patch.txt ......... NG (handshaking never completes)
> e1000e-netdump-2.diff.txt ............... NG (handshaking never completes)  

So there is currently *no* patch that works?  I don't see how we can expect to have this working in time to make 4.8.
Comment 14 Flavio Leitner 2009-04-13 12:31:42 EDT
Created attachment 339340 [details]
Comment 15 Takeshi Suzuki 2009-04-14 01:11:49 EDT
Created attachment 339417 [details]
Patch nosync with direct clean

Here is a patch. Our driver experts believe the patch resolves
both the deadlock and the endless handshaking.
Your feedback on the patch would be highly appreciated.

With the patch, 90% of netdumps complete successfully, but I still
see a 10% netdump failure rate with a new, different symptom:
the system reboots with the following message before handshaking.

  poll_lock is locked, unable to take a netdump!

Our netdump expert is now investigating the failure.
Comment 16 Chris Ward 2009-04-14 04:26:18 EDT
Fujitsu, unfortunately, it's definitely too late for this one to make it into 4.8. Once there is a patch available that resolves the issues described, please discuss with your partner manager what options are available.
Comment 18 Neil Horman 2009-04-17 15:02:24 EDT
I've been looking over this, and so far I agree that we should be using disable_irq_nosync, but I'm concerned as to why changing from the use of e1000_intr to the subroutines of the napi method helps.  We shouldn't need to do that.  It implies that the interrupt routine is erroneously deciding there are no frames to schedule reception for.  If you just change the e1000_netpoll routine to use disable_irq_nosync and instrument e1000_clean, can you see it get called on the console of this system?

Hmm, I wonder if this check in e1000_intr:

        /*
         * IMS will not auto-mask if INT_ASSERTED is not set, and if it is
         * not set, then the adapter didn't send an interrupt
         */
        if (!(icr & E1000_ICR_INT_ASSERTED))
                return IRQ_NONE;

isn't preventing the poll_controller's 'simulated interrupt' from doing its job, although one would think that if the hardware got a frame, this would still get asserted, even if the irq was disabled in the 8259/ioapic as we've done via disable_irq().
Comment 19 Neil Horman 2009-04-17 15:07:38 EDT
Created attachment 340072 [details]
patch to force scheduling of napi during poll_controller

Please give this patch a try.  I like it better than calling the clean subroutines directly, since I'm not sure whether any of the hardware access in e1000_clean is required to prevent the NIC from stalling in some way.

That said, this will likely _not_ prevent the poll_lock issue you ran into.  From what I can see, that is the result of an unfortunate incident in which a CPU is halted while in the middle of a loop in net_rx_action.  I'll need to address that with a separate patch.
Comment 20 Neil Horman 2009-04-20 10:24:12 EDT
Created attachment 340350 [details]
follow on patch to correct poll_lock issue

Here's a follow-on patch to my previous one.  This patch should avoid rebooting when the poll_lock is held at the time of a netdump.  I've looked it over, and in the event the poll_lock is held, there is very little we can do save for resetting the lock and hoping for the best.  Please try this patch in conjunction with my last patch and let me know if all the issues are fixed up.  Thanks!
Comment 21 Larry Troan 2009-04-23 08:14:58 EDT
Do we have a test kernel for the partner to verify the fix? If not, could someone build this through brew on top of the latest 4.8 kernel? I can put it on my people page if necessary.

The partner will need this in a 4.8.z day0/early errata.
Comment 24 Neil Horman 2009-04-23 08:54:50 EDT
I'd like to look further at the subsequent deadlock.  The fact that you're getting those last two messages indicates you're seeing a new problem.  Please send a tcpdump of that deadlock/infinite loop from the 16-core test with both the patches from comments 19 and 20 in place.  That will tell us whether we're dealing with a real deadlock or an infinite loop.
Comment 30 Neil Horman 2009-04-24 09:30:24 EDT
No, they've not replied to my request in comment 24.  I'd like to see the tcpdump from a failed netdump, please.
Comment 41 Neil Horman 2009-05-22 15:44:54 EDT
I'm looking at these logs, and by all rights it looks like the dump is proceeding normally: requests come in for subsequent memory pages, and netdump responds.  Certainly the netdump handshake has completed on all of these, which stands in opposition to what's commented on in comment #31.  Do you have corresponding tcpdumps for these?  As it currently stands, the logs you've attached show a working netdump session.
Comment 49 RHEL Product and Program Management 2009-06-05 09:48:15 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update release.
Comment 52 Vivek Goyal 2009-06-15 17:38:34 EDT
Committed in 89.4.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
Comment 56 Vivek Goyal 2010-10-18 17:39:38 EDT
There were some errors/confusions while adding the bz to the errata tool. Returning the bz to MODIFIED state so that it can be added to the errata.
Comment 58 Douglas Silas 2011-01-30 18:31:48 EST
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    New Contents:
Using netdump may have caused a kernel deadlock on some systems.
Comment 59 errata-xmlrpc 2011-02-16 10:21:48 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.