Bug 527424 - igb driver does not work with kexec
Summary: igb driver does not work with kexec
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.4
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Stefan Assmann
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks: 527955
 
Reported: 2009-10-06 11:20 UTC by Karsten Weiss
Modified: 2010-03-30 07:46 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-03-30 07:46:16 UTC
Target Upstream Version:
Embargoed:




Links
System: Red Hat Product Errata
ID: RHSA-2010:0178
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.5 kernel security and bug fix update
Last Updated: 2010-03-29 12:18:21 UTC

Description Karsten Weiss 2009-10-06 11:20:04 UTC
Description of problem:

Booting both kernel 2.6.18-128.7.1 and 2.6.18-164 with kexec fails
during initialization of the igb driver.

System Information:
* Supermicro X8DTT-IBX
* Intel(R) Xeon(R) CPU X5570 @ 2.93GHz

Version-Release number of selected component (if applicable):
2.6.18-128.7.1
2.6.18-164
(The changelogs of 2.6.18-164.1.1 and 2.6.18-164.2.1 do not seem to contain relevant patches, but I did not test those kernels.)

How reproducible:
Boot a 2.6.18-164 kernel. Try to restart the same kernel with kexec.

Steps to Reproduce:
1. Boot 2.6.18-164
2. kexec -l /boot/vmlinuz-2.6.18-164.el5 --initrd=/boot/initrd-2.6.18-164.el5.img --command-line="$(cat /proc/cmdline)"
3. reboot
  
Actual results:

"Bringing up interface eth0:
igb device eth0 does not seem to be present, delaying initialization"


Expected results:

"Bringing up interface eth0: OK"

Additional info:

See
http://kerneltrap.org/mailarchive/linux-netdev/2009/3/21/5212234
for discussion and a *potential* patch (So far I could not test this
fix on my system). This patch is now also part of the vanilla kernel
tree:

commit 3fe7c4c9dca4fbbff92eb61a660690dad7029ec3
Author: Rafael J. Wysocki <rjw>
Date:   Tue Mar 31 21:23:50 2009 +0000

    net/igb: Fix kexec with igb (rev. 3)

    Impact: Fix

    Yinghai Lu found one system with 82575EB where, in the kernel that is
    kexeced, probe igb failed with -2, the reason being that the adapter
    could not be brought back from D3 by the kexec kernel, most probably
    due to quirky hardware (it looks like the same behavior happened on
    forcedeth).

    Prevent igb from putting the adapter into D3 during shutdown except
    when we going to power off the system.  For this purpose, seperate
    igb_shutdown() from igb_suspend() and use the appropriate PCI PM
    callbacks in both of them.

    Signed-off-by: "Rafael J. Wysocki" <rjw>
    Reported-by: Yinghai Lu <yinghai>
    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher>
    Signed-off-by: David S. Miller <davem>
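
The idea in the commit message above can be modelled in a few lines of userspace C. This is only a sketch of the decision logic, not the driver code: every name here (fake_adapter, fake_quiesce, fake_suspend, fake_shutdown, the sys_state and pci_power enums) is a hypothetical stand-in, and the real patch uses the kernel's PCI PM helpers, which are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel state; NOT the real kernel definitions. */
enum sys_state { SYS_RESTART, SYS_POWER_OFF };      /* why shutdown is running  */
enum pci_power { PCI_D0_STATE, PCI_D3HOT_STATE };   /* device power state       */

struct fake_adapter {
    enum pci_power power;   /* current (modelled) PCI power state     */
    bool wol_configured;    /* user asked for Wake-on-LAN             */
    bool wake_enabled;      /* wake-up armed for the low-power state  */
};

/* Common path: stop the device and report whether wake-up is wanted.
 * It deliberately does NOT touch the power state. */
void fake_quiesce(struct fake_adapter *a, bool *enable_wake)
{
    assert(a != NULL && enable_wake != NULL);
    *enable_wake = a->wol_configured;
}

/* Suspend: the device always goes to D3hot. */
void fake_suspend(struct fake_adapter *a)
{
    bool wake;

    fake_quiesce(a, &wake);
    a->wake_enabled = wake;
    a->power = PCI_D3HOT_STATE;
}

/* Shutdown: enter D3hot only when the machine is really powering off.
 * On a restart (which includes a kexec reboot) the device stays in D0,
 * so the next kernel does not have to bring it back from D3. */
void fake_shutdown(struct fake_adapter *a, enum sys_state state)
{
    bool wake;

    fake_quiesce(a, &wake);
    if (state == SYS_POWER_OFF) {
        a->wake_enabled = wake;
        a->power = PCI_D3HOT_STATE;
    }
}
```

Calling fake_shutdown() with SYS_RESTART leaves the modelled adapter in PCI_D0_STATE, while SYS_POWER_OFF moves it to PCI_D3HOT_STATE; fake_suspend() always ends in PCI_D3HOT_STATE. Keeping the quiesce step separate is what lets both callers share it while making different power-state decisions.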

Comment 1 Stefan Assmann 2009-10-15 13:31:21 UTC
Having also read the thread "igb: fix kexec with igb" at
http://lkml.org/lkml/2009/3/8/47, I don't think we can easily take 3fe7c4c9dca4fbbff92eb61a660690dad7029ec3, because it relies on kernel infrastructure that RHEL5 does not have, namely 404cc2d8ce41ed4031958fba8e633767e8a2e028. However, we could look at an earlier version of this patch that does not need that infrastructure: http://lkml.org/lkml/2009/3/11/442

Andy, what's your opinion on this?

Comment 2 Andy Gospodarek 2009-10-15 15:40:23 UTC
I agree we cannot take

commit 3fe7c4c9dca4fbbff92eb61a660690dad7029ec3
Author: Rafael J. Wysocki <rjw>
Date:   Tue Mar 31 21:23:50 2009 +0000

    net/igb: Fix kexec with igb (rev. 3)

exactly as it is, but you should look at the intent of the patch and see if similar functionality can be added to rhel5 to meet the needs.  I think you might be able to do that, but let me know if you have problems.

Comment 3 Stefan Assmann 2009-10-15 15:47:39 UTC
As I tried to explain in comment #1 I'd favour to take http://lkml.org/lkml/2009/3/11/442. Which seems to be a less aggressive approach, do you agree Andy?

Comment 4 Andy Gospodarek 2009-10-15 17:14:58 UTC
(In reply to comment #3)
> As I tried to explain in comment #1 I'd favour to take
> http://lkml.org/lkml/2009/3/11/442. Which seems to be a less aggressive
> approach, do you agree Andy?  

This patch is basically the same functionality as 3fe7c4c9dca4fbbff92eb61a660690dad7029ec3 without the upstream pci changes added during 2.6.26 and 2.6.27.

Using http://lkml.org/lkml/2009/3/11/442 directly or using a slightly modified version of 3fe7c4c9dca4fbbff92eb61a660690dad7029ec3 seems fine as they will be quite similar.

Comment 7 RHEL Program Management 2009-11-25 23:10:10 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 9 Don Zickus 2009-12-02 21:03:19 UTC
in kernel-2.6.18-176.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 10 Don Zickus 2009-12-02 21:14:39 UTC
in kernel-2.6.18-176.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 12 Karsten Weiss 2009-12-09 09:08:18 UTC
I've just tried the new version 2.6.18-17*7* and can confirm that it fixes the bug. The kexec reboot now works fine, including the initialization of eth0 (igb driver). Good job, thank you!

Comment 15 errata-xmlrpc 2010-03-30 07:46:16 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html

