Description of problem:
Booting both kernel 2.6.18-128.7.1 and 2.6.18-164 with kexec fails during initialization of the igb driver.

System Information:
* Supermicro X8DTT-IBX
* Intel(R) Xeon(R) CPU X5570 @ 2.93GHz

Version-Release number of selected component (if applicable):
2.6.18-128.7.1
2.6.18-164
(Changelog of 2.6.18-164.1.1 and .2.1 does not seem to contain relevant patches but I did not test them)

How reproducible:
Boot a 2.6.18-164 kernel. Try to restart the same kernel with kexec.

Steps to Reproduce:
1. Boot 2.6.18-164
2. kexec -l /boot/vmlinuz-2.6.18-164.el5 --initrd=/boot/initrd-2.6.18-164.el5.img --command-line="$(cat /proc/cmdline)"
3. reboot

Actual results:
"Bringing up interface eth0: igb device eth0 does not seem to be present, delaying initialization"

Expected results:
"Bringing up interface eth0: OK"

Additional info:
See http://kerneltrap.org/mailarchive/linux-netdev/2009/3/21/5212234 for discussion and a *potential* patch (So far I could not test this fix on my system).

This patch is now also part of the vanilla kernel tree:

commit 3fe7c4c9dca4fbbff92eb61a660690dad7029ec3
Author: Rafael J. Wysocki <rjw>
Date:   Tue Mar 31 21:23:50 2009 +0000

    net/igb: Fix kexec with igb (rev. 3)

    Impact: Fix

    Yinghai Lu found one system with 82575EB where, in the kernel that is
    kexeced, probe igb failed with -2, the reason being that the adapter
    could not be brought back from D3 by the kexec kernel, most probably
    due to quirky hardware (it looks like the same behavior happened on
    forcedeth).

    Prevent igb from putting the adapter into D3 during shutdown except
    when we going to power off the system. For this purpose, seperate
    igb_shutdown() from igb_suspend() and use the appropriate PCI PM
    callbacks in both of them.

    Signed-off-by: "Rafael J. Wysocki" <rjw>
    Reported-by: Yinghai Lu <yinghai>
    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher>
    Signed-off-by: David S. Miller <davem>
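For anyone who wants to repeat the reproduction steps on whichever kernel is currently running, they could be wrapped roughly as follows. This is only a sketch: it assumes the RHEL 5 file naming used in the report (/boot/vmlinuz-<release> and /boot/initrd-<release>.img), the helper name kexec_load_cmd and the RUN_KEXEC switch are made up for illustration, and by default it only prints the command instead of running it.

```shell
#!/bin/sh
# Sketch of the reproduction steps from the report (assumes RHEL 5 style
# file names under /boot). kexec_load_cmd is a hypothetical helper name,
# not part of kexec-tools.

# Print the kexec load command for a given kernel release and command line.
kexec_load_cmd() {
    release=$1
    cmdline=$2
    printf 'kexec -l /boot/vmlinuz-%s --initrd=/boot/initrd-%s.img --command-line="%s"\n' \
        "$release" "$release" "$cmdline"
}

# RUN_KEXEC is an illustrative switch: without RUN_KEXEC=1 nothing is
# loaded and the machine is not rebooted.
if [ "${RUN_KEXEC:-0}" = "1" ]; then
    # Steps 2 and 3 from the report, applied to the running kernel.
    kexec -l "/boot/vmlinuz-$(uname -r)" \
        --initrd="/boot/initrd-$(uname -r).img" \
        --command-line="$(cat /proc/cmdline)" && reboot
else
    # Dry run: show what would be loaded.
    kexec_load_cmd "$(uname -r)" "$(cat /proc/cmdline 2>/dev/null)"
fi
```

Running it as root with RUN_KEXEC=1 loads the kernel and reboots; on an affected system the "does not seem to be present" message should then show up while eth0 is brought up.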
Having also read the thread "igb: fix kexec with igb" at http://lkml.org/lkml/2009/3/8/47, I don't think we can easily grab 3fe7c4c9dca4fbbff92eb61a660690dad7029ec3, as it relies on kernel infrastructure we don't have in RHEL5, namely 404cc2d8ce41ed4031958fba8e633767e8a2e028. However, we could look at an earlier version of this patch that doesn't need the above: http://lkml.org/lkml/2009/3/11/442

Andy, what's your opinion on this?
I agree we cannot take

commit 3fe7c4c9dca4fbbff92eb61a660690dad7029ec3
Author: Rafael J. Wysocki <rjw>
Date:   Tue Mar 31 21:23:50 2009 +0000

    net/igb: Fix kexec with igb (rev. 3)

exactly as it is, but you should look at the intent of the patch and see if similar functionality can be added to RHEL5 to meet the needs. I think you might be able to do that, but let me know if you have problems.
As I tried to explain in comment #1, I'd favour taking http://lkml.org/lkml/2009/3/11/442, which seems to be a less aggressive approach. Do you agree, Andy?
(In reply to comment #3)
> As I tried to explain in comment #1 I'd favour to take
> http://lkml.org/lkml/2009/3/11/442. Which seems to be a less aggressive
> approach, do you agree Andy?

This patch is basically the same functionality as 3fe7c4c9dca4fbbff92eb61a660690dad7029ec3 without the upstream pci changes added during 2.6.26 and 2.6.27. Using http://lkml.org/lkml/2009/3/11/442 directly or using a slightly modified version of 3fe7c4c9dca4fbbff92eb61a660690dad7029ec3 seems fine as they will be quite similar.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
in kernel-2.6.18-176.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However feel free to provide a comment indicating that this fix has been verified.
I've just tried the new version 2.6.18-17*7* and can confirm that it fixes the bug. The kexec reboot now works fine, including the initialization of eth0 (igb driver). Good job, thank you!
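For others verifying the test kernel, a quick check after the kexec reboot could look like the sketch below. It assumes the interface is eth0 and that sysfs exposes the bound driver through /sys/class/net/<iface>/device/driver; iface_driver is a hypothetical helper name, and its optional second argument (an alternate sysfs root) is there only so the function can be tried against a fake directory tree.

```shell
#!/bin/sh
# Sketch: report which driver, if any, is bound to a network interface.
# iface_driver is a hypothetical helper; the sysfs layout is an assumption.
iface_driver() {
    iface=$1
    sysfs=${2:-/sys}
    link="$sysfs/class/net/$iface/device/driver"
    # No driver link means the interface is missing or not bound to a driver.
    [ -e "$link" ] || return 1
    # The link target ends in the driver name, e.g. .../drivers/igb
    basename "$(readlink "$link")"
}

if drv=$(iface_driver eth0); then
    echo "eth0 present, driver: $drv"
else
    echo "eth0 missing or not bound to a driver"
fi
```

If the fix works as intended, the expectation is that this reports igb for eth0 both after a cold boot and after a kexec reboot.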
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html