Bug 676875
Summary: | ixgbe: update to 3.0.12-k2 causing a panic on boot | | |
---|---|---|---
Product: | Red Hat Enterprise Linux 6 | Reporter: | Jason Baron <jbaron>
Component: | kernel | Assignee: | Andy Gospodarek <agospoda>
Status: | CLOSED ERRATA | QA Contact: | Weibing Zhang <atzhang>
Severity: | unspecified | Priority: | unspecified
Version: | 6.1 | CC: | atzhang, dtian, hjia, jbaron, jburke, knoel, kzhang, mbelangia, peterm, ypei
Target Milestone: | rc | Target Release: | ---
Hardware: | Unspecified | OS: | Unspecified
Fixed In Version: | kernel-2.6.32-118.el6 | Doc Type: | Bug Fix
Last Closed: | 2011-05-19 12:42:28 UTC | Bug Blocks: | 676037
Got it. Posting patch upstream now...

Posted: http://marc.info/?l=linux-netdev&m=129746077110725&w=2

We will see what Intel thinks about it.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

*** Bug 678459 has been marked as a duplicate of this bug. ***

Patch(es) available on kernel-2.6.32-118.el6

While testing kernel-2.6.32-119.el6 and kernel-2.6.32-118.el6 on ibm-x3655-04.ovirt.rhts.eng.bos.redhat.com, with eth0 using the ixgbe driver, the kernel doesn't panic, but it prints the messages pasted below. Meanwhile, eth0 cannot obtain an IP address via DHCP. The kernel repeatedly prints the messages on the console.

Messages:
pcieport 0000:00:0a.0: AER: Multiple Corrected error received: id=2200
pcieport 0000:00:0a.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0050(Transmitter ID)
pcieport 0000:00:0a.0: device [1166:0140] error status/mask=00001080/00002000
pcieport 0000:00:0a.0: [ 7] Bad DLLP
pcieport 0000:00:0a.0: [12] Replay Timer Timeout
ixgbe 0000:22:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=2200(Transmitter ID)
ixgbe 0000:22:00.0: device [8086:10c7] error status/mask=00001040/00002000
ixgbe 0000:22:00.0: [ 6] Bad TLP
ixgbe 0000:22:00.0: [12] Replay Timer Timeout
ixgbe 0000:22:00.0: Error of this Agent(2200) is reported first
pcieport 0000:00:0a.0: AER: Corrected error received: id=2200
pcieport 0000:00:0a.0: AER: Multiple Corrected error received: id=2200
pcieport 0000:00:0a.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0050(Transmitter ID)
pcieport 0000:00:0a.0: device [1166:0140] error status/mask=00001080/00002000
pcieport 0000:00:0a.0: [ 7] Bad DLLP
pcieport 0000:00:0a.0: [12] Replay Timer Timeout
ixgbe 0000:22:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=2200(Transmitter ID)
ixgbe 0000:22:00.0: device [8086:10c7] error status/mask=00001040/00002000
ixgbe 0000:22:00.0: [ 6] Bad TLP
ixgbe 0000:22:00.0: [12] Replay Timer Timeout
ixgbe 0000:22:00.0: Error of this Agent(2200) is reported first
pcieport 0000:00:0a.0: AER: Multiple Corrected error received: id=2200
pcieport 0000:00:0a.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0050(Transmitter ID)
pcieport 0000:00:0a.0: device [1166:0140] error status/mask=00001000/00002000
pcieport 0000:00:0a.0: [12] Replay Timer Timeout
ixgbe 0000:22:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=2200(Receiver ID)
ixgbe 0000:22:00.0: device [8086:10c7] error status/mask=00000041/00002000
ixgbe 0000:22:00.0: [ 0] Receiver Error
ixgbe 0000:22:00.0: [ 6] Bad TLP
ixgbe 0000:22:00.0: Error of this Agent(2200) is reported first
pcieport 0000:00:0a.0: AER: Multiple Corrected error received: id=2200
pcieport 0000:00:0a.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0050(Transmitter ID)
pcieport 0000:00:0a.0: device [1166:0140] error status/mask=00001000/00002000
pcieport 0000:00:0a.0: [12] Replay Timer Timeout
ixgbe 0000:22:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=2200(Receiver ID)
ixgbe 0000:22:00.0: device [8086:10c7] error status/mask=00000041/00002000
ixgbe 0000:22:00.0: [ 0] Receiver Error (First)
ixgbe 0000:22:00.0: [ 6] Bad TLP
ixgbe 0000:22:00.0: Error of this Agent(2200) is reported first

(In reply to comment #10)
> While testing kernel-2.6.32-119.el6 and kernel-2.6.32-118.el6 on
> ibm-x3655-04.ovirt.rhts.eng.bos.redhat.com, with eth0 using the ixgbe driver,
> the kernel doesn't panic, but it prints the messages pasted below.
> Meanwhile, eth0 cannot obtain an IP address via DHCP. The kernel repeatedly
> prints the messages on the console.
>
> Messages:
> pcieport 0000:00:0a.0: AER: Multiple Corrected error received: id=2200
> pcieport 0000:00:0a.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0050(Transmitter ID)
> pcieport 0000:00:0a.0: device [1166:0140] error status/mask=00001080/00002000
> pcieport 0000:00:0a.0: [ 7] Bad DLLP
> pcieport 0000:00:0a.0: [12] Replay Timer Timeout
> ixgbe 0000:22:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=2200(Transmitter ID)
> ixgbe 0000:22:00.0: device [8086:10c7] error status/mask=00001040/00002000
> ixgbe 0000:22:00.0: [ 6] Bad TLP
> ixgbe 0000:22:00.0: [12] Replay Timer Timeout
> ixgbe 0000:22:00.0: Error of this Agent(2200) is reported first

Does this happen just after the ixgbe driver is loaded? Do you have dmesg output before these messages started to appear? I suspect this is not a new problem, but we did not see this until the ixgbe panic was fixed.

Created attachment 483133 [details]
console log
Created attachment 483134 [details]
dmesg with kernel -119
(In reply to comment #11)
> Does this happen just after the ixgbe driver is loaded? Do you have dmesg
> output before these messages started to appear? I suspect this is not a new
> problem, but we did not see this until the ixgbe panic was fixed.

Console logs and dmesg are attached. eth0 is set to ONBOOT and DHCP:

# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE="eth0"
HWADDR="00:1B:21:2C:83:B4"
NM_CONTROLLED="yes"
ONBOOT="yes"
BOOTPROTO="dhcp"

Booting with kernel-2.6.32-119.el6, here is the log from the console. The messages come up after trying to obtain an IP address via DHCP:

NET: Registered protocol family 10
lo: Disabled Privacy Extensions
Bringing up loopback interface: [ OK ]
Bringing up interface eth0: Determining IP information for eth0...ADDRCONF(NETDEV_UP): eth0: link is not ready
ixgbe 0000:22:00.0: eth0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
eth0: IPv6 duplicate address fe80::21b:21ff:fe2c:83b4 detected!
pcieport 0000:00:0a.0: AER: Multiple Corrected error received: id=2200
pcieport 0000:00:0a.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0050(Transmitter ID)
pcieport 0000:00:0a.0: device [1166:0140] error status/mask=00001080/00002000
pcieport 0000:00:0a.0: [ 7] Bad DLLP
pcieport 0000:00:0a.0: [12] Replay Timer Timeout
ixgbe 0000:22:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=2200(Transmitter ID)
ixgbe 0000:22:00.0: device [8086:10c7] error status/mask=000010c1/00002000
ixgbe 0000:22:00.0: [ 0] Receiver Error (First)
ixgbe 0000:22:00.0: [ 6] Bad TLP
ixgbe 0000:22:00.0: [ 7] Bad DLLP
ixgbe 0000:22:00.0: [12] Replay Timer Timeout
ixgbe 0000:22:00.0: Error of this Agent(2200) is reported first
pcieport 0000:00:0a.0: AER: Corrected error received: id=2200

Booting with kernel-2.6.32-120.el6, here is the log from the console. The messages come up after the ixgbe driver is loaded:

Welcome to Red Hat Enterprise Linux Server
Starting udev: udev: starting version 147
shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
piix4_smbus 0000:00:08.0: SMBus Host Controller at 0x440, revision 0
sr 0:0:0:0: Attached scsi generic sg0 type 5
sd 2:0:0:0: Attached scsi generic sg1 type 0
scsi 2:1:0:0: Attached scsi generic sg2 type 0
scsi 2:1:1:0: Attached scsi generic sg3 type 0
scsi 2:3:0:0: Attached scsi generic sg4 type 13
dca service started, version 1.12.1
ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver - version 3.0.12-k2
ixgbe: Copyright (c) 1999-2010 Intel Corporation.
ixgbe 0000:22:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
ixgbe 0000:22:00.0: Multiqueue Enabled: Rx Queue count = 8, Tx Queue count = 8
ixgbe 0000:22:00.0: (PCI Express:2.5Gb/s:Width x8) 00:1b:21:2c:83:b4
pcieport 0000:00:0a.0: AER: Multiple Corrected error received: id=2200
ixgbe 0000:22:00.0: MAC: 1, PHY: 4, PBA No: E18269-001
ixgbe 0000:22:00.0: Intel(R) 10 Gigabit Network Connection
pcieport 0000:00:0a.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0050(Transmitter ID)
pcieport 0000:00:0a.0: device [1166:0140] error status/mask=00001080/00002000
pcieport 0000:00:0a.0: [ 7] Bad DLLP
pcieport 0000:00:0a.0: [12] Replay Timer Timeout
ixgbe 0000:22:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=2200(Transmitter ID)
ixgbe 0000:22:00.0: device [8086:10c7] error status/mask=000010c1/00002000
ixgbe 0000:22:00.0: [ 0] Receiver Error (First)
ixgbe 0000:22:00.0: [ 6] Bad TLP
ixgbe 0000:22:00.0: [ 7] Bad DLLP
ixgbe 0000:22:00.0: [12] Replay Timer Timeout
ixgbe 0000:22:00.0: Error of this Agent(2200) is reported first
pcieport 0000:00:0a.0: AER: Corrected error received: id=2200
ses 2:3:0:0: Attached Enclosure device
EDAC MC: Ver: 2.1.0 Mar 7 2011
EDAC amd64_edac: Ver: 3.3.0 Mar 7 2011
EDAC amd64: ECC is enabled by BIOS.
EDAC amd64: ECC is enabled by BIOS.
EDAC MC0: Giving out device to 'amd64_edac' 'Family 10h': DEV 0000:00:18.2
EDAC MC1: Giving out device to 'amd64_edac' 'Family 10h': DEV 0000:00:19.2
EDAC PCI0: Giving out device to module 'amd64_edac' controller 'EDAC PCI controller': DEV '0000:00:18.2' (POLLED)

I've been testing with -122 on ibm-x3655-04.ovirt.rhts.eng.bos.redhat.com and can reproduce the problem. I'm not sure if the driver or the NIC is to blame for this issue, but I hope to narrow it down soon. If this is the only system that demonstrates this problem, I think we can set this bug to VERIFIED.

(In reply to comment #17)
> I've been testing with -122 on ibm-x3655-04.ovirt.rhts.eng.bos.redhat.com and
> can reproduce the problem. I'm not sure if the driver or the NIC is to blame
> for this issue, but I hope to narrow it down soon. If this is the only system
> that demonstrates this problem, I think we can set this bug to VERIFIED.

Confirmed with eng-ops: the NIC on ibm-x3655-04.ovirt.rhts.eng.bos.redhat.com was connected to a private 10Gb network which didn't have a DHCP server.
https://engineering.redhat.com/rt3/Ticket/Display.html?id=104321

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html
Created attachment 478279 [details]
boot panic message

Description of problem:
On lab machine cisco-b200m1-01.gsslab.rdu.redhat.com, I'm running into a panic on boot when networking is initializing.

Version-Release number of selected component (if applicable):
Reproduced on kernel -115 and -102.

How reproducible:
Boot box with kernel version >= -102.

Expected results:
No panic.

Additional info:
I bisected this to commit:
[netdrv] ixgbe: update to upstream version 3.0.12-k2
commit 780e4d8bafbe46a9118b0cb78ceb4e39c1af7d22

Here is lspci -v for the device:

06:00.0 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AF Dual Port Network Connection (rev 01)
	Subsystem: Cisco Systems Inc Device 004a
	Flags: bus master, fast devsel, latency 0, IRQ 32
	Memory at b19a0000 (32-bit, non-prefetchable) [size=128K]
	Memory at b1940000 (32-bit, non-prefetchable) [size=256K]
	I/O ports at 1020 [size=32]
	Memory at b19c4000 (32-bit, non-prefetchable) [size=16K]
	Expansion ROM at b2000000 [disabled] [size=256K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [60] MSI-X: Enable+ Count=18 Masked-
	Capabilities: [a0] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Device Serial Number 00-25-b5-ff-ff-08-17-60
	Kernel driver in use: ixgbe
	Kernel modules: ixgbe

06:00.1 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AF Dual Port Network Connection (rev 01)
	Subsystem: Cisco Systems Inc Device 004a
	Flags: bus master, fast devsel, latency 0, IRQ 42
	Memory at b1980000 (32-bit, non-prefetchable) [size=128K]
	Memory at b1900000 (32-bit, non-prefetchable) [size=256K]
	I/O ports at 1000 [size=32]
	Memory at b19c0000 (32-bit, non-prefetchable) [size=16K]
	Expansion ROM at b2040000 [disabled] [size=256K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [60] MSI-X: Enable+ Count=18 Masked-
	Capabilities: [a0] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Device Serial Number 00-25-b5-ff-ff-08-17-60
	Kernel driver in use: ixgbe
	Kernel modules: ixgbe