Bug 676875 - ixgbe: update to 3.0.12-k2 causing a panic on boot
Summary: ixgbe: update to 3.0.12-k2 causing a panic on boot
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Andy Gospodarek
QA Contact: Weibing Zhang
URL:
Whiteboard:
Duplicates: 678459
Depends On:
Blocks: 6.1KnownIssues
Reported: 2011-02-11 16:46 UTC by Jason Baron
Modified: 2018-11-14 14:33 UTC (History)

Fixed In Version: kernel-2.6.32-118.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-05-19 12:42:28 UTC
Target Upstream Version:
Embargoed:


Attachments
boot panic message (3.83 KB, application/text), 2011-02-11 16:46 UTC, Jason Baron
console log (715.38 KB, application/octet-stream), 2011-03-09 09:33 UTC, Weibing Zhang
dmesg with kernel -119 (52.60 KB, application/octet-stream), 2011-03-09 09:34 UTC, Weibing Zhang


Links
System: Red Hat Product Errata
ID: RHSA-2011:0542
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 6.1 kernel security, bug fix and enhancement update
Last Updated: 2011-05-19 11:58:07 UTC

Description Jason Baron 2011-02-11 16:46:03 UTC
Created attachment 478279
boot panic message

Description of problem:

On lab machine cisco-b200m1-01.gsslab.rdu.redhat.com, I'm running into a panic on boot when networking is initializing.

Version-Release number of selected component (if applicable):

reproduced on kernel -115 and -102.

How reproducible:

boot box with kernel version >= -102


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:

no panic

Additional info:

I bisected this to commit:

[netdrv] ixgbe: update to upstream version 3.0.12-k2
commit 780e4d8bafbe46a9118b0cb78ceb4e39c1af7d22

here is lspci -v for the device:

06:00.0 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AF Dual Port Network Connection (rev 01)
	Subsystem: Cisco Systems Inc Device 004a
	Flags: bus master, fast devsel, latency 0, IRQ 32
	Memory at b19a0000 (32-bit, non-prefetchable) [size=128K]
	Memory at b1940000 (32-bit, non-prefetchable) [size=256K]
	I/O ports at 1020 [size=32]
	Memory at b19c4000 (32-bit, non-prefetchable) [size=16K]
	Expansion ROM at b2000000 [disabled] [size=256K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [60] MSI-X: Enable+ Count=18 Masked-
	Capabilities: [a0] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Device Serial Number 00-25-b5-ff-ff-08-17-60
	Kernel driver in use: ixgbe
	Kernel modules: ixgbe

06:00.1 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AF Dual Port Network Connection (rev 01)
	Subsystem: Cisco Systems Inc Device 004a
	Flags: bus master, fast devsel, latency 0, IRQ 42
	Memory at b1980000 (32-bit, non-prefetchable) [size=128K]
	Memory at b1900000 (32-bit, non-prefetchable) [size=256K]
	I/O ports at 1000 [size=32]
	Memory at b19c0000 (32-bit, non-prefetchable) [size=16K]
	Expansion ROM at b2040000 [disabled] [size=256K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [60] MSI-X: Enable+ Count=18 Masked-
	Capabilities: [a0] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Device Serial Number 00-25-b5-ff-ff-08-17-60
	Kernel driver in use: ixgbe
	Kernel modules: ixgbe

Comment 2 Andy Gospodarek 2011-02-11 21:19:29 UTC
Got it.  Posting patch upstream now....

Comment 3 Andy Gospodarek 2011-02-11 22:04:01 UTC
Posted:

http://marc.info/?l=linux-netdev&m=129746077110725&w=2

We will see what Intel thinks about it.

Comment 4 RHEL Program Management 2011-02-15 18:00:01 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 6 Andy Gospodarek 2011-02-18 21:53:46 UTC
*** Bug 678459 has been marked as a duplicate of this bug. ***

Comment 7 Aristeu Rozanski 2011-02-23 18:36:04 UTC
Patch(es) available on kernel-2.6.32-118.el6

Comment 10 Weibing Zhang 2011-03-08 08:37:06 UTC
While testing kernel-2.6.32-119.el6 and kernel-2.6.32-118.el6 on ibm-x3655-04.ovirt.rhts.eng.bos.redhat.com with eth0 using the ixgbe driver, the kernel does not panic, but it prints the messages pasted below. Meanwhile, eth0 cannot obtain an IP address via DHCP. The kernel keeps printing these messages on the console.



Messages:
pcieport 0000:00:0a.0: AER: Multiple Corrected error received: id=2200
pcieport 0000:00:0a.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0050(Transmitter ID)
pcieport 0000:00:0a.0:   device [1166:0140] error status/mask=00001080/00002000
pcieport 0000:00:0a.0:    [ 7] Bad DLLP              
pcieport 0000:00:0a.0:    [12] Replay Timer Timeout  
ixgbe 0000:22:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=2200(Transmitter ID)
ixgbe 0000:22:00.0:   device [8086:10c7] error status/mask=00001040/00002000
ixgbe 0000:22:00.0:    [ 6] Bad TLP               
ixgbe 0000:22:00.0:    [12] Replay Timer Timeout  
ixgbe 0000:22:00.0:   Error of this Agent(2200) is reported first
pcieport 0000:00:0a.0: AER: Corrected error received: id=2200
pcieport 0000:00:0a.0: AER: Multiple Corrected error received: id=2200
pcieport 0000:00:0a.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0050(Transmitter ID)
pcieport 0000:00:0a.0:   device [1166:0140] error status/mask=00001080/00002000
pcieport 0000:00:0a.0:    [ 7] Bad DLLP              
pcieport 0000:00:0a.0:    [12] Replay Timer Timeout  
ixgbe 0000:22:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=2200(Transmitter ID)
ixgbe 0000:22:00.0:   device [8086:10c7] error status/mask=00001040/00002000
ixgbe 0000:22:00.0:    [ 6] Bad TLP               
ixgbe 0000:22:00.0:    [12] Replay Timer Timeout  
ixgbe 0000:22:00.0:   Error of this Agent(2200) is reported first
pcieport 0000:00:0a.0: AER: Multiple Corrected error received: id=2200
pcieport 0000:00:0a.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0050(Transmitter ID)
pcieport 0000:00:0a.0:   device [1166:0140] error status/mask=00001000/00002000
pcieport 0000:00:0a.0:    [12] Replay Timer Timeout  
ixgbe 0000:22:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=2200(Receiver ID)
ixgbe 0000:22:00.0:   device [8086:10c7] error status/mask=00000041/00002000
ixgbe 0000:22:00.0:    [ 0] Receiver Error        
ixgbe 0000:22:00.0:    [ 6] Bad TLP               
ixgbe 0000:22:00.0:   Error of this Agent(2200) is reported first
pcieport 0000:00:0a.0: AER: Multiple Corrected error received: id=2200
pcieport 0000:00:0a.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0050(Transmitter ID)
pcieport 0000:00:0a.0:   device [1166:0140] error status/mask=00001000/00002000
pcieport 0000:00:0a.0:    [12] Replay Timer Timeout  
ixgbe 0000:22:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=2200(Receiver ID)
ixgbe 0000:22:00.0:   device [8086:10c7] error status/mask=00000041/00002000
ixgbe 0000:22:00.0:    [ 0] Receiver Error         (First)
ixgbe 0000:22:00.0:    [ 6] Bad TLP               
ixgbe 0000:22:00.0:   Error of this Agent(2200) is reported first
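For readers unfamiliar with the "error status/mask" fields in the messages above, here is a minimal sketch of how they can be decoded. It assumes the standard bit layout of the PCIe AER Correctable Error Status register; the function name `decode_aer_status` and the table name are hypothetical and are not part of the kernel or of any tool used in this report.

```python
# Bit positions of the PCIe AER Correctable Error Status register.
# Only the bits relevant to these logs (plus two neighbours) are listed.
AER_CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
}


def decode_aer_status(status_mask):
    """Decode a 'status/mask' pair as printed in the AER log lines.

    Returns (reported, masked): the error names whose bits are set in
    the status word, and the names whose bits are set in the mask word.
    """
    status_hex, mask_hex = status_mask.split("/")
    status = int(status_hex, 16)
    mask = int(mask_hex, 16)
    reported = [name for bit, name in sorted(AER_CORRECTABLE_BITS.items())
                if status & (1 << bit)]
    masked = [name for bit, name in sorted(AER_CORRECTABLE_BITS.items())
              if mask & (1 << bit)]
    return reported, masked


if __name__ == "__main__":
    # Values copied from the pcieport and ixgbe lines in this comment.
    for pair in ("00001080/00002000", "00001040/00002000", "00000041/00002000"):
        print(pair, decode_aer_status(pair))
```

Under that assumption, the decoded names agree with the per-bit lines the kernel prints under each status word, such as "[ 7] Bad DLLP" and "[12] Replay Timer Timeout" for 00001080.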

Comment 11 Andy Gospodarek 2011-03-08 18:23:02 UTC
(In reply to comment #10)
> While testing kernel-2.6.32-119.el6 & kernel-2.6.32-118.el6 on
> ibm-x3655-04.ovirt.rhts.eng.bos.redhat.com. With eth0 using the ixgbe driver,
> the kernel doesn't run into panic, but it prints message as pasted below.
> Meanwhile, eth0 cannot obtain an ip address via dhcp. Kernel repeats printing
> the message on console.
> 
> 
> 
> Messages:
> pcieport 0000:00:0a.0: AER: Multiple Corrected error received: id=2200
> pcieport 0000:00:0a.0: PCIe Bus Error: severity=Corrected, type=Data Link
> Layer, id=0050(Transmitter ID)
> pcieport 0000:00:0a.0:   device [1166:0140] error status/mask=00001080/00002000
> pcieport 0000:00:0a.0:    [ 7] Bad DLLP              
> pcieport 0000:00:0a.0:    [12] Replay Timer Timeout  
> ixgbe 0000:22:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer,
> id=2200(Transmitter ID)
> ixgbe 0000:22:00.0:   device [8086:10c7] error status/mask=00001040/00002000
> ixgbe 0000:22:00.0:    [ 6] Bad TLP               
> ixgbe 0000:22:00.0:    [12] Replay Timer Timeout  
> ixgbe 0000:22:00.0:   Error of this Agent(2200) is reported first

Does this happen just after the ixgbe driver is loaded?  Do you have dmesg output before these messages started to appear?  I suspect this is not a new problem, but we did not see this until the ixgbe panic was fixed.

Comment 14 Weibing Zhang 2011-03-09 09:33:51 UTC
Created attachment 483133
console log

Comment 15 Weibing Zhang 2011-03-09 09:34:41 UTC
Created attachment 483134
dmesg with kernel -119

Comment 16 Weibing Zhang 2011-03-09 09:36:38 UTC
(In reply to comment #11)
> Does this happen just after the ixgbe driver is loaded?  Do you have dmesg
> output before these messages started to appear?  I suspect this is not a new
> problem, but we did not see this until the ixgbe panic was fixed.

Console logs and dmesg are attached.

Set eth0 to ONBOOT and DHCP.
#cat /etc/sysconfig/network-scripts/ifcfg-eth0 
DEVICE="eth0"
HWADDR="00:1B:21:2C:83:B4"
NM_CONTROLLED="yes"
ONBOOT="yes"
BOOTPROTO="dhcp"

Booting with kernel-2.6.32-119.el6, here is the log from the console. The messages appear after trying to obtain an IP address via DHCP.

NET: Registered protocol family 10
lo: Disabled Privacy Extensions
Bringing up loopback interface:  [  OK  ]
Bringing up interface eth0:  
Determining IP information for eth0...ADDRCONF(NETDEV_UP): eth0: link is not ready
ixgbe 0000:22:00.0: eth0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
eth0: IPv6 duplicate address fe80::21b:21ff:fe2c:83b4 detected!
pcieport 0000:00:0a.0: AER: Multiple Corrected error received: id=2200
pcieport 0000:00:0a.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0050(Transmitter ID)
pcieport 0000:00:0a.0:   device [1166:0140] error status/mask=00001080/00002000
pcieport 0000:00:0a.0:    [ 7] Bad DLLP              
pcieport 0000:00:0a.0:    [12] Replay Timer Timeout  
ixgbe 0000:22:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=2200(Transmitter ID)
ixgbe 0000:22:00.0:   device [8086:10c7] error status/mask=000010c1/00002000
ixgbe 0000:22:00.0:    [ 0] Receiver Error         (First)
ixgbe 0000:22:00.0:    [ 6] Bad TLP               
ixgbe 0000:22:00.0:    [ 7] Bad DLLP              
ixgbe 0000:22:00.0:    [12] Replay Timer Timeout  
ixgbe 0000:22:00.0:   Error of this Agent(2200) is reported first
pcieport 0000:00:0a.0: AER: Corrected error received: id=2200


Booting with kernel-2.6.32-120.el6, here is the log from the console. The messages appear after the ixgbe driver is loaded.

		Welcome to Red Hat Enterprise Linux Server
Starting udev: udev: starting version 147
shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
piix4_smbus 0000:00:08.0: SMBus Host Controller at 0x440, revision 0
sr 0:0:0:0: Attached scsi generic sg0 type 5
sd 2:0:0:0: Attached scsi generic sg1 type 0
scsi 2:1:0:0: Attached scsi generic sg2 type 0
scsi 2:1:1:0: Attached scsi generic sg3 type 0
scsi 2:3:0:0: Attached scsi generic sg4 type 13
dca service started, version 1.12.1
ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver - version 3.0.12-k2
ixgbe: Copyright (c) 1999-2010 Intel Corporation.
ixgbe 0000:22:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
ixgbe 0000:22:00.0: Multiqueue Enabled: Rx Queue count = 8, Tx Queue count = 8
ixgbe 0000:22:00.0: (PCI Express:2.5Gb/s:Width x8) 00:1b:21:2c:83:b4
pcieport 0000:00:0a.0: AER: Multiple Corrected error received: id=2200
ixgbe 0000:22:00.0: MAC: 1, PHY: 4, PBA No: E18269-001
ixgbe 0000:22:00.0: Intel(R) 10 Gigabit Network Connection
pcieport 0000:00:0a.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0050(Transmitter ID)
pcieport 0000:00:0a.0:   device [1166:0140] error status/mask=00001080/00002000
pcieport 0000:00:0a.0:    [ 7] Bad DLLP              
pcieport 0000:00:0a.0:    [12] Replay Timer Timeout  
ixgbe 0000:22:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=2200(Transmitter ID)
ixgbe 0000:22:00.0:   device [8086:10c7] error status/mask=000010c1/00002000
ixgbe 0000:22:00.0:    [ 0] Receiver Error         (First)
ixgbe 0000:22:00.0:    [ 6] Bad TLP               
ixgbe 0000:22:00.0:    [ 7] Bad DLLP              
ixgbe 0000:22:00.0:    [12] Replay Timer Timeout  
ixgbe 0000:22:00.0:   Error of this Agent(2200) is reported first
pcieport 0000:00:0a.0: AER: Corrected error received: id=2200
ses 2:3:0:0: Attached Enclosure device
EDAC MC: Ver: 2.1.0 Mar  7 2011
EDAC amd64_edac:  Ver: 3.3.0 Mar  7 2011
EDAC amd64: ECC is enabled by BIOS.
EDAC amd64: ECC is enabled by BIOS.
EDAC MC0: Giving out device to 'amd64_edac' 'Family 10h': DEV 0000:00:18.2
EDAC MC1: Giving out device to 'amd64_edac' 'Family 10h': DEV 0000:00:19.2
EDAC PCI0: Giving out device to module 'amd64_edac' controller 'EDAC PCI controller': DEV '0000:00:18.2' (POLLED)
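The "id=2200" and "id=0050" values in the AER lines of the logs above appear to be PCIe requester IDs, which pack bus, device and function into 16 bits. The sketch below shows that mapping under this assumption; the function name `requester_id_to_bdf` is hypothetical, and the PCI domain is assumed to be 0000 because the ID itself does not carry it.

```python
def requester_id_to_bdf(agent_id, domain=0):
    """Convert a 16-bit requester ID (hex string) to domain:bus:dev.fn.

    Layout assumed: bits 15-8 bus, bits 7-3 device, bits 2-0 function.
    The domain is not encoded in the ID, so it is passed in separately.
    """
    value = int(agent_id, 16)
    bus = (value >> 8) & 0xFF
    device = (value >> 3) & 0x1F
    function = value & 0x7
    return f"{domain:04x}:{bus:02x}:{device:02x}.{function:x}"


if __name__ == "__main__":
    # IDs copied from the log lines in this comment.
    for agent_id in ("2200", "0050"):
        print(agent_id, "->", requester_id_to_bdf(agent_id))
```

With this layout, 2200 maps to the ixgbe device at 0000:22:00.0 and 0050 maps to the upstream port at 0000:00:0a.0, which matches the device names at the start of the corresponding log lines.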

Comment 17 Andy Gospodarek 2011-03-11 14:00:50 UTC
I've been testing with -122 on ibm-x3655-04.ovirt.rhts.eng.bos.redhat.com and can reproduce the problem.  I'm not sure if the driver or the NIC is to blame for this issue, but I hope to narrow it down soon.  If this is the only system that demonstrates this problem, I think we can set this bug to VERIFIED.

Comment 18 Dayong Tian 2011-03-14 01:32:42 UTC
(In reply to comment #17)
> I've been testing with -122 on ibm-x3655-04.ovirt.rhts.eng.bos.redhat.com and
> can reproduce the problem.  I'm not sure if the driver or the NIC is to blame
> for this issue, but I hope to narrow it down soon.  If this is the only system
> that demonstrates this problem, I think we can set this bug to VERIFIED.

Confirmed with eng-ops: the NIC on ibm-x3655-04.ovirt.rhts.eng.bos.redhat.com was connected to a private 10Gb network that did not have a DHCP server.
https://engineering.redhat.com/rt3/Ticket/Display.html?id=104321

Comment 21 errata-xmlrpc 2011-05-19 12:42:28 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html

