Bug 747009

Summary: Kernel panic - not syncing: Fatal exception, on boot in MRG2.0 on HS21 hardware.
Product: Red Hat Enterprise MRG
Reporter: IBM Bug Proxy <bugproxy>
Component: realtime-kernel
Assignee: John Kacur <jkacur>
Status: CLOSED WONTFIX
QA Contact: David Sommerseth <davids>
Severity: urgent
Priority: urgent
Version: 2.0
CC: bhu, jkachuck, lgoncalv, ovasik, wgomerin, williams
Keywords: Reopened
Hardware: x86_64
OS: All
Doc Type: Bug Fix
Last Closed: 2012-02-28 16:39:05 UTC
Attachments:
Description                                       Flags
Config file of system                             none
Console of a failing system                       none
HS20 boot hang with pci=nocrs radeon.modeset=0    none
proposed quirk for HS21 boot issue                none

Description IBM Bug Proxy 2011-10-18 15:01:05 UTC
---Problem Description---
Kernel panic - not syncing: Fatal exception, on boot in MRG2.0 on HS21 xm hardware.  
 [<ffffffffa0123000>] ? radeon_init+0x0/0xc1 [radeon]
 [<ffffffffa01230bf>] radeon_init+0xbf/0xc1 [radeon]
 [<ffffffff810001f9>] do_one_initcall+0x5e/0x14e
 [<ffffffff810748d6>] sys_init_module+0xd6/0x233
 [<ffffffff81002c9b>] system_call_fastpath+0x16/0x1b
 
Contact Information = Matthew Sabins mhsabins.com 
 
---uname output---
(Machine does not boot)  Linux elm9m96 2.6.33.9-rt31.74.el6rt.x86_64 #1 SMP Tue May 10 15:42:40 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux 
 
Machine Type = HS21 and only HS21 machines 
 
---System Hang---
 Machine does not boot.  
  
---Steps to Reproduce---
 Reboot an HS21 machine with a Radeon graphics chipset. LS20, LS21, HS22, x3550 and x3560 systems have no issues.
 
---Kernel - Drivers Component Data--- 
Stack trace output:
 Call Trace:
 [<ffffffffa00cbfd4>] r100_fini+0x16/0x7f [radeon]
 [<ffffffffa00a6382>] radeon_device_fini+0x33/0x6a [radeon]
 [<ffffffffa00a70e6>] radeon_driver_unload_kms+0x2b/0x46 [radeon]
 [<ffffffffa00a72aa>] radeon_driver_load_kms+0x1a9/0x1bb [radeon]
 [<ffffffffa003b277>] drm_get_dev+0x3ba/0x4c6 [drm]
 [<ffffffffa00eed5c>] radeon_pci_probe+0x15/0x269 [radeon]
 [<ffffffff811c4c69>] local_pci_probe+0x17/0x1b
 [<ffffffff811c5a2c>] pci_device_probe+0xca/0xfa
 [<ffffffff81243dd1>] ? driver_sysfs_add+0x4c/0x71
 [<ffffffff81243f19>] driver_probe_device+0xa2/0x127
 [<ffffffff81243ffb>] __driver_attach+0x5d/0x81
 [<ffffffff81243f9e>] ? __driver_attach+0x0/0x81
 [<ffffffff81243506>] bus_for_each_dev+0x59/0x8e
 [<ffffffff81243d83>] driver_attach+0x1e/0x20
 [<ffffffff812439c4>] bus_add_driver+0xb9/0x209
 [<ffffffff812442ec>] driver_register+0x9e/0x10f
 [<ffffffff811c5cad>] __pci_register_driver+0x68/0xd8
 [<ffffffff8135b763>] ? printk+0x41/0x46
 [<ffffffffa0123000>] ? radeon_init+0x0/0xc1 [radeon]
 [<ffffffffa0035e3f>] drm_init+0x75/0xdb [drm]
 [<ffffffffa0123000>] ? radeon_init+0x0/0xc1 [radeon]
 [<ffffffffa01230bf>] radeon_init+0xbf/0xc1 [radeon]
 [<ffffffff810001f9>] do_one_initcall+0x5e/0x14e
 [<ffffffff810748d6>] sys_init_module+0xd6/0x233
 [<ffffffff81002c9b>] system_call_fastpath+0x16/0x1b
Code: 00 45 31 e4 41 bd 40 0e 00 00 48 89 fb eb 3d 48 81 bb c0 00 00 00 40 0e 00 00 48 8b 83 c8 00 00 00 76 08 8b 80 40 0e 00 00 eb 0d <44> 89 28 48 8b 83 c8 00 00 00 8b 40 04 a9 00 00 01 00 74 15 bf 
RIP  [<ffffffffa00ca65e>] r100_cp_fini+0x3c/0xa3 [radeon]
 RSP <ffff880429935bc8>
CR2: 0000000000000000
---[ end trace fe34c90b7ec603e1 ]---
Kernel panic - not syncing: Fatal exception
Pid: 807, comm: modprobe Tainted: G      D    ---------------    2.6.33.9-rt31.74.el6rt.x86_64 #1
Call Trace:
 [<ffffffff8135b668>] panic+0x89/0x143
 [<ffffffff8135eb78>] oops_end+0xae/0xbe
 [<ffffffff81025a22>] no_context+0x1fc/0x20b
 [<ffffffff8123cf75>] ? wait_for_xmitr+0x45/0x90
 [<ffffffff81025baf>] __bad_area_nosemaphore+0x17e/0x1a1
 [<ffffffff81025c2e>] bad_area+0x47/0x4e
 [<ffffffff813606f3>] do_page_fault+0x1a2/0x299
 [<ffffffff8135e08f>] page_fault+0x1f/0x30
 [<ffffffffa00ca65e>] ? r100_cp_fini+0x3c/0xa3 [radeon]
 [<ffffffffa00cbfd4>] r100_fini+0x16/0x7f [radeon]
 [<ffffffffa00a6382>] radeon_device_fini+0x33/0x6a [radeon]
 [<ffffffffa00a70e6>] radeon_driver_unload_kms+0x2b/0x46 [radeon]
 [<ffffffffa00a72aa>] radeon_driver_load_kms+0x1a9/0x1bb [radeon]
 [<ffffffffa003b277>] drm_get_dev+0x3ba/0x4c6 [drm]
 [<ffffffffa00eed5c>] radeon_pci_probe+0x15/0x269 [radeon]
 [<ffffffff811c4c69>] local_pci_probe+0x17/0x1b
 [<ffffffff811c5a2c>] pci_device_probe+0xca/0xfa
 [<ffffffff81243dd1>] ? driver_sysfs_add+0x4c/0x71
 [<ffffffff81243f19>] driver_probe_device+0xa2/0x127
 [<ffffffff81243ffb>] __driver_attach+0x5d/0x81
 [<ffffffff81243f9e>] ? __driver_attach+0x0/0x81
 [<ffffffff81243506>] bus_for_each_dev+0x59/0x8e
 [<ffffffff81243d83>] driver_attach+0x1e/0x20
 [<ffffffff812439c4>] bus_add_driver+0xb9/0x209
 [<ffffffff812442ec>] driver_register+0x9e/0x10f
 [<ffffffff811c5cad>] __pci_register_driver+0x68/0xd8
 [<ffffffff8135b763>] ? printk+0x41/0x46
 [<ffffffffa0123000>] ? radeon_init+0x0/0xc1 [radeon]
 [<ffffffffa0035e3f>] drm_init+0x75/0xdb [drm]
 [<ffffffffa0123000>] ? radeon_init+0x0/0xc1 [radeon]
 [<ffffffffa01230bf>] radeon_init+0xbf/0xc1 [radeon]
 [<ffffffff810001f9>] do_one_initcall+0x5e/0x14e
 [<ffffffff810748d6>] sys_init_module+0xd6/0x233
 [<ffffffff81002c9b>] system_call_fastpath+0x16/0x1b
 
Oops output:
 Oops: 0002 [#1] PREEMPT SMP 
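The oops header and the CR2 line above can be read together. On x86 the number after "Oops:" is the page-fault error code in hex and CR2 holds the faulting address, so "Oops: 0002" with "CR2: 0000000000000000" describes a kernel-mode write to an unmapped page at address 0. The call trace places that write in r100_cp_fini, reached via radeon_driver_unload_kms from inside radeon_driver_load_kms, which suggests the driver dereferenced a NULL pointer while cleaning up after a failed load. The decoder below is a minimal userspace sketch of that reading, assuming the standard Intel page-fault error-code bit layout; decode_pf_error and describe_fault are hypothetical helpers, not part of any kernel or Red Hat tool.

```python
from __future__ import annotations

# x86 page-fault error-code bits: (mask, meaning if set, meaning if clear).
# Bits 3 and 4 are only reported when set.
PF_BITS = (
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
)
PAGE_SIZE = 4096  # x86 base page size; faults below this are treated as NULL derefs


def decode_pf_error(code: int) -> list[str]:
    """Translate a page-fault error code (the hex after "Oops:") into words."""
    meanings = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            meanings.append(if_set)
        elif if_clear is not None:
            meanings.append(if_clear)
    return meanings


def describe_fault(error_code: int, cr2: int) -> str:
    """Combine the decoded error code with the faulting address from CR2."""
    parts = decode_pf_error(error_code)
    if cr2 < PAGE_SIZE:
        parts.append("NULL pointer dereference")
    return f"{', '.join(parts)} at {cr2:#018x}"


if __name__ == "__main__":
    # Values copied from this report: "Oops: 0002" and "CR2: 0000000000000000".
    print(describe_fault(0x0002, 0x0))
```

For this report, describe_fault(0x0002, 0x0) returns "page not present, write access, kernel mode, NULL pointer dereference at 0x0000000000000000". The first-page check roughly mirrors the heuristic the x86 fault handler uses when it labels a fault as a NULL pointer dereference.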
 
[root@elm9m96 /]# cat /etc/*-release
Red Hat Enterprise Linux Server release 6.1 (Santiago)
Red Hat Enterprise Linux Server release 6.1 (Santiago)

[root@elm9m96 /]# lsmod
Module                  Size  Used by
ipv6                  322899  42 
dm_mirror              14067  0 
dm_region_hash         12136  1 dm_mirror
dm_log                 10120  2 dm_mirror,dm_region_hash
bnx2                   77268  0 
sg                     30186  0 
netxen_nic             93648  0 
microcode             112845  0 
serio_raw               4816  0 
iTCO_wdt               12060  0 
iTCO_vendor_support     3022  1 iTCO_wdt
i5k_amb                 5039  0 
i5000_edac              8833  0 
edac_core              46533  3 i5000_edac
ioatdma                58160  15 
dca                     7099  1 ioatdma
shpchp                 33448  0 
ext3                  133411  2 
jbd                    54480  1 ext3
mbcache                 7918  1 ext3
sd_mod                 38196  4 
crc_t10dif              1507  1 sd_mod
mptsas                 53001  3 
mptscsih               36826  1 mptsas
mptbase                93843  2 mptsas,mptscsih
scsi_transport_sas     35036  1 mptsas
ata_generic             3611  0 
pata_acpi               3667  0 
ata_piix               22652  0 
radeon                927495  1 
ttm                    66971  1 radeon
drm_kms_helper         34896  1 radeon
drm                   213686  3 radeon,ttm,drm_kms_helper
hwmon                   2464  2 i5k_amb,radeon
i2c_algo_bit            5728  1 radeon
i2c_core               31274  4 radeon,drm_kms_helper,drm,i2c_algo_bit
dm_mod                 75539  2 dm_mirror,dm_log

[root@elm9m96 /]# lspci
00:00.0 Host bridge: Intel Corporation 5000P Chipset Memory Controller Hub (rev b1)
00:02.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 2-3 (rev b1)
00:03.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 3 (rev b1)
00:04.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 4-5 (rev b1)
00:05.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 5 (rev b1)
00:06.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 6 (rev b1)
00:07.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 7 (rev b1)
00:08.0 System peripheral: Intel Corporation 5000 Series Chipset DMA Engine (rev b1)
00:10.0 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1)
00:10.1 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1)
00:10.2 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1)
00:11.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev b1)
00:13.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev b1)
00:15.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev b1)
00:16.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev b1)
00:1c.0 PCI bridge: Intel Corporation 631xESB/632xESB/3100 Chipset PCI Express Root Port 1 (rev 09)
00:1d.0 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #1 (rev 09)
00:1d.1 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #2 (rev 09)
00:1d.2 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #3 (rev 09)
00:1d.7 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset EHCI USB2 Controller (rev 09)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d9)
00:1f.0 ISA bridge: Intel Corporation 631xESB/632xESB/3100 Chipset LPC Interface Controller (rev 09)
00:1f.2 IDE interface: Intel Corporation 631xESB/632xESB/3100 Chipset SATA IDE Controller (rev 09)
01:01.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
02:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 02)
03:00.0 PCI bridge: Broadcom EPB PCI-Express to PCI-X Bridge (rev c3)
04:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (rev 12)
05:00.0 PCI bridge: Broadcom EPB PCI-Express to PCI-X Bridge (rev c3)
06:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (rev 12)
07:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Upstream Port (rev 01)
07:00.3 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express to PCI-X Bridge (rev 01)
09:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E1 (rev 01)
09:01.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E2 (rev 01)
0c:00.0 Network controller: NetXen Incorporated XG Mgmt (rev 25)
0c:00.2 Ethernet controller: NetXen Incorporated BladeCenter-H 10-Gigabit Ethernet High Speed Daughter Card (rev 25)
0c:00.3 Ethernet controller: NetXen Incorporated BladeCenter-H 10-Gigabit Ethernet High Speed Daughter Card (rev 25)

The issue has been seen with the latest available firmware.
It has also been seen on two other MRG kernels.

Mirroring Template

1. Server architecture(s): x86
2. Server type: HS21
3. Other components involved: Driver
4. Does the server have the latest GA firmware? Yes
5. Has the problem been shown to occur on more than one system? Yes
6. Collect "sosreport" from the machine the problem was found on, and attach it to the bug. Check
7. What is the latest official distro build on which this bug has been seen? 6.1
8. Steps to reproduce: Just boot
9. Business justification: Failure to boot

Comment 1 IBM Bug Proxy 2011-10-18 15:01:18 UTC
Created attachment 528830
Config file of system

Comment 2 IBM Bug Proxy 2011-10-18 15:01:25 UTC
Created attachment 528831
Console of a failing system

Comment 3 IBM Bug Proxy 2011-10-18 16:03:18 UTC
------- Comment From niv.com 2011-10-18 11:51 EDT-------
Red Hat -- Clark Williams is aware of the issue and should be added to the bug.

Clark, please send over a pointer to any bits you want us to test on the MRG 2.1 train, as mentioned offline yesterday. Also, if you're unable to reproduce this issue on your HS21xm, let us know!

Comment 4 Clark Williams 2011-10-18 18:10:52 UTC
Just re-provisioned one of my HS21xm blades and it comes up properly with the MRG 2.0 kernel.

[williams@hs21xm-2 ~]$ lspci | grep -i vga
01:01.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)

[williams@hs21xm-2 ~]$ uname -a
Linux hs21xm-2.farm.hsv.redhat.com 2.6.33.9-rt31.76.el6rt.x86_64 #1 SMP PREEMPT RT Tue Oct 4 12:01:48 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

[williams@hs21xm-2 ~]$ cat /proc/cmdline 
ro root=/dev/mapper/vg_hs21xm2-lv_root rd_LVM_LV=vg_hs21xm2/lv_root rd_LVM_LV=vg_hs21xm2/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us crashkernel=auto pci=nocrs console=tty0 console=ttyS1,19200 radeon.modeset=0  rhgb quiet

Does your kernel command line look similar to the above?

Comment 5 IBM Bug Proxy 2011-10-18 19:40:43 UTC
------- Comment From kravetz.com 2011-10-18 15:36 EDT-------
The command line on our system did not contain "pci=nocrs" or "radeon.modeset=0". After adding these arguments, we were able to boot MRG 2.0 kernels (at least the .74 build).

Did we miss the mention of these arguments in some release notes or other documentation?

Comment 6 Clark Williams 2011-10-18 19:59:39 UTC
The pci=nocrs argument was in the release notes (along with radeon.hw_i2c=0):

http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_MRG/2/html-single/MRG_Release_Notes/index.html#chap-MRG_Release_Notes-RT

I suspect that pci=nocrs is the key here.

Comment 7 IBM Bug Proxy 2011-10-18 20:00:34 UTC
------- Comment From kravetz.com 2011-10-18 15:58 EDT-------
They are documented in:
http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_MRG/2/pdf/MRG_Release_Notes/Red_Hat_Enterprise_MRG-2-MRG_Release_Notes-en-US.pdf

Comment 8 IBM Bug Proxy 2011-10-18 20:20:34 UTC
------- Comment From kravetz.com 2011-10-18 16:13 EDT-------
We had not read the release notes. However, on closer examination, the notes say to add "pci=use_crs", and we cannot find any mention of "pci=nocrs".

Comment 9 IBM Bug Proxy 2011-10-18 21:20:36 UTC
------- Comment From niv.com 2011-10-18 17:11 EDT-------
I had skimmed the release notes (quite a while back now). Nothing in them jumped out at me as urgent or required; I didn't get a "do this or your system will not boot" message.

At the very least, I'd recommend correcting the documentation to state the parameter setting that is actually needed (nocrs?).

------- Comment From niv.com 2011-10-18 17:19 EDT-------
Lowering priority due to workaround via boot params.
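The workaround discussed in comments 4 to 9 comes down to two kernel parameters, pci=nocrs and radeon.modeset=0, plus the confusion over pci=use_crs, which asks the kernel to use the ACPI host-bridge windows that pci=nocrs tells it to ignore. The following is a minimal sketch of checking a command line for them, assuming a hypothetical helper named check_workaround that is not part of any Red Hat tooling. It only inspects tokens and does not verify that the running kernel honoured them; it splits pci= values on commas because the kernel accepts several PCI options in a single pci= argument.

```python
from __future__ import annotations

from pathlib import Path


def _pci_options(tokens: list[str]) -> set[str]:
    """Collect the comma-separated options from every pci=... token."""
    opts: set[str] = set()
    for tok in tokens:
        if tok.startswith("pci="):
            opts.update(tok[len("pci="):].split(","))
    return opts


def check_workaround(cmdline: str) -> list[str]:
    """Return problems with the HS21 boot workaround on a kernel command line.

    An empty list means pci=nocrs and radeon.modeset=0 are both present
    and pci=use_crs (which requests the opposite behaviour) is not.
    """
    tokens = cmdline.split()
    pci = _pci_options(tokens)
    problems = []
    if "nocrs" not in pci:
        problems.append("missing pci=nocrs")
    if "use_crs" in pci:
        problems.append("pci=use_crs requests the opposite of pci=nocrs")
    if "radeon.modeset=0" not in tokens:
        problems.append("missing radeon.modeset=0")
    return problems


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    try:
        problems = check_workaround(Path("/proc/cmdline").read_text())
    except OSError:
        problems = ["could not read /proc/cmdline (not Linux?)"]
    print("\n".join(problems) or "workaround parameters present")
```

Run against the command line Clark posted in comment 4, it reports no problems; against a command line lacking both parameters, as the reporter's did, it reports both as missing.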

Comment 10 IBM Bug Proxy 2011-10-21 17:40:53 UTC
Created attachment 529544
HS20 boot hang with pci=nocrs radeon.modeset=0


------- Comment on attachment From pc.com 2011-10-21 13:34 EDT-------


I just hit this problem on an HS20. Adding "pci=nocrs" and "radeon.modeset=0" (I don't see either in the release notes) seems to get further, but the boot then panics, oopses, or hangs later on. Attaching a boot log.

Comment 11 IBM Bug Proxy 2011-10-21 19:20:38 UTC
------- Comment From pc.com 2011-10-21 15:13 EDT-------
(In reply to comment #27)
> I just hit this problem on an HS20.  Adding "pci=nocrs" and "radeon.modeset=0"
> (I don't see either in the release notes) seems to get further, but it panics,
> oops, or hangs later on in the boot...attaching a boot log.

I posted this too soon. As will be obvious in the attachment, I was running the .67 kernel. I've updated to the .75 kernel, and the HS20 has booted successfully.

Comment 12 Clark Williams 2011-11-07 16:39:22 UTC
Closing as working in the current release

Comment 13 Clark Williams 2011-11-07 20:14:11 UTC
Reopening to look at adding a quirk to handle booting the HS21 with the 2.0 kernel.

Comment 14 Clark Williams 2011-11-07 21:52:19 UTC
Created attachment 532141
proposed quirk for HS21 boot issue

The attached patch has not been tested; it is provided only to illustrate how we can detect that we're booting on an HS21 and automatically set pci=nocrs. I know further testing is going on, so I will hold off on adding this patch to the kernel until I hear more about whether it's always needed, whether it applies to the HS21 but not the HS21xm, etc.
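The attached patch is not reproduced here. The idea it illustrates, detecting the blade model at boot and forcing pci=nocrs, can be sketched in userspace against the DMI product name that Linux exposes at /sys/class/dmi/id/product_name; inside the kernel the same check would typically be a DMI match table (struct dmi_system_id entries using DMI_MATCH(DMI_PRODUCT_NAME, ...)). The exact product-name strings reported by HS21 and HS21xm blades are not given in this bug, so the pattern below is an assumption, and needs_nocrs is a hypothetical helper. Its include_xm flag stands in for the open question of whether the quirk should also cover the HS21xm.

```python
from __future__ import annotations

import re
from pathlib import Path

# ASSUMPTION: the real DMI product names for HS21 / HS21xm blades are not
# recorded in this bug; this pattern only looks for the model token "HS21",
# optionally followed by "XM", as a whole word.
_HS21_RE = re.compile(r"\bHS21(\s*XM)?\b", re.IGNORECASE)


def needs_nocrs(product_name: str, include_xm: bool = True) -> bool:
    """Decide whether to force pci=nocrs for a given DMI product name.

    include_xm controls whether HS21xm blades are treated like plain
    HS21 blades (the open question raised in comment 14).
    """
    match = _HS21_RE.search(product_name)
    if match is None:
        return False  # not an HS21-family blade: leave the defaults alone
    is_xm = match.group(1) is not None
    return include_xm or not is_xm


if __name__ == "__main__":
    dmi = Path("/sys/class/dmi/id/product_name")  # Linux sysfs DMI attribute
    try:
        name = dmi.read_text().strip()
    except OSError:
        name = None  # not Linux, or no DMI data (e.g. some VMs/containers)
    if name is not None:
        verdict = "set pci=nocrs" if needs_nocrs(name) else "no quirk needed"
        print(f"{name!r}: {verdict}")
```

With include_xm=False, a name containing "HS21 XM" is excluded while a plain "HS21" name still matches, so the same helper can express either answer to the HS21 versus HS21xm question once testing settles it.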

Comment 15 IBM Bug Proxy 2012-02-21 21:09:50 UTC
------- Comment From niv.com 2012-02-21 15:50 EDT-------
Will see if we can retest on the MRG 2.0 testing train later this week.

Comment 16 IBM Bug Proxy 2012-02-27 20:00:39 UTC
------- Comment From niv.com 2012-02-27 14:56 EDT-------
Just wanted to confirm with Clark whether there is going to be an MRG 2.0 errata kernel. If there is not, we should close this as WILLNOTFIX.

Comment 17 Clark Williams 2012-02-28 16:39:05 UTC
We have no plans to fix this in the 2.6.33.7 kernel.

Just confirmed that the 3.0.x rt kernel series boots properly with any combination of radeon.modeset=0 and pci=nocrs on an hs21xm blade. 

Closing this as WONTFIX

Comment 18 IBM Bug Proxy 2012-03-01 00:40:35 UTC
------- Comment From niv.com 2012-02-29 19:33 EDT-------
Clark,

Closing this from our end as well. The only thing we'll do is make sure the documentation is correct for MRG 2.2. If it's not, we'll open a new bug.