---Problem Description---
Kernel panic - not syncing: Fatal exception, on boot in MRG2.0 on HS21 xm hardware.

[<ffffffffa0123000>] ? radeon_init+0x0/0xc1 [radeon]
[<ffffffffa01230bf>] radeon_init+0xbf/0xc1 [radeon]
[<ffffffff810001f9>] do_one_initcall+0x5e/0x14e
[<ffffffff810748d6>] sys_init_module+0xd6/0x233
[<ffffffff81002c9b>] system_call_fastpath+0x16/0x1b

Contact Information = Matthew Sabins mhsabins.com

---uname output--- (Machine does not boot)
Linux elm9m96 2.6.33.9-rt31.74.el6rt.x86_64 #1 SMP Tue May 10 15:42:40 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

Machine Type = HS21 and only HS21 machines

---System Hang---
Machine does not boot.

---Steps to Reproduce---
Reboot on a system that is a HS21 machine with Radeon chipset. LS20 LS21 HS22 x3550 and x3560 have no issues.

---Kernel - Drivers Component Data---
Stack trace output:
Call Trace:
[<ffffffffa00cbfd4>] r100_fini+0x16/0x7f [radeon]
[<ffffffffa00a6382>] radeon_device_fini+0x33/0x6a [radeon]
[<ffffffffa00a70e6>] radeon_driver_unload_kms+0x2b/0x46 [radeon]
[<ffffffffa00a72aa>] radeon_driver_load_kms+0x1a9/0x1bb [radeon]
[<ffffffffa003b277>] drm_get_dev+0x3ba/0x4c6 [drm]
[<ffffffffa00eed5c>] radeon_pci_probe+0x15/0x269 [radeon]
[<ffffffff811c4c69>] local_pci_probe+0x17/0x1b
[<ffffffff811c5a2c>] pci_device_probe+0xca/0xfa
[<ffffffff81243dd1>] ? driver_sysfs_add+0x4c/0x71
[<ffffffff81243f19>] driver_probe_device+0xa2/0x127
[<ffffffff81243ffb>] __driver_attach+0x5d/0x81
[<ffffffff81243f9e>] ? __driver_attach+0x0/0x81
[<ffffffff81243506>] bus_for_each_dev+0x59/0x8e
[<ffffffff81243d83>] driver_attach+0x1e/0x20
[<ffffffff812439c4>] bus_add_driver+0xb9/0x209
[<ffffffff812442ec>] driver_register+0x9e/0x10f
[<ffffffff811c5cad>] __pci_register_driver+0x68/0xd8
[<ffffffff8135b763>] ? printk+0x41/0x46
[<ffffffffa0123000>] ? radeon_init+0x0/0xc1 [radeon]
[<ffffffffa0035e3f>] drm_init+0x75/0xdb [drm]
[<ffffffffa0123000>] ? radeon_init+0x0/0xc1 [radeon]
[<ffffffffa01230bf>] radeon_init+0xbf/0xc1 [radeon]
[<ffffffff810001f9>] do_one_initcall+0x5e/0x14e
[<ffffffff810748d6>] sys_init_module+0xd6/0x233
[<ffffffff81002c9b>] system_call_fastpath+0x16/0x1b
Code: 00 45 31 e4 41 bd 40 0e 00 00 48 89 fb eb 3d 48 81 bb c0 00 00 00 40 0e 00 00 48 8b 83 c8 00 00 00 76 08 8b 80 40 0e 00 00 eb 0d <44> 89 28 48 8b 83 c8 00 00 00 8b 40 04 a9 00 00 01 00 74 15 bf
RIP [<ffffffffa00ca65e>] r100_cp_fini+0x3c/0xa3 [radeon]
RSP <ffff880429935bc8>
CR2: 0000000000000000
---[ end trace fe34c90b7ec603e1 ]---
Kernel panic - not syncing: Fatal exception
Pid: 807, comm: modprobe Tainted: G D --------------- 2.6.33.9-rt31.74.el6rt.x86_64 #1
Call Trace:
[<ffffffff8135b668>] panic+0x89/0x143
[<ffffffff8135eb78>] oops_end+0xae/0xbe
[<ffffffff81025a22>] no_context+0x1fc/0x20b
[<ffffffff8123cf75>] ? wait_for_xmitr+0x45/0x90
[<ffffffff81025baf>] __bad_area_nosemaphore+0x17e/0x1a1
[<ffffffff81025c2e>] bad_area+0x47/0x4e
[<ffffffff813606f3>] do_page_fault+0x1a2/0x299
[<ffffffff8135e08f>] page_fault+0x1f/0x30
[<ffffffffa00ca65e>] ? r100_cp_fini+0x3c/0xa3 [radeon]
[<ffffffffa00cbfd4>] r100_fini+0x16/0x7f [radeon]
[<ffffffffa00a6382>] radeon_device_fini+0x33/0x6a [radeon]
[<ffffffffa00a70e6>] radeon_driver_unload_kms+0x2b/0x46 [radeon]
[<ffffffffa00a72aa>] radeon_driver_load_kms+0x1a9/0x1bb [radeon]
[<ffffffffa003b277>] drm_get_dev+0x3ba/0x4c6 [drm]
[<ffffffffa00eed5c>] radeon_pci_probe+0x15/0x269 [radeon]
[<ffffffff811c4c69>] local_pci_probe+0x17/0x1b
[<ffffffff811c5a2c>] pci_device_probe+0xca/0xfa
[<ffffffff81243dd1>] ? driver_sysfs_add+0x4c/0x71
[<ffffffff81243f19>] driver_probe_device+0xa2/0x127
[<ffffffff81243ffb>] __driver_attach+0x5d/0x81
[<ffffffff81243f9e>] ? __driver_attach+0x0/0x81
[<ffffffff81243506>] bus_for_each_dev+0x59/0x8e
[<ffffffff81243d83>] driver_attach+0x1e/0x20
[<ffffffff812439c4>] bus_add_driver+0xb9/0x209
[<ffffffff812442ec>] driver_register+0x9e/0x10f
[<ffffffff811c5cad>] __pci_register_driver+0x68/0xd8
[<ffffffff8135b763>] ? printk+0x41/0x46
[<ffffffffa0123000>] ? radeon_init+0x0/0xc1 [radeon]
[<ffffffffa0035e3f>] drm_init+0x75/0xdb [drm]
[<ffffffffa0123000>] ? radeon_init+0x0/0xc1 [radeon]
[<ffffffffa01230bf>] radeon_init+0xbf/0xc1 [radeon]
[<ffffffff810001f9>] do_one_initcall+0x5e/0x14e
[<ffffffff810748d6>] sys_init_module+0xd6/0x233
[<ffffffff81002c9b>] system_call_fastpath+0x16/0x1b

Oops output:
Oops: 0002 [#1] PREEMPT SMP

[root@elm9m96 /]# cat /etc/*-release
Red Hat Enterprise Linux Server release 6.1 (Santiago)
Red Hat Enterprise Linux Server release 6.1 (Santiago)

[root@elm9m96 /]# lsmod
Module Size Used by
ipv6 322899 42
dm_mirror 14067 0
dm_region_hash 12136 1 dm_mirror
dm_log 10120 2 dm_mirror,dm_region_hash
bnx2 77268 0
sg 30186 0
netxen_nic 93648 0
microcode 112845 0
serio_raw 4816 0
iTCO_wdt 12060 0
iTCO_vendor_support 3022 1 iTCO_wdt
i5k_amb 5039 0
i5000_edac 8833 0
edac_core 46533 3 i5000_edac
ioatdma 58160 15
dca 7099 1 ioatdma
shpchp 33448 0
ext3 133411 2
jbd 54480 1 ext3
mbcache 7918 1 ext3
sd_mod 38196 4
crc_t10dif 1507 1 sd_mod
mptsas 53001 3
mptscsih 36826 1 mptsas
mptbase 93843 2 mptsas,mptscsih
scsi_transport_sas 35036 1 mptsas
ata_generic 3611 0
pata_acpi 3667 0
ata_piix 22652 0
radeon 927495 1
ttm 66971 1 radeon
drm_kms_helper 34896 1 radeon
drm 213686 3 radeon,ttm,drm_kms_helper
hwmon 2464 2 i5k_amb,radeon
i2c_algo_bit 5728 1 radeon
i2c_core 31274 4 radeon,drm_kms_helper,drm,i2c_algo_bit
dm_mod 75539 2 dm_mirror,dm_log

[root@elm9m96 /]# lspci
00:00.0 Host bridge: Intel Corporation 5000P Chipset Memory Controller Hub (rev b1)
00:02.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 2-3 (rev b1)
00:03.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 3 (rev b1)
00:04.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 4-5 (rev b1)
00:05.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 5 (rev b1)
00:06.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 6 (rev b1)
00:07.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 7 (rev b1)
00:08.0 System peripheral: Intel Corporation 5000 Series Chipset DMA Engine (rev b1)
00:10.0 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1)
00:10.1 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1)
00:10.2 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1)
00:11.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev b1)
00:13.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev b1)
00:15.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev b1)
00:16.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev b1)
00:1c.0 PCI bridge: Intel Corporation 631xESB/632xESB/3100 Chipset PCI Express Root Port 1 (rev 09)
00:1d.0 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #1 (rev 09)
00:1d.1 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #2 (rev 09)
00:1d.2 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #3 (rev 09)
00:1d.7 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset EHCI USB2 Controller (rev 09)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d9)
00:1f.0 ISA bridge: Intel Corporation 631xESB/632xESB/3100 Chipset LPC Interface Controller (rev 09)
00:1f.2 IDE interface: Intel Corporation 631xESB/632xESB/3100 Chipset SATA IDE Controller (rev 09)
01:01.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
02:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 02)
03:00.0 PCI bridge: Broadcom EPB PCI-Express to PCI-X Bridge (rev c3)
04:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (rev 12)
05:00.0 PCI bridge: Broadcom EPB PCI-Express to PCI-X Bridge (rev c3)
06:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (rev 12)
07:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Upstream Port (rev 01)
07:00.3 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express to PCI-X Bridge (rev 01)
09:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E1 (rev 01)
09:01.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E2 (rev 01)
0c:00.0 Network controller: NetXen Incorporated XG Mgmt (rev 25)
0c:00.2 Ethernet controller: NetXen Incorporated BladeCenter-H 10-Gigabit Ethernet High Speed Daughter Card (rev 25)
0c:00.3 Ethernet controller: NetXen Incorporated BladeCenter-H 10-Gigabit Ethernet High Speed Daughter Card (rev 25)

Issue has been seen on the most updated firmware available. Issue has also been seen on 2 other MRG kernels.

Mirroring Template
1. Server architecture(s): x86
2. Server type HS21
3. Other components involved, Driver
4. Does the server have the latest GA firmware? Yes
5. Has the problem been shown to occur on more than one system? Yes
6. Collect "sosreport" from machine problem was found on, and attach to bug. Check
7. What is the latest official distro build on which this bug has been seen? 6.1
8. Steps to reproduce. Just boot
9. Business justification. Failure to boot
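Note on reading the oops in the description: the value after "Oops:" is the x86 page-fault error code in hex. A minimal sketch of decoding it follows. The bit layout is the standard x86 one; the function name decode_oops_code is made up for this note and is not part of any kernel tool. For "0002" it reports a kernel-mode write to a not-present page, which is consistent with CR2 being 0 (a NULL pointer write in r100_cp_fini).

```python
# Standard x86 page-fault error code bits, as printed after "Oops:".
PF_PROT = 0x01   # 0 = page not present, 1 = protection violation
PF_WRITE = 0x02  # 0 = read access, 1 = write access
PF_USER = 0x04   # 0 = fault in kernel mode, 1 = fault in user mode
PF_RSVD = 0x08   # 1 = reserved bit set in a page-table entry
PF_INSTR = 0x10  # 1 = fault on an instruction fetch


def decode_oops_code(code_hex):
    """Decode the hex error code from an 'Oops: NNNN' line (hypothetical helper)."""
    code = int(code_hex, 16)
    return {
        "cause": "protection violation" if code & PF_PROT else "page not present",
        "access": "write" if code & PF_WRITE else "read",
        "mode": "user" if code & PF_USER else "kernel",
        "reserved_bit": bool(code & PF_RSVD),
        "instruction_fetch": bool(code & PF_INSTR),
    }


# Error code taken from the "Oops: 0002 [#1] PREEMPT SMP" line in this report.
print(decode_oops_code("0002"))
```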
Created attachment 528830 [details] Config file of system
Created attachment 528831 [details] Console of a failing system
------- Comment From niv.com 2011-10-18 11:51 EDT------- Red Hat -- Clark Williams is aware of the issue and should be added to bug. Clark, please send over ptr to any bits you want us to test on MRG 2.1 train as mentioned offline yesterday. Also, if you're unable to reproduce this issue on your HS21xm, let us know!
Just re-provisioned one of my hs21xm's and it comes up properly with the MRG 2.0 kernel.

[williams@hs21xm-2 ~]$ lspci | grep -i vga
01:01.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
[williams@hs21xm-2 ~]$ uname -a
Linux hs21xm-2.farm.hsv.redhat.com 2.6.33.9-rt31.76.el6rt.x86_64 #1 SMP PREEMPT RT Tue Oct 4 12:01:48 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
[williams@hs21xm-2 ~]$ cat /proc/cmdline
ro root=/dev/mapper/vg_hs21xm2-lv_root rd_LVM_LV=vg_hs21xm2/lv_root rd_LVM_LV=vg_hs21xm2/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us crashkernel=auto pci=nocrs console=tty0 console=ttyS1,19200 radeon.modeset=0 rhgb quiet

Does your kernel command line look similar to the above?
------- Comment From kravetz.com 2011-10-18 15:36 EDT------- The command line on our system did not contain "pci=nocrs" or "radeon.modeset=0". After adding these arguments, we were able to boot MRG 2.0 kernels (well, at least the .74 build). Did we miss the mention of these arguments in some release notes or other documentation?
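For anyone checking their own blade against this comment: a minimal sketch that reports which of the two workaround arguments are absent from a kernel command line. The helper names (missing_boot_args, read_cmdline) are made up for illustration; the argument list is only the pair discussed in this bug, and whether both are needed on a given machine is not established here.

```python
# The two arguments discussed in this bug; not a general recommendation.
WORKAROUND_ARGS = ("pci=nocrs", "radeon.modeset=0")


def missing_boot_args(cmdline, required=WORKAROUND_ARGS):
    """Return the required arguments that do not appear as whole tokens in cmdline."""
    present = set(cmdline.split())
    return [arg for arg in required if arg not in present]


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    with open(path) as handle:
        return handle.read().strip()


# Hypothetical command line without the workaround, for illustration only.
example = "ro root=/dev/sda1 crashkernel=auto rhgb quiet"
print(missing_boot_args(example))
```

On a live system, missing_boot_args(read_cmdline()) gives the same check against the booted kernel.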
The pci=nocrs argument was in the release notes (along with radeon.hw_i2c=0):
http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_MRG/2/html-single/MRG_Release_Notes/index.html#chap-MRG_Release_Notes-RT
I suspect that pci=nocrs is the key here.
------- Comment From kravetz.com 2011-10-18 15:58 EDT------- They are documented in: http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_MRG/2/pdf/MRG_Release_Notes/Red_Hat_Enterprise_MRG-2-MRG_Release_Notes-en-US.pdf
------- Comment From kravetz.com 2011-10-18 16:13 EDT------- We did not read the release notes. However, upon closer examination, the notes say to add "pci=use_crs", and we cannot find any mention of "pci=nocrs".
------- Comment From niv.com 2011-10-18 17:11 EDT------- I had skimmed the release notes (this is quite a while back now). Nothing I saw had jumped out at me as an urgent or needed item, somehow. I didn't get a "do this else your system will not boot". I'd recommend clarifying the documentation to correct the parameter setting needed (nocrs?) at the very least.
------- Comment From niv.com 2011-10-18 17:19 EDT------- Lowering priority due to workaround via boot params.
Created attachment 529544 [details] HS20 boot hang with pci=nocrs radeon.modeset=0
------- Comment on attachment From pc.com 2011-10-21 13:34 EDT------- I just hit this problem on an HS20. Adding "pci=nocrs" and "radeon.modeset=0" (I don't see either in the release notes) seems to get further, but it panics, oops, or hangs later on in the boot...attaching a boot log.
------- Comment From pc.com 2011-10-21 15:13 EDT------- (In reply to comment #27)
> I just hit this problem on an HS20. Adding "pci=nocrs" and "radeon.modeset=0"
> (I don't see either in the release notes) seems to get further, but it panics,
> oops, or hangs later on in the boot...attaching a boot log.
I posted this too soon. As will be obvious in the attachment, I was running the .67 kernel. I've updated to the .75 kernel, and the HS20 has booted successfully.
Closing as working in the current release
Reopening to look at adding a quirk to deal with HS21 on boot with the 2.0 kernel.
Created attachment 532141 [details] proposed quirk for HS21 boot issue
The attached patch has not been tested; it is only provided to illustrate how we can catch that we're booting on an HS21 and automagically set pci=nocrs. I know that there is further testing going on, so I will hold off on adding this patch to the kernel until I hear more about whether it's always needed, whether it's HS21 and not HS21xm, etc.
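The attached patch is kernel code and is not reproduced here. As a rough userspace analogue of the same idea (match the machine by its DMI product name, then decide whether pci=nocrs applies), a sketch might look like the following. Everything in it is an assumption: the fragment "HS21" is a placeholder because the real DMI strings for these blades are not given in this bug, the helper names are invented, and a plain substring match would also catch "HS21 XM", which is exactly the open question raised above.

```python
# Placeholder fragments; the actual DMI product names are NOT given in this bug.
AFFECTED_PRODUCT_FRAGMENTS = ("HS21",)


def needs_nocrs(product_name, fragments=AFFECTED_PRODUCT_FRAGMENTS):
    """Return True if the product name contains any affected fragment (case-insensitive)."""
    name = product_name.upper()
    return any(fragment.upper() in name for fragment in fragments)


def read_product_name(path="/sys/class/dmi/id/product_name"):
    """Read the DMI product name from sysfs; return '' if it is unavailable."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return ""


# Hypothetical product strings, for illustration only.
for name in ("BladeCenter HS21 XM", "BladeCenter HS22"):
    print(name, "->", "suggest pci=nocrs" if needs_nocrs(name) else "no change")
```

A real quirk would live in the kernel's DMI match tables and run before PCI resource setup; this sketch only shows the matching decision.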
------- Comment From niv.com 2012-02-21 15:50 EDT------- Will see if we can retest on the MRG 2.0 testing train later this week.
------- Comment From niv.com 2012-02-27 14:56 EDT------- Just wanted to confirm with Clark whether there is going to be an MRG 2.0 errata kernel. If there is not, we should close this as WILLNOTFIX.
We have no plans to fix this in the 2.6.33.7 kernel. Just confirmed that the 3.0.x rt kernel series boots properly with any combination of radeon.modeset=0 and pci=nocrs on an hs21xm blade. Closing this as WONTFIX.
------- Comment From niv.com 2012-02-29 19:33 EDT------- Clark, Closing this from our end as well. The only thing we'll do is make sure the future documentation is right on MRG 2.2. If it's not, we'll open a new bug.