Description of problem:
While testing the RHEL5.U2 kernel, system hp-xw9400-02.rhts.boston.redhat.com reports:
BUG: soft lockup - CPU#1 stuck for 16s! [scsi_eh_1:526]

Version-Release number of selected component (if applicable):
2.6.18-67.el5

How reproducible:
Always

Steps to Reproduce:
1. Install RHEL5.U1 on hp-xw9400-02.rhts.boston.redhat.com.
2. Install kernel 2.6.18-67.el5 and reboot.

Actual results:
BUG: soft lockup - CPU#1 stuck for 16s! [scsi_eh_1:526]
CPU 1:
Modules linked in: sata_nv libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 526, comm: scsi_eh_1 Not tainted 2.6.18-76.el5 #1
RIP: 0010:[<ffffffff80063ae8>]  [<ffffffff80063ae8>] _spin_unlock_irqrestore+0x8/0x9
RSP: 0000:ffff81011ed05d78  EFLAGS: 00000286
RAX: 000000000000007f RBX: ffff8100029ee390 RCX: 0000000033f33516
RDX: 000000000000409a RSI: 0000000000000286 RDI: ffff810037f521d8
RBP: ffffffff880bff08 R08: ffff8100029ee8a0 R09: ffff8100029ee420
R10: ffffffff880b6b32 R11: ffffffff880bf145 R12: 0000000000000206
R13: 00000000fffbac9c R14: ffff8100029ee2f0 R15: ffff8100029ee8a0
FS:  000000001a16c8f0(0000) GS:ffff8101023ddac0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000082b9d0 CR3: 0000000000201000 CR4: 00000000000006e0

Call Trace:
 [<ffffffff880c15f1>] :libata:ata_eh_recover+0x518/0xb0f
 [<ffffffff880b686b>] :libata:ata_std_postreset+0x0/0x9e
 [<ffffffff880e3529>] :sata_nv:nv_hardreset+0x0/0x13
 [<ffffffff880b6b32>] :libata:ata_std_softreset+0x0/0x136
 [<ffffffff880ba3f8>] :libata:ata_std_prereset+0x0/0x131
 [<ffffffff880b686b>] :libata:ata_std_postreset+0x0/0x9e
 [<ffffffff880e3529>] :sata_nv:nv_hardreset+0x0/0x13
 [<ffffffff880b6b32>] :libata:ata_std_softreset+0x0/0x136
 [<ffffffff880ba3f8>] :libata:ata_std_prereset+0x0/0x131
 [<ffffffff880c1d8d>] :libata:ata_do_eh+0x3b/0xa6
 [<ffffffff880c28eb>] :libata:ata_scsi_error+0x29b/0x5e8
 [<ffffffff8009ce5f>] keventd_create_kthread+0x0/0xc4
 [<ffffffff88077ff0>] :scsi_mod:scsi_error_handler+0xba/0x4ac
 [<ffffffff88077f36>] :scsi_mod:scsi_error_handler+0x0/0x4ac
 [<ffffffff8009ce5f>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8003252b>] kthread+0xfe/0x132
 [<ffffffff8005cfb1>] child_rip+0xa/0x11
 [<ffffffff8009ce5f>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8003242d>] kthread+0x0/0x132
 [<ffffffff8005cfa7>] child_rip+0x0/0x11

Expected results:
The system should not report a BUG message during normal boot.

Additional info:
I have spoken with peterm and jgarzik about this issue. peterm was unable to reproduce this behavior on his xw9400.
From private email:

Here's the situation, a description of the BIOS/ACPI problem on the xw9400, and a suggestion for a solution.

The xw9400 has two dual-core processors. (Note: to add to your confusion, in the remaining text I am going to refer to cores as "cpus" and to processors as "procs".) When the system boots, the boot order (the order in which the cpus are enumerated) is as follows:

00 - proc 0, cpu 0
10 - proc 1, cpu 0
01 - proc 0, cpu 1
11 - proc 1, cpu 1

Note that enumerating the cpus in this manner is an HP choice, not the preferred AMD choice.

The ACPI _PSD table is a (usually hardcoded) table that describes how cpus are grouped into domains that share a common cpu frequency. On the xw9400 the ACPI table is laid out as follows:

acpi_processor_preregister_performance calling acpi_processor_get_psd on cpu 0
acpi_processor_get_psd: num_entries = 5
acpi_processor_get_psd: revision = 0
acpi_processor_get_psd: domain = 0
acpi_processor_get_psd: coord_type = 253
acpi_processor_get_psd: num_processors = 2
acpi_processor_preregister_performance calling acpi_processor_get_psd on cpu 1
input: AT Translated Set 2 keyboard as /class/input/input0
acpi_processor_get_psd: num_entries = 5
acpi_processor_get_psd: revision = 0
acpi_processor_get_psd: domain = 0
acpi_processor_get_psd: coord_type = 253
acpi_processor_get_psd: num_processors = 2
acpi_processor_preregister_performance calling acpi_processor_get_psd on cpu 2
acpi_processor_get_psd: num_entries = 5
acpi_processor_get_psd: revision = 0
acpi_processor_get_psd: domain = 1
acpi_processor_get_psd: coord_type = 253
acpi_processor_get_psd: num_processors = 2
acpi_processor_preregister_performance calling acpi_processor_get_psd on cpu 3
acpi_processor_get_psd: num_entries = 5
acpi_processor_get_psd: revision = 0
acpi_processor_get_psd: domain = 1
acpi_processor_get_psd: coord_type = 253
acpi_processor_get_psd: num_processors = 2

i.e., the table is:

00 - proc 0, cpu 0
01 - proc 0, cpu 1
10 - proc 1, cpu 0
11 - proc 1, cpu 1

So when acpi_processor_preregister_performance() is called, the OS sets a cpumask describing each frequency domain in the system. From the ACPI table and data, the domains' cpumasks are:

Domain A: 0011
Domain B: 1100

However, going back to the order in which the cpus were enumerated:

00 - proc 0, cpu 0
10 - proc 1, cpu 0
01 - proc 0, cpu 1
11 - proc 1, cpu 1

we can see that the domains' cpumasks should be:

Domain A: 0101
Domain B: 1010

The issue is that the boot order of the cpus does not match the ACPI-provided map of the domains. *This is broken*, and it causes all sorts of chaos in the system. Hard-coding the domains on the xw9400 results in a normally behaving system.

Tony, AFAICT it is up to HP to fix this; this is clearly an ACPI issue, not an OS or AMD issue.

Having said that, we're in the "here and now". We could code a DMI entry so that the OS uses the correct domain cpumasks on this system. I'm obviously open to any other suggestions....
Created attachment 294378
RHEL5 version 1 fix for this patch

This patch resolves the issue on the xw9400 only. However, it is not a general solution, which concerns me.

I think a better solution is the following: rewrite the code so that the ACPI processor performance information is pre-initialized only when Xen is running and/or when a Barcelona or newer processor is detected. That way we won't break any (more) existing systems. Users would still be able to turn this on manually via the powernow-k8.preregister_acpi_perf module option that I have added to this version of the patch...
I added the preregister code specifically because Xen does not provide sibling information correctly (or at all) and there is no way to determine which cores share frequencies. If you disable it by default, most RevF and RevE systems will see time skew with every frequency change. I think I'd rather have a DMI that just turned off Xen power management for xw9400 (and possibly other HP products) unless the customer specifically turns it on with known good BIOS information.
(In reply to comment #3)
> I added the preregister code specifically because Xen does not provide sibling
> information correctly (or at all) and there is no way to determine which cores
> share frequencies. If you disable it by default, most RevF and RevE systems
> will see time skew with every frequency change.
>
> I think I'd rather have a DMI that just turned off Xen power management for
> xw9400 (and possibly other HP products) unless the customer specifically turns
> it on with known good BIOS information.

Good suggestion, Mark, and it's similar to the one dzickus (RHEL5 kernel maintainer) and I came up with. (Sorry for the cut-and-paste.)

+	if (preregister_acpi_perf || cpu_family != CPU_OPTERON) {
+		if (acpi_processor_preregister_performance(acpi_perf_data))
+			return -ENODEV;
+		else
+			preregister_valid = 1;
+	} else
+		printk(KERN_INFO "Disabling ACPI pre-initialization.\n");
+

I tested this, where preregister_acpi_perf is a module parameter that can be set on the boot line with powernow-k8.preregister_acpi_perf={0,1}. However, the second part of the test, cpu_family != CPU_OPTERON, is probably too strict. Is there a better way to test for RevF and higher processors? /me hasn't looked at the code to check and was hoping for a quick answer from Mark or Bhavana ;)

P.
All RevF systems are CPU_OPTERON.
All Barcelona and later systems are CPU_HW_PSTATE.

That doesn't change the fact that there are numerous RevF systems that are not HP xw9400s, and that WILL BREAK if you disable preregister on them for Xen. For example, the Tyan "Anaheim" 2P system that I did the original PowerNow! development on works fine with preregister, but will break without it.

If your patch only turns it off for the xw9400, I'm fine with it. If it turns it off for all RevF systems, I think it is overly broad.
Created attachment 294398
RHEL5 version 2 fix for this patch

(In reply to comment #5)
> All RevF systems are CPU_OPTERON.
> All Barcelona and later systems are CPU_HW_PSTATE
>
> That doesn't change the fact that there are numerous RevF systems that are not
> HP xw9400, and that WILL BREAK if you disable preregister on them for Xen. For
> example, the Tyan "Anaheim" 2P system that I did the original PowerNow!
> development on works fine with preregister, but will break if it does not have it.
>
> If your patch only turns it off for the xw9400, I'm fine with it. If it turns
> it off for all RevF systems, I think it is overly broad.

Mark, here's the issue that we (RH) have to face. When updating the kernel, we absolutely CANNOT, under any circumstances, break existing systems' installs or default behavior. Doing so causes much grief and strife amongst our support group. I doubt that the xw9400 is the only system with this type of issue...

So here's what I'm proposing to make everyone happy:

1. I am going to modify Rik's original code so that it is _off by default_ for OPTERON systems. This maintains the existing behavior of RHEL5.1, which does not make use of the preregister call.
2. I am going to add a kernel parameter (as mentioned above) to turn on the preregister call for OPTERON systems. This allows users to move ahead of RHEL5.1's behavior if they know they have a good BIOS/ACPI table.
3. Xen users on OPTERON systems (assuming a good BIOS/ACPI table) will have to add "powernow-k8.preregister_acpi_perf=1" to the boot arguments when booting Xen.

This patch is much lighter weight than the original patch I proposed....
This bugzilla has Keywords: Regression. Since no regressions are allowed between releases, it is also being proposed as a blocker for this release. Please resolve ASAP.
Created attachment 295438
RHEL5 version 3 fix for this patch

Final patch posted to RHKL.
This patch also fixes the following message, seen during boot:
powernow-k8: error - out of sync, fix 0x12 0x2, vid 0x8 0x12 on
Created attachment 296644
BIOS binary update file

To flash, do the following:
1) Put the binary file (786D6psd.bin) on DOS- or Win98-bootable media.
2) Reboot the system.
3) During POST, press F10 (Setup).
4) Select 'flash system rom' and flash the system with '786D6psd.bin'.
5) Reboot the system.
Two questions:
1) Is the plan to include the workaround in the 5.2 kernel?
2) Have you tested the included BIOS to see whether we've fixed the problem?
Created attachment 297670
RHEL5 version 4 fix

During testing, we found that including this patch causes PV kernels to oops on bootup. The problem is that PV kernels don't have DMI data, so the dmi_get_system_info() call returns NULL, and we then oops in the strncmp() that follows. The attached patch fixes this very simply by bailing out earlier if we are a domU; there is no way we can control the frequency state there anyway.

Chris Lalancette
This is in kernel-2.6.18-86.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5
Confirmed: we haven't seen these messages on RHTS systems for a while, and the fix is in the -91 kernel.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0314.html