Bug 430947 - [RHEL5 U2] Kernel reports BUG: soft lockup - CPU#1 stuck for 16s! [scsi_eh_1:526] during boot
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Prarit Bhargava
QA Contact: Martin Jenner
URL: http://rhts.lab.boston.redhat.com/tes...
Keywords: Regression
Reported: 2008-01-30 14:01 EST by Jeff Burke
Modified: 2008-05-21 11:08 EDT
Fixed In Version: RHBA-2008-0314
Doc Type: Bug Fix
Last Closed: 2008-05-21 11:08:29 EDT

Attachments
RHEL5 version 1 fix for this patch (2.50 KB, patch)
2008-02-08 11:33 EST, Prarit Bhargava
RHEL5 version 2 fix for this patch (1.63 KB, patch)
2008-02-08 14:13 EST, Prarit Bhargava
RHEL5 version 3 fix for this patch (6.90 KB, patch)
2008-02-20 13:47 EST, Prarit Bhargava
BIOS binary update file (1.00 MB, application/octet-stream)
2008-03-03 12:51 EST, Jeff Burrell
RHEL5 version 4 fix (3.84 KB, patch)
2008-03-11 16:16 EDT, Chris Lalancette

Description Jeff Burke 2008-01-30 14:01:55 EST
Description of problem:
 While testing the RHEL5.U2 kernel, system hp-xw9400-02.rhts.boston.redhat.com
reports a BUG: soft lockup - CPU#1 stuck for 16s! [scsi_eh_1:526]

Version-Release number of selected component (if applicable):
 2.6.18-67.el5

How reproducible:
 always

Steps to Reproduce:
1. Install RHEL5.U1 on hp-xw9400-02.rhts.boston.redhat.com
2. Install kernel 2.6.18-67.el5 and reboot.
  
Actual results:
BUG: soft lockup - CPU#1 stuck for 16s! [scsi_eh_1:526]
CPU 1:
Modules linked in: sata_nv libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 526, comm: scsi_eh_1 Not tainted 2.6.18-76.el5 #1
RIP: 0010:[<ffffffff80063ae8>]  [<ffffffff80063ae8>] _spin_unlock_irqrestore+0x8/0x9
RSP: 0000:ffff81011ed05d78  EFLAGS: 00000286
RAX: 000000000000007f RBX: ffff8100029ee390 RCX: 0000000033f33516
RDX: 000000000000409a RSI: 0000000000000286 RDI: ffff810037f521d8
RBP: ffffffff880bff08 R08: ffff8100029ee8a0 R09: ffff8100029ee420
R10: ffffffff880b6b32 R11: ffffffff880bf145 R12: 0000000000000206
R13: 00000000fffbac9c R14: ffff8100029ee2f0 R15: ffff8100029ee8a0
FS:  000000001a16c8f0(0000) GS:ffff8101023ddac0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000082b9d0 CR3: 0000000000201000 CR4: 00000000000006e0

Call Trace:
 [<ffffffff880c15f1>] :libata:ata_eh_recover+0x518/0xb0f
 [<ffffffff880b686b>] :libata:ata_std_postreset+0x0/0x9e
 [<ffffffff880e3529>] :sata_nv:nv_hardreset+0x0/0x13
 [<ffffffff880b6b32>] :libata:ata_std_softreset+0x0/0x136
 [<ffffffff880ba3f8>] :libata:ata_std_prereset+0x0/0x131
 [<ffffffff880b686b>] :libata:ata_std_postreset+0x0/0x9e
 [<ffffffff880e3529>] :sata_nv:nv_hardreset+0x0/0x13
 [<ffffffff880b6b32>] :libata:ata_std_softreset+0x0/0x136
 [<ffffffff880ba3f8>] :libata:ata_std_prereset+0x0/0x131
 [<ffffffff880c1d8d>] :libata:ata_do_eh+0x3b/0xa6
 [<ffffffff880c28eb>] :libata:ata_scsi_error+0x29b/0x5e8
 [<ffffffff8009ce5f>] keventd_create_kthread+0x0/0xc4
 [<ffffffff88077ff0>] :scsi_mod:scsi_error_handler+0xba/0x4ac
 [<ffffffff88077f36>] :scsi_mod:scsi_error_handler+0x0/0x4ac
 [<ffffffff8009ce5f>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8003252b>] kthread+0xfe/0x132
 [<ffffffff8005cfb1>] child_rip+0xa/0x11
 [<ffffffff8009ce5f>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8003242d>] kthread+0x0/0x132
 [<ffffffff8005cfa7>] child_rip+0x0/0x11

Expected results:
 The system should not report a BUG message during normal boot operations.

Additional info:
 I have spoken with peterm and jgarzik about this issue.
peterm was unable to reproduce this behavior on his 9400.
Comment 1 Prarit Bhargava 2008-02-08 09:15:12 EST
From private email:

Here's the situation: a description of the BIOS/ACPI problem on the xw9400, and
a suggestion for a solution.

The xw9400 has two dual-core processors (note: to add to your confusion, in the
remaining text I refer to cores as "cpus" and to processors as "procs").
When the system boots, the boot order of the cpus (the way the cpus are
enumerated) is as follows:

00 - proc 0, cpu 0
10 - proc 1, cpu 0
01 - proc 0, cpu 1
11 - proc 1, cpu 1

Note that the choice to enumerate the cpus in this manner is an HP choice, and
not the preferred AMD choice.

The ACPI _PSD table is a table that is (usually) hardcoded that describes the
way that cpus are grouped together in domains which have a common cpu frequency.

The ACPI table is laid out as follows on the xw9400:

acpi_processor_preregister_performance calling acpi_processor_get_psd on cpu 0
acpi_processor_get_psd: num_entries = 5
acpi_processor_get_psd: revision = 0
acpi_processor_get_psd: domain = 0
acpi_processor_get_psd: coord_type = 253
acpi_processor_get_psd: num_processors = 2
acpi_processor_preregister_performance calling acpi_processor_get_psd on cpu 1
input: AT Translated Set 2 keyboard as /class/input/input0
acpi_processor_get_psd: num_entries = 5
acpi_processor_get_psd: revision = 0
acpi_processor_get_psd: domain = 0
acpi_processor_get_psd: coord_type = 253
acpi_processor_get_psd: num_processors = 2
acpi_processor_preregister_performance calling acpi_processor_get_psd on cpu 2
acpi_processor_get_psd: num_entries = 5
acpi_processor_get_psd: revision = 0
acpi_processor_get_psd: domain = 1
acpi_processor_get_psd: coord_type = 253
acpi_processor_get_psd: num_processors = 2
acpi_processor_preregister_performance calling acpi_processor_get_psd on cpu 3
acpi_processor_get_psd: num_entries = 5
acpi_processor_get_psd: revision = 0
acpi_processor_get_psd: domain = 1
acpi_processor_get_psd: coord_type = 253
acpi_processor_get_psd: num_processors = 2


i.e., the table is

00 - proc 0, cpu 0
01 - proc 0, cpu 1
10 - proc 1, cpu 0
11 - proc 1, cpu 1

So ... when acpi_processor_preregister_performance() is called, the OS sets a
cpumask which describes the frequency domains in the system.

From the ACPI table and data, the domains' cpumasks are:

Domain A: 0011
Domain B: 1100

However, going back to the way the cpus were enumerated,

00 - proc 0, cpu 0
10 - proc 1, cpu 0
01 - proc 0, cpu 1
11 - proc 1, cpu 1

From this data, we can see that the domains' cpumasks should be:

Domain A: 0101
Domain B: 1010

The issue is that the boot order of the cpus does not match the ACPI provided
map for the domains.  *This is broken*.
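
The mismatch can be reproduced with a few lines of arithmetic. This is a minimal
userspace sketch, not kernel code. It assumes that logical cpu N is bit N of a
cpumask (bit 0 printed rightmost) and that both cores of one proc share a
frequency domain. The names boot_order, acpi_psd_domain, domain_masks and
as_bits are made up for illustration.

```python
from collections import defaultdict

# Boot (enumeration) order on the xw9400 as described above:
# logical cpu index -> (proc, core). Illustrative data only.
boot_order = [(0, 0), (1, 0), (0, 1), (1, 1)]

# Domain reported by _PSD for logical cpus 0..3, copied from the log above.
acpi_psd_domain = [0, 0, 1, 1]


def domain_masks(domain_of_cpu):
    """Build one cpumask per domain; logical cpu N sets bit N."""
    masks = defaultdict(int)
    for cpu, domain in enumerate(domain_of_cpu):
        masks[domain] |= 1 << cpu
    return dict(masks)


def as_bits(mask, ncpus=4):
    """Render a mask as a fixed-width binary string, cpu 0 rightmost."""
    return format(mask, "0{}b".format(ncpus))


# Masks the OS builds when it trusts the ACPI data as-is.
from_acpi = domain_masks(acpi_psd_domain)

# Masks implied by the boot order, assuming one domain per proc.
from_topology = domain_masks([proc for proc, _core in boot_order])

for label, masks in (("ACPI", from_acpi), ("boot order", from_topology)):
    for domain in sorted(masks):
        print(label, "domain", domain, as_bits(masks[domain]))
```

Under those assumptions the ACPI view yields 0011 and 1100, while the boot
order yields 0101 and 1010, which is the disagreement described above.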

As you can see this causes all sorts of chaos in the system.

Hard-coding the domains on the xw9400 results in a normal system.

Tony, AFAICT it is up to HP to fix this -- this clearly is an ACPI issue and not
an OS or AMD issue.

Having said that, we're "here and now".  We could code a DMI entry so that the
OS uses the correct domain cpumasks.

I'm obviously open to any other suggestions.... 
Comment 2 Prarit Bhargava 2008-02-08 11:33:34 EST
Created attachment 294378 [details]
RHEL5 version 1 fix for this patch

This patch resolves the issue on only the xw9400.  However, it is not a general
solution -- which causes me concern.

I think a better solution is to do the following: rewrite the code so that the
ACPI processor performance information is only pre-initialized when Xen is
running and/or when a Barcelona or newer processor is detected.

That way we won't break any (more) existing systems.  Users would still have an
option to manually turn this on via the powernow-k8.preregister_acpi_perf
module option that I have added to this version of the patch ...
Comment 3 Mark Langsdorf 2008-02-08 11:48:56 EST
I added the preregister code specifically because Xen does not provide sibling
information correctly (or at all) and there is no way to determine which cores
share frequencies.  If you disable it by default, most RevF and RevE systems
will see time skew with every frequency change.

I think I'd rather have a DMI that just turned off Xen power management for
xw9400 (and possibly other HP products) unless the customer specifically turns
it on with known good BIOS information.
Comment 4 Prarit Bhargava 2008-02-08 13:35:30 EST
(In reply to comment #3)
> I added the preregister code specifically because Xen does not provide sibling
> information correctly (or at all) and there is no way to determine which cores
> share frequencies.  If you disable it by default, most RevF and RevE systems
> will see time skew with every frequency change.
> 
> I think I'd rather have a DMI that just turned off Xen power management for
> xw9400 (and possibly other HP products) unless the customer specifically turns
> it on with known good BIOS information.

Good suggestion, Mark -- and it's similar to the one dzickus (RHEL5 kernel
maintainer) and I came up with.

(Sorry for the cut-and-paste)

+       if (preregister_acpi_perf || cpu_family != CPU_OPTERON) {

I tested with this where preregister_acpi_perf is a module parameter and can be
set on the boot line with powernow-k8.preregister_acpi_perf={0,1} .  However,
the second part of the test, cpu_family != CPU_OPTERON, is probably too 
strict.  Is there a better way that I can test for RevF & higher processors?

/me hasn't looked at the code to check and was hoping for a quick answer from
Mark or Bhavana ;)

+               if (acpi_processor_preregister_performance(acpi_perf_data))
+                       return -ENODEV;
+               else
+                       preregister_valid = 1;
+       } else
+               printk(KERN_INFO "Disabling ACPI pre-initialization.\n");
+

P.
Comment 5 Mark Langsdorf 2008-02-08 13:44:33 EST
All RevF systems are CPU_OPTERON.
All Barcelona and later systems are CPU_HW_PSTATE

That doesn't change the fact that there are numerous RevF systems that are not
HP xw9400, and that WILL BREAK if you disable preregister on them for Xen.  For
example, the Tyan "Anaheim" 2P system that I did the original PowerNow!
development on works fine with preregister, but will break if it does not have it.

If your patch only turns it off for the xw9400, I'm fine with it.  If it turns
it off for all RevF systems, I think it is overly broad.
Comment 6 Prarit Bhargava 2008-02-08 14:13:37 EST
Created attachment 294398 [details]
RHEL5 version 2 fix for this patch

(In reply to comment #5)
> All RevF systems are CPU_OPTERON.
> All Barcelona and later systems are CPU_HW_PSTATE
> 
> That doesn't change the fact that there are numerous RevF systems that are not
> HP xw9400, and that WILL BREAK if you disable preregister on them for Xen.  For
> example, the Tyan "Anaheim" 2P system that I did the original PowerNow!
> development on works fine with preregister, but will break if it does not have it.
> 
> If your patch only turns it off for the xw9400, I'm fine with it.  If it turns
> it off for all RevF systems, I think it is overly broad.

Mark, here's the issue that we (RH) have to face.

When updating the kernel we absolutely CANNOT under any circumstances break
existing systems' installs or default behavior.  To do so causes much grief and
strife amongst our support group.

I doubt that the xw9400 is the only system with this type of an issue...

So here's what I'm proposing to make everyone happy:

1.  I am going to modify Rik's original code so that it is _off by default_ for
OPTERON systems.  This would maintain the existing behavior of RHEL5.1 which
does not make use of the preregister call.

2.  I am going to add a kernel parameter to turn on (as mentioned above) the
call to preregister for OPTERON systems.  This would allow users to move ahead
of RHEL5.1's behavior if they know they have a good BIOS/ACPI table.

3.  Xen with OPTERON users (assuming a good BIOS/ACPI table) will have to add
"powernow-k8.preregister_acpi_perf=1" to the boot args in order to boot in Xen.


This patch is much lighter weight than the original patch I proposed ....
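
For readers following along, the proposal reduces to a small decision table.
The sketch below is only an illustration in Python of the condition quoted in
comment 4 (preregister_acpi_perf || cpu_family != CPU_OPTERON); it is not the
patch itself, and the final patch may differ. The function name
should_preregister is hypothetical, and the two family constants are plain
strings standing in for the kernel's values.

```python
# Stand-ins for the cpu_family values named in comment 5.
CPU_OPTERON = "CPU_OPTERON"      # RevF systems
CPU_HW_PSTATE = "CPU_HW_PSTATE"  # Barcelona and later


def should_preregister(cpu_family, preregister_acpi_perf=False):
    """Return True when the ACPI preregister call would be made.

    Mirrors the quoted condition:
        preregister_acpi_perf || cpu_family != CPU_OPTERON
    so OPTERON is off by default and the module parameter turns it on.
    """
    return bool(preregister_acpi_perf) or cpu_family != CPU_OPTERON


# Print the full decision table for both families and both parameter values.
for family in (CPU_OPTERON, CPU_HW_PSTATE):
    for param in (False, True):
        print(family, "param =", int(param), "->",
              should_preregister(family, param))
```

With this condition, an OPTERON system keeps the RHEL5.1 behavior unless
powernow-k8.preregister_acpi_perf=1 is given, which is what points 1 to 3
describe.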
Comment 8 RHEL Product and Program Management 2008-02-08 14:39:17 EST
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.
Comment 10 Prarit Bhargava 2008-02-20 13:47:57 EST
Created attachment 295438 [details]
RHEL5 version 3 fix for this patch

Final patch posted to RHKL.
Comment 11 Jeff Burke 2008-02-20 14:27:20 EST
This patch also fixes the following message seen during boot.
powernow-k8: error - out of sync, fix 0x12 0x2, vid 0x8 0x12
on 
Comment 12 Jeff Burrell 2008-03-03 12:51:16 EST
Created attachment 296644 [details]
BIOS binary update file

To flash, do the following:
1) put the binary file (786D6psd.bin) on DOS or Win98-bootable media
2) reboot the system
3) during POST, select F10 (setup)
4) select 'flash system rom' and flash the system with '786D6psd.bin'
5) reboot the system
Comment 13 Jeff Burrell 2008-03-10 19:29:27 EDT
Two questions:

1) Is the plan to include the work-around in the 5.2 kernel?
2) Have you tested the included BIOS to see if we've fixed the problem, or not?
Comment 14 Chris Lalancette 2008-03-11 16:16:57 EDT
Created attachment 297670 [details]
RHEL5 version 4 fix

During testing, we found that including this patch causes PV kernels to OOPS on
bootup.  The problem ends up being that PV kernels don't have dmi data, so the
dmi_get_system_info() call returns NULL, and then we OOPS in the following
strncmp.  The attached patch fixes this very simply by just bombing out earlier
if we are a domU; there is no way we can control the frequency state anyway.
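
The failure mode and the early bail-out can be illustrated outside the kernel.
This is a rough Python analogue under stated assumptions, not the attached
patch: the dictionary stands in for DMI data, the helper only imitates
dmi_get_system_info() returning NULL, slicing None plays the role of strncmp()
on a NULL pointer, and the "xw9400" prefix and function names are placeholders.

```python
def dmi_get_system_info(dmi_table, field):
    """Imitation of the kernel helper: None when no DMI data is available."""
    return dmi_table.get(field)


def is_quirked_system(dmi_table, prefix):
    """Unguarded prefix compare, like the strncmp() in the failing path.

    If the lookup returns None, the slice raises TypeError, which is the
    Python analogue of the NULL dereference.
    """
    name = dmi_get_system_info(dmi_table, "product_name")
    return name[:len(prefix)] == prefix


def is_quirked_system_fixed(dmi_table, prefix, is_domu):
    """Bail out before touching DMI data when running as a PV guest (domU)."""
    if is_domu:
        return False
    return is_quirked_system(dmi_table, prefix)


pv_guest_dmi = {}  # a PV kernel has no DMI data in this sketch
bare_metal_dmi = {"product_name": "xw9400 (placeholder string)"}

print(is_quirked_system_fixed(pv_guest_dmi, "xw9400", is_domu=True))
print(is_quirked_system_fixed(bare_metal_dmi, "xw9400", is_domu=False))
```

Checking for domU first means the DMI lookup is never reached in a PV guest,
so the compare never sees a missing string.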

Chris Lalancette
Comment 15 Don Zickus 2008-03-19 12:24:24 EDT
in kernel-2.6.18-86.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
Comment 17 Mike Gahagan 2008-04-29 13:03:08 EDT
Confirmed we haven't seen these messages in rhts systems for a while and that
the fix is in the -91 kernel.
Comment 19 errata-xmlrpc 2008-05-21 11:08:29 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html
