Bug 476184 - RHEL5.3 pv guests crash randomly on reboot orders.
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Hardware: All, OS: Linux
Priority: high, Severity: urgent
Target Milestone: rc
Assigned To: Rik van Riel
QA Contact: Martin Jenner
Keywords: Regression
Reported: 2008-12-12 05:16 EST by Gurhan Ozen
Modified: 2013-11-03 20:38 EST

Doc Type: Bug Fix
Last Closed: 2009-01-20 15:04:11 EST

Attachments
proposed patch to avoid the issue (1.22 KB, patch)
2008-12-15 14:13 EST, Rik van Riel

Description Gurhan Ozen 2008-12-12 05:16:21 EST
Description of problem:
I don't hit this every time, but roughly one in several tries does.
Installed 5.3 x86_64 dom0 and pv guest.
When virsh reboot $guest is issued, the guest sometimes crashes with the following backtrace:

 Checking for hardware changes [  OK  ]
Unable to handle kernel paging request at ffff8800000ce000 RIP: 
 [<ffffffff8020bbb1>] memcmp+0x8/0x22
PGD f5f067 PUD f60067 PMD f61067 PTE 0
Oops: 0000 [1] SMP 
last sysfs file: /class/net/eth0/address
CPU 0 
Modules linked in: powernow_k8 freq_table dm_multipath scsi_dh scsi_mod parport_pc lp parport xennet pcspkr dm_snapshot dm_zero dm_mirror dm_log dm_mod xenblk ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 817, comm: modprobe Not tainted 2.6.18-126.el5xen #1
RIP: e030:[<ffffffff8020bbb1>]  [<ffffffff8020bbb1>] memcmp+0x8/0x22
RSP: e02b:ffff88001d365bf0  EFLAGS: 00010206
RAX: 0000000000000041 RBX: 0000000000000000 RCX: 000000000000000a
RDX: 000000000000000a RSI: ffffffff881760fd RDI: ffff8800000ce000
RBP: ffff88001dd394c0 R08: 0000000000000001 R09: ffff880000098e00
R10: 0000000000000003 R11: 0000000000000000 R12: ffff8800000ce000
R13: 0000000000000000 R14: ffff880000098e00 R15: 00000000fffffff4
FS:  00002ae0b8bc7240(0000) GS:ffffffff805ba000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process modprobe (pid: 817, threadinfo ffff88001d364000, task ffff88001f64b820)
Stack:  ffffffff88174a3c  000000001d365c78  0000000000000003  0000000000000001 
 ffff88001fdb78c0  0000000000000001  ffff88000001fa70  0000000000000001 
 0000000000000000  ffff88000001fa68 
Call Trace:
 [<ffffffff88174a3c>] :powernow_k8:powernowk8_cpu_init+0x55c/0xdec
 [<ffffffff802855c8>] __wake_up_common+0x3e/0x68
 [<ffffffff8028816d>] __cond_resched+0x1c/0x44
 [<ffffffff80263a0d>] _spin_lock_irq+0x9/0x14
 [<ffffffff80262099>] wait_for_completion+0xa1/0xaa
 [<ffffffff80263a0d>] _spin_lock_irq+0x9/0x14
 [<ffffffff8026349f>] __down_write_nested+0x35/0x9a
 [<ffffffff804043f3>] cpufreq_add_dev+0x174/0x57f
 [<ffffffff8021a69c>] vsnprintf+0x559/0x59e
 [<ffffffff802639f9>] _spin_lock_irqsave+0x9/0x14
 [<ffffffff80217548>] release_console_sem+0x1b1/0x205
 [<ffffffff8028b9f5>] vprintk+0x308/0x329
 [<ffffffff80261ead>] thread_return+0x96/0x113
 [<ffffffff80286bc9>] task_rq_lock+0x3f/0x71
 [<ffffffff8028830a>] set_cpus_allowed+0xb2/0xbf
 [<ffffffff8028ba68>] printk+0x52/0xc6
 [<ffffffff8039fb09>] sysdev_driver_register+0x61/0xbd
 [<ffffffff80403423>] cpufreq_register_driver+0xb9/0x194
 [<ffffffff802a01a7>] sys_init_module+0xaf/0x1e8
 [<ffffffff8025f106>] system_call+0x86/0x8b
 [<ffffffff8025f080>] system_call+0x0/0x8b

Code: 0f b6 17 29 c2 89 d0 75 10 48 ff c7 48 ff c6 48 ff c9 48 85 
RIP  [<ffffffff8020bbb1>] memcmp+0x8/0x22
 RSP <ffff88001d365bf0>
CR2: ffff8800000ce000
 <0>Kernel panic - not syncing: Fatal exception

Version-Release number of selected component (if applicable):
# rpm -qa | grep xen

How reproducible:
Reliably. As noted above, roughly one in every several reboot commands hits this.

Additional info:
It's a 32-cpu (don't know how many cores) system:
processor	: 31
vendor_id	: AuthenticAMD
cpu family	: 16
model		: 2
model name	: Quad-Core AMD Opteron(tm) Processor 8356
stepping	: 3
cpu MHz		: 2300.080
cache size	: 512 KB
physical id	: 31
siblings	: 1
core id		: 0
cpu cores	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall mmxext fxsr_opt lm 3dnowext 3dnow constant_tsc pni monitor cx16 lahf_lm cmp_legacy svm cr8_legacy
bogomips	: 5752.70
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm stc [6] [7] [8]

132 GB physical memory.
Comment 2 Rik van Riel 2008-12-12 14:17:47 EST
Is this a regression from 5.2?
Comment 3 Gurhan Ozen 2008-12-12 14:58:56 EST
(In reply to comment #2)
> Is this a regression from 5.2?

As far as I know, yes, it is. I didn't hit this with 5.2. However, I don't know if we ran any 5.2 tests on this particular box.
Comment 4 Rik van Riel 2008-12-12 15:17:57 EST
Gurhan, could I trick you into bisecting the 5.3 development kernels to see when the problem started?

I don't think we changed any Xen cpufreq code between 5.2 and 5.3, but maybe some related code changes broke stuff.
Comment 5 Gurhan Ozen 2008-12-12 15:45:53 EST
Yes you can. Let me know what you'd like me to do.
Comment 6 Rik van Riel 2008-12-12 15:52:48 EST
To begin, I would like to know the kernel version that started breaking :)

This is most easily achieved by picking a kernel somewhere halfway in-between 5.2 and 5.3.  If that one is good, pick the halfway point between that and 5.3, etc. until you find the broken kernel.

It shouldn't be more than a handful of installs and reboots.  I just hope brew hasn't thrown out too many of the intermediate kernels :(
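
The halving procedure described in this comment can be sketched roughly as below. This is only an illustration: first_bad_build() and crashes_from() are hypothetical names, the predicate is a stand-in for "install this build in the guest and reboot it repeatedly", and the sketch assumes that once a build is bad, every later build is bad too.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical predicate type: nonzero means "this build crashes". */
typedef int (*is_bad_fn)(int build, void *ctx);

/*
 * Stand-in predicate for illustration only. A real one would install the
 * build and reboot the guest several times, since the crash is intermittent.
 * ctx points at the (normally unknown) first bad build number.
 */
int crashes_from(int build, void *ctx)
{
    return build >= *(const int *)ctx;
}

/*
 * Return the first bad build in the range (good, bad], where `good` is
 * known to work and `bad` is known to crash. Each iteration tests the
 * midpoint and halves the remaining range.
 */
int first_bad_build(int good, int bad, is_bad_fn is_bad, void *ctx)
{
    assert(good < bad);
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;
        if (is_bad(mid, ctx))
            bad = mid;   /* problem already present at mid */
        else
            good = mid;  /* problem introduced after mid */
    }
    return bad;
}
```

With a few dozen builds between the known-good and known-bad kernels, this needs about six test installs, which fits the "handful of installs and reboots" estimate. Because the crash does not trigger on every reboot, a "good" verdict is only as trustworthy as the number of reboots behind it.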
Comment 7 Gurhan Ozen 2008-12-14 02:14:14 EST
I have bad news for you. I installed a 5.2 pv guest, and tried all these kernels on it:

# rpm -qa | grep kernel-xen

Yes, up to and including -126, which is the kernel that crashed in the bug report. It doesn't crash on the 5.2 guest.

However, the 5.3 guest crashes. Just to make sure that there wasn't something funky with the guest itself, I installed another 5.3 pv guest, and was able to reproduce the issue with the new guest as well.

So the 5.3 distro crashes, but 5.2 doesn't, even with the 5.3 kernel. Any tips for zooming in on what might be causing this?
Comment 8 Rik van Riel 2008-12-15 10:21:10 EST
<jarod> riel: the change from built-in powernow-k8 to modular might be a good area to look at closer...
<riel> jarod: *nod*
<riel> except ... 5.2 userspace with 5.3 kernel works fine
<jarod> 5.2 userspace wouldn't ever load powernow-k8
<riel> good point
<jarod> (if my memory serves correctly)
<riel> I wonder why it's trying to do ACPI-anything at all in a xenU
<riel> there should not be any ACPI thing visible
<jarod> so w/5.2 userspace, setting DRIVER=powernow-k8 (iirc) in the config file should cause it to get loaded
<riel> gozen_: could you try ^^^ ? :)
<jarod> and I suspect the problem would probably appear again
<jarod> I'm pretty sure the only relevant change in cpuspeed from 5.2 to 5.3 was the logic in the initscript to modprobe powernow-k8 when needed
<riel> sounds fair
Comment 9 Gurhan Ozen 2008-12-15 10:24:54 EST
Ok, so I did the same thing for the 5.3 installation and tried with these kernels:

# rpm -q kernel-xen

This issue seems to have started with the 2.6.18-125 kernel; anything before -125 is fine. To be sure, I rebooted the -124 kernel 270 times.
Comment 10 Gurhan Ozen 2008-12-15 13:31:49 EST
riel, trying jarod's suggestion, I was able to crash 5.2 userspace too!

Adding DRIVER=powernow-k8 to /etc/sysconfig/cpuspeed makes this problem happen in 5.2 userspace as well.
Comment 11 Rik van Riel 2008-12-15 13:44:20 EST
With some gdbing on the 126 debuginfo package, the oops is pinpointed to the memcmp in find_psb_table:

(gdb) list *0x1a3c
0x1a3c is in powernowk8_cpu_init (arch/i386/kernel/cpu/cpufreq/powernow-k8.c:701).
696             for (i = 0xc0000; i < 0xffff0; i += 0x10) {
697                     /* Scan BIOS looking for the signature. */
698                     /* It can not be at ffff0 - it is too big. */
700                     psb = phys_to_virt(i);
701                     if (memcmp(psb, PSB_ID_STRING, PSB_ID_STRING_LEN) != 0)
702                             continue;
704                     dprintk("found PSB header at 0x%p\n", psb);
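
To connect the listing with the oops, here is a rough user-space sketch of the same scan, plus a check that the faulting address falls inside the scanned window. Assumptions, not taken from this report: the signature value "AMDK7PNOW!" (10 bytes, which would match RCX/RDX = 0xa in the register dump) is quoted from memory of powernow-k8.h, and the direct-map base 0xffff880000000000 is inferred from the addresses in the trace. scan_for_psb() and fault_in_bios_scan_window() are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PSB_ID_STRING        "AMDK7PNOW!"          /* assumed value */
#define PSB_ID_STRING_LEN    10
#define BIOS_SCAN_START      0xc0000ULL            /* from the listing above */
#define BIOS_SCAN_END        0xffff0ULL
#define ASSUMED_PAGE_OFFSET  0xffff880000000000ULL /* inferred from the oops */

/*
 * Scan a buffer in 16-byte steps for the PSB signature, the way the
 * driver walks the BIOS area. Returns the offset of the first match,
 * or -1 if the signature is not present.
 */
long scan_for_psb(const unsigned char *buf, size_t len)
{
    size_t off;

    assert(buf != NULL);
    for (off = 0; off + PSB_ID_STRING_LEN <= len; off += 0x10) {
        if (memcmp(buf + off, PSB_ID_STRING, PSB_ID_STRING_LEN) == 0)
            return (long)off;
    }
    return -1;
}

/*
 * Nonzero if a faulting direct-map virtual address corresponds to a
 * physical address inside the BIOS window that the driver scans.
 */
int fault_in_bios_scan_window(uint64_t fault_vaddr)
{
    uint64_t phys = fault_vaddr - ASSUMED_PAGE_OFFSET;

    return phys >= BIOS_SCAN_START && phys < BIOS_SCAN_END;
}
```

Under these assumptions, CR2 = ffff8800000ce000 corresponds to physical 0xce000, which is inside the 0xc0000 to 0xffff0 range. The user-space version reads only memory it owns; the kernel loop reads whatever phys_to_virt() hands back for each address, mapped or not.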
Comment 12 Rik van Riel 2008-12-15 14:13:44 EST
Created attachment 327006 [details]
proposed patch to avoid the issue
Comment 13 RHEL Product and Program Management 2008-12-15 14:21:51 EST
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.
Comment 14 Rik van Riel 2008-12-15 14:28:39 EST
Oh boy, the patch linux-2.6-i386-Add-check-for-dmi_data-in-powernow_k8-driver.patch from July 2008 removes roughly the same code that my workaround patch adds.

Prarit, why did you remove those lines of code?
Comment 15 Rik van Riel 2008-12-15 17:14:17 EST
<prarit> riel: clalance & I are chatting about it now (along with dzickus, gozen, and jarod).
<prarit> clalance & I  don't think your suggested patch is correct.
<riel> prarit: ok, then I'll reassign the bug to you
<prarit> riel: sure :)
<riel> prarit: your patch removed that check initially :)
<riel> what I don't know is why it took until -125 to show up as a regression
<prarit> riel: yeah...
<riel> prarit: what would your proposed fix be?
<riel> or ... what is wrong about the patch I proposed? :)
<prarit> riel: In theory on PV guests, there is no dmi data.  So all calls to get anything from dmi_data should be NULL, right?
<prarit> Therefore the powernow-k8 driver should have failed to load because of this piece of code:
<prarit>         if (preregister_acpi_perf == 1 && cpu_family == CPU_OPTERON) {
<prarit>                 char * dmi_data = dmi_get_system_info(DMI_BIOS_VENDOR);
<prarit>                 printk("%s: dmi_data = %s\n", __FUNCTION__, dmi_data);
<prarit>                 if (dmi_data && !strncmp(dmi_data, "Hewlett-Packard", 15)) {
<prarit> #ifdef CONFIG_XEN
<prarit>                         /* Disable cpufreq for HP AMD Opteron systems */
<prarit>                         printk("%s: This BIOS is %s .... disabling cpufreq "
<prarit>                                "support\n", __FUNCTION__, dmi_data);
<prarit>                         return -EPERM;
<prarit> #else
<prarit> But the code is continuing to execute.
<riel> where is that code?
<prarit> arch/i386/kernel/cpu/cpufreq/powernow-k8.c
<riel> what function or line?
<riel> oh found it, in powernowk8_init()
<prarit> powerno...
<prarit> :)
<prarit> Sorry for lag riel -- we're chatting on this end.
<riel> can you try "dmidecode" on gozen's test guest?
<riel> just to be sure
<riel> prarit: oh wait - I see
<riel> prarit: if !dmi_data, that return -EPERM is never taken :)
<riel> prarit: and we fall through to the next code
<prarit> .... clalance has a good issue: How did this EVER work?
<prarit> Because this seems to have just started failing...
<riel> yeah, pure luck
<riel> apparently gozen_ sometimes needs to reboot the guest quite a few times before it hits
<riel> at least we now know the culprit
<riel> and the fix - reinstate the Xen test your patch removed
<prarit> riel: Maybe I'm being dense ;) -- I agree that the code is incorrect, but ... I don't see what is left to chance that this sometimes occurs and sometimes succeeds.
<riel> it sure explains why there's no obvious culprit to be found in the -125 changes
<riel> prarit: I'm not sure either - we have 1 hour to find out
<riel> prarit: or we could spend that hour verifying that reinstating that !is_initial_xendomain() test fixes things
<prarit> I'm worried there is some random corruption going on :/  It should always work or always fail.
<prarit> It seems like doing that would be a band-aid ... /me is nervous
<riel> BIOS-provided physical RAM map:
<riel>  Xen: 0000000000000000 - 000000001fc00000 (usable)
<riel> no BIOS area in a domU e820 map
<riel> so what is at the BIOS addresses is rather random
<prarit> riel: We think we know what's going on -- will drop you an email with patch in 5 mins.
<riel> prarit: ok sweet
<prarit> riel: Basically it's a "what you just said" patch
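
The fall-through discussed in this log can be modelled in a few lines of plain C. This is a simplified sketch, not the driver code or the actual patch: vendor_check_as_quoted() mirrors only the Xen branch of the snippet pasted above, init_with_domain_guard() and its is_dom0 parameter are hypothetical stand-ins for a !is_initial_xendomain() test, and the -EPERM return value of the guard is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Model of the quoted vendor check (CONFIG_XEN case). Only a non-NULL
 * vendor string starting with "Hewlett-Packard" is rejected.
 */
int vendor_check_as_quoted(const char *dmi_data)
{
    if (dmi_data && !strncmp(dmi_data, "Hewlett-Packard", 15))
        return -EPERM;  /* cpufreq disabled on these systems */
    return 0;           /* NULL dmi_data ends up here: init continues */
}

/*
 * Hypothetical guard in the spirit of the discussion: bail out early
 * unless running as the initial domain, before any BIOS scanning.
 */
int init_with_domain_guard(int is_dom0, const char *dmi_data)
{
    if (!is_dom0)
        return -EPERM;  /* assumed error code */
    return vendor_check_as_quoted(dmi_data);
}

/* Expected outcomes, written down as assertions. */
void self_check(void)
{
    /* A PV guest has no DMI data, so the quoted check lets it through. */
    assert(vendor_check_as_quoted(NULL) == 0);
    assert(vendor_check_as_quoted("Hewlett-Packard") == -EPERM);

    /* With the guard, a guest is refused regardless of DMI contents. */
    assert(init_with_domain_guard(0, NULL) == -EPERM);
    assert(init_with_domain_guard(1, NULL) == 0);
}
```

The point of the model is only that a NULL dmi_data skips the -EPERM return, so a check keyed on DMI contents cannot keep the driver out of a guest that has no DMI data at all.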
Comment 17 Don Zickus 2008-12-16 14:15:53 EST
in kernel-2.6.18-127.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
Comment 20 errata-xmlrpc 2009-01-20 15:04:11 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

