Bug 494114 - 2.6.18-128.1.6.el5xen panic!
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.3
Hardware: i686
OS: Linux
Priority: low
Severity: high
Assigned To: Prarit Bhargava
QA Contact: Red Hat Kernel QE team
Keywords: Reopened
Reported: 2009-04-04 09:57 EDT by Alexander Lindqvist
Modified: 2009-09-02 05:01 EDT
CC: 8 users

Doc Type: Bug Fix
Last Closed: 2009-09-02 05:01:10 EDT


Attachments
2.6.18-128.1.6.el5xen panic (166.42 KB, image/jpeg)
2009-04-04 09:57 EDT, Alexander Lindqvist
Just before 2.6.18-128.1.6.el5xen panics (145.32 KB, image/jpeg)
2009-04-04 09:59 EDT, Alexander Lindqvist
Server booting 2.6.18-128.1.6.el5xen filmed with mobile. (3.77 MB, video/3gpp)
2009-04-04 10:02 EDT, Alexander Lindqvist
Capture of serial console (9.48 KB, text/x-log)
2009-04-05 07:13 EDT, Alexander Lindqvist

Description Alexander Lindqvist 2009-04-04 09:57:57 EDT
Created attachment 338165
2.6.18-128.1.6.el5xen panic

Description of problem:
2.6.18-128.1.6.el5xen panics during boot.

Version-Release number of selected component (if applicable):
2.6.18-128.1.6.el5xen

How reproducible:
Always during boot.

Steps to Reproduce:
1. Boot dom0 with the 2.6.18-128.1.6.el5xen kernel.

Actual results:
Panics during boot and restarts the server in an endless loop.
2.6.18-92.1.22.el5xen works.

Expected results:
Boots without crashing.

Additional info:
Two identical ProLiant 6400R servers (both panic).
Server config:
4x Pentium III Xeon 550 MHz, 2 MB cache
HP Smart Array 5304
4 GB RAM
CentOS 5.3
The DomUs were upgraded first to CentOS 5.3 (kernel 2.6.18-128.1.6), and they run fine on this hardware.

Dom0 starts booting the 2.6.18-128.1.6.el5xen kernel but reboots in the middle of the boot process. 2.6.18-92.1.22.el5xen boots fine.
The bare-metal kernel probably has the same problem, but this is untested.
Comment 1 Alexander Lindqvist 2009-04-04 09:59:35 EDT
Created attachment 338166
Just before 2.6.18-128.1.6.el5xen panics
Comment 2 Alexander Lindqvist 2009-04-04 10:02:49 EDT
Created attachment 338167
Server booting 2.6.18-128.1.6.el5xen, filmed with a mobile phone.

Opens in VLC.
Comment 3 Alexander Lindqvist 2009-04-04 10:04:21 EDT
CentOS bugtracker: http://bugs.centos.org/view.php?id=3489
Comment 4 Rik van Riel 2009-04-04 10:25:59 EDT
Are you able to reproduce using RHEL?
Comment 5 Alexander Lindqvist 2009-04-04 12:08:27 EDT
No, as I don't have access to RHEL 5 software.
I could download a test kernel from dzickus or a clalance virttest kernel if you want. If so, please specify which kernel you want me to test.
Comment 6 Chris Lalancette 2009-04-05 04:55:13 EDT
If you could test with the virttest kernels at http://people.redhat.com/clalance/virttest, that would be useful.  That being said, I don't know that we have any fixes in place for something like this, so I'm not that hopeful it will make a difference.  Also if you can test whether the bare-metal kernel has the same problem, that would be useful.

Finally, if at all possible, getting a serial console output of the crash would be extremely useful. While the movie shows the crash, it's too blurry and short to really read the oops output, so it's hard to see what's going on.

Chris Lalancette
Comment 7 Alexander Lindqvist 2009-04-05 07:13:43 EDT
Created attachment 338219
Capture of serial console

Capture of serial console during boot of 2.6.18-128.1.5.el5xen
Comment 8 Alexander Lindqvist 2009-04-05 07:21:34 EDT
Correction: the attachment above is from 2.6.18-128.1.6.el5xen, not 128.1.5 (mistyped).

I tested kernel-xen-2.6.18-137.el5virttest15.i686.rpm, and it has the same problem.
Comment 9 Chris Lalancette 2009-04-05 07:24:33 EDT
OK, great, that's very good info. I've asked one of the PCI bus enumeration experts to have a quick look at this BZ, but in all likelihood it won't be until tomorrow.

Chris Lalancette
Comment 10 Chris Lalancette 2009-04-06 06:27:20 EDT
OK, it seems that there is a patch available that *should* fix this issue. 
I've built a test kernel with it; it's available at:

http://people.redhat.com/clalance/bz494114

Can you give this test kernel a try, and see if it fixes the issue for you?

Thanks,
Chris Lalancette
Comment 13 Alexander Lindqvist 2009-04-06 11:16:27 EDT
That kernel did it!
I'm running both servers on this kernel with 8 paravirt guests now, and so far so good.

Can you confirm which kernel release will contain this bug fix?
Comment 14 Chris Lalancette 2009-04-06 11:35:37 EDT
OK, great, thanks for testing.  This patch is currently slated for 5.4, barring any problems we find with it.  I'm going to close this as a dup of BZ 470202.

Chris Lalancette

*** This bug has been marked as a duplicate of bug 470202 ***
Comment 15 Prarit Bhargava 2009-04-07 09:00:12 EDT
Un-duped by clalance, and POSTed by me.

P.
Comment 16 David Ranch 2009-04-19 10:51:56 EDT
I posted a comment on bug 481500, which tracked this issue (https://bugzilla.redhat.com/show_bug.cgi?id=481500), and Chris Lalancette redirected me here. Since the 5.4 release is still a ways out, can we get the specific PCI enumeration patch for this issue so that we can apply it to the released 5.3 kernels? I'd rather not run a more experimental kernel than I need to. By the way, I can reproduce this problem without the Xen hypervisor, so this is more of a core kernel issue than a virtualization-specific one.
Comment 17 David Ranch 2009-04-19 12:29:50 EDT
By the way, I can confirm that 2.6.18-138.el5virttest16 does boot on my 2x 1 GHz P3 platform, but XVC serial console redirection (ttyS0,9600n1) is broken and only posts the following:
--
Kernel 2.6.18-138.el5virttest16 on an i686
 Filesystem type is ext2fs, partition type 0x83
dhcp-49 login: -2.6.18-138.el5virttest16 ro root=/dev/VolGroup00/LogVol00 console=xvc console=tty xencons=xvc
   [Linux-bzImage, setup=0x1e00, size=0x1beb74]
initrd /initrd-2.6.18-138.el5virttest16.img
   [Linux-initrd @ 0x37cf2000, 0x2fd280 bytes]

ÿ  <---- that's the last character. Initially it looks like a baud mismatch, but no other characters come up, so I don't think Xen is taking over the serial port as expected.
--
Comment 18 Prarit Bhargava 2009-04-19 18:01:50 EDT
(In reply to comment #17)
> Btw, I can confirm that 2.6.18-138.el5virttest16 does boot on my 2x1ghz P3
> platform but XVC serial console redirection (ttyS0,9600n1) is broken and only
> posts the following:
> --
> Kernel 2.6.18-138.el5virttest16 on an i686
>  Filesystem type is ext2fs, partition type 0x83
> dhcp-49 login: -2.6.18-138.el5virttest16 ro root=/dev/VolGroup00/LogVol00 console=xvc console=tty xencons=xvc
>    [Linux-bzImage, setup=0x1e00, size=0x1beb74]
> initrd /initrd-2.6.18-138.el5virttest16.img
>    [Linux-initrd @ 0x37cf2000, 0x2fd280 bytes]
> 
> ÿ  <---- that's the last character.  Initially looks like baud mismatch but no
> other characters come up so I don't think Xen is taking over the serial port as
> expected
> --  

Seems like a new issue, unrelated to this BZ.  Please open a new bugzilla on your issue.

Thanks,

P.
Comment 19 Chris Lalancette 2009-04-22 06:43:04 EDT
(In reply to comment #17)
> Btw, I can confirm that 2.6.18-138.el5virttest16 does boot on my 2x1ghz P3
> platform but XVC serial console redirection (ttyS0,9600n1) is broken and only
> posts the following:
> --
> Kernel 2.6.18-138.el5virttest16 on an i686
>  Filesystem type is ext2fs, partition type 0x83
> dhcp-49 login: -2.6.18-138.el5virttest16 ro root=/dev/VolGroup00/LogVol00 console=xvc console=tty xencons=xvc
>    [Linux-bzImage, setup=0x1e00, size=0x1beb74]
> initrd /initrd-2.6.18-138.el5virttest16.img
>    [Linux-initrd @ 0x37cf2000, 0x2fd280 bytes]
> 
> ÿ  <---- that's the last character.  Initially looks like baud mismatch but no
> other characters come up so I don't think Xen is taking over the serial port as
> expected
> --  

As Prarit said, that's something else.  Although to be honest, I can't imagine what could cause that in the virttest kernels.  In any case, please open up a new BZ, with details of which kernel, which guest, which dom0, and the output from /boot/grub/grub.conf.

Chris Lalancette
Comment 20 Don Zickus 2009-04-27 12:00:39 EDT
This fix is in kernel-2.6.18-141.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.
Comment 23 errata-xmlrpc 2009-09-02 05:01:10 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html
