Bug 494114

Summary: 2.6.18-128.1.6.el5xen panic!
Product: Red Hat Enterprise Linux 5
Reporter: Alexander Lindqvist <alexander>
Component: kernel-xen
Assignee: Prarit Bhargava <prarit>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Priority: low
Version: 5.3
CC: clalance, dzickus, mishu, pasteur, qcai, rhelbugzilla, riel, xen-maint
Keywords: Reopened
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-09-02 09:01:10 UTC
Attachments (description, flags):
2.6.18-128.1.6.el5xen panic (flags: none)
Just before 2.6.18-128.1.6.el5xen panics (flags: none)
Server booting 2.6.18-128.1.6.el5xen filmed with mobile. (flags: none)
Capture of serial console (flags: none)

Description Alexander Lindqvist 2009-04-04 13:57:57 UTC
Created attachment 338165 [details]
2.6.18-128.1.6.el5xen panic

Description of problem:
2.6.18-128.1.6.el5xen panics during boot.

Version-Release number of selected component (if applicable):
2.6.18-128.1.6.el5xen

How reproducible:
Always during boot.

Steps to Reproduce:
1. Boot the server with the 2.6.18-128.1.6.el5xen kernel.
  
Actual results:
The kernel panics during boot and the server restarts in an endless loop.
2.6.18-92.1.22.el5xen works.

Expected results:
The server boots without crashing.

Additional info:
Two identical ProLiant 6400R servers (both panic)
Server config:
4x P3 Xeon 550MHz 2MB Cache
HP SmartArray 5304
4GB RAM
CentOS 5.3
I upgraded the DomUs to CentOS 5.3 kernel 2.6.18-128.1.6 first, and they are running OK on this hardware.

Dom0 starts booting the 2.6.18-128.1.6.el5xen kernel but reboots in the middle of the boot process. 2.6.18-92.1.22.el5xen boots fine.
It is probably the same problem with the bare-metal kernel, but this is untested.

Comment 1 Alexander Lindqvist 2009-04-04 13:59:35 UTC
Created attachment 338166 [details]
Just before 2.6.18-128.1.6.el5xen panics

Comment 2 Alexander Lindqvist 2009-04-04 14:02:49 UTC
Created attachment 338167 [details]
Server booting 2.6.18-128.1.6.el5xen filmed with mobile.

Opens in VLC.

Comment 3 Alexander Lindqvist 2009-04-04 14:04:21 UTC
CentOS bugtracker: http://bugs.centos.org/view.php?id=3489

Comment 4 Rik van Riel 2009-04-04 14:25:59 UTC
Are you able to reproduce using RHEL?

Comment 5 Alexander Lindqvist 2009-04-04 16:08:27 UTC
No, as I don't have access to RHEL 5 software.
I could download a test kernel from dzickus or a clalance virttest kernel if you want. If so, please specify which kernel you want me to test.

Comment 6 Chris Lalancette 2009-04-05 08:55:13 UTC
If you could test with the virttest kernels at http://people.redhat.com/clalance/virttest, that would be useful.  That being said, I don't know that we have any fixes in place for something like this, so I'm not that hopeful it will make a difference.  Also if you can test whether the bare-metal kernel has the same problem, that would be useful.

Finally, if at all possible, getting a serial console output of the crash would be extremely useful.  While the movie shows the crash, it's too blurry and short to really read the oops output, so it's hard to see what's going on.

Chris Lalancette

Comment 7 Alexander Lindqvist 2009-04-05 11:13:43 UTC
Created attachment 338219 [details]
Capture of serial console

Capture of serial console during boot of 2.6.18-128.1.5.el5xen

Comment 8 Alexander Lindqvist 2009-04-05 11:21:34 UTC
The attachment above is from 2.6.18-128.1.6.el5xen (I mistyped the version as 128.1.5).

kernel-xen-2.6.18-137.el5virttest15.i686.rpm was also tested and has the same problem.

Comment 9 Chris Lalancette 2009-04-05 11:24:33 UTC
OK, great, that's very good info.  I've asked one of the PCI bus enumeration experts to have a quick look at this BZ, but in all likelihood it won't be until tomorrow.

Chris Lalancette

Comment 10 Chris Lalancette 2009-04-06 10:27:20 UTC
OK, it seems that there is a patch available that *should* fix this issue. 
I've built a test kernel with it; it's available at:

http://people.redhat.com/clalance/bz494114

Can you give this test kernel a try, and see if it fixes the issue for you?

Thanks,
Chris Lalancette

Comment 13 Alexander Lindqvist 2009-04-06 15:16:27 UTC
That kernel did it!
I'm running both servers on this kernel with 8 paravirt guests now, and so far so good.

Can you confirm which kernel release will contain this bugfix?

Comment 14 Chris Lalancette 2009-04-06 15:35:37 UTC
OK, great, thanks for testing.  This patch is currently slated for 5.4, barring any problems we find with it.  I'm going to close this as a dup of BZ 470202.

Chris Lalancette

*** This bug has been marked as a duplicate of bug 470202 ***

Comment 15 Prarit Bhargava 2009-04-07 13:00:12 UTC
Un-duped by clalance, and POSTed by me.

P.

Comment 16 David Ranch 2009-04-19 14:51:56 UTC
I posted a comment on bug 481500, which tracked this issue (https://bugzilla.redhat.com/show_bug.cgi?id=481500), and Chris Lalancette redirected me here.  Since the 5.4 release is still a ways out, can we get the specific PCI enumeration patch for this issue so that we can apply it to the released 5.3 kernels?  I'd rather not run a more experimental kernel than I need to.  Btw, I can reproduce this problem without the Xen HV, so this is more of a core kernel issue than a virtualization-specific one.

Comment 17 David Ranch 2009-04-19 16:29:50 UTC
Btw, I can confirm that 2.6.18-138.el5virttest16 does boot on my 2x 1 GHz P3 platform, but XVC serial console redirection (ttyS0,9600n1) is broken and only posts the following:
--
Kernel 2.6.18-138.el5virttest16 on an i686
 Filesystem type is ext2fs, partition type 0x83
dhcp-49 login: -2.6.18-138.el5virttest16 ro root=/dev/VolGroup00/LogVol00 conso
le=xvc console=tty xencons=xvc
   [Linux-bzImage, setup=0x1e00, size=0x1beb74]
initrd /initrd-2.6.18-138.el5virttest16.img
   [Linux-initrd @ 0x37cf2000, 0x2fd280 bytes]

ÿ  <---- that's the last character.  Initially it looks like a baud mismatch, but no other characters come up, so I don't think Xen is taking over the serial port as expected.
--

Comment 18 Prarit Bhargava 2009-04-19 22:01:50 UTC
(In reply to comment #17)

Seems like a new issue, unrelated to this BZ.  Please open a new bugzilla on your issue.

Thanks,

P.

Comment 19 Chris Lalancette 2009-04-22 10:43:04 UTC
(In reply to comment #17)

As Prarit said, that's something else.  Although to be honest, I can't imagine what could cause that in the virttest kernels.  In any case, please open up a new BZ, with details of which kernel, which guest, which dom0, and the output from /boot/grub/grub.conf.

Chris Lalancette

Comment 20 Don Zickus 2009-04-27 16:00:39 UTC
This fix is included in kernel-2.6.18-141.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.
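
A rough way to check whether a given kernel release string is at or past the -141 build named above is sketched below. The names `build_tuple` and `has_fix` are made up for this example, and the comparison is only an approximation: it looks at the upstream version and the first build number, so test kernels that carry the patch separately (such as the bz494114 or virttest builds mentioned earlier) or any later backport to an older build series would not be recognised.

```python
import re

# From the comment above: the fix is in kernel-2.6.18-141.el5.
FIXED_BUILD = (2, 6, 18, 141)


def build_tuple(release):
    """Parse '2.6.18-128.1.6.el5xen' into (2, 6, 18, 128).

    Only the upstream version and the first build number are kept;
    anything after that (z-stream digits, 'el5xen', ...) is ignored.
    """
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(g) for g in m.groups())


def has_fix(release):
    """Return True if the release is at or past the -141 build (approximate)."""
    return build_tuple(release) >= FIXED_BUILD


if __name__ == "__main__":
    for rel in ("2.6.18-92.1.22.el5xen", "2.6.18-128.1.6.el5xen", "2.6.18-141.el5"):
        print(rel, has_fix(rel))
```

On a running system the release string can be taken from `uname -r`.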

Comment 23 errata-xmlrpc 2009-09-02 09:01:10 UTC
An advisory has been issued which should help with the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html