Bug 691339
| Summary: | RHEL6.1 HVM guest with hda+hdc or hdb+hdd crashes; plus hdb/hdd are mapped incorrectly to xvde |
|---|---|
| Product: | Red Hat Enterprise Linux 6 |
| Component: | kernel |
| Version: | 6.1 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Status: | CLOSED ERRATA |
| Severity: | high |
| Priority: | high |
| Reporter: | Yuyu Zhou <yuzhou> |
| Assignee: | Andrew Jones <drjones> |
| QA Contact: | Virtualization Bugs <virt-bugs> |
| CC: | ddutile, drjones, jhunt, leiwang, mshao, pbonzini, qwan, yuzhang |
| Target Milestone: | rc |
| Fixed In Version: | kernel-2.6.32-130.el6 |
| Doc Type: | Bug Fix |
| Last Closed: | 2011-05-23 20:49:41 UTC |
Description — Yuyu Zhou, 2011-03-28 09:59:28 UTC
Created attachment 488112 [details] — console log
Created attachment 488113 [details] — dmesg log
Created attachment 488114 [details] — xm dmesg log
Created attachment 488115 [details] — xend log
Created attachment 488116 [details] — qemu log
Created attachment 488117 [details] — hvm configure file
---

(In reply to comment #0)
> Additional info:
> RHEL6.1-20110317.1 HVM guest boot and run successfully on the same condition
> for both 32bit and 64 bit.

This doesn't seem right. First, there isn't a RHEL6.1-20110317.1 on the nay repo server to check the kernel release against. There is a RHEL6.1-20110315.n.0, which has kernel -122, but it doesn't matter, because I've reproduced the guest crash with -120.

It's true that it may look like a regression between 6.0 and 6.1, though. 6.1 uses pv-drivers, not qemu, by default, so it has trouble with configs that use the special hdc and hdd names. There are two ways to deal with this (see the sketch after the next comment):

1) Change hdc to xvdc. This makes the most sense, as you continue using pv-drivers.
2) Boot with xen_emul_unplug=never on the kernel command line. This turns off pv-drivers and returns the qemu behaviour of 6.0.

I've added ddutile to the CC so he can add any other comments. I know this issue was considered, but perhaps in the end it was decided that breaking these configs was a necessary evil when changing the default to pv-drivers.

---

Additional info:

[1] The kernel for RHEL6.1-20110317.1 is -122:
http://download.englab.nay.redhat.com/pub/rhel/rel-eng/RHEL6.1-20110317.1/6/Server/x86_64/os/Server/Packages/listing

[2] The reporter said it can't be reproduced with RHEL6.1-20110317.1 because we already append 'xen_emul_unplug=never' to the kernel parameters of those images. So the workaround of booting with xen_emul_unplug=never works, just as Drew said.

Hi Don, do you think the changes to the xen PV drivers in RHEL 6.1 (including the xen_emul_unplug behaviour) should be included in the RHEL 6.1 release/tech notes? I checked the Release Notes; it's not mentioned:
http://documentation-stage.bne.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/6.1_Release_Notes/virtualization.html
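For concreteness, here is a minimal sketch of the two workarounds Drew describes above. The image paths, device names, and kernel version are illustrative placeholders, not taken from the reporter's setup:

```
# Workaround 1: use PV device names in the guest config (hdc -> xvdc)
disk = [ "file:/var/lib/xen/images/guest.img,xvda,w",
         "file:/var/lib/xen/images/data.img,xvdc,w" ]

# Workaround 2: disable PV unplug on the guest's kernel line (grub.conf),
# restoring the emulated qemu disks as in 6.0
kernel /vmlinuz-2.6.32-125.el6.x86_64 ro root=/dev/VolGroup00/LogVol00 xen_emul_unplug=never
```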
---

Andrew:
Why does unplug not unplug the hdc & hdd drives?
The loop goes from 0->3, which should include hdc & hdd.
Scanning through qemu, it appears that hd[2], which == hdc, is
_always_ set_type_hint()'d to cdrom, which will prevent it from being
unplugged and will cause a problem on boot.
Should we remove:

```
if (s->is_cdrom)
    continue; /* cdrom */
```
in _ide_unplug_harddisks(), since pv-cdrom isn't supported anyhow?
Or should the is_cdrom check start a deeper probe of the device to
see whether the 'hint' is valid (verify it's a cdrom vs. a blkdev)?
(A sketch of the loop in question follows this comment.)
Could the reporter also show the boot log of the guest when it fails?
Capture it with 'xm console <guest-id>' (and add console=tty0 console=ttyS0,115200,n8 to the guest boot line).
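Below is a hypothetical reconstruction of the unplug loop under discussion. It is a self-contained illustration, not the actual qemu-xen source: the table layout, field names, and the printf stand-ins for the real unregistration are assumptions made for the sketch.

```c
#include <stdio.h>

/* Stand-in for qemu's per-drive state; only the fields the sketch
 * needs. In real qemu the cdrom hint is recorded at machine setup
 * via the set_type_hint() call mentioned above. */
struct ide_drive {
    const char *name;
    int present;    /* is a backing image attached? */
    int is_cdrom;   /* cdrom type hint */
};

/* Slots 0..3 map to hda..hdd. Per the discussion above, slot 2
 * (hdc) always carries the cdrom hint -- even when the config
 * actually attached a plain hard disk there. */
static struct ide_drive hd_table[4] = {
    { "hda", 1, 0 },
    { "hdb", 1, 0 },
    { "hdc", 1, 1 },   /* really a hard disk in the failing config */
    { "hdd", 0, 0 },
};

static void ide_unplug_harddisks(void)
{
    int i;

    for (i = 0; i < 4; i++) {          /* the 0 -> 3 loop */
        if (!hd_table[i].present)
            continue;                  /* empty slot */
        if (hd_table[i].is_cdrom) {
            /* The check in question: a mis-hinted disk is silently
             * left plugged in, so both the emulated IDE disk and
             * the PV disk can end up visible to the guest. */
            printf("%s: skipped (cdrom hint), stays emulated\n",
                   hd_table[i].name);
            continue;
        }
        printf("%s: unplugged, handed over to blkfront\n",
               hd_table[i].name);
    }
}

int main(void)
{
    ide_unplug_harddisks();
    return 0;
}
```

Run against a failing hda+hdb+hdc config, the sketch shows hdc being skipped: the hint, not the actual media type, decides whether a drive is unplugged.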
---

Don: The log named "console log" is the boot log you need.

---

OK, I tried to reproduce it on my RHEL 5 test setup:

- dom0: kernel-xen-250, xen-3.0.3-126
- RHEL 6 guest: 2.6.32-125

and it all worked for me. I added the hd disk via virt-manager, and my disk spec is:

```
disk = [ "file:/var/lib/xen/images/rhel6do1-64-hvm-3.img,hda,w",
         ",hdc:cdrom,r",
         "file:/guest-images/rhel5_64_fv.dsk2,hdb,w" ]
```

Here's what `dmesg | grep xvd` says:

```
 from /dev/hd[a-d] to /dev/xvd[a-d]
blkfront: xvda: barriers disabled
 xvda: xvda1 xvda2
blkfront: xvde: barriers disabled
 xvde: xvde1 xvde2
dracut: Scanning devices xvda2 xvde2 for LVM logical volumes vg_rhel664hvm3/lv_root vg_rhel664hvm3/lv_swap
EXT4-fs (xvda1): mounted filesystem with ordered data mode
SELinux: initialized (dev xvda1, type ext4), uses xattr
```

Q: Did you upgrade the RHEL 5 dom0 and use an existing RHEL 6 guest that was installed and working as hda/rtl8139 at first, and then switched to xvd* & vnif? I did a clean install of a RHEL 6 guest and then added the additional hd disk to it.

---

## Summary:

Only the QEMU IDE disks will crash the guest; the QEMU cdrom works fine.

## Details:

Disk specs like the following work fine:

```
disk = [ "file:/var/lib/xen/images/rhel6do1-64-hvm-3.img,hda,w",
         ",hdc:cdrom,r",
         "file:/guest-images/rhel5_64_fv.dsk2,hdb,w" ]
```

or

```
disk = [ "file:/var/lib/xen/images/rhel6do1-64-hvm-3.img,hda,w",
         "file/path/iso,hdc:cdrom,r",
         "file:/guest-images/rhel5_64_fv.dsk2,hdb,w" ]
```

But disk specs such as:

```
disk = [ "file:/path/your_vm.raw,hda,w",
         "file:/path/blank_disk1.raw,hdb,w",
         "file:/path/blank_disk2.raw,hdc,w" ]
```

or

```
disk = [ "file:/path/your_vm.raw,hda,w",
         "file:/path/blank_disk1.raw,hdb,w",
         "file:/path/blank_disk2.raw,hdd,w" ]
```

will crash the guest.

---

*** Bug 690763 has been marked as a duplicate of this bug. ***

---

From the console output we have:

```
WARNING: at fs/sysfs/dir.c:491 sysfs_add_one+0xc9/0x130() (Not tainted)
sysfs: cannot create duplicate filename '/devices/virtual/bdi/202:0'
...
kobject_add_internal failed for 202:0 with -EEXIST, don't try to register things with the same name in the same directory.
...
kernel BUG at fs/sysfs/group.c:65!
```

The BUG follows from the WARNING because it comes from an assert that the sysfs dirent is not null. It is null, of course, because the dirent didn't get created, due to EEXIST. So the question is: why is the same minor number of 0 getting used twice when we have a third hd device in the config?

I also dumped a core of the guest after the crash and see that the kobject it's BUGing on has the name "xvda", which works fine when we have only one or two devices. That indicates to me that for the third device the xenbus is somehow presenting xvda twice. That theory is further backed by the console output, which looks like the following right before the WARNING:

```
xlblk_init: register_blkdev major: 202
alloc irq_desc for 17 on node 0
alloc kstat_irqs on node 0
blkfront: xvda: barriers disabled
alloc irq_desc for 18 on node 0
alloc kstat_irqs on node 0
 xvda: xvda1 xvda2
alloc irq_desc for 19 on node 0
alloc kstat_irqs on node 0
blkfront: xvde: barriers disabled
 xvde: xvde1
blkfront: xvda: barriers disabled
```

So we see "blkfront: xvda: barriers disabled" twice. With only two devices, which boots up, we have these logs instead:

```
xlblk_init: register_blkdev major: 202
alloc irq_desc for 17 on node 0
alloc kstat_irqs on node 0
alloc irq_desc for 18 on node 0
alloc kstat_irqs on node 0
blkfront: xvda: barriers disabled
vbd vbd-5632: 19 xenbus_dev_probe on device/vbd/5632
 xvda: xvda1 xvda2
blkfront: xvde: barriers disabled
 xvde: xvde1
```

"blkfront: xvda: barriers disabled" is only output once.
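To see why a third emulated disk makes blkfront register "xvda" twice, here is a hypothetical, self-contained model of the index arithmetic (not the actual xen-blkfront source; the function names and the simplified "fixed" mapping are assumptions for illustration). Xen encodes an emulated IDE disk's vbd number as (major << 8) | minor, with 64 minors per IDE disk, while xvd* devices use 16 minors per disk. Applying the xvd shift of 4 to every major collapses hda and hdc onto index 0 and pushes hdb and hdd to index 4, i.e. xvde:

```c
#include <stdio.h>

#define XEN_IDE0_MAJOR   3   /* hda, hdb */
#define XEN_IDE1_MAJOR  22   /* hdc, hdd */

/* Pre-fix behaviour: every vbd is treated like a native xvd*
 * device, 16 minors per disk (shift 4), major ignored. */
static int broken_index(int major, int minor)
{
    (void)major;
    return minor >> 4;
}

/* Post-fix behaviour, simplified: IDE majors use their own
 * 64-minors-per-disk spacing (shift 6), and hdc/hdd are offset
 * past hda/hdb so the "letter" is preserved. */
static int fixed_index(int major, int minor)
{
    switch (major) {
    case XEN_IDE0_MAJOR: return (minor >> 6);      /* hda=0, hdb=1 */
    case XEN_IDE1_MAJOR: return (minor >> 6) + 2;  /* hdc=2, hdd=3 */
    default:             return minor >> 4;        /* native xvd* */
    }
}

int main(void)
{
    const struct { const char *name; int major, minor; } vbds[] = {
        { "hda", XEN_IDE0_MAJOR,  0 },
        { "hdb", XEN_IDE0_MAJOR, 64 },
        { "hdc", XEN_IDE1_MAJOR,  0 },
        { "hdd", XEN_IDE1_MAJOR, 64 },
    };
    unsigned i;

    for (i = 0; i < sizeof(vbds) / sizeof(vbds[0]); i++) {
        int vbd = (vbds[i].major << 8) | vbds[i].minor;
        printf("%s (vbd-%d): broken -> xvd%c, fixed -> xvd%c\n",
               vbds[i].name, vbd,
               'a' + broken_index(vbds[i].major, vbds[i].minor),
               'a' + fixed_index(vbds[i].major, vbds[i].minor));
    }
    return 0;
}
```

The "broken" column reproduces every symptom in this report: hda and hdc both become xvda (hence the duplicate sysfs name and the crash), and hdb and hdd both land on xvde (hence the mysterious mapping). Note that the "vbd-5632" in the two-device log above is exactly (22 << 8) | 0, i.e. hdc.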
---

hda and hdc share the same minor number:

```
$ MAKEDEV -d `pwd` xvda xvdb xvdc xvdd
$ ls -l hd? xvd?
brw-r----- 1 root disk   3,  0 Apr  1 09:23 hda
brw-r----- 1 root disk   3, 64 Apr  1 09:23 hdb
brw-r----- 1 root disk  22,  0 Apr  1 09:23 hdc
brw-r----- 1 root disk  22, 64 Apr  1 09:23 hdd
brw-r----- 1 root disk 202,  0 Apr  1 09:23 xvda
brw-r----- 1 root disk 202, 16 Apr  1 09:23 xvdb
brw-r----- 1 root disk 202, 32 Apr  1 09:23 xvdc
brw-r----- 1 root disk 202, 48 Apr  1 09:23 xvdd
```

We need to fix the kernel to preserve the "letter" rather than the minor.

---

(In reply to comment #17)
> hda and hdc share the same minor number:

Ah, indeed. I see that upstream addressed this ("Prevent adding a disk with the same (major, minor) [and hence the same name and sysfs entries, which leads to oopses]") as part of a fix for a different issue ("Prevent prematurely freeing 'struct blkfront_info' instances (when the xenbus data structures are gone, but the Linux ones are still needed)"). The patch is:

```
commit 0e34582699392d67910bd3919bc8fd9bedce115e
Author: Jan Beulich <jbeulich>
Date:   Sat Aug 7 18:28:55 2010 +0200

    blkfront: fixes for 'xm block-detach ... --force'
```

I'll see if a backport of this resolves this bug.

---

It would likely fix the oops, and it's likely a good thing to have in pvops, but not the bug of hdb/hdd being mapped mysteriously to xvde. I think the right fix is to correct the minor numbers (>> 2).

---

(In reply to comment #19)
> It would likely fix the oops, and it's likely a good thing to have it in pvops,
> but not the bug of hdb/hdd being mapped mysteriously to xvde. I think the
> right fix is to correct the minor numbers (>> 2).

You're right again. I looked more at upstream xen-blkfront and found a better patch to backport (cherry-pick):

```
commit c80a420995e721099906607b07c09a24543b31d9
Author: Stefano Stabellini <stefano.stabellini.com>
Date:   Thu Dec 2 17:55:00 2010 +0000

    xen-blkfront: handle Xen major numbers other than XENVBD
```

The previous one is a good one to have too, but it doesn't solve the issue at hand; this one does. I'll post this one for 6.1 to avoid regressing guests with hd? configs, and add the other one to a list of backports to consider for later.

---

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

---

Yes, this one is the right one. Thanks!

---

Patch(es) available on kernel-2.6.32-130.el6

---

Tested with 2.6.32-130.el6. Configure the guest disks like this:

```
disk = [ 'file:/xen-autotest/xen-autotest/client/tests/xen/images/RHEL-Server-6.1-64-hvm.raw,hda,w',
         'file:/root/test.img,hdb,w',
         'file:/root/test2.img,hdc,w' ]
```

With the -130 kernel the guest boots successfully; with the -128 kernel the guest crashes. Changing this bug to VERIFIED.

---

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html

---

*** Bug 718069 has been marked as a duplicate of this bug. ***