+++ This bug was initially created as a clone of Bug #217243 +++
Description of problem:
When running xm block-attach to a PV domU, if the attach fails, the device is
not un-set properly. In particular, the following sequence causes a problem:
1. Start up RHEL-5 domU.
2. In the domU, run "modprobe sd_mod"
3. On the dom0, run "xm block-attach rhel5-file file://tmp/testblock.img"
The block-attach command will seemingly succeed. However, in the domU, you will
see the following error messages:
Registering block device major 8
register_blkdev: cannot get major 8 for sd
xen_blk: can't get major 8 with name sd
vbd vbd-2160: 19 xlvbd_add at /local/domain/0/backend/vbd/1/2160
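For reference, the failure is a block-major clash: sd_mod from step 2 already owns major 8 ("sd"), so xen_blk cannot register the same major for the new vbd. A minimal sketch of that check (the major_taken helper is only an illustration, not part of any Xen tooling):

```shell
# Check /proc/devices-style input for an already-registered major number.
# Illustrative helper only, not part of Xen.
major_taken() {
    want="$1"
    awk -v m="$want" '$1 == m { found = 1 } END { exit !found }'
}

# In the domU: major_taken 8 < /proc/devices && echo "major 8 already in use"
```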
Now, trying to run "xm block-detach rhel5-file /dev/sda" on the dom0 will say:
Error: Device /dev/sda not connected
Usage: xm block-detach <Domain> <DevId>
Destroy a domain's virtual block device.
It *thinks* the block device isn't set up, but trying to re-attach with the same
file spits out an error (I don't have it right this moment; I'll attach it later).
So there is no (easy) way to detach the broken block device.
I did a little bit of debugging on this. On the domU side, when the "cannot get
major 8 with name sd" message is printed, it looks like the kernel correctly
writes an error into the xenstore. The problem is that the scripts on the dom0
side never check for the error in the xenstore, and hence never know that it
wasn't set up properly. In particular the /etc/xen/scripts/block script doesn't
actually check for any errors.
I think the solution here is to properly check for errors during block-attach,
and on failure tear down the loop device and remove the xenstore entries on the
dom0. That would also get rid of the xm block-detach problem.
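A rough sketch of what that error check and teardown could look like for /etc/xen/scripts/block; the "error" node name and the xenstore_read/xenstore_rm helper names are assumptions about the hotplug environment, not the verified script API:

```shell
# Hedged sketch of a post-attach error check for /etc/xen/scripts/block.
# The "error" node and the xenstore_read/xenstore_rm helpers are assumed.

# Succeeds (returns 0) if the frontend recorded an error for this device.
vbd_attach_failed() {
    xspath="$1"                        # e.g. backend/vbd/1/2160
    err=$(xenstore_read "$xspath/error" 2>/dev/null)
    [ -n "$err" ]                      # non-empty error node => attach failed
}

# On failure, release the loop device and drop the half-created entries.
vbd_cleanup() {
    xspath="$1"; loopdev="$2"
    losetup -d "$loopdev" 2>/dev/null
    xenstore_rm "$xspath"
}
```

With something like this in place, a failed attach would leave no loop device and no stale xenstore entries behind, so a later block-detach or re-attach would start clean.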
-- Additional comment from firstname.lastname@example.org on 2006-11-29 14:32 EST --
Chris, what do the relevant parts of xenstore-ls look like?
I'm also unable to detach them, but due to a different problem: the kernel gives
the error message, yet hotplug-status in xenstore appears as "connected" (which
partially explains it).
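That node can be inspected from dom0 with the standard xenstore-read tool; a small path helper (an illustration only, with the domain and device IDs taken from the error message in the report):

```shell
# Build the backend xenstore path for a vbd; layout as seen in the
# "vbd vbd-2160" message above. Illustrative helper, not Xen API.
vbd_backend_path() {
    domid="$1"; devid="$2"
    echo "/local/domain/0/backend/vbd/$domid/$devid"
}

# From dom0:
#   xenstore-read "$(vbd_backend_path 1 2160)/hotplug-status"
# If this prints "connected" even though the frontend failed, xend will
# refuse the block-detach, matching the behaviour described above.
```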
Created attachment 142776 [details]
fix for it
Briefly, the problem is: when the frontend finds any problem, it begins the
Closing protocol. The same thing happens on migration, and the backend has no
simple way to differentiate between the two cases. This is done through an
"online = 1" state in xenstore; however, the frontend is not able to set its
own state to "online = 0", as that node lives in the backend.
The solution is to check whether the frontend is okay in the transition from
Closing to Closed, and properly unregister the device as we would do in case of a
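For anyone following the transition in xenstore: Closing and Closed are numeric XenbusState values from xen's public xenbus.h. A small sketch to decode them (the example path in the comment is an assumption about the frontend layout):

```shell
# Map a XenbusState number (as stored in a device's "state" node) to its
# name; values per xen's public io/xenbus.h.
xenbus_state_name() {
    case "$1" in
        1) echo Initialising ;;
        2) echo InitWait ;;
        3) echo Initialised ;;
        4) echo Connected ;;
        5) echo Closing ;;
        6) echo Closed ;;
        *) echo Unknown ;;
    esac
}

# e.g. xenbus_state_name "$(xenstore-read /local/domain/1/device/vbd/2160/state)"
```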
Created attachment 143570 [details]
addition of --force option
The upstream solution will most probably go through the addition of a --force
option. Here's the patch for it. Waiting for upstream status...
QE ack for RHEL5.
xen-3.0.3-22.el5 included in 20070125.0.