+++ This bug was initially created as a clone of Bug #217243 +++
Description of problem:
When running xm block-attach to a PV domU, if the attach fails, the device is
not un-set properly. In particular, the following sequence causes a problem:
1. Start up RHEL-5 domU.
2. In the domU, run "modprobe sd_mod"
3. On the dom0, run "xm block-attach rhel5-file file://tmp/testblock.img"
The block-attach command will seemingly succeed. However, in the domU, you will
see the following error messages:
Registering block device major 8
register_blkdev: cannot get major 8 for sd
xen_blk: can't get major 8 with name sd
vbd vbd-2160: 19 xlvbd_add at /local/domain/0/backend/vbd/1/2160
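For reference, the failure is a block-major clash: sd_mod from step 2 already owns major 8 ("sd"), so xen_blk cannot register the same major for the new vbd. A minimal sketch of that check (the major_taken helper is only an illustration, not part of any Xen tooling):

```shell
# Check /proc/devices-style input for an already-registered major number.
# Illustrative helper only, not part of Xen.
major_taken() {
    want="$1"
    awk -v m="$want" '$1 == m { found = 1 } END { exit !found }'
}

# In the domU: major_taken 8 < /proc/devices && echo "major 8 already in use"
```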
Now, trying to run "xm block-detach rhel5-file /dev/sda" on the dom0 will say:
Error: Device /dev/sda not connected
Usage: xm block-detach <Domain> <DevId>
Destroy a domain's virtual block device.
It *thinks* the block device isn't set up, but trying to re-attach with the same
file spits out an error (I don't have it right this moment; I'll attach it later).
So there is no (easy) way to detach the broken block device.
I did a little bit of debugging on this. On the domU side, when the "cannot get
major 8 with name sd" message is printed, it looks like the kernel correctly
writes an error into the xenstore. The problem is that the scripts on the dom0
side never check for the error in the xenstore, and hence never know that it
wasn't set up properly. In particular the /etc/xen/scripts/block script doesn't
actually check for any errors.
I think the solution here is to properly check for errors during block-attach,
and on failure tear down the loop device and remove the xenstore entries on the
dom0. That would also get rid of the xm block-detach problem.
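A rough sketch of what that error check and teardown could look like for /etc/xen/scripts/block; the "error" node name and the xenstore_read/xenstore_rm helper names are assumptions about the hotplug environment, not the verified script API:

```shell
# Hedged sketch of a post-attach error check for /etc/xen/scripts/block.
# The "error" node and the xenstore_read/xenstore_rm helpers are assumed.

# Succeeds (returns 0) if the frontend recorded an error for this device.
vbd_attach_failed() {
    xspath="$1"                        # e.g. backend/vbd/1/2160
    err=$(xenstore_read "$xspath/error" 2>/dev/null)
    [ -n "$err" ]                      # non-empty error node => attach failed
}

# On failure, release the loop device and drop the half-created entries.
vbd_cleanup() {
    xspath="$1"; loopdev="$2"
    losetup -d "$loopdev" 2>/dev/null
    xenstore_rm "$xspath"
}
```

With something like this in place, a failed attach would leave no loop device and no stale xenstore entries behind, so a later block-detach or re-attach would start clean.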
-- Additional comment from firstname.lastname@example.org on 2006-11-29 14:32 EST --
Chris, what do the relevant parts of xenstore-ls look like?
I'm also unable to detach them, but due to a different problem: the kernel gives
the error message, yet hotplug-status in xenstore appears as "connected" (which
partially explains it).
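That node can be inspected from dom0 with the standard xenstore-read tool; a small path helper (an illustration only, with the domain and device IDs taken from the error message in the report):

```shell
# Build the backend xenstore path for a vbd; layout as seen in the
# "vbd vbd-2160" message above. Illustrative helper, not Xen API.
vbd_backend_path() {
    domid="$1"; devid="$2"
    echo "/local/domain/0/backend/vbd/$domid/$devid"
}

# From dom0:
#   xenstore-read "$(vbd_backend_path 1 2160)/hotplug-status"
# If this prints "connected" even though the frontend failed, xend will
# refuse the block-detach, matching the behaviour described above.
```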
Created attachment 142776 [details]
fix for it
Briefly, the problem is: when the frontend finds any problem, it begins the
Closing protocol. The same thing happens on migration, and the backend has no
simple way to differentiate between the two cases. This is done through an
"online = 1" state in xenstore; however, the frontend is not able to set its
own state to "online = 0", as that node lives in the backend.
The solution is to check whether the frontend is okay in the transition from
Closing to Closed, and properly unregister the device as we would do in case of a
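For anyone following the transition in xenstore: Closing and Closed are numeric XenbusState values from xen's public xenbus.h. A small sketch to decode them (the example path in the comment is an assumption about the frontend layout):

```shell
# Map a XenbusState number (as stored in a device's "state" node) to its
# name; values per xen's public io/xenbus.h.
xenbus_state_name() {
    case "$1" in
        1) echo Initialising ;;
        2) echo InitWait ;;
        3) echo Initialised ;;
        4) echo Connected ;;
        5) echo Closing ;;
        6) echo Closed ;;
        *) echo Unknown ;;
    esac
}

# e.g. xenbus_state_name "$(xenstore-read /local/domain/1/device/vbd/2160/state)"
```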
Created attachment 143570 [details]
addition of --force option
The upstream solution will most probably go through the addition of a --force
option. Here's the patch for it. Waiting for upstream status...
QE ack for RHEL5.
xen-3.0.3-22.el5 included in 20070125.0.