Description of problem:
When running a 2.6.18-148.el5 kernel in a Xen fully virtualized guest, I see these messages on the console at shutdown:

xenbus_dev_shutdown: device/vbd/5632: Initialised != Connected, skipping
xenbus_dev_shutdown: device/vbd/768: Closed != Connected, skipping

These messages are new; they did not appear prior to the -133 kernel. In fact, they started appearing after we removed the anaconda hack from RHEL-5 with this git tree commit: eb0e4567548b8ff1777b0bfd1f546c82fe8d0f73. Backing that out gets rid of the messages, but of course isn't a real solution. We'll have to look at it more closely to figure out what is going on and come up with a better solution.
After rooting around in the xenbus frontend state machine, this turns out to be yet another problem caused by us presenting two different "paths" to the storage.

The 768 device is our root hard drive. When udev starts up on this 5.4 kernel, it auto-loads the blkfront driver, since there is a device that corresponds to it. However, when blkfront tries to use the device, it finds it is already in use (because the root drive is being driven by the IDE driver), fails, and sets the xenbus state to "XenbusStateClosed". On shutdown, the generic device model runs through all of the buses and calls the ->shutdown() method on each of them. When it hits the xen bus, it calls xenbus_dev_shutdown(); the state of 768 is already "Closed", so it prints the error message.

The 5632 device is similar. In that case, it's the CDROM device, but it has no backend plugged in, so it never transitions out of the Initialised state. Again, during shutdown, xenbus_dev_shutdown() notices this and prints out the error message.

The thing is, neither of these cases is "wrong". The 768 device should be set to Closed, since another driver is already driving the storage. And the 5632 device should still be in Initialised, since the frontend never connected up to it. Therefore, my recommendation here would be that we just remove the printk(); it doesn't seem to be serving much of a purpose anyway, and I don't think we'll be hiding bugs by getting rid of it.

Don, what do you think?

Chris Lalancette
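For reference, the shutdown check described above can be modeled in a few lines of userspace C. This is only a sketch, not the kernel source: the `model_device` struct and the `model_*` function names are made up for illustration, and the real xenbus_dev_shutdown() goes on to do a closing handshake that is omitted here. The state values follow the XenbusState enum from the Xen interface headers.

```c
/* Simplified userspace model of the xenbus shutdown check. NOT kernel code. */
#include <assert.h>
#include <stdio.h>

/* State values as defined by the Xen xenbus interface. */
enum xenbus_state {
    XenbusStateUnknown      = 0,
    XenbusStateInitialising = 1,
    XenbusStateInitWait     = 2,
    XenbusStateInitialised  = 3,
    XenbusStateConnected    = 4,
    XenbusStateClosing      = 5,
    XenbusStateClosed       = 6
};

/* Hypothetical stand-in for the kernel's struct xenbus_device. */
struct model_device {
    const char *nodename;
    enum xenbus_state state;
};

/* Map a state to the name that shows up in the console message. */
static const char *model_strstate(enum xenbus_state s)
{
    static const char *const names[] = {
        "Unknown", "Initialising", "InitWait", "Initialised",
        "Connected", "Closing", "Closed"
    };
    if ((unsigned)s >= sizeof(names) / sizeof(names[0]))
        return "INVALID";
    return names[s];
}

/*
 * Returns 1 if the device would proceed to the closing handshake,
 * 0 if it is skipped. When skipped, the message that would be
 * printed is written into buf.
 */
static int model_dev_shutdown(const struct model_device *dev,
                              char *buf, size_t len)
{
    assert(dev != NULL && buf != NULL && len > 0);
    buf[0] = '\0';
    if (dev->state != XenbusStateConnected) {
        snprintf(buf, len, "xenbus_dev_shutdown: %s: %s != Connected, skipping",
                 dev->nodename, model_strstate(dev->state));
        return 0;
    }
    return 1;
}
```

Feeding it { "device/vbd/768", XenbusStateClosed } and { "device/vbd/5632", XenbusStateInitialised } takes the skip path in both cases, which is why both devices produce a message even though neither state is an error.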
Chris,

I had confirmed the new printk's came from the addition of xen pv drivers to FV/bare-metal RHEL5, and knew they were related to the (er, um, "less than optimal") dual paths to the root device, & an unconnected cdrom. (I confirmed the messages existed in an early rev of xen pv on RHEL5 before the anaconda workaround.) What the anaconda workaround did is remove unconnected blk devs at the end of dev probe, solving anaconda's crash-&-burn on the empty xvda, *and* these shutdown warning msgs.

Thanks for figuring out it was the generic device-shutdown that was triggering the call to do the shutdowns. I hadn't gotten to that point yet.

So, IMO, we ought to look at a couple of options:
(a) does upstream do something different now that keeps these messages from appearing, & can we use it? (on my to-do list)
(b) see if we can qualify the printk's for this scenario
    -- we know the root dev number (768), so why not filter on it?
    -- similarly, Initialised & 5632 can be another filtered non-printk?
    -- it's unlikely that anyone would actually use an xvdc & have it get stuck at Initialised at shutdown

I'm not in favor of nuking the printk's entirely, since I'm sure they are helpful/useful for debugging scenarios where new types of xen devices are added but the drivers don't configure properly. Of near-zero concern for RHEL, but I'd like to not stray from upstream any more than we have to.

- Don
(In reply to comment #3)
> Chris,
>
> I had confirmed the new printk's came from the addition of xen pv drivers to
> FV/bare-metal RHEL5, and new they were related to the (er, um, 'less than
> optimal) dual paths to the root device, & an unconnected cdrom.
> (I confirmed the messages existed in an early rev of xen pv on RHEL5 before the
> anaconda workaround).
> What the anaconda workaround did is remove unconnected blk dev's at the end of
> dev probe, solving anaconda's crash-&-burn of empty xvda, *and* these shutdown
> warning msgs.
>
> Thanks for figuring out it was the generic device-shutdown that was triggering
> the call to do the shutdowns. I hadn't gotten to that point yet.
>
> So, IMO, we ought to look at a couple options:
> (a) does upstream do something different now that keeps these messages from
> appearing, & can we use it ? (on my to-do list)

Right, good question. I poked around at the upstream sources, and I didn't see anything very different in the state machine, but I certainly could have missed something. We should test and see for sure.

> (b) see if we can qualify the printk's for this scenario
> -- we know root dev number (768), so why not filter on it ?
> -- similarly, Initialised & 5632 can be another filter'd non-printk ?
> -- it's unlikely to actually use an xvdc & have gets stuck at
> Initialise at shutdown

Unfortunately, I don't think this solution will work. For instance, I just added another IDE hard drive to my guest, hdb, and on shutdown I now see:

xenbus_dev_shutdown: device/vbd/5632: Initialised != Connected, skipping
xenbus_dev_shutdown: device/vbd/832: Closed != Connected, skipping
xenbus_dev_shutdown: device/vbd/768: Closed != Connected, skipping

Where 832 is the second IDE hard drive. So any devices that have the dual paths to them are going to show this problem.
> I'm not in favor of nuking the printk's entirely, since I'm sure they are
> helpful/useful to debug scenarios when new types of xen devices are added but
> the driver's don't configure properly.

True, but you can also debug this from the dom0 side using "xenstore-ls" and friends. I'm not sure it's a huge added benefit on the guest side.

> Of near-zero concern for RHEL, but I'd like to not stray from upstream any
> more than we have to.

Right. So, I think there are 3 things to consider here (in order of what I would prefer):

1) Check what upstream is doing, and if there is indeed a difference, take that.
2) If 1) doesn't yield anything, remove the printk.
3) If we can't agree on 2), put back your "anaconda" hack; it's not required for anaconda anymore, but it will get rid of this warning.

Chris Lalancette
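As a side note on why filtering on fixed numbers does not generalize: the vbd numbers in these messages appear to follow Xen's legacy virtual device encoding, where the number is (major << 8) | minor of the emulated device. The sketch below decodes the IDE cases under that assumption; the struct and function names are hypothetical, and only the IDE majors (3 and 22, 64 minors per disk) are handled.

```c
/* Sketch: decode legacy Xen vbd numbers for IDE devices. Illustrative only. */
#include <assert.h>
#include <stdio.h>

struct vbd_id {
    unsigned major;
    unsigned minor;
};

/* Assumes the legacy encoding: vdev = (major << 8) | minor. */
static struct vbd_id decode_legacy_vbd(unsigned vdev)
{
    struct vbd_id id = { vdev >> 8, vdev & 0xff };
    return id;
}

/*
 * Write the IDE device name (e.g. "hda", "hdb1") for vdev into buf.
 * Returns 0 on success, -1 if vdev is not an IDE device number.
 */
static int ide_name(unsigned vdev, char *buf, size_t len)
{
    struct vbd_id id = decode_legacy_vbd(vdev);
    unsigned base, disk, part;

    assert(buf != NULL && len >= 8);

    if (id.major == 3)
        base = 0;               /* hda, hdb */
    else if (id.major == 22)
        base = 2;               /* hdc, hdd */
    else
        return -1;

    if (id.minor >= 128)        /* only two disks per IDE major */
        return -1;

    disk = base + id.minor / 64;
    part = id.minor % 64;
    if (part)
        snprintf(buf, len, "hd%c%u", (char)('a' + disk), part);
    else
        snprintf(buf, len, "hd%c", (char)('a' + disk));
    return 0;
}
```

Under that encoding, 768 decodes to hda, 832 to hdb, and 5632 to hdc, which matches the devices described in this thread. Every additional emulated disk gets its own number, so a filter keyed on 768 and 5632 would miss the rest.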
Created attachment 350224: Posted patch
The patch is in kernel-2.6.18-157.el5.

You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However, feel free to provide a comment indicating that this fix has been verified.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html