501474 – [RHEL5.4 Xen]: Xenbus warnings in a FV guest on shutdown

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel-xen
Sub Component:
Version:	5.4
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Don Dutile (Red Hat)
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	505081

Reported:	2009-05-19 11:34 UTC by Chris Lalancette
Modified:	2009-09-02 08:56 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	505081 (view as bug list)
Environment:
Last Closed:	2009-09-02 08:56:07 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
Posted patch (654 bytes, patch) 2009-07-01 23:30 UTC, Don Dutile (Red Hat)	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2009:1243	0	normal	SHIPPED_LIVE	Important: Red Hat Enterprise Linux 5.4 kernel security and bug fix update	2009-09-01 08:53:34 UTC

Description Chris Lalancette 2009-05-19 11:34:53 UTC

Description of problem:
When running a 2.6.18-148.el5 kernel in a Xen fully virtualized guest, on shutdown I see these messages on the console:

xenbus_dev_shutdown: device/vbd/5632: Initialised != Connected, skipping
xenbus_dev_shutdown: device/vbd/768: Closed != Connected, skipping

These messages are new; they did not appear prior to the -133 kernel.  In fact, they started appearing after we removed the anaconda hack from RHEL-5 with this git tree commit: eb0e4567548b8ff1777b0bfd1f546c82fe8d0f73.  Backing that commit out gets rid of the messages, but of course that isn't a real solution.  We'll have to look at it more closely to figure out what is going on and come up with a better fix.

Comment 1 Chris Lalancette 2009-06-10 13:32:48 UTC

After rooting around in the xenbus frontend state machine, this turns out to be yet another problem caused by us presenting two different "paths" to the storage.

In the case of the 768 device, that is our root hard drive. When udev starts up on this 5.4 kernel, it auto-loads the blkfront driver, since there is a xenbus device that corresponds to it. However, when blkfront tries to use the device, it finds that it is already in use (because the root drive is being driven by the IDE driver), fails, and sets the xenbus state to "XenbusStateClosed". On shutdown, the generic device model runs through all of the buses and calls the ->shutdown() method on each of them. When it hits the xen bus, it calls xenbus_dev_shutdown(); since the state of 768 is already "Closed" rather than "Connected", it prints the message.

The 5632 device is similar. In that case, it's the CDROM device, but it has no backend plugged in, so it never transitions out of the Initialised state. Again, during shutdown, xenbus_dev_shutdown() notices this, and prints out the error message.
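
The check described above can be modelled in a few lines. This is only a userspace sketch of the behaviour, not the kernel code: the function name shutdown_message is hypothetical, the state values are assumed to follow the XenbusState enum in the Xen public headers, and the message format is copied from the console output in this report.

```python
from enum import IntEnum


class XenbusState(IntEnum):
    # Assumed to mirror the XenbusState enum from the Xen public headers.
    Unknown = 0
    Initialising = 1
    InitWait = 2
    Initialised = 3
    Connected = 4
    Closing = 5
    Closed = 6


def shutdown_message(nodename, state):
    """Hypothetical model of the state check made at shutdown.

    Returns the console line for a device that is not Connected,
    or None when the device would go on to a normal close.
    """
    if state != XenbusState.Connected:
        return (f"xenbus_dev_shutdown: {nodename}: "
                f"{state.name} != Connected, skipping")
    return None


# The two devices from this report and the states they are left in.
devices = {
    "device/vbd/5632": XenbusState.Initialised,  # CDROM, no backend plugged in
    "device/vbd/768": XenbusState.Closed,        # root disk, driven by IDE
}

for nodename, state in devices.items():
    line = shutdown_message(nodename, state)
    if line is not None:
        print(line)
```

In this model both devices hit the warning even though neither is in an unexpected state, which is the point made in the analysis.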

The thing is, neither of these cases is "wrong". For the 768 device, it should be set to Closed, since another driver is already driving the storage. And in the case of the 5632 device, it should still be in Initialised, since the frontend never connected up to it. Therefore, my recommendation here would be that we just remove the printk(); it doesn't seem to be serving much of a purpose anyway, and I don't think we'll be hiding bugs by getting rid of it. Don, what do you think?

Chris Lalancette

Comment 3 Don Dutile (Red Hat) 2009-06-10 18:13:37 UTC

Chris,

I had confirmed the new printk's came from the addition of xen pv drivers to FV/bare-metal RHEL5, and knew they were related to the (er, um, 'less than optimal') dual paths to the root device, & an unconnected cdrom.
(I confirmed the messages existed in an early rev of xen pv on RHEL5 before the anaconda workaround).
What the anaconda workaround did was remove unconnected blk dev's at the end of dev probe, solving anaconda's crash-&-burn of empty xvda, *and* these shutdown warning msgs.

Thanks for figuring out it was the generic device-shutdown that was triggering
the call to do the shutdowns. I hadn't gotten to that point yet.

So, IMO, we ought to look at a couple options:
(a) does upstream do something different now that keeps these messages from appearing, & can we use it ? (on my to-do list)
(b) see if we can qualify the printk's for this scenario
     -- we know root dev number (768), so why not filter on it ?
     -- similarly, Initialised & 5632 can be another filter'd non-printk ?
        -- it's unlikely that someone would actually use an xvdc & have it
           get stuck at Initialised at shutdown

I'm not in favor of nuking the printk's entirely, since I'm sure they are
helpful/useful to debug scenarios when new types of xen devices are added but
the drivers don't configure properly.
Of near-zero concern for RHEL, but I'd like to not stray from upstream any
more than we have to.

- Don

Comment 4 Chris Lalancette 2009-06-11 09:12:36 UTC

(In reply to comment #3)
> Chris,
> 
> I had confirmed the new printk's came from the addition of xen pv drivers to
> FV/bare-metal RHEL5, and knew they were related to the (er, um, 'less than
> optimal') dual paths to the root device, & an unconnected cdrom.
> (I confirmed the messages existed in an early rev of xen pv on RHEL5 before the
> anaconda workaround).
> What the anaconda workaround did is remove unconnected blk dev's at the end of
> dev probe, solving anaconda's crash-&-burn of empty xvda, *and* these shutdown
> warning msgs.
> 
> Thanks for figuring out it was the generic device-shutdown that was triggering
> the call to do the shutdowns. I hadn't gotten to that point yet.
> 
> So, IMO, we ought to look at a couple options:
> (a) does upstream do something different now that keeps these messages from
> appearing, & can we use it ? (on my to-do list)

Right, good question.  I poked around at the upstream sources, and I didn't see anything very different in the state machine, but I certainly could have missed something.  We should test and see for sure.

> (b) see if we can qualify the printk's for this scenario
>      -- we know root dev number (768), so why not filter on it ?
>      -- similarly, Initialised & 5632 can be another filter'd non-printk ?
>         -- it's unlikely that someone would actually use an xvdc & have it
>            get stuck at Initialised at shutdown

Unfortunately, I don't think this solution will work.  For instance, I just added another IDE hard drive to my guest, hdb, and on shutdown I now see:

xenbus_dev_shutdown: device/vbd/5632: Initialised != Connected, skipping
xenbus_dev_shutdown: device/vbd/832: Closed != Connected, skipping
xenbus_dev_shutdown: device/vbd/768: Closed != Connected, skipping

Where 832 is the second IDE hard drive.  So any devices that have the dual paths to them are going to show this problem.
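
The vbd numbers in these messages can be read as device numbers. The sketch below assumes the legacy encoding (major << 8 | minor) and the standard Linux IDE majors (3 for the first channel, 22 for the second, 64 minors per disk); the function name decode_vbd is hypothetical and handles only those two majors.

```python
# Linux IDE block majors mapped to the channel index (ide0, ide1).
IDE_MAJORS = {3: 0, 22: 1}


def decode_vbd(devid):
    """Split a vbd number into (major, minor, IDE-style name).

    Assumes the legacy major << 8 | minor encoding and only
    understands the two IDE majors listed above.
    """
    major, minor = devid >> 8, devid & 0xFF
    if major not in IDE_MAJORS:
        raise ValueError(f"unhandled major {major} for vbd {devid}")
    # Two disks per IDE channel, 64 minors (partitions) per disk.
    disk_index = IDE_MAJORS[major] * 2 + minor // 64
    partition = minor % 64
    name = "hd" + chr(ord("a") + disk_index)
    if partition:
        name += str(partition)
    return major, minor, name


for devid in (768, 832, 5632):
    print(devid, decode_vbd(devid))
```

Under those assumptions, 768 and 832 decode to the first two IDE disks (hda and hdb) and 5632 to the first device on the second channel (hdc), which matches the root drive, the added hdb, and the CDROM described in this report.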

> I'm not in favor of nuking the printk's entirely, since I'm sure they are
> helpful/useful to debug scenarios when new types of xen devices are added but
> the drivers don't configure properly.

True, but you can also debug this from the dom0 side using "xenstore-ls" and friends.  I'm not sure the message is a huge added benefit on the guest side.

> Of near-zero concern for RHEL, but I'd like to not stray from upstream any
> more than we have to.

Right.  So, I think there are 3 things to consider here (in order of what I would prefer):

1)  Check what upstream is doing, and if there is indeed a difference, take that.
2)  If 1) doesn't yield anything, remove the printk
3)  If we can't agree on 2), put back your "anaconda" hack; it's not required for anaconda anymore, but it will get rid of this warning.

Chris Lalancette

Comment 12 Don Dutile (Red Hat) 2009-07-01 23:30:56 UTC

Created attachment 350224
Posted patch

Comment 14 Don Zickus 2009-07-07 15:05:12 UTC

The patch is included in kernel-2.6.18-157.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 17 errata-xmlrpc 2009-09-02 08:56:07 UTC

An advisory has been issued which should address the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html
