Bug 250381 - xenbus suspend_mutex remains locked after transaction failure
Summary: xenbus suspend_mutex remains locked after transaction failure
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel-xen
Version: 4.5
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
: ---
Assignee: Don Dutile (Red Hat)
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-08-01 08:59 UTC by Ian Campbell
Modified: 2008-07-24 19:14 UTC
CC List: 2 users

Fixed In Version: RHSA-2008-0665
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-07-24 19:14:53 UTC
Target Upstream Version:
Embargoed:


Attachments
xen-unstable 9921:bbce4d115189 ported to 2.6.9-67.EL (1.02 KB, application/octet-stream)
2007-12-14 09:00 UTC, Ian Campbell
no flags


Links
System: Red Hat Product Errata
ID: RHSA-2008:0665
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Updated kernel packages for Red Hat Enterprise Linux 4.7
Last Updated: 2008-07-24 16:41:06 UTC

Description Ian Campbell 2007-08-01 08:59:25 UTC
If a xenbus transaction-end command fails, the suspend_mutex can remain
locked, blocking all further xenbus traffic, e.g. shutdown/reboot/suspend
requests and notifications.

Kernel 2.6.9-55.0.2.EL is affected.

The upstream fix is:
http://hg.uk.xensource.com/xen-unstable.hg/?cs=bbce4d115189

Comment 1 Ian Campbell 2007-12-14 09:00:33 UTC
Created attachment 288741
xen-unstable 9921:bbce4d115189 ported to 2.6.9-67.EL

We recently stopped using the rhel4x.hg port from xenbits and switched to
a set of targeted fixes against your kernels. I have attached the patches from
our queue that are relevant to this issue.

Comment 2 Don Dutile (Red Hat) 2007-12-14 19:37:38 UTC
Is there a way to force a transaction-end failure, so that a test
can demonstrate the problem and show that the fix works?


Comment 3 Don Dutile (Red Hat) 2007-12-14 19:39:48 UTC
This bug was fixed in 4.6 (in linux-2.6.9-xen-newfiles.patch,
which included many fixes for hotplug).


Comment 4 Ian Campbell 2007-12-14 19:49:33 UTC
If I remember right, you can reproduce it by running xenstore-write in a tight
loop in the domU, i.e. something like "while : ; do xenstore-write foo bar ; done".

I checked 2.6.9-67.EL and it still has this problem. Is that not the 4.6 kernel?

Comment 5 Don Dutile (Red Hat) 2007-12-14 20:00:42 UTC
2.6.9-67.EL is rhel4.6.

The fix that is shown in the attachment in #1 is in 4.6.

So, either you didn't test 4.6, or... the fix isn't sufficient,
or you built a -67 kernel without doing a "make prep", in which case
the patch listed in comment #3 would not have been applied before building.

Do you have the 4.6 src.rpm, so you can verify from the sources that
the fix provided is the one in 4.6?




Comment 6 Ian Campbell 2007-12-15 09:41:20 UTC
I got my source tree by installing the .src.rpm and running rpmbuild -bp on the
spec file, which leaves a source tree in /usr/src/redhat/BUILD/something. I am
pretty certain it has the patches applied, or drivers/xen/xenbus/xenbus_xs.c
wouldn't even exist.

In 2.6.9-67.EL, linux-2.6.9-xen-newfiles.patch contains the following as part of
drivers/xen/xenbus/xenbus_xs.c:xenbus_dev_request_and_reply():
+       if ((msg->type == XS_TRANSACTION_END) ||
+           ((req_msg.type == XS_TRANSACTION_START) &&
+            (msg->type == XS_ERROR)))
+               up_read(&xs_state.suspend_mutex);
whereas, if 9921:bbce4d115189 had been applied, it would contain:
+       if ((req_msg.type == XS_TRANSACTION_END) ||
+           ((req_msg.type == XS_TRANSACTION_START) &&
+            (msg->type == XS_ERROR)))
+               up_read(&xs_state.suspend_mutex);

Note the first line, where msg->type has changed to req_msg.type.

Comment 7 Don Dutile (Red Hat) 2007-12-17 03:28:44 UTC
My bad; I missed the subtlety of msg->type changed to req_msg.type.

I'll post a patch for 4.7 on Monday. Thanks for the test to verify the fix.


Comment 8 Ian Campbell 2007-12-17 11:33:04 UTC
Thanks, I always have to look at that particular patch twice; it's very easy to
misread...

Comment 9 Don Dutile (Red Hat) 2007-12-17 20:14:03 UTC
Well, the patch is actually part RHEL5 and part RHEL4.

The 'mutex_unlock' call is RHEL5 code; RHEL4 uses 'up' instead.

The patch applies, but with a fuzz warning; I'll submit a clean RHEL4 patch
that applies without a patch warning.



Comment 10 Ian Campbell 2007-12-17 21:10:29 UTC
Yes, somehow quilt still applies the patch even though the context clearly
doesn't match -- I hadn't noticed that before.

Comment 11 Bill Burns 2008-01-04 19:14:47 UTC
Reopening for Don Dutile. Setting flags for 4.7.

Comment 12 Vivek Goyal 2008-03-03 20:39:47 UTC
Committed in 68.16.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Comment 16 errata-xmlrpc 2008-07-24 19:14:53 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html

