Bug 250381 - xenbus suspend_mutex remains locked after transaction failure
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel-xen
Hardware: All Linux
Priority: low Severity: low
Assigned To: Don Dutile
QA Contact: Martin Jenner
Keywords: Reopened
Reported: 2007-08-01 04:59 EDT by Ian Campbell
Modified: 2008-07-24 15:14 EDT (History)
Fixed In Version: RHSA-2008-0665
Doc Type: Bug Fix
Last Closed: 2008-07-24 15:14:53 EDT

Attachments
xen-unstable 9921:bbce4d115189 ported to 2.6.9-67.EL (1.02 KB, application/octet-stream)
2007-12-14 04:00 EST, Ian Campbell

Description Ian Campbell 2007-08-01 04:59:25 EDT
If a xenbus transaction end command fails it is possible for the suspend_mutex
to remain locked preventing any further xenbus traffic. e.g.
shutdown/reboot/suspend requests/notifications etc.

Kernel 2.6.9-55.0.2.EL is affected.

Upstream fix is
Comment 1 Ian Campbell 2007-12-14 04:00:33 EST
Created attachment 288741 [details]
xen-unstable 9921:bbce4d115189 ported to 2.6.9-67.EL

We recently stopped using the rhel4x.hg port from xenbits and switched to using
a set of targeted fixes to your kernels. I have attached the patches from our
queue relevant to this issue.
Comment 2 Don Dutile 2007-12-14 14:37:38 EST
Is there a way to excite/force a transaction end failure, so
a test can be applied to show the problem, and that the fix works?
Comment 3 Don Dutile 2007-12-14 14:39:48 EST
This bug was fixed in 4.6 (in the linux-2.6.9-xen-newfiles.patch,
which included many fixes for hotplug).
Comment 4 Ian Campbell 2007-12-14 14:49:33 EST
If I remember right you can reproduce by using xenstore-write in a tight loop
in the domU, i.e. something like "while : ; do xenstore-write foo bar ; done"

I checked 2.6.9-67.EL and it still has this problem. Is that not the 4.6 kernel?
Comment 5 Don Dutile 2007-12-14 15:00:42 EST
2.6.9-67.EL is rhel4.6.

The fix that is shown in the attachment in #1 is in 4.6.

So, either you didn't test 4.6, or... the fix isn't sufficient,
or you built a -67 kernel without doing a "make prep", which
would not apply the patch listed in comment #3 to the file (before building).

Do you have the src.rpm for 4.6 to verify (from sources) that
the fix provided is the one in 4.6 ?

Comment 6 Ian Campbell 2007-12-15 04:41:20 EST
I got my source tree by installing the .src.rpm and running rpmbuild -bp on the
spec file which leaves a source tree in /usr/src/redhat/BUILD/something, I am
pretty certain it has the patches applied or drivers/xen/xenbus/xenbus_xs.c
wouldn't even exist.

linux-2.6.9-xen-newfiles.patch in 2.6.9-67.EL contains as part of
+       if ((msg->type == XS_TRANSACTION_END) ||
+           ((req_msg.type == XS_TRANSACTION_START) &&
+            (msg->type == XS_ERROR)))
+               up_read(&xs_state.suspend_mutex);
and if 9921:bbce4d115189 was applied it would contain
+       if ((req_msg.type == XS_TRANSACTION_END) ||
+           ((req_msg.type == XS_TRANSACTION_START) &&
+            (msg->type == XS_ERROR)))
+               up_read(&xs_state.suspend_mutex);

Note the first line which has changed from msg->type to req_msg.type.
Comment 7 Don Dutile 2007-12-16 22:28:44 EST
My bad; I missed the subtlety of msg->type changed to req_msg.type.

I'll post a patch for 4.7 on Monday. Thanks for the test to verify the fix.
Comment 8 Ian Campbell 2007-12-17 06:33:04 EST
Thanks, I always have to look at that particular patch twice, it's very easy to
Comment 9 Don Dutile 2007-12-17 15:14:03 EST
Well, the patch is actually part rhel5 & part rhel4.

The 'mutex_unlock' is in rhel5, but not rhel4;  rhel4 uses 'up'.

The patch applies, but with a fuzz warning; I'll submit a clean rhel4 patch
that doesn't generate a patch warning.

Comment 10 Ian Campbell 2007-12-17 16:10:29 EST
Yes, somehow quilt still applies the patch even though the context clearly
doesn't match -- I hadn't noticed that before.
Comment 11 Bill Burns 2008-01-04 14:14:47 EST
Reopening for Don Dutile. Setting flags for 4.7.
Comment 12 Vivek Goyal 2008-03-03 15:39:47 EST
Committed in 68.16.EL. RPMS are available at http://people.redhat.com/vgoyal/rhel4/
Comment 16 errata-xmlrpc 2008-07-24 15:14:53 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

