Bug 456649 - xenbus suspend_mutex remains locked after transaction failure
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel-xen
Version: 4.7
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Andrew Jones
QA Contact: Virtualization Bugs
Docs Contact:
Depends On:
Blocks: 458302
Reported: 2008-07-25 05:52 EDT by Ian Campbell
Modified: 2011-02-16 11:03 EST
CC List: 5 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-02-16 11:03:34 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Ian Campbell 2008-07-25 05:52:10 EDT
Unfortunately, the fix for #250381 was reverted in 2.6.9-78.EL by
linux-2.6.9-xen-modifications-to-drivers-xen-files-for-pv-on-h.patch, which is
patch 12300 in the spec file.

linux-2.6.9-xen-xenbus-suspend_mutex-remains-locked-after-trans.patch is patch 12089.

I was unable to reopen #250381, and Bugzilla advised me to make a clone.

+++ This bug was initially created as a clone of Bug #250381 +++

If a xenbus transaction-end command fails, the suspend_mutex can remain
locked, preventing any further xenbus traffic (e.g. shutdown, reboot, and
suspend requests and notifications).

Kernel 2.6.9-55.0.2.EL is affected.

Upstream fix is
http://hg.uk.xensource.com/xen-unstable.hg/?cs=bbce4d115189

-- Additional comment from ijc@hellion.org.uk on 2007-12-14 04:00 EST --
Created an attachment (id=288741)
xen-unstable 9921:bbce4d115189 ported to 2.6.9-67.EL

We recently stopped using the rhel4x.hg port from xenbits and switched to using
a set of targeted fixes to your kernels. I have attached the patches from our
queue that are relevant to this issue.

-- Additional comment from ddutile@redhat.com on 2007-12-14 14:37 EST --
Is there a way to trigger/force a transaction-end failure, so that a test
can show the problem and then show that the fix works?


-- Additional comment from ddutile@redhat.com on 2007-12-14 14:39 EST --
This bug was fixed in 4.6 (in linux-2.6.9-xen-newfiles.patch,
which included many fixes for hotplug).


-- Additional comment from ijc@hellion.org.uk on 2007-12-14 14:49 EST --
If I remember right, you can reproduce it by running xenstore-write in a tight
loop in the domU, i.e. something like "while : ; do xenstore-write foo bar ; done".

I checked 2.6.9-67.EL and it still has this problem. Is that not the 4.6 kernel?

-- Additional comment from ddutile@redhat.com on 2007-12-14 15:00 EST --

2.6.9-67.EL is rhel4.6.

The fix that is shown in the attachment in #1 is in 4.6.

So, either you didn't test 4.6, or... the fix isn't sufficient,
or you built a -67 kernel without doing a "make prep", which
would not apply the patch listed in comment #3 to the file (before building).

Do you have the src.rpm for 4.6, so you can verify (from the sources) that
the fix provided is the one in 4.6?




-- Additional comment from ijc@hellion.org.uk on 2007-12-15 04:41 EST --
I got my source tree by installing the .src.rpm and running rpmbuild -bp on the
spec file, which leaves a source tree in /usr/src/redhat/BUILD/something. I am
pretty certain it has the patches applied; otherwise drivers/xen/xenbus/xenbus_xs.c
wouldn't even exist.

linux-2.6.9-xen-newfiles.patch in 2.6.9-67.EL contains as part of
drivers/xen/xenbus/xenbus_xs.c:xenbus_dev_request_and_reply():
+       if ((msg->type == XS_TRANSACTION_END) ||
+           ((req_msg.type == XS_TRANSACTION_START) &&
+            (msg->type == XS_ERROR)))
+               up_read(&xs_state.suspend_mutex);
and if 9921:bbce4d115189 were applied, it would contain
+       if ((req_msg.type == XS_TRANSACTION_END) ||
+           ((req_msg.type == XS_TRANSACTION_START) &&
+            (msg->type == XS_ERROR)))
+               up_read(&xs_state.suspend_mutex);

Note the first line which has changed from msg->type to req_msg.type.

-- Additional comment from ddutile@redhat.com on 2007-12-16 22:28 EST --
My bad; I missed the subtlety of msg->type changed to req_msg.type.

I'll post a patch for 4.7 on Monday. Thanks for the test to verify the fix.


-- Additional comment from ijc@hellion.org.uk on 2007-12-17 06:33 EST --
Thanks. I always have to look at that particular patch twice; it's very easy to
misread...

-- Additional comment from ddutile@redhat.com on 2007-12-17 15:14 EST --
Well, the patch is actually part rhel5 & part rhel4.

The 'mutex_unlock' is in rhel5, but not rhel4;  rhel4 uses 'up'.

The patch applies, but with a fuzz warning; I'll submit a clean rhel4 patch
that doesn't generate a patch warning.



-- Additional comment from ijc@hellion.org.uk on 2007-12-17 16:10 EST --
Yes, somehow quilt still applies the patch even though the context clearly
doesn't match -- I hadn't noticed that before.

-- Additional comment from bburns@redhat.com on 2008-01-04 14:14 EST --
Reopening for Don Dutile. Setting flags for 4.7.

-- Additional comment from vgoyal@redhat.com on 2008-03-03 15:39 EST --
Committed in 68.16.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/

-- Additional comment from errata-xmlrpc@redhat.com on 2008-07-24 15:14 EST --
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html
Comment 2 Don Dutile 2008-07-25 12:34:37 EDT
Ian,

I ran the above test loop:
     while : ; do xenstore-write foo bar ; done

in one dom0 window, and in another dom0 window ran an infinite save/restore loop on
the domU. I could not cause the save/restore to fail/hang/stop, which is what I
would expect if xenbus transaction processing was hung due to suspend_mutex
remaining locked.

Is there some other test you can recommend ?
Without a valid regression test/cause-effect, acking the patch will be tough to
do (in 4.8).

- Don
Comment 3 Ian Campbell 2009-01-28 05:54:37 EST
I've just noticed the old needinfo on this bug. I could have sworn I responded at the time, but I must have written a reply and not hit send/submit or something.

My memory of this bug is very fuzzy, but I think you need to run the while ... xenstore-write ... loop in a domU that is being repeatedly suspended and resumed, rather than in dom0 as you were doing (having a loop in both dom0 and domU can't hurt, I suppose...).
Comment 5 Andrew Jones 2009-07-01 14:24:49 EDT
This is a difficult bug to recreate, but the proposed patch has been integrated into a test build at http://people.redhat.com/drjones/virttest/1-2/. The build is available for anyone who has seen the bug and would like to test the patch to see if it goes away.

Also note that the link in the description pointing to the upstream patch is out of date; it can now be found at http://xenbits.xensource.com/xen-unstable.hg?rev/bbce4d115189
Comment 7 RHEL Product and Program Management 2010-10-12 13:51:24 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 8 Vivek Goyal 2010-10-13 12:11:29 EDT
Committed in 89.42.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/
Comment 10 Jinxin Zheng 2011-01-10 05:07:19 EST
Confirmed the patch is in -94.EL.

Never reproduced this. Back then, a few rhel4 patches were simply integrated like this; this one looked safe, and we got runtime coverage from it by virtue of it being integrated. I guess sanity checking is the best we can do.
Comment 11 errata-xmlrpc 2011-02-16 11:03:34 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0263.html
