Bug 250381
Summary: | xenbus suspend_mutex remains locked after transaction failure | |
---|---|---|---
Product: | Red Hat Enterprise Linux 4 | Reporter: | Ian Campbell <ijc>
Component: | kernel-xen | Assignee: | Don Dutile (Red Hat) <ddutile>
Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner>
Severity: | low | Docs Contact: |
Priority: | low | |
Version: | 4.5 | CC: | ddutile, xen-maint
Target Milestone: | --- | Keywords: | Reopened
Target Release: | --- | |
Hardware: | All | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | RHSA-2008-0665 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2008-07-24 19:14:53 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |

Description
Ian Campbell
2007-08-01 08:59:25 UTC
Created attachment 288741
xen-unstable 9921:bbce4d115189 ported to 2.6.9-67.EL
We recently stopped using the rhel4x.hg port from xenbits and switched to using
a set of targeted fixes to your kernels. I have attached the patches from our
queue relevant to this issue.

Is there a way to excite/force a transaction end failure, so a test can be applied to show the problem, and that the fix works? This bug was fixed in 4.6 (in the linux-2.6.9-xen-newfiles.patch, which included many fixes for hotplug).

If I remember right you can reproduce by using xenstore-write in a tight loop in the domU, i.e. something like "while : ; do xenstore-write foo bar ; done". I checked 2.6.9-67.EL and it still has this problem. Is that not the 4.6 kernel?

2.6.9-67.EL is rhel4.6. The fix that is shown in the attachment in #1 is in 4.6. So, either you didn't test 4.6, or the fix isn't sufficient, or you built a -67 kernel without doing a "make prep", which would not apply the patch listed in comment #3 to the file (before building). Do you have the src.rpm for 4.6 to verify (from sources) that the fix provided is the one in 4.6?

I got my source tree by installing the .src.rpm and running rpmbuild -bp on the spec file, which leaves a source tree in /usr/src/redhat/BUILD/something. I am pretty certain it has the patches applied or drivers/xen/xenbus/xenbus_xs.c wouldn't even exist. linux-2.6.9-xen-newfiles.patch in 2.6.9-67.EL contains, as part of drivers/xen/xenbus/xenbus_xs.c:xenbus_dev_request_and_reply():

    + if ((msg->type == XS_TRANSACTION_END) ||
    +     ((req_msg.type == XS_TRANSACTION_START) &&
    +      (msg->type == XS_ERROR)))
    +         up_read(&xs_state.suspend_mutex);

and if 9921:bbce4d115189 was applied it would contain:

    + if ((req_msg.type == XS_TRANSACTION_END) ||
    +     ((req_msg.type == XS_TRANSACTION_START) &&
    +      (msg->type == XS_ERROR)))
    +         up_read(&xs_state.suspend_mutex);

Note the first line, which has changed from msg->type to req_msg.type.

My bad; I missed the subtlety of msg->type changed to req_msg.type. I'll post a patch for 4.7 on Monday. Thanks for the test to verify the fix.

Thanks, I always have to look at that particular patch twice, it's very easy to misread...

Well, the patch is actually part rhel5 & part rhel4. The 'mutex_unlock' is in rhel5, but not rhel4; rhel4 uses 'up'. The patch applies, but with a fuzz warning; I'll submit a clean rhel4 patch that doesn't generate a patch warning.

Yes, somehow quilt still applies the patch even though the context clearly doesn't match -- I hadn't noticed that before.

Reopening for Don Dutile. Setting flags for 4.7.

Committed in 68.16.EL. RPMS are available at http://people.redhat.com/vgoyal/rhel4/

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html
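
For readers who find the msg->type versus req_msg.type distinction hard to see, the following is a small userspace model of the locking logic, not the kernel code itself. It assumes (this is not stated in the report) that the reply header overwrites msg->type, so a failed XS_TRANSACTION_END comes back as XS_ERROR. The `model_*` names, the enum values, and the reader counter standing in for xs_state.suspend_mutex are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Message type names follow the ones quoted in this bug; values are arbitrary. */
enum xs_type { XS_TRANSACTION_START, XS_TRANSACTION_END, XS_ERROR };

/* Hypothetical, stripped-down message header: only the type matters here. */
struct model_msg { enum xs_type type; };

/* Stand-in for the read side of xs_state.suspend_mutex: a count of holders. */
static int suspend_readers;

static void model_down_read(void) { suspend_readers++; }

static void model_up_read(void)
{
    assert(suspend_readers > 0);
    suspend_readers--;
}

/*
 * Model of the check in xenbus_dev_request_and_reply().
 * reply:   type the simulated xenstore daemon answers with.
 * use_fix: false tests msg->type (as in 2.6.9-67.EL),
 *          true tests req_msg.type (as in 9921:bbce4d115189).
 */
static void model_request_and_reply(struct model_msg *msg,
                                    enum xs_type reply, bool use_fix)
{
    struct model_msg req_msg = *msg;   /* request as sent, kept for later */

    if (req_msg.type == XS_TRANSACTION_START)
        model_down_read();

    /* ASSUMPTION: reading the reply overwrites the caller's header. */
    msg->type = reply;

    enum xs_type end_check = use_fix ? req_msg.type : msg->type;
    if (end_check == XS_TRANSACTION_END ||
        (req_msg.type == XS_TRANSACTION_START && msg->type == XS_ERROR))
        model_up_read();
}

/* One successful start followed by a failing end; returns readers still held. */
static int model_failed_transaction(bool use_fix)
{
    struct model_msg start = { XS_TRANSACTION_START };
    struct model_msg end = { XS_TRANSACTION_END };

    suspend_readers = 0;
    model_request_and_reply(&start, XS_TRANSACTION_START, use_fix);
    model_request_and_reply(&end, XS_ERROR, use_fix);
    return suspend_readers;
}
```

Under these assumptions, `model_failed_transaction(false)` returns 1 (the lock is never released after the failed end), while `model_failed_transaction(true)` returns 0, which matches the symptom in the summary of this bug.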