Bug 286821
Summary: | GFS2 lock_dlm1 deadlock itself | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Wendy Cheng <nobody+wcheng> |
Component: | kernel | Assignee: | Don Zickus <dzickus> |
Status: | CLOSED ERRATA | QA Contact: | GFS Bugs <gfs-bugs> |
Severity: | high | Priority: | high |
Version: | 5.1 | CC: | lwang, rkenna |
Hardware: | All | OS: | Linux |
Fixed In Version: | RHBA-2007-0959 | Doc Type: | Bug Fix |
Last Closed: | 2007-11-07 20:04:00 UTC | | |
Description
Wendy Cheng
2007-09-11 20:08:13 UTC
Created attachment 192951 [details]
node B glock dump
Created attachment 192961 [details]
Node A glock dump
Will try to move this piece of logic into the gfs2_glockd daemon. This could end up hanging the whole cluster due to the blocked lock_dlm1.

Created attachment 193211 [details]
RHEL5 patch
Will also add it to the overnight dd_io run to make sure it doesn't break anything else.
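The idea in the comment above, handing work off to a daemon instead of doing it in the callback thread, can be sketched in userspace C. This is only an illustration under stated assumptions: the struct and function names (sketch_glock, reclaim_list, schedule_for_reclaim, reclaim_pending) are hypothetical and are not the GFS2 implementation, and the list locking a real kernel would need is left out because the model is single-threaded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a glock: just enough state for the sketch. */
struct sketch_glock {
    int id;
    int queued;                 /* 1 while the glock sits on the reclaim list */
    int demoted;                /* set once the deferred work has run */
    struct sketch_glock *next;  /* reclaim list link */
};

/* Singly linked FIFO of glocks waiting for the worker daemon. */
struct reclaim_list {
    struct sketch_glock *head;
    struct sketch_glock *tail;
};

/*
 * Stand-in for what the callback thread does: it never performs the
 * slow work itself, it only appends the glock to the list and returns.
 * (A real implementation would take a lock around the list update.)
 */
static void schedule_for_reclaim(struct reclaim_list *rl, struct sketch_glock *gl)
{
    assert(rl != NULL && gl != NULL);
    if (gl->queued)
        return;                 /* already pending, nothing to do */
    gl->queued = 1;
    gl->next = NULL;
    if (rl->tail)
        rl->tail->next = gl;
    else
        rl->head = gl;
    rl->tail = gl;
}

/*
 * Stand-in for what the worker daemon does: drain the list and perform
 * the work that might block. Returns the number of glocks processed.
 */
static int reclaim_pending(struct reclaim_list *rl)
{
    int done = 0;
    struct sketch_glock *gl;

    assert(rl != NULL);
    while ((gl = rl->head) != NULL) {
        rl->head = gl->next;
        if (rl->head == NULL)
            rl->tail = NULL;
        gl->next = NULL;
        gl->queued = 0;
        gl->demoted = 1;        /* placeholder for the potentially blocking work */
        done++;
    }
    return done;
}
```

A caller standing in for the callback thread would call schedule_for_reclaim() and return at once, while a separate loop standing in for the daemon would call reclaim_pending() periodically. The intent of such a split is that the thread delivering lock callbacks never waits on work that may itself depend on a later callback.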
I don't think this patch is quite right. The newly added "from_daemon" argument to the function seems to duplicate the "remote" argument, and because of this the branch:

    + if (from_daemon) {
    +         gfs2_glock_schedule_for_reclaim(gl);
    +         spin_unlock(&gl->gl_spin);
    + } else {

will always be executed, so the other branch is now never used. If this does the trick, though, then I'm happy with making the substitution.

Yes, you're right. It was a quick and dirty patch. I didn't pay attention to that "remote" argument :). It is a duplicate. But don't pull it into the git tree yet; I would like to check something else.

On the other hand, dd_io is apparently happy with the patch. Last night's loop also ran through without issues.

Created attachment 193371 [details]
RHEL5 revised patch
Posted to rh-kernel list. Moved to POST state.

in 2.6.18-48.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2007-0959.html