Bug 210359
Summary: | Cluster nodes hang in vgscan at reboot time
---|---
Product: | Red Hat Enterprise Linux 5
Reporter: | Robert Peterson <rpeterso>
Component: | kernel
Assignee: | David Teigland <teigland>
Status: | CLOSED CURRENTRELEASE
QA Contact: | Red Hat Kernel QE team <kernel-qe>
Severity: | medium
Priority: | medium
Version: | 5.0
CC: | cluster-maint, rkenna
Hardware: | All
OS: | Linux
Fixed In Version: | 5.0.0
Doc Type: | Bug Fix
Last Closed: | 2006-11-28 21:28:50 UTC
Description
Robert Peterson
2006-10-11 17:41:03 UTC
Created attachment 138261: console output, /proc/net/sctp/assocs and group_tool -v for all nodes
Created attachment 138262: time-adjusted, sorted and collated cman_tool dump from all nodes
This is the output from a tool I wrote called grimoire.
It reads cluster.conf to find all nodes in the cluster,
collects daemon information from each one (group_tool -dump),
adjusts the timestamps so they line up, and collates the results.
The result is a timeline of what happened from the groupd
daemon's point of view.
The DLM SCTP messages are the clue. Here's a patch to fix it; I'll send this upstream.

    diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
    index 7bcea7c..867f93d 100644
    --- a/fs/dlm/lowcomms.c
    +++ b/fs/dlm/lowcomms.c
    @@ -548,7 +548,7 @@ static int receive_from_sock(void)
     	}
     	len = iov[0].iov_len + iov[1].iov_len;
     
    -	r = ret = kernel_recvmsg(sctp_con.sock, &msg, iov, 1, len,
    +	r = ret = kernel_recvmsg(sctp_con.sock, &msg, iov, msg.msg_iovlen, len,
     				 MSG_NOSIGNAL | MSG_DONTWAIT);
     	if (ret <= 0)
     		goto out_close;

DLM kernel module change required to pass the cluster beta2 release criteria. Changing the component to dlm-kernel and the product to the RHEL beta.

Devel ACK.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering. This request is not yet committed for inclusion in release.

Yes, this patch is in RHEL5B2.