Bug 207197
Summary: | Cman will hang initializing the daemons if started on all nodes simultaneously |
---|---|
Product: | Red Hat Enterprise Linux 5 |
Reporter: | Josef Bacik <jbacik> |
Component: | cman |
Assignee: | Christine Caulfield <ccaulfie> |
Status: | CLOSED CURRENTRELEASE |
QA Contact: | Cluster QE <mspqa-list> |
Severity: | medium |
Docs Contact: | |
Priority: | medium |
Version: | 5.0 |
CC: | cfeist, cluster-maint, rkenna, teigland |
Target Milestone: | --- |
Target Release: | --- |
Hardware: | All |
OS: | Linux |
Whiteboard: | |
Fixed In Version: | 5.0.0 |
Doc Type: | Bug Fix |
Doc Text: | |
Story Points: | --- |
Clone Of: | |
Environment: | |
Last Closed: | 2006-11-28 21:31:56 UTC |
Type: | --- |
Description
Josef Bacik
2006-09-19 21:22:58 UTC
Sep 19 15:49:18 rh5cluster1 kernel: dlm: no local IP address has been set
Sep 19 15:49:18 rh5cluster1 kernel: dlm: cannot start dlm lowcomms -22

Obviously those are the key messages. If dlm_controld isn't running, that might explain why the DLM hasn't been configured. Perhaps it crashed? A debug log from dlm_controld would be really helpful here if you can get one.

Oh, and it's also checking whether configfs is mounted.

The times when I see this message, I have found that configfs hasn't mounted for some reason.

I saw something that is possibly similar to #1 today, where a node was added to the DLM members list before dlm_groupd knew its IP address. DLM kicked out the error:

dlm: Initiating association with node 13
dlm: no address for nodeid 13

Is it possible there's a race here, with the cman event callback arriving after dlm_controld has decided to add the new node?

Devel ACK for RHEL 5.0.0 Beta 2

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering. This request is not yet committed for inclusion in a release.

This is slightly hacky and I can't seem to reproduce it any more, but it should fix the problem. Basically, if a lockspace contains a node that dlm_controld doesn't know about, then it re-reads the cman nodes list.

Checking in action.c;
/cvs/cluster/cluster/group/dlm_controld/action.c,v <-- action.c
new revision: 1.7; previous revision: 1.6
done
Checking in dlm_daemon.h;
/cvs/cluster/cluster/group/dlm_controld/dlm_daemon.h,v <-- dlm_daemon.h
new revision: 1.4; previous revision: 1.3
done
Checking in member_cman.c;
/cvs/cluster/cluster/group/dlm_controld/member_cman.c,v <-- member_cman.c
new revision: 1.3; previous revision: 1.2
done

Moving all RHCS ver 5 bugs to RHEL 5 so we can remove RHCS v5 which never existed.
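
For readers following the fix described above (re-reading the cman nodes list when a lockspace contains a node the daemon doesn't know about), here is a rough sketch of the idea. It is not the code checked in to dlm_controld, which is C. It is a small Python model, and `NodeCache`, `fetch_nodes`, and `resolve_members` are hypothetical names made up for illustration. The assumption is that the daemon keeps a cached nodeid-to-address map and can ask cman for a fresh copy.

```python
from typing import Callable, Dict, Iterable

NodeMap = Dict[int, str]  # nodeid -> IP address


class NodeCache:
    """Hypothetical stand-in for the daemon's cached view of cluster nodes."""

    def __init__(self, fetch_nodes: Callable[[], NodeMap]) -> None:
        # fetch_nodes models "ask cman for the current nodes list" (assumed).
        self._fetch_nodes = fetch_nodes
        self.nodes: NodeMap = dict(fetch_nodes())
        self.refreshes = 0

    def resolve_members(self, member_ids: Iterable[int]) -> NodeMap:
        """Return addresses for lockspace members, refreshing once if needed."""
        members = list(member_ids)
        # If any member is unknown, the cache may simply be stale because the
        # membership event has not been processed yet, so re-read the list.
        if any(nodeid not in self.nodes for nodeid in members):
            self.nodes = dict(self._fetch_nodes())
            self.refreshes += 1
        missing = [nodeid for nodeid in members if nodeid not in self.nodes]
        if missing:
            # Still unknown after a re-read: report it instead of guessing.
            raise LookupError(f"no address for nodeid(s) {missing}")
        return {nodeid: self.nodes[nodeid] for nodeid in members}


if __name__ == "__main__":
    cluster = {1: "10.0.0.1", 2: "10.0.0.2"}   # made-up addresses
    cache = NodeCache(lambda: cluster)
    cluster[13] = "10.0.0.13"                  # node joins after the cache was built
    print(cache.resolve_members([1, 2, 13]))
```

The re-read only happens when an unknown nodeid shows up, so the normal path does no extra work. A node that is still missing after the refresh is reported as an error, much like the "no address for nodeid 13" message quoted above.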