Bug 144309
Summary: | cman_tool leave remove: not adjusting quorum for continued operation | | |
---|---|---|---|
Product: | [Retired] Red Hat Cluster Suite | Reporter: | Derek Anderson <danderso> |
Component: | cman | Assignee: | Christine Caulfield <ccaulfie> |
Status: | CLOSED NEXTRELEASE | QA Contact: | Cluster QE <mspqa-list> |
Severity: | medium | Docs Contact: | |
Priority: | medium | | |
Version: | 4 | CC: | cluster-maint |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | All | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2005-02-11 18:11:11 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Derek Anderson
2005-01-05 20:44:43 UTC
This was fixed in a checkin on the 16th December - and works for me with current CVS.

I am still seeing this with the RPMs built yesterday, Monday January 10.

Missed a corner case, sorry

Checking in membership.c;
/cvs/cluster/cluster/cman-kernel/src/membership.c,v <-- membership.c
new revision: 1.47; previous revision: 1.46
done

Still doesn't appear to be working. These messages are from link-10; ran 'cman_tool leave remove' on link-11 and then link-12. 15 seconds later activity is blocked.

CMAN: removing node link-12 from the cluster : Removed
Feb 10 11:03:16 link-10 kernel: CMAN: Node link-12 is leaving the cluster, Removed
Feb 10 11:03:16 link-10 kernel: CMAN: removing node link-12 from the cluster : Removed
CMAN: removing node link-11 from the cluster : Removed
Feb 10 11:03:41 link-10 kernel: CMAN: Node link-11 is leaving the cluster, Removed
Feb 10 11:03:41 link-10 kernel: CMAN: removing node link-11 from the cluster : Removed
CMAN: quorum lost, blocking activity
Feb 10 11:03:56 link-10 kernel: CMAN: quorum lost, blocking activity

Node Votes Exp Sts Name
1 1 3 M link-10
2 1 3 X link-11
3 1 3 X link-12

Protocol version: 5.0.1
Config version: 2
Cluster name: MILTON
Cluster ID: 4812
Membership state: Cluster-Member
Nodes: 1
Expected_votes: 3
Total_votes: 1
Quorum: 2 Activity blocked
Active subsystems: 9
Node addresses: 192.168.44.160

Service Name GID LID State Code
Fence Domain: "default" 1 2 run -
[1]
DLM Lock Space: "clvmd" 2 3 run -
[1]
DLM Lock Space: "data1" 3 4 run -
[1]
DLM Lock Space: "data2" 5 6 run -
[1]
GFS Mount Group: "data1" 4 5 run -
[1]
GFS Mount Group: "data2" 6 7 run -
[1]

You've got 2 GFS filesystems mounted, it shouldn't even start shutdown on that node. Did you get a failure message from cman_tool leave? <puzzled>

? Yes, I have 2 filesystems mounted on link-10. I'm not trying to shut down cman on link-10, however. The expectation is that since the other two nodes left with "remove" the last node would remain quorate and not go into "Activity Blocked" mode.

Apologies, I read that message just before I left and thought it referred to a different bz. Looks like a bit more patience would have helped me when testing the previous fix too.

It seems that the transition timer was being set for a single-node transition when it shouldn't have. So after the node had settled down nicely and self-quorate, the timer kicked in 15 seconds later and spoiled it all. This checkin should fix it, and it will also get rid of the duplicate leave message too.

Checking in membership.c;
/cvs/cluster/cluster/cman-kernel/src/membership.c,v <-- membership.c
new revision: 1.60; previous revision: 1.59
done

Checking in membership.c;
/cvs/cluster/cluster/cman-kernel/src/membership.c,v <-- membership.c
new revision: 1.44.2.9; previous revision: 1.44.2.8
done

Fix verified in cman-kernel-2.6.9-18.0.
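
For readers following the numbers in the cman_tool output quoted in this report (Expected_votes: 3, Total_votes: 1, Quorum: 2), here is a minimal sketch of the quorum arithmetic, assuming a plain majority rule. It is not taken from membership.c; the function names are hypothetical, and the real calculation in cman may include further cases (for example a two-node mode) that are not modelled here.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum under a simple-majority rule (an assumption)."""
    return expected_votes // 2 + 1


def is_quorate(total_votes: int, expected_votes: int) -> bool:
    """True if the votes currently present meet the majority threshold."""
    return total_votes >= quorum_threshold(expected_votes)


def expected_after_leave(expected_votes: int, leaving_votes: int, remove: bool) -> int:
    """Expected votes after one node leaves.

    Assumption: a leave with 'remove' takes the leaving node's votes out of
    the expected total, while a plain leave keeps the total unchanged.
    """
    return expected_votes - leaving_votes if remove else expected_votes


# Three one-vote nodes, as in the report; link-11 and link-12 leave with 'remove'.
expected, total = 3, 3
for _node in ("link-11", "link-12"):
    expected = expected_after_leave(expected, 1, remove=True)
    total -= 1

# State reported in the bug: expected votes stayed at 3 with one vote present.
reported_threshold = quorum_threshold(3)
reported_quorate = is_quorate(1, 3)

# State the reporter expected: expected votes reduced along with the leavers.
intended_threshold = quorum_threshold(expected)
intended_quorate = is_quorate(total, expected)
```

Under this rule, expected votes of 3 give a threshold of 2, so a single remaining vote is inquorate, which matches the "Activity blocked" state in the report. If the expected votes drop to 1 as the nodes are removed, the threshold is 1 and the last node stays quorate.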