Bug 182233

Summary: Last node in a cluster doesn't send "down" notification to userspace
Product: [Retired] Red Hat Cluster Suite
Reporter: Christine Caulfield <ccaulfie>
Component: cman
Assignee: Christine Caulfield <ccaulfie>
Status: CLOSED CURRENTRELEASE
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: medium
Version: 4
CC: cluster-maint
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Fixed In Version: U4
Doc Type: Bug Fix
Last Closed: 2006-08-03 12:03:36 UTC Type: ---
Bug Blocks: 180185    
Attachments:
Lon's test program to demonstrate the problem (flags: none)

Description Christine Caulfield 2006-02-21 09:34:22 UTC
Description of problem:

If you take all the nodes but one out of a cluster, the remaining node does not
send the final state change notification to userspace.
By implication, if a node dies in a two-node cluster, the surviving node
doesn't get a notification.

This /only/ applies to userspace applications using cman directly. It does NOT
apply to kernel applications (e.g. DLM) or to those using the service manager API.
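The symptom can be modelled in a few lines. This is a hypothetical illustration only, not the cman-kernel membership.c code: `Listener`, `membership_changed_buggy`, `membership_changed_expected` and `run` are invented names, and the `members_now > 1` guard is just an assumed stand-in for whatever suppressed the final notification.

```python
# Hypothetical model only: NOT the cman-kernel membership.c code.
# It illustrates the reported symptom: the transition that leaves a single
# node in the cluster is never reported to a userspace listener.
from typing import Callable, List


class Listener:
    """Stand-in for a userspace application using cman directly."""

    def __init__(self) -> None:
        self.events: List[int] = []

    def on_state_change(self, members_now: int) -> None:
        self.events.append(members_now)


def membership_changed_buggy(listener: Listener, members_now: int) -> None:
    # Assumed guard: behaves as if another member always remains, so the
    # final "down to one node" transition is silently dropped.
    if members_now > 1:
        listener.on_state_change(members_now)


def membership_changed_expected(listener: Listener, members_now: int) -> None:
    # Expected behaviour: every transition is reported, including the last.
    listener.on_state_change(members_now)


def run(notify: Callable[[Listener, int], None], member_counts: List[int]) -> List[int]:
    """Feed a sequence of member counts through `notify`; return what userspace saw."""
    listener = Listener()
    for members_now in member_counts:
        notify(listener, members_now)
    return listener.events


if __name__ == "__main__":
    # Three-node cluster shrinking to one node: 3 -> 2 -> 1.
    print("buggy:   ", run(membership_changed_buggy, [2, 1]))     # [2]
    print("expected:", run(membership_changed_expected, [2, 1]))  # [2, 1]
```

In the two-node case (`[1]`), the buggy variant reports nothing at all, which is what the reproduction steps show.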


Version-Release number of selected component (if applicable):


How reproducible:
Terrifyingly.

Steps to Reproduce:
1. Run the attached program on one node of a two-node cluster
2. Take one node down
3. Notice there is no state change notification
  
Actual results:
Nothing. The program does not notice that the node has left.

Expected results:
A state change notification.

Additional info:
I'm not yet sure how badly clvmd is affected by this. Lon should be able to
judge whether it affects anything in his bailiwick.

The fix is trivial.

Comment 1 Christine Caulfield 2006-02-21 09:34:23 UTC
Created attachment 124945
Lon's test program to demonstrate the problem

Comment 2 Christine Caulfield 2006-02-21 10:45:27 UTC
Fixed in STABLE:
Checking in membership.c;
/cvs/cluster/cluster/cman-kernel/src/membership.c,v  <--  membership.c
new revision: 1.44.2.18.6.3; previous revision: 1.44.2.18.6.2
done

Fixed in RHEL4:
Checking in membership.c;
/cvs/cluster/cluster/cman-kernel/src/membership.c,v  <--  membership.c
new revision: 1.44.2.21; previous revision: 1.44.2.20
done

The effect on clvmd is simply to make the first command after a transition wait
a long time, until it times out. After that everything is fine, because the
timeout causes clvmd to re-read the nodes list.
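A rough model of that clvmd behaviour, assuming clvmd sends each command to every node in its cached list and waits for all replies. `ClvmdModel` and its methods are hypothetical, time is not simulated (an outstanding reply simply counts as a timeout), and this is not clvmd's actual code.

```python
# Hypothetical model of the clvmd behaviour described above; NOT clvmd's code.
from typing import List, Set, Tuple


class ClvmdModel:
    def __init__(self, live_nodes: Set[str]) -> None:
        self.live_nodes = set(live_nodes)    # what the cluster really looks like
        self.cached_nodes = set(live_nodes)  # clvmd's (assumed) cached nodes list

    def node_left_without_notification(self, node: str) -> None:
        # The bug: membership really changed, but no notification arrived,
        # so the cached list still contains the departed node.
        self.live_nodes.discard(node)

    def run_command(self) -> Tuple[bool, List[str]]:
        """Send a command to every cached node; return (timed_out, nodes_that_replied)."""
        replies = sorted(self.cached_nodes & self.live_nodes)
        timed_out = len(replies) < len(self.cached_nodes)
        if timed_out:
            # A reply is still outstanding, so the command waits until it times out.
            # The timeout path re-reads the nodes list, repairing the stale view.
            self.cached_nodes = set(self.live_nodes)
        return timed_out, replies


if __name__ == "__main__":
    clvmd = ClvmdModel({"node1", "node2"})
    clvmd.node_left_without_notification("node2")
    print(clvmd.run_command())  # (True, ['node1'])
    print(clvmd.run_command())  # (False, ['node1'])
```

The first command reports a timeout because node2 never replies; the second uses the refreshed list and completes immediately, matching the observation that everything is fine after the first timeout.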

Comment 3 Lon Hohberger 2006-02-21 14:50:21 UTC
Patch confirmed.