Description of problem:
After removing a ctdb node, 'ctdb status' reports the same number of nodes as before the node was removed. Size, however, is reported correctly.

Version-Release number of selected component (if applicable):
# ctdb version
CTDB version: 1.0.114.3-4.el6

How reproducible:
Remove a node as per the steps outlined below.

Steps to Reproduce:
1. Verify cluster status is healthy, all nodes up (clustat, ctdb status)
2. On all nodes, edit the /etc/ctdb/nodes file and comment out the node to be removed. Do not delete the line for that node, just comment it out by adding a '#' at the beginning of the line.
3. On one of the nodes not being removed, run 'ctdb reloadnodes' to force all nodes to reload the nodes file. Note - this will automatically stop ctdb on the node being removed.
4. Use 'ctdb status' on all remaining nodes and verify that the deleted node no longer shows up in the list (a scripted sketch of these steps follows the configuration details below)

Actual results:
# ctdb status
Number of nodes:3
pnn:0 10.0.0.101       OK (THIS NODE)
pnn:1 10.0.0.102       OK
Generation:935874625
Size:2
hash:0 lmaster:0
hash:1 lmaster:1
Recovery mode:NORMAL (0)
Recovery master:0

Expected results:
# ctdb status
Number of nodes:2
pnn:0 10.0.0.101       OK (THIS NODE)
pnn:1 10.0.0.102       OK
Generation:935874625
Size:2
hash:0 lmaster:0
hash:1 lmaster:1
Recovery mode:NORMAL (0)
Recovery master:0

Additional info:
I spoke to Sumit Bose and he suggested filing this bug so the change can be made upstream.

Configuration details:
- 3 node cluster, all nodes running RHEL 6.3
- HA, RS Add-ons
- 2 CLVM volumes - 1 lock, 1 data. All Fibre Channel (HP MSA)

# cat /etc/ctdb/nodes
10.0.0.101
10.0.0.102
#10.0.0.103

# cat /etc/ctdb/public_addresses
10.16.142.111/21 bond0
10.16.142.112/21 bond0
10.16.142.113/21 bond0

[root@smb-srv1 ~]# cat /etc/sysconfigtab/ctdb
cat: /etc/sysconfigtab/ctdb: No such file or directory

# cat /etc/sysconfig/ctdb
CTDB_DEBUGLEVEL=ERR
CTDB_NODES=/etc/ctdb/nodes
CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
CTDB_RECOVERY_LOCK=/share/ctdb/.ctdb.lock
CTDB_MANAGES_SAMBA=yes
CTDB_MANAGES_WINBIND=yes
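For reference, steps 1-4 can be scripted roughly as follows. This is a sketch only, assuming passwordless root ssh between the nodes; the IP addresses match the /etc/ctdb/nodes file above, and the variable names are illustrative, not part of the original reproducer.

#!/bin/bash
# Sketch of the node-removal steps above (assumes passwordless root ssh).

SURVIVORS="10.0.0.101 10.0.0.102"   # nodes staying in the cluster
REMOVE="10.0.0.103"                 # node being removed

# Step 2: comment out (do not delete) the node's line on every node.
for n in $SURVIVORS $REMOVE; do
    ssh root@$n "sed -i 's|^${REMOVE}\$|#${REMOVE}|' /etc/ctdb/nodes"
done

# Step 3: from a surviving node, make all nodes reload the nodes file.
# This also stops ctdb on the node being removed.
ssh root@${SURVIVORS%% *} "ctdb reloadnodes"

# Step 4: confirm the removed node no longer shows up in the status list.
for n in $SURVIVORS; do
    ssh root@$n ctdb status
done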
Verified in ctdb-1.0.114.5-3. We now get the message "(including 1 deleted nodes)" from 'ctdb status'.

-bash-4.1$ for i in `seq 1 3`; do qarsh root@dash-0$i rpm -q ctdb; done
ctdb-1.0.114.5-3.el6.x86_64
ctdb-1.0.114.5-3.el6.x86_64
ctdb-1.0.114.5-3.el6.x86_64

-bash-4.1$ for i in `seq 1 3`; do qarsh root@dash-0$i clustat; done
Cluster Status for dash @ Thu Jan 24 15:45:07 2013
Member Status: Quorate

 Member Name                 ID   Status
 ------ ----                 ---- ------
 dash-01                        1 Online, Local
 dash-02                        2 Online
 dash-03                        3 Online

Cluster Status for dash @ Thu Jan 24 15:45:05 2013
Member Status: Quorate

 Member Name                 ID   Status
 ------ ----                 ---- ------
 dash-01                        1 Online
 dash-02                        2 Online, Local
 dash-03                        3 Online

Cluster Status for dash @ Thu Jan 24 15:45:09 2013
Member Status: Quorate

 Member Name                 ID   Status
 ------ ----                 ---- ------
 dash-01                        1 Online
 dash-02                        2 Online
 dash-03                        3 Online, Local

-bash-4.1$ for i in `seq 1 3`; do qarsh root@dash-0$i ctdb status; done
Number of nodes:3
pnn:0 10.15.89.168     OK (THIS NODE)
pnn:1 10.15.89.169     OK
pnn:2 10.15.89.170     OK
Generation:1377632015
Size:3
hash:0 lmaster:0
hash:1 lmaster:1
hash:2 lmaster:2
Recovery mode:NORMAL (0)
Recovery master:1
Number of nodes:3
pnn:0 10.15.89.168     OK
pnn:1 10.15.89.169     OK (THIS NODE)
pnn:2 10.15.89.170     OK
Generation:1377632015
Size:3
hash:0 lmaster:0
hash:1 lmaster:1
hash:2 lmaster:2
Recovery mode:NORMAL (0)
Recovery master:1
Number of nodes:3
pnn:0 10.15.89.168     OK
pnn:1 10.15.89.169     OK
pnn:2 10.15.89.170     OK (THIS NODE)
Generation:1377632015
Size:3
hash:0 lmaster:0
hash:1 lmaster:1
hash:2 lmaster:2
Recovery mode:NORMAL (0)
Recovery master:1

================ Comment out node dash-03 in nodes file ========================

[root@dash-01 ~]# ctdb reloadnodes
2013/01/24 15:48:50.552267 [14067]: Reloading nodes file on node 1
2013/01/24 15:48:50.552954 [14067]: Reloading nodes file on node 2
2013/01/24 15:48:50.553580 [14067]: Reloading nodes file on node 0

-bash-4.1$ for i in `seq 1 2`; do qarsh root@dash-0$i ctdb status; done
Number of nodes:3 (including 1 deleted nodes)
pnn:0 10.15.89.168     OK (THIS NODE)
pnn:1 10.15.89.169     OK
Generation:1884797164
Size:2
hash:0 lmaster:0
hash:1 lmaster:1
Recovery mode:NORMAL (0)
Recovery master:1
Number of nodes:3 (including 1 deleted nodes)
pnn:0 10.15.89.168     OK
pnn:1 10.15.89.169     OK (THIS NODE)
Generation:1884797164
Size:2
hash:0 lmaster:0
hash:1 lmaster:1
Recovery mode:NORMAL (0)
Recovery master:1
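As an extra check, the number of nodes actually listed can be compared against the Size field, which is the pair this bug originally showed diverging. A rough sketch, reusing the qarsh access from above (plain ssh works equally well); the awk matching is an assumption based on the status output format shown in this comment:

for i in `seq 1 2`; do
  qarsh root@dash-0$i ctdb status | awk '
    /^pnn:/  { active++ }                  # nodes actually listed
    /^Size:/ { size = substr($0, 6) + 0 }  # value of the "Size:N" line
    END {
      verdict = (active == size) ? "consistent" : "MISMATCH"
      print verdict " (" active " listed, Size:" size ")"
    }'
done

With the fixed package, both surviving nodes list 2 entries against Size:2; under the original bug, a stale entry in the list would have shown up here as a mismatch.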
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0337.html