Bug 866670 - ctdb status reports more nodes than available
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: ctdb
Version: 6.4
Platform: All Linux
Severity: unspecified
Assigned To: Sumit Bose
QA Contact: Cluster QE
Blocks: 881827
Reported: 2012-10-15 16:20 EDT by Mark Heslin
Modified: 2013-02-21 03:44 EST
Fixed In Version: ctdb-1.0.114.5-3.el6
Doc Type: Bug Fix
Last Closed: 2013-02-21 03:44:18 EST
Type: Bug
Description Mark Heslin 2012-10-15 16:20:19 EDT
Description of problem:

  After removing a ctdb node, 'ctdb status' reports the same number of nodes
  as before the node was removed. Size, however, is reported correctly.

Version-Release number of selected component (if applicable):

  # ctdb version
  CTDB version: 1.0.114.3-4.el6

How reproducible:

   Remove a node as per the steps outlined below.

Steps to Reproduce:

 1. Verify cluster status is healthy and all nodes are up (clustat, ctdb status).

 2. On all nodes, edit the /etc/ctdb/nodes file and comment out the node
    to be removed. Do not delete the line for that node; just comment it out
    by adding a '#' at the beginning of the line.

 3. On one of the nodes not being removed, run 'ctdb reloadnodes'
    to force all nodes to reload the nodes file.

    Note - this will automatically stop ctdb on the node being removed.

 4. Run 'ctdb status' on all remaining nodes and verify that the deleted node
    no longer shows up in the list.
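Step 2 above can be sketched as a small script. This is illustrative only: it edits a scratch copy of the nodes file (populated with the addresses from this report) rather than /etc/ctdb/nodes itself, and the NODE_TO_REMOVE value is an assumption for the example.

```shell
#!/bin/sh
# Sketch of step 2, against a scratch copy of the nodes file so it is
# safe to run anywhere; on a real node you would edit /etc/ctdb/nodes.
NODES_FILE=$(mktemp)
printf '10.0.0.101\n10.0.0.102\n10.0.0.103\n' > "$NODES_FILE"

NODE_TO_REMOVE=10.0.0.103

# Prefix the line with '#' instead of deleting it, so the PNN numbering
# of the remaining nodes stays stable.
sed -i "s|^${NODE_TO_REMOVE}\$|#${NODE_TO_REMOVE}|" "$NODES_FILE"

cat "$NODES_FILE"
# Step 3 would then be run once, on one of the surviving nodes:
#   ctdb reloadnodes
```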
  
Actual results:

  # ctdb status
  Number of nodes:3
  pnn:0 10.0.0.101       OK (THIS NODE)
  pnn:1 10.0.0.102       OK
  Generation:935874625
  Size:2
  hash:0 lmaster:0
  hash:1 lmaster:1
  Recovery mode:NORMAL (0)
  Recovery master:0

Expected results:

  # ctdb status
  Number of nodes:2
  pnn:0 10.0.0.101       OK (THIS NODE)
  pnn:1 10.0.0.102       OK
  Generation:935874625
  Size:2
  hash:0 lmaster:0
  hash:1 lmaster:1
  Recovery mode:NORMAL (0)
  Recovery master:0
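The discrepancy between the two transcripts can be checked mechanically by comparing the "Number of nodes" header against the number of pnn lines actually listed. The `check` helper below is a hypothetical sketch; it is fed the actual-results output from this report via a heredoc instead of running ctdb.

```shell
#!/bin/sh
# Compare the claimed node count in `ctdb status` output with the number
# of pnn lines actually printed. `check` is an illustrative helper, not
# part of ctdb; input is the sample output from this report.
check() {
  awk -F: '/^Number of nodes/ { claimed = $2 + 0 }
           /^pnn:/            { listed++ }
           END                { print "claimed=" claimed " listed=" listed }'
}

check <<'EOF'
Number of nodes:3
pnn:0 10.0.0.101       OK (THIS NODE)
pnn:1 10.0.0.102       OK
Generation:935874625
Size:2
EOF
# prints: claimed=3 listed=2
```

A healthy status would print matching numbers; the bug shows up as claimed exceeding listed.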

Additional info:

I spoke to Sumit Bose and he suggested filing this bug so the change can be made upstream.

Configuration details:

  - 3-node cluster, all nodes running RHEL 6.3
  - HA, RS Add-Ons
  - 2 CLVM volumes - 1 lock, 1 data. All Fibre Channel (HP MSA)

# cat /etc/ctdb/nodes
10.0.0.101
10.0.0.102
#10.0.0.103

# cat /etc/ctdb/public_addresses
10.16.142.111/21 bond0
10.16.142.112/21 bond0
10.16.142.113/21 bond0

# cat /etc/sysconfig/ctdb
CTDB_DEBUGLEVEL=ERR
CTDB_NODES=/etc/ctdb/nodes
CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
CTDB_RECOVERY_LOCK=/share/ctdb/.ctdb.lock
CTDB_MANAGES_SAMBA=yes
CTDB_MANAGES_WINBIND=yes
Comment 3 Justin Payne 2013-01-24 17:01:49 EST
Verified in ctdb-1.0.114.5-3. We now get the message "(including 1 deleted nodes)" from ctdb_status.

-bash-4.1$ for i in `seq 1 3`; do qarsh root@dash-0$i rpm -q ctdb; done
ctdb-1.0.114.5-3.el6.x86_64
ctdb-1.0.114.5-3.el6.x86_64
ctdb-1.0.114.5-3.el6.x86_64

-bash-4.1$ for i in `seq 1 3`; do qarsh root@dash-0$i clustat; done
Cluster Status for dash @ Thu Jan 24 15:45:07 2013
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 dash-01                                     1 Online, Local
 dash-02                                     2 Online
 dash-03                                     3 Online

Cluster Status for dash @ Thu Jan 24 15:45:05 2013
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 dash-01                                     1 Online
 dash-02                                     2 Online, Local
 dash-03                                     3 Online

Cluster Status for dash @ Thu Jan 24 15:45:09 2013
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 dash-01                                     1 Online
 dash-02                                     2 Online
 dash-03                                     3 Online, Local

-bash-4.1$ for i in `seq 1 3`; do qarsh root@dash-0$i ctdb status; done
Number of nodes:3
pnn:0 10.15.89.168     OK (THIS NODE)
pnn:1 10.15.89.169     OK
pnn:2 10.15.89.170     OK
Generation:1377632015
Size:3
hash:0 lmaster:0
hash:1 lmaster:1
hash:2 lmaster:2
Recovery mode:NORMAL (0)
Recovery master:1
Number of nodes:3
pnn:0 10.15.89.168     OK
pnn:1 10.15.89.169     OK (THIS NODE)
pnn:2 10.15.89.170     OK
Generation:1377632015
Size:3
hash:0 lmaster:0
hash:1 lmaster:1
hash:2 lmaster:2
Recovery mode:NORMAL (0)
Recovery master:1
Number of nodes:3
pnn:0 10.15.89.168     OK
pnn:1 10.15.89.169     OK
pnn:2 10.15.89.170     OK (THIS NODE)
Generation:1377632015
Size:3
hash:0 lmaster:0
hash:1 lmaster:1
hash:2 lmaster:2
Recovery mode:NORMAL (0)
Recovery master:1

================ Comment out node dash-03 in nodes file ========================

[root@dash-01 ~]# ctdb reloadnodes
2013/01/24 15:48:50.552267 [14067]: Reloading nodes file on node 1
2013/01/24 15:48:50.552954 [14067]: Reloading nodes file on node 2
2013/01/24 15:48:50.553580 [14067]: Reloading nodes file on node 0

-bash-4.1$ for i in `seq 1 2`; do qarsh root@dash-0$i ctdb status; done
Number of nodes:3 (including 1 deleted nodes)
pnn:0 10.15.89.168     OK (THIS NODE)
pnn:1 10.15.89.169     OK
Generation:1884797164
Size:2
hash:0 lmaster:0
hash:1 lmaster:1
Recovery mode:NORMAL (0)
Recovery master:1
Number of nodes:3 (including 1 deleted nodes)
pnn:0 10.15.89.168     OK
pnn:1 10.15.89.169     OK (THIS NODE)
Generation:1884797164
Size:2
hash:0 lmaster:0
hash:1 lmaster:1
Recovery mode:NORMAL (0)
Recovery master:1
Comment 5 errata-xmlrpc 2013-02-21 03:44:18 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0337.html
