Bug 1202328 - CTDB: Three nodes out of a 4-node CTDB cluster remain in UNHEALTHY state when these three nodes are rebooted
Summary: CTDB: Three nodes out of a 4-node CTDB cluster remain in UNHEALTHY state when these three nodes are rebooted
Keywords:
Status: CLOSED DUPLICATE of bug 1177603
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: ctdb
Version: rhgs-3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Anoop C S
QA Contact: Vivek Das
URL:
Whiteboard: gluster
Depends On:
Blocks: 1408949
Reported: 2015-03-16 12:11 UTC by surabhi
Modified: 2020-07-16 08:32 UTC (History)
CC: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-10 10:05:13 UTC
Embargoed:



Description surabhi 2015-03-16 12:11:59 UTC
Description of problem:
**********************************************
Rebooting 3 nodes of a 4-node ctdb cluster leaves those three nodes in UNHEALTHY state, and on one of the nodes /gluster/lock does not get mounted.


Version-Release number of selected component (if applicable):
glusterfs-3.6.0.51-1.el6rhs.x86_64
ctdb2.5-2.5.4-1.el6rhs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a 4-node ctdb cluster
2. Reboot 3 of the 4 nodes
3. Verify the output of ctdb status and ctdb ip
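The health check in step 3 can be scripted. A minimal sketch that counts unhealthy nodes from "ctdb status"-style output; the sample text and node addresses below are hypothetical stand-ins for this report's 4-node cluster, and on a live node the sample would be replaced by the real command's output:

```shell
# Sketch: count nodes that are not OK, per step 3 above.
# The sample mimics 4-node "ctdb status" output; the pnn/IP
# values are hypothetical, not taken from this report.
status_output='Number of nodes:4
pnn:0 10.70.37.1     OK (THIS NODE)
pnn:1 10.70.37.2     UNHEALTHY
pnn:2 10.70.37.3     UNHEALTHY
pnn:3 10.70.37.4     UNHEALTHY'

# On a live node this would be:  status_output=$(ctdb status)
unhealthy=$(printf '%s\n' "$status_output" | grep -c 'UNHEALTHY')
echo "unhealthy nodes: $unhealthy"
```

With the bug reproduced, a check like this would report three unhealthy nodes instead of zero.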

Actual results:
**************************
Three nodes remain in UNHEALTHY state, and on one of the nodes /gluster/lock does not get mounted.

Expected results:
**************************
All nodes should return to OK state.

Additional info:

Comment 2 surabhi 2015-03-16 12:13:34 UTC
Reproducing the issue again with log level DEBUG then will update the sosreports.

Comment 3 surabhi 2015-05-04 10:03:28 UTC
When one node in a 4-node ctdb cluster is rebooted, that node remains in UNHEALTHY state because the /gluster/lock mount does not happen on it.
The errors from the logs:


2015/05/04 02:53:09.889335 [set_recmode: 9120]: ERROR: recovery lock file /gluster/lock/lockfile not locked when recovering!
2015/05/04 02:53:09.900173 [ 3109]: Freeze priority 1
2015/05/04 02:53:09.901113 [ 3109]: Freeze priority 2
2015/05/04 02:53:09.901862 [ 3109]: Freeze priority 3
2015/05/04 02:53:11.034595 [ 3109]: Thawing priority 1
2015/05/04 02:53:11.034631 [ 3109]: Release freeze handler for prio 1
2015/05/04 02:53:11.034667 [ 3109]: Thawing priority 2
2015/05/04 02:53:11.034678 [ 3109]: Release freeze handler for prio 2
2015/05/04 02:53:11.034697 [ 3109]: Thawing priority 3
2015/05/04 02:53:11.034708 [ 3109]: Release freeze handler for prio 3
2015/05/04 02:53:11.035309 [set_recmode: 9182]: ERROR: recovery lock file /gluster/lock/lockfile not locked when recovering!
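The "recovery lock file /gluster/lock/lockfile not locked when recovering" errors point at the recovery lock that ctdbd expects to take on shared storage. A typical configuration fragment for this kind of setup (paths taken from this report; the exact file contents here are an assumption, not copied from the affected systems):

```
# /etc/sysconfig/ctdb (sketch of a typical RHGS-style setup)
# The recovery lock must live on storage shared by all nodes --
# here, the glusterfs lock volume mounted at /gluster/lock.
CTDB_RECOVERY_LOCK=/gluster/lock/lockfile
CTDB_NODES=/etc/ctdb/nodes
```

If /gluster/lock is not mounted when ctdbd starts recovery, the reclock cannot be taken and the node stays UNHEALTHY, which matches the behaviour described above.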


The snippet from gluster logs:

[2015-05-04 06:52:55.385969] E [glusterd-op-sm.c:207:glusterd_get_txn_opinfo] 0-: Unable to get transaction opinfo for transaction ID : 5ceff865-23a8-48fb-b13e-d2252ee5d0f4
[2015-05-04 06:52:55.387695] E [glusterd-op-sm.c:207:glusterd_get_txn_opinfo] 0-: Unable to get transaction opinfo for transaction ID : 486078dd-0eab-4223-b80e-59099da74ec2
[2015-05-04 06:52:55.420389] E [glusterd-op-sm.c:207:glusterd_get_txn_opinfo] 0-: Unable to get transaction opinfo for transaction ID : 5f495b59-54f2-4949-b1ee-5fd0a324d582
[2015-05-04 06:52:55.423631] W [glusterd-op-sm.c:3975:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
[2015-05-04 06:52:55.425004] I [glusterd-handler.c:3841:__glusterd_handle_status_volume] 0-management: Received status volume req for volume ctdb
[2015-05-04 06:52:55.427132] W [glusterd-locks.c:547:glusterd_mgmt_v3_lock] 0-management: Lock for ctdb held by 0e6d1fb9-e34e-4733-bb87-734dd920080b
[2015-05-04 06:52:55.427151] E [glusterd-op-sm.c:3054:glusterd_op_ac_lock] 0-management: Unable to acquire lock for ctdb
[2015-05-04 06:52:55.427178] E [glusterd-op-sm.c:6539:glusterd_op_sm] 0-management: handler returned: -1
[2015-05-04 06:52:55.429000] E [glusterd-syncop.c:86:gd_mgmt_v3_collate_errors] 0-: Locking failed on 10.16.157.78. Please check log file for details.
[2015-05-04 06:52:55.429042] W [glusterd-locks.c:641:glusterd_mgmt_v3_unlock] 0-management: Lock owner mismatch. Lock for vol ctdb held by 0e6d1fb9-e34e-4733-bb87-734dd920080b
[2015-05-04 06:52:55.429056] E [glusterd-op-sm.c:3102:glusterd_op_ac_unlock] 0-management: Unable to release lock for ctdb
[2015-05-04 06:52:55.429084] E [glusterd-op-sm.c:6539:glusterd_op_sm] 0-management: handler returned: 1
[2015-05-04 06:52:55.429104] E [glusterd-syncop.c:1724:gd_sync_task_begin] 0-management: Locking Peers Failed.
[2015-05-04 06:52:55.430345] E [glusterd-syncop.c:86:gd_mgmt_v3_collate_errors] 0-: Unlocking failed on 10.16.157.78. Please check log file for details.

Version-Release number of selected component (if applicable):
 rpm -qa | grep ctdb
ctdb2.5-2.5.4-1.el6rhs.x86_64
glusterfs-3.6.0.53-1.el6rhs.x86_64


How reproducible:
Inconsistent; roughly 1 out of 5 attempts.


Steps to Reproduce:
1. Create a 4-node ctdb cluster
2. Reboot one node to verify failover
3. Check the output of ctdb status and ctdb ip

Actual results:
The node that was rebooted remains in UNHEALTHY state because /gluster/lock is not mounted once the node comes back up.


Expected results:
Once the node comes back up it should return to a healthy state, and /gluster/lock should be mounted on all nodes.

Additional info:

After a force start of the ctdb volume, the mount happened and the node became healthy.
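The workaround above can be sketched as follows. The volume name "ctdb" and the /gluster/lock mountpoint come from this report; the is_mounted helper is an illustrative addition, not part of the original procedure:

```shell
# Workaround sketch: force-start the lock volume, then confirm
# the shared lock mount is back on the affected node.

# Returns 0 if the given path appears as a mountpoint in /proc/mounts.
is_mounted() {
    awk -v mp="$1" '$2 == mp { found=1 } END { exit !found }' /proc/mounts
}

# On the affected node (cluster commands shown as comments, since
# they only make sense on a live RHGS/CTDB node):
#   gluster volume start ctdb force
#   ctdb status        # the node should transition back to OK

if is_mounted /gluster/lock; then
    echo "/gluster/lock is mounted"
else
    echo "/gluster/lock is NOT mounted"
fi
```

A helper like this could be wired into a monitor script to detect the missing mount before ctdbd reports the node UNHEALTHY.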

Comment 8 Bipin Kunal 2017-08-01 14:00:18 UTC
Setting needinfo on Michael to get a reply to comment #6.

Comment 9 Anoop C S 2017-09-13 10:52:01 UTC
Please see https://bugzilla.redhat.com/show_bug.cgi?id=1177603#c10 for an explanation along the lines of this bug.

Comment 10 Michael Adam 2018-04-10 10:05:13 UTC

*** This bug has been marked as a duplicate of bug 1177603 ***

