Bug 1202328
| Summary: | CTDB: Three nodes out of a 4-node CTDB cluster remain in UNHEALTHY state when these three nodes are rebooted | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | surabhi <sbhaloth> |
| Component: | ctdb | Assignee: | Anoop C S <anoopcs> |
| Status: | CLOSED DUPLICATE | QA Contact: | Vivek Das <vdas> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | rhgs-3.0 | CC: | anoopcs, bkunal, gdeschner, madam, nlevinki, rhs-smb, sheggodu, wenshi |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | gluster | ||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-04-10 10:05:13 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1408949 | ||
|
Description
surabhi
2015-03-16 12:11:59 UTC
Reproducing the issue again with log level DEBUG; will update with the sosreports. When rebooting one node in a 4-node CTDB cluster, one node remains in UNHEALTHY state because the /gluster/lock mount does not happen on that node.

Errors from the CTDB logs:

```
2015/05/04 02:53:09.889335 [set_recmode: 9120]: ERROR: recovery lock file /gluster/lock/lockfile not locked when recovering!
2015/05/04 02:53:09.900173 [ 3109]: Freeze priority 1
2015/05/04 02:53:09.901113 [ 3109]: Freeze priority 2
2015/05/04 02:53:09.901862 [ 3109]: Freeze priority 3
2015/05/04 02:53:11.034595 [ 3109]: Thawing priority 1
2015/05/04 02:53:11.034631 [ 3109]: Release freeze handler for prio 1
2015/05/04 02:53:11.034667 [ 3109]: Thawing priority 2
2015/05/04 02:53:11.034678 [ 3109]: Release freeze handler for prio 2
2015/05/04 02:53:11.034697 [ 3109]: Thawing priority 3
2015/05/04 02:53:11.034708 [ 3109]: Release freeze handler for prio 3
2015/05/04 02:53:11.035309 [set_recmode: 9182]: ERROR: recovery lock file /gluster/lock/lockfile not locked when recovering!
```

Snippet from the gluster logs:

```
[2015-05-04 06:52:55.385969] E [glusterd-op-sm.c:207:glusterd_get_txn_opinfo] 0-: Unable to get transaction opinfo for transaction ID : 5ceff865-23a8-48fb-b13e-d2252ee5d0f4
[2015-05-04 06:52:55.387695] E [glusterd-op-sm.c:207:glusterd_get_txn_opinfo] 0-: Unable to get transaction opinfo for transaction ID : 486078dd-0eab-4223-b80e-59099da74ec2
[2015-05-04 06:52:55.420389] E [glusterd-op-sm.c:207:glusterd_get_txn_opinfo] 0-: Unable to get transaction opinfo for transaction ID : 5f495b59-54f2-4949-b1ee-5fd0a324d582
[2015-05-04 06:52:55.423631] W [glusterd-op-sm.c:3975:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
[2015-05-04 06:52:55.425004] I [glusterd-handler.c:3841:__glusterd_handle_status_volume] 0-management: Received status volume req for volume ctdb
[2015-05-04 06:52:55.427132] W [glusterd-locks.c:547:glusterd_mgmt_v3_lock] 0-management: Lock for ctdb held by 0e6d1fb9-e34e-4733-bb87-734dd920080b
[2015-05-04 06:52:55.427151] E [glusterd-op-sm.c:3054:glusterd_op_ac_lock] 0-management: Unable to acquire lock for ctdb
[2015-05-04 06:52:55.427178] E [glusterd-op-sm.c:6539:glusterd_op_sm] 0-management: handler returned: -1
[2015-05-04 06:52:55.429000] E [glusterd-syncop.c:86:gd_mgmt_v3_collate_errors] 0-: Locking failed on 10.16.157.78. Please check log file for details.
[2015-05-04 06:52:55.429042] W [glusterd-locks.c:641:glusterd_mgmt_v3_unlock] 0-management: Lock owner mismatch. Lock for vol ctdb held by 0e6d1fb9-e34e-4733-bb87-734dd920080b
[2015-05-04 06:52:55.429056] E [glusterd-op-sm.c:3102:glusterd_op_ac_unlock] 0-management: Unable to release lock for ctdb
[2015-05-04 06:52:55.429084] E [glusterd-op-sm.c:6539:glusterd_op_sm] 0-management: handler returned: 1
[2015-05-04 06:52:55.429104] E [glusterd-syncop.c:1724:gd_sync_task_begin] 0-management: Locking Peers Failed.
[2015-05-04 06:52:55.430345] E [glusterd-syncop.c:86:gd_mgmt_v3_collate_errors] 0-: Unlocking failed on 10.16.157.78. Please check log file for details.
```

Version-Release number of selected component (if applicable):

```
rpm -qa | grep ctdb
ctdb2.5-2.5.4-1.el6rhs.x86_64
glusterfs-3.6.0.53-1.el6rhs.x86_64
```

How reproducible: Inconsistent (roughly 1 in 5 attempts).

Steps to Reproduce:
1. Create a 4-node CTDB cluster.
2. Reboot one node to verify failover.
3. Check `ctdb status` and `ctdb ip`.

Actual results: The node that was rebooted remains in UNHEALTHY state because /gluster/lock is not mounted once the node comes back up.

Expected results: Once the node comes up, it should return to HEALTHY state and /gluster/lock should be mounted.

Additional info: After a force start of the ctdb volume, the mount happened and the node became healthy.

Setting needinfo on Michael for a reply to comment #6.

Please see https://bugzilla.redhat.com/show_bug.cgi?id=1177603#c10 for an explanation along the lines of this bug.

*** This bug has been marked as a duplicate of bug 1177603 ***
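The "Additional info" above points at the practical check and workaround: the node stays UNHEALTHY while /gluster/lock is not mounted, and a force start of the ctdb volume recovers it. A minimal sketch of that check, assuming the mount path and volume name from the logs in this report (the helper function name is illustrative, not part of CTDB):

```shell
# Return 0 if the given path is an active mount point (uses util-linux mountpoint).
check_lock_mount() {
    mountpoint -q "$1"
}

# Usage: if the recovery-lock mount is missing after a reboot, suggest the
# force start observed to recover the node in this report.
if ! check_lock_mount /gluster/lock; then
    echo "recovery lock not mounted; a force start may recover it:" >&2
    echo "  gluster volume start ctdb force" >&2
fi
```

After the force start, `ctdb status` on the rebooted node should transition back to OK once the lock file under /gluster/lock is reachable again.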