Bug 1046908
Summary: | [Rebalance]: on restarting glusterd, the completed rebalance starts again on that node | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | senaik
Component: | distribute | Assignee: | Susant Kumar Palai <spalai>
Status: | CLOSED ERRATA | QA Contact: | shylesh <shmohan>
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | 2.1 | CC: | htaira, nsathyan, psriniva, rgowdapp, rhs-bugs, rwheeler, shmohan, spalai, spandura, storage-qa-internal, vbellur
Target Milestone: | --- | |
Target Release: | RHGS 3.0.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | glusterfs-3.6.0.14-1.el6rhs | Doc Type: | Bug Fix
Doc Text: | Previously, the glusterd management service did not maintain the status of the rebalance process. As a result, rebalance processes that had already completed would restart after a node reboot. With this fix, completed rebalance processes do not restart after a node reboot. | Story Points: | ---
Clone Of: | | |
: | 1075087 1136310 | Environment: |
Last Closed: | 2014-09-22 19:31:00 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 923774, 1075087, 1136310, 1136798 | |
Description
senaik
2013-12-27 11:02:45 UTC
I am able to re-create the same issue multiple times.

Case executed:
===============
1. Create a 2 x 2 distribute-replicate volume. Start the volume.
2. Create a FUSE mount. Create files/directories from the mount point.
3. Add 2 more bricks to the volume to change the type to 3 x 2.
4. Start rebalance.
5. Wait for rebalance to complete.
6. Restart "glusterd" on any of the storage nodes.

Result:
=========
The already completed "rebalance" process is restarted.

Updating the bug with my findings.

Root cause:
==========
Glusterd remembers the status of the rebalance process (if running) for every volume in the file /var/lib/glusterd/vols/<volname>/node_state.info. Example:

[root@localhost rhs]# gluster v status dis tasks
Task Status of Volume dis
------------------------------------------------------------------------------
Task   : Rebalance
ID     : b3bbbf09-d783-4e18-a81e-8f1ee846edf0
Status : completed

[root@localhost rhs]# cat /var/lib/glusterd/vols/dis/node_state.info
rebalance_status=1
rebalance_op=19
rebalance-id=b3bbbf09-d783-4e18-a81e-8f1ee846edf0

However, after rebalance is complete, this rebalance state is not cleaned up. When glusterd is stopped and started again, it tries to restart all the daemons it thinks it had spawned before being brought down: it reads the (now obsolete) rebalance configuration from node_state.info and restarts the rebalance process (a minimal sketch of this startup check appears at the end of this report). For a volume that is no longer undergoing rebalance on a given node, node_state.info should instead look like the following:

[root@localhost rhs]# gluster v status kd tasks
Task Status of Volume kd
------------------------------------------------------------------------------
There are no active volume tasks

[root@localhost rhs]# cat /var/lib/glusterd/vols/kd/node_state.info
rebalance_status=0
rebalance_op=0
rebalance-id=00000000-0000-0000-0000-000000000000

Upstream patch: http://review.gluster.org/#/c/7214/

*** Bug 996003 has been marked as a duplicate of this bug. ***

Verified on 3.6.0.18-1.el6rhs.x86_64. Restarting glusterd or rebooting the machine no longer triggers a rebalance that has already completed.

Hi Susant,

Please review the edited doc text for technical accuracy and sign off.

Yes Pavithra, the doc text looks fine.

*** Bug 1136798 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html
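For illustration only, here is a minimal sketch of the startup decision described in the root-cause analysis above. It is written in Python rather than glusterd's actual C code; the file path and key names mirror the node_state.info examples in this report, while the numeric status codes and the helper names (read_node_state, should_restart_rebalance) are assumptions made for this sketch, not glusterd's real internals.

```python
# Hypothetical sketch of the check glusterd needs on startup: respawn a
# rebalance process only if node_state.info records one that was still
# in progress. Path and key names mirror the examples in this report;
# the numeric status codes below are assumed for illustration.

NODE_STATE_PATH = "/var/lib/glusterd/vols/{volname}/node_state.info"

REBALANCE_NOT_STARTED = 0   # assumed: no rebalance recorded on this node
REBALANCE_STARTED = 1       # assumed: a rebalance was launched on this node
NULL_ID = "00000000-0000-0000-0000-000000000000"


def read_node_state(volname):
    """Parse node_state.info into a dict of key/value strings."""
    state = {}
    with open(NODE_STATE_PATH.format(volname=volname)) as f:
        for line in f:
            key, sep, value = line.strip().partition("=")
            if sep:
                state[key] = value
    return state


def should_restart_rebalance(volname):
    """Decide whether a rebalance daemon should be respawned on startup.

    The bug: after rebalance completed, node_state.info still carried
    rebalance_status=1 and a non-null rebalance-id, so a check like this
    passed and the finished rebalance was restarted. The fix described in
    this report keeps the on-disk state up to date, so the check fails
    once rebalance has completed.
    """
    state = read_node_state(volname)
    status = int(state.get("rebalance_status", REBALANCE_NOT_STARTED))
    rebalance_id = state.get("rebalance-id", NULL_ID)
    return status == REBALANCE_STARTED and rebalance_id != NULL_ID
```

With the stale state captured above for volume "dis" (rebalance_status=1 and a non-null rebalance-id even though the task shows completed), this check would still return True, matching the observed restart; with the cleaned-up state shown for volume "kd" it returns False.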