+++ This bug was initially created as a clone of Bug #1123732 +++

Description of problem:
**************************************************
Created a 2x2 distributed-replicate volume, mounted it via CIFS, and created a few directories and files on the mount point. Applied the volume set options required for Samba shares to be mounted via CIFS, restarted glusterd on all the nodes, and checked the volume status.

After executing volume status, the following errors show up in the logs:
***********************************************
volume req for volume newafr
[2014-07-28 05:50:50.986840] E [glusterd-utils.c:10038:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (1) and remote tasks count (0) do not match. Not aggregating tasks status.
[2014-07-28 05:50:50.986893] E [glusterd-syncop.c:1014:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
[2014-07-28 05:50:50.987082] E [glusterd-utils.c:10038:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (1) and remote tasks count (0) do not match. Not aggregating tasks status.
[2014-07-28 05:50:50.987106] E [glusterd-syncop.c:1014:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick

How reproducible:
Tried once.

Steps to Reproduce:
1. Create a 2x2 distributed-replicate volume.
2. Mount it via CIFS.
3. Create a few directories/files on the mount point.
4. Run an arequal checksum.
5. Perform a volume set operation on the volume that is mounted.
6. Restart the glusterd service.
7. Execute 'gluster vol status'.
8. Check the volume logs.
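The steps above, expressed as gluster CLI commands. This is a hedged sketch rather than the exact commands used by the reporter: the mount point, Samba share name, credentials, data set and the arequal-checksum invocation are assumptions, while the volume name, brick paths and volume-set options are taken from the volume info and "Options Reconfigured" output further down in this report.

# 1. Create and start a 2x2 distributed-replicate volume
gluster volume create newafr replica 2 \
    srv1:/rhs/brick1/newafr/b1 srv2:/rhs/brick1/newafr/b2 \
    srv1:/rhs/brick1/newafr/b3 srv2:/rhs/brick1/newafr/b4
gluster volume start newafr

# 2./3. Mount the volume over CIFS (assumes Samba exports it as "gluster-newafr")
#       and create some directories and files
mount -t cifs -o username=smbuser //srv1/gluster-newafr /mnt/cifs
mkdir -p /mnt/cifs/dir{1..5}
for i in $(seq 1 20); do dd if=/dev/urandom of=/mnt/cifs/dir1/file$i bs=1M count=1; done

# 4. Checksum the data set (this arequal-checksum invocation is an assumption)
arequal-checksum -p /mnt/cifs

# 5. Volume set options used for Samba/CIFS access (from "Options Reconfigured" below)
gluster volume set newafr server.allow-insecure on
gluster volume set newafr storage.batch-fsync-delay-usec 0
gluster volume set newafr performance.stat-prefetch off

# 6./7./8. Restart glusterd on every node, then check status and the glusterd log
service glusterd restart
gluster volume status newafr
less /var/log/glusterfs/etc-glusterfs-glusterd.vol.log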
Actual results:
*************************************
The logs show the following errors:

volume req for volume newafr
[2014-07-28 05:50:50.986840] E [glusterd-utils.c:10038:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (1) and remote tasks count (0) do not match. Not aggregating tasks status.
[2014-07-28 05:50:50.986893] E [glusterd-syncop.c:1014:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
[2014-07-28 05:50:50.987082] E [glusterd-utils.c:10038:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (1) and remote tasks count (0) do not match. Not aggregating tasks status.
[2014-07-28 05:50:50.987106] E [glusterd-syncop.c:1014:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick

Expected results:
There should be no such errors on execution of 'gluster vol status'.

Additional info:
*********************************
Volume Name: newafr
Type: Distributed-Replicate
Volume ID: bd60f186-4bb0-49fa-bdd8-521e07e1b728
Status: Started
Snap Volume: no
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: srv1:/rhs/brick1/newafr/b1
Brick2: srv2:/rhs/brick1/newafr/b2
Brick3: srv1:/rhs/brick1/newafr/b3
Brick4: srv2:/rhs/brick1/newafr/b4
Options Reconfigured:
performance.readdir-ahead: on
storage.batch-fsync-delay-usec: 0
server.allow-insecure: on
performance.stat-prefetch: off
auto-delete: disable
snap-max-soft-limit: 90
snap-max-hard-limit: 256

[root@srv2 glusterfs]# gluster vol status newafr
Status of volume: newafr
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick srv1:/rhs/brick1/newafr/b1                49167   Y       2829
Brick srv2:/rhs/brick1/newafr/b2                49165   Y       5690
Brick srv1:/rhs/brick1/newafr/b3                49168   Y       2834
Brick srv2:/rhs/brick1/newafr/b4                49166   Y       5746
NFS Server on localhost                         2049    Y       24418
Self-heal Daemon on localhost                   N/A     Y       24425
NFS Server on srv3                              2049    Y       21568
Self-heal Daemon on srv3                        N/A     Y       21575
NFS Server on srv4                              2049    Y       16899
Self-heal Daemon on srv4                        N/A     Y       16906
NFS Server on srv1                              2049    Y       17658
Self-heal Daemon on srv1                        N/A     Y       17665

--- Additional comment from Atin Mukherjee on 2014-07-28 16:36:19 IST ---

Surabhi,

Can you please attach the sosreports of all the nodes? Have you executed remove-brick/rebalance or replace-brick in between? This mismatch can be seen when you execute any of these operations.

--Atin

--- Additional comment from surabhi on 2014-07-28 18:06:48 IST ---

For this particular test, when these errors were observed, remove-brick and rebalance had not been executed, but several tests which included remove-brick/rebalance operations had been run before.

--- Additional comment from Kaushal on 2014-10-28 13:01:35 IST ---

This issue is caused by peers not participating in a rebalance not storing the rebalance task. When a rebalance task is started, the task details are stored in the node_state.info file, but this store was being performed only on the nodes on which a rebalance process is started. On the non-participating nodes, the task information was not stored and was only present in memory. This meant the information was lost when glusterd was restarted, which leads to the above situation of error logs.

A simple reproducer for this is:
1. Create a 3-node cluster.
2. Create a distribute volume with bricks on only 2 of the peers.
3. Start rebalance on the volume.
4. Restart the 3rd peer.
5. Run 'volume status' from either of the first 2 peers.

This is not really a serious issue as it doesn't affect any operations, but I will fix it.
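The simple reproducer from the comment above, as a hedged CLI sketch; the peer names (node1, node2, node3), the volume name "distvol" and the brick paths are placeholders.

# 1./2. Three-node cluster; distribute volume with bricks on only two of the
#       peers (run from node1)
gluster peer probe node2
gluster peer probe node3
gluster volume create distvol node1:/bricks/distvol/b1 node2:/bricks/distvol/b2
gluster volume start distvol

# 3. Start a rebalance task on the volume
gluster volume rebalance distvol start

# 4. Restart glusterd on the third, non-participating peer
ssh node3 'service glusterd restart'

# 5. 'volume status' from either of the first two peers now logs the
#    "Local tasks count ... and remote tasks count ... do not match" errors
gluster volume status distvol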
REVIEW: http://review.gluster.org/8998 (glusterd: Store rebalance state on all peers) posted (#1) for review on master by Kaushal M (kaushal)
COMMIT: http://review.gluster.org/8998 committed in master by Krishnan Parthasarathi (kparthas)

------

commit 96e1c33b681b34124bdc78174a21865623c9795b
Author: Kaushal M <kaushal>
Date:   Tue Oct 28 13:06:50 2014 +0530

    glusterd: Store rebalance state on all peers

    The rebalance state was being saved only on the peers participating in
    the rebalance on a rebalance start. This change makes sure all nodes
    save the rebalance state.

    Change-Id: I436e5c34bcfb88f7da7378cec807328ce32397bc
    BUG: 1157979
    Signed-off-by: Kaushal M <kaushal>
    Reviewed-on: http://review.gluster.org/8998
    Reviewed-by: Atin Mukherjee <amukherj>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
    Tested-by: Krishnan Parthasarathi <kparthas>
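A hedged sketch of how the fix can be checked, reusing the placeholder names from the reproducer sketch earlier and assuming the default glusterd working directory (/var/lib/glusterd) and the default glusterd log location; the exact keys written to node_state.info are not spelled out here, only their presence on the non-participating peer matters.

# With the patch applied, the rebalance task should also be persisted on the
# non-participating peer (previously it existed only in that peer's memory)
ssh node3 'grep rebalance /var/lib/glusterd/vols/distvol/node_state.info'

# After restarting glusterd on that peer, 'volume status' should no longer
# produce the task-count mismatch in the glusterd log
ssh node3 'service glusterd restart'
gluster volume status distvol
grep "Not aggregating tasks status" /var/log/glusterfs/etc-glusterfs-glusterd.vol.log \
    || echo "no task-count mismatch logged"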
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user