Bug 1889966
| Summary: | Volume status shows rebalance as not started on doing a replace-brick on dist-arbiter volume | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Upasana <ubansal> |
| Component: | distribute | Assignee: | Tamar Shacked <tshacked> |
| Status: | CLOSED ERRATA | QA Contact: | Pranav Prakash <prprakas> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | rhgs-3.5 | CC: | aspandey, pprakash, rhs-bugs, sajmoham, sasundar, sheggodu, tshacked, vdas |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | RHGS 3.5.z Batch Update 7 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-6.0-57 | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-10-05 07:56:26 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Upasana
2020-10-21 06:12:59 UTC
Verified with glusterfs-6.0-59.el8rhgs with the following steps:

1. Created a 3-node trusted storage pool (Gluster cluster) on the RHEL 8.4 platform (layered installation).
2. Created a 1x3 replicate volume and started the volume.
3. FUSE-mounted the volume and created a few files (around 100 files, each of size 15 MB).
4. Expanded the volume to 2x3.
5. Triggered rebalance; after this step, rebalance completed successfully.
6. Verified the rebalance status in the 'gluster volume status' output.
7. Performed a 'replace brick' operation to replace one of the faulty bricks with another brick. After this 'replace brick' step, 'gluster volume status' reports the rebalance status as 'reset due to replace-brick':

<snip>
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.129:/gluster/brick1/newbrick 49154     0          Y       2380
Brick 10.70.35.165:/gluster/brick1/b1       49152     0          Y       1778
Brick 10.70.35.206:/gluster/brick1/b1       49152     0          Y       1805
Brick 10.70.35.129:/gluster/brick2/b2       49153     0          Y       2211
Brick 10.70.35.165:/gluster/brick2/b2       49153     0          Y       1879
Brick 10.70.35.206:/gluster/brick2/b2       49153     0          Y       1905
Self-heal Daemon on localhost               N/A       N/A        Y       2387
Self-heal Daemon on 10.70.35.165            N/A       N/A        Y       1975
Self-heal Daemon on 10.70.35.206            N/A       N/A        Y       1999

Task Status of Volume repvol
------------------------------------------------------------------------------
Task   : Rebalance
ID     : None
Status : reset due to replace-brick
</snip>

8. To cover the other scenario ('reset brick'), expanded the same volume to 3x3 by adding 3 more bricks.
9. Triggered rebalance, waited for rebalance to complete, and for 'gluster volume status' to report the rebalance status as 'completed'.
10. Reset one of the bricks in this 3x3 volume:

# gluster volume reset-brick <vol> <old_brick> start
# gluster volume reset-brick <vol> <old_brick> <same_brick> commit

After this, checking 'gluster volume status' again reports the rebalance status as 'reset due to reset-brick':

<snip>
[root@ ]# gluster volume reset-brick repvol 10.70.35.129:/gluster/brick2/b2 10.70.35.129:/gluster/brick2/b2 commit
volume reset-brick: success: reset-brick commit operation successful

[root@ ]# gluster v status
Status of volume: repvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.129:/gluster/brick1/newbrick 49154     0          Y       2380
Brick 10.70.35.165:/gluster/brick1/b1       49152     0          Y       1778
Brick 10.70.35.206:/gluster/brick1/b1       49152     0          Y       1805
Brick 10.70.35.129:/gluster/brick2/b2       49153     0          Y       2667
Brick 10.70.35.165:/gluster/brick2/b2       49153     0          Y       1879
Brick 10.70.35.206:/gluster/brick2/b2       49153     0          Y       1905
Brick 10.70.35.129:/gluster/brick1/b3       49152     0          Y       2480
Brick 10.70.35.165:/gluster/brick2/B3       49154     0          Y       2015
Brick 10.70.35.206:/gluster/brick2/b3       49154     0          Y       2039
Self-heal Daemon on localhost               N/A       N/A        Y       2674
Self-heal Daemon on 10.70.35.206            N/A       N/A        Y       2140
Self-heal Daemon on 10.70.35.165            N/A       N/A        Y       2112

Task Status of Volume repvol
------------------------------------------------------------------------------
Task   : Rebalance
ID     : None
Status : reset due to reset-brick
</snip>

11. Restarting glusterd, as well as restarting all the nodes, retains this status. A condensed command sketch of steps 1-10 is shown below.
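For anyone reproducing the check, the command sequence for steps 1-10 looks roughly like the following. This is a minimal sketch: the hostnames (node1/node2/node3), the volume name, and the brick paths are illustrative placeholders, not the exact ones from the test bed above.

```
# Step 2: 1x3 replicate volume on a 3-node trusted storage pool
gluster volume create repvol replica 3 \
    node1:/gluster/brick1/b1 node2:/gluster/brick1/b1 node3:/gluster/brick1/b1
gluster volume start repvol

# Step 3: FUSE mount, then create test data under /mnt/repvol
mkdir -p /mnt/repvol
mount -t glusterfs node1:/repvol /mnt/repvol

# Steps 4-6: expand to 2x3, rebalance, and check the task status
gluster volume add-brick repvol \
    node1:/gluster/brick2/b2 node2:/gluster/brick2/b2 node3:/gluster/brick2/b2
gluster volume rebalance repvol start
gluster volume rebalance repvol status
gluster volume status repvol

# Step 7: replace one brick; the rebalance task in 'volume status' should now
# read 'reset due to replace-brick' rather than 'not started'
gluster volume replace-brick repvol \
    node1:/gluster/brick1/b1 node1:/gluster/brick1/newbrick commit force
gluster volume status repvol

# Steps 8-10: expand to 3x3, rebalance again, then reset a brick in place;
# the task status should now read 'reset due to reset-brick'
gluster volume add-brick repvol \
    node1:/gluster/brick3/b3 node2:/gluster/brick3/b3 node3:/gluster/brick3/b3
gluster volume rebalance repvol start
gluster volume reset-brick repvol node1:/gluster/brick2/b2 start
gluster volume reset-brick repvol node1:/gluster/brick2/b2 node1:/gluster/brick2/b2 commit
gluster volume status repvol
```

The key check after each replace-brick/reset-brick operation is that the Task Status section of 'gluster volume status' no longer reports the rebalance task as 'not started' (the behaviour in the bug summary) but as the corresponding 'reset due to ...' message.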
12. Repeated the test with a distributed-arbitrated replicate volume, following the same steps as above, and the results were successful as expected (see the volume-creation sketch at the end of this comment).

Based on the above observations, marking this bug as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHGS 3.5.z Batch Update 5 glusterfs bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3729
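For reference, since the summary concerns a dist-arbiter volume: the distributed-arbitrated replicate volume used for the repeat in step 12 can be created along the following lines, after which the same expand/rebalance/replace-brick/reset-brick checks from steps 4-11 are run against it. The volume name, hostnames, and brick paths here are illustrative only.

```
# 2x(2+1) distributed arbitrated-replicate volume; the third brick in each
# replica set is the arbiter (names and paths are placeholders)
gluster volume create distarb replica 3 arbiter 1 \
    node1:/gluster/brick1/b1 node2:/gluster/brick1/b1 node3:/gluster/brick1/arb1 \
    node1:/gluster/brick2/b2 node2:/gluster/brick2/b2 node3:/gluster/brick2/arb2
gluster volume start distarb
gluster volume info distarb
```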