Bug 1889966 - Volume status shows rebalance as not started on doing a replace-brick on dist-arbiter volume
Summary: Volume status shows rebalance as not started on doing a replace-brick on dist-arbiter volume
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: distribute
Version: rhgs-3.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: RHGS 3.5.z Batch Update 7
Assignee: Tamar Shacked
QA Contact: Pranav Prakash
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-10-21 06:12 UTC by Upasana
Modified: 2021-10-05 07:56 UTC

Fixed In Version: glusterfs-6.0-57
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-05 07:56:26 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2021:3729 0 None None None 2021-10-05 07:56:39 UTC

Description Upasana 2020-10-21 06:12:59 UTC
Provide version-Release number of selected component (if applicable):
=====================================================================
glusterfs-server-6.0-45.el7rhgs.x86_64 (3.5.3) and the HOTFIX build glusterfs-server-3.12.2-40.el7rhgs.2.HOTFIX.sfdc02762348.BZ1884244.x86_64

 
Have you searched the Bugzilla archives for same/similar issues reported?
=========================================================================
yes



Describe the issue (please be as detailed as possible, provide log snippets, and include a timestamp when the issue is seen):
==============================================================================
Steps -
1. Create an arbiter volume.
2. Add bricks and start a rebalance.
3. Wait for the rebalance to complete (check that volume status shows it as completed).
4. Replace a brick.
5. The volume status shows the rebalance task as 'not started'.

logs
====
[root@dhcp35-100 ~]# gluster v create arbiter-vol replica 3 arbiter 1 dhcp35-202.lab.eng.blr.redhat.com:/gluster/brick1/test dhcp35-31.lab.eng.blr.redhat.com:/gluster/brick1/test dhcp35-104.lab.eng.blr.redhat.com:/gluster/brick1/test
volume create: arbiter-vol: success: please start the volume to access data
[root@dhcp35-100 ~]# gluster v start arbiter-vol
volume start: arbiter-vol: success
[root@dhcp35-100 ~]# gluster v status arbiter-vol
Status of volume: arbiter-vol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp35-202.lab.eng.blr.redhat.com:/gl
uster/brick1/test                           49175     0          Y       7908 
Brick dhcp35-31.lab.eng.blr.redhat.com:/glu
ster/brick1/test                            49175     0          Y       29058
Brick dhcp35-104.lab.eng.blr.redhat.com:/gl
uster/brick1/test                           49175     0          Y       18812
Self-heal Daemon on localhost               N/A       N/A        Y       3306 
Self-heal Daemon on 10.70.35.31             N/A       N/A        Y       29201
Self-heal Daemon on 10.70.35.104            N/A       N/A        Y       18969
Self-heal Daemon on dhcp35-202.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       7979 
 
Task Status of Volume arbiter-vol
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@dhcp35-100 ~]# gluster v add-brick arbiter-vol dhcp35-202.lab.eng.blr.redhat.com:/gluster/brick1/test-add dhcp35-31.lab.eng.blr.redhat.com:/gluster/brick1/test-add dhcp35-104.lab.eng.blr.redhat.com:/gluster/brick1/test-add
volume add-brick: success
[root@dhcp35-100 ~]# gluster v rebalance arbiter-vol start 
volume rebalance: arbiter-vol: success: Rebalance on arbiter-vol has been started successfully. Use rebalance status command to check status of the rebalance process.
ID: ad76c0ac-1914-4df4-99d0-5bcb273beb65
[root@dhcp35-100 ~]# gluster v rebalance arbiter-vol status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
       dhcp35-202.lab.eng.blr.redhat.com                0        0Bytes             0             0             0            completed        0:00:01
                             10.70.35.31                0        0Bytes             0             0             0            completed        0:00:01
                            10.70.35.104                0        0Bytes             0             0             0            completed        0:00:01
volume rebalance: arbiter-vol: success
[root@dhcp35-100 ~]# 
[root@dhcp35-100 ~]# 
[root@dhcp35-100 ~]# 
[root@dhcp35-100 ~]# 
[root@dhcp35-100 ~]# gluster v status arbiter-vol
Status of volume: arbiter-vol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp35-202.lab.eng.blr.redhat.com:/gl
uster/brick1/test                           49175     0          Y       7908 
Brick dhcp35-31.lab.eng.blr.redhat.com:/glu
ster/brick1/test                            49175     0          Y       29058
Brick dhcp35-104.lab.eng.blr.redhat.com:/gl
uster/brick1/test                           49175     0          Y       18812
Brick dhcp35-202.lab.eng.blr.redhat.com:/gl
uster/brick1/test-add                       49176     0          Y       10504
Brick dhcp35-31.lab.eng.blr.redhat.com:/glu
ster/brick1/test-add                        49176     0          Y       31636
Brick dhcp35-104.lab.eng.blr.redhat.com:/gl
uster/brick1/test-add                       49176     0          Y       21198
Self-heal Daemon on localhost               N/A       N/A        Y       5664 
Self-heal Daemon on 10.70.35.104            N/A       N/A        Y       21227
Self-heal Daemon on 10.70.35.31             N/A       N/A        Y       31657
Self-heal Daemon on dhcp35-202.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       10540
 
Task Status of Volume arbiter-vol
------------------------------------------------------------------------------
Task                 : Rebalance           
ID                   : ad76c0ac-1914-4df4-99d0-5bcb273beb65
Status               : completed           
 
[root@dhcp35-100 ~]# gluster v replace-brick arbiter-vol dhcp35-202.lab.eng.blr.redhat.com:/gluster/brick1/test dhcp35-202.lab.eng.blr.redhat.com:/gluster/brick1/test-replace commit force
volume replace-brick: success: replace-brick commit force operation successful
[root@dhcp35-100 ~]# gluster v status arbiter-vol
Status of volume: arbiter-vol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp35-202.lab.eng.blr.redhat.com:/gl
uster/brick1/test-replace                   49175     0          Y       12915
Brick dhcp35-31.lab.eng.blr.redhat.com:/glu
ster/brick1/test                            49175     0          Y       29058
Brick dhcp35-104.lab.eng.blr.redhat.com:/gl
uster/brick1/test                           49175     0          Y       18812
Brick dhcp35-202.lab.eng.blr.redhat.com:/gl
uster/brick1/test-add                       49176     0          Y       10504
Brick dhcp35-31.lab.eng.blr.redhat.com:/glu
ster/brick1/test-add                        49176     0          Y       31636
Brick dhcp35-104.lab.eng.blr.redhat.com:/gl
uster/brick1/test-add                       49176     0          Y       21198
Self-heal Daemon on localhost               N/A       N/A        Y       7738 
Self-heal Daemon on 10.70.35.31             N/A       N/A        Y       1301 
Self-heal Daemon on 10.70.35.104            N/A       N/A        Y       23385
Self-heal Daemon on dhcp35-202.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       12926
 
Task Status of Volume arbiter-vol
------------------------------------------------------------------------------
Task                 : Rebalance           
ID                   : ad76c0ac-1914-4df4-99d0-5bcb273beb65
Status               : not started        




Is this issue reproducible? If yes, share more details.:
-------------------------------------------------------------
yes


Actual results:
===============
After the replace-brick, the volume status shows the rebalance task as 'not started'.
 
Expected results:
=================
The volume status should not report the completed rebalance task as 'not started'.
 
[root@dhcp35-100 ~]# gluster v heal arbiter-vol info
Brick dhcp35-202.lab.eng.blr.redhat.com:/gluster/brick1/test-replace
Status: Connected
Number of entries: 0

Brick dhcp35-31.lab.eng.blr.redhat.com:/gluster/brick1/test
Status: Connected
Number of entries: 0

Brick dhcp35-104.lab.eng.blr.redhat.com:/gluster/brick1/test
Status: Connected
Number of entries: 0

Brick dhcp35-202.lab.eng.blr.redhat.com:/gluster/brick1/test-add
Status: Connected
Number of entries: 0

Brick dhcp35-31.lab.eng.blr.redhat.com:/gluster/brick1/test-add
Status: Connected
Number of entries: 0

Brick dhcp35-104.lab.eng.blr.redhat.com:/gluster/brick1/test-add
Status: Connected
Number of entries: 0



The issue is easily reproducible, with or without I/O.

Comment 17 SATHEESARAN 2021-08-25 10:15:51 UTC
Verified with glusterfs-6.0-59.el8rhgs with the following steps:

1. Created a 3-node trusted storage pool (Gluster cluster) on the RHEL 8.4 platform (layered installation)
2. Created a 1x3 replicate volume and started it
3. FUSE-mounted the volume and created a few files (around 100 files, each of size 15MB)
4. Expanded the volume to 2x3
5. Triggered rebalance.
After the above step, rebalance completed successfully.
6. Verified the rebalance status in the 'gluster volume status' output
7. Performed the 'replace brick' step to replace one of the faulty bricks with another brick.
After this 'replace brick' step, the 'gluster volume status' output changed the rebalance status to 'reset due to replace-brick':

<snip>
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.129:/gluster/brick1/newbrick 49154     0          Y       2380 
Brick 10.70.35.165:/gluster/brick1/b1       49152     0          Y       1778 
Brick 10.70.35.206:/gluster/brick1/b1       49152     0          Y       1805 
Brick 10.70.35.129:/gluster/brick2/b2       49153     0          Y       2211 
Brick 10.70.35.165:/gluster/brick2/b2       49153     0          Y       1879 
Brick 10.70.35.206:/gluster/brick2/b2       49153     0          Y       1905 
Self-heal Daemon on localhost               N/A       N/A        Y       2387 
Self-heal Daemon on 10.70.35.165            N/A       N/A        Y       1975 
Self-heal Daemon on 10.70.35.206            N/A       N/A        Y       1999 
 
Task Status of Volume repvol
------------------------------------------------------------------------------
Task                 : Rebalance           
ID                   : None                
Status               : reset due to replace-brick
</snip>

8. To cover the other scenario ('reset brick'): expanded the same volume to 3x3 by adding 3 more bricks.
9. Triggered rebalance and waited for it to complete, with 'gluster volume status' reporting the rebalance status as 'completed'.
10. Reset one of the bricks in this 3x3 volume:
# gluster volume reset-brick <vol> <old_brick> start
# gluster volume reset-brick <vol> <old_brick> <same_brick> commit

After this, checking 'gluster volume status' again reports the rebalance status as 'reset due to reset-brick':
<snip>
[root@ ]# gluster volume reset-brick repvol 10.70.35.129:/gluster/brick2/b2 10.70.35.129:/gluster/brick2/b2 commit
volume reset-brick: success: reset-brick commit operation successful

[root@ ]# gluster v status
Status of volume: repvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.129:/gluster/brick1/newbrick 49154     0          Y       2380 
Brick 10.70.35.165:/gluster/brick1/b1       49152     0          Y       1778 
Brick 10.70.35.206:/gluster/brick1/b1       49152     0          Y       1805 
Brick 10.70.35.129:/gluster/brick2/b2       49153     0          Y       2667 
Brick 10.70.35.165:/gluster/brick2/b2       49153     0          Y       1879 
Brick 10.70.35.206:/gluster/brick2/b2       49153     0          Y       1905 
Brick 10.70.35.129:/gluster/brick1/b3       49152     0          Y       2480 
Brick 10.70.35.165:/gluster/brick2/B3       49154     0          Y       2015 
Brick 10.70.35.206:/gluster/brick2/b3       49154     0          Y       2039 
Self-heal Daemon on localhost               N/A       N/A        Y       2674 
Self-heal Daemon on 10.70.35.206            N/A       N/A        Y       2140 
Self-heal Daemon on 10.70.35.165            N/A       N/A        Y       2112 
 
Task Status of Volume repvol
------------------------------------------------------------------------------
Task                 : Rebalance           
ID                   : None                
Status               : reset due to reset-brick
</snip>

11. Restarting glusterd, and restarting all the nodes, retains the status.
12. Repeated the test with a distributed-arbitrated replicate volume using the same steps as above, and the results were successful as expected.
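In short, with the fix the task status after a brick replacement or reset is an explicit reset marker rather than the misleading 'not started'. A minimal illustrative check (the helper name and set are hypothetical, not from the fix itself):

```python
# Task statuses observed in this bug: before the fix, a replace-brick or
# reset-brick left a completed rebalance showing as 'not started'; with
# glusterfs-6.0-57 it is marked as explicitly reset instead.
FIXED_STATUSES = {"reset due to replace-brick", "reset due to reset-brick"}

def status_is_regression(status: str) -> bool:
    """True when a previously completed rebalance shows as 'not started'."""
    return status.strip() == "not started"

assert status_is_regression("not started")                     # the bug
assert not status_is_regression("reset due to replace-brick")  # after the fix
assert not status_is_regression("completed")
```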


Based on the above observations, marking this bug as verified.

Comment 19 errata-xmlrpc 2021-10-05 07:56:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHGS 3.5.z Batch Update 5 glusterfs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3729

