Bug 923560 - remove-brick operation on distribute-replicate volume fails to start when specifying more than one replica pair
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: 2.0
Hardware: Unspecified
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: RHGS 2.1.2
Assignee: Susant Kumar Palai
QA Contact: senaik
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-03-20 05:33 UTC by Rejy M Cyriac
Modified: 2015-09-01 12:24 UTC
12 users

Fixed In Version: glusterfs-3.4.0.35.1u2rhs-1
Doc Type: Bug Fix
Doc Text:
Previously, the remove-brick operation supported removal of only one replica pair at a time. With this update, removal of multiple replica pairs is supported. If the bricks belong to the same sub-volumes, removal is successful irrespective of the order in which the bricks are specified on the CLI.
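The fixed behaviour described above amounts to checking that the requested bricks cover whole replica sub-volumes, in any order. A minimal Python sketch of that check (a hypothetical illustration; glusterd's actual implementation is in C and differs in detail):

```python
# Hypothetical sketch of the sub-volume check described in the doc text;
# not glusterd's real code.

def subvols(bricks, replica_count):
    """Group an ordered brick list into replica sub-volumes (as sets)."""
    return [frozenset(bricks[i:i + replica_count])
            for i in range(0, len(bricks), replica_count)]

def can_remove(volume_bricks, replica_count, requested):
    """True if the requested bricks form whole sub-volumes, in any CLI order."""
    if len(requested) % replica_count:
        return False  # cannot cover whole replica sets
    # Normalize CLI order to volume order, then chunk into candidate sub-volumes.
    ordered = sorted(requested, key=volume_bricks.index)
    return set(subvols(ordered, replica_count)) <= set(subvols(volume_bricks, replica_count))
```

With this order-independent check, both a single pair and several pairs at once pass, while bricks drawn from different pairs are still rejected with the "not from same subvol" condition.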
Clone Of:
Environment:
Last Closed: 2014-02-25 07:25:32 UTC
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2014:0208 0 normal SHIPPED_LIVE Red Hat Storage 2.1 enhancement and bug fix update #2 2014-02-25 12:20:30 UTC

Description Rejy M Cyriac 2013-03-20 05:33:45 UTC
Description of problem:

On a distribute-replicate volume added as a storage domain for a VM image store in RHEV, a remove-brick operation was attempted specifying two replica pairs of bricks. The start failed with the message "Bricks not from same subvol for replica".

When a single replica pair of bricks was given, the remove-brick operation started successfully.

Details are given below.

--------------------------------------------

[root@rhs-client45 ~]# gluster volume info
 
Volume Name: RHS_VM_imagestore
Type: Distributed-Replicate
Volume ID: 8f97ac5c-3269-46be-a0f7-d33e61bc7128
Status: Started
Number of Bricks: 12 x 2 = 24
Transport-type: tcp
Bricks:
Brick1: rhs-client45.lab.eng.blr.redhat.com:/brick1
Brick2: rhs-client37.lab.eng.blr.redhat.com:/brick1
Brick3: rhs-client15.lab.eng.blr.redhat.com:/brick1
Brick4: rhs-client10.lab.eng.blr.redhat.com:/brick1
Brick5: rhs-client45.lab.eng.blr.redhat.com:/brick2
Brick6: rhs-client37.lab.eng.blr.redhat.com:/brick2
Brick7: rhs-client15.lab.eng.blr.redhat.com:/brick2
Brick8: rhs-client10.lab.eng.blr.redhat.com:/brick2
Brick9: rhs-client45.lab.eng.blr.redhat.com:/brick3
Brick10: rhs-client37.lab.eng.blr.redhat.com:/brick3
Brick11: rhs-client15.lab.eng.blr.redhat.com:/brick3
Brick12: rhs-client10.lab.eng.blr.redhat.com:/brick3
Brick13: rhs-client45.lab.eng.blr.redhat.com:/brick4
Brick14: rhs-client37.lab.eng.blr.redhat.com:/brick4
Brick15: rhs-client15.lab.eng.blr.redhat.com:/brick4
Brick16: rhs-client10.lab.eng.blr.redhat.com:/brick4
Brick17: rhs-client45.lab.eng.blr.redhat.com:/brick5
Brick18: rhs-client37.lab.eng.blr.redhat.com:/brick5
Brick19: rhs-client15.lab.eng.blr.redhat.com:/brick5
Brick20: rhs-client10.lab.eng.blr.redhat.com:/brick5
Brick21: rhs-client45.lab.eng.blr.redhat.com:/brick6
Brick22: rhs-client37.lab.eng.blr.redhat.com:/brick6
Brick23: rhs-client15.lab.eng.blr.redhat.com:/brick6
Brick24: rhs-client10.lab.eng.blr.redhat.com:/brick6
Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
network.remote-dio: on
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
 
Volume Name: RHS_extra
Type: Distributed-Replicate
Volume ID: bc9d2c6e-0c5e-4beb-a43d-82c9e0490da3
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: rhs-client45.lab.eng.blr.redhat.com:/test
Brick2: rhs-client37.lab.eng.blr.redhat.com:/test
Brick3: rhs-client15.lab.eng.blr.redhat.com:/test
Brick4: rhs-client10.lab.eng.blr.redhat.com:/test
Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
network.remote-dio: on
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
[root@rhs-client45 ~]# gluster volume remove-brick RHS_VM_imagestore rhs-client45.lab.eng.blr.redhat.com:/brick6 rhs-client37.lab.eng.blr.redhat.com:/brick6 rhs-client15.lab.eng.blr.redhat.com:/brick6 rhs-client10.lab.eng.blr.redhat.com:/brick6 start
Bricks not from same subvol for replica
[root@rhs-client45 ~]# gluster volume remove-brick RHS_VM_imagestore rhs-client45.lab.eng.blr.redhat.com:/brick1 rhs-client37.lab.eng.blr.redhat.com:/brick1 rhs-client15.lab.eng.blr.redhat.com:/brick1 rhs-client10.lab.eng.blr.redhat.com:/brick1 start
Bricks not from same subvol for replica
[root@rhs-client45 ~]# gluster volume remove-brick RHS_VM_imagestore rhs-client45.lab.eng.blr.redhat.com:/brick6 rhs-client37.lab.eng.blr.redhat.com:/brick6 start
Remove Brick start successful
[root@rhs-client45 ~]# gluster volume remove-brick RHS_VM_imagestore rhs-client45.lab.eng.blr.redhat.com:/brick6 rhs-client37.lab.eng.blr.redhat.com:/brick6  status
                                    Node Rebalanced-files          size       scanned      failures         status
                               ---------      -----------   -----------   -----------   -----------   ------------
                               localhost                1            0            9            0    in progress
     rhs-client37.lab.eng.blr.redhat.com                0            0           31            0      completed
     rhs-client15.lab.eng.blr.redhat.com                0            0            0            0    not started
     rhs-client10.lab.eng.blr.redhat.com                0            0            0            0    not started


--------------------------------------------
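For reference, replica pairs in the 12 x 2 volume above follow CLI order, so the four brick6 bricks in the rejected command are in fact two complete pairs. A small Python illustration (shortened hostnames; not glusterd code):

```python
# Brick list of the 12 x 2 volume above, in CLI order (hostnames shortened;
# this is an illustration only).
bricks = [f"rhs-client{h}:/brick{n}"
          for n in range(1, 7) for h in (45, 37, 15, 10)]

# With replica 2, each consecutive pair in CLI order is one sub-volume.
pairs = [bricks[i:i + 2] for i in range(0, len(bricks), 2)]

# The four brick6 bricks from the rejected command cover exactly two pairs.
requested = {f"rhs-client{h}:/brick6" for h in (45, 37, 15, 10)}
covered = [p for p in pairs if set(p) <= requested]  # two complete pairs
```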

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. On a distribute-replicate volume used as a RHEV VM image store, run a remove-brick operation specifying two replica pairs of bricks.
2. The operation does not start.
3. Try the remove-brick operation with one replica pair; it starts.
  
Actual results:

The remove-brick operation starts only when a single replica pair of bricks is specified.

Expected results:

The remove-brick operation must start successfully with any number of replica pairs of bricks given in a single command.

Additional info:

Comment 1 Gowrishankar Rajaiyan 2013-04-26 11:17:45 UTC
Not specific to rhev-rhs, hence updating summary.

Comment 2 Rejy M Cyriac 2013-05-16 11:47:20 UTC
Issue reproducible on glusterfs-server-3.4.0.8rhs-1.el6rhs.x86_64

------------------------------------------------------------------

[Thu May 16 17:07:06 root@rhs-client45:~ ] #gluster volume info RHEV-BigBend_extra
 
Volume Name: RHEV-BigBend_extra
Type: Distributed-Replicate
Volume ID: 02a49c67-dcfd-42ef-98ee-d6690a61617b
Status: Started
Number of Bricks: 10 x 2 = 20
Transport-type: tcp
Bricks:
Brick1: rhs-client45.lab.eng.blr.redhat.com:/rhs/brick4/RHEV-BigBend_extra
Brick2: rhs-client37.lab.eng.blr.redhat.com:/rhs/brick4/RHEV-BigBend_extra
Brick3: rhs-client15.lab.eng.blr.redhat.com:/rhs/brick4/RHEV-BigBend_extra
Brick4: rhs-client4.lab.eng.blr.redhat.com:/rhs/brick4/RHEV-BigBend_extra
Brick5: rhs-client45.lab.eng.blr.redhat.com:/rhs/brick5/RHEV-BigBend_extra
Brick6: rhs-client37.lab.eng.blr.redhat.com:/rhs/brick5/RHEV-BigBend_extra
Brick7: rhs-client15.lab.eng.blr.redhat.com:/rhs/brick5/RHEV-BigBend_extra
Brick8: rhs-client4.lab.eng.blr.redhat.com:/rhs/brick5/RHEV-BigBend_extra
Brick9: rhs-client45.lab.eng.blr.redhat.com:/rhs/brick6/RHEV-BigBend_extra
Brick10: rhs-client37.lab.eng.blr.redhat.com:/rhs/brick6/RHEV-BigBend_extra
Brick11: rhs-client15.lab.eng.blr.redhat.com:/rhs/brick6/RHEV-BigBend_extra
Brick12: rhs-client4.lab.eng.blr.redhat.com:/rhs/brick6/RHEV-BigBend_extra
Brick13: rhs-client45.lab.eng.blr.redhat.com:/rhs/brick8/RHEV-BigBend_extra
Brick14: rhs-client37.lab.eng.blr.redhat.com:/rhs/brick8/RHEV-BigBend_extra
Brick15: rhs-client15.lab.eng.blr.redhat.com:/rhs/brick8/RHEV-BigBend_extra
Brick16: rhs-client4.lab.eng.blr.redhat.com:/rhs/brick8/RHEV-BigBend_extra
Brick17: rhs-client45.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra
Brick18: rhs-client37.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra
Brick19: rhs-client15.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra
Brick20: rhs-client4.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra
Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off

[Thu May 16 17:07:17 root@rhs-client45:~ ] #gluster volume remove-brick RHEV-BigBend_extra rhs-client45.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra rhs-client37.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra rhs-client15.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra rhs-client4.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra start
volume remove-brick start: failed: Bricks not from same subvol for replica

[Thu May 16 17:08:23 root@rhs-client45:~ ] #gluster volume remove-brick RHEV-BigBend_extra rhs-client45.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra rhs-client37.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra start
volume remove-brick start: success
ID: ec497c4e-4b5d-4761-990e-a6a080797116
[Thu May 16 17:10:27 root@rhs-client45:~ ] #

------------------------------------------------------------------

This behaviour is inconsistent with that of the add-brick operation, where multiple replica pairs may be added together. The above bricks were previously added with the command given below.

------------------------------------------------------------------

gluster volume add-brick RHEV-BigBend_extra rhs-client45.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra rhs-client37.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra rhs-client15.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra rhs-client4.lab.eng.blr.redhat.com:/rhs/brick9/RHEV-BigBend_extra

------------------------------------------------------------------

Comment 3 Ravishankar N 2013-08-20 05:30:27 UTC
Patch Review URL: https://code.engineering.redhat.com/gerrit/#/c/11584/

Comment 4 Pavithra 2014-01-08 06:51:25 UTC
Can you please review the doc text for technical accuracy?

Comment 5 Susant Kumar Palai 2014-01-08 09:38:19 UTC
Doc text looks good to me.

Comment 6 shylesh 2014-01-20 07:06:40 UTC
Verified on 3.4.0.57rhs-1.el6rhs.x86_64.
Multiple replica pairs can now be removed in a single command:

[root@rhs-client4 mnt]# gluster v remove-brick another rhs-client39.lab.eng.blr.redhat.com:/home/another11 rhs-client9.lab.eng.blr.redhat.com:/home/another10 rhs-client4.lab.eng.blr.redhat.com:/home/another9 rhs-client39.lab.eng.blr.redhat.com:/home/another8 rhs-client9.lab.eng.blr.redhat.com:/home/another7 rhs-client4.lab.eng.blr.redhat.com:/home/another6 start
volume remove-brick start: success
ID: c5a8ebd3-62e6-4ab1-aa57-b0be509a9d55
[root@rhs-client4 mnt]# gluster v remove-brick another rhs-client39.lab.eng.blr.redhat.com:/home/another11 rhs-client9.lab.eng.blr.redhat.com:/home/another10 rhs-client4.lab.eng.blr.redhat.com:/home/another9 rhs-client39.lab.eng.blr.redhat.com:/home/another8 rhs-client9.lab.eng.blr.redhat.com:/home/another7 rhs-client4.lab.eng.blr.redhat.com:/home/another6 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost               30        15.0MB           129             0             0          in progress               6.00
     rhs-client39.lab.eng.blr.redhat.com               28        13.5MB           121             0             0          in progress               6.00
      rhs-client9.lab.eng.blr.redhat.com               23        11.5MB           131             0             0          in progress               6.00

Comment 8 errata-xmlrpc 2014-02-25 07:25:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html

