Bug 972653 - DHT:Rebalance:- Newly created directories don't get hash ranges after decommissioning a brick without committing and running fix-layout
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterfs
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Assigned To: shishir gowda
QA Contact: shylesh
Depends On:
Blocks: 973073
 
Reported: 2013-06-10 07:02 EDT by shylesh
Modified: 2013-12-08 20:36 EST
CC List: 5 users

See Also:
Fixed In Version: glusterfs-3.4.0.12rhs-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 973073
Environment:
Last Closed: 2013-09-23 18:35:37 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description shylesh 2013-06-10 07:02:34 EDT
Description of problem:
Start remove-brick and, once migration is over, run rebalance fix-layout without committing. Newly created directories on the mount point will not get hash ranges.

Version-Release number of selected component (if applicable):
[root@rhs3-alpha t2]# rpm -qa | grep gluster
glusterfs-server-3.4.0.9rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.9rhs-1.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.9rhs-1.el6rhs.x86_64
glusterfs-3.4.0.9rhs-1.el6rhs.x86_64

 

Steps to Reproduce:
1. Created a 3-brick distribute volume and created some files and directories.
2. Started remove-brick on one of the bricks:
   gluster volume remove-brick <vol> <brick> start
3. Once migration is over, ran fix-layout so that the decommissioned brick is included in the volume again.
4. Created a new directory and checked the hash ranges on all the bricks (a command sketch of these steps follows).
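A minimal command sketch of these steps, assuming the volume name and brick paths from the cluster details below (the mount source and directory names are illustrative; exact output varies by glusterfs version):

# 1. Three-brick distribute volume with some initial files and directories
gluster volume create test 10.70.35.64:/brick2/t1 10.70.35.62:/brick2/t2 10.70.35.64:/brick2/t3
gluster volume start test
mount -t glusterfs 10.70.35.64:/test /mnt
mkdir /mnt/dir{1..5}; touch /mnt/dir1/file{1..10}

# 2. Start decommissioning one brick and wait for migration to complete
gluster volume remove-brick test 10.70.35.64:/brick2/t3 start
gluster volume remove-brick test 10.70.35.64:/brick2/t3 status

# 3. Without committing, run fix-layout so the brick is used again
gluster volume rebalance test fix-layout start

# 4. Create a new directory and inspect its layout on the node hosting each brick
mkdir /mnt/test
getfattr -d -m . -e hex /brick2/t1/test /brick2/t3/test   # on 10.70.35.64
getfattr -d -m . -e hex /brick2/t2/test                   # on 10.70.35.62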

Actual results:
The newly created directory does not get a hash range on the brick that was decommissioned and added back.
 

Additional info:

Cluster details
=================
RHS nodes
===========
10.70.35.62
10.70.35.64 === > remove-brick command was executed from this node

mounted on 
==========
10.70.35.203 

mount point
===========
/mnt


Volume Name: test
Type: Distribute
Volume ID: 7396b44d-e46e-41fc-8a31-3b707e08eb47
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: 10.70.35.64:/brick2/t1
Brick2: 10.70.35.62:/brick2/t2
Brick3: 10.70.35.64:/brick2/t3 ===> brick which was decommissioned and added back

[root@rhs1-alpha t3]# getfattr -d -m . -e hex ../t*
# file: ../t1
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff
trusted.glusterfs.volume-id=0x7396b44de46e41fc8a313b707e08eb47

[root@rhs3-alpha t2]# getfattr -d -m . -e hex ../t2
# file: ../t2
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000000000000055555554
trusted.glusterfs.volume-id=0x7396b44de46e41fc8a313b707e08eb47



# file: ../t3
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9
trusted.glusterfs.volume-id=0x7396b44de46e41fc8a313b707e08eb47


newly created directory
======
# file: ../t1/test
trusted.gfid=0x70c62e6cb98d4f98b368e35bcd78277d
trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
 
# file: ../t3/test
trusted.gfid=0x70c62e6cb98d4f98b368e35bcd78277d

[root@rhs3-alpha t2]# getfattr -d -m . -e hex ../t2/test
# file: ../t2/test
trusted.gfid=0x70c62e6cb98d4f98b368e35bcd78277d
trusted.glusterfs.dht=0x00000001000000007fffffffffffffff
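For reference, a hedged reading of these values (the byte layout is my assumption about the DHT disk-layout encoding, not something stated in this report): the last 16 hex digits of trusted.glusterfs.dht are the start and end of that brick's hash range as two big-endian 32-bit values, so ../t1/test covers 0x00000000-0x7ffffffe, ../t2/test covers 0x7fffffff-0xffffffff, and ../t3/test, which carries no trusted.glusterfs.dht xattr at all, holds no part of the new directory's layout. A minimal check, run on the node hosting each brick:

# a brick whose output lacks trusted.glusterfs.dht got no hash range
getfattr -d -m trusted.glusterfs.dht -e hex /brick2/t1/test /brick2/t3/test   # on 10.70.35.64
getfattr -d -m trusted.glusterfs.dht -e hex /brick2/t2/test                   # on 10.70.35.62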




Attached the sosreports.
Comment 4 shylesh 2013-06-11 03:16:56 EDT
I forgot to mention one of the steps: I stopped the remove-brick before running the fix-layout.

So the actual steps are
======================
1. Created a 3-brick distribute volume and created some files and directories.
2. Started remove-brick on one of the bricks:
   gluster volume remove-brick <vol> <brick> start
3. Stopped the remove-brick with:
   gluster volume remove-brick <vol> <brick> stop
   and verified it with the status command.
4. Once stopped, ran fix-layout so that the decommissioned brick is included in the volume again, and verified this by checking the hash layout of the brick (see the command sketch below).
5. Created a new directory and checked the hash ranges on all the bricks.
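A hedged sketch of this corrected sequence, using the volume and brick from the cluster details above (sub-command spellings are the standard remove-brick/rebalance CLI; adjust paths for your setup):

gluster volume remove-brick test 10.70.35.64:/brick2/t3 start
gluster volume remove-brick test 10.70.35.64:/brick2/t3 status   # wait for the migration to finish
gluster volume remove-brick test 10.70.35.64:/brick2/t3 stop     # stop instead of commit
gluster volume remove-brick test 10.70.35.64:/brick2/t3 status   # confirm it is stopped
gluster volume rebalance test fix-layout start                   # "rebalance test start force" reproduces it too
mkdir /mnt/newdir
getfattr -d -m . -e hex /brick2/t3/newdir   # on 10.70.35.64: trusted.glusterfs.dht is missing here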

NOTE: this bug is reproducible with the fix-layout option as well as with the start force option.

Sorry for the inconvenience caused.
Comment 5 shishir gowda 2013-06-11 04:28:36 EDT
The problem seems to be in dht_reconfigure handling. In dht_reconfigure we do not call GF_OPTION_RECONF for the decommissioned option. We just check if the option is set, and handle it.

When remove-brick stop is called, it removes the decommissioned option from the volfile. Hence this key is never passed to dht, so we fail to clear the decommissioned state, and the brick is always identified as decommissioned.
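A hedged way to see the volfile side of this (the "decommissioned-bricks" option name and the /var/lib/glusterd path are assumed from the usual dht/glusterd conventions, not quoted from this report):

# After remove-brick start, the dht subvolume in the generated volfiles should
# list the brick under "option decommissioned-bricks"; after remove-brick stop
# that line disappears, but the client-side dht keeps its old in-memory
# decommission state because reconfigure never sees the removed key.
grep -rn decommissioned /var/lib/glusterd/vols/test/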
Comment 7 shylesh 2013-08-16 10:30:05 EDT
Verified on 3.4.0.19rhs-2.el6rhs.x86_64. 
Newly created directories get the hash ranges after rebalancing.
Comment 8 Scott Haines 2013-09-23 18:35:37 EDT
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
