Bug 1241862 - EC volume: Replace bricks is not healing version of root directory
Summary: EC volume: Replace bricks is not healing version of root directory
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: disperse
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.1.1
Assignee: Ashish Pandey
QA Contact: Bhaskarakiran
URL:
Whiteboard:
Depends On:
Blocks: 1243382 1243384 1251815
 
Reported: 2015-07-10 09:27 UTC by RajeshReddy
Modified: 2016-09-17 15:06 UTC
CC List: 9 users

Fixed In Version: glusterfs-3.7.1-14
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1243382
Environment:
Last Closed: 2015-10-05 07:19:35 UTC
Embargoed:




Links:
Red Hat Product Errata RHSA-2015:1845 (SHIPPED_LIVE): Moderate: Red Hat Gluster Storage 3.1 update. Last updated 2015-10-05 11:06:22 UTC.

Description RajeshReddy 2015-07-10 09:27:28 UTC
Description of problem:
=====================
On an EC (disperse) volume, replace-brick is not working.

Version-Release number of selected component (if applicable):
===============
glusterfs-fuse-3.7.1-8


How reproducible:


Steps to Reproduce:
===================
1. Create a (4+2) EC volume, mount it on a client, and create a few directories and files on the volume.
2. Bring down one of the bricks and replace it with a new brick by running gluster volume replace-brick.
3. Bring down one more brick and replace it with a new brick (volume status shows all bricks online and running).
4. Bring down any of the remaining old bricks and try to access the data from the mount point; an I/O error is returned. (A command sketch of these steps is shown below.)
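
A minimal command sketch of these steps, assuming hypothetical server names (server1..server6), brick paths, and mount point; the exact names used in the original setup are not recorded in this report:

# 1. Create, start, and mount a (4+2) disperse volume
gluster volume create ECVOL disperse 6 redundancy 2 \
    server{1..6}:/rhs/brick1/ECVOL force
gluster volume start ECVOL
mount -t glusterfs server1:/ECVOL /mnt/ecvol
mkdir -p /mnt/ecvol/dir1 && cp /etc/services /mnt/ecvol/dir1/

# 2. Bring down one brick (pid from 'gluster volume status ECVOL')
#    and replace it with a new brick
kill <brick-pid>
gluster volume replace-brick ECVOL server2:/rhs/brick1/ECVOL \
    server2:/rhs/brick2/ECVOL commit force

# 3. Repeat for a second brick
kill <brick-pid>
gluster volume replace-brick ECVOL server3:/rhs/brick1/ECVOL \
    server3:/rhs/brick2/ECVOL commit force

# 4. Bring down any remaining old brick and read from the mount;
#    the bug manifests here as EIO
ls -lR /mnt/ecvol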

Actual results:
==============
It looks like replace-brick is not working properly.


Expected results:
==================
Replace-brick should work properly: after both replacements are healed, the data should remain accessible when any other brick is brought down.

Additional info:

Comment 2 RajeshReddy 2015-07-10 10:59:09 UTC
Even after running gluster vol heal <ECVOL> full I am still hitting this issue. I can see a trusted.ec.version difference on the root directory between the old brick and the replaced brick:

Old brick
============
[root@rhs-hpc-srv2 bitrot]# getfattr -d -e hex -m. /rhs/brick1/ECVOL/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/ECVOL/
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a64656661756c745f743a733000
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.version=0x00000000000000010000000000000004
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0xfe1bd921b58e47b4bcdefcb2abf64a5f



Replaced Brick:
================
[root@rhs-hpc-srv2 bitrot]# getfattr -d -e hex -m. /rhs/brick2/ECVOL/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/ECVOL/
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.ec.version=0x00000000000000000000000000000004
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0xfe1bd921b58e47b4bcdefcb2abf64a5f
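
For reference, a quick way to compare the root-directory version across the two local bricks (a sketch; the brick paths are taken from the output above):

# Print trusted.ec.version for the volume root on each brick;
# after a complete heal, every brick should report the same value
for b in /rhs/brick1/ECVOL /rhs/brick2/ECVOL; do
    printf '%s: ' "$b"
    getfattr --absolute-names -n trusted.ec.version -e hex "$b" \
        | awk -F= '/trusted.ec.version/ {print $2}'
done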

Comment 3 Pranith Kumar K 2015-07-12 13:18:58 UTC
Heal is not yet complete. The trusted.ec.version xattr should become 0x00000000000000010000000000000004 after the heal; only then will it start to work.
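
For completeness, the heal can be (re)triggered and its progress checked with the standard CLI (ECVOL stands in for the actual volume name):

# Kick off a full self-heal, then inspect the entries still pending heal
gluster volume heal ECVOL full
gluster volume heal ECVOL info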

Comment 4 RajeshReddy 2015-07-13 09:10:22 UTC
The other subdirectories have the same version, but the root directory does not.

Comment 5 Pranith Kumar K 2015-07-13 11:05:45 UTC
With help from Rajesh we found the root cause. The replace-brick functionality is working and the data is being healed, but the version xattrs of the root directory are not set correctly; because of that, an EIO error occurs when other bricks are taken down. I have changed the description of the bug to reflect this behaviour, since replace-brick itself is healing the data.
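
To make the mismatch concrete: trusted.ec.version is 16 bytes that the ec translator treats as two 64-bit counters, and splitting the hex value shows which half differs between the bricks. Reading the two halves as the data and metadata transaction versions is my interpretation of the xattr layout, so treat it as an assumption. A sketch:

# Split trusted.ec.version into its two 64-bit halves
# (assumed to be the data and metadata transaction counters)
v=$(getfattr --absolute-names -n trusted.ec.version -e hex /rhs/brick1/ECVOL/ \
    | awk -F= '/trusted.ec.version/ {print $2}')
v=${v#0x}
echo "first half:  0x${v:0:16}"   # old brick: ...0001, replaced brick: ...0000
echo "second half: 0x${v:16:16}"  # both bricks: ...0004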

Comment 7 Anjana Suparna Sriram 2015-07-27 10:15:15 UTC
Please review and sign off so this can be included in the known issues chapter.

Comment 8 Pranith Kumar K 2015-07-27 10:16:43 UTC
Looks good to me, Anjana.

Comment 12 Bhaskarakiran 2015-09-03 09:44:03 UTC
Verified this on the 3.7.1-14 build; the root directory is getting healed.

[root@interstellar ~]# gluster v info vol2
 
Volume Name: vol2
Type: Disperse
Volume ID: d4a0627c-7a03-4fdd-bf32-c6f4eff9e0d6
Status: Started
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Bricks:
Brick1: transformers:/rhs/brick7/vol2-1
Brick2: interstellar:/rhs/brick7/vol2-2
Brick3: transformers:/rhs/brick8/vol2-3
Brick4: interstellar:/rhs/brick8/vol2-4
Brick5: transformers:/rhs/brick9/vol2-5
Brick6: interstellar:/rhs/brick9/vol2-6
Options Reconfigured:
cluster.disperse-self-heal-daemon: enable
disperse.background-heals: 0
server.event-threads: 2
client.event-threads: 2
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
features.uss: on
performance.readdir-ahead: on

Old brick:
==========

[root@interstellar ~]# getfattr -d -e hex -m. /rhs/brick10/vol2-4/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick10/vol2-4/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.version=0x00000000000000050000000000000007
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size=0x0000000033ee1000000000000000ff6b000000000000276f
trusted.glusterfs.volume-id=0xd4a0627c7a034fddbf32c6f4eff9e0d6

Replaced brick:
================

[root@interstellar ~]# getfattr -d -e hex -m. /rhs/brick8/vol2-4/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick8/vol2-4/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.version=0x00000000000000050000000000000007
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size=0x0000000033ee1000000000000000ff6b000000000000276f
trusted.glusterfs.volume-id=0xd4a0627c7a034fddbf32c6f4eff9e0d6

[root@interstellar ~]# 

Moving this to verified.

Comment 14 errata-xmlrpc 2015-10-05 07:19:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1845.html

