Description of problem:
=====================
In an EC volume, replace-brick is not working.

Version-Release number of selected component (if applicable):
===============
glusterfs-fuse-3.7.1-8

How reproducible:

Steps to Reproduce:
===================
1. Create a (4+2) EC volume, mount it on a client, and create a few directories and files on the volume.
2. Bring down one of the bricks and replace it with a new brick by running gluster volume replace-brick.
3. Bring down one more brick and replace it with a new brick (volume status shows all bricks are online and running).
4. Bring down any of the old bricks and try to access the data from the mount point; an I/O error is returned.

Actual results:
==============
Replace-brick does not appear to work properly.

Expected results:
==================
Replace-brick should work properly.

Additional info:
Even after running gluster vol heal <ECVOL> full I am hitting this issue. I am able to see a metadata version difference between the old brick and the replaced brick.

Old brick
============
[root@rhs-hpc-srv2 bitrot]# getfattr -d -e hex -m. /rhs/brick1/ECVOL/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/ECVOL/
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a64656661756c745f743a733000
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.version=0x00000000000000010000000000000004
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0xfe1bd921b58e47b4bcdefcb2abf64a5f

Replaced Brick:
================
[root@rhs-hpc-srv2 bitrot]# getfattr -d -e hex -m. /rhs/brick2/ECVOL/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/ECVOL/
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.ec.version=0x00000000000000000000000000000004
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0xfe1bd921b58e47b4bcdefcb2abf64a5f
The heal has not completed yet. After the heal, the ec.version xattr on the replaced brick should read trusted.ec.version=0x00000000000000010000000000000004; only then will it start to work.
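To make the comparison above concrete, here is a minimal sketch that splits a trusted.ec.version value into two counters and checks whether two bricks agree. It assumes the 16-byte value is two big-endian 64-bit integers laid side by side; the function names split_ec_version and versions_match are hypothetical helpers, not part of gluster.

```python
def split_ec_version(hex_value):
    """Split a trusted.ec.version hex string into its two 64-bit halves.

    Assumption: the xattr is 16 bytes, read here as two big-endian
    64-bit counters. This helper is illustrative, not a gluster API.
    """
    digits = hex_value.strip().lower()
    if digits.startswith("0x"):
        digits = digits[2:]
    if len(digits) != 32:
        raise ValueError("expected 32 hex digits, got %d" % len(digits))
    return int(digits[:16], 16), int(digits[16:], 16)


def versions_match(old_hex, new_hex):
    """Return True when both halves of the version agree on the two bricks."""
    return split_ec_version(old_hex) == split_ec_version(new_hex)


# Values copied from the getfattr output of the brick root directories.
old_brick = "0x00000000000000010000000000000004"
replaced_brick = "0x00000000000000000000000000000004"

print(split_ec_version(old_brick))
print(split_ec_version(replaced_brick))
print(versions_match(old_brick, replaced_brick))
```

With the values reported here, only the first half differs between the two bricks, which is the mismatch the heal is expected to remove.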
The other subdirectories have the same version on both bricks, but the root directory does not.
With help from Rajesh we found the root cause. The replace-brick functionality itself is working, but the versions of the root directory are not set correctly, so an EIO error occurs when other bricks are taken down. Changed the description of the bug to reflect this behaviour, since replace-brick itself is healing the data.
Please review and sign off to be included in the known issues chapter.
Looks good to me Anjana
Verified this on the 3.7.1-14 build and the root directory is getting healed.

[root@interstellar ~]# gluster v info vol2
Volume Name: vol2
Type: Disperse
Volume ID: d4a0627c-7a03-4fdd-bf32-c6f4eff9e0d6
Status: Started
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Bricks:
Brick1: transformers:/rhs/brick7/vol2-1
Brick2: interstellar:/rhs/brick7/vol2-2
Brick3: transformers:/rhs/brick8/vol2-3
Brick4: interstellar:/rhs/brick8/vol2-4
Brick5: transformers:/rhs/brick9/vol2-5
Brick6: interstellar:/rhs/brick9/vol2-6
Options Reconfigured:
cluster.disperse-self-heal-daemon: enable
disperse.background-heals: 0
server.event-threads: 2
client.event-threads: 2
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
features.uss: on
performance.readdir-ahead: on

old brick:
==========
[root@interstellar ~]# getfattr -d -e hex -m. /rhs/brick10/vol2-4/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick10/vol2-4/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.version=0x00000000000000050000000000000007
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size=0x0000000033ee1000000000000000ff6b000000000000276f
trusted.glusterfs.volume-id=0xd4a0627c7a034fddbf32c6f4eff9e0d6

replaced brick :
================
[root@interstellar ~]# getfattr -d -e hex -m. /rhs/brick8/vol2-4/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick8/vol2-4/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.version=0x00000000000000050000000000000007
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size=0x0000000033ee1000000000000000ff6b000000000000276f
trusted.glusterfs.volume-id=0xd4a0627c7a034fddbf32c6f4eff9e0d6
[root@interstellar ~]#

Moving this to verified.
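The side-by-side check done by eye in this verification could be sketched as a small script. It assumes the text passed in is plain output of getfattr -d -e hex with one name=value pair per line; parse_getfattr_dump and differing_xattrs are hypothetical helper names introduced only for illustration.

```python
def parse_getfattr_dump(text):
    """Turn `getfattr -d -e hex` output into a {name: hex_value} dict.

    Comment lines ("# file: ...") and getfattr's own notices are skipped.
    Assumption: one name=value pair per line, as in the output above.
    """
    xattrs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("getfattr:"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            xattrs[name] = value
    return xattrs


def differing_xattrs(old, new, names):
    """Return {name: (old_value, new_value)} for the names that disagree."""
    return {
        name: (old.get(name), new.get(name))
        for name in names
        if old.get(name) != new.get(name)
    }


# Excerpts of the two dumps from this verification run.
OLD_BRICK = """\
# file: rhs/brick10/vol2-4/
trusted.ec.version=0x00000000000000050000000000000007
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.volume-id=0xd4a0627c7a034fddbf32c6f4eff9e0d6
"""
REPLACED_BRICK = """\
# file: rhs/brick8/vol2-4/
trusted.ec.version=0x00000000000000050000000000000007
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.volume-id=0xd4a0627c7a034fddbf32c6f4eff9e0d6
"""

mismatches = differing_xattrs(
    parse_getfattr_dump(OLD_BRICK),
    parse_getfattr_dump(REPLACED_BRICK),
    ["trusted.ec.version", "trusted.gfid", "trusted.glusterfs.volume-id"],
)
print(mismatches or "no differences in the checked xattrs")
```

An empty result for the checked names corresponds to the healed state shown in this comment; a non-empty result lists each xattr with its value on the old and replaced brick.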
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1845.html