+++ This bug was initially created as a clone of Bug #1225452 +++

Description of problem:
=======================
Removed one complete subvolume from a (2x3) volume, which triggered a rebalance. Once rebalance status showed completed, copied "/etc/hosts" to the fuse mount and "/etc/hosts.allow" to the nfs mount. The file hosts was written to the remaining subvolume (the one not being removed), but hosts.allow was written to the subvolume being removed. A subsequent ls from the fuse mount created a link file on the remaining subvolume. After a remove-brick commit, the removed subvolume is no longer part of the volume but still holds the actual hosts.allow file, which is therefore lost; the remaining subvolume has only a T file, and accessing it errors with "Structure needs cleaning".

[root@georep1 b1]# cd /rhs/brick1/b1/
[root@georep1 b1]# ls -lrt hosts.allow
---------T. 2 root root 0 May 27 20:43 hosts.allow
[root@georep1 b1]# cd /rhs/brick2/b2/
[root@georep1 b2]# ls -lrt hosts.allow
-rw-r--r--. 2 root root 370 May 27 20:43 hosts.allow

[root@georep1 b2]# gluster v info master

Volume Name: master
Type: Replicate
Volume ID: 7b933011-28da-48d7-90a0-40ac33102aae
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.46.96:/rhs/brick1/b1
Brick2: 10.70.46.97:/rhs/brick1/b1
Brick3: 10.70.46.93:/rhs/brick1/b1
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
performance.readdir-ahead: on
[root@georep1 b2]#

[root@georep1 b2]# getfattr -n trusted.gfid -e hex -m . /rhs/brick1/b1/hosts.allow
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/b1/hosts.allow
trusted.gfid=0x70ad500ee72645a39fc3b54d88aa2cc7

[root@georep1 b2]# getfattr -n trusted.gfid -e hex -m . /rhs/brick2/b2/hosts.allow
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/b2/hosts.allow
trusted.gfid=0x70ad500ee72645a39fc3b54d88aa2cc7
[root@georep1 b2]#

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.0-2.el6rhs.x86_64

How reproducible:
=================
Tried only once

Steps Carried:
==============
1. Created master and slave clusters
2. Created master and slave volumes (2x3)
3. Created a shared meta volume and mounted it on all the master nodes
4. Created and started a geo-rep session between the master and slave volumes
5. Mounted the master and slave volumes (fuse and NFS)
6. From the fuse and NFS mounts of the master volume, created a set of dirs and files
7. Once data creation completed, waited for it to sync to the slave
8. Verified using arequal that the data at master and slave matches
9. Removed a complete subvolume using "remove-brick start"
10. Waited for it to complete (monitored using "remove-brick status")
11. Once completed, touched a file "rahul" from fuse and copied /etc/hosts to the fuse mount of master
12. Copied /etc/hosts.allow to the nfs mount of master
13. Did an ls from the fuse mount; it lists all three files: rahul, hosts, hosts.allow
14. Did a "remove-brick commit" to remove the complete subvolume
15. The geo-rep session goes faulty with the traceback:
    OSError: [Errno 117] Structure needs cleaning: '.gfid/70ad500e-e726-45a3-9fc3-b54d88aa2cc7'

Checked the backend: the original file written from NFS is present on the decommissioned bricks, and its T link file is on the remaining subvolume.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2015-05-27 17:10:21 MVT ---

This bug is automatically being proposed for Red Hat Gluster Storage 3.2.0 by setting the release flag 'rhgs-3.2.0' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.
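[Editorial note] The ---------T entry seen on the remaining brick is a DHT linkto file: a zero-byte file whose mode carries only the sticky bit (on a real brick it also has a trusted.glusterfs.dht.linkto xattr naming the subvolume that holds the data). A minimal sketch of how such entries can be spotted on a brick backend, using a temporary directory as a stand-in for the brick path (no gluster installation assumed; the path and filename are illustrative):

```shell
# Simulate what a DHT linkto file looks like on a brick backend:
# zero bytes, mode 1000 (shown by ls as ---------T).
brick=$(mktemp -d)            # stand-in for a brick dir such as /rhs/brick1/b1
touch "$brick/hosts.allow"
chmod 1000 "$brick/hosts.allow"

# Candidate linkto files: empty regular files whose mode is exactly 1000.
find "$brick" -type f -perm 1000 -size 0
```

On a real brick, any file this finds after a remove-brick commit is worth checking: if the subvolume its linkto xattr points at has been removed, the data is unreachable, which matches the "Structure needs cleaning" failure above.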
--- Additional comment from Rahul Hinduja on 2015-05-27 17:20:13 MVT ---

sosreports are at: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1225452/

The relevant logs are on the master and client nodes.

Before commit:
==============
[root@georep1 scripts]# gluster v info master

Volume Name: master
Type: Distributed-Replicate
Volume ID: 7b933011-28da-48d7-90a0-40ac33102aae
Status: Started
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.46.96:/rhs/brick1/b1
Brick2: 10.70.46.97:/rhs/brick1/b1
Brick3: 10.70.46.93:/rhs/brick1/b1
Brick4: 10.70.46.96:/rhs/brick2/b2
Brick5: 10.70.46.97:/rhs/brick2/b2
Brick6: 10.70.46.93:/rhs/brick2/b2
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
performance.readdir-ahead: on
[root@georep1 scripts]#

After commit:
=============
[root@georep1 ~]# gluster v info master

Volume Name: master
Type: Replicate
Volume ID: 7b933011-28da-48d7-90a0-40ac33102aae
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.46.96:/rhs/brick1/b1
Brick2: 10.70.46.97:/rhs/brick1/b1
Brick3: 10.70.46.93:/rhs/brick1/b1
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
performance.readdir-ahead: on
[root@georep1 ~]#
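[Editorial note] When cross-checking backend files against geo-rep tracebacks: the trusted.gfid xattr printed by getfattr (0x70ad500ee72645a39fc3b54d88aa2cc7) is the same identifier as the .gfid/70ad500e-e726-45a3-9fc3-b54d88aa2cc7 path in the OSError, just without the dashes. A small sketch of the mapping, pure bash string slicing with no gluster required:

```shell
# gfid as reported by "getfattr -n trusted.gfid -e hex", minus the leading 0x
gfid_hex=70ad500ee72645a39fc3b54d88aa2cc7

# Re-insert the dashes (8-4-4-4-12) to get the uuid form used in .gfid/ paths
printf '%s-%s-%s-%s-%s\n' \
    "${gfid_hex:0:8}" "${gfid_hex:8:4}" "${gfid_hex:12:4}" \
    "${gfid_hex:16:4}" "${gfid_hex:20:12}"
# prints 70ad500e-e726-45a3-9fc3-b54d88aa2cc7
```

This confirms that the file failing in the geo-rep worker is the same object as the stranded hosts.allow on the decommissioned brick.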
REVIEW: http://review.gluster.org/11061 (server/nfs: Restart nfs server post remove-brick start) posted (#1) for review on master by Susant Palai (spalai)
REVIEW: http://review.gluster.org/11061 (server/nfs: Restart nfs server post remove-brick start) posted (#2) for review on master by Susant Palai (spalai)
*** This bug has been marked as a duplicate of bug 1232378 ***