Description of problem:
DHT: a user is able to modify a file while its cached sub-volume is down and its hashed sub-volume is up. This results in data loss, and multiple files with the same name can end up at the same level.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0qa5-1.el6rhs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a Distributed volume with 3 or more sub-volumes across multiple servers and start the volume.
2. FUSE-mount the volume from client-1 using "mount -t glusterfs server:/<volume> <client-1_mount_point>".
3. From the mount point, create some directories and files inside them. Rename files until the hashed and cached sub-volumes differ for one of them (the rename leaves a zero-byte linkfile on the new hashed sub-volume):

server 3:
-bash-4.1# ls -l renamefile10
---------T 2 root root 0 Jan  9 06:49 renamefile10

server 1:
-bash-4.1# stat renamefile10
  File: `renamefile10'
  Size: 0          Blocks: 0          IO Block: 4096   regular empty file
Device: 810h/2064d  Inode: 273839861  Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-01-09 09:10:31.647469095 +0000
Modify: 2013-01-09 06:39:41.000000000 +0000

4. Bring down the sub-volume where the file is cached (server 1 in this case).
5. From the mount point, modify the file using vi/vim:

[root@client verify]# vi renamefile10

Make some changes, save, and quit.
6. Bring all the sub-volumes back up.
7. From the mount point, run ls and verify that renamefile10 is listed twice:

[root@client verify]# ls
d1   d16  d22  d29  d35  d41  d48  d9            renamefile14  renamefile20  renamefile27  renamefile33  renamefile4   renamefile46  renamefile7
d10  d17  d23  d3   d36  d42  d49  renamefile1   renamefile15  renamefile21  renamefile28  renamefile34  renamefile40  renamefile47  renamefile8
d11  d18  d24  d30  d37  d43  d5   renamefile10  renamefile16  renamefile22  renamefile29  renamefile35  renamefile41  renamefile48  renamefile9
d12  d19  d25  d31  d38  d44  d50  renamefile10  renamefile17  renamefile23  renamefile3   renamefile36  renamefile42  renamefile49
d13  d2   d26  d32  d39  d45  d6   renamefile11  renamefile18  renamefile24  renamefile30  renamefile37  renamefile43  renamefile5
d14  d20  d27  d33  d4   d46  d7   renamefile12  renamefile19  renamefile25  renamefile31  renamefile38  renamefile44  renamefile50
d15  d21  d28  d34  d40  d47  d8   renamefile13  renamefile2   renamefile26  renamefile32  renamefile39  renamefile45  renamefile6

8. Check the files on the backend.

server 1:
-bash-4.1# stat renamefile10
  File: `renamefile10'
  Size: 0          Blocks: 0          IO Block: 4096   regular empty file
Device: 810h/2064d  Inode: 273839861  Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-01-09 09:10:31.647469095 +0000
Modify: 2013-01-09 06:39:41.000000000 +0000
Change: 2013-01-09 06:49:43.233807527 +0000

server 3:
-bash-4.1# getfattr -d -m . renamefile10
# file: renamefile10
trusted.gfid=0snL5DI4LxQKu/FKMnTyD2WQ==

-bash-4.1# stat renamefile10
  File: `renamefile10'
  Size: 2          Blocks: 8          IO Block: 4096   regular file
Device: 810h/2064d  Inode: 145987204  Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-01-09 09:10:31.649531504 +0000
Modify: 2013-01-09 09:06:55.293946024 +0000
Change: 2013-01-09 09:06:55.293946024 +0000

Actual results:
The user is able to modify the file while the cached sub-volume is down.

Expected results:
The user should not be able to modify the file while the cached sub-volume is down.
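The failure mode in the steps above can be sketched with a toy model of DHT placement. Everything here (the hash, the bricks dict, the helper names) is a hypothetical simplification for illustration, not glusterfs code; real DHT hashes names into per-directory layout ranges:

```python
# Toy model: rename splits hashed/cached sub-volumes; a write while the
# cached sub-volume is down turns the linkfile into a second real file.

SUBVOLS = ["server1", "server2", "server3"]

def hashed_subvol(name):
    # Trivial stand-in for DHT's name hash.
    return SUBVOLS[sum(name.encode()) % len(SUBVOLS)]

# Each brick maps filename -> ("data", contents) or ("linkfile", cached_subvol).
bricks = {sv: {} for sv in SUBVOLS}

def create(name, contents=""):
    bricks[hashed_subvol(name)][name] = ("data", contents)

def rename(old, new):
    src = hashed_subvol(old)          # data lives where `old` hashed
    data = bricks[src].pop(old)
    dst = hashed_subvol(new)
    bricks[src][new] = data           # data stays put: the "cached" sub-volume
    if dst != src:
        # Zero-byte pointer (mode ---------T) on the new hashed sub-volume.
        bricks[dst][new] = ("linkfile", src)

def broken_write(name, contents, down):
    # Pre-fix behaviour: lookup does not fail while the cached sub-volume
    # is down, and the write converts the linkfile into a real file.
    dst = hashed_subvol(name)
    kind, cached = bricks[dst][name]
    if kind == "linkfile" and cached in down:
        bricks[dst][name] = ("data", contents)   # second file, new gfid

def ls(all_bricks):
    # Merged readdir; linkfiles are hidden, so a converted linkfile plus
    # the stale original shows the same name twice.
    return sorted(n for files in all_bricks.values()
                  for n, (kind, _) in files.items() if kind == "data")
```

Running `create`, `rename`, and `broken_write` with the cached sub-volume "down" leaves two data files under one name, which is exactly the double `renamefile10` entry in the listing above.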
Upstream fix http://review.gluster.org/#change,4383 is in review.
CHANGE: http://review.gluster.org/4383 (cluster/distribute: If cached_subvol is down, return ENOTCONN in lookup) merged in master by Anand Avati (avati)
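The merged change itself is C code in the cluster/distribute translator; as a rough Python sketch of the fixed control flow (the function and argument names here are hypothetical, not the glusterfs API):

```python
import errno

def dht_lookup(entry, subvol_up):
    """Sketch of the post-fix lookup when a name resolves via a linkfile.

    entry: ("linkfile", cached_subvol) or ("data", contents), as found on
    the hashed sub-volume; subvol_up: dict mapping sub-volume -> bool.
    """
    kind, payload = entry
    if kind == "linkfile":
        if not subvol_up.get(payload, False):
            # The fix: with the cached sub-volume unreachable, fail the
            # lookup instead of letting the client recreate the file on
            # the hashed sub-volume.
            raise OSError(errno.ENOTCONN, "Transport endpoint is not connected")
        return ("resolve-on", payload)   # continue lookup on the cached sub-volume
    return ("found", payload)
```

Because open, stat, unlink, and rename all begin with this lookup, the client now sees "Transport endpoint is not connected" for every operation on the file, matching the verification output in the later comments.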
*** Bug 903917 has been marked as a duplicate of this bug. ***
*** Bug 903476 has been marked as a duplicate of this bug. ***
Verified with 3.3.0.6rhs-4.el6rhs.x86_64. Bug 893378 and bug 903917 are working as expected, but bug 903476 is not. For example:

[root@rhsauto037 new]# cat renamefile18
cat: renamefile18: No such file or directory
[root@rhsauto037 new]# cp renamefile18 abc
cp: cannot stat `renamefile18': No such file or directory
[root@rhsauto037 new]# ls -l renamefile18
ls: cannot access renamefile18: No such file or directory
[root@rhsauto037 new]# chmod 777 f1
chmod: cannot access `f1': No such file or directory

Hence moving back to ASSIGNED.
Sorry for the inconvenience caused; the logs are attached to the bug.
After investigating the logs, this looks like the issue fixed in bug 884379. Updated the release to:

[root@localhost ~]# rpm -qa | grep glusterfs
glusterfs-devel-3.3.0.6rhs-6.el6rhs.x86_64
glusterfs-3.3.0.6rhs-6.el6rhs.x86_64
glusterfs-server-3.3.0.6rhs-6.el6rhs.x86_64
glusterfs-geo-replication-3.3.0.6rhs-6.el6rhs.x86_64
glusterfs-debuginfo-3.3.0.6rhs-6.el6rhs.x86_64
glusterfs-fuse-3.3.0.6rhs-6.el6rhs.x86_64
glusterfs-rdma-3.3.0.6rhs-6.el6rhs.x86_64

1. Create a 3-brick DHT volume, mount it, and rename a file until a linkfile is created:

[root@localhost export]# mount -t glusterfs localhost:/test /mnt/dht/
[root@localhost export]# cd /mnt/dht/
[root@localhost dht]# ls
[root@localhost dht]# touch 1
[root@localhost dht]# mv 1 2
[root@localhost dht]# ls -l /export/*
/export/sub1:
total 0
---------T. 2 root root 0 Mar 21 03:28 2

/export/sub2:
total 0
-rw-r--r--. 2 root root 0 Mar 21 03:28 2

/export/sub3:
total 0

2. Kill the brick holding the cached file and check the status:

[root@localhost ~]# gluster volume status
Status of volume: test
Gluster process                         Port    Online  Pid
------------------------------------------------------------------------------
Brick vm1:/export/sub1                  24012   Y       29771
Brick vm1:/export/sub2                  24013   N       N/A
Brick vm1:/export/sub3                  24014   Y       29784
NFS Server on localhost                 38467   Y       29790

3. Try to perform operations on the file:

[root@localhost dht]# cat 2
cat: 2: Transport endpoint is not connected
[root@localhost dht]# rm 2
rm: cannot remove `2': Transport endpoint is not connected
[root@localhost dht]# ls -l 2
ls: cannot access 2: Transport endpoint is not connected
[root@localhost dht]# mv 2 3
mv: cannot stat `2': Transport endpoint is not connected

Can you please rerun the test and check whether the issue is fixed?
Per 04-10-2013 Storage bug triage meeting, targeting for Big Bend.
I'm not able to reproduce this bug with glusterfs-3.4.0.9rhs-1.el6rhs.

[root@localhost mnt]# gluster volume info test
Volume Name: test
Type: Distribute
Volume ID: 713ad1ed-96ca-459b-8728-0209439b972f
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 10.70.42.223:/brick/test1
Brick2: 10.70.42.223:/brick/test2

[root@localhost mnt]# mount | grep glusterfs
localhost:test on /mnt type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)

[root@localhost mnt]# touch file
[root@localhost mnt]# ls -l /brick/*
/brick/test1:
total 0

/brick/test2:
total 0
-rw-r--r-- 2 root root 0 Jun 12 13:07 file

[root@localhost mnt]# mv file file.rename
[root@localhost mnt]# ls -l /brick/*
/brick/test1:
total 0
---------T 2 root root 0 Jun 12 13:07 file.rename

/brick/test2:
total 0
-rw-r--r-- 2 root root 0 Jun 12 13:07 file.rename

[root@localhost mnt]# gluster volume status test
Status of volume: test
Gluster process                         Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.42.223:/brick/test1         49152   Y       28403
Brick 10.70.42.223:/brick/test2         49153   Y       29028
NFS Server on localhost                 2049    Y       29039

There are no active volume tasks

[root@localhost mnt]# kill 29028
[root@localhost mnt]# gluster volume status test
Status of volume: test
Gluster process                         Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.42.223:/brick/test1         49152   Y       28403
Brick 10.70.42.223:/brick/test2         N/A     N       N/A
NFS Server on localhost                 2049    Y       29039

There are no active volume tasks

[root@localhost mnt]# ls
[root@localhost mnt]# touch file.rename
touch: cannot touch `file.rename': Transport endpoint is not connected
[root@localhost mnt]# cat file.rename
cat: file.rename: Transport endpoint is not connected
[root@localhost mnt]# echo "hello" > file.rename
-bash: file.rename: Transport endpoint is not connected

Can you rerun this test and confirm the same?
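When verifying on the backend, the corruption signature from this bug is the same name existing as a real (non-linkfile) file on more than one brick. A small helper to flag that, assuming per-brick listings have been collected into a dict (the format and names are hypothetical, for illustration only):

```python
def find_duplicate_names(bricks):
    """Flag names that exist as real data files on more than one brick.

    bricks: dict mapping brick -> {filename: ("data", ...) or ("linkfile", ...)}.
    Linkfiles (the zero-byte ---------T pointers) are expected and ignored.
    """
    owners = {}
    for brick, files in bricks.items():
        for name, (kind, _) in files.items():
            if kind == "data":
                owners.setdefault(name, []).append(brick)
    return {name: sorted(bs) for name, bs in owners.items() if len(bs) > 1}
```

A healthy volume returns an empty dict; a hit means two files share one name with different gfids, as in step 8 of the original report.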
Requesting QE to run the test as mentioned by Kaushal above.
As mentioned in comment #7 of this bug ("verified with 3.3.0.6rhs-4.el6rhs.x86_64; bug 893378 and bug 903917 working as per expectation but Bug 903476 not working as per expectation"), this bug had been fixed but was reopened because one of its duplicates, bug 903476, was not working as expected. Since bug 903476 has now been removed from the duplicates, we can mark this bug as VERIFIED (bug 893378 and bug 903917 work as expected).
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html