[2011-09-28 17:02:45.139827] W [write-behind.c:3030:init] 0-test4-write-behind: disabling write-behind for first 0 bytes
[2011-09-28 17:02:45.237063] I [client.c:1935:notify] 0-test4-client-0: parent translators are ready, attempting connect on transport
[2011-09-28 17:02:45.241955] I [client.c:1935:notify] 0-test4-client-1: parent translators are ready, attempting connect on transport
Given volfile:
+------------------------------------------------------------------------------+
  1: volume test4-client-0
  2:     type protocol/client
  3:     option remote-host centos2
  4:     option remote-subvolume /var/spool/gluster/test4a
  5:     option transport-type tcp
  6: end-volume
  7:
  8: volume test4-client-1
  9:     type protocol/client
 10:     option remote-host centos2
 11:     option remote-subvolume /var/spool/gluster/test4b
 12:     option transport-type tcp
 13: end-volume
 14:
 15: volume test4-replicate-0
 16:     type cluster/replicate
 17:     subvolumes test4-client-0 test4-client-1
 18: end-volume
 19:
 20: volume test4-write-behind
 21:     type performance/write-behind
 22:     subvolumes test4-replicate-0
 23: end-volume
 24:
 25: volume test4-read-ahead
 26:     type performance/read-ahead
 27:     subvolumes test4-write-behind
 28: end-volume
 29:
 30: volume test4-io-cache
 31:     type performance/io-cache
 32:     subvolumes test4-read-ahead
 33: end-volume
 34:
 35: volume test4-quick-read
 36:     type performance/quick-read
 37:     subvolumes test4-io-cache
 38: end-volume
 39:
 40: volume test4-stat-prefetch
 41:     type performance/stat-prefetch
 42:     subvolumes test4-quick-read
 43: end-volume
 44:
 45: volume test4
 46:     type debug/io-stats
 47:     option latency-measurement off
 48:     option count-fop-hits off
 49:     subvolumes test4-stat-prefetch
 50: end-volume
+------------------------------------------------------------------------------+
[2011-09-28 17:02:45.247241] I [rpc-clnt.c:1531:rpc_clnt_reconfig] 0-test4-client-1: changing port to 24028 (from 0)
[2011-09-28 17:02:45.247397] I [rpc-clnt.c:1531:rpc_clnt_reconfig] 0-test4-client-0: changing port to 24027 (from 0)
[2011-09-28 17:02:49.144295] I [client-handshake.c:1082:select_server_supported_programs] 0-test4-client-1: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-09-28 17:02:49.144949] I [client-handshake.c:913:client_setvolume_cbk] 0-test4-client-1: Connected to 10.0.0.136:24028, attached to remote volume '/var/spool/gluster/test4b'.
[2011-09-28 17:02:49.144980] I [afr-common.c:2611:afr_notify] 0-test4-replicate-0: Subvolume 'test4-client-1' came back up; going online.
[2011-09-28 17:02:49.150072] I [client-handshake.c:1082:select_server_supported_programs] 0-test4-client-0: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-09-28 17:02:49.150460] I [client-handshake.c:913:client_setvolume_cbk] 0-test4-client-0: Connected to 10.0.0.136:24027, attached to remote volume '/var/spool/gluster/test4a'.
[2011-09-28 17:02:49.706624] I [fuse-bridge.c:3336:fuse_graph_setup] 0-fuse: switched to graph 0
[2011-09-28 17:02:49.872812] I [fuse-bridge.c:2924:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.10
[2011-09-28 17:02:49.874888] I [afr-common.c:912:afr_fresh_lookup_cbk] 0-test4-replicate-0: added root inode
[2011-09-28 17:02:50.110079] I [afr-dir-read.c:174:afr_examine_dir_readdir_cbk] 0-test4-replicate-0: entry self-heal triggered. path: /, reason: checksums of directory differ, forced merge option set
[2011-09-28 17:02:50.144930] W [afr-common.c:122:afr_set_split_brain] (-->/usr/lib64/glusterfs/3.2.3/xlator/cluster/replicate.so [0x2aaaab5fd113] (-->/usr/lib64/glusterfs/3.2.3/xlator/cluster/replicate.so(afr_sh_entry_done+0x46) [0x2aaaab5f6646] (-->/usr/lib64/glusterfs/3.2.3/xlator/cluster/replicate.so(afr_self_heal_completion_cbk+0x246) [0x2aaaab5efac6]))) 0-test4-replicate-0: invalid argument: inode
[2011-09-28 17:02:50.144981] I [afr-self-heal-common.c:1557:afr_self_heal_completion_cbk] 0-test4-replicate-0: background entry entry self-heal completed on /
[2011-09-28 17:02:50.185364] I [afr-common.c:649:afr_lookup_self_heal_check] 0-test4-replicate-0: size differs for /bar
[2011-09-28 17:02:50.185405] I [afr-common.c:811:afr_lookup_done] 0-test4-replicate-0: background meta-data data self-heal triggered. path: /bar
[2011-09-28 17:02:50.191267] I [afr-self-heal-common.c:1557:afr_self_heal_completion_cbk] 0-test4-replicate-0: background meta-data data self-heal completed on /bar
[2011-09-28 17:02:50.211964] I [afr-common.c:649:afr_lookup_self_heal_check] 0-test4-replicate-0: size differs for /baz
[2011-09-28 17:02:50.212004] I [afr-common.c:811:afr_lookup_done] 0-test4-replicate-0: background data self-heal triggered. path: /baz
[2011-09-28 17:02:50.213701] I [afr-self-heal-common.c:1557:afr_self_heal_completion_cbk] 0-test4-replicate-0: background data data self-heal completed on /baz
Create a volume test1 and start it:

    gluster volume create test1 replica 2 server1:/data/test1 server2:/data/test1
    gluster volume start test1

Mount the volume:

    mount -t glusterfs server1:test1 /mnt/test1

Create a file:

    cd /mnt/test1
    echo asdf > foo

Create hardlinks:

    ln foo bar
    ln foo baz
    ls -li
    5767174 -rw-r--r-- 3 root root 5 Sep 28 16:59 bar
    5767174 -rw-r--r-- 3 root root 5 Sep 28 16:59 baz
    5767174 -rw-r--r-- 3 root root 5 Sep 28 16:59 foo

Unmount and stop everything:

    umount /mnt/test1
    (on both servers)
    service glusterd stop
    service glusterfsd stop

Wipe out a brick directory to simulate a drive replacement:

    (server2)
    rm -rf /data/test1
    mkdir /data/test1

Start the servers, then mount and stat the files to trigger a self-heal:

    (on both servers)
    service glusterd start
    (one machine)
    mount -t glusterfs server1:test1 /mnt/test1
    cd /mnt/test1
    stat *
    ls -l
    total 16
    -rw-r--r-- 1 root root 0 Sep 28 17:02 bar
    -rw-r--r-- 1 root root 0 Sep 28 17:02 baz
    -rw-r--r-- 1 root root 5 Sep 28 16:59 foo

(Note the 0-sized files.)

On the backend, server1:

    getfattr -m . -d -e hex *
    # file: bar
    trusted.afr.test1-client-0=0x000000000000000000000000
    trusted.afr.test1-client-1=0x000000000000000000000000
    trusted.gfid=0x8f3a01d1ee1c4dbe8f851a43f8b19567
    # file: baz
    trusted.afr.test1-client-0=0x000000000000000000000000
    trusted.afr.test1-client-1=0x000000000000000000000000
    trusted.gfid=0x8f3a01d1ee1c4dbe8f851a43f8b19567
    # file: foo
    trusted.afr.test1-client-0=0x000000000000000000000000
    trusted.afr.test1-client-1=0x000000000000000000000000
    trusted.gfid=0x8f3a01d1ee1c4dbe8f851a43f8b19567

    ls -li
    total 24
    2883587 -rw-r--r-- 3 root root 5 Sep 28 16:59 bar
    2883587 -rw-r--r-- 3 root root 5 Sep 28 16:59 baz
    2883587 -rw-r--r-- 3 root root 5 Sep 28 16:59 foo

On the backend, server2:

    getfattr -m . -d -e hex *
    # file: bar
    trusted.gfid=0x8f3a01d1ee1c4dbe8f851a43f8b19567
    # file: baz
    trusted.gfid=0x8f3a01d1ee1c4dbe8f851a43f8b19567
    # file: foo
    trusted.afr.test1-client-0=0x000000000000000000000000
    trusted.afr.test1-client-1=0x000000000000000000000000
    trusted.gfid=0x8f3a01d1ee1c4dbe8f851a43f8b19567

    ls -li
    total 16
    2883588 -rw-r--r-- 1 root root 0 Sep 28 17:02 bar
    2883589 -rw-r--r-- 1 root root 0 Sep 28 17:02 baz
    2883590 -rw-r--r-- 1 root root 5 Sep 28 16:59 foo
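The broken state is easy to spot by comparing inode numbers and link counts across the two bricks. A minimal check, assuming the brick paths from the reproduction above (this snippet is illustrative and not part of the original report):

    # Run on each server. stat prints link count, inode, size, and name.
    # On server1, all three names share one inode with link count 3;
    # on server2, the heal created three independent files, two of them
    # empty, so the hardlink relationship was lost.
    stat -c '%h %i %s %n' /data/test1/*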
Really? This can easily result in data loss.
(In reply to comment #2)
> Really? This can easily result in data loss.

It is a P1 enhancement as the code changes involved are not trivial.
My idea was to use the sticky-bit pointers to simulate hardlinks. The actual file would then still need some sort of pointer back to each sticky, probably an extended attribute. Renames would have to go back to all the pointers and update them to the new filename. Deletes would probably just trigger a rename to one of the stickies, which would then have to update all the remaining stickies to the new filename.
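Purely as an illustration, the bookkeeping this scheme implies could look roughly like the following on a brick. The xattr names (user.points-to, user.link-pointers) are invented for this sketch; GlusterFS does not use them:

    # Hypothetical sketch of the proposed scheme; xattr names are invented.
    touch realfile
    touch pointer-bar pointer-baz
    chmod +t pointer-bar pointer-baz          # sticky-bit link pointers
    setfattr -n user.points-to -v realfile pointer-bar
    setfattr -n user.points-to -v realfile pointer-baz
    setfattr -n user.link-pointers -v "pointer-bar,pointer-baz" realfile
    # A rename of realfile would have to walk user.link-pointers and
    # rewrite user.points-to on every pointer; a delete would promote
    # one pointer to the real file and update the rest.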
(In reply to comment #4)
> My idea was to use the sticky-bit pointers to simulate hardlinks. The actual
> file would then still need some sort of pointer back to each sticky, probably
> an extended attribute.
>
> Renames would have to go back to all the pointers and update them to the new
> filename. Deletes would probably just trigger a rename to one of the stickies,
> which would then have to update all the remaining stickies to the new filename.

We're introducing a solid framework (gfid filehandles) to address hardlinks and rename self-heals in 3.4. Some of the framework code can be found at https://github.com/avati/glusterfs/commits/iops. It is better to wait for this "right" fix in 3.4 than to pile on kludgy patchwork.

Avati
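For context, gfid filehandles give every file a brick-local path derived from its gfid, so a file can be located even after renames or link-count changes. A sketch of what that looks like on a brick, assuming the .glusterfs layout used in later releases and reusing the gfid from the getfattr output above:

    # Hypothetical illustration: trusted.gfid 0x8f3a01d1ee1c4dbe8f851a43f8b19567
    # maps to a hardlink kept under the brick's .glusterfs directory:
    ls -li /data/test1/.glusterfs/8f/3a/8f3a01d1-ee1c-4dbe-8f85-1a43f8b19567
    # Because this entry is itself a hardlink to the file, self-heal can
    # resolve a file by gfid regardless of its name or number of links.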
*** Bug 764393 has been marked as a duplicate of this bug. ***
CHANGE: http://review.gluster.com/2841 (cluster/afr: Hardlink Self-heal) merged in master by Vijay Bellur (vijay)
CHANGE: http://review.gluster.com/3056 (cluster/afr: Fix frame leak in hardlink self-heal) merged in master by Vijay Bellur (vijay)
Could someone please clarify the status of hardlinks working correctly in the latest stable Gluster? Can they be used properly? Is there a workaround in case of node failures? It would be much appreciated.

James
Verified on 3.3.0qa43. The bug is fixed.
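A quick re-run of the reproduction above confirms the heal now preserves hardlinks. This check is a sketch assuming the same volume and brick layout as the original report:

    # After wiping a brick and re-triggering self-heal, all three names
    # should again share a single inode (link count 3, size 5) both on
    # the client mount and on the healed brick:
    ls -li /mnt/test1
    stat -c '%h %i %s %n' /data/test1/*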
Is this change (3.3.0qa43) part of the released 3.3.0-1 standard RPMs, or is it to be rolled into a future version?
It is part of 3.3.0.