Bug 765391 (GLUSTER-3659) - hardlinks fail to self-heal
Summary: hardlinks fail to self-heal
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-3659
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.2.3
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: low
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact: Shwetha Panduranga
URL:
Whiteboard:
Duplicates: GLUSTER-2661
Depends On:
Blocks: 817967
 
Reported: 2011-09-29 00:12 UTC by Joe Julian
Modified: 2013-07-24 17:55 UTC
CC List: 6 users

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:55:59 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Joe Julian 2011-09-28 21:18:49 UTC
[2011-09-28 17:02:45.139827] W [write-behind.c:3030:init] 0-test4-write-behind: disabling write-behind for first 0 bytes
[2011-09-28 17:02:45.237063] I [client.c:1935:notify] 0-test4-client-0: parent translators are ready, attempting connect on transport
[2011-09-28 17:02:45.241955] I [client.c:1935:notify] 0-test4-client-1: parent translators are ready, attempting connect on transport
Given volfile:
+------------------------------------------------------------------------------+
  1: volume test4-client-0
  2:     type protocol/client
  3:     option remote-host centos2
  4:     option remote-subvolume /var/spool/gluster/test4a
  5:     option transport-type tcp
  6: end-volume
  7: 
  8: volume test4-client-1
  9:     type protocol/client
 10:     option remote-host centos2
 11:     option remote-subvolume /var/spool/gluster/test4b
 12:     option transport-type tcp
 13: end-volume
 14: 
 15: volume test4-replicate-0
 16:     type cluster/replicate
 17:     subvolumes test4-client-0 test4-client-1
 18: end-volume
 19: 
 20: volume test4-write-behind
 21:     type performance/write-behind
 22:     subvolumes test4-replicate-0
 23: end-volume
 24: 
 25: volume test4-read-ahead
 26:     type performance/read-ahead
 27:     subvolumes test4-write-behind
 28: end-volume
 29: 
 30: volume test4-io-cache
 31:     type performance/io-cache
 32:     subvolumes test4-read-ahead
 33: end-volume
 34: 
 35: volume test4-quick-read
 36:     type performance/quick-read
 37:     subvolumes test4-io-cache
 38: end-volume
 39: 
 40: volume test4-stat-prefetch
 41:     type performance/stat-prefetch
 42:     subvolumes test4-quick-read
 43: end-volume
 44: 
 45: volume test4
 46:     type debug/io-stats
 47:     option latency-measurement off
 48:     option count-fop-hits off
 49:     subvolumes test4-stat-prefetch
 50: end-volume

+------------------------------------------------------------------------------+
[2011-09-28 17:02:45.247241] I [rpc-clnt.c:1531:rpc_clnt_reconfig] 0-test4-client-1: changing port to 24028 (from 0)
[2011-09-28 17:02:45.247397] I [rpc-clnt.c:1531:rpc_clnt_reconfig] 0-test4-client-0: changing port to 24027 (from 0)
[2011-09-28 17:02:49.144295] I [client-handshake.c:1082:select_server_supported_programs] 0-test4-client-1: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-09-28 17:02:49.144949] I [client-handshake.c:913:client_setvolume_cbk] 0-test4-client-1: Connected to 10.0.0.136:24028, attached to remote volume '/var/spool/gluster/test4b'.
[2011-09-28 17:02:49.144980] I [afr-common.c:2611:afr_notify] 0-test4-replicate-0: Subvolume 'test4-client-1' came back up; going online.
[2011-09-28 17:02:49.150072] I [client-handshake.c:1082:select_server_supported_programs] 0-test4-client-0: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-09-28 17:02:49.150460] I [client-handshake.c:913:client_setvolume_cbk] 0-test4-client-0: Connected to 10.0.0.136:24027, attached to remote volume '/var/spool/gluster/test4a'.
[2011-09-28 17:02:49.706624] I [fuse-bridge.c:3336:fuse_graph_setup] 0-fuse: switched to graph 0
[2011-09-28 17:02:49.872812] I [fuse-bridge.c:2924:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.10
[2011-09-28 17:02:49.874888] I [afr-common.c:912:afr_fresh_lookup_cbk] 0-test4-replicate-0: added root inode
[2011-09-28 17:02:50.110079] I [afr-dir-read.c:174:afr_examine_dir_readdir_cbk] 0-test4-replicate-0:  entry self-heal triggered. path: /, reason: checksums of directory differ, forced merge option set
[2011-09-28 17:02:50.144930] W [afr-common.c:122:afr_set_split_brain] (-->/usr/lib64/glusterfs/3.2.3/xlator/cluster/replicate.so [0x2aaaab5fd113] (-->/usr/lib64/glusterfs/3.2.3/xlator/cluster/replicate.so(afr_sh_entry_done+0x46) [0x2aaaab5f6646] (-->/usr/lib64/glusterfs/3.2.3/xlator/cluster/replicate.so(afr_self_heal_completion_cbk+0x246) [0x2aaaab5efac6]))) 0-test4-replicate-0: invalid argument: inode
[2011-09-28 17:02:50.144981] I [afr-self-heal-common.c:1557:afr_self_heal_completion_cbk] 0-test4-replicate-0: background  entry entry self-heal completed on /
[2011-09-28 17:02:50.185364] I [afr-common.c:649:afr_lookup_self_heal_check] 0-test4-replicate-0: size differs for /bar 
[2011-09-28 17:02:50.185405] I [afr-common.c:811:afr_lookup_done] 0-test4-replicate-0: background  meta-data data self-heal triggered. path: /bar
[2011-09-28 17:02:50.191267] I [afr-self-heal-common.c:1557:afr_self_heal_completion_cbk] 0-test4-replicate-0: background  meta-data data self-heal completed on /bar
[2011-09-28 17:02:50.211964] I [afr-common.c:649:afr_lookup_self_heal_check] 0-test4-replicate-0: size differs for /baz 
[2011-09-28 17:02:50.212004] I [afr-common.c:811:afr_lookup_done] 0-test4-replicate-0: background  data self-heal triggered. path: /baz
[2011-09-28 17:02:50.213701] I [afr-self-heal-common.c:1557:afr_self_heal_completion_cbk] 0-test4-replicate-0: background  data data self-heal completed on /baz
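
For reference, the lines above are the client-side self-heal trail for this bug: an entry self-heal on / (forced merge), followed by separate metadata/data self-heals on /bar and /baz rather than re-links to /foo. To pull just that trail out of a busier client log, something like this works (a sketch; the client log file name is derived from the mount point, so adjust the path to match):

grep -E 'self-heal (triggered|completed)' /var/log/glusterfs/mnt-test4.log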

Comment 1 Joe Julian 2011-09-29 00:12:56 UTC
Create a volume test1:
gluster volume create test1 replica 2 server1:/data/test1 server2:/data/test1

Mount the volume:
mount -t glusterfs server1:test1 /mnt/test1

Create a file:
cd /mnt/test1
echo asdf > foo

Create hardlinks:
ln foo bar
ln foo baz

ls -li
5767174 -rw-r--r-- 3 root root    5 Sep 28 16:59 bar
5767174 -rw-r--r-- 3 root root    5 Sep 28 16:59 baz
5767174 -rw-r--r-- 3 root root    5 Sep 28 16:59 foo

Unmount and stop everything:
umount /mnt/test1
(on both servers)
service glusterd stop
service glusterfsd stop

Wipe out a share directory to simulate a drive replacement:
(server2) rm -rf /data/test1
mkdir /data/test1

Start the server, mount and stat the files to start a self-heal:
(on both servers) service glusterd start
(one machine)
mount -t glusterfs server1:test1 /mnt/test1
cd /mnt/test1
stat *
ls -l 
total 16
-rw-r--r-- 1 root root 0 Sep 28 17:02 bar
-rw-r--r-- 1 root root 0 Sep 28 17:02 baz
-rw-r--r-- 1 root root 5 Sep 28 16:59 foo
(note the 0-sized files)
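
A quick way to confirm from the mount point that the link relationship itself was lost, not just the file contents, is to ask for multiply-linked files (a sketch using GNU find):

find /mnt/test1 -type f -links +1
(on a healthy mount this lists foo, bar and baz, each with a link count of 3; after the self-heal above it prints nothing, since each name now reports a link count of 1, as in the ls -l output)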

On the backend, server1:
getfattr -m . -d -e hex *
# file: bar
trusted.afr.test1-client-0=0x000000000000000000000000
trusted.afr.test1-client-1=0x000000000000000000000000
trusted.gfid=0x8f3a01d1ee1c4dbe8f851a43f8b19567

# file: baz
trusted.afr.test1-client-0=0x000000000000000000000000
trusted.afr.test1-client-1=0x000000000000000000000000
trusted.gfid=0x8f3a01d1ee1c4dbe8f851a43f8b19567

# file: foo
trusted.afr.test1-client-0=0x000000000000000000000000
trusted.afr.test1-client-1=0x000000000000000000000000
trusted.gfid=0x8f3a01d1ee1c4dbe8f851a43f8b19567

ls -li
total 24
2883587 -rw-r--r-- 3 root root 5 Sep 28 16:59 bar
2883587 -rw-r--r-- 3 root root 5 Sep 28 16:59 baz
2883587 -rw-r--r-- 3 root root 5 Sep 28 16:59 foo

On the backend, server2:
getfattr -m . -d -e hex *
# file: bar
trusted.gfid=0x8f3a01d1ee1c4dbe8f851a43f8b19567

# file: baz
trusted.gfid=0x8f3a01d1ee1c4dbe8f851a43f8b19567

# file: foo
trusted.afr.test1-client-0=0x000000000000000000000000
trusted.afr.test1-client-1=0x000000000000000000000000
trusted.gfid=0x8f3a01d1ee1c4dbe8f851a43f8b19567

ls -li
total 16
2883588 -rw-r--r-- 1 root root 0 Sep 28 17:02 bar
2883589 -rw-r--r-- 1 root root 0 Sep 28 17:02 baz
2883590 -rw-r--r-- 1 root root 5 Sep 28 16:59 foo
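
To summarize what the bricks show: on server1 the three names still share inode 2883587, with all-zero trusted.afr changelogs and the common gfid; on server2 the entry self-heal recreated bar and baz as separate, empty inodes (2883588, 2883589) carrying the same gfid but no trusted.afr changelogs and no hardlink back to foo. A side-by-side check makes this easy to spot (a sketch; run against the brick path on each server):

cd /data/test1
stat -c 'inode=%i links=%h size=%s %n' foo bar baz
getfattr -m . -d -e hex foo bar baz
(a correctly healed brick matches server1: one shared inode, link count 3, size 5, and an identical gfid on all three names)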

Comment 2 Joe Julian 2011-10-10 03:43:21 UTC
Really? This can easily result in data loss.

Comment 3 Vijay Bellur 2011-10-10 08:31:06 UTC
(In reply to comment #2)
> Really? This can easily result in data loss.

It is a P1 enhancement as the code changes involved are not trivial.

Comment 4 Joe Julian 2011-10-10 13:04:01 UTC
My idea was to use the sticky-bit pointers to simulate hardlinks. The actual file would still need some sort of pointer back to the sticky, then, probably an extended attribute.

Renames would have to go back to all the pointers and update them to the new filename. Deletes would probably just trigger a rename to one of the stickies, which would then have to update all the stickies to the new filename.
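
A minimal sketch of what that backend layout could look like (entirely hypothetical; the trusted.example.* xattr names below are invented for illustration and are not GlusterFS attributes):

touch bar baz
chmod 1000 bar baz
setfattr -n trusted.example.link-target -v foo bar
setfattr -n trusted.example.link-target -v foo baz
setfattr -n trusted.example.links -v bar,baz foo
(bar and baz become zero-byte sticky-bit pointers naming the real file, and foo carries a list of its pointers; a rename of foo would rewrite the link-target xattr on every pointer, and an unlink of foo would rename the data onto one of the pointer names and update the rest)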

Comment 5 Anand Avati 2011-10-10 13:33:12 UTC
(In reply to comment #4)
> My idea was to use the sticky-bit pointers to simulate hardlinks. The actual
> file would still need some sort of pointer back to the sticky, then, probably
> an extended attribute.
> 
> Renames would have to go back to all the pointers and update them to the new
> filename. Deletes would probably just trigger a rename to one of the stickies,
> which would then have to update all the stickies to the new filename.

We're introducing a solid framework (gfid filehandles) to address hardlink and rename self-heals in 3.4. Some of the framework code can be found in https://github.com/avati/glusterfs/commits/iops. It is better to wait for this "right" fix in 3.4 than to apply kludgy patchwork.

Avati
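
For reference, once that framework landed (3.3 and later), every file on a brick also gets a hardlink under the brick's .glusterfs directory keyed by its gfid, which roughly lets self-heal operate on the gfid handle and re-link names to it rather than guessing from paths. Built from the gfid shown in Comment 1 purely for illustration, the handle for the foo/bar/baz trio would sit here (a sketch against the Comment 1 brick path):

ls -li /data/test1/.glusterfs/8f/3a/8f3a01d1-ee1c-4dbe-8f85-1a43f8b19567
(on a healthy post-3.3 brick this reports the same inode as foo, bar and baz, with the link count covering the visible names plus the .glusterfs entry)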

Comment 6 Pranith Kumar K 2011-12-20 05:25:18 UTC
*** Bug 764393 has been marked as a duplicate of this bug. ***

Comment 7 Anand Avati 2012-03-01 15:51:26 UTC
CHANGE: http://review.gluster.com/2841 (cluster/afr: Hardlink Self-heal) merged in master by Vijay Bellur (vijay)

Comment 8 Anand Avati 2012-03-31 14:41:20 UTC
CHANGE: http://review.gluster.com/3056 (cluster/afr: Fix frame leak in hardlink self-heal) merged in master by Vijay Bellur (vijay)

Comment 9 purpleidea 2012-04-30 05:26:49 UTC
Could someone please clarify the status of hardlinks working correctly in the latest stable gluster?
Can these be used properly? Is there a workaround in case of node failures?
Would be much appreciated.
James

Comment 10 Shwetha Panduranga 2012-05-29 07:14:30 UTC
Verified the bug on 3.3.0qa43. Bug is fixed.
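
For anyone re-verifying, re-running the Comment 1 steps on 3.3.0qa43 or later and checking the bricks directly is enough (a sketch, assuming the same /data/test1 brick paths):

(on both servers)
stat -c 'inode=%i links=%h size=%s %n' /data/test1/foo /data/test1/bar /data/test1/baz
(on a fixed build all three names on the replaced brick share one inode and show size 5, matching the surviving brick)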

Comment 11 Johnny Hughes 2012-07-08 11:28:46 UTC
Is this change (verified on 3.3.0qa43) part of the released 3.3.0-1 standard RPMs, or will it be rolled into a future version?

Comment 12 Pranith Kumar K 2012-07-09 05:37:06 UTC
It is part of 3.3.0.

