A situation can occur in which self-heal does not fix the extended attributes of files.

Consider two subvols of replicate, A and B. A is full of data and B is empty (new disk). Consider a file that needs self-heal, say /foo/bar.txt. Before self-heal on bar.txt gets triggered, an opendir might happen on /foo, which in turn triggers the opendir self-heal (does readdir on both subvols and compares checksums). This will do a force-merge of both directories, and will create the entry bar.txt on B (impunge). However, this does not copy over the extended attributes of bar.txt.

Now a lookup on bar.txt happens, but this lookup finds nothing wrong with the file. The changelog is absent on B ("ignorant" state) and all zeroes on A ("innocent" state). Lookup compares metadata (perms, owner, size), finds it to be in sync, and does not compare xattrs. So it decides that metadata self-heal is not needed, and the xattrs never get synced.

This is particularly a problem if the file in question happens to be a DHT link file. The "linkto" attribute does not get copied.

Possible fixes:

#1 Sync xattrs in the impunge code path. This is rather clumsy, as the impunge code is already quite complicated: it has to do readdir and create entries one by one.

#2 Gather all xattrs in lookup (not just afr xattrs) and compare them, just like we compare the perms and ownership now, and trigger metadata self-heal if there is a mismatch.
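The comparison proposed in fix #2 can be illustrated outside of AFR with a small shell sketch. This is not the AFR implementation; it only models the decision. It assumes two text dumps of `getfattr -d -m . -e hex` output for the same path, one taken on each subvol, and it assumes that the AFR changelog keys (`trusted.afr.*`) are excluded from the comparison. The function names `xattrs_for_compare` and `needs_metadata_heal` are hypothetical.

```shell
#!/bin/sh
# Sketch only: models the xattr comparison proposed in fix #2.
# Input: two files holding `getfattr -d -m . -e hex` output for the same
# path on subvols A and B. Function names are hypothetical.

# Print the comparable xattrs of one dump, sorted: skip the "# file:"
# header, blank lines, and the AFR changelog keys (trusted.afr.*).
xattrs_for_compare() {
    grep -v -e '^#' -e '^$' -e '^trusted\.afr\.' "$1" | sort
}

# Return 0 (true) if the two dumps disagree, i.e. metadata self-heal
# would be needed under fix #2; return 1 if they match.
# The body runs in a subshell so that a and b do not leak.
needs_metadata_heal() (
    a=$(xattrs_for_compare "$1")
    b=$(xattrs_for_compare "$2")
    [ "$a" != "$b" ]
)
```

For example, `needs_metadata_heal a.dump b.dump && echo "xattr mismatch"` would report the situation described above, where bar.txt exists on both subvols but an xattr is present only on A.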
dht does not get confused if linkto is missing.
Most of the self-heal (replicate-related) bugs are now fixed in the 3.1.0 branch. As we are just a week away from the GA release, we would like you to test this particular bug with the 3.1.0RC releases and let us know if it's fixed.
Please update the status of this bug, as it's been more than 6 months since it was filed (bug id < 2000). Please resolve it with the proper resolution if it's not valid anymore. If it's still valid and not critical, move it to 'enhancement' severity.
Script to reproduce the bug. The xattr won't be self-healed if you run this script:

#!/usr/bin/bash -x
# Start from clean brick directories.
rm -rf /home/repl1 /home/repl2
# Populate only the first brick, and give the file a gfid and a custom xattr.
mkdir -p /home/repl1/foo
touch /home/repl1/foo/bar.txt
setfattr -n trusted.gfid -v 0sspQx+7a+RvKln/FSSBx9NA== /home/repl1/foo/bar.txt
setfattr -n trusted.762680 -v "fixed" /home/repl1/foo/bar.txt
# Create and start a 2-way replicate volume over both bricks.
glusterd
gluster volume create repl replica 2 `hostname`:/home/repl1 `hostname`:/home/repl2 --mode=script
gluster volume start repl
# Mount the volume (client runs under valgrind).
valgrind --leak-check=full --log-file=/etc/glusterd/valgrind.log glusterfs -s `hostname` --volfile-id repl /mnt/client
# Look up the file through the mount, then dump the xattrs on both bricks.
cd /mnt/client
ls foo/bar.txt
getfattr -d -m . /home/repl?/foo/bar.txt
CHANGE: http://review.gluster.com/2503 (cluster/afr: Perform xattrop with all afr-keys) merged in master by Vijay Bellur (vijay)
The test case has been added to the afr test plan. Executed the test case on 3.3.0qa44; it works fine.
Output of the test case executed on 3.3.0qa44

Node1:-
----------
[05/30/12 - 17:02:05 root@APP-SERVER1 ~]# mkdir /export_sdb/dir1/
[05/30/12 - 17:02:09 root@APP-SERVER1 ~]# mkdir /export_sdb/dir1/foo
[05/30/12 - 17:02:11 root@APP-SERVER1 ~]# touch /export_sdb/dir1/foo/bar.txt
[05/30/12 - 17:02:24 root@APP-SERVER1 ~]# setfattr -n trusted.gfid -v 0sspQx+7a+RvKln/FSSBx9NA== /export_sdb/dir1/foo/bar.txt
[05/30/12 - 17:02:57 root@APP-SERVER1 ~]# setfattr -n trusted.762680 -v "fixed" /export_sdb/dir1/foo/bar.txt
[05/30/12 - 17:03:10 root@APP-SERVER1 ~]# getfattr -d -m . -e hex /export_sdb/dir1/foo/bar.txt
getfattr: Removing leading '/' from absolute path names
# file: export_sdb/dir1/foo/bar.txt
trusted.762680=0x6669786564
trusted.gfid=0xb29431fbb6be46f2a59ff152481c7d34

[05/30/12 - 17:03:27 root@APP-SERVER1 ~]# glusterd
[05/30/12 - 17:03:35 root@APP-SERVER1 ~]# gluster peer probe 192.168.2.36
Probe successful
[05/30/12 - 17:03:50 root@APP-SERVER1 ~]# gluster peer status
Number of Peers: 1

Hostname: 192.168.2.36
Uuid: 0148141b-2366-4228-92bc-d673b800959f
State: Peer in Cluster (Connected)
[05/30/12 - 17:03:54 root@APP-SERVER1 ~]# gluster v create dstore replica 2 192.168.2.35:/export_sdb/dir1/ 192.168.2.36:/export_sdb/dir1/
Creation of volume dstore has been successful. Please start the volume to access data.
[05/30/12 - 17:05:30 root@APP-SERVER1 ~]# gluster v set dstore "self-heal-daemon" off
Set volume successful
[05/30/12 - 17:05:44 root@APP-SERVER1 ~]# gluster v info

Volume Name: dstore
Type: Replicate
Volume ID: 3b898c3c-f219-4ebc-9831-cd7e8407861e
Status: Created
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 192.168.2.35:/export_sdb/dir1
Brick2: 192.168.2.36:/export_sdb/dir1
Options Reconfigured:
cluster.self-heal-daemon: off
[05/30/12 - 17:08:34 root@APP-SERVER1 ~]# gluster v start dstore
Starting volume dstore has been successful
[05/30/12 - 17:08:40 root@APP-SERVER1 ~]# gluster v status
Status of volume: dstore
Gluster process                          Port    Online  Pid
------------------------------------------------------------------------------
Brick 192.168.2.35:/export_sdb/dir1      24009   Y       30225
Brick 192.168.2.36:/export_sdb/dir1      24009   Y       4330
NFS Server on localhost                  38467   Y       30231
NFS Server on 192.168.2.36               38467   Y       4336

Mount Output:-
----------------
[05/30/12 - 17:08:18 root@APP-CLIENT1 ~]# mount -t glusterfs 192.168.2.35:/dstore /mnt/gfsc1
[05/30/12 - 17:08:48 root@APP-CLIENT1 ~]# cd /mnt/gfsc1
[05/30/12 - 17:09:01 root@APP-CLIENT1 gfsc1]# ls foo/bar.txt
foo/bar.txt
[05/30/12 - 17:09:11 root@APP-CLIENT1 gfsc1]# ls foo
[05/30/12 - 17:09:31 root@APP-CLIENT1 gfsc1]# ls foo
bar.txt

Node1:-
----------
[05/30/12 - 17:08:42 root@APP-SERVER1 ~]# getfattr -d -m . -e hex /export_sdb/dir1/foo/bar.txt
getfattr: Removing leading '/' from absolute path names
# file: export_sdb/dir1/foo/bar.txt
trusted.762680=0x6669786564
trusted.afr.dstore-client-0=0x000000000000000000000000
trusted.afr.dstore-client-1=0x000000000000000000000000
trusted.gfid=0xb29431fbb6be46f2a59ff152481c7d34

[05/30/12 - 17:09:21 root@APP-SERVER1 ~]# getfattr -d -m . -e hex /export_sdb/dir1/foo/
getfattr: Removing leading '/' from absolute path names
# file: export_sdb/dir1/foo/
trusted.afr.dstore-client-0=0x000000000000000000000000
trusted.afr.dstore-client-1=0x000000000000000000000000
trusted.gfid=0x94e2519ccb7640d687aeccac28d7100d

Node2:-
----
[05/30/12 - 17:03:38 root@APP-SERVER2 ~]# getfattr -d -m . -e hex /export_sdb/dir1/foo/bar.txt
getfattr: Removing leading '/' from absolute path names
# file: export_sdb/dir1/foo/bar.txt
trusted.762680=0x6669786564
trusted.afr.dstore-client-0=0x000000000000000000000000
trusted.afr.dstore-client-1=0x000000000000000000000000
trusted.gfid=0xb29431fbb6be46f2a59ff152481c7d34

[05/30/12 - 17:09:27 root@APP-SERVER2 ~]# getfattr -d -m . -e hex /export_sdb/dir1/foo/
getfattr: Removing leading '/' from absolute path names
# file: export_sdb/dir1/foo/
trusted.afr.dstore-client-0=0x000000000000000000000000
trusted.afr.dstore-client-1=0x000000000000000000000000
trusted.gfid=0x94e2519ccb7640d687aeccac28d7100d
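The hex values in this output can be checked by hand: `trusted.762680=0x6669786564` on Node2 is the ASCII string "fixed" that was set with setfattr on Node1, which is what shows that the xattr was healed. A minimal sketch of that decoding follows; `hex_to_ascii` is a hypothetical helper, and it is only meaningful for values that hold printable text (a binary value such as trusted.gfid will not decode to anything readable).

```shell
#!/bin/sh
# Sketch only: decode a `getfattr -e hex` value such as 0x6669786564.
# hex_to_ascii is a hypothetical helper; its body runs in a subshell so
# that its variables do not leak into the caller.
hex_to_ascii() (
    hex=${1#0x}
    # Reject odd-length input: every byte needs two hex digits.
    [ $(( ${#hex} % 2 )) -eq 0 ] || exit 1
    out=''
    while [ -n "$hex" ]; do
        byte=${hex%"${hex#??}"}   # first two hex digits
        hex=${hex#??}             # drop them from the front
        # Convert the byte to octal, then let printf emit that character.
        out="$out$(printf "\\$(printf '%03o' "0x$byte")")"
    done
    printf '%s\n' "$out"
)

hex_to_ascii 0x6669786564   # prints "fixed"
```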
*** Bug 811244 has been marked as a duplicate of this bug. ***