Setup:
* 2 test servers with encrypted test files (10 KB to a couple of hundred MB)
* Servers are in replicate mode.
* IP addresses are assigned via DHCP.
Server1 loses its network connection. When the network is restored, self-heal kicks in. Server2 has files with updated timestamps, and self-heal seems to truncate files or move them to different locations.
The timestamp of a file does not matter at all for self-heal. Self-heal is done solely on the basis of the changelogs. By 'moves them to different locations', do you mean it moves files into directories where they don't belong? Can you try to narrow this down to a small, reproducible case?
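For anyone trying to narrow this down, it may help to look at the changelogs directly on the backend export directories of both servers. The sketch below is only an illustration and rests on assumptions that are not stated in this report: that the changelogs are stored as extended attributes named `trusted.afr.<subvolume>` on each backend file, and that each value is three 32-bit big-endian counters (data, metadata, entry). The function names are hypothetical.

```python
import os
import struct

# Assumption: AFR changelog xattrs are named "trusted.afr.<subvolume>".
CHANGELOG_PREFIX = "trusted.afr."


def decode_changelog(value: bytes) -> dict:
    """Decode one changelog value.

    Assumption: the value is 12 bytes holding three unsigned 32-bit
    big-endian counters for pending data, metadata and entry operations.
    """
    if len(value) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(value))
    data, metadata, entry = struct.unpack(">III", value)
    return {"data": data, "metadata": metadata, "entry": entry}


def read_changelogs(path: str) -> dict:
    """Return the decoded changelog xattrs found on a backend file.

    Must be run as root on the server's export directory (not on the
    FUSE mount), because trusted.* attributes are hidden from other users.
    """
    result = {}
    for name in os.listxattr(path):
        if name.startswith(CHANGELOG_PREFIX):
            result[name] = decode_changelog(os.getxattr(path, name))
    return result
```

Calling `read_changelogs()` on the same file on both servers and comparing the results would show which copy, if any, is marked as having pending operations. Non-zero counters on one side would be a starting point for a reproducible case.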
Here is some additional information that might help; this was seen in a production environment with log-level=NORMAL. To add to the issue, it looks like SELinux was enabled by default on the secondary server, though it was set to permissive on the primary. /mnt/glusterfs is the primary FUSE mount point, and /mnt/glusterfs/maindir is the application's base directory.
Multiple lines in the logs show errors such as:
[2010-02-16 16:24:29] E [posix.c:619:posix_setattr] posix1: setattr (lstat) on /mnt/glusterfs/maindir/dirA/dirB/dirC/filename01.bin
For each of the above "errors" referring to (lstat), that file was re-written to the posix directory as:
/mnt/posix/maindir-dirA-dirB-dirC-filename01.bin
On the secondary GlusterFS server, log entries such as this were seen:
[2010-02-16 10:00:01] E [posix.c:477:posix_lookup] posix1: post-operation lstat on parent of .landfill/maindir-XXXXX-XXXXXXX-XXXX-Home
Since I don't currently have spare hardware running CentOS 5.3, I haven't been able to reproduce this in a test environment. For me it isn't a big issue, as it has only been seen once and looks to be mostly caused by out-of-sync clocks and possibly the SELinux factor.
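To see how often each kind of error occurred, the log entries quoted above could be tallied by source function. This is a minimal sketch that assumes every entry follows the layout of the two lines quoted in this report (`[timestamp] LEVEL [file:line:function] translator: message`). The helper names are hypothetical and other log layouts are not handled.

```python
import re
from collections import Counter

# Pattern modelled only on the two log lines quoted in this report.
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<function>[^\]]+)\] "
    r"(?P<xlator>[^:]+): (?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def count_errors_by_function(lines):
    """Count error-level ("E") entries per source function."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record and record["level"] == "E":
            counts[record["function"]] += 1
    return counts
```

Feeding the server log through `count_errors_by_function()` (for example, by passing an open file object) would show whether the `posix_setattr` and `posix_lookup` errors cluster around the time the network came back.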
Thanks for the information, Cory. As you mentioned, SELinux could be a factor that triggered this bug. We will try to reproduce this.
We have been unable to reproduce this so far. Hence pushing target to 3.1.
We have been unable to reproduce this.