
Bug 762424 (GLUSTER-692)

Summary:	Erratic behaviour during selfheal on files with future timestamp
Product:	[Community] GlusterFS	Reporter:	Chida <chida>
Component:	replicate	Assignee:	Pavan Vilas Sondur <pavan>
Status:	CLOSED WORKSFORME	QA Contact:
Severity:	high	Docs Contact:
Priority:	low
Version:	3.0.2	CC:	amarts, cory.meyer, gluster-bugs, vijay
Target Milestone:	---
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:		Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Chida 2010-03-02 14:16:28 UTC

Setup:
* 2 test servers with encrypted test files (10 KB to a few hundred MB)
* Servers are in replicate mode. 
* IP address is assigned via DHCP. 

Server1 loses its network connection. When the network is restored, self-heal kicks in. Server2 has files with updated timestamps, and self-heal seems to truncate files or move them to different locations.

Comment 1 Vikas Gorur 2010-03-02 15:45:11 UTC

The timestamp of a file does not matter at all for self-heal. Self-heal is done solely on the basis of the changelogs. By 'moves them to different locations', do you mean it moves files into directories they shouldn't be in?

Can you try to narrow this down to a small, reproducible case?
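
For readers unfamiliar with changelog-based healing, here is a minimal sketch of the idea. It assumes each replica stores, per peer, a pending-operation counter as three big-endian 32-bit integers (data, metadata, entry) in an extended attribute named along the lines of trusted.afr.<subvolume>. The function names and sample values are hypothetical, and this is not the actual replicate translator code. Note that no timestamp is consulted anywhere.

```python
import struct


def parse_changelog(raw: bytes) -> dict:
    """Decode an assumed 12-byte changelog value into its three counters."""
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


def pick_data_sources(pending: dict) -> list:
    """Return the replicas that no other replica accuses of missing data.

    pending[a][b] holds the counters replica `a` recorded against replica `b`.
    A non-zero "data" counter means `a` saw writes that `b` may have missed.
    """
    replicas = list(pending)
    accused = {
        b
        for a in replicas
        for b, counters in pending[a].items()
        if b != a and counters["data"] > 0
    }
    return [r for r in replicas if r not in accused]


# Hypothetical state: server2 stayed up and recorded 2 pending data
# operations against server1 while server1 was disconnected.
clean = parse_changelog(bytes(12))
state = {
    "server1": {"server1": clean, "server2": clean},
    "server2": {
        "server1": parse_changelog(struct.pack(">III", 2, 0, 0)),
        "server2": clean,
    },
}
sources = pick_data_sources(state)  # server2 is the only unaccused replica
```

In this sketch an empty result would mean every replica accuses another one, a situation that cannot be resolved from the counters alone.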

Comment 2 cory.meyer 2010-03-05 18:58:41 UTC

Here is some additional info that might help; this was seen in a production environment with log-level=NORMAL. To add to the issue, it looks like SELinux was enabled by default on the secondary server, though set to permissive on the primary.

/mnt/glusterfs is the primary fuse mount point. /mnt/glusterfs/maindir is the application's base directory.

Multiple lines in the logs show errors such as:
[2010-02-16 16:24:29] E [posix.c:619:posix_setattr] posix1: setattr (lstat) on /mnt/glusterfs/maindir/dirA/dirB/dirC/filename01.bin

For each of the above "errors" referring to (lstat), the file was rewritten to the posix dir: /mnt/posix/maindir-dirA-dirB-dirC-filename01.bin
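
The relocated name in this example looks like the path components below the mount point joined with hyphens. A small sketch of that observed mapping, which could help when searching the backend directory for displaced files. `flattened_name` is a hypothetical helper, and the mapping is inferred from this single example, not from the GlusterFS source.

```python
from pathlib import PurePosixPath


def flattened_name(path: str, mount: str = "/mnt/glusterfs") -> str:
    """Join the components of `path` below `mount` with hyphens."""
    relative = PurePosixPath(path).relative_to(mount)
    return "-".join(relative.parts)


name = flattened_name("/mnt/glusterfs/maindir/dirA/dirB/dirC/filename01.bin")
# name == "maindir-dirA-dirB-dirC-filename01.bin"
```

The mapping is not reversible in general, since directory or file names may themselves contain hyphens.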


On the secondary glusterfsd server, logs such as this were seen:
[2010-02-16 10:00:01] E [posix.c:477:posix_lookup] posix1: post-operation lstat on parent of .landfill/maindir-XXXXX-XXXXXXX-XXXX-Home


Since I don't currently have spare hardware running CentOS 5.3 to test with, I haven't been able to reproduce this in a test environment. For me it isn't a big issue, as this has only been seen once and looks to be mostly caused by out-of-sync clocks and possibly the SELinux factor.
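
When narrowing down a case like this, one way to check whether clock skew left files with future timestamps on a backend directory is a scan along these lines. This is only a sketch: `files_with_future_mtime` is a hypothetical helper, and the path in the usage line is taken from the example above.

```python
import os
import time


def files_with_future_mtime(root, tolerance_seconds=0.0, now=None):
    """List (path, seconds_ahead) for files whose mtime is later than `now`."""
    now = time.time() if now is None else now
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime - now > tolerance_seconds:
                hits.append((path, mtime - now))
    return hits


if __name__ == "__main__":
    # Report files more than 60 seconds ahead of the local clock.
    for path, ahead in files_with_future_mtime("/mnt/posix", tolerance_seconds=60):
        print(f"{path}: {ahead:.0f}s in the future")
```

Running it on both servers and comparing the output would show whether the affected files are the ones written while the clocks disagreed.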

Comment 3 Vikas Gorur 2010-03-05 19:01:03 UTC

Thanks for the information, Cory. As you mentioned, SELinux could be a factor that triggered this bug. We will try to reproduce this.

Comment 4 Vijay Bellur 2010-05-10 02:39:48 UTC

We have been unable to reproduce this so far. Hence pushing target to 3.1.

Comment 5 Vijay Bellur 2010-09-01 07:03:24 UTC

We have been unable to reproduce this.