Bug 1296208

Summary: Geo-Replication Session goes "FAULTY" when application logs rolled on master
Product: [Community] GlusterFS Reporter: Milind Changire <mchangir>
Component: geo-replication Assignee: Milind Changire <mchangir>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: 3.7.7 CC: avishwan, bugs, byarlaga, chrisw, csaba, hamiller, khiremat, mchangir, nlevinki, sankarshan, storage-qa-internal, vbellur
Target Milestone: --- Keywords: ZStream
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.7.9 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1264986 Environment:
Last Closed: 2016-03-22 08:15:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1264986    
Bug Blocks: 1296206, 1309567    

Comment 2 Vijay Bellur 2016-03-02 04:50:12 UTC
REVIEW: http://review.gluster.org/13571 (georep: avoid creating multiple entries with same gfid) posted (#1) for review on release-3.7 by Milind Changire (mchangir)

Comment 3 Kotresh HR 2016-03-02 05:16:51 UTC
Description of problem: 
When an application rolls its logs on the Master, the geo-replication session goes FAULTY.

Investigation revealed that there is an issue with CREATE + RENAME log replay from geo-rep.

Comment 4 Vijay Bellur 2016-03-08 16:35:03 UTC
COMMIT: http://review.gluster.org/13571 committed in release-3.7 by Vijay Bellur (vbellur) 
------
commit 16f42cdef539d5c63784f989af9ae877a94d72e7
Author: Milind Changire <mchangir>
Date:   Fri Jan 29 13:53:07 2016 +0530

    georep: avoid creating multiple entries with same gfid
    
    Problem:
    CREATE + RENAME changelogs replayed by geo-replication cause
    stale old-name entries with same gfid on slave nodes.
    A gfid is a unique key in the file-system and should not be
    assigned to multiple entries.
    
    Solution:
    Create entry on slave only if lstat(gfid) at aux-mount fails.
    This applies to files as well as directories.
    
    Change-Id: Ice3340f4ae1251c2dcef024a2388c4d33b5d4919
    BUG: 1296208
    Signed-off-by: Milind Changire <mchangir>
    Reviewed-on: http://review.gluster.org/13316
    Smoke: Gluster Build System <jenkins.com>
    Reviewed-by: Kotresh HR <khiremat>
    Reviewed-by: Aravinda VK <avishwan>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    (cherry picked from commit 87d93fac9fcc4b258b7eb432ac4151cdd043534f)
    Reviewed-on: http://review.gluster.org/13571
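
The solution described in the commit message above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the actual patch: the helper name `create_entry_if_absent`, the `gfid_path` argument, and the `create` callback are all hypothetical, and the demo uses a temporary directory in place of a real aux-mount. Treating only ENOENT as "lstat fails" is also a choice made for this sketch; the commit message does not say which errors the real code distinguishes.

```python
import errno
import os
import tempfile


def create_entry_if_absent(gfid_path, create):
    """Call create() only when lstat(gfid_path) reports that nothing exists.

    gfid_path: hypothetical path through which the gfid is resolved
               (a plain local path in this demo, not a real aux-mount).
    create:    hypothetical zero-argument callback that creates the entry.
    Returns True if the entry was created, False if creation was skipped.
    """
    try:
        os.lstat(gfid_path)
    except OSError as e:
        if e.errno != errno.ENOENT:
            raise  # unexpected failure: surface it instead of hiding it
        create()
        return True
    return False  # gfid already resolves: skip to avoid a duplicate entry


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "entry")
        make = lambda: open(path, "w").close()
        print(create_entry_if_absent(path, make))  # first replay of CREATE
        print(create_entry_if_absent(path, make))  # replay after a restart
```

The point of the check is idempotence: replaying the same CREATE record a second time becomes a no-op instead of producing a second entry for a gfid that already exists.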

Comment 5 Vijay Bellur 2016-03-08 19:40:00 UTC
Has any performance characterization been done to ascertain the percentage of creates being affected due to the additional stat()?

Comment 6 Milind Changire 2016-03-09 04:42:19 UTC
No performance characterization tests have been done specifically.
However, the lstat() is done for _every_ entry creation, i.e. 100% of the time, since there is no way to tell whether the logs are being replayed for the first time or after a geo-rep restart, and so no way to make the lstat() conditional.
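
One rough starting point for the characterization asked about in Comment 5 would be to time the extra lstat() in isolation. The sketch below is only an assumption about how such a measurement might look: the function name `mean_lstat_miss_seconds` and the entry name are hypothetical, and it runs against a local temporary directory, so the figure it prints says nothing about the cost on a real aux-mount, where each call crosses the network.

```python
import os
import tempfile
import time


def mean_lstat_miss_seconds(directory, calls=10000):
    """Return the mean wall-clock time of one lstat() on a missing name."""
    missing = os.path.join(directory, "no-such-entry")  # hypothetical name
    start = time.perf_counter()
    for _ in range(calls):
        try:
            os.lstat(missing)
        except FileNotFoundError:
            pass  # the expected outcome when an entry is created for the first time
    return (time.perf_counter() - start) / calls


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print("mean lstat() miss: %.2f us" % (mean_lstat_miss_seconds(d) * 1e6))
```

Pointing `directory` at a test mount and comparing entry-creation throughput with and without the check would give a more meaningful number than this local baseline.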

Comment 7 Kaushal 2016-04-19 07:21:07 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.9, please open a new bug report.

glusterfs-3.7.9 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-users/2016-March/025922.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user