Bug 1500433

Summary: [geo-rep]: RSYNC throwing internal errors
Product: [Community] GlusterFS Reporter: Kotresh HR <khiremat>
Component: geo-replication    Assignee: Kotresh HR <khiremat>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: mainline    CC: avishwan, bugs, csaba, rallan, rhinduja, rhs-bugs, sheggodu, storage-qa-internal
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.13.0 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1476876
: 1502104 (view as bug list) Environment:
Last Closed: 2017-12-08 17:43:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1476876    
Bug Blocks: 1502104    

Description Kotresh HR 2017-10-10 14:46:49 UTC
+++ This bug was initially created as a clone of Bug #1476876 +++

Description of problem:
=======================
Rsync is throwing internal errors of the form 'rsync: get_xattr_data: lgetxattr'

[2017-07-31 09:46:14.352732] W [master(/rhs/brick3/b16):1067:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1501494366
[2017-07-31 09:46:15.125684] E [resource(/rhs/brick2/b10):1044:rsync] SSH: SYNC Error(Rsync): rsync: get_xattr_data: lgetxattr(""/proc/3840/cwd/.gfid/00000000-0000-0000-0000-000000000001"","trusted.glusterfs.volume-mark.2d516aed-ad11-43cc-8741-32bfc7391b74",0) failed: No data available (61)
[2017-07-31 09:46:15.126796] E [master(/rhs/brick2/b10):1046:process] _GMaster: changelogs CHANGELOG.1501494366 could not be processed completely - moving on...
[2017-07-31 09:46:15.132359] E [resource(/rhs/brick1/b4):1044:rsync] SSH: SYNC Error(Rsync): rsync: get_xattr_data: lgetxattr(""/proc/3838/cwd/.gfid/00000000-0000-0000-0000-000000000001"","trusted.glusterfs.volume-mark.2d516aed-ad11-43cc-8741-32bfc7391b74",0) failed: No data available (61)
[2017-07-31 09:46:15.133415] E [master(/rhs/brick1/b4):1046:process] _GMaster: changelogs CHANGELOG.1501494366 could not be processed completely - moving on...
[2017-07-31 09:46:15.158014] W [master(/rhs/brick3/b16):1067:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1501494366
[2017-07-31 09:46:16.12286] E [resource(/rhs/brick3/b16):1044:rsync] SSH: SYNC Error(Rsync): rsync: get_xattr_data: lgetxattr(""/proc/3839/cwd/.gfid/00000000-0000-0000-0000-000000000001"","trusted.glusterfs.volume-mark.2d516aed-ad11-43cc-8741-32bfc7391b74",0) failed: No data available (61)
[2017-07-31 09:46:16.13156] E [master(/rhs/brick3/b16):1046:process] _GMaster: changelogs CHANGELOG.1501494366 could not be processed completely - moving on...
[2017-07-31 09:47:21.598099] I [master(/rhs/brick2/b10):1132:crawl] _GMaster: slave's time: (1501494365, 0)
[2017-07-31 09:47:21.616106] I [master(/rhs/brick1/b4):1132:crawl] _GMaster: slave's time: (1501494365, 0)
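
The failure pattern is that rsync first enumerates the xattrs of an entry and then reads back each name it got: here the name trusted.glusterfs.volume-mark.<uuid> shows up in the listing, but lgetxattr() on it fails with ENODATA (errno 61, "No data available"), which rsync reports as the get_xattr_data error above, and geo-rep retries and eventually skips CHANGELOG.1501494366. A minimal standalone sketch of that list-then-read sequence (this is not rsync's code; the path argument is just a placeholder such as the aux-gfid root):

/* List xattrs, then try to read each one back, printing any name that fails. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : ".";
    char names[4096], value[4096];

    ssize_t len = llistxattr(path, names, sizeof(names));
    if (len < 0) {
        perror("llistxattr");
        return 1;
    }

    /* names holds the returned xattr names, NUL-separated */
    for (char *name = names; name < names + len; name += strlen(name) + 1) {
        if (lgetxattr(path, name, value, sizeof(value)) < 0)
            fprintf(stderr, "lgetxattr(\"%s\",\"%s\") failed: %s (%d)\n",
                    path, name, strerror(errno), errno);
    }
    return 0;
}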

Version-Release number of selected component (if applicable):
==============================================================
mainline

Steps to Reproduce:
=====================
1. Create a 6-node master cluster and a 6-node slave cluster
2. Create a 9x2 DR master volume and slave volume
3. Create and start a non-root geo-replication session
4. Mount the master and slave volumes
5. Create data on the master mount:

for i in {create,chmod,chown,chgrp,hardlink,symlink,truncate,rename}; do
    echo "------------------- This iteration is for fop $i -----------------" >> /root/result
    crefi --multi -n 5 -b 10 -d 10 --max=10k --min=5k --random -T 10 -t text --fop=$i /mnt/master/ 1>/dev/null 2>&1
    sleep 10
    echo "---Arequal Master for $i---" >> /root/result
    /root/arequal-checksum -p /mnt/master/ >> /root/result
    sleep 600
    echo "---Arequal Slave for $i---" >> /root/result
    /root/arequal-checksum -p /mnt/slave/ >> /root/result
done

All fops were synced (create, chmod, chgrp, chown, hardlink, symlink, truncate, rename).
 
How reproducible:
==============
Seen twice out of 4 trials on a non-root setup.

Comment 1 Worker Ant 2017-10-10 14:48:22 UTC
REVIEW: https://review.gluster.org/18479 (geo-rep: Filter out volume-mark xattr) posted (#1) for review on master by Kotresh HR (khiremat)

Comment 2 Worker Ant 2017-10-13 16:26:09 UTC
COMMIT: https://review.gluster.org/18479 committed in master by Jeff Darcy (jeff.us) 
------
commit c64fd0d4b0ef313bb44aae68a376ec0c9ee8657a
Author: Kotresh HR <khiremat>
Date:   Tue Oct 10 10:27:01 2017 -0400

    geo-rep: Filter out volume-mark xattr
    
    The volume-mark xattr, maintained at brick root
    of slave volume is specific to geo-replication
    and should be filtered out for all other clients.
    It should also be filtered out from list getxattr
    from all mounts including geo-rep mount as it
    might cause rsync to read and set.
    
    Change-Id: If9eb5a3af18051083c853e70d93b2819e8eea222
    BUG: 1500433
    Signed-off-by: Kotresh HR <khiremat>
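
For context, the filtering the commit describes boils down to dropping any xattr name carrying the trusted.glusterfs.volume-mark prefix before the listing reaches a client, so rsync never sees a name it cannot read back. A conceptual sketch of that idea, written against a raw llistxattr()-style name buffer (the real patch works on the dict_t responses inside GlusterFS, so this is only an illustration):

/* Sketch only: remove trusted.glusterfs.volume-mark* names from a
 * NUL-separated xattr name buffer, the shape llistxattr() returns. */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

#define VOLUME_MARK_PREFIX "trusted.glusterfs.volume-mark"

/* names: concatenated NUL-terminated names, total length len.
 * Returns the new length after dropping volume-mark entries in place. */
static ssize_t filter_volume_mark(char *names, ssize_t len)
{
    char *src = names, *dst = names;

    while (src < names + len) {
        size_t n = strlen(src) + 1;                     /* name + NUL */
        if (strncmp(src, VOLUME_MARK_PREFIX,
                    strlen(VOLUME_MARK_PREFIX)) != 0) {
            memmove(dst, src, n);                       /* keep this name */
            dst += n;
        }
        src += n;
    }
    return dst - names;
}

int main(void)
{
    /* Two names; the second is the volume-mark xattr seen in the logs. */
    char buf[] =
        "user.foo\0"
        "trusted.glusterfs.volume-mark.2d516aed-ad11-43cc-8741-32bfc7391b74\0";
    ssize_t len = filter_volume_mark(buf, sizeof(buf) - 1);

    for (char *p = buf; p < buf + len; p += strlen(p) + 1)
        printf("kept: %s\n", p);            /* prints only user.foo */
    return 0;
}

Filtering at listing time, rather than making the getxattr succeed, matches the commit's intent: the xattr stays internal to geo-replication and other clients never learn it exists.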

Comment 3 Shyamsundar 2017-12-08 17:43:44 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.13.0, please open a new bug report.

glusterfs-3.13.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-December/000087.html
[2] https://www.gluster.org/pipermail/gluster-users/