Bug 1312762

Summary: [geo-rep]: Session goes to faulty with Errno 13: Permission denied
Product: [Community] GlusterFS
Component: geo-replication
Version: 3.7.8
Hardware: x86_64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Reporter: Rahul Hinduja <rhinduja>
Assignee: Aravinda VK <avishwan>
CC: avishwan, bugs
Keywords: Triaged
Fixed In Version: glusterfs-3.7.9
Doc Type: Bug Fix
Clones: 1313303 (view as bug list)
Bug Blocks: 1309567, 1313303
Last Closed: 2016-03-22 08:15:14 UTC
Type: Bug

Description Rahul Hinduja 2016-02-29 09:01:21 UTC
Description of problem:
=======================

During geo-rep automation testing, the worker died during the history crawl with the following traceback on the slave:

Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 737, in entry_ops
    [ESTALE, EINVAL])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 78, in lsetxattr
    cls.raise_oserr()
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 37, in raise_oserr
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 13] Permission denied
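The worker dies here because errno_wrap only retries or tolerates a whitelist of errnos (the call site in resource.py passes ESTALE and EINVAL); anything outside that list, such as EACCES, is re-raised and kills the worker. A simplified, hypothetical sketch of that pattern (not the exact syncdutils signature):

```python
import errno
import time

def errno_wrap(call, args=(), tolerate=(), retry=(errno.ESTALE, errno.EINVAL)):
    # Simplified sketch: retry transient errnos a few times, swallow
    # tolerated ones, and re-raise everything else.
    for _ in range(5):
        try:
            return call(*args)
        except OSError as ex:
            if ex.errno in tolerate:
                return None
            if ex.errno in retry:
                time.sleep(0.01)
                continue
            raise  # e.g. EACCES (Errno 13) propagates and faults the worker
```

With this behaviour, a stale handle heals itself across retries, while the permission error seen above escapes immediately.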

slave.gluster.log shows the following:

[2016-02-28 19:26:49.777548] W [fuse-bridge.c:1282:fuse_err_cbk] 0-glusterfs-fuse: 7: SETXATTR() /.gfid/042c10ee-949c-4eb9-ad07-3c961d03c3e7 => -1 (Permission denied)
[2016-02-28 19:26:49.795919] I [fuse-bridge.c:4986:fuse_thread_proc] 0-fuse: unmounting /tmp/gsyncd-aux-mount-sTOH1n
[2016-02-28 19:26:49.798300] W [glusterfsd.c:1236:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7f94b60ccdc5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7f94b7737905] -->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7f94b7737789] ) 0-: received signum (15), shutting down
[2016-02-28 19:26:49.798347] I [fuse-bridge.c:5685:fini] 0-fuse: Unmounting '/tmp/gsyncd-aux-mount-sTOH1n'.
[2016-02-28 19:26:49.832822] W [fuse-bridge.c:1282:fuse_err_cbk] 0-glusterfs-fuse: 7: SETXATTR() /.gfid/9c874be7-978e-418d-8282-1ac15270ff2b => -1 (Permission denied)
[2016-02-28 19:26:49.851560] I [fuse-bridge.c:4986:fuse_thread_proc] 0-fuse: unmounting /tmp/gsyncd-aux-mount-mtH5zL
[2016-02-28 19:26:49.852168] W [glusterfsd.c:1236:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7fa17d764dc5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7fa17edcf905] -->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7fa17edcf789] ) 0-: received signum (15), shutting down
[2016-02-28 19:26:49.852220] I [fuse-bridge.c:5685:fini] 0-fuse: Unmounting '/tmp/gsyncd-aux-mount-mtH5zL'.



Version-Release number of selected component (if applicable):
==============================================================
glusterfs-3.7.8-1.el7.x86_64


How reproducible:
=================
Happens with a combination of fops such as chgrp, chown, rename, and hardlinks.


Steps to Reproduce:
===================
1. Run the geo-rep automation test suite with the exported testcases:
[geo_rahul@skywalker distaf]$ echo $testcases
changelog-test-create changelog-test-chmod changelog-test-chown changelog-test-chgrp changelog-test-symlink changelog-test-hardlink changelog-test-truncate changelog-test-rename xsync-test-create xsync-test-chmod xsync-test-chown xsync-test-chgrp xsync-test-symlink xsync-test-hardlink xsync-test-truncate history-test-create history-test-chmod history-test-chown history-test-chgrp history-test-symlink history-test-hardlink history-test-truncate history-test-rename history-dynamic-create history-dynamic-chmod history-dynamic-chown history-dynamic-chgrp history-dynamic-symlink history-dynamic-hardlink history-dynamic-truncate history-dynamic-rename
[geo_rahul@skywalker distaf]$ 

Actual results:
===============

[geo_rahul@skywalker distaf]$ time python main.py -d "geo_rep" -t "$testcases"
test_1_changelog-test-create (__main__.gluster_tests) ... ok
test_2_changelog-test-chmod (__main__.gluster_tests) ... ok
test_3_changelog-test-chown (__main__.gluster_tests) ... ok
test_4_changelog-test-chgrp (__main__.gluster_tests) ... ok
test_5_changelog-test-symlink (__main__.gluster_tests) ... ok
test_6_changelog-test-hardlink (__main__.gluster_tests) ... ok
test_7_changelog-test-truncate (__main__.gluster_tests) ... ok
test_8_changelog-test-rename (__main__.gluster_tests) ... ok
test_9_xsync-test-create (__main__.gluster_tests) ... ok
test_10_xsync-test-chmod (__main__.gluster_tests) ... ok
test_11_xsync-test-chown (__main__.gluster_tests) ... ok
test_12_xsync-test-chgrp (__main__.gluster_tests) ... ok
test_13_xsync-test-symlink (__main__.gluster_tests) ... ok
test_14_xsync-test-hardlink (__main__.gluster_tests) ... ok
test_15_xsync-test-truncate (__main__.gluster_tests) ... ok
test_16_history-test-create (__main__.gluster_tests) ... ok
test_17_history-test-chmod (__main__.gluster_tests) ... ok
test_18_history-test-chown (__main__.gluster_tests) ... ok
test_19_history-test-chgrp (__main__.gluster_tests) ... ok
test_20_history-test-symlink (__main__.gluster_tests) ... ok
test_21_history-test-hardlink (__main__.gluster_tests) ... ok
test_22_history-test-truncate (__main__.gluster_tests) ... ok
test_23_history-test-rename (__main__.gluster_tests) ... ok
test_24_history-dynamic-create (__main__.gluster_tests) ... FAIL
test_25_history-dynamic-chmod (__main__.gluster_tests) ... FAIL

Expected results:
=================

The session should not go faulty, and the history test cases should also pass.

Comment 1 Rahul Hinduja 2016-02-29 09:03:09 UTC
Generic use case to reproduce this issue: 

1. Start Geo-replication; once it reaches Changelog Crawl, create
file f1 on the Master and let it sync to the Slave.
2. Wait 30 seconds, create one more file, and let it sync to the Slave.
3. Stop Geo-replication.
4. Delete file "f1" from the Slave volume (just to simulate the Changelog
replay issue).
5. Change the uid/gid on the Master and rename the file.
6. Start Geo-rep again. You should hit an EACCES error.

Comment 2 Vijay Bellur 2016-03-08 11:15:26 UTC
REVIEW: http://review.gluster.org/13643 (geo-rep: Fix Entry Creation issue with non root UID/GID) posted (#1) for review on release-3.7 by Aravinda VK (avishwan)

Comment 3 Vijay Bellur 2016-03-09 11:02:39 UTC
COMMIT: http://review.gluster.org/13643 committed in release-3.7 by Aravinda VK (avishwan) 
------
commit 2199acfd04b1e70fc6484a89196e7b9e4abb7208
Author: Aravinda VK <avishwan>
Date:   Mon Feb 29 14:05:54 2016 +0530

    geo-rep: Fix Entry Creation issue with non root UID/GID
    
    During entry_ops RENAME Geo-rep sends stat info along with the
    recorded info from Changelog. In Slave side if Source file exists
    Geo-rep renames to Target file by calling os.rename. If source file
    does not exists, it tries to create Target file directly using available
    stat info from Master. If UID and GID are different in Master for that
    file then stat info will have different UID/GID during Create. Geo-rep
    gets EACCES when it tries to create a new entry using gfid-access with
    different UID/GID.
    
    With this patch, Entry creation with different UID/GID is split into two
    operations. Create Entry with UID:0 and GID:0 and then set UID/GID.
    
    Change-Id: I4987e3a205d8513c06fa66198cde145a87003a01
    BUG: 1312762
    Signed-off-by: Aravinda VK <avishwan>
    Reviewed-on: http://review.gluster.org/13542
    Reviewed-on: http://review.gluster.org/13643
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Kotresh HR <khiremat>
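The two-step approach described in the commit message can be sketched as follows. This is a simplified, hypothetical illustration of the idea only: the real gsyncd code performs entry creation through the gfid-access aux mount, and replay_rename is an invented name.

```python
import errno
import os

def replay_rename(src, dst, uid, gid, mode):
    # Simplified sketch of the fixed slave-side RENAME replay:
    # if the source still exists on the slave, a plain rename is
    # enough; if it was deleted out of band, recreate the target.
    try:
        os.rename(src, dst)
    except OSError as ex:
        if ex.errno != errno.ENOENT:
            raise
        # The fix: first create the entry (as root, which the slave
        # mount permits), then apply the master's UID/GID in a second
        # step, instead of creating directly with a foreign UID/GID,
        # which returned EACCES.
        fd = os.open(dst, os.O_CREAT | os.O_WRONLY | os.O_EXCL, mode)
        os.close(fd)
        os.lchown(dst, uid, gid)
```

Splitting create and chown means the create itself always succeeds for the privileged worker, and ownership is reconciled afterwards from the recorded stat.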

Comment 4 Kaushal 2016-04-19 07:20:46 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.9, please open a new bug report.

glusterfs-3.7.9 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-users/2016-March/025922.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user