Bug 1242041 - nfs-ganesha : Multiple setting of nfs4_acl on a same file will cause brick crash
Summary: nfs-ganesha : Multiple setting of nfs4_acl on a same file will cause brick crash
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: access-control
Version: mainline
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: ---
Assignee: Jiffin
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1242044 1242046
 
Reported: 2015-07-10 18:25 UTC by Jiffin
Modified: 2016-06-16 13:22 UTC (History)

Fixed In Version: glusterfs-3.8rc2
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1242044 (view as bug list)
Environment:
Last Closed: 2016-06-16 13:22:53 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Jiffin 2015-07-10 18:25:29 UTC
Description of problem:

Version-Release number of selected component (if applicable):

mainline

How reproducible:
Always

Steps to Reproduce:
1. Create a volume.
2. Export the volume using nfs-ganesha.
3. Mount the volume using the NFSv4 protocol.
4. Set an nfs4_acl on the same file twice:
   nfs4_setfacl -a "A::<user_name>:<perm_set>" <filepath>

Actual results:
The brick process crashed.

Expected results:
Both nfs4_setfacl calls should succeed and the brick should keep running.

Additional info:

Backtrace of the brick process crash:

#0  0x00007fd3339ba8e2 in posix_acl_ctx_update (inode=0x7fd33407275c, this=0x7fd334011fc0, buf=0x7fd331391800) at posix-acl.c:766
#1  0x00007fd3339c16d9 in posix_acl_setattr_cbk (frame=0x7fd328000d9c, cookie=0x7fd32800169c, this=0x7fd334011fc0, op_ret=0, op_errno=0, 
    prebuf=0x7fd331391870, postbuf=0x7fd331391800, xdata=0x0) at posix-acl.c:1762
#2  0x00007fd333de9a5d in changelog_setattr_cbk (frame=0x7fd32800169c, cookie=0x7fd3280028ac, this=0x7fd33400ee20, op_ret=0, op_errno=0, 
    preop_stbuf=0x7fd331391870, postop_stbuf=0x7fd331391800, xdata=0x0) at changelog.c:1202
#3  0x00007fd33859acef in ctr_setattr_cbk (frame=0x7fd3280028ac, cookie=0x7fd32800222c, this=0x7fd33400b8f0, op_ret=0, op_errno=0, 
    preop_stbuf=0x7fd331391870, postop_stbuf=0x7fd331391800, xdata=0x0) at changetimerecorder.c:451
#4  0x00007fd338dddc30 in posix_setattr (frame=0x7fd32800222c, this=0x7fd334007820, loc=0x7fd32c007d7c, stbuf=0x7fd32c0082b4, valid=1, xdata=0x0)
    at posix.c:449
#5  0x00007fd3435c51a0 in default_setattr (frame=0x7fd32800222c, this=0x7fd334009f50, loc=0x7fd32c007d7c, stbuf=0x7fd32c0082b4, valid=1, xdata=0x0)
    at defaults.c:2107
#6  0x00007fd33859b2d4 in ctr_setattr (frame=0x7fd3280028ac, this=0x7fd33400b8f0, loc=0x7fd32c007d7c, stbuf=0x7fd32c0082b4, valid=1, xdata=0x0)
    at changetimerecorder.c:484
#7  0x00007fd333de9e2d in changelog_setattr (frame=0x7fd32800169c, this=0x7fd33400ee20, loc=0x7fd32c007d7c, stbuf=0x7fd32c0082b4, valid=1, xdata=0x0)
    at changelog.c:1239
#8  0x00007fd3435c51a0 in default_setattr (frame=0x7fd32800169c, this=0x7fd3340109b0, loc=0x7fd32c007d7c, stbuf=0x7fd32c0082b4, valid=1, xdata=0x0)
    at defaults.c:2107
#9  0x00007fd3339c1a97 in posix_acl_setattr (frame=0x7fd328000d9c, this=0x7fd334011fc0, loc=0x7fd32c007d7c, buf=0x7fd32c0082b4, valid=1, xdata=0x0)
    at posix-acl.c:1784
#10 0x00007fd3435c51a0 in default_setattr (frame=0x7fd328000d9c, this=0x7fd334013400, loc=0x7fd32c007d7c, stbuf=0x7fd32c0082b4, valid=1, xdata=0x0)
    at defaults.c:2107
#11 0x00007fd33358623d in up_setattr (frame=0x7fd328000bac, this=0x7fd334014810, loc=0x7fd32c007d7c, stbuf=0x7fd32c0082b4, valid=1, xdata=0x0)
    at upcall.c:368
#12 0x00007fd3435c238b in default_setattr_resume (frame=0x7fd32c00290c, this=0x7fd334015da0, loc=0x7fd32c007d7c, stbuf=0x7fd32c0082b4, valid=1, 
    xdata=0x0) at defaults.c:1662
#13 0x00007fd3435e2870 in call_resume_wind (stub=0x7fd32c007d3c) at call-stub.c:2252
#14 0x00007fd3435ea7fa in call_resume (stub=0x7fd32c007d3c) at call-stub.c:2571
#15 0x00007fd33337a317 in iot_worker (data=0x7fd334042260) at io-threads.c:210
#16 0x000000315c8079d1 in start_thread () from /lib64/libpthread.so.0
#17 0x000000315c0e886d in clone () from /lib64/libc.so.6

The backtrace shows that the crash happened in posix_acl_ctx_update(), called from the access-control translator's posix_acl_setattr_cbk().

The inode context of the access-control translator appears to contain invalid entries:
(gdb) p ctx
$1 = (struct posix_acl_ctx *) 0x7fd328003640
(gdb) p *ctx
$2 = {uid = 0, gid = 0, perm = 32768, acl_access = 0x7fd328001850, acl_default = 0x0}
(gdb) p *ctx->acl_access 
$3 = {refcnt = -1379869184, count = 8377310, entries = 0x7fd328001850}

The crash happened when the translator tried to access this invalid memory.

Comment 1 Anand Avati 2015-07-11 11:09:01 UTC
REVIEW: http://review.gluster.org/11633 (access_control : avoid double unrefing of acl variable in its context.) posted (#1) for review on release-3.7 by jiffin tony Thottan (jthottan@redhat.com)

Comment 2 Anand Avati 2015-07-11 17:59:53 UTC
REVIEW: http://review.gluster.org/11632 (access_control : avoid double unrefing of acl variable in its context.) posted (#3) for review on master by jiffin tony Thottan (jthottan@redhat.com)

Comment 3 Anand Avati 2015-07-12 06:08:16 UTC
REVIEW: http://review.gluster.org/11632 (access_control : avoid double unrefing of acl variable in its context.) posted (#4) for review on master by jiffin tony Thottan (jthottan@redhat.com)

Comment 4 Anand Avati 2015-07-12 10:57:18 UTC
COMMIT: http://review.gluster.org/11632 committed in master by Kaleb KEITHLEY (kkeithle@redhat.com) 
------
commit 7f21238bb918a9b6eefcff5d76516a92a9271ae2
Author: Jiffin Tony Thottan <jthottan@redhat.com>
Date:   Sat Jul 11 15:47:18 2015 +0530

    access_control : avoid double unrefing of acl variable in its context.
    
    In handling_other_acl_related_xattr(), acl variable is unrefered twice
    after updating the context of access_control translator.So the acl variable
    stored in the inmemory context will become invalid one. When the variable
    accessed again , it will result in brick crash. This patch fixes the same.
    
    Change-Id: Ib95d2e3d67b0fb20d201244a206379d6261aeb23
    BUG: 1242041
    Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
    Reviewed-on: http://review.gluster.org/11632
    Tested-by: NetBSD Build System <jenkins@build.gluster.org>
    Reviewed-by: Niels de Vos <ndevos@redhat.com>
    Reviewed-by: soumya k <skoduri@redhat.com>
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
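
The refcount imbalance described in the commit message can be sketched in isolation. This is a minimal illustration and not the GlusterFS code: every `demo_*` name is hypothetical, the sketch assumes the context takes its own reference when an ACL is stored in it, and it records a "released" flag instead of calling free() so the dangling state can be observed safely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted ACL (not the GlusterFS struct). */
struct demo_acl {
    int  refcnt;
    bool released;  /* set instead of calling free(), to keep the demo safe */
};

/* Hypothetical stand-in for the translator's per-inode context. */
struct demo_ctx {
    struct demo_acl *acl_access;
};

static struct demo_acl *demo_acl_ref(struct demo_acl *acl)
{
    acl->refcnt++;
    return acl;
}

static void demo_acl_unref(struct demo_acl *acl)
{
    if (--acl->refcnt == 0)
        acl->released = true;  /* real code would free the memory here */
}

/* Store acl in ctx; the context takes its own reference (assumption). */
static void demo_ctx_set(struct demo_ctx *ctx, struct demo_acl *acl)
{
    ctx->acl_access = demo_acl_ref(acl);
}

/*
 * Simulate one update: the caller starts with one reference, stores the
 * ACL in the context, then drops caller_unrefs references.
 * Returns true if the context still points at a live ACL afterwards.
 */
static bool demo_update(struct demo_ctx *ctx, struct demo_acl *acl,
                        int caller_unrefs)
{
    acl->refcnt = 1;        /* the caller's own reference */
    acl->released = false;
    demo_ctx_set(ctx, acl); /* refcnt is now 2 */
    for (int i = 0; i < caller_unrefs; i++)
        demo_acl_unref(acl);
    return !ctx->acl_access->released;
}
```

With one unref by the caller, the context's reference keeps the ACL alive. With two, the count reaches zero while the context still holds the pointer, so the next access through the context reads released memory, which is consistent with the garbage `refcnt` and `count` values in the gdb output above.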

Comment 5 Niels de Vos 2015-07-12 20:03:35 UTC
Can this happen when mounting over FUSE (with "acl" mount option) or Gluster/NFS too? If so, could you add a test-case?

Comment 6 Nagaprasad Sathyanarayana 2015-10-25 14:55:14 UTC
The fix for this BZ is already present in a GlusterFS release. A clone of this BZ was fixed in that release and closed, so this mainline BZ is being closed as well.

Comment 7 Niels de Vos 2016-06-16 13:22:53 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

