Red Hat Bugzilla – Bug 1241839
nfs-ganesha: bricks crash while executing acl related operation for named group/user
Last modified: 2016-01-19 01:14:55 EST
Created attachment 1050606 [details]
coredump of a brick from nfs12

Description of problem:
I executed nfs4_setfacl and nfs4_getfacl operations on a directory as a non-root user and found that the bricks crashed.

[root@nfs12 ~]# gluster volume status vol4
Status of volume: vol4
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.8:/rhs/brick1/d1r14          N/A       N/A        N       N/A
Brick 10.70.46.27:/rhs/brick1/d1r24         N/A       N/A        N       N/A
Brick 10.70.46.25:/rhs/brick1/d2r14         N/A       N/A        N       N/A
Brick 10.70.46.29:/rhs/brick1/d2r24         N/A       N/A        N       N/A
Brick 10.70.46.8:/rhs/brick1/d3r14          N/A       N/A        N       N/A
Brick 10.70.46.27:/rhs/brick1/d3r24         N/A       N/A        N       N/A
Brick 10.70.46.25:/rhs/brick1/d4r14         N/A       N/A        N       N/A
Brick 10.70.46.29:/rhs/brick1/d4r24         N/A       N/A        N       N/A
Brick 10.70.46.8:/rhs/brick1/d5r14          N/A       N/A        N       N/A
Brick 10.70.46.27:/rhs/brick1/d5r24         N/A       N/A        N       N/A
Brick 10.70.46.25:/rhs/brick1/d6r14         N/A       N/A        N       N/A
Brick 10.70.46.29:/rhs/brick1/d6r24         N/A       N/A        N       N/A
NFS Server on localhost                     N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        Y       21916
NFS Server on 10.70.46.8                    N/A       N/A        N       N/A
Self-heal Daemon on 10.70.46.8              N/A       N/A        Y       7920
NFS Server on 10.70.46.29                   N/A       N/A        N       N/A
Self-heal Daemon on 10.70.46.29             N/A       N/A        Y       8702
NFS Server on 10.70.46.25                   N/A       N/A        N       N/A
Self-heal Daemon on 10.70.46.25             N/A       N/A        Y       24895
NFS Server on 10.70.46.39                   2049      0          Y       31393
Self-heal Daemon on 10.70.46.39             N/A       N/A        Y       31402
NFS Server on 10.70.46.22                   2049      0          Y       12105
Self-heal Daemon on 10.70.46.22             N/A       N/A        Y       12113

Task Status of Volume vol4
------------------------------------------------------------------------------
There are no active volume tasks

Version-Release number of selected component (if applicable):
glusterfs-3.7.1-8.el6rhs.x86_64
nfs-ganesha-2.2.0-4.el6rhs.x86_64

How reproducible:
Seen for the first time.

Steps to Reproduce:
1. Create a 6x2 volume and start it.
2. Configure nfs-ganesha.
3. Create a group with a specific GID on all RHGS servers and the client.
4. Create a non-root user with a specific UID on all RHGS servers and the client, keeping its group as the one created in step 3.
5. Enable ACLs and mount the volume on the client with vers=4.
6. Create a directory on the mount point.
7. chown the directory to the user and group created in steps 3 and 4.
8. Run: nfs4_setfacl -a "A::acl_user1@lab.eng.blr.redhat.com:rwx" acl_user1_dir/
9. Run: nfs4_getfacl acl_user1_dir/

Actual results:
Step 9 fails:

[root@rhsauto009 vol4]# nfs4_getfacl acl_user1_dir
Invalid filename: acl_user1_dir

gluster volume status shows that all the bricks have crashed and dumped core.
bt of one of the bricks:

(gdb) bt
#0  0x00007f1479cfd625 in raise () from /lib64/libc.so.6
#1  0x00007f1479cfee05 in abort () from /lib64/libc.so.6
#2  0x00007f1479d3b537 in __libc_message () from /lib64/libc.so.6
#3  0x00007f1479d40f4e in malloc_printerr () from /lib64/libc.so.6
#4  0x00007f1479d43c5d in _int_free () from /lib64/libc.so.6
#5  0x00007f147b358215 in data_destroy (data=0x7f1478731f0c) at dict.c:235
#6  0x00007f147b358c4e in dict_destroy (this=0x7f1478913d8c) at dict.c:564
#7  0x00007f146dc4278e in posix_setxattr (frame=0x7f1478f1aa80, this=<value optimized out>, loc=<value optimized out>, dict=<value optimized out>, flags=0, xdata=0x7f1478913e18) at posix.c:3408
#8  0x00007f147b367d43 in default_setxattr (frame=0x7f1478f1aa80, this=0x7f1468009110, loc=0x7f14789a2cc8, dict=0x7f1478913c74, flags=<value optimized out>, xdata=<value optimized out>) at defaults.c:1777
#9  0x00007f146d1fb0e1 in ctr_setxattr (frame=0x7f1478f1a31c, this=0x7f146800a730, loc=0x7f14789a2cc8, xattr=0x7f1478913c74, flags=0, xdata=0x7f1478913e18) at changetimerecorder.c:1056
#10 0x00007f146cb471dd in changelog_setxattr (frame=0x7f1478f1a270, this=0x7f146800cff0, loc=0x7f14789a2cc8, dict=0x7f1478913c74, flags=0, xdata=0x7f1478913e18) at changelog.c:1475
#11 0x00007f146c71a641 in br_stub_setxattr (frame=0x7f1478f1a270, this=0x7f146800ef00, loc=0x7f14789a2cc8, dict=0x7f1478913c74, flags=0, xdata=0x7f1478913e18) at bit-rot-stub.c:1113
#12 0x00007f146c50f824 in posix_acl_setxattr (frame=<value optimized out>, this=0x7f1468010390, loc=0x7f14789a2cc8, xattr=0x7f1478913c74, flags=0, xdata=0x7f1478913e18) at posix-acl.c:2023
#13 0x00007f147b367d43 in default_setxattr (frame=0x7f1478f1a9d4, this=0x7f1468011720, loc=0x7f14789a2cc8, dict=0x7f1478913c74, flags=<value optimized out>, xdata=<value optimized out>) at defaults.c:1777
#14 0x00007f147b367d43 in default_setxattr (frame=0x7f1478f1a9d4, this=0x7f1468012aa0, loc=0x7f14789a2cc8, dict=0x7f1478913c74, flags=<value optimized out>, xdata=<value optimized out>) at defaults.c:1777
#15 0x00007f147b36c433 in default_setxattr_resume (frame=0x7f1478f1a928, this=0x7f1468013f00, loc=0x7f14789a2cc8, dict=0x7f1478913c74, flags=0, xdata=0x7f1478913e18) at defaults.c:1334
#16 0x00007f147b389580 in call_resume (stub=0x7f14789a2c88) at call-stub.c:2576
#17 0x00007f1467dfb541 in iot_worker (data=0x7f146804f900) at io-threads.c:215
#18 0x00007f147a449a51 in start_thread () from /lib64/libpthread.so.0
#19 0x00007f1479db396d in clone () from /lib64/libc.so.6

Crash entry from the brick log:

pending frames:
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash:
2015-07-10 07:23:48
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.1
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x7f147b35e826]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x7f147b37e3ef]
/lib64/libc.so.6(+0x3d0c2326a0)[0x7f1479cfd6a0]
/lib64/libc.so.6(gsignal+0x35)[0x7f1479cfd625]
/lib64/libc.so.6(abort+0x175)[0x7f1479cfee05]
/lib64/libc.so.6(+0x3d0c270537)[0x7f1479d3b537]
/lib64/libc.so.6(+0x3d0c275f4e)[0x7f1479d40f4e]
/lib64/libc.so.6(+0x3d0c278c5d)[0x7f1479d43c5d]
/usr/lib64/libglusterfs.so.0(data_destroy+0x55)[0x7f147b358215]
/usr/lib64/libglusterfs.so.0(dict_destroy+0x3e)[0x7f147b358c4e]
/usr/lib64/glusterfs/3.7.1/xlator/storage/posix.so(posix_setxattr+0x31e)[0x7f146dc4278e]
/usr/lib64/libglusterfs.so.0(default_setxattr+0x83)[0x7f147b367d43]
/usr/lib64/glusterfs/3.7.1/xlator/features/changetimerecorder.so(ctr_setxattr+0x191)[0x7f146d1fb0e1]
/usr/lib64/glusterfs/3.7.1/xlator/features/changelog.so(changelog_setxattr+0x17d)[0x7f146cb471dd]
/usr/lib64/glusterfs/3.7.1/xlator/features/bitrot-stub.so(br_stub_setxattr+0x281)[0x7f146c71a641]
/usr/lib64/glusterfs/3.7.1/xlator/features/access-control.so(posix_acl_setxattr+0x244)[0x7f146c50f824]
/usr/lib64/libglusterfs.so.0(default_setxattr+0x83)[0x7f147b367d43]
/usr/lib64/libglusterfs.so.0(default_setxattr+0x83)[0x7f147b367d43]
/usr/lib64/libglusterfs.so.0(default_setxattr_resume+0x143)[0x7f147b36c433]
/usr/lib64/libglusterfs.so.0(call_resume+0x80)[0x7f147b389580]
/usr/lib64/glusterfs/3.7.1/xlator/performance/io-threads.so(iot_worker+0x171)[0x7f1467dfb541]
/lib64/libpthread.so.0(+0x3d0c607a51)[0x7f147a449a51]
/lib64/libc.so.6(clone+0x6d)[0x7f1479db396d]

Expected results:
1. nfs4_setfacl should succeed and nfs4_getfacl should display the change made.
2. ACL-related operations should not crash the bricks or any other process.

Additional info:
The crash is due to a double free in the posix_setxattr() call. The fix has been sent upstream. After fixing this crash I noticed another crash, which is explained in https://bugzilla.redhat.com/show_bug.cgi?id=1242046; I am still trying to find the root cause of that one.
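To make the failure mode concrete, below is a minimal, self-contained C sketch of this class of bug. It is not the GlusterFS source (the struct names and helpers are purely illustrative); it only shows how a buffer stored in a dict-like container without copying ends up freed both by the caller and by the container's destroy path, producing the same glibc abort seen in the brick backtrace (_int_free -> malloc_printerr -> abort).

/* Illustrative only: a tiny dict-like container that stores the caller's
 * buffer directly instead of copying it.  Both the caller and the
 * container then free the same pointer, which is the double-free pattern
 * behind this crash.  None of these names come from the GlusterFS code. */
#include <stdlib.h>
#include <string.h>

struct value {
    char *data;
};

struct dict {
    struct value *val;
};

static void dict_set(struct dict *d, char *buf)
{
    struct value *v = malloc(sizeof(*v));
    v->data = buf;            /* BUG: keeps the caller's buffer, no copy */
    d->val = v;
}

static void dict_destroy(struct dict *d)
{
    free(d->val->data);       /* container frees the buffer ...          */
    free(d->val);
}

int main(void)
{
    struct dict d = { 0 };
    char *acl = strdup("system.posix_acl_access");

    dict_set(&d, acl);
    free(acl);                /* ... and the caller frees it as well:    */
    dict_destroy(&d);         /* double free, glibc typically aborts     */
    return 0;
}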
The fix has been sent upstream: http://review.gluster.org/#/c/11627/
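For comparison, the usual way to avoid such a double free is to give the container its own copy (or its own reference) of the value, so that each owner frees exactly once. The sketch below applies that rule to the illustrative example above; it is not the actual change from the upstream review, just the general ownership pattern.

/* Illustrative only: same toy container as above, but dict_set_copy()
 * duplicates the buffer, so the caller and the container each free
 * their own allocation exactly once. */
#include <stdlib.h>
#include <string.h>

struct value {
    char *data;
};

struct dict {
    struct value *val;
};

static void dict_set_copy(struct dict *d, const char *buf)
{
    struct value *v = malloc(sizeof(*v));
    v->data = strdup(buf);    /* container owns a private copy */
    d->val = v;
}

static void dict_destroy(struct dict *d)
{
    free(d->val->data);       /* frees only the copy it owns */
    free(d->val);
}

int main(void)
{
    struct dict d = { 0 };
    char *acl = strdup("system.posix_acl_access");

    dict_set_copy(&d, acl);
    free(acl);                /* caller frees its buffer once            */
    dict_destroy(&d);         /* container frees its copy once: no abort */
    return 0;
}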
The patch has been posted downstream: https://code.engineering.redhat.com/gerrit/#/c/52816/
The patch has been merged downstream: https://code.engineering.redhat.com/gerrit/#/c/52816/
Post fix of the BZ:

[root@rhsauto009 mnt1]# nfs4_setfacl -a "A::acl_user1@lab.eng.blr.redhat.com:rwx" acl_user1_dir1/
[root@rhsauto009 mnt1]# nfs4_getfacl acl_user1_dir1/
A::OWNER@:rwaDxtTcCy
A::acl_user1@lab.eng.blr.redhat.com:rwaDxtcy
A::GROUP@:rxtcy
A::EVERYONE@:rxtcy

[root@rhsauto009 mnt1]# mount | grep mnt1
10.70.44.92:/vol4 on /export/mnt1 type nfs (rw,vers=4,addr=10.70.44.92,clientaddr=10.70.36.239)

[root@nfs11 ~]# gluster volume status vol4
Status of volume: vol4
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.8:/rhs/brick1/d1r14          49156     0          Y       16720
Brick 10.70.46.27:/rhs/brick1/d1r24         49155     0          Y       31744
Brick 10.70.46.25:/rhs/brick1/d2r14         49157     0          Y       30081
Brick 10.70.46.29:/rhs/brick1/d2r24         49156     0          Y       22951
Brick 10.70.46.8:/rhs/brick1/d3r14          49157     0          Y       16738
Brick 10.70.46.27:/rhs/brick1/d3r24         49156     0          Y       31762
Brick 10.70.46.25:/rhs/brick1/d4r14         49158     0          Y       30099
Brick 10.70.46.29:/rhs/brick1/d4r24         49157     0          Y       22969
Brick 10.70.46.8:/rhs/brick1/d5r14          49158     0          Y       16756
Brick 10.70.46.27:/rhs/brick1/d5r24         49157     0          Y       31780
Brick 10.70.46.25:/rhs/brick1/d6r14         49159     0          Y       30117
Brick 10.70.46.29:/rhs/brick1/d6r24         49158     0          Y       22987
Self-heal Daemon on localhost               N/A       N/A        Y       10581
Self-heal Daemon on 10.70.46.25             N/A       N/A        Y       21878
Self-heal Daemon on 10.70.46.22             N/A       N/A        Y       1465
Self-heal Daemon on 10.70.46.39             N/A       N/A        Y       20442
Self-heal Daemon on 10.70.46.29             N/A       N/A        Y       14763
Self-heal Daemon on 10.70.46.27             N/A       N/A        Y       24236

Task Status of Volume vol4
------------------------------------------------------------------------------
There are no active volume tasks
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1495.html