Description of problem:

fs-sanity was being executed over glusterfs-nfs (vers=3). During this run, one of the brick processes dumped core; starting off this bz with the posix component.

(gdb) bt
#0  0x0000003315e0c380 in pthread_spin_lock () from /lib64/libpthread.so.0
#1  0x00007fbfac734d6d in _posix_handle_xattr_keyvalue_pair (d=0x7fbfb6003044, k=0x7fbf5c045310 "trusted.glusterfs.quota.6e6c3d48-fafb-4c3d-a8bb-8cf6240454e1.contri", v=0x7fbfb5e1f7ac, tmp=0x7fbefe9dbb10) at posix.c:4648
#2  0x00007fbfb77623a3 in dict_foreach_match (dict=0x7fbfb6003044, match=0x7fbfb7762320 <dict_match_everything>, match_data=0x0, action=0x7fbfac734d10 <_posix_handle_xattr_keyvalue_pair>, action_data=0x7fbefe9dbb10) at dict.c:1182
#3  0x00007fbfb7762438 in dict_foreach (dict=<value optimized out>, fn=<value optimized out>, data=<value optimized out>) at dict.c:1141
#4  0x00007fbfac7342e5 in do_xattrop (frame=0x7fbfb66077a4, this=0x7fbfa8006880, loc=0x7fbfb608eb5c, fd=<value optimized out>, optype=GF_XATTROP_ADD_ARRAY64, xattr=0x7fbfb6003044) at posix.c:4821
#5  0x00007fbfac7347f1 in posix_xattrop (frame=<value optimized out>, this=<value optimized out>, loc=<value optimized out>, optype=<value optimized out>, xattr=<value optimized out>, xdata=<value optimized out>) at posix.c:4836
#6  0x00007fbfb776f663 in default_xattrop (frame=0x7fbfb66077a4, this=0x7fbfa8009030, loc=0x7fbfb608eb5c, flags=GF_XATTROP_ADD_ARRAY64, dict=<value optimized out>, xdata=<value optimized out>) at defaults.c:1978
#7  0x00007fbfb776f663 in default_xattrop (frame=0x7fbfb66077a4, this=0x7fbfa800a5f0, loc=0x7fbfb608eb5c, flags=GF_XATTROP_ADD_ARRAY64, dict=<value optimized out>, xdata=<value optimized out>) at defaults.c:1978
#8  0x00007fbfb776f663 in default_xattrop (frame=0x7fbfb66077a4, this=0x7fbfa800cc00, loc=0x7fbfb608eb5c, flags=GF_XATTROP_ADD_ARRAY64, dict=<value optimized out>, xdata=<value optimized out>) at defaults.c:1978
#9  0x00007fbfb776f663 in default_xattrop (frame=0x7fbfb66077a4, this=0x7fbfa800eb20, loc=0x7fbfb608eb5c, flags=GF_XATTROP_ADD_ARRAY64, dict=<value optimized out>, xdata=<value optimized out>) at defaults.c:1978
#10 0x00007fbfb776f663 in default_xattrop (frame=0x7fbfb66077a4, this=0x7fbfa8010060, loc=0x7fbfb608eb5c, flags=GF_XATTROP_ADD_ARRAY64, dict=<value optimized out>, xdata=<value optimized out>) at defaults.c:1978
#11 0x00007fbfb776f663 in default_xattrop (frame=0x7fbfb66077a4, this=0x7fbfa80113d0, loc=0x7fbfb608eb5c, flags=GF_XATTROP_ADD_ARRAY64, dict=<value optimized out>, xdata=<value optimized out>) at defaults.c:1978
#12 0x00007fbfb776f663 in default_xattrop (frame=0x7fbfb66077a4, this=0x7fbfa8012750, loc=0x7fbfb608eb5c, flags=GF_XATTROP_ADD_ARRAY64, dict=<value optimized out>, xdata=<value optimized out>) at defaults.c:1978
#13 0x00007fbfb7772d22 in default_xattrop_resume (frame=0x7fbfb660866c, this=0x7fbfa8013bb0, loc=0x7fbfb608eb5c, flags=GF_XATTROP_ADD_ARRAY64, dict=0x7fbfb6003044, xdata=0x0) at defaults.c:1539
#14 0x00007fbfb778e080 in call_resume (stub=0x7fbfb608eb1c) at call-stub.c:2894
#15 0x00007fbfa7118398 in iot_worker (data=0x7fbfa8052990) at io-threads.c:214
#16 0x0000003315e079d1 in start_thread () from /lib64/libpthread.so.0
#17 0x0000003315ae88fd in clone () from /lib64/libc.so.6

Version-Release number of selected component (if applicable):
glusterfs-3.7dev-0.994.gitf522001.el6.x86_64

How reproducible:
Seen once, during this run.

Actual results:

[root@nfs-rdma1 ~]# gluster volume status
Status of volume: vol0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.36.45:/rhs/brick1/d1r1          N/A       N/A        N       11462
Brick 10.70.36.47:/rhs/brick1/d1r1          49152     0          Y       21534
Brick 10.70.36.45:/rhs/brick1/d2r1          49153     0          Y       11479
Brick 10.70.36.47:/rhs/brick1/d2r2          49153     0          Y       21551
NFS Server on localhost                     2049      0          Y       11500
Self-heal Daemon on localhost               N/A       N/A        Y       11507
Quota Daemon on localhost                   N/A       N/A        Y       11514
NFS Server on 10.70.36.47                   2049      0          Y       21572
Self-heal Daemon on 10.70.36.47             N/A       N/A        Y       21579
Quota Daemon on 10.70.36.47                 N/A       N/A        Y       21586

Task Status of Volume vol0
------------------------------------------------------------------------------
There are no active volume tasks

[root@nfs-rdma1 ~]# gluster volume info

Volume Name: vol0
Type: Distributed-Replicate
Volume ID: 7336876c-c9ab-4dfc-8931-98d84d71d05b
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.36.45:/rhs/brick1/d1r1
Brick2: 10.70.36.47:/rhs/brick1/d1r1
Brick3: 10.70.36.45:/rhs/brick1/d2r1
Brick4: 10.70.36.47:/rhs/brick1/d2r2
Options Reconfigured:
nfs.disable: off
features.quota: on
features.quota-deem-statfs: on

Logs from the brick process:

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2015-04-14 21:04:03
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7dev
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x7fbfb7769d26]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x7fbfb7785a3f]
/lib64/libc.so.6[0x3315a326a0]
/lib64/libpthread.so.0(pthread_spin_lock+0x0)[0x3315e0c380]
/usr/lib64/glusterfs/3.7dev/xlator/storage/posix.so(+0xbd6d)[0x7fbfac734d6d]
/usr/lib64/libglusterfs.so.0(dict_foreach_match+0x73)[0x7fbfb77623a3]
/usr/lib64/libglusterfs.so.0(dict_foreach+0x18)[0x7fbfb7762438]
/usr/lib64/glusterfs/3.7dev/xlator/storage/posix.so(do_xattrop+0x135)[0x7fbfac7342e5]
/usr/lib64/glusterfs/3.7dev/xlator/storage/posix.so(posix_xattrop+0x11)[0x7fbfac7347f1]
/usr/lib64/libglusterfs.so.0(default_xattrop+0x83)[0x7fbfb776f663]
/usr/lib64/libglusterfs.so.0(default_xattrop+0x83)[0x7fbfb776f663]
/usr/lib64/libglusterfs.so.0(default_xattrop+0x83)[0x7fbfb776f663]
/usr/lib64/libglusterfs.so.0(default_xattrop+0x83)[0x7fbfb776f663]
/usr/lib64/libglusterfs.so.0(default_xattrop+0x83)[0x7fbfb776f663]
/usr/lib64/libglusterfs.so.0(default_xattrop+0x83)[0x7fbfb776f663]
/usr/lib64/libglusterfs.so.0(default_xattrop+0x83)[0x7fbfb776f663]
/usr/lib64/libglusterfs.so.0(default_xattrop_resume+0x142)[0x7fbfb7772d22]
/usr/lib64/libglusterfs.so.0(call_resume+0x80)[0x7fbfb778e080]
/usr/lib64/glusterfs/3.7dev/xlator/performance/io-threads.so(iot_worker+0x158)[0x7fbfa7118398]
/lib64/libpthread.so.0[0x3315e079d1]
/lib64/libc.so.6(clone+0x6d)[0x3315ae88fd]
---------

Expected results:
With fs-sanity in execution, a brick process crash is unexpected.

Additional info:
Created attachment 1014818 [details]
coredump of brick process
The most recent change that went into master in posix xlator was http://review.gluster.org/#/c/10180/ (sent by me!), which was merged on April 13th. It's probably what caused the crash. Will investigate. Thanks for the bug report.
Seeing the same crashes on the servers while running BVT.
OK, it has nothing to do with http://review.gluster.org/#/c/10180/. RCA'd it; will send out a patch soon. The bug is hit with quota due to a race between xattrop and unlink: after quota winds an xattrop fop on a path, if by the time it reaches posix another client has unlinked the file, the brick crashes because the gfid link is absent and there is no NULL-check to handle this case. The fix involves handling a missing gfid link on the backend appropriately in the posix xlator and returning failure to the xlator above.
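To make the race concrete, here is a minimal, self-contained C sketch of the crashing shape. This is not the actual GlusterFS source: resolve_gfid_link() and update_contri() are hypothetical stand-ins for the backend handle resolution and the per-key quota-contri update done in _posix_handle_xattr_keyvalue_pair(). The point is that when a concurrent unlink has removed the gfid link, the resolver hands back NULL and pthread_spin_lock() dereferences a NULL pointer, which is frame #0 in the backtrace above.

#include <pthread.h>
#include <stddef.h>

/* Stand-in for the brick-side inode: quota "contri" updates run
 * under a per-inode lock. */
struct inode {
        pthread_spinlock_t lock;
        long long          contri;
};

/* Hypothetical resolver: returns NULL when the gfid link is missing,
 * i.e. another client unlinked the file after quota wound the xattrop
 * but before it reached the posix xlator. */
static struct inode *
resolve_gfid_link (const char *gfid)
{
        (void) gfid;
        return NULL;            /* simulate losing the race to unlink */
}

/* Crashing shape: nothing checks the resolver's return value, so the
 * spinlock call dereferences a NULL inode and the brick takes SIGSEGV
 * -- the pthread_spin_lock() frame in the core dump. */
static void
update_contri (const char *gfid, long long delta)
{
        struct inode *inode = resolve_gfid_link (gfid);

        pthread_spin_lock (&inode->lock);       /* boom when inode == NULL */
        inode->contri += delta;
        pthread_spin_unlock (&inode->lock);
}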
Sent http://review.gluster.org/10999 to identify, much earlier, the code path where this can happen. This is not a fix but a step towards identifying how the malformed link got created in the first place, so not moving the bug to POST. A sketch of the kind of guard it adds follows.
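The exact check added by 10999 is not quoted here; the following is a hypothetical, self-contained C sketch of the kind of guard its title describes: refuse to create the internal .glusterfs hard link when the gfid is null, so a malformed link is never created. create_gfid_link(), gfid_t, and the null-gfid comparison are illustrative assumptions, not the patch's actual code.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* 16-byte gfid, as stored in the trusted.gfid xattr. */
typedef unsigned char gfid_t[16];

/* Hypothetical guard: reject a null/zero gfid before creating the
 * internal hard link under .glusterfs, instead of leaving a
 * malformed link behind on the backend. */
static int
create_gfid_link (const char *real_path, const gfid_t gfid,
                  const char *handle_path)
{
        static const gfid_t null_gfid;  /* zero-initialized */

        if (memcmp (gfid, null_gfid, sizeof (gfid_t)) == 0) {
                fprintf (stderr, "refusing to link %s: null gfid\n",
                         real_path);
                return -EINVAL;
        }
        if (link (real_path, handle_path) != 0)
                return -errno;
        return 0;
}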
http://review.gluster.org/11028 has been sent to prevent the crash when this issue is hit: the fop now fails with ESTALE instead of crashing the brick. This is still not the complete fix, so not moving the bug to POST.
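For reference, the crash-prevention change amounts to failing the fop when the handle cannot be resolved. Below is a minimal sketch of that guard in the shape of the earlier sketch; resolve_gfid_link() and the inode struct are the same hypothetical stand-ins, not the patch's actual code (the real change checks the result of MAKE_INODE_HANDLE in the posix xlator).

#include <errno.h>
#include <pthread.h>

struct inode {
        pthread_spinlock_t lock;
        long long          contri;
};

/* Hypothetical resolver, as in the earlier sketch: NULL means the
 * gfid link is gone because another client unlinked the file. */
struct inode *resolve_gfid_link (const char *gfid);

/* Guarded shape matching the intent of review 11028: a failed handle
 * resolution fails the fop upward with ESTALE instead of crashing. */
static int
update_contri_guarded (const char *gfid, long long delta)
{
        struct inode *inode = resolve_gfid_link (gfid);

        if (!inode)
                return -ESTALE; /* stale handle: fail upward, don't crash */

        pthread_spin_lock (&inode->lock);
        inode->contri += delta;
        pthread_spin_unlock (&inode->lock);
        return 0;
}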
REVIEW: http://review.gluster.org/10999 (storage/posix: Prevent malformed internal link creations) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)
REVIEW: http://review.gluster.org/10999 (storage/posix: Prevent malformed internal link creations) posted (#3) for review on master by Vijay Bellur (vbellur)
*** Bug 1222942 has been marked as a duplicate of this bug. ***
COMMIT: http://review.gluster.org/11028 committed in master by Pranith Kumar Karampuri (pkarampu)

------
commit 476d4070dbdb9d73b36bd04f3eb3d6eda84abe73
Author: Pranith Kumar K <pkarampu>
Date:   Mon Jun 1 13:34:33 2015 +0530

    storage/posix: Handle MAKE_INODE_HANDLE failures

    Change-Id: Ia176ccd4cac82c66ba50e3896fbe72c2da860c20
    BUG: 1212110
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/11028
    Reviewed-by: Krutika Dhananjay <kdhananj>
    Tested-by: NetBSD Build System <jenkins.org>
REVIEW: http://review.gluster.org/10999 (storage/posix: Prevent malformed internal link creations) posted (#4) for review on master by Pranith Kumar Karampuri (pkarampu)
REVIEW: http://review.gluster.org/10999 (storage/posix: Prevent malformed internal link creations) posted (#5) for review on master by Pranith Kumar Karampuri (pkarampu)
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailinglists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user