+++ This bug was initially created as a clone of Bug #1118591 +++

Description of problem:
I just upgraded the glusterfs nodes and, post upgrade, mounted the volume on an NFS client and executed iozone on the mount-point. iozone finished properly, but some time later I found that the brick processes had crashed with the backtrace below (quota was enabled after the iozone run):

pending frames:
frame : type(0) op(0)
frame : type(0) op(1)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(40)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2014-07-06 22:05:46
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.6.0.24
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x7f16eb4d1e56]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x7f16eb4ec28f]
/lib64/libc.so.6[0x3f4fa329a0]
/lib64/libc.so.6[0x3f4fa81461]
/usr/lib64/glusterfs/3.6.0.24/xlator/features/marker.so(mq_loc_fill_from_name+0xa1)[0x7f16dbdf2651]
/usr/lib64/glusterfs/3.6.0.24/xlator/features/marker.so(mq_readdir_cbk+0x2bf)[0x7f16dbdf628f]
/usr/lib64/libglusterfs.so.0(default_readdir_cbk+0xc2)[0x7f16eb4de0b2]
/usr/lib64/libglusterfs.so.0(default_readdir_cbk+0xc2)[0x7f16eb4de0b2]
/usr/lib64/glusterfs/3.6.0.24/xlator/features/access-control.so(posix_acl_readdir_cbk+0xc2)[0x7f16e0a17432]
/usr/lib64/glusterfs/3.6.0.24/xlator/storage/posix.so(posix_do_readdir+0x1b8)[0x7f16e0e4f3c8]
/usr/lib64/glusterfs/3.6.0.24/xlator/storage/posix.so(posix_readdir+0x13)[0x7f16e0e4f603]
/usr/lib64/libglusterfs.so.0(default_readdir+0x83)[0x7f16eb4d7013]
/usr/lib64/glusterfs/3.6.0.24/xlator/features/access-control.so(posix_acl_readdir+0x22d)[0x7f16e0a1991d]
/usr/lib64/libglusterfs.so.0(default_readdir+0x83)[0x7f16eb4d7013]
/usr/lib64/libglusterfs.so.0(default_readdir_resume+0x142)[0x7f16eb4d9a02]
/usr/lib64/libglusterfs.so.0(call_resume+0x1b1)[0x7f16eb4f3631]
/usr/lib64/glusterfs/3.6.0.24/xlator/performance/io-threads.so(iot_worker+0x158)[0x7f16e05f6348]
/lib64/libpthread.so.0[0x3f502079d1]
/lib64/libc.so.6(clone+0x6d)[0x3f4fae8b5d]
---------

gluster volume info:

[root@nfs1 ~]# gluster volume info dist-rep

Volume Name: dist-rep
Type: Distributed-Replicate
Volume ID: 07f5f58d-83e3-4591-ba7f-e2473153e220
Status: Started
Snap Volume: no
Number of Bricks: 7 x 2 = 14
Transport-type: tcp
Bricks:
Brick1: 10.70.37.62:/bricks/d1r1
Brick2: 10.70.37.215:/bricks/d1r2
Brick3: 10.70.37.44:/bricks/d2r1
Brick4: 10.70.37.201:/bricks/dr2r2
Brick5: 10.70.37.62:/bricks/d3r1
Brick6: 10.70.37.215:/bricks/d3r2
Brick7: 10.70.37.44:/bricks/d4r1
Brick8: 10.70.37.201:/bricks/dr4r2
Brick9: 10.70.37.62:/bricks/d5r1
Brick10: 10.70.37.215:/bricks/d5r2
Brick11: 10.70.37.44:/bricks/d6r1
Brick12: 10.70.37.201:/bricks/dr6r2
Brick13: 10.70.37.62:/bricks/d1r1-add
Brick14: 10.70.37.215:/bricks/d1r2-add
Options Reconfigured:
nfs-ganesha.enable: off
nfs-ganesha.host: 10.70.37.44
nfs.disable: off
performance.readdir-ahead: on
features.quota: on
features.quota-deem-statfs: off
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable

How reproducible:
Crash seen only once so far, but it affected all the bricks.

Expected results:
The crash is not expected.

Additional info:

--- Additional comment from Saurabh on 2014-07-07 05:22:27 EDT ---

(gdb) bt
#0  0x0000003f4fa81461 in __strlen_sse2 () from /lib64/libc.so.6
#1  0x00007f16dbdf2651 in mq_loc_fill_from_name (this=0xb8be10, newloc=0x7f16bf9f89a0, oldloc=0xbad66c, ino=<value optimized out>,
    name=0x7f169804d938 "appletalk") at marker-quota.c:176
#2  0x00007f16dbdf628f in mq_readdir_cbk (frame=0x7f16ea14bba8, cookie=<value optimized out>, this=0xb8be10, op_ret=<value optimized out>,
    op_errno=<value optimized out>, entries=0x7f16bf9f8bb0, xdata=0x0) at marker-quota.c:609
#3  0x00007f16eb4de0b2 in default_readdir_cbk (frame=0x7f16ea3274e4, cookie=<value optimized out>, this=<value optimized out>, op_ret=23,
    op_errno=0, entries=<value optimized out>, xdata=0x0) at defaults.c:1225
#4  0x00007f16eb4de0b2 in default_readdir_cbk (frame=0x7f16ea323c74, cookie=<value optimized out>, this=<value optimized out>, op_ret=23,
    op_errno=0, entries=<value optimized out>, xdata=0x0) at defaults.c:1225
#5  0x00007f16e0a17432 in posix_acl_readdir_cbk (frame=0x7f16ea31d700, cookie=<value optimized out>, this=<value optimized out>, op_ret=23,
    op_errno=0, entries=<value optimized out>, xdata=0x0) at posix-acl.c:1486
#6  0x00007f16e0e4f3c8 in posix_do_readdir (frame=0x7f16ea3276e8, this=<value optimized out>, fd=<value optimized out>,
    size=<value optimized out>, off=23, whichop=28, dict=0x0) at posix.c:4946
#7  0x00007f16e0e4f603 in posix_readdir (frame=<value optimized out>, this=<value optimized out>, fd=<value optimized out>,
    size=<value optimized out>, off=<value optimized out>, xdata=<value optimized out>) at posix.c:4958
#8  0x00007f16eb4d7013 in default_readdir (frame=0x7f16ea3276e8, this=0xb83070, fd=0xbcecb0, size=4096, off=<value optimized out>,
    xdata=<value optimized out>) at defaults.c:2067
#9  0x00007f16e0a1991d in posix_acl_readdir (frame=0x7f16ea31d700, this=0xb85ea0, fd=0xbcecb0, size=4096, offset=0, xdata=0x0) at posix-acl.c:1500
#10 0x00007f16eb4d7013 in default_readdir (frame=0x7f16ea31d700, this=0xb87130, fd=0xbcecb0, size=4096, off=<value optimized out>,
    xdata=<value optimized out>) at defaults.c:2067
#11 0x00007f16eb4d9a02 in default_readdir_resume (frame=0x7f16ea323c74, this=0xb88350, fd=0xbcecb0, size=4096, off=0, xdata=0x0) at defaults.c:1635
#12 0x00007f16eb4f3631 in call_resume_wind (stub=0x7f16e9dc1f38) at call-stub.c:2492
#13 call_resume (stub=0x7f16e9dc1f38) at call-stub.c:2841
#14 0x00007f16e05f6348 in iot_worker (data=0xbba080) at io-threads.c:214
#15 0x0000003f502079d1 in start_thread () from /lib64/libpthread.so.0
#16 0x0000003f4fae8b5d in clone () from /lib64/libc.so.6

further trace of bt,

(gdb) f 1
#1  0x00007f16dbdf2651 in mq_loc_fill_from_name (this=0xb8be10, newloc=0x7f16bf9f89a0, oldloc=0xbad66c, ino=<value optimized out>,
    name=0x7f169804d938 "appletalk") at marker-quota.c:176
176             len = strlen (oldloc->path);
(gdb) list
171             }
172
173             newloc->parent = inode_ref (oldloc->inode);
174             uuid_copy (newloc->pargfid, oldloc->inode->gfid);
175
176             len = strlen (oldloc->path);
177
178             if (oldloc->path [len - 1] == '/')
179                     ret = gf_asprintf ((char **) &path, "%s%s",
180                                        oldloc->path, name);
(gdb) p oldloc
$1 = (loc_t *) 0xbad66c
(gdb) p *$
$2 = {path = 0x0, name = 0x0, inode = 0x7f16d91760b4, parent = 0x7f16d90f4be0, gfid = "0\367H\216\361QF3\237\314\335\026\327\t\"p",
  pargfid = "\037\062b<X\031Ej\232\035\000\346y\303\037\017"}
(gdb)

--- Additional comment from Niels de Vos on 2014-07-13 07:52:29 EDT ---

http://review.gluster.org/8296 has been POSTed, but against a bug in the Red Hat Storage product. Please repost against this bug.
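To illustrate the failure mode shown in Saurabh's gdb session above: the loc_t handed to the marker xlator has inode/gfid populated but path == NULL, so strlen(oldloc->path) dereferences NULL and the brick dies with signal 11. The following is a minimal, self-contained sketch of that pattern with an obvious defensive guard; it is not GlusterFS code, loc_t is simplified and all names here are illustrative only.

/* Illustrative sketch only -- not the actual marker-quota.c code. */

#include <stdio.h>
#include <string.h>

/* Simplified stand-in for GlusterFS's loc_t (assumption). */
typedef struct {
        const char *path;    /* NULL in the crashing case */
        const char *name;
        void       *inode;
} fake_loc_t;

static size_t
path_len_checked (const fake_loc_t *loc)
{
        /* Without this guard the strlen() below is exactly the crash
         * site shown by gdb: "len = strlen (oldloc->path)" with path = 0x0. */
        if (!loc || !loc->path) {
                fprintf (stderr, "loc->path is not filled\n");
                return 0;
        }
        return strlen (loc->path);
}

int
main (void)
{
        fake_loc_t oldloc = { .path = NULL, .name = NULL, .inode = (void *) 0x1 };

        /* Prints a warning instead of segfaulting. */
        printf ("len = %zu\n", path_len_checked (&oldloc));
        return 0;
}

The actual fix (review 8296 below) takes the other approach: rather than guarding in mq_loc_fill_from_name, it ensures the path is populated before the loc reaches the healing code.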
--- Additional comment from Anand Avati on 2014-07-14 02:38:05 EDT ---

REVIEW: http://review.gluster.org/8296 (features/marker: Fill loc->path before sending the control to healing) posted (#2) for review on master by Varun Shastry (vshastry)

--- Additional comment from Anand Avati on 2014-07-14 06:05:26 EDT ---

REVIEW: http://review.gluster.org/8296 (features/marker: Fill loc->path before sending the control to healing) posted (#3) for review on master by Varun Shastry (vshastry)

--- Additional comment from Anand Avati on 2014-07-15 02:45:25 EDT ---

REVIEW: http://review.gluster.org/8296 (features/marker: Fill loc->path before sending the control to healing) posted (#4) for review on master by Varun Shastry (vshastry)

--- Additional comment from Anand Avati on 2014-07-21 07:26:14 EDT ---

REVIEW: http://review.gluster.org/8296 (features/marker: Fill loc->path before sending the control to healing) posted (#5) for review on master by Varun Shastry (vshastry)

--- Additional comment from Anand Avati on 2014-07-22 11:56:59 EDT ---

COMMIT: http://review.gluster.org/8296 committed in master by Raghavendra G (rgowdapp)
------
commit 56ffb164743449897f1cdecd3dbe085a0f0a66d7
Author: Varun Shastry <vshastry>
Date:   Wed Jul 9 15:16:00 2014 +0530

    features/marker: Fill loc->path before sending the control to healing

    Problem:
    The xattr healing part of the marker requires path to be present in the loc.
    Currently path is not filled while triggering from the readdirp_cbk.

    Solution:
    Current patch tries to fill the loc with path.

    Change-Id: I5c7dc9de60fa79ca0fe9b58d2636fd1355add0d3
    BUG: 1118591
    Signed-off-by: Varun Shastry <vshastry>
    Reviewed-on: http://review.gluster.org/8296
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Raghavendra G <rgowdapp>
    Tested-by: Raghavendra G <rgowdapp>
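To make the intent of the commit message concrete: before handing each directory entry's loc to the quota healing path, derive loc->path from the parent's path plus the entry name, so later string handling has a valid path to work with. The sketch below is a hedged illustration of that idea only, not the actual patch; the helper name build_entry_path and the use of asprintf are assumptions for the example.

/* Hypothetical helper -- illustrates the idea behind "fill loc->path
 * before sending the control to healing", not the actual patch. */

#define _GNU_SOURCE   /* for asprintf */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build "<parent_path>/<entry_name>" into a freshly allocated string.
 * Returns 0 on success, -1 if the parent path is unavailable. */
static int
build_entry_path (const char *parent_path, const char *entry_name,
                  char **out_path)
{
        size_t plen;

        if (!parent_path || !entry_name || !out_path)
                return -1;

        plen = strlen (parent_path);

        /* Avoid a double slash when the parent path already ends in '/'. */
        if (plen && parent_path[plen - 1] == '/')
                return (asprintf (out_path, "%s%s", parent_path, entry_name) < 0) ? -1 : 0;

        return (asprintf (out_path, "%s/%s", parent_path, entry_name) < 0) ? -1 : 0;
}

int
main (void)
{
        char *path = NULL;

        /* In the readdir(p) callback the parent's path is known, so each
         * entry's path can be filled from it before healing is triggered. */
        if (build_entry_path ("/dir1", "appletalk", &path) == 0) {
                printf ("entry path: %s\n", path);   /* /dir1/appletalk */
                free (path);
        }
        return 0;
}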
REVIEW: http://review.gluster.org/8778 (features/marker: Fill loc->path before sending the control to healing) posted (#1) for review on release-3.5 by Krutika Dhananjay (kdhananj)
COMMIT: http://review.gluster.org/8778 committed in release-3.5 by Niels de Vos (ndevos)
------
commit da1657d6841e6bd74074f5a60ed29cf3a97fbff4
Author: Varun Shastry <vshastry>
Date:   Wed Jul 9 15:16:00 2014 +0530

    features/marker: Fill loc->path before sending the control to healing

    Backport of: http://review.gluster.org/8296

    Problem:
    The xattr healing part of the marker requires path to be present in the loc.
    Currently path is not filled while triggering from the readdirp_cbk.

    Solution:
    Current patch tries to fill the loc with path.

    Change-Id: Icc16c740bc6453714306eae19526e18c1775c1d8
    BUG: 1144315
    Signed-off-by: Varun Shastry <vshastry>
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: http://review.gluster.org/8778
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Raghavendra G <rgowdapp>
*** Bug 1119827 has been marked as a duplicate of this bug. ***
The first (and last?) Beta for GlusterFS 3.5.3 has been released [1]. Please verify if the release solves this bug report for you. In case the glusterfs-3.5.3beta1 release does not have a resolution for this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED.

Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update (possibly an "updates-testing" repository) infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-October/018990.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/
The second Beta for GlusterFS 3.5.3 has been released [1]. Please verify if the release solves this bug report for you. In case the glusterfs-3.5.3beta2 release does not have a resolution for this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED.

Packages for several distributions have been made available on [2] to make testing easier.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019359.html
[2] http://download.gluster.org/pub/gluster/glusterfs/qa-releases/3.5.3beta2/
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.5.3, please reopen this bug report.

glusterfs-3.5.3 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/announce/2014-November/000042.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/
*** Bug 1203433 has been marked as a duplicate of this bug. ***