Bug 1118591 - core: all brick processes crash when quota is enabled
Summary: core: all brick processes crash when quota is enabled
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: mainline
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Nagaprasad Sathyanarayana
QA Contact:
URL:
Whiteboard:
Depends On: 1116761
Blocks: 1119827 1144315 1145623
 
Reported: 2014-07-11 06:54 UTC by vpshastry
Modified: 2016-02-18 00:20 UTC (History)
7 users

Fixed In Version: glusterfs-3.7.0
Doc Type: Bug Fix
Doc Text:
Clone Of: 1116761
Clones: 1144315 1145623
Environment:
Last Closed: 2015-05-14 17:26:16 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description vpshastry 2014-07-11 06:54:46 UTC
Description of problem:
I upgraded the glusterfs nodes and, after the upgrade, mounted the volume on an NFS client and ran iozone on the mount point. iozone completed properly, but some time later I found that all the brick processes had crashed. Note that quota was enabled only after the iozone run.

with this backtrace,
pending frames:
frame : type(0) op(0)
frame : type(0) op(1)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(40)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2014-07-06 22:05:46
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.6.0.24
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x7f16eb4d1e56]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x7f16eb4ec28f]
/lib64/libc.so.6[0x3f4fa329a0]
/lib64/libc.so.6[0x3f4fa81461]
/usr/lib64/glusterfs/3.6.0.24/xlator/features/marker.so(mq_loc_fill_from_name+0xa1)[0x7f16dbdf2651]
/usr/lib64/glusterfs/3.6.0.24/xlator/features/marker.so(mq_readdir_cbk+0x2bf)[0x7f16dbdf628f]
/usr/lib64/libglusterfs.so.0(default_readdir_cbk+0xc2)[0x7f16eb4de0b2]
/usr/lib64/libglusterfs.so.0(default_readdir_cbk+0xc2)[0x7f16eb4de0b2]
/usr/lib64/glusterfs/3.6.0.24/xlator/features/access-control.so(posix_acl_readdir_cbk+0xc2)[0x7f16e0a17432]
/usr/lib64/glusterfs/3.6.0.24/xlator/storage/posix.so(posix_do_readdir+0x1b8)[0x7f16e0e4f3c8]
/usr/lib64/glusterfs/3.6.0.24/xlator/storage/posix.so(posix_readdir+0x13)[0x7f16e0e4f603]
/usr/lib64/libglusterfs.so.0(default_readdir+0x83)[0x7f16eb4d7013]
/usr/lib64/glusterfs/3.6.0.24/xlator/features/access-control.so(posix_acl_readdir+0x22d)[0x7f16e0a1991d]
/usr/lib64/libglusterfs.so.0(default_readdir+0x83)[0x7f16eb4d7013]
/usr/lib64/libglusterfs.so.0(default_readdir_resume+0x142)[0x7f16eb4d9a02]
/usr/lib64/libglusterfs.so.0(call_resume+0x1b1)[0x7f16eb4f3631]
/usr/lib64/glusterfs/3.6.0.24/xlator/performance/io-threads.so(iot_worker+0x158)[0x7f16e05f6348]
/lib64/libpthread.so.0[0x3f502079d1]
/lib64/libc.so.6(clone+0x6d)[0x3f4fae8b5d]
---------


gluster volume info

[root@nfs1 ~]# gluster volume info dist-rep
 
Volume Name: dist-rep
Type: Distributed-Replicate
Volume ID: 07f5f58d-83e3-4591-ba7f-e2473153e220
Status: Started
Snap Volume: no
Number of Bricks: 7 x 2 = 14
Transport-type: tcp
Bricks:
Brick1: 10.70.37.62:/bricks/d1r1
Brick2: 10.70.37.215:/bricks/d1r2
Brick3: 10.70.37.44:/bricks/d2r1
Brick4: 10.70.37.201:/bricks/dr2r2
Brick5: 10.70.37.62:/bricks/d3r1
Brick6: 10.70.37.215:/bricks/d3r2
Brick7: 10.70.37.44:/bricks/d4r1
Brick8: 10.70.37.201:/bricks/dr4r2
Brick9: 10.70.37.62:/bricks/d5r1
Brick10: 10.70.37.215:/bricks/d5r2
Brick11: 10.70.37.44:/bricks/d6r1
Brick12: 10.70.37.201:/bricks/dr6r2
Brick13: 10.70.37.62:/bricks/d1r1-add
Brick14: 10.70.37.215:/bricks/d1r2-add
Options Reconfigured:
nfs-ganesha.enable: off
nfs-ganesha.host: 10.70.37.44
nfs.disable: off
performance.readdir-ahead: on
features.quota: on
features.quota-deem-statfs: off
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable

How reproducible:
Seen only once so far, but the crash occurred on all brick processes.


Expected results:
No crash; the brick processes should keep running.

Additional info:

--- Additional comment from Saurabh on 2014-07-07 05:22:27 EDT ---

(gdb) bt
#0  0x0000003f4fa81461 in __strlen_sse2 () from /lib64/libc.so.6
#1  0x00007f16dbdf2651 in mq_loc_fill_from_name (this=0xb8be10, newloc=0x7f16bf9f89a0, oldloc=0xbad66c, ino=<value optimized out>, name=0x7f169804d938 "appletalk")
    at marker-quota.c:176
#2  0x00007f16dbdf628f in mq_readdir_cbk (frame=0x7f16ea14bba8, cookie=<value optimized out>, this=0xb8be10, op_ret=<value optimized out>, op_errno=<value optimized out>, 
    entries=0x7f16bf9f8bb0, xdata=0x0) at marker-quota.c:609
#3  0x00007f16eb4de0b2 in default_readdir_cbk (frame=0x7f16ea3274e4, cookie=<value optimized out>, this=<value optimized out>, op_ret=23, op_errno=0, entries=<value optimized out>, 
    xdata=0x0) at defaults.c:1225
#4  0x00007f16eb4de0b2 in default_readdir_cbk (frame=0x7f16ea323c74, cookie=<value optimized out>, this=<value optimized out>, op_ret=23, op_errno=0, entries=<value optimized out>, 
    xdata=0x0) at defaults.c:1225
#5  0x00007f16e0a17432 in posix_acl_readdir_cbk (frame=0x7f16ea31d700, cookie=<value optimized out>, this=<value optimized out>, op_ret=23, op_errno=0, 
    entries=<value optimized out>, xdata=0x0) at posix-acl.c:1486
#6  0x00007f16e0e4f3c8 in posix_do_readdir (frame=0x7f16ea3276e8, this=<value optimized out>, fd=<value optimized out>, size=<value optimized out>, off=23, whichop=28, dict=0x0)
    at posix.c:4946
#7  0x00007f16e0e4f603 in posix_readdir (frame=<value optimized out>, this=<value optimized out>, fd=<value optimized out>, size=<value optimized out>, off=<value optimized out>, 
    xdata=<value optimized out>) at posix.c:4958
#8  0x00007f16eb4d7013 in default_readdir (frame=0x7f16ea3276e8, this=0xb83070, fd=0xbcecb0, size=4096, off=<value optimized out>, xdata=<value optimized out>) at defaults.c:2067
#9  0x00007f16e0a1991d in posix_acl_readdir (frame=0x7f16ea31d700, this=0xb85ea0, fd=0xbcecb0, size=4096, offset=0, xdata=0x0) at posix-acl.c:1500
#10 0x00007f16eb4d7013 in default_readdir (frame=0x7f16ea31d700, this=0xb87130, fd=0xbcecb0, size=4096, off=<value optimized out>, xdata=<value optimized out>) at defaults.c:2067
#11 0x00007f16eb4d9a02 in default_readdir_resume (frame=0x7f16ea323c74, this=0xb88350, fd=0xbcecb0, size=4096, off=0, xdata=0x0) at defaults.c:1635
#12 0x00007f16eb4f3631 in call_resume_wind (stub=0x7f16e9dc1f38) at call-stub.c:2492
#13 call_resume (stub=0x7f16e9dc1f38) at call-stub.c:2841
#14 0x00007f16e05f6348 in iot_worker (data=0xbba080) at io-threads.c:214
#15 0x0000003f502079d1 in start_thread () from /lib64/libpthread.so.0
#16 0x0000003f4fae8b5d in clone () from /lib64/libc.so.6


further trace of bt,
(gdb) f 1
#1  0x00007f16dbdf2651 in mq_loc_fill_from_name (this=0xb8be10, newloc=0x7f16bf9f89a0, oldloc=0xbad66c, ino=<value optimized out>, name=0x7f169804d938 "appletalk")
    at marker-quota.c:176
176	        len = strlen (oldloc->path);
(gdb) list
171	        }
172	
173	        newloc->parent = inode_ref (oldloc->inode);
174	        uuid_copy (newloc->pargfid, oldloc->inode->gfid);
175	
176	        len = strlen (oldloc->path);
177	
178	        if (oldloc->path [len - 1] == '/')
179	                ret = gf_asprintf ((char **) &path, "%s%s",
180	                                   oldloc->path, name);
(gdb) p oldloc
$1 = (loc_t *) 0xbad66c
(gdb) p *$
$2 = {path = 0x0, name = 0x0, inode = 0x7f16d91760b4, parent = 0x7f16d90f4be0, gfid = "0\367H\216\361QF3\237\314\335\026\327\t\"p", 
  pargfid = "\037\062b<X\031Ej\232\035\000\346y\303\037\017"}
(gdb)

Comment 1 Niels de Vos 2014-07-13 11:52:29 UTC
http://review.gluster.org/8296 has been POSTed, but against a bug in the Red Hat Storage product. Please repost against this bug.

Comment 2 Anand Avati 2014-07-14 06:38:05 UTC
REVIEW: http://review.gluster.org/8296 (features/marker: Fill loc->path before sending the control to healing) posted (#2) for review on master by Varun Shastry (vshastry@redhat.com)

Comment 3 Anand Avati 2014-07-14 10:05:26 UTC
REVIEW: http://review.gluster.org/8296 (features/marker: Fill loc->path before sending the control to healing) posted (#3) for review on master by Varun Shastry (vshastry@redhat.com)

Comment 4 Anand Avati 2014-07-15 06:45:25 UTC
REVIEW: http://review.gluster.org/8296 (features/marker: Fill loc->path before sending the control to healing) posted (#4) for review on master by Varun Shastry (vshastry@redhat.com)

Comment 5 Anand Avati 2014-07-21 11:26:14 UTC
REVIEW: http://review.gluster.org/8296 (features/marker: Fill loc->path before sending the control to healing) posted (#5) for review on master by Varun Shastry (vshastry@redhat.com)

Comment 6 Anand Avati 2014-07-22 15:56:59 UTC
COMMIT: http://review.gluster.org/8296 committed in master by Raghavendra G (rgowdapp@redhat.com) 
------
commit 56ffb164743449897f1cdecd3dbe085a0f0a66d7
Author: Varun Shastry <vshastry@redhat.com>
Date:   Wed Jul 9 15:16:00 2014 +0530

    features/marker: Fill loc->path before sending the control to healing
    
    Problem:
    The xattr healing part of the marker requires path to be present in the loc.
    Currently path is not filled while triggering from the readdirp_cbk.
    
    Solution:
    Current patch tries to fill the loc with path.
    
    Change-Id: I5c7dc9de60fa79ca0fe9b58d2636fd1355add0d3
    BUG: 1118591
    Signed-off-by: Varun Shastry <vshastry@redhat.com>
    Reviewed-on: http://review.gluster.org/8296
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
    Tested-by: Raghavendra G <rgowdapp@redhat.com>

Comment 9 Niels de Vos 2015-05-14 17:26:16 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
