Bug 765145 (GLUSTER-3413)

Summary:	glusterfs client process crashed due to memory overrun
Product:	[Community] GlusterFS
Component:	HDFS
Version:	pre-release
Status:	CLOSED CURRENTRELEASE
Severity:	medium
Priority:	medium
Hardware:	x86_64
OS:	Linux
Reporter:	M S Vishwanath Bhat <vbhat>
Assignee:	Venky Shankar <vshankar>
CC:	gluster-bugs, mzywusko
Doc Type:	Bug Fix
Verified Versions:	master

Description M S Vishwanath Bhat 2011-08-15 15:41:48 UTC

On a distributed-replicate (dist-rep) setup, I was running 'teragen' and 'randomtextwriter' simultaneously, and the glusterfs client process crashed with the following backtrace.

(gdb) bt
#0  0x00000038fbe30265 in raise () from /lib64/libc.so.6
#1  0x00000038fbe31d10 in abort () from /lib64/libc.so.6
#2  0x00000038fbe296e6 in __assert_fail () from /lib64/libc.so.6
#3  0x00002b77db9bbdff in __gf_free (free_ptr=0x11c32a00) at mem-pool.c:297
#4  0x00002aaaada8c61c in dht_pathinfo_getxattr_cbk (frame=0x2b77dc8a5d14, cookie=0x2b77dc8a52d4, this=0x11c08660, op_ret=0, op_errno=0, xattr=0x11c31910) at dht-common.c:1764
#5  0x00002aaaad80d98a in afr_getxattr_pathinfo_cbk (frame=0x2b77dc8a52d4, cookie=0x0, this=0x11c07a10, op_ret=0, op_errno=0, dict=0x11c398e0) at afr-inode-read.c:742
#6  0x00002aaaad5d0490 in client3_1_getxattr_cbk (req=0x2aaaaebf6710, iov=0x2aaaaebf6750, count=1, myframe=0x2b77dc8a6568) at client3_1-fops.c:892
#7  0x00002b77dbc0a752 in rpc_clnt_handle_reply (clnt=0x11c1bb70, pollin=0x11c381e0) at rpc-clnt.c:747
#8  0x00002b77dbc0aa89 in rpc_clnt_notify (trans=0x11c1bcb0, mydata=0x11c1bba0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x11c381e0) at rpc-clnt.c:860
#9  0x00002b77dbc07170 in rpc_transport_notify (this=0x11c1bcb0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x11c381e0) at rpc-transport.c:931
#10 0x00002aaaaad6dea7 in socket_event_poll_in (this=0x11c1bcb0) at socket.c:1676
#11 0x00002aaaaad6e3e9 in socket_event_handler (fd=10, idx=3, data=0x11c1bcb0, poll_in=1, poll_out=0, poll_err=0) at socket.c:1791
#12 0x00002b77db9bae30 in event_dispatch_epoll_handler (event_pool=0x11bf7920, events=0x11bfc650, i=0) at event.c:794
#13 0x00002b77db9bb035 in event_dispatch_epoll (event_pool=0x11bf7920) at event.c:856
#14 0x00002b77db9bb38f in event_dispatch (event_pool=0x11bf7920) at event.c:956
#15 0x0000000000407222 in main (argc=4, argv=0x7fffc15b33a8) at glusterfsd.c:1557
(gdb) f 4
#4  0x00002aaaada8c61c in dht_pathinfo_getxattr_cbk (frame=0x2b77dc8a5d14, cookie=0x2b77dc8a52d4, this=0x11c08660, op_ret=0, op_errno=0, xattr=0x11c31910) at dht-common.c:1764
1764                            GF_FREE (local->pathinfo);
(gdb) f 5
#5  0x00002aaaad80d98a in afr_getxattr_pathinfo_cbk (frame=0x2b77dc8a52d4, cookie=0x0, this=0x11c07a10, op_ret=0, op_errno=0, dict=0x11c398e0) at afr-inode-read.c:742
742                     AFR_STACK_UNWIND (getxattr, frame, op_ret, op_errno, xattr);
(gdb) f 3
#3  0x00002b77db9bbdff in __gf_free (free_ptr=0x11c32a00) at mem-pool.c:297
297                     GF_ASSERT (0);
(gdb)
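
Frame #3 above shows the abort coming from an assertion inside __gf_free rather than from the write that damaged the buffer. One common way an allocator can notice an overrun only at free time is to place a guard value just past the usable bytes and verify it when the block is released. The following is a minimal sketch of that general technique. It is not the GlusterFS mem-pool code; the names guarded_alloc, guarded_check, guarded_free and the TRAILER_MAGIC value are hypothetical, and whether __gf_free works this way is an assumption here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical guard value written just past the usable bytes. */
#define TRAILER_MAGIC 0xBAADF00Du

/*
 * Allocate `size` usable bytes laid out as:
 *   [size_t size][size usable bytes][uint32_t guard]
 * The returned pointer is only aligned for size_t, which is enough
 * for the character buffers used in this sketch.
 */
static void *guarded_alloc(size_t size)
{
    unsigned char *raw = malloc(sizeof(size_t) + size + sizeof(uint32_t));
    uint32_t magic = TRAILER_MAGIC;

    if (raw == NULL)
        return NULL;
    memcpy(raw, &size, sizeof(size));
    memcpy(raw + sizeof(size_t) + size, &magic, sizeof(magic));
    return raw + sizeof(size_t);
}

/* Return 1 if the guard is intact, 0 if something wrote past the end. */
static int guarded_check(const void *ptr)
{
    const unsigned char *user = ptr;
    size_t size;
    uint32_t magic;

    memcpy(&size, user - sizeof(size_t), sizeof(size));
    memcpy(&magic, user + size, sizeof(magic));
    return magic == TRAILER_MAGIC;
}

/* Free a guarded block, aborting if the guard was overwritten. */
static void guarded_free(void *ptr)
{
    if (ptr == NULL)
        return;
    assert(guarded_check(ptr));
    free((unsigned char *)ptr - sizeof(size_t));
}
```

With a scheme like this, a buffer that is one byte too small for the string written into it fails the check when it is freed, which may be far from the code that performed the bad write.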

Comment 1 Venky Shankar 2011-08-16 02:17:28 UTC

Vishwanath is doing another run with the fix. The crash was a memory overrun caused by an incorrect length being passed to the GF_REALLOC calls in the pathinfo xattr callback for dht.
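
To illustrate the class of bug described in this comment, here is a minimal sketch of growing a heap string with the standard realloc and strcat. It does not reproduce the actual dht code or GF_REALLOC; the function name append_piece and the single-space separator are assumptions made for the example. The point is the size calculation: the new buffer must hold the existing text, the separator, the new piece, and the terminating '\0' that strcat writes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Append `piece` to the heap string `*dst` (which may be NULL),
 * separating entries with a single space.
 * Returns 0 on success, or -1 on allocation failure, in which case
 * `*dst` is left untouched.
 */
static int append_piece(char **dst, const char *piece)
{
    assert(dst != NULL && piece != NULL);

    size_t old_len = (*dst != NULL) ? strlen(*dst) : 0;
    size_t sep_len = (old_len > 0) ? 1 : 0;
    /* The +1 accounts for the terminating '\0' written by strcat. */
    size_t new_size = old_len + sep_len + strlen(piece) + 1;

    char *grown = realloc(*dst, new_size);
    if (grown == NULL)
        return -1;

    if (old_len == 0)
        grown[0] = '\0';   /* fresh or empty buffer: start from "" */
    else
        strcat(grown, " ");
    strcat(grown, piece);

    *dst = grown;
    return 0;
}
```

Leaving out the separator or the terminator from new_size makes every append write one or more bytes past the end of the allocation, which is the kind of overrun that can stay hidden until the buffer is freed.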

Comment 2 Anand Avati 2011-08-16 03:18:41 UTC

CHANGE: http://review.gluster.com/236 (size of the allocated length is incorrectly calculated which could) merged in master by Anand Avati (avati)

Comment 3 Anand Avati 2011-08-18 01:01:40 UTC

CHANGE: http://review.gluster.com/249 (We use strcat to concat pathinfo strings. strcat appends a \0 at) merged in master by Anand Avati (avati)

Comment 4 M S Vishwanath Bhat 2011-08-24 07:49:53 UTC

I originally observed this crash when running two map-reduce jobs simultaneously. With the fix applied, I ran two map-reduce jobs simultaneously again; the crash no longer occurs and both jobs ran to completion.