Bug 765145 (GLUSTER-3413) - glusterfs client process crashed due to memory overrun
Summary: glusterfs client process crashed due to memory overrun
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-3413
Product: GlusterFS
Classification: Community
Component: HDFS
Version: pre-release
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Venky Shankar
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-08-15 15:41 UTC by M S Vishwanath Bhat
Modified: 2016-06-01 01:55 UTC (History)
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: master



Description M S Vishwanath Bhat 2011-08-15 15:41:48 UTC
On a dist-rep set-up I was running 'teragen' and 'randomtextwriter' simultaneously, and the glusterfs client process crashed with the following backtrace.

(gdb) bt
#0  0x00000038fbe30265 in raise () from /lib64/libc.so.6
#1  0x00000038fbe31d10 in abort () from /lib64/libc.so.6
#2  0x00000038fbe296e6 in __assert_fail () from /lib64/libc.so.6
#3  0x00002b77db9bbdff in __gf_free (free_ptr=0x11c32a00) at mem-pool.c:297
#4  0x00002aaaada8c61c in dht_pathinfo_getxattr_cbk (frame=0x2b77dc8a5d14, cookie=0x2b77dc8a52d4, this=0x11c08660, op_ret=0, op_errno=0, xattr=0x11c31910) at dht-common.c:1764
#5  0x00002aaaad80d98a in afr_getxattr_pathinfo_cbk (frame=0x2b77dc8a52d4, cookie=0x0, this=0x11c07a10, op_ret=0, op_errno=0, dict=0x11c398e0) at afr-inode-read.c:742
#6  0x00002aaaad5d0490 in client3_1_getxattr_cbk (req=0x2aaaaebf6710, iov=0x2aaaaebf6750, count=1, myframe=0x2b77dc8a6568) at client3_1-fops.c:892
#7  0x00002b77dbc0a752 in rpc_clnt_handle_reply (clnt=0x11c1bb70, pollin=0x11c381e0) at rpc-clnt.c:747
#8  0x00002b77dbc0aa89 in rpc_clnt_notify (trans=0x11c1bcb0, mydata=0x11c1bba0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x11c381e0) at rpc-clnt.c:860
#9  0x00002b77dbc07170 in rpc_transport_notify (this=0x11c1bcb0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x11c381e0) at rpc-transport.c:931
#10 0x00002aaaaad6dea7 in socket_event_poll_in (this=0x11c1bcb0) at socket.c:1676
#11 0x00002aaaaad6e3e9 in socket_event_handler (fd=10, idx=3, data=0x11c1bcb0, poll_in=1, poll_out=0, poll_err=0) at socket.c:1791
#12 0x00002b77db9bae30 in event_dispatch_epoll_handler (event_pool=0x11bf7920, events=0x11bfc650, i=0) at event.c:794
#13 0x00002b77db9bb035 in event_dispatch_epoll (event_pool=0x11bf7920) at event.c:856
#14 0x00002b77db9bb38f in event_dispatch (event_pool=0x11bf7920) at event.c:956
#15 0x0000000000407222 in main (argc=4, argv=0x7fffc15b33a8) at glusterfsd.c:1557
(gdb) f 4
#4  0x00002aaaada8c61c in dht_pathinfo_getxattr_cbk (frame=0x2b77dc8a5d14, cookie=0x2b77dc8a52d4, this=0x11c08660, op_ret=0, op_errno=0, xattr=0x11c31910) at dht-common.c:1764
1764                            GF_FREE (local->pathinfo);
(gdb) f 5
#5  0x00002aaaad80d98a in afr_getxattr_pathinfo_cbk (frame=0x2b77dc8a52d4, cookie=0x0, this=0x11c07a10, op_ret=0, op_errno=0, dict=0x11c398e0) at afr-inode-read.c:742
742                     AFR_STACK_UNWIND (getxattr, frame, op_ret, op_errno, xattr);
(gdb) f 3
#3  0x00002b77db9bbdff in __gf_free (free_ptr=0x11c32a00) at mem-pool.c:297
297                     GF_ASSERT (0);
(gdb)

Comment 1 Venky Shankar 2011-08-16 02:17:28 UTC
Vishwanath is giving another run with the fix. It was a memory overrun due to an incorrect length used in the GF_REALLOC calls in the pathinfo xattr callback for dht.
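The class of bug described here (and in the strcat follow-up fix below) can be sketched as a minimal hypothetical example; `append_pathinfo` is illustrative, not the actual dht code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of the pathinfo-concatenation pattern. The crash came from
 * sizing the reallocation by the string lengths alone; strcat then
 * wrote its terminating '\0' one byte past the end of the allocation,
 * corrupting the allocator's bookkeeping and tripping the GF_ASSERT
 * in __gf_free. The fix is to reserve room for the terminator (+1). */
static char *
append_pathinfo (char *buf, const char *piece)
{
        size_t old = buf ? strlen (buf) : 0;

        /* +1 for the '\0' that strcat appends -- omitting this
         * is the overrun this bug report describes. */
        char *tmp = realloc (buf, old + strlen (piece) + 1);
        if (!tmp) {
                free (buf);
                return NULL;
        }
        if (old == 0)
                tmp[0] = '\0';
        strcat (tmp, piece);
        return tmp;
}
```

With the under-sized allocation, the one-byte overwrite is silent until the allocator validates its metadata on free, which matches the abort inside `__gf_free` seen in frame #3 of the backtrace.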

Comment 2 Anand Avati 2011-08-16 03:18:41 UTC
CHANGE: http://review.gluster.com/236 (size of the allocated length is incorrectly calculated which could) merged in master by Anand Avati (avati)

Comment 3 Anand Avati 2011-08-18 01:01:40 UTC
CHANGE: http://review.gluster.com/249 (We use strcat to concat pathinfo strings. strcat appends a \0 at) merged in master by Anand Avati (avati)

Comment 4 M S Vishwanath Bhat 2011-08-24 07:49:53 UTC
I observed this crash when I ran two map-reduce jobs simultaneously. After the fix, when I ran two map-reduce jobs simultaneously I no longer see the crash, and both jobs ran to completion.

