Description of problem:
While dbench was running on the fuse mount point, an add-brick operation caused the fuse client to crash. The add-brick was performed on a single-brick volume.

Version-Release number of selected component (if applicable):
Upstream

How reproducible:
Consistently

Steps to Reproduce:
1. Run dbench -s 10 -D /mnt/gluster on the fuse client
2. Add another brick to the volume: gluster volume add-brick test2 shortwing:/falcon/d2

Actual results:
(gdb) bt
#0  0x00007f117f3a83a5 in __GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007f117f3abb0b in __GI_abort () at abort.c:92
#2  0x00007f117f3e0113 in __libc_message (do_abort=2, fmt=0x7f117f4d10d8 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:189
#3  0x00007f117f3eaa96 in malloc_printerr (action=3, str=0x7f117f4d12a0 "free(): invalid next size (normal)", ptr=<optimized out>) at malloc.c:6283
#4  0x00007f117f3eed7c in __GI___libc_free (mem=<optimized out>) at malloc.c:3738
#5  0x00007f117fdb32bd in __gf_free (free_ptr=0x30f4be0) at mem-pool.c:267
#6  0x00007f117a99b845 in mdc_local_wipe (this=0x3111f00, local=0x30f4be0) at md-cache.c:238
#7  0x00007f117a99c520 in mdc_lookup_cbk (frame=0x7f117dffc06c, cookie=0x7f117dffed70, this=0x3111f00, op_ret=0, op_errno=107, inode=0x7f11708f61e8, stbuf=0x3128438, dict=0x7f117dc7f998, postparent=0x3128668) at md-cache.c:625
#8  0x00007f117ababf2f in qr_lookup_cbk (frame=0x7f117dffed70, cookie=0x7f117dffb1a4, this=0x31114a0, op_ret=0, op_errno=107, inode=0x7f11708f61e8, buf=0x3128438, xdata=0x7f117dc7f998, postparent=0x3128668) at quick-read.c:455
#9  0x00007f117adc894e in ioc_lookup_cbk (frame=0x7f117dffb1a4, cookie=0x7f117dffe408, this=0x3110a60, op_ret=0, op_errno=107, inode=0x7f11708f61e8, stbuf=0x3128438, xdata=0x7f117dc7f998, postparent=0x3128668) at io-cache.c:264
#10 0x00007f117fd811e5 in default_lookup_cbk (frame=0x7f117dffe408, cookie=0x7f117dffd138, this=0x3110080, op_ret=0, op_errno=107, inode=0x7f11708f61e8, buf=0x3128438, xdata=0x7f117dc7f998, postparent=0x3128668) at defaults.c:46
#11 0x00007f117fd811e5 in default_lookup_cbk (frame=0x7f117dffd138, cookie=0x7f117dffa994, this=0x310f6e0, op_ret=0, op_errno=107, inode=0x7f11708f61e8, buf=0x3128438, xdata=0x7f117dc7f998, postparent=0x3128668) at defaults.c:46
#12 0x00007f117e669292 in dht_discover_complete (this=0x310ed40, discover_frame=0x7f117ddf7994) at dht-common.c:209
#13 0x00007f117e669a91 in dht_discover_cbk (frame=0x7f117ddf7994, cookie=0x7f117dffa0d8, this=0x310ed40, op_ret=0, op_errno=22, inode=0x7f11708f61e8, stbuf=0x7ffffbe531b0, xattr=0x7f117dc7f998, postparent=0x7ffffbe53220) at dht-common.c:316
#14 0x00007f117b20dfe9 in client3_1_lookup_cbk (req=0x336fd10, iov=0x336fd50, count=1, myframe=0x7f117dffa0d8) at client3_1-fops.c:2616
#15 0x00007f117fb5b321 in rpc_clnt_handle_reply (clnt=0x336e120, pollin=0x30ff910) at rpc-clnt.c:797
#16 0x00007f117fb5b698 in rpc_clnt_notify (trans=0x3495390, mydata=0x336e150, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x30ff910) at rpc-clnt.c:916
#17 0x00007f117fb57380 in rpc_transport_notify (this=0x3495390, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x30ff910) at rpc-transport.c:498
#18 0x00007f117be4a323 in socket_event_poll_in (this=0x3495390) at socket.c:1686
#19 0x00007f117be4a88c in socket_event_handler (fd=28, idx=8, data=0x3495390, poll_in=1, poll_out=0, poll_err=0) at socket.c:1801
#20 0x00007f117fdb21e4 in event_dispatch_epoll_handler (event_pool=0x18be520, events=0x18cb1f0, i=0) at event.c:794
#21 0x00007f117fdb23f7 in event_dispatch_epoll (event_pool=0x18be520) at event.c:856
#22 0x00007f117fdb276a in event_dispatch (event_pool=0x18be520) at event.c:956
#23 0x0000000000408542 in main (argc=8, argv=0x7ffffbe53788) at glusterfsd.c:1652

(gdb) p *local
$1 = {loc = {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}, loc2 = {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}, fd = 0x0, linkname = 0x0, xattr = 0x0}

(gdb) up
#7  0x00007f117a99c520 in mdc_lookup_cbk (frame=0x7f117dffc06c, cookie=0x7f117dffed70, this=0x3111f00, op_ret=0, op_errno=107, inode=0x7f11708f61e8, stbuf=0x3128438, dict=0x7f117dc7f998, postparent=0x3128668) at md-cache.c:625
625             MDC_STACK_UNWIND (lookup, frame, op_ret, op_errno, inode, stbuf,

(gdb) p *frame
$2 = {root = 0x7f117ddf1898, parent = 0x7f117dffcf34, next = 0x7f117dffcf34, prev = 0x7f117dffed70, local = 0x0, this = 0x3111f00, ret = 0x7f117a784d0b <io_stats_open_cbk>, ref_count = 1, lock = 1, cookie = 0x7f117dffc06c, complete = _gf_false, op = GF_FOP_NULL, begin = {tv_sec = 0, tv_usec = 0}, end = {tv_sec = 0, tv_usec = 0}, wind_from = 0x7f117a794510 "io_stats_open", wind_to = 0x7f117a793cb8 "FIRST_CHILD(this)->fops->open", unwind_from = 0x0, unwind_to = 0x7f117a793cd6 "io_stats_open_cbk"}

(gdb) p *this->graph
$3 = {list = {next = 0x7f116c2b7af0, prev = 0x18a7170}, graph_uuid = "shortwing-18365-2012/04/30-11:31:05:572540", '\000' <repeats 85 times>, dob = {tv_sec = 1335765665, tv_usec = 572539}, first = 0x31133e0, top = 0x31133e0, xl_count = 10, id = 14, used = 1, volfile_checksum = 0}

(gdb) p *frame->this->graph
$4 = {list = {next = 0x7f116c2b7af0, prev = 0x18a7170}, graph_uuid = "shortwing-18365-2012/04/30-11:31:05:572540", '\000' <repeats 85 times>, dob = {tv_sec = 1335765665, tv_usec = 572539}, first = 0x31133e0, top = 0x31133e0, xl_count = 10, id = 14, used = 1, volfile_checksum = 0}

(gdb) up
#8  0x00007f117ababf2f in qr_lookup_cbk (frame=0x7f117dffed70, cookie=0x7f117dffb1a4, this=0x31114a0, op_ret=0, op_errno=107, inode=0x7f11708f61e8, buf=0x3128438, xdata=0x7f117dc7f998, postparent=0x3128668) at quick-read.c:455
455             QR_STACK_UNWIND (lookup, frame, op_ret, op_errno, inode, buf, xdata,

(gdb) p *frame
$5 = {root = 0x7f117ddf1898, parent = 0x7f117dffc06c, next = 0x7f117dffc06c, prev = 0x7f117dffb1a4, local = 0x3115178, this = 0x31114a0, ret = 0x7f117fd8257e <default_open_cbk>, ref_count = 1, lock = 1, cookie = 0x7f117dffed70, complete = _gf_false, op = GF_FOP_NULL, begin = {tv_sec = 0, tv_usec = 0}, end = {tv_sec = 0, tv_usec = 0}, wind_from = 0x7f117fddd94b "default_open", wind_to = 0x7f117fddcf49 "FIRST_CHILD(this)->fops->open", unwind_from = 0x0, unwind_to = 0x7f117fddcf67 "default_open_cbk"}
Ran the glusterfs client process under valgrind while doing volume set in a loop, and saw invalid writes in the valgrind log:

==19418== Invalid write of size 4
==19418==    at 0x4C2AF5A: memset (mc_replace_strmem.c:738)
==19418==    by 0x4E79ECB: __fd_ctx_set (fd.c:834)
==19418==    by 0x4E79FC8: fd_ctx_set (fd.c:859)
==19418==    by 0xA290856: ra_open_cbk (read-ahead.c:99)
==19418==    by 0xA081C69: wb_open_cbk (write-behind.c:1416)
==19418==    by 0x9E5F793: dht_open_cbk (dht-inode-read.c:64)
==19418==    by 0x9BE7081: client3_1_open_cbk (client3_1-fops.c:422)
==19418==    by 0x50D3320: rpc_clnt_handle_reply (rpc-clnt.c:797)
==19418==    by 0x50D3697: rpc_clnt_notify (rpc-clnt.c:916)
==19418==    by 0x50CF37F: rpc_transport_notify (rpc-transport.c:498)
==19418==    by 0x8FB5322: socket_event_poll_in (socket.c:1686)
==19418==    by 0x8FB588B: socket_event_handler (socket.c:1801)
==19418==  Address 0xff454d8 is 8 bytes after a block of size 176 free'd
==19418==    at 0x4C29097: realloc (vg_replace_malloc.c:525)
==19418==    by 0x4E7D980: __gf_default_realloc (mem-pool.h:96)
==19418==    by 0x4E7DFB4: __gf_realloc (mem-pool.c:194)
==19418==    by 0x4E79E07: __fd_ctx_set (fd.c:815)
==19418==    by 0x4E79FC8: fd_ctx_set (fd.c:859)
==19418==    by 0xA290856: ra_open_cbk (read-ahead.c:99)
==19418==    by 0xA081C69: wb_open_cbk (write-behind.c:1416)
==19418==    by 0x9E5F793: dht_open_cbk (dht-inode-read.c:64)
==19418==    by 0x9BE7081: client3_1_open_cbk (client3_1-fops.c:422)
==19418==    by 0x50D3320: rpc_clnt_handle_reply (rpc-clnt.c:797)
==19418==    by 0x50D3697: rpc_clnt_notify (rpc-clnt.c:916)
==19418==    by 0x50CF37F: rpc_transport_notify (rpc-transport.c:498)
==19418==
==19418== Conditional jump or move depends on uninitialised value(s)
==19418==    at 0x4E79D68: __fd_ctx_set (fd.c:797)
==19418==    by 0x4E79FC8: fd_ctx_set (fd.c:859)
==19418==    by 0xAAE0C93: ios_fd_ctx_set (io-stats.c:358)
==19418==    by 0xAAE5E04: io_stats_open_cbk (io-stats.c:1270)
==19418==    by 0x4E4D6D4: default_open_cbk (defaults.c:202)
==19418==    by 0xA6B7E08: qr_open_cbk (quick-read.c:661)
==19418==    by 0xA4A1D8A: ioc_open_cbk (io-cache.c:588)
==19418==    by 0xA290A1C: ra_open_cbk (read-ahead.c:112)
==19418==    by 0xA081C69: wb_open_cbk (write-behind.c:1416)
==19418==    by 0x9E5F793: dht_open_cbk (dht-inode-read.c:64)
==19418==    by 0x9BE7081: client3_1_open_cbk (client3_1-fops.c:422)
==19418==    by 0x50D3320: rpc_clnt_handle_reply (rpc-clnt.c:797)
==19418==
==19418== Conditional jump or move depends on uninitialised value(s)
==19418==    at 0x4E79D92: __fd_ctx_set (fd.c:803)
==19418==    by 0x4E79FC8: fd_ctx_set (fd.c:859)
==19418==    by 0xAAE0C93: ios_fd_ctx_set (io-stats.c:358)
==19418==    by 0xAAE5E04: io_stats_open_cbk (io-stats.c:1270)
==19418==    by 0x4E4D6D4: default_open_cbk (defaults.c:202)
==19418==    by 0xA6B7E08: qr_open_cbk (quick-read.c:661)
==19418==    by 0xA4A1D8A: ioc_open_cbk (io-cache.c:588)
==19418==    by 0xA290A1C: ra_open_cbk (read-ahead.c:112)
==19418==    by 0xA081C69: wb_open_cbk (write-behind.c:1416)
==19418==    by 0x9E5F793: dht_open_cbk (dht-inode-read.c:64)
==19418==    by 0x9BE7081: client3_1_open_cbk (client3_1-fops.c:422)
==19418==    by 0x50D3320: rpc_clnt_handle_reply (rpc-clnt.c:797)
CHANGE: http://review.gluster.com/3250 (libglusterfs/fd: use reallocated memory for intializing new graphs contexts) merged in master by Vijay Bellur (vijay)
*** Bug 816951 has been marked as a duplicate of this bug. ***