Bug 817461

Summary: [99bdbb2ed9697bb93f006d4060d4f1aaeb0850af] Crash in mdc_local_wipe during graph change
Product: [Community] GlusterFS Reporter: Anush Shetty <ashetty>
Component: fuseAssignee: Raghavendra Bhat <rabhat>
Status: CLOSED UPSTREAM QA Contact:
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: mainlineCC: gluster-bugs
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-05-02 08:21:14 UTC Type: Bug
Regression: --- Mount Type: fuse
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Anush Shetty 2012-04-30 06:27:26 UTC
Description of problem: When add-brick was done running dbench on mount point, the fuse client crashed. The add-brick was done to single brick volume. 


Version-Release number of selected component (if applicable): Upstream


How reproducible: Consistently


Steps to Reproduce:
1. Run dbench -s 10 -D /mnt/gluster on the fuse client
2. Add another brick to the volume gluster volume add-brick test2 shortwing:/falcon/d2

  
Actual results:

(gdb) bt
#0  0x00007f117f3a83a5 in __GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007f117f3abb0b in __GI_abort () at abort.c:92
#2  0x00007f117f3e0113 in __libc_message (do_abort=2, fmt=0x7f117f4d10d8 "*** glibc detected *** %s: %s: 0x%s ***\n")
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:189
#3  0x00007f117f3eaa96 in malloc_printerr (action=3, str=0x7f117f4d12a0 "free(): invalid next size (normal)", ptr=<optimized out>)
    at malloc.c:6283
#4  0x00007f117f3eed7c in __GI___libc_free (mem=<optimized out>) at malloc.c:3738
#5  0x00007f117fdb32bd in __gf_free (free_ptr=0x30f4be0) at mem-pool.c:267
#6  0x00007f117a99b845 in mdc_local_wipe (this=0x3111f00, local=0x30f4be0) at md-cache.c:238
#7  0x00007f117a99c520 in mdc_lookup_cbk (frame=0x7f117dffc06c, cookie=0x7f117dffed70, this=0x3111f00, op_ret=0, op_errno=107, 
    inode=0x7f11708f61e8, stbuf=0x3128438, dict=0x7f117dc7f998, postparent=0x3128668) at md-cache.c:625
#8  0x00007f117ababf2f in qr_lookup_cbk (frame=0x7f117dffed70, cookie=0x7f117dffb1a4, this=0x31114a0, op_ret=0, op_errno=107, 
    inode=0x7f11708f61e8, buf=0x3128438, xdata=0x7f117dc7f998, postparent=0x3128668) at quick-read.c:455
#9  0x00007f117adc894e in ioc_lookup_cbk (frame=0x7f117dffb1a4, cookie=0x7f117dffe408, this=0x3110a60, op_ret=0, op_errno=107, 
    inode=0x7f11708f61e8, stbuf=0x3128438, xdata=0x7f117dc7f998, postparent=0x3128668) at io-cache.c:264
#10 0x00007f117fd811e5 in default_lookup_cbk (frame=0x7f117dffe408, cookie=0x7f117dffd138, this=0x3110080, op_ret=0, op_errno=107, 
    inode=0x7f11708f61e8, buf=0x3128438, xdata=0x7f117dc7f998, postparent=0x3128668) at defaults.c:46
#11 0x00007f117fd811e5 in default_lookup_cbk (frame=0x7f117dffd138, cookie=0x7f117dffa994, this=0x310f6e0, op_ret=0, op_errno=107, 
    inode=0x7f11708f61e8, buf=0x3128438, xdata=0x7f117dc7f998, postparent=0x3128668) at defaults.c:46
#12 0x00007f117e669292 in dht_discover_complete (this=0x310ed40, discover_frame=0x7f117ddf7994) at dht-common.c:209
#13 0x00007f117e669a91 in dht_discover_cbk (frame=0x7f117ddf7994, cookie=0x7f117dffa0d8, this=0x310ed40, op_ret=0, op_errno=22, 
    inode=0x7f11708f61e8, stbuf=0x7ffffbe531b0, xattr=0x7f117dc7f998, postparent=0x7ffffbe53220) at dht-common.c:316
#14 0x00007f117b20dfe9 in client3_1_lookup_cbk (req=0x336fd10, iov=0x336fd50, count=1, myframe=0x7f117dffa0d8) at client3_1-fops.c:2616
#15 0x00007f117fb5b321 in rpc_clnt_handle_reply (clnt=0x336e120, pollin=0x30ff910) at rpc-clnt.c:797
#16 0x00007f117fb5b698 in rpc_clnt_notify (trans=0x3495390, mydata=0x336e150, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x30ff910)
    at rpc-clnt.c:916
#17 0x00007f117fb57380 in rpc_transport_notify (this=0x3495390, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x30ff910) at rpc-transport.c:498
#18 0x00007f117be4a323 in socket_event_poll_in (this=0x3495390) at socket.c:1686
#19 0x00007f117be4a88c in socket_event_handler (fd=28, idx=8, data=0x3495390, poll_in=1, poll_out=0, poll_err=0) at socket.c:1801
#20 0x00007f117fdb21e4 in event_dispatch_epoll_handler (event_pool=0x18be520, events=0x18cb1f0, i=0) at event.c:794
#21 0x00007f117fdb23f7 in event_dispatch_epoll (event_pool=0x18be520) at event.c:856
#22 0x00007f117fdb276a in event_dispatch (event_pool=0x18be520) at event.c:956
#23 0x0000000000408542 in main (argc=8, argv=0x7ffffbe53788) at glusterfsd.c:1652


(gdb) p *local
$1 = {loc = {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}, 
  loc2 = {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}, 
  fd = 0x0, linkname = 0x0, xattr = 0x0}
(gdb) up
#7  0x00007f117a99c520 in mdc_lookup_cbk (frame=0x7f117dffc06c, cookie=0x7f117dffed70, this=0x3111f00, op_ret=0, op_errno=107, 
    inode=0x7f11708f61e8, stbuf=0x3128438, dict=0x7f117dc7f998, postparent=0x3128668) at md-cache.c:625
625	        MDC_STACK_UNWIND (lookup, frame, op_ret, op_errno, inode, stbuf,
(gdb) p *frame
$2 = {root = 0x7f117ddf1898, parent = 0x7f117dffcf34, next = 0x7f117dffcf34, prev = 0x7f117dffed70, local = 0x0, this = 0x3111f00, 
  ret = 0x7f117a784d0b <io_stats_open_cbk>, ref_count = 1, lock = 1, cookie = 0x7f117dffc06c, complete = _gf_false, op = GF_FOP_NULL, 
  begin = {tv_sec = 0, tv_usec = 0}, end = {tv_sec = 0, tv_usec = 0}, wind_from = 0x7f117a794510 "io_stats_open", 
  wind_to = 0x7f117a793cb8 "FIRST_CHILD(this)->fops->open", unwind_from = 0x0, unwind_to = 0x7f117a793cd6 "io_stats_open_cbk"}
(gdb) p *this->graph
$3 = {list = {next = 0x7f116c2b7af0, prev = 0x18a7170}, 
  graph_uuid = "shortwing-18365-2012/04/30-11:31:05:572540", '\000' <repeats 85 times>, dob = {tv_sec = 1335765665, tv_usec = 572539}, 
  first = 0x31133e0, top = 0x31133e0, xl_count = 10, id = 14, used = 1, volfile_checksum = 0}
(gdb) p *frame->this->graph
$4 = {list = {next = 0x7f116c2b7af0, prev = 0x18a7170}, 
  graph_uuid = "shortwing-18365-2012/04/30-11:31:05:572540", '\000' <repeats 85 times>, dob = {tv_sec = 1335765665, tv_usec = 572539}, 
  first = 0x31133e0, top = 0x31133e0, xl_count = 10, id = 14, used = 1, volfile_checksum = 0}
(gdb) up
#8  0x00007f117ababf2f in qr_lookup_cbk (frame=0x7f117dffed70, cookie=0x7f117dffb1a4, this=0x31114a0, op_ret=0, op_errno=107, 
    inode=0x7f11708f61e8, buf=0x3128438, xdata=0x7f117dc7f998, postparent=0x3128668) at quick-read.c:455
455	        QR_STACK_UNWIND (lookup, frame, op_ret, op_errno, inode, buf, xdata,
(gdb) p *frame
$5 = {root = 0x7f117ddf1898, parent = 0x7f117dffc06c, next = 0x7f117dffc06c, prev = 0x7f117dffb1a4, local = 0x3115178, this = 0x31114a0, 
  ret = 0x7f117fd8257e <default_open_cbk>, ref_count = 1, lock = 1, cookie = 0x7f117dffed70, complete = _gf_false, op = GF_FOP_NULL, 
  begin = {tv_sec = 0, tv_usec = 0}, end = {tv_sec = 0, tv_usec = 0}, wind_from = 0x7f117fddd94b "default_open", 
  wind_to = 0x7f117fddcf49 "FIRST_CHILD(this)->fops->open", unwind_from = 0x0, unwind_to = 0x7f117fddcf67 "default_open_cbk"}

Comment 1 Anush Shetty 2012-04-30 07:25:24 UTC
Ran the glusterfs client process with valgrind and did volume set in a loop. Saw invalid writes in the valgrind log.

Valgrind log.

==19418== Invalid write of size 4
==19418==    at 0x4C2AF5A: memset (mc_replace_strmem.c:738)
==19418==    by 0x4E79ECB: __fd_ctx_set (fd.c:834)
==19418==    by 0x4E79FC8: fd_ctx_set (fd.c:859)
==19418==    by 0xA290856: ra_open_cbk (read-ahead.c:99)
==19418==    by 0xA081C69: wb_open_cbk (write-behind.c:1416)
==19418==    by 0x9E5F793: dht_open_cbk (dht-inode-read.c:64)
==19418==    by 0x9BE7081: client3_1_open_cbk (client3_1-fops.c:422)
==19418==    by 0x50D3320: rpc_clnt_handle_reply (rpc-clnt.c:797)
==19418==    by 0x50D3697: rpc_clnt_notify (rpc-clnt.c:916)
==19418==    by 0x50CF37F: rpc_transport_notify (rpc-transport.c:498)
==19418==    by 0x8FB5322: socket_event_poll_in (socket.c:1686)
==19418==    by 0x8FB588B: socket_event_handler (socket.c:1801)
==19418==  Address 0xff454d8 is 8 bytes after a block of size 176 free'd
==19418==    at 0x4C29097: realloc (vg_replace_malloc.c:525)
==19418==    by 0x4E7D980: __gf_default_realloc (mem-pool.h:96)
==19418==    by 0x4E7DFB4: __gf_realloc (mem-pool.c:194)
==19418==    by 0x4E79E07: __fd_ctx_set (fd.c:815)
==19418==    by 0x4E79FC8: fd_ctx_set (fd.c:859)
==19418==    by 0xA290856: ra_open_cbk (read-ahead.c:99)
==19418==    by 0xA081C69: wb_open_cbk (write-behind.c:1416)
==19418==    by 0x9E5F793: dht_open_cbk (dht-inode-read.c:64)
==19418==    by 0x9BE7081: client3_1_open_cbk (client3_1-fops.c:422)
==19418==    by 0x50D3320: rpc_clnt_handle_reply (rpc-clnt.c:797)
==19418==    by 0x50D3697: rpc_clnt_notify (rpc-clnt.c:916)
==19418==    by 0x50CF37F: rpc_transport_notify (rpc-transport.c:498)

==19418== Conditional jump or move depends on uninitialised value(s)
==19418==    at 0x4E79D68: __fd_ctx_set (fd.c:797)
==19418==    by 0x4E79FC8: fd_ctx_set (fd.c:859)
==19418==    by 0xAAE0C93: ios_fd_ctx_set (io-stats.c:358)
==19418==    by 0xAAE5E04: io_stats_open_cbk (io-stats.c:1270)
==19418==    by 0x4E4D6D4: default_open_cbk (defaults.c:202)
==19418==    by 0xA6B7E08: qr_open_cbk (quick-read.c:661)
==19418==    by 0xA4A1D8A: ioc_open_cbk (io-cache.c:588)
==19418==    by 0xA290A1C: ra_open_cbk (read-ahead.c:112)
==19418==    by 0xA081C69: wb_open_cbk (write-behind.c:1416)
==19418==    by 0x9E5F793: dht_open_cbk (dht-inode-read.c:64)
==19418==    by 0x9BE7081: client3_1_open_cbk (client3_1-fops.c:422)
==19418==    by 0x50D3320: rpc_clnt_handle_reply (rpc-clnt.c:797)
==19418== 
==19418== Conditional jump or move depends on uninitialised value(s)
==19418==    at 0x4E79D92: __fd_ctx_set (fd.c:803)
==19418==    by 0x4E79FC8: fd_ctx_set (fd.c:859)
==19418==    by 0xAAE0C93: ios_fd_ctx_set (io-stats.c:358)
==19418==    by 0xAAE5E04: io_stats_open_cbk (io-stats.c:1270)
==19418==    by 0x4E4D6D4: default_open_cbk (defaults.c:202)
==19418==    by 0xA6B7E08: qr_open_cbk (quick-read.c:661)
==19418==    by 0xA4A1D8A: ioc_open_cbk (io-cache.c:588)
==19418==    by 0xA290A1C: ra_open_cbk (read-ahead.c:112)
==19418==    by 0xA081C69: wb_open_cbk (write-behind.c:1416)
==19418==    by 0x9E5F793: dht_open_cbk (dht-inode-read.c:64)
==19418==    by 0x9BE7081: client3_1_open_cbk (client3_1-fops.c:422)
==19418==    by 0x50D3320: rpc_clnt_handle_reply (rpc-clnt.c:797)

Comment 2 Anand Avati 2012-05-01 17:20:51 UTC
CHANGE: http://review.gluster.com/3250 (libglusterfs/fd: use reallocated memory for intializing new graphs contexts) merged in master by Vijay Bellur (vijay)

Comment 3 Raghavendra G 2012-05-02 08:18:57 UTC
*** Bug 816951 has been marked as a duplicate of this bug. ***