Description of problem:
Same setup as bug 826080. After adding 2 bricks to a replica 2 volume (making it a 2x2 distributed-replicate volume), a rebalance was started. The rebalance process crashed with the following backtrace:

Loaded symbols for /usr/local/lib/glusterfs/3.3.0qa45/xlator/performance/md-cache.so
Cannot find new threads: generic error
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.5.x86_64 libgcc-4.4.6-3.el6.x86_64 openssl-1.0.0-20.el6_2.3.x86_64 valgrind-3.6.0-4.el6.x86_64 zlib-1.2.3-27.el6.x86_64

(gdb) bt
#0  0x0000003cdc20eccd in nanosleep () from /lib64/libpthread.so.0
#1  0x0000000004c3e22b in gf_timer_proc (ctx=0x52ea040) at ../../../libglusterfs/src/timer.c:173
#2  0x0000003cdc2077f1 in start_thread () from /lib64/libpthread.so.0
#3  0x0000003cdbae592d in clone () from /lib64/libc.so.6

(gdb) t 2
[Switching to thread 2 (Thread 28542)]
#0  0x0000003cdc20e48d in write () from /lib64/libpthread.so.0
(gdb) bt
#0  0x0000003cdc20e48d in write () from /lib64/libpthread.so.0
#1  0x0000000004c3a269 in gf_print_trace (signum=6) at ../../../libglusterfs/src/common-utils.c:373
#2  <signal handler called>
#3  0x0000003cdba32885 in raise () from /lib64/libc.so.6
#4  0x0000003cdba34065 in abort () from /lib64/libc.so.6
#5  0x0000003cdba2b9fe in __assert_fail_base () from /lib64/libc.so.6
#6  0x0000003cdba2bac0 in __assert_fail () from /lib64/libc.so.6
#7  0x0000000004c5ae1f in __gf_free (free_ptr=0x569f400) at ../../../libglusterfs/src/mem-pool.c:269
#8  0x000000000986e38a in gf_defrag_start_crawl (data=0x5689dd0) at ../../../../../xlators/cluster/dht/src/dht-rebalance.c:1491
#9  0x0000000004c6a882 in synctask_wrap (old_task=0xc6d0f20) at ../../../libglusterfs/src/syncop.c:120
#10 0x0000003cdba43610 in ?? () from /lib64/libc.so.6
#11 0x0000000000000000 in ?? ()

(gdb) f 7
#7  0x0000000004c5ae1f in __gf_free (free_ptr=0x569f400) at ../../../libglusterfs/src/mem-pool.c:269
269             GF_ASSERT (0);
(gdb) f 8
#8  0x000000000986e38a in gf_defrag_start_crawl (data=0x5689dd0) at ../../../../../xlators/cluster/dht/src/dht-rebalance.c:1491
1491            GF_FREE (defrag);
(gdb) p defrag
$1 = (gf_defrag_info_t *) 0x569f400
(gdb) p *defrag
$2 = {total_files = 0, total_data = 0, num_files_lookedup = 0, total_failures = 0, lock = 1, cmd = 1, th = 356079360, defrag_status = GF_DEFRAG_STATUS_COMPLETE, rpc = 0x0, connected = 0, is_exiting = 1, pid = -3, root_inode = 0x1579607c, node_uuid = "/>\"N\310\fK\031\223\370\260\250\230\204_\035"}

(gdb) t 3
[Switching to thread 3 (Thread 28541)]
#0  0x0000003cdba32885 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003cdba32885 in raise () from /lib64/libc.so.6
#1  0x0000003cdba34065 in abort () from /lib64/libc.so.6
#2  0x0000003cdba2b9fe in __assert_fail_base () from /lib64/libc.so.6
#3  0x0000003cdba2bac0 in __assert_fail () from /lib64/libc.so.6
#4  0x0000000004c5ae1f in __gf_free (free_ptr=0x569f400) at ../../../libglusterfs/src/mem-pool.c:269
#5  0x000000000986e38a in gf_defrag_start_crawl (data=0x5689dd0) at ../../../../../xlators/cluster/dht/src/dht-rebalance.c:1491
#6  0x0000000004c6a882 in synctask_wrap (old_task=0x66e8240) at ../../../libglusterfs/src/syncop.c:120
#7  0x0000003cdba43610 in ?? () from /lib64/libc.so.6
#8  0x0000000000000000 in ?? ()

(gdb) f 5
#5  0x000000000986e38a in gf_defrag_start_crawl (data=0x5689dd0) at ../../../../../xlators/cluster/dht/src/dht-rebalance.c:1491
1491            GF_FREE (defrag);
(gdb) p defrag
$3 = (gf_defrag_info_t *) 0x569f400
(gdb) p *defrag
$4 = {total_files = 0, total_data = 0, num_files_lookedup = 0, total_failures = 0, lock = 1, cmd = 1, th = 356079360, defrag_status = GF_DEFRAG_STATUS_COMPLETE, rpc = 0x0, connected = 0, is_exiting = 1, pid = -3, root_inode = 0x1579607c, node_uuid = "/>\"N\310\fK\031\223\370\260\250\230\204_\035"}

(gdb) t 4
[Switching to thread 4 (Thread 28540)]
#0  0x0000003cdbad97f7 in unlink () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003cdbad97f7 in unlink () from /lib64/libc.so.6
#1  0x0000000000407807 in glusterfs_pidfile_cleanup (ctx=0x52ea040) at ../../../glusterfsd/src/glusterfsd.c:1333
#2  0x000000000040632c in cleanup_and_exit (signum=15) at ../../../glusterfsd/src/glusterfsd.c:851
#3  0x0000000000407c18 in glusterfs_sigwaiter (arg=0x7fefffa70) at ../../../glusterfsd/src/glusterfsd.c:1421
#4  0x0000003cdc2077f1 in start_thread () from /lib64/libpthread.so.0
#5  0x0000003cdbae592d in clone () from /lib64/libc.so.6

(gdb) t 5
[Switching to thread 5 (Thread 28539)]
#0  0x0000003cdbad862d in read () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003cdbad862d in read () from /lib64/libc.so.6
#1  0x0000003cdba70f58 in _IO_new_file_underflow () from /lib64/libc.so.6
#2  0x0000003cdba72a5e in _IO_default_uflow_internal () from /lib64/libc.so.6
#3  0x0000003cdba6724a in _IO_getline_info_internal () from /lib64/libc.so.6
#4  0x0000003cdba6ff71 in fgets_unlocked () from /lib64/libc.so.6
#5  0x0000000008778656 in _nss_files_getservbyport_r () from /lib64/libnss_files.so.2
#6  0x0000003cdbb02cfa in getservbyport_r@@GLIBC_2.2.5 () from /lib64/libc.so.6
#7  0x0000003cdbb093f6 in getnameinfo () from /lib64/libc.so.6
#8  0x0000000004c39aa7 in gf_resolve_ip6 (hostname=0x6674750 "10.16.156.9", port=24007, family=0, dnscache=0x56e9ba8, addr_info=0x7feffdf00) at ../../../libglusterfs/src/common-utils.c:138
#9  0x000000000855a2ed in af_inet_client_get_remote_sockaddr (this=0x56e9b30, sockaddr=0x7feffdfd0, sockaddr_len=0x7feffe054) at ../../../../../rpc/rpc-transport/socket/src/name.c:241
#10 0x000000000855ae61 in socket_client_get_remote_sockaddr (this=0x56e9b30, sockaddr=0x7feffdfd0, sockaddr_len=0x7feffe054, sa_family=0x7feffe052) at ../../../../../rpc/rpc-transport/socket/src/name.c:499
#11 0x00000000085573cf in socket_connect (this=0x56e9b30, port=24009) at ../../../../../rpc/rpc-transport/socket/src/socket.c:2009
#12 0x0000000004eab9a8 in rpc_transport_connect (this=0x56e9b30, port=24009) at ../../../../rpc/rpc-lib/src/rpc-transport.c:380
#13 0x0000000004eb1394 in rpc_clnt_submit (rpc=0x56e96d0, prog=0x95cc340, procnum=27, cbkfn=0x93a8d10 <client3_1_lookup_cbk>, proghdr=0x7feffeb10, proghdrcount=1, progpayload=0x0, progpayloadcount=0, iobref=0xee96a90, frame=0x5369a1c, rsphdr=0x0, rsphdr_count=0, rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x0) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:1482
#14 0x000000000938dd4d in client_submit_request (this=0x5685000, req=0x7feffece0, frame=0x5369a1c, prog=0x95cc340, procnum=27, cbkfn=0x93a8d10 <client3_1_lookup_cbk>, iobref=0x0, rsphdr=0x0, rsphdr_count=0, rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x0, xdrproc=0x50c236c <xdr_gfs3_lookup_req>) at ../../../../../xlators/protocol/client/src/client.c:229
#15 0x00000000093aa6b5 in client3_1_lookup (frame=0x5369a1c, this=0x5685000, data=0x7feffedf0) at ../../../../../xlators/protocol/client/src/client3_1-fops.c:2983
#16 0x000000000938e35d in client_lookup (frame=0x5369a1c, this=0x5685000, loc=0xb079d10, xdata=0x561ea70) at ../../../../../xlators/protocol/client/src/client.c:368
#17 0x0000000009636ccb in afr_lookup (frame=0x5369970, this=0x6677180, loc=0xa7447a0, xattr_req=0x561eb18) at ../../../../../xlators/cluster/afr/src/afr-common.c:2182
#18 0x000000000988143d in dht_lookup_directory (frame=0x536976c, this=0x5689dd0, loc=0xa7447a0) at ../../../../../xlators/cluster/dht/src/dht-common.c:1132
#19 0x0000000009881b3d in dht_lookup_cbk (frame=0x536976c, cookie=0x5369818, this=0x5689dd0, op_ret=-1, op_errno=107, inode=0x0, stbuf=0xb07ccac, xattr=0x0, postparent=0xb07cd1c) at ../../../../../xlators/cluster/dht/src/dht-common.c:1210
#20 0x0000000009635dbc in afr_lookup_done (frame=0x5369818, this=0x6677180) at ../../../../../xlators/cluster/afr/src/afr-common.c:1880
#21 0x00000000096364be in afr_lookup_cbk (frame=0x5369818, cookie=0x0, this=0x6677180, op_ret=-1, op_errno=107, inode=0x1579607c, buf=0x7fefff3f0, xattr=0x0, postparent=0x7fefff380) at ../../../../../xlators/cluster/afr/src/afr-common.c:2043
#22 0x00000000093a943c in client3_1_lookup_cbk (req=0xb6bb498, iov=0x7fefff600, count=1, myframe=0x53698c4) at ../../../../../xlators/protocol/client/src/client3_1-fops.c:2636
#23 0x0000000004eaeb04 in saved_frames_unwind (saved_frames=0xb93b1f0) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:378
#24 0x0000000004eaebb3 in saved_frames_destroy (frames=0xb93b1f0) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:396
#25 0x0000000004eaf16d in rpc_clnt_connection_cleanup (conn=0x56e9700) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:558
#26 0x0000000004eafc4e in rpc_clnt_notify (trans=0x56e9b30, mydata=0x56e9700, event=RPC_TRANSPORT_DISCONNECT, data=0x56e9b30) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:861
#27 0x0000000004eabec8 in rpc_transport_notify (this=0x56e9b30, event=RPC_TRANSPORT_DISCONNECT, data=0x56e9b30) at ../../../../rpc/rpc-lib/src/rpc-transport.c:489
#28 0x00000000085521d7 in socket_event_poll_err (this=0x56e9b30) at ../../../../../rpc/rpc-transport/socket/src/socket.c:685
#29 0x0000000008556890 in socket_event_handler (fd=10, idx=6, data=0x56e9b30, poll_in=1, poll_out=0, poll_err=16) at ../../../../../rpc/rpc-transport/socket/src/socket.c:1799
#30 0x0000000004c59d9c in event_dispatch_epoll_handler (event_pool=0x5307e60, events=0x666dc70, i=0) at ../../../libglusterfs/src/event.c:785
#31 0x0000000004c59fbf in event_dispatch_epoll (event_pool=0x5307e60) at ../../../libglusterfs/src/event.c:847
#32 0x0000000004c5a34a in event_dispatch (event_pool=0x5307e60) at ../../../libglusterfs/src/event.c:947
#33 0x0000000000408567 in main (argc=27, argv=0x7fefffc78) at ../../../glusterfsd/src/glusterfsd.c:1689

Version-Release number of selected component (if applicable):
glusterfs 3.3.0qa45 (per the loaded-symbols path above)

How reproducible:

Steps to Reproduce:
1. Create and start a replica 2 volume.
2. Add 2 bricks, making it a 2x2 distributed-replicate volume.
3. Start a rebalance.

Actual results:
The rebalance process crashed with an assertion failure in __gf_free.

Expected results:
The rebalance completes without crashing.

Additional info:

gluster volume info:

Volume Name: mirror
Type: Distributed-Replicate
Volume ID: b99f6e32-ff75-4d87-9ed0-267cd8d6c66c
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.16.156.9:/export/mirror
Brick2: 10.16.156.12:/export/mirror
Brick3: 10.16.156.15:/export/mirror
Brick4: 10.16.156.18:/export/mirror
Options Reconfigured:
performance.stat-prefetch: off
network.ping-timeout: 222
features.lock-heal: on
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
features.limit-usage: /:1TB
features.quota: on
geo-replication.indexing: on

valgrind log:

==28539== Thread 3:
==28539== Invalid read of size 4
==28539==    at 0x986E2DA: gf_defrag_start_crawl (dht-rebalance.c:1471)
==28539==  Address 0x569f430 is 80 bytes inside a block of size 144 free'd
==28539==    at 0x4A0595D: free (vg_replace_malloc.c:366)
==28539==
==28539== Invalid read of size 4
==28539==    at 0x986E2E6: gf_defrag_start_crawl (dht-rebalance.c:1472)
==28539==  Address 0x569f430 is 80 bytes inside a block of size 144 free'd
==28539==    at 0x4A0595D: free (vg_replace_malloc.c:366)
==28539==
==28539== Invalid write of size 4
==28539==    at 0x986E2F2: gf_defrag_start_crawl (dht-rebalance.c:1473)
==28539==  Address 0x569f430 is 80 bytes inside a block of size 144 free'd
==28539==    at 0x4A0595D: free (vg_replace_malloc.c:366)
==28539==
==28539== Invalid read of size 4
==28539==    at 0x3CDC20C110: pthread_spin_lock (in /lib64/libpthread-2.12.so)
==28539==  Address 0x569f420 is 64 bytes inside a block of size 144 free'd
==28539==    at 0x4A0595D: free (vg_replace_malloc.c:366)
==28539==
==28539== Invalid read of size 4
==28539==    at 0x986E5AE: gf_defrag_status_get (dht-rebalance.c:1560)
==28539==  Address 0x569f430 is 80 bytes inside a block of size 144 free'd
==28539==    at 0x4A0595D: free (vg_replace_malloc.c:366)
==28539==
==28539== Invalid read of size 8
==28539==    at 0x986E5BD: gf_defrag_status_get (dht-rebalance.c:1563)
==28539==  Address 0x569f400 is 32 bytes inside a block of size 144 free'd
==28539==    at 0x4A0595D: free (vg_replace_malloc.c:366)
==28539==
==28539== Invalid read of size 8
==28539==    at 0x986E5C8: gf_defrag_status_get (dht-rebalance.c:1564)
==28539==  Address 0x569f408 is 40 bytes inside a block of size 144 free'd
==28539==    at 0x4A0595D: free (vg_replace_malloc.c:366)
==28539==
==28539== Invalid read of size 8
==28539==    at 0x986E5D4: gf_defrag_status_get (dht-rebalance.c:1565)
==28539==  Address 0x569f410 is 48 bytes inside a block of size 144 free'd
==28539==    at 0x4A0595D: free (vg_replace_malloc.c:366)
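Threads 2 and 3 are both synctasks that reached GF_FREE (defrag) at dht-rebalance.c:1491, and the valgrind log shows gf_defrag_status_get still reading the freed block: a double free combined with a use-after-free on the shared defrag struct. The sketch below is not the actual GlusterFS fix and all names and types in it are invented for illustration; it only shows the usual remedy for this class of race, namely unpublishing the shared pointer under a lock before freeing it, so that exactly one thread reclaims the memory and concurrent status queries see NULL instead of a dangling pointer:

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-ins for gf_defrag_info_t and its owner -- purely
 * illustrative, not GlusterFS code. */
typedef struct {
    uint64_t total_files;
} defrag_info_t;

typedef struct {
    pthread_mutex_t lock;
    defrag_info_t  *defrag;
} defrag_ctx_t;

/* Exactly one of any number of racing callers frees the struct: the
 * pointer is cleared under the lock before free(), so a second caller
 * finds NULL rather than a freed block. Returns 1 if this caller
 * actually freed it, 0 if it was already gone. */
int defrag_free_once(defrag_ctx_t *ctx)
{
    defrag_info_t *victim;

    pthread_mutex_lock(&ctx->lock);
    victim = ctx->defrag;
    ctx->defrag = NULL;          /* unpublish before freeing */
    pthread_mutex_unlock(&ctx->lock);

    if (victim == NULL)
        return 0;
    free(victim);
    return 1;
}

/* Status readers copy fields out under the same lock instead of
 * dereferencing memory another thread may free; returns -1 once
 * teardown has happened. */
int defrag_status_get(defrag_ctx_t *ctx, uint64_t *files_out)
{
    int ret = -1;

    pthread_mutex_lock(&ctx->lock);
    if (ctx->defrag != NULL) {
        *files_out = ctx->defrag->total_files;
        ret = 0;
    }
    pthread_mutex_unlock(&ctx->lock);
    return ret;
}
```

With this shape, whichever of the two synctasks runs cleanup first wins, the loser's free becomes a no-op, and a late status query fails gracefully instead of reading freed memory.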
The patch http://review.gluster.com/#change,3495 has been merged into the master branch.
*** Bug 823256 has been marked as a duplicate of this bug. ***
The fix is only in upstream for now.
*** Bug 856013 has been marked as a duplicate of this bug. ***