Description of problem:
A distributed-replicate volume had 5 VMs on it with I/O running. After a couple of add-bricks and a rebalance, the client crashed.

Version-Release number of selected component (if applicable):
[root@rhs1-bb ~]# rpm -qa | grep gluster
glusterfs-server-3.4.0.3rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.3rhs-1.el6rhs.x86_64
glusterfs-devel-3.4.0.3rhs-1.el6rhs.x86_64
glusterfs-3.4.0.3rhs-1.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.3rhs-1.el6rhs.x86_64

How reproducible:

Steps to Reproduce:
1. I had a 6x2 distributed-replicate volume serving as virtual machine storage.
2. Added one pair of bricks and initiated a rebalance.
3. Once the rebalance was over, I hit bug 960046.
4. Restarted the VMs and added 4 more bricks (2 more pairs).
5. Started a rebalance again, which led to bug 960046 (VMs got paused). After some time the storage domain went down; further investigation revealed a client crash on the hypervisor.

volume info
============
Volume Name: vstore
Type: Distributed-Replicate
Volume ID: e8fe6a61-6345-41f0-9329-a802b051a026
Status: Started
Number of Bricks: 9 x 2 = 18
Transport-type: tcp
Bricks:
Brick1: 10.70.37.76:/brick1/vs1
Brick2: 10.70.37.133:/brick1/vs1
Brick3: 10.70.37.76:/brick2/vs2
Brick4: 10.70.37.133:/brick2/vs2
Brick5: 10.70.37.76:/brick3/vs3
Brick6: 10.70.37.133:/brick3/vs3
Brick7: 10.70.37.76:/brick4/vs4
Brick8: 10.70.37.133:/brick4/vs4
Brick9: 10.70.37.76:/brick5/vs5
Brick10: 10.70.37.133:/brick5/vs5
Brick11: 10.70.37.76:/brick6/vs6
Brick12: 10.70.37.133:/brick6/vs6
Brick13: 10.70.37.134:/brick1/vs1
Brick14: 10.70.37.59:/brick1/vs1
Brick15: 10.70.37.134:/brick2/vs7
Brick16: 10.70.37.59:/brick2/vs7
Brick17: 10.70.37.134:/brick3/vs8
Brick18: 10.70.37.59:/brick3/vs8
Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: on

bt
===
(gdb) bt
#0  0x000000362780c170 in pthread_spin_lock () from
/lib64/libpthread.so.0
#1  0x00007fa3304de4c3 in fd_ref (fd=0x2d54124) at fd.c:447
#2  0x00007fa32e8afd4e in fuse_resolve_fd (state=<value optimized out>) at fuse-resolve.c:461
#3  fuse_resolve (state=<value optimized out>) at fuse-resolve.c:622
#4  0x00007fa32e8b026e in fuse_resolve_all (state=<value optimized out>) at fuse-resolve.c:665
#5  0x00007fa32e8b02b8 in fuse_resolve_and_resume (state=0x7fa31c038cb0, fn=0x7fa32e8b87a0 <fuse_flush_resume>) at fuse-resolve.c:705
#6  0x00007fa32e8c66f8 in fuse_thread_proc (data=0x100fb70) at fuse-bridge.c:4562
#7  0x0000003627807851 in start_thread () from /lib64/libpthread.so.0
#8  0x00000036274e890d in clone () from /lib64/libc.so.6

Program terminated with signal 11, Segmentation fault.
#0  0x000000362780c170 in pthread_spin_lock () from /lib64/libpthread.so.0

Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6.x86_64 keyutils-libs-1.4-4.el6.x86_64 krb5-libs-1.10.3-10.el6_4.2.x86_64 libcom_err-1.41.12-14.el6.x86_64 libgcc-4.4.7-3.el6.x86_64 libselinux-2.0.94-5.3.el6.x86_64 openssl-1.0.0-27.el6_4.2.x86_64 zlib-1.2.3-29.el6.x86_64

(gdb) thread apply all bt full

Thread 7 (Thread 0x7fa32c682700 (LWP 15004)):
#0  0x000000362780ed2d in nanosleep () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007fa3304cfd4c in gf_timer_proc (ctx=0xff3010) at timer.c:173
        now = 1367905197921268
        now_tv = {tv_sec = 1367905197, tv_usec = 921268}
        event = 0x7fa3240008e0
        reg = 0x10292b0
        sleepts = {tv_sec = 1, tv_nsec = 0}
        __FUNCTION__ = "gf_timer_proc"
#2  0x0000003627807851 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#3  0x00000036274e890d in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 6 (Thread 0x7fa32dea6700 (LWP 15000)):
#0  0x000000362780b7bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007fa3304eab1f in syncenv_task (proc=0x101b4f0) at syncop.c:335
        env = 0x101b4f0
        task = 0x0
        sleep_till = {tv_sec = 1367905700, tv_nsec = 0}
        ret = <value optimized out>
#2  0x00007fa3304ee8d0 in syncenv_processor (thdata=0x101b4f0) at syncop.c:414
        env = 0x101b4f0
        proc = 0x101b4f0
        task = <value optimized out>
#3  0x0000003627807851 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x00000036274e890d in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 5 (Thread 0x7fa32e8a7700 (LWP 14999)):
#0  0x000000362780f2a5 in sigwait () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x000000000040523b in glusterfs_sigwaiter (arg=<value optimized out>) at glusterfsd.c:1572
        set = {__val = {18947, 0 <repeats 15 times>}}
        ret = <value optimized out>
        sig = 10
#2  0x0000003627807851 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#3  0x00000036274e890d in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 4 (Thread 0x7fa32335d700 (LWP 15008)):
#0  0x000000362780e54d in read () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007fa32e8b3ab3 in notify_kernel_loop (data=<value optimized out>) at fuse-bridge.c:3550
        this = <value optimized out>
        priv = 0x10109e0
        fouh = <value optimized out>
        rv = <value optimized out>
        inval_buf = '\000' <repeats 287 times>
        __FUNCTION__ = "notify_kernel_loop"
#2  0x0000003627807851 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#3  0x00000036274e890d in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 3 (Thread 0x7fa33006e700 (LWP 14998)):
#0  0x00000036274e8f03 in epoll_wait () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fa330500817 in event_dispatch_epoll (event_pool=0x100eea0) at event-epoll.c:428
        events = 0x102a050
        i = <value optimized out>
        ret = 0
        __FUNCTION__ = "event_dispatch_epoll"
#2  0x00000000004066c6 in main (argc=4, argv=0x7fff277ad7c8) at glusterfsd.c:1902
        ctx = 0xff3010
        ret = 0
        cmdlinestr = "/usr/sbin/glusterfs --volfile-id=vstore --volfile-server=10.70.37.76 /rhev/data-center/mnt/10.70.37.76:vstore", '\000' <repeats 3986 times>
        __FUNCTION__ = "main"

Thread 2 (Thread 0x7fa32d4a5700 (LWP 15001)):
#0  0x000000362780b7bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007fa3304eab1f in syncenv_task (proc=0x101b8b0) at syncop.c:335
        env = 0x101b4f0
        task = 0x0
        sleep_till = {tv_sec = 1367905700, tv_nsec = 0}
        ret = <value optimized out>
#2  0x00007fa3304ee8d0 in syncenv_processor (thdata=0x101b8b0) at syncop.c:414
        env = 0x101b4f0
        proc = 0x101b8b0
        task = <value optimized out>
#3  0x0000003627807851 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x00000036274e890d in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 1 (Thread 0x7fa323d5e700 (LWP 15007)):
#0  0x000000362780c170 in pthread_spin_lock () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007fa3304de4c3 in fd_ref (fd=0x2d54124) at fd.c:447
        refed_fd = 0x0
        __FUNCTION__ = "fd_ref"
#2  0x00007fa32e8afd4e in fuse_resolve_fd (state=<value optimized out>) at fuse-resolve.c:461
        activefd = 0x2d54124
        this = <value optimized out>
        ret = 0
        fd_migration_error = 0 '\000'
        resolve = 0x7fa31c038db0
        basefd = 0x1482d30
        active_subvol = 0x0
        basefd_ctx = 0x7fa31c0314a0
#3  fuse_resolve (state=<value optimized out>) at fuse-resolve.c:622
        resolve = 0x7fa31c038db0
#4  0x00007fa32e8b026e in fuse_resolve_all (state=<value optimized out>) at fuse-resolve.c:665
        __FUNCTION__ = "fuse_resolve_all"
#5  0x00007fa32e8b02b8 in fuse_resolve_and_resume (state=0x7fa31c038cb0, fn=0x7fa32e8b87a0 <fuse_flush_resume>) at fuse-resolve.c:705
No locals.
#6  0x00007fa32e8c66f8 in fuse_thread_proc (data=0x100fb70) at fuse-bridge.c:4562
        mount_point = 0x0
        this = 0x100fb70
        priv = 0x10109e0
        res = <value optimized out>
        iobuf = 0x100e6a0
        finh = 0x7fa31c03b970
        iov_in = {{iov_base = 0x7fa31c03b970, iov_len = 80}, {iov_base = 0x7fa32f96d000, iov_len = 131072}}
        msg = <value optimized out>
        fuse_ops = 0x7fa32ead2920
        pfd = {{fd = 5, events = 25, revents = 1}, {fd = 7, events = 25, revents = 1}}
        mount_finished = _gf_true
        __FUNCTION__ = "fuse_thread_proc"
#7  0x0000003627807851 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#8  0x00000036274e890d in clone () from /lib64/libc.so.6
No symbol table info available.

Attaching the sosreport.
Please provide access to the host where the crash was seen
Looks like a flush call has come in on a freed fd.

(gdb) bt
#0  0x000000362780c170 in pthread_spin_lock () from /lib64/libpthread.so.0
#1  0x00007fa3304de4c3 in fd_ref (fd=0x2d54124) at fd.c:447
#2  0x00007fa32e8afd4e in fuse_resolve_fd (state=<value optimized out>) at fuse-resolve.c:461
#3  fuse_resolve (state=<value optimized out>) at fuse-resolve.c:622
#4  0x00007fa32e8b026e in fuse_resolve_all (state=<value optimized out>) at fuse-resolve.c:665
#5  0x00007fa32e8b02b8 in fuse_resolve_and_resume (state=0x7fa31c038cb0, fn=0x7fa32e8b87a0 <fuse_flush_resume>) at fuse-resolve.c:705
#6  0x00007fa32e8c66f8 in fuse_thread_proc (data=0x100fb70) at fuse-bridge.c:4562
#7  0x0000003627807851 in start_thread () from /lib64/libpthread.so.0
#8  0x00000036274e890d in clone () from /lib64/libc.so.6

(gdb) f 1
#1  0x00007fa3304de4c3 in fd_ref (fd=0x2d54124) at fd.c:447
447             LOCK (&fd->inode->lock);
(gdb) p fd->inode
$12 = (struct _inode *) 0xaaaaaaaa
(gdb) p *fd
$13 = {pid = 0, flags = 0, refcount = 0, inode_list = {next = 0x2d54134, prev = 0x2d54134}, inode = 0xaaaaaaaa, lock = 1, _ctx = 0x7fa31c03a090, xl_count = 32, lk_ctx = 0x7fa31c012230, anonymous = _gf_true}   <===== refcount is zero

(gdb) f 5
#5  0x00007fa32e8b02b8 in fuse_resolve_and_resume (state=0x7fa31c038cb0, fn=0x7fa32e8b87a0 <fuse_flush_resume>) at fuse-resolve.c:705
705             fuse_resolve_all (state);
(gdb) p activefd
$15 = (fd_t *) 0x2d54124
(gdb) p *basefd
$17 = {pid = 8614, flags = 573442, refcount = 4, inode_list = {next = 0x7fa31b48f370, prev = 0x7fa31b48f370}, inode = 0x7fa31b48f338, lock = 0, _ctx = 0x7fa31c029d70, xl_count = 26, lk_ctx = 0x7fa31c00a5b0, anonymous = _gf_false}
(gdb) p *basefd->inode
$18 = {table = 0x1482930, gfid = "K\362\247w\304\030G>\226\222\343q\343\270:\t", lock = 1, nlookup = 51, fd_count = 4294967247, ref = 1, ia_type = IA_IFREG, fd_list = {next = 0x1482d40, prev = 0x1482d40}, dentry_list = {next = 0x7fa31b1ee3c8, prev = 0x7fa31b1ee3c8}, hash = {next = 0x7fa31b1270a0, prev = 0x7fa31b1270a0}, list = {next = 0x7fa31b48f91c, prev = 0x7fa31b48f7e4}, _ctx = 0x7fa31c036100}

(gdb) p *(xlator_t *)fd->_ctx[2].xl_key
$30 = {name = 0x14a3790 "vstore", type = 0x14a37b0 "debug/io-stats", next = 0x14dc3a0, prev = 0x0, parents = 0x0, children = 0x14a6570, options = 0x7fa32ecbc8a0, dlhandle = 0x1041c30, fops = 0x7fa32b17a5c0, cbks = 0x7fa32b17a860, dumpops = 0x0, volume_options = {next = 0x14a36c0, prev = 0x14a36c0}, fini = 0x7fa32af67f40 <fini>, init = 0x7fa32af683a0 <init>, reconfigure = 0x7fa32af68690 <reconfigure>, mem_acct_init = 0x7fa32af68620 <mem_acct_init>, notify = 0x7fa32af70da0 <notify>, loglevel = GF_LOG_NONE, latencies = {{min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0} <repeats 46 times>}, history = 0x0, ctx = 0xff3010, graph = 0x14a5550, itable = 0x2c1d030, init_succeeded = 1 '\001', private = 0x14dd6e0, mem_acct = {num_types = 0, rec = 0x0}, winds = 1, switched = 0 '\000', local_pool = 0x0, is_autoloaded = _gf_false}
(gdb) p *(xlator_t *)fd->_ctx[1].xl_key
$31 = {name = 0x149f340 "vstore-open-behind", type = 0x14a3770 "performance/open-behind", next = 0x14dba00, prev = 0x14dcd40, parents = 0x14a64d0, children = 0x14a3700, options = 0x7fa32ecbc814, dlhandle = 0x1040bc0, fops = 0x7fa32b3824c0, cbks = 0x7fa32b382760, dumpops = 0x7fa32b382780, volume_options = {next = 0x14a36e0, prev = 0x14a36e0}, fini = 0x7fa32b17e650 <fini>, init = 0x7fa32b17e690 <init>, reconfigure = 0x7fa32b17e7e0 <reconfigure>, mem_acct_init = 0x7fa32b17e860 <mem_acct_init>, notify = 0x7fa3304c1190 <default_notify>, loglevel = GF_LOG_NONE, latencies = {{min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0} <repeats 46 times>}, history = 0x0, ctx = 0xff3010, graph = 0x14a5550, itable = 0x0, init_succeeded = 1 '\001', private = 0x149f370, mem_acct = {num_types = 0, rec = 0x0}, winds = 0, switched = 0 '\000', local_pool = 0x0, is_autoloaded = _gf_false}
Looks like an open-behind xlator issue.
I looked at the fd_unrefs that are present in the different xlators in the mount graph and found one unmatched fd_unref in dht. I am able to recreate a crash at a different place (again in fd_ref, in the fuse_writev code path) using the following script:

#!/bin/bash

. $(dirname $0)/../include.rc
. $(dirname $0)/../volume.rc

#This test checks that an extra fd_unref does not happen in the rebalance
#migration completion check code path in dht

cleanup;

TEST glusterd
TEST pidof glusterd
TEST $CLI volume create $V0 $H0:$B0/${V0}0 $H0:$B0/${V0}1
TEST $CLI volume set $V0 performance.quick-read off
TEST $CLI volume set $V0 performance.io-cache off
TEST $CLI volume set $V0 performance.write-behind off
TEST $CLI volume set $V0 performance.stat-prefetch off
TEST $CLI volume set $V0 performance.read-ahead off
TEST $CLI volume start $V0
TEST glusterfs --volfile-id=/$V0 --volfile-server=$H0 $M0 --attribute-timeout=0 --entry-timeout=0
TEST touch $M0/1

#This rename creates a link file for 10 in the other volume.
TEST mv $M0/1 $M0/10

#Keep writing to the file, which will trigger the rebalance completion check
dd if=/dev/zero of=$M0/10 bs=1k &
bg_pid=$!

#Now rebalance force will migrate file '10'
TEST $CLI volume rebalance $V0 start force
EXPECT_WITHIN 60 "completed" rebalance_status_field $V0

#If the bug exists, the mount would have crashed by now
TEST ls $M0

kill -9 $bg_pid > /dev/null 2>&1
wait > /dev/null 2>&1
cleanup
Verified on 3.4.0.8rhs-1.el6rhs.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html