+++ This bug was initially created as a clone of Bug #960368 +++ Description of problem: A distributed-replicate volume had 5 vms on it with i/o running. after couple of add-bricks and rebalance client crashed Version-Release number of selected component (if applicable): [root@rhs1-bb ~]# rpm -qa| grep gluster glusterfs-server-3.4.0.3rhs-1.el6rhs.x86_64 glusterfs-fuse-3.4.0.3rhs-1.el6rhs.x86_64 glusterfs-devel-3.4.0.3rhs-1.el6rhs.x86_64 glusterfs-3.4.0.3rhs-1.el6rhs.x86_64 glusterfs-debuginfo-3.4.0.3rhs-1.el6rhs.x86_64 How reproducible: Steps to Reproduce: 1. I had a 6x2 distributed-replicate volume serving as virtual machine storage 2. added one pair of brick and initiated rebalance 3. once rebalance is over i hit the bug 960046 4. restarted the vms and added 4 more bricks (2 more pairs) 5. again started rebalance which lead to 960046(vms got paused). after sometime storage domain was down, further inverstigation revealed client crash on the hypervisor. volume info ============ Volume Name: vstore Type: Distributed-Replicate Volume ID: e8fe6a61-6345-41f0-9329-a802b051a026 Status: Started Number of Bricks: 9 x 2 = 18 Transport-type: tcp Bricks: Brick1: 10.70.37.76:/brick1/vs1 Brick2: 10.70.37.133:/brick1/vs1 Brick3: 10.70.37.76:/brick2/vs2 Brick4: 10.70.37.133:/brick2/vs2 Brick5: 10.70.37.76:/brick3/vs3 Brick6: 10.70.37.133:/brick3/vs3 Brick7: 10.70.37.76:/brick4/vs4 Brick8: 10.70.37.133:/brick4/vs4 Brick9: 10.70.37.76:/brick5/vs5 Brick10: 10.70.37.133:/brick5/vs5 Brick11: 10.70.37.76:/brick6/vs6 Brick12: 10.70.37.133:/brick6/vs6 Brick13: 10.70.37.134:/brick1/vs1 Brick14: 10.70.37.59:/brick1/vs1 Brick15: 10.70.37.134:/brick2/vs7 Brick16: 10.70.37.59:/brick2/vs7 Brick17: 10.70.37.134:/brick3/vs8 Brick18: 10.70.37.59:/brick3/vs8 Options Reconfigured: storage.owner-gid: 36 storage.owner-uid: 36 performance.quick-read: off performance.read-ahead: off performance.io-cache: off performance.stat-prefetch: off cluster.eager-lock: enable network.remote-dio: on bt === (gdb) bt #0 0x000000362780c170 in pthread_spin_lock () from /lib64/libpthread.so.0 #1 0x00007fa3304de4c3 in fd_ref (fd=0x2d54124) at fd.c:447 #2 0x00007fa32e8afd4e in fuse_resolve_fd (state=<value optimized out>) at fuse-resolve.c:461 #3 fuse_resolve (state=<value optimized out>) at fuse-resolve.c:622 #4 0x00007fa32e8b026e in fuse_resolve_all (state=<value optimized out>) at fuse-resolve.c:665 #5 0x00007fa32e8b02b8 in fuse_resolve_and_resume (state=0x7fa31c038cb0, fn=0x7fa32e8b87a0 <fuse_flush_resume>) at fuse-resolve.c:705 #6 0x00007fa32e8c66f8 in fuse_thread_proc (data=0x100fb70) at fuse-bridge.c:4562 #7 0x0000003627807851 in start_thread () from /lib64/libpthread.so.0 #8 0x00000036274e890d in clone () from /lib64/libc.so.6 Program terminated with signal 11, Segmentation fault. #0 0x000000362780c170 in pthread_spin_lock () from /lib64/libpthread.so.0 Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6.x86_64 keyutils-libs-1.4-4.el6.x86_64 krb5-libs-1.10.3-10.el6_4.2.x86_64 libcom_err-1.41.12-14.el6.x86_64 libgcc-4.4.7-3.el6.x86_64 libselinux-2.0.94-5.3.el6.x86_64 openssl-1.0.0-27.el6_4.2.x86_64 zlib-1.2.3-29.el6.x86_64 (gdb) thread apply all bt full Thread 7 (Thread 0x7fa32c682700 (LWP 15004)): #0 0x000000362780ed2d in nanosleep () from /lib64/libpthread.so.0 No symbol table info available. #1 0x00007fa3304cfd4c in gf_timer_proc (ctx=0xff3010) at timer.c:173 now = 1367905197921268 now_tv = {tv_sec = 1367905197, tv_usec = 921268} event = 0x7fa3240008e0 reg = 0x10292b0 sleepts = {tv_sec = 1, tv_nsec = 0} __FUNCTION__ = "gf_timer_proc" #2 0x0000003627807851 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #3 0x00000036274e890d in clone () from /lib64/libc.so.6 No symbol table info available. Thread 6 (Thread 0x7fa32dea6700 (LWP 15000)): #0 0x000000362780b7bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 No symbol table info available. #1 0x00007fa3304eab1f in syncenv_task (proc=0x101b4f0) at syncop.c:335 env = 0x101b4f0 task = 0x0 sleep_till = {tv_sec = 1367905700, tv_nsec = 0} ret = <value optimized out> #2 0x00007fa3304ee8d0 in syncenv_processor (thdata=0x101b4f0) at syncop.c:414 env = 0x101b4f0 proc = 0x101b4f0 task = <value optimized out> #3 0x0000003627807851 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #4 0x00000036274e890d in clone () from /lib64/libc.so.6 No symbol table info available. Thread 5 (Thread 0x7fa32e8a7700 (LWP 14999)): #0 0x000000362780f2a5 in sigwait () from /lib64/libpthread.so.0 No symbol table info available. #1 0x000000000040523b in glusterfs_sigwaiter (arg=<value optimized out>) at glusterfsd.c:1572 set = {__val = {18947, 0 <repeats 15 times>}} ret = <value optimized out> sig = 10 ---Type <return> to continue, or q <return> to quit--- #2 0x0000003627807851 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #3 0x00000036274e890d in clone () from /lib64/libc.so.6 No symbol table info available. Thread 4 (Thread 0x7fa32335d700 (LWP 15008)): #0 0x000000362780e54d in read () from /lib64/libpthread.so.0 No symbol table info available. #1 0x00007fa32e8b3ab3 in notify_kernel_loop (data=<value optimized out>) at fuse-bridge.c:3550 this = <value optimized out> priv = 0x10109e0 fouh = <value optimized out> rv = <value optimized out> inval_buf = '\000' <repeats 287 times> __FUNCTION__ = "notify_kernel_loop" #2 0x0000003627807851 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #3 0x00000036274e890d in clone () from /lib64/libc.so.6 No symbol table info available. Thread 3 (Thread 0x7fa33006e700 (LWP 14998)): #0 0x00000036274e8f03 in epoll_wait () from /lib64/libc.so.6 No symbol table info available. #1 0x00007fa330500817 in event_dispatch_epoll (event_pool=0x100eea0) at event-epoll.c:428 events = 0x102a050 i = <value optimized out> ret = 0 __FUNCTION__ = "event_dispatch_epoll" #2 0x00000000004066c6 in main (argc=4, argv=0x7fff277ad7c8) at glusterfsd.c:1902 ctx = 0xff3010 ret = 0 cmdlinestr = "/usr/sbin/glusterfs --volfile-id=vstore --volfile-server=10.70.37.76 /rhev/data-center/mnt/10.70.37.76:vstore", '\000' <repeats 3986 times> __FUNCTION__ = "main" Thread 2 (Thread 0x7fa32d4a5700 (LWP 15001)): #0 0x000000362780b7bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 No symbol table info available. #1 0x00007fa3304eab1f in syncenv_task (proc=0x101b8b0) at syncop.c:335 env = 0x101b4f0 ---Type <return> to continue, or q <return> to quit--- task = 0x0 sleep_till = {tv_sec = 1367905700, tv_nsec = 0} ret = <value optimized out> #2 0x00007fa3304ee8d0 in syncenv_processor (thdata=0x101b8b0) at syncop.c:414 env = 0x101b4f0 proc = 0x101b8b0 task = <value optimized out> #3 0x0000003627807851 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #4 0x00000036274e890d in clone () from /lib64/libc.so.6 No symbol table info available. Thread 1 (Thread 0x7fa323d5e700 (LWP 15007)): #0 0x000000362780c170 in pthread_spin_lock () from /lib64/libpthread.so.0 No symbol table info available. #1 0x00007fa3304de4c3 in fd_ref (fd=0x2d54124) at fd.c:447 refed_fd = 0x0 __FUNCTION__ = "fd_ref" #2 0x00007fa32e8afd4e in fuse_resolve_fd (state=<value optimized out>) at fuse-resolve.c:461 activefd = 0x2d54124 this = <value optimized out> ret = 0 fd_migration_error = 0 '\000' resolve = 0x7fa31c038db0 basefd = 0x1482d30 active_subvol = 0x0 basefd_ctx = 0x7fa31c0314a0 #3 fuse_resolve (state=<value optimized out>) at fuse-resolve.c:622 resolve = 0x7fa31c038db0 #4 0x00007fa32e8b026e in fuse_resolve_all (state=<value optimized out>) at fuse-resolve.c:665 __FUNCTION__ = "fuse_resolve_all" #5 0x00007fa32e8b02b8 in fuse_resolve_and_resume (state=0x7fa31c038cb0, fn=0x7fa32e8b87a0 <fuse_flush_resume>) at fuse-resolve.c:705 No locals. #6 0x00007fa32e8c66f8 in fuse_thread_proc (data=0x100fb70) at fuse-bridge.c:4562 mount_point = 0x0 this = 0x100fb70 priv = 0x10109e0 res = <value optimized out> iobuf = 0x100e6a0 finh = 0x7fa31c03b970 ---Type <return> to continue, or q <return> to quit--- iov_in = {{iov_base = 0x7fa31c03b970, iov_len = 80}, {iov_base = 0x7fa32f96d000, iov_len = 131072}} msg = <value optimized out> fuse_ops = 0x7fa32ead2920 pfd = {{fd = 5, events = 25, revents = 1}, {fd = 7, events = 25, revents = 1}} mount_finished = _gf_true __FUNCTION__ = "fuse_thread_proc" #7 0x0000003627807851 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #8 0x00000036274e890d in clone () from /lib64/libc.so.6 No symbol table info available. attaching the sosreport --- Additional comment from shylesh on 2013-05-07 02:27:16 EDT --- sosreports@ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/960368/ --- Additional comment from RHEL Product and Program Management on 2013-05-07 02:44:31 EDT --- Since this issue was entered in bugzilla, the release flag has been set to ? to ensure that it is properly evaluated for this release. --- Additional comment from shishir gowda on 2013-05-07 06:40:25 EDT --- Please provide access to the host where the crash was seen --- Additional comment from shylesh on 2013-05-07 07:06:55 EDT --- (In reply to comment #3) > Please provide access to the host where the crash was seen hypervisor: rhs-gp-srv4.lab.eng.blr.redhat.com please let me know once you are done with access. --- Additional comment from shishir gowda on 2013-05-07 08:35:07 EDT --- Looks like a flush call has come in on a freed fd (gdb) bt #0 0x000000362780c170 in pthread_spin_lock () from /lib64/libpthread.so.0 #1 0x00007fa3304de4c3 in fd_ref (fd=0x2d54124) at fd.c:447 #2 0x00007fa32e8afd4e in fuse_resolve_fd (state=<value optimized out>) at fuse-resolve.c:461 #3 fuse_resolve (state=<value optimized out>) at fuse-resolve.c:622 #4 0x00007fa32e8b026e in fuse_resolve_all (state=<value optimized out>) at fuse-resolve.c:665 #5 0x00007fa32e8b02b8 in fuse_resolve_and_resume (state=0x7fa31c038cb0, fn=0x7fa32e8b87a0 <fuse_flush_resume>) at fuse-resolve.c:705 #6 0x00007fa32e8c66f8 in fuse_thread_proc (data=0x100fb70) at fuse-bridge.c:4562 #7 0x0000003627807851 in start_thread () from /lib64/libpthread.so.0 #8 0x00000036274e890d in clone () from /lib64/libc.so.6 (gdb) f 1 #1 0x00007fa3304de4c3 in fd_ref (fd=0x2d54124) at fd.c:447 447 LOCK (&fd->inode->lock); (gdb) p fd->inode $12 = (struct _inode *) 0xaaaaaaaa (gdb) p *fd $13 = {pid = 0, flags = 0, refcount = 0, inode_list = {next = 0x2d54134, prev = 0x2d54134}, inode = 0xaaaaaaaa, lock = 1, _ctx = 0x7fa31c03a090, xl_count = 32, lk_ctx = 0x7fa31c012230, anonymous = _gf_true} <=====ref count is zero (gdb) f 5 #5 0x00007fa32e8b02b8 in fuse_resolve_and_resume (state=0x7fa31c038cb0, fn=0x7fa32e8b87a0 <fuse_flush_resume>) at fuse-resolve.c:705 705 fuse_resolve_all (state); (gdb) p activefd $15 = (fd_t *) 0x2d54124 (gdb) p *basefd $17 = {pid = 8614, flags = 573442, refcount = 4, inode_list = {next = 0x7fa31b48f370, prev = 0x7fa31b48f370}, inode = 0x7fa31b48f338, lock = 0, _ctx = 0x7fa31c029d70, xl_count = 26, lk_ctx = 0x7fa31c00a5b0, anonymous = _gf_false} (gdb) p *basefd->inode $18 = {table = 0x1482930, gfid = "K\362\247w\304\030G>\226\222\343q\343\270:\t", lock = 1, nlookup = 51, fd_count = 4294967247, ref = 1, ia_type = IA_IFREG, fd_list = {next = 0x1482d40, prev = 0x1482d40}, dentry_list = {next = 0x7fa31b1ee3c8, prev = 0x7fa31b1ee3c8}, hash = {next = 0x7fa31b1270a0, prev = 0x7fa31b1270a0}, list = {next = 0x7fa31b48f91c, prev = 0x7fa31b48f7e4}, _ctx = 0x7fa31c036100} --- Additional comment from shishir gowda on 2013-05-07 08:56:10 EDT --- (gdb) p *(xlator_t *)fd->_ctx[2].xl_key $30 = {name = 0x14a3790 "vstore", type = 0x14a37b0 "debug/io-stats", next = 0x14dc3a0, prev = 0x0, parents = 0x0, children = 0x14a6570, options = 0x7fa32ecbc8a0, dlhandle = 0x1041c30, fops = 0x7fa32b17a5c0, cbks = 0x7fa32b17a860, dumpops = 0x0, volume_options = {next = 0x14a36c0, prev = 0x14a36c0}, fini = 0x7fa32af67f40 <fini>, init = 0x7fa32af683a0 <init>, reconfigure = 0x7fa32af68690 <reconfigure>, mem_acct_init = 0x7fa32af68620 <mem_acct_init>, notify = 0x7fa32af70da0 <notify>, loglevel = GF_LOG_NONE, latencies = {{min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0} <repeats 46 times>}, history = 0x0, ctx = 0xff3010, graph = 0x14a5550, itable = 0x2c1d030, init_succeeded = 1 '\001', private = 0x14dd6e0, mem_acct = {num_types = 0, rec = 0x0}, winds = 1, switched = 0 '\000', local_pool = 0x0, is_autoloaded = _gf_false} (gdb) p *(xlator_t *)fd->_ctx[1].xl_key $31 = {name = 0x149f340 "vstore-open-behind", type = 0x14a3770 "performance/open-behind", next = 0x14dba00, prev = 0x14dcd40, parents = 0x14a64d0, children = 0x14a3700, options = 0x7fa32ecbc814, dlhandle = 0x1040bc0, fops = 0x7fa32b3824c0, cbks = 0x7fa32b382760, dumpops = 0x7fa32b382780, volume_options = {next = 0x14a36e0, prev = 0x14a36e0}, fini = 0x7fa32b17e650 <fini>, init = 0x7fa32b17e690 <init>, reconfigure = 0x7fa32b17e7e0 <reconfigure>, mem_acct_init = 0x7fa32b17e860 <mem_acct_init>, notify = 0x7fa3304c1190 <default_notify>, loglevel = GF_LOG_NONE, latencies = {{ min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0} <repeats 46 times>}, history = 0x0, ctx = 0xff3010, graph = 0x14a5550, itable = 0x0, init_succeeded = 1 '\001', private = 0x149f370, mem_acct = { num_types = 0, rec = 0x0}, winds = 0, switched = 0 '\000', local_pool = 0x0, is_autoloaded = _gf_false} --- Additional comment from shishir gowda on 2013-05-07 09:28:14 EDT --- Looks like a open-behind xlator issue --- Additional comment from RHEL Product and Program Management on 2013-05-08 03:01:49 EDT --- This bug report previously had all acks and release flag approved. However since at least one of its acks has been changed, the release flag has been reset to ? by the bugbot (pm-rhel). The ack needs to become approved before the release flag can become approved again.
If the following test script is executed we can see that the fuse mount crashes. #!/bin/bash . $(dirname $0)/../include.rc . $(dirname $0)/../volume.rc #This test tests that an extra fd_unref does not happen in rebalance #migration completion check code path in dht cleanup; TEST glusterd TEST pidof glusterd TEST $CLI volume create $V0 $H0:$B0/${V0}0 $H0:$B0/${V0}1 TEST $CLI volume set $V0 performance.quick-read off TEST $CLI volume set $V0 performance.io-cache off TEST $CLI volume set $V0 performance.write-behind off TEST $CLI volume set $V0 performance.stat-prefetch off TEST $CLI volume set $V0 performance.read-ahead off TEST $CLI volume start $V0 TEST glusterfs --volfile-id=/$V0 --volfile-server=$H0 $M0 --attribute-timeout=0 --entry-timeout=0 TEST touch $M0/1 #This rename creates a link file for 10 in the other volume. TEST mv $M0/1 $M0/10 #Lets keep writing to the file which will trigger rebalance completion check dd if=/dev/zero of=$M0/10 bs=1k & bg_pid=$! #Now rebalance force will migrate file '10' TEST $CLI volume rebalance $V0 start force EXPECT_WITHIN 60 "completed" rebalance_status_field $V0 #If the bug exists mount would have crashed by now TEST ls $M0 kill -9 $bg_pid > /dev/null 2>&1 wait > /dev/null 2>&1 cleanup
REVIEW: http://review.gluster.org/4974 (cluster/dht: Don't do extra unref in dht-migration checks) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)
REVIEW: http://review.gluster.org/4974 (cluster/dht: Don't do extra unref in dht-migration checks) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)
COMMIT: http://review.gluster.org/4974 committed in master by Vijay Bellur (vbellur) ------ commit 2acc8c1f04d0376bedf36f1f746e542422344fa6 Author: Pranith Kumar K <pkarampu> Date: Fri May 10 10:09:13 2013 +0530 cluster/dht: Don't do extra unref in dht-migration checks Problem: syncop_open used to perform a ref in syncop_open_cbk so the extra unref was needed but now syncop_open_cbk does not take a ref so no need to do extra unref. Fix: remove the extra fd_unref and let dht_local_wipe do the final unref. Change-Id: Ibe8f9a678d456a0c7bff175306068b5cd297ecc4 BUG: 961615 Signed-off-by: Pranith Kumar K <pkarampu> Reviewed-on: http://review.gluster.org/4974 Reviewed-by: Raghavendra Bhat <raghavendra> Reviewed-by: Krishnan Parthasarathi <kparthas> Reviewed-by: Amar Tumballi <amarts> Tested-by: Gluster Build System <jenkins.com>
REVIEW: http://review.gluster.org/8029 (cluster/dht: Don't do extra unref in dht-migration checks) posted (#1) for review on release-3.4 by Vijay Bellur (vbellur)
COMMIT: http://review.gluster.org/8029 committed in release-3.4 by Kaleb KEITHLEY (kkeithle) ------ commit 2b789331dc933b186360fc8cbffb06289ee60ee9 Author: Vijay Bellur <vbellur> Date: Tue Jun 10 22:21:28 2014 +0530 cluster/dht: Don't do extra unref in dht-migration checks Problem: syncop_open used to perform a ref in syncop_open_cbk so the extra unref was needed but now syncop_open_cbk does not take a ref so no need to do extra unref. Fix: remove the extra fd_unref and let dht_local_wipe do the final unref. Change-Id: Ibe8f9a678d456a0c7bff175306068b5cd297ecc4 BUG: 961615 Signed-off-by: Pranith Kumar K <pkarampu> Signed-off-by: Vijay Bellur <vbellur> Reviewed-on: http://review.gluster.org/8029 Tested-by: Gluster Build System <jenkins.com> Tested-by: Joe Julian <joe> Reviewed-by: Kaleb KEITHLEY <kkeithle>
Should this be applied to release-3.5 branch as well? Only seeing mention of master and release-3.4 branches in this BZ.
(In reply to Justin Clift from comment #7) > Should this be applied to release-3.5 branch as well? > > Only seeing mention of master and release-3.4 branches in this BZ. Already fixed in 3.5.0.