Created attachment 1112126 (core file)

Description of problem:
=======================
Was trying to reproduce https://bugzilla.redhat.com/show_bug.cgi?id=1296048 disabling uss and quota, then did a detach tier and hit the crash.

Backtrace:
=========
(gdb) t a a bt

Thread 25 (Thread 0x7f7a0ffff700 (LWP 29525)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007f7a84a15f18 in syncenv_task (proc=proc@entry=0x7f7a8617c1d0) at syncop.c:607
#2 0x00007f7a84a16c50 in syncenv_processor (thdata=0x7f7a8617c1d0) at syncop.c:699
#3 0x00007f7a8383adc5 in start_thread (arg=0x7f7a0ffff700) at pthread_create.c:308
#4 0x00007f7a831811cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 24 (Thread 0x7f7a6e15d700 (LWP 26306)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007f7a76a0175a in gf_defrag_task (opaque=0x7f7a70027230) at dht-rebalance.c:2095
#2 0x00007f7a8383adc5 in start_thread (arg=0x7f7a6e15d700) at pthread_create.c:308
#3 0x00007f7a831811cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 23 (Thread 0x7f7a6d15b700 (LWP 26308)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007f7a76a0175a in gf_defrag_task (opaque=0x7f7a70027230) at dht-rebalance.c:2095
#2 0x00007f7a8383adc5 in start_thread (arg=0x7f7a6d15b700) at pthread_create.c:308
#3 0x00007f7a831811cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 22 (Thread 0x7f7a519f8700 (LWP 26461)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007f7a84a15f18 in syncenv_task (proc=proc@entry=0x7f7a8617a010) at syncop.c:607
#2 0x00007f7a84a16c50 in syncenv_processor (thdata=0x7f7a8617a010) at syncop.c:699
#3 0x00007f7a8383adc5 in start_thread (arg=0x7f7a519f8700) at pthread_create.c:308
#4 0x00007f7a831811cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 21 (Thread 0x7f7a2cff9700 (LWP 28208)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007f7a84a15f18 in syncenv_task (proc=proc@entry=0x7f7a8617be10) at syncop.c:607
#2 0x00007f7a84a16c50 in syncenv_processor (thdata=0x7f7a8617be10) at syncop.c:699
#3 0x00007f7a8383adc5 in start_thread (arg=0x7f7a2cff9700) at pthread_create.c:308
#4 0x00007f7a831811cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 20 (Thread 0x7f7a7afec700 (LWP 25758)):
#0 0x00007f7a83841e91 in do_sigwait (sig=0x7f7a7afebe1c, set=<optimized out>) at ../sysdeps/unix/sysv/linux/sigwait.c:61
#1 __sigwait (set=set@entry=0x7f7a7afebe20, sig=sig@entry=0x7f7a7afebe1c) at ../sysdeps/unix/sysv/linux/sigwait.c:99
#2 0x00007f7a84ea58bb in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2006
#3 0x00007f7a8383adc5 in start_thread (arg=0x7f7a7afec700) at pthread_create.c:308
#4 0x00007f7a831811cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 19 (Thread 0x7f7a52ffd700 (LWP 26312)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007f7a84a19c1b in syncop_setxattr (subvol=subvol@entry=0x7f7a7001e9b0, loc=loc@entry=0x7f7a52ffcd70, dict=dict@entry=0x7f7a81ef8994, flags=flags@entry=0, xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1577
#2 0x00007f7a76a01249 in gf_defrag_migrate_single_file (opaque=opaque@entry=0x7f7a28007ca0) at dht-rebalance.c:1963
#3 0x00007f7a76a01936 in gf_defrag_task (opaque=0x7f7a70027230) at dht-rebalance.c:2125
#4 0x00007f7a8383adc5 in start_thread (arg=0x7f7a52ffd700) at pthread_create.c:308
#5 0x00007f7a831811cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 18 (Thread 0x7f7a79fea700 (LWP 25760)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007f7a84a15f18 in syncenv_task (proc=proc@entry=0x7f7a86179c50) at syncop.c:607
#2 0x00007f7a84a16c50 in syncenv_processor (thdata=0x7f7a86179c50) at syncop.c:699
#3 0x00007f7a8383adc5 in start_thread (arg=0x7f7a79fea700) at pthread_create.c:308
#4 0x00007f7a831811cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 17 (Thread 0x7f7a537fe700 (LWP 26311)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007f7a84a19c1b in syncop_setxattr (subvol=subvol@entry=0x7f7a7001e9b0, loc=loc@entry=0x7f7a537fdd70, dict=dict@entry=0x7f7a81ef8994, flags=flags@entry=0, xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1577
#2 0x00007f7a76a01249 in gf_defrag_migrate_single_file (opaque=opaque@entry=0x7f7a28005380) at dht-rebalance.c:1963
#3 0x00007f7a76a01936 in gf_defrag_task (opaque=0x7f7a70027230) at dht-rebalance.c:2125
#4 0x00007f7a8383adc5 in start_thread (arg=0x7f7a537fe700) at pthread_create.c:308
#5 0x00007f7a831811cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 16 (Thread 0x7f7a53fff700 (LWP 26310)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007f7a84a19c1b in syncop_setxattr (subvol=subvol@entry=0x7f7a7001e9b0, loc=loc@entry=0x7f7a53ffed70, dict=dict@entry=0x7f7a81ef8994, flags=flags@entry=0, xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1577
#2 0x00007f7a76a01249 in gf_defrag_migrate_single_file (opaque=opaque@entry=0x7f7a600059d0) at dht-rebalance.c:1963
#3 0x00007f7a76a01936 in gf_defrag_task (opaque=0x7f7a70027230) at dht-rebalance.c:2125
#4 0x00007f7a8383adc5 in start_thread (arg=0x7f7a53fff700) at pthread_create.c:308
#5 0x00007f7a831811cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 15 (Thread 0x7f7a84e7c780 (LWP 25756)):
#0 0x00007f7a8383bef7 in pthread_join (threadid=140163968702208, thread_return=thread_return@entry=0x0) at pthread_join.c:92
#1 0x00007f7a84a33c28 in event_dispatch_epoll (event_pool=0x7f7a86168d10) at event-epoll.c:762
#2 0x00007f7a84ea27f7 in main (argc=33, argv=0x7ffc0dec5de8) at glusterfsd.c:2350

Thread 14 (Thread 0x7f7a6d95c700 (LWP 26307)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007f7a76a0175a in gf_defrag_task (opaque=0x7f7a70027230) at dht-rebalance.c:2095
#2 0x00007f7a8383adc5 in start_thread (arg=0x7f7a6d95c700) at pthread_create.c:308
#3 0x00007f7a831811cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 13 (Thread 0x7f7a2effd700 (LWP 26974)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007f7a84a15f18 in syncenv_task (proc=proc@entry=0x7f7a8617af10) at syncop.c:607
#2 0x00007f7a84a16c50 in syncenv_processor (thdata=0x7f7a8617af10) at syncop.c:699
#3 0x00007f7a8383adc5 in start_thread (arg=0x7f7a2effd700) at pthread_create.c:308
#4 0x00007f7a831811cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 12 (Thread 0x7f7a6c95a700 (LWP 26309)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007f7a76a0175a in gf_defrag_task (opaque=0x7f7a70027230) at dht-rebalance.c:2095
#2 0x00007f7a8383adc5 in start_thread (arg=0x7f7a6c95a700) at pthread_create.c:308
#3 0x00007f7a831811cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 11 (Thread 0x7f7a6ffff700 (LWP 26287)):
#0 0x00007f7a831817a3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f7a84a33720 in event_dispatch_epoll_worker (data=0x7f7a70038350) at event-epoll.c:668
#2 0x00007f7a8383adc5 in start_thread (arg=0x7f7a6ffff700) at pthread_create.c:308
#3 0x00007f7a831811cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 10 (Thread 0x7f7a2ffff700 (LWP 26941)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007f7a84a15f18 in syncenv_task (proc=proc@entry=0x7f7a8617a790) at syncop.c:607
#2 0x00007f7a84a16c50 in syncenv_processor (thdata=0x7f7a8617a790) at syncop.c:699
#3 0x00007f7a8383adc5 in start_thread (arg=0x7f7a2ffff700) at pthread_create.c:308
#4 0x00007f7a831811cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 9 (Thread 0x7f7a2f7fe700 (LWP 26973)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007f7a84a15f18 in syncenv_task (proc=proc@entry=0x7f7a8617ab50) at syncop.c:607
#2 0x00007f7a84a16c50 in syncenv_processor (thdata=0x7f7a8617ab50) at syncop.c:699
#3 0x00007f7a8383adc5 in start_thread (arg=0x7f7a2f7fe700) at pthread_create.c:308
#4 0x00007f7a831811cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 8 (Thread 0x7f7a2e7fc700 (LWP 26976)):
#0 swapcontext () at ../sysdeps/unix/sysv/linux/x86_64/swapcontext.S:79
#1 0x00007f7a84a13d80 in synctask_yield (task=0x7f7a643b6020) at syncop.c:343
#2 0x00007f7a830d2110 in ?? () from /lib64/libc.so.6
#3 0x0000000000000000 in ?? ()

Thread 7 (Thread 0x7f7a77909700 (LWP 25763)):
#0 0x00007f7a831817a3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f7a84a33720 in event_dispatch_epoll_worker (data=0x7f7a861b6ac0) at event-epoll.c:668
#2 0x00007f7a8383adc5 in start_thread (arg=0x7f7a77909700) at pthread_create.c:308
#3 0x00007f7a831811cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 6 (Thread 0x7f7a7b7ed700 (LWP 25757)):
#0 0x00007f7a8384196d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f7a849f1924 in gf_timer_proc (ctx=0x7f7a8614a010) at timer.c:205
#2 0x00007f7a8383adc5 in start_thread (arg=0x7f7a7b7ed700) at pthread_create.c:308
#3 0x00007f7a831811cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 5 (Thread 0x7f7a527fc700 (LWP 26313)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007f7a84a19c1b in syncop_setxattr (subvol=subvol@entry=0x7f7a7001e9b0, loc=loc@entry=0x7f7a527fbd70, dict=dict@entry=0x7f7a81ef8994, flags=flags@entry=0, xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1577
#2 0x00007f7a76a01249 in gf_defrag_migrate_single_file (opaque=opaque@entry=0x7f7a1400a830) at dht-rebalance.c:1963
#3 0x00007f7a76a01936 in gf_defrag_task (opaque=0x7f7a70027230) at dht-rebalance.c:2125
#4 0x00007f7a8383adc5 in start_thread (arg=0x7f7a527fc700) at pthread_create.c:308
#5 0x00007f7a831811cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 4 (Thread 0x7f7a7a7eb700 (LWP 25759)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007f7a84a15f18 in syncenv_task (proc=proc@entry=0x7f7a86179890) at syncop.c:607
#2 0x00007f7a84a16c50 in syncenv_processor (thdata=0x7f7a86179890) at syncop.c:699
#3 0x00007f7a8383adc5 in start_thread (arg=0x7f7a7a7eb700) at pthread_create.c:308
#4 0x00007f7a831811cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 3 (Thread 0x7f7a50df5700 (LWP 26465)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007f7a84a15f18 in syncenv_task (proc=proc@entry=0x7f7a8617a3d0) at syncop.c:607
#2 0x00007f7a84a16c50 in syncenv_processor (thdata=0x7f7a8617a3d0) at syncop.c:699
#3 0x00007f7a8383adc5 in start_thread (arg=0x7f7a50df5700) at pthread_create.c:308
#4 0x00007f7a831811cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 2 (Thread 0x7f7a2d7fa700 (LWP 27170)):
#0 list_empty (head=<optimized out>) at list.h:114
#1 syncenv_task (proc=proc@entry=0x7f7a8617ba50) at syncop.c:609
#2 0x00007f7a84a16c50 in syncenv_processor (thdata=0x7f7a8617ba50) at syncop.c:699
#3 0x00007f7a8383adc5 in start_thread (arg=0x7f7a2d7fa700) at pthread_create.c:308
#4 0x00007f7a831811cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 1 (Thread 0x7f7a2dffb700 (LWP 27152)):
#0 pthread_spin_lock () at ../nptl/sysdeps/x86_64/pthread_spin_lock.S:24
#1 0x00007f7a84a02e07 in fd_unref (fd=0x7f7a5c005b18) at fd.c:559
#2 0x00007f7a84a1d90e in syncop_close (fd=fd@entry=0x7f7a5c005b18) at syncop.c:2021
#3 0x00007f7a769fdd49 in dht_migrate_file (this=0x7f7a7001e9b0, loc=<optimized out>, from=0x7f7a7001da90, to=0x7f7a7001c5a0, flag=<optimized out>) at dht-rebalance.c:1644
#4 0x00007f7a84a13e02 in synctask_wrap (old_task=<optimized out>) at syncop.c:380
#5 0x00007f7a830d2110 in ?? () from /lib64/libc.so.6
#6 0x0000000000000000 in ?? ()
(gdb)

Version-Release number of selected component (if applicable):
=============================================================
3.7.5-14

How reproducible:
=================
Seen once.

Steps to Reproduce:
1. Create a 2x(4+2) ec volume and NFS-mount it on the client.
2. Disable uss and enable quota.
3. Start I/O (Linux untar, file and directory creation).
4. Attach a 2x2 hot tier.
5. Run I/O for some time and then detach the tier.

Actual results:
==============
Rebalance crash

Expected results:

Additional info:
================
Attaching the core file.
The rebalance log file output : pending frames: frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) [2016-01-06 10:14:14.816071] W [MSGID: 114031] [client-rpc-fops.c:2325:client3_3_setattr_cbk] 0-disperse_vol1-client-13: remote operation failed [No such file or directory] frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) patchset: git://git.gluster.com/glusterfs.git signal received: 11 time of crash: 2016-01-06 10:14:14 configuration details: argp 1 backtrace 1 dlfcn 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 3.7.5 /lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb2)[0x7f7a849d2002] /lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f7a849ee48d] /lib64/libc.so.6(+0x35670)[0x7f7a830c0670] /lib64/libpthread.so.0(pthread_spin_lock+0x0)[0x7f7a8383f210] ---------
Analysis:

From the core:

(gdb) bt
#0 pthread_spin_lock () at ../nptl/sysdeps/x86_64/pthread_spin_lock.S:24
#1 0x00007f7a84a02e07 in fd_unref (fd=0x7f7a5c005b18) at fd.c:559
#2 0x00007f7a84a1d90e in syncop_close (fd=fd@entry=0x7f7a5c005b18) at syncop.c:2021
#3 0x00007f7a769fdd49 in dht_migrate_file (this=0x7f7a7001e9b0, loc=<optimized out>, from=0x7f7a7001da90, to=0x7f7a7001c5a0, flag=<optimized out>) at dht-rebalance.c:1644
#4 0x00007f7a84a13e02 in synctask_wrap (old_task=<optimized out>) at syncop.c:380
#5 0x00007f7a830d2110 in ?? () from /lib64/libc.so.6
#6 0x0000000000000000 in ?? ()

loc is optimized out but tmp_loc is not.

(gdb) p tmp_loc
$5 = {path = 0x7f7a2000acb0 "/dirs/dir.2/testfile.919", name = 0x0, inode = 0x7f7a6e814e5c, parent = 0x0, gfid = "\315\373\232\366\212\341OÙ :&R\273\\\330>", pargfid = '\000' <repeats 15 times>}
(gdb)

From the rebalance log file:

[2016-01-06 10:14:14.801309] E [MSGID: 109023] [dht-rebalance.c:598:__dht_rebalance_create_dst_file] 0-disperse_vol1-tier-dht: /dirs/dir.2/testfile.919: file does not existson disperse_vol1-cold-dht (No such file or directory)

Examining the code, the dst_fd is unrefed twice. Once in __dht_rebalance_create_dst_file:

        if (dst_fd)
                *dst_fd = fd;
        ...
        if (-ret == ENOENT) {
                gf_msg (this->name, GF_LOG_ERROR, 0,
                        DHT_MSG_MIGRATE_FILE_FAILED,
                        "%s: file does not exists"
                        "on %s (%s)", loc->path, to->name,
                        strerror (-ret));
                ret = -1;
                fd_unref (fd);
                goto out;
        }

and once again in dht_migrate_file () -> syncop_close (dst_fd).

The core dump does not show an invalid inode, but I was able to reproduce the crash by setting ret to -ENOENT in gdb.
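For illustration only, the ownership problem described above can be modelled with a toy reference-counted object. This is a minimal sketch, not glusterfs code and not necessarily what the posted patch does: toy_fd_t, toy_fd_create, toy_fd_unref, toy_create_dst_file, toy_migrate_file and toy_live_fds are all hypothetical names invented for the example. It shows one possible way to avoid the second unref: hand the fd to the caller only once the callee can no longer fail, so the error path and the caller's cleanup never both drop the same reference.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a reference-counted fd; NOT the real glusterfs fd_t. */
typedef struct toy_fd {
        int refcount;
} toy_fd_t;

/* Number of toy fds that have been created and not yet freed. */
static int toy_live_fds = 0;

static toy_fd_t *
toy_fd_create (void)
{
        toy_fd_t *fd = calloc (1, sizeof (*fd));

        if (fd) {
                fd->refcount = 1; /* creator holds the only reference */
                toy_live_fds++;
        }
        return fd;
}

static void
toy_fd_unref (toy_fd_t *fd)
{
        /* An unref needs a live reference to drop. */
        assert (fd != NULL && fd->refcount > 0);
        if (--fd->refcount == 0) {
                toy_live_fds--;
                free (fd);
        }
}

/* Hypothetical analogue of the create-destination step.
 * lookup_ret simulates the result of the remote operation
 * (0 on success, -ENOENT when the file is missing). */
static int
toy_create_dst_file (int lookup_ret, toy_fd_t **dst_fd)
{
        toy_fd_t *fd = toy_fd_create ();

        if (!fd)
                return -1;

        if (-lookup_ret == ENOENT) {
                /* Error path: drop our reference. *dst_fd has not been
                 * set, so the caller has nothing left to unref. */
                toy_fd_unref (fd);
                return -1;
        }

        /* Success: only now transfer ownership to the caller. */
        *dst_fd = fd;
        return 0;
}

/* Hypothetical analogue of the migrate step and its cleanup. */
static int
toy_migrate_file (int lookup_ret)
{
        toy_fd_t *dst_fd = NULL;
        int       ret    = toy_create_dst_file (lookup_ret, &dst_fd);

        /* Cleanup runs on both paths; it unrefs only if the fd
         * was actually handed over. */
        if (dst_fd)
                toy_fd_unref (dst_fd);
        return ret;
}
```

Calling toy_migrate_file (-ENOENT) exercises the failing path; in this model, publishing the pointer before the error check (as in the snippet quoted from __dht_rebalance_create_dst_file) would make the cleanup in toy_migrate_file unref an already-freed object.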
Patch posted upstream.
Downstream patch : https://code.engineering.redhat.com/gerrit/#/c/65199/
Verified this on 3.7.5-17 and didn't see the crash. Marking this as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0193.html