Description of problem:
A pure replicate volume with 2 replicas and 1 fuse client. Started running ping_pong on the client, then enabled quota, set a limit on /, and disabled write-behind. While the tests were running, enabled client-side io-threads. Then brought down a brick and started it again, and the brick crashed in __mq_add_new_contribution_node (since loc->parent was NULL, as the parent now need not be resolved because of the gfid-based backend changes; a bug has been filed for that crash). So disabled quota, brought down a brick and did a volume start force. The glusterfs server crashed with the following backtrace.

Core was generated by `/usr/local/sbin/glusterfsd -s localhost --volfile-id mirror.hyperspace.mnt-sda7'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f061fbdf1d3 in pl_lk (frame=0x7f0626682658, this=0x67b1d0, fd=0x0, cmd=7, flock=0x7f0626355320) at ../../../../../xlators/features/locks/src/posix.c:988
988             pl_inode = pl_inode_get (this, fd->inode);
(gdb) bt
#0  0x00007f061fbdf1d3 in pl_lk (frame=0x7f0626682658, this=0x67b1d0, fd=0x0, cmd=7, flock=0x7f0626355320) at ../../../../../xlators/features/locks/src/posix.c:988
#1  0x00007f061f9c4e5b in iot_lk_wrapper (frame=0x7f06266825ac, this=0x67c370, fd=0x0, cmd=7, flock=0x7f0626355320) at ../../../../../xlators/performance/io-threads/src/io-threads.c:1114
#2  0x00007f06280ba2cc in call_resume_wind (stub=0x7f06263552d8) at ../../../libglusterfs/src/call-stub.c:2350
#3  0x00007f06280c1278 in call_resume (stub=0x7f06263552d8) at ../../../libglusterfs/src/call-stub.c:3853
#4  0x00007f061f9be7c0 in iot_worker (data=0x686ed0) at ../../../../../xlators/performance/io-threads/src/io-threads.c:138
#5  0x00007f062782ed8c in start_thread (arg=0x7f0624146700) at pthread_create.c:304
#6  0x00007f062757a04d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7  0x0000000000000000 in ?? ()
(gdb) info thr
  6 Thread 29621  do_sigwait (set=<value optimized out>, sig=0x7f0626353eb8) at ../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:65
  5 Thread 29620  __lll_unlock_wake () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:373
  4 Thread 29622  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  3 Thread 29635  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  2 Thread 29624  0x00007f06278374bd in nanosleep () at ../sysdeps/unix/syscall-template.S:82
* 1 Thread 29634  0x00007f061fbdf1d3 in pl_lk (frame=0x7f0626682658, this=0x67b1d0, fd=0x0, cmd=7, flock=0x7f0626355320) at ../../../../../xlators/features/locks/src/posix.c:988
(gdb) f 0
#0  0x00007f061fbdf1d3 in pl_lk (frame=0x7f0626682658, this=0x67b1d0, fd=0x0, cmd=7, flock=0x7f0626355320) at ../../../../../xlators/features/locks/src/posix.c:988
988             pl_inode = pl_inode_get (this, fd->inode);
(gdb) p fd
$1 = (fd_t *) 0x0

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Running ping_pong on the fuse client, then bringing a brick down and back up, crashes the brick.
CHANGE: http://review.gluster.com/2684 (protocol/client: if the remote_fd is -1, then unwind instead of sending the call to server) merged in master by Anand Avati (avati)
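The guard described in the change title can be sketched roughly as below. This is only an illustration of the idea, not the code from the patch: the `demo_*` types and functions are hypothetical stand-ins, and returning EBADF is an assumption (the change title does not say which errno the real code unwinds with). The point is that when the client has no valid remote fd for the brick (presumably the case right after the brick comes back up), the call is failed locally instead of being sent to the server, which would otherwise end up in pl_lk with fd=0x0 as in the backtrace above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for the client's per-fd context.
 * remote_fd == -1 means the fd is not (re)opened on the brick. */
typedef struct {
    int64_t remote_fd;
} demo_fd_ctx_t;

/* Hypothetical reply handed back to the caller ("unwind"). */
typedef struct {
    int op_ret;          /* 0 on success, -1 on failure */
    int op_errno;        /* errno value when op_ret == -1 */
    int sent_to_server;  /* 1 if the request went over the wire */
} demo_reply_t;

/* Stub for the real RPC submit path. It must never see an invalid fd. */
static demo_reply_t demo_send_lk(int64_t remote_fd)
{
    assert(remote_fd >= 0);
    demo_reply_t reply = { 0, 0, 1 };
    return reply;
}

/* Lock request entry point: unwind early if the remote fd is invalid,
 * otherwise forward the call to the server. */
demo_reply_t demo_client_lk(const demo_fd_ctx_t *ctx)
{
    if (ctx == NULL || ctx->remote_fd == -1) {
        /* Assumed errno; fail locally, nothing is sent to the brick. */
        demo_reply_t reply = { -1, EBADF, 0 };
        return reply;
    }
    return demo_send_lk(ctx->remote_fd);
}
```

With a context whose remote_fd is -1, demo_client_lk returns an error without calling demo_send_lk; with a valid remote_fd the request is forwarded as before.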
Ran ping_pong, brought a brick down and then brought it up. No brick crashed. Checked with glusterfs-3.3.0qa40.