Description of problem:
The glusterfs client crashed on the rdma transport while running the ltp test suite.

Version-Release number of selected component (if applicable):
3.3.0qa41

How reproducible:
Consistently

Steps to Reproduce:
1. /opt/qa/tools/system_light/run.sh -l /tmp/san2.log -t ltp

Actual results:
end ltp tests: 01:59:58
total 18 tests were successful out of 20 tests
rm: cannot remove `ltp': Transport endpoint is not connected
1
Total 1 tests were successful
Switching over to the previous working directory
/opt/qa/tools/system_light/run.sh: line 96: cd: /mnt: Transport endpoint is not connected
Removing /mnt/run8129/
rmdir: failed to remove `/mnt/run8129/': Transport endpoint is not connected
rmdir failed:Directory not empty

Expected results:
The ltp test run completes without the glusterfs client crashing.

Additional info:
(gdb) bt
#0  pthread_spin_lock (lock=0x90) at ../nptl/sysdeps/i386/pthread_spin_lock.c:35
#1  0x00007fb55e133d5d in decrement_reopen_fd_count (this=0x7fb562b32d60, conf=0x0) at client-lk.c:593
#2  0x00007fb55e130142 in clnt_release_reopen_fd_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, myframe=0x7fb56150479c) at client-handshake.c:599
#3  0x00007fb5626b24e5 in rpc_clnt_handle_reply (clnt=0x2dd11d0, pollin=0x7fb54c01d0d0) at rpc-clnt.c:788
#4  0x00007fb5626b2ce0 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x2dd1200, event=<value optimized out>, data=<value optimized out>) at rpc-clnt.c:907
#5  0x00007fb5626adec8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:489
#6  0x00007fb55c1ceaaa in gf_rdma_pollin_notify (peer=0x2dd16d8, post=<value optimized out>) at rdma.c:3100
#7  0x00007fb55c1cee5c in gf_rdma_recv_reply (peer=0x2dd16d8, post=0x24697e0) at rdma.c:3187
#8  0x00007fb55c1cf60c in gf_rdma_process_recv (peer=0x2dd16d8, wc=<value optimized out>) at rdma.c:3277
#9  0x00007fb55c1cfa70 in gf_rdma_recv_completion_proc (data=0x18a2850) at rdma.c:3362
#10 0x0000003fe10077f1 in start_thread (arg=0x7fb553578700) at pthread_create.c:301
#11 0x0000003fe0ce570d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

(gdb) f 1
#1  0x00007fb55e133d5d in decrement_reopen_fd_count (this=0x7fb562b32d60, conf=0x0) at client-lk.c:593
593             LOCK (&conf->rec_lock);
(gdb) p *conf
Cannot access memory at address 0x0
(gdb) p conf
$1 = (clnt_conf_t *) 0x0

(gdb) up
#2  0x00007fb55e130142 in clnt_release_reopen_fd_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, myframe=0x7fb56150479c) at client-handshake.c:599
599             decrement_reopen_fd_count (this, conf);
(gdb) p *this
$2 = {name = 0x7fb5629192c9 "glusterfs", type = 0x7fb56291d797 "global",
      next = 0x0, prev = 0x0, parents = 0x0, children = 0x0, options = 0x0,
      dlhandle = 0x0, fops = 0x0, cbks = 0x0, dumpops = 0x0,
      volume_options = {next = 0x1851fc0, prev = 0x1851fc0}, fini = 0,
      init = 0, reconfigure = 0, mem_acct_init = 0, notify = 0,
      loglevel = GF_LOG_NONE, latencies = {{min = 0, max = 0, total = 0,
      std = 0, mean = 0, count = 0} <repeats 46 times>}, history = 0x0,
      ctx = 0x182a010, graph = 0x0, itable = 0x0, init_succeeded = 0 '\000',
      private = 0x0, mem_acct = {num_types = 0, rec = 0x0}, winds = 0,
      switched = 0 '\000', local_pool = 0x0}
(gdb) p this->private
$3 = (void *) 0x0
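The trace shows that the callback ran with 'this' pointing at the top-level "global" xlator (type = "global", private = 0x0) rather than the protocol/client translator, so conf = this->private was NULL and LOCK (&conf->rec_lock) dereferenced NULL. The self-contained sketch below only reproduces that failure mode with stand-in types mirroring the names in the trace; the NULL guard is illustrative and is not the fix that was merged (which makes the correct translator active instead):

/* Sketch of the crash: conf comes from this->private, and the
 * "global" xlator has no private data, so the spinlock is taken
 * through a NULL pointer. Compile with -lpthread. */
#include <stdio.h>
#include <pthread.h>

typedef struct {
        pthread_spinlock_t rec_lock;   /* lock taken at client-lk.c:593 */
        int                fd_count;
} clnt_conf_t;

typedef struct {
        const char *name;
        const char *type;
        void       *private;           /* clnt_conf_t for protocol/client */
} xlator_t;

static int
decrement_reopen_fd_count (xlator_t *this, clnt_conf_t *conf)
{
        if (conf == NULL) {
                /* 'this' was the "global" xlator: no client state here.
                 * Without this guard we crash exactly as in frame #0. */
                fprintf (stderr, "%s: conf is NULL, skipping\n", this->name);
                return -1;
        }
        pthread_spin_lock (&conf->rec_lock);
        conf->fd_count--;
        pthread_spin_unlock (&conf->rec_lock);
        return 0;
}

int
main (void)
{
        /* The crash scenario: the wrong translator is active. */
        xlator_t global = { "glusterfs", "global", NULL };
        decrement_reopen_fd_count (&global, global.private);

        /* The healthy scenario: protocol/client with real state. */
        clnt_conf_t conf = { .fd_count = 1 };
        pthread_spin_init (&conf.rec_lock, PTHREAD_PROCESS_PRIVATE);
        xlator_t client = { "vol-client-0", "protocol/client", &conf };
        decrement_reopen_fd_count (&client, client.private);
        return 0;
}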
Created attachment 586267 [details]
Properly setting the 'THIS' variable with the right translator.

Anush/Du, can you please test with the attached patch? It should fix the issue you saw in this bug.
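For context, THIS in glusterfs is a thread-local pointer to the currently active translator; a reply callback that runs on the rdma completion thread without THIS being switched inherits the "global" xlator seen in the backtrace. The following is only a sketch of the save/switch/restore idiom the attachment title describes, with stand-in types; it is not the attached patch:

/* Minimal model of the THIS save/switch/restore pattern. The real
 * THIS macro resolves to a per-thread location; __thread models that. */
#include <stdio.h>

typedef struct xlator {
        const char *name;
        void       *private;
} xlator_t;

static __thread xlator_t *this_xl;
#define THIS (this_xl)

typedef struct call_frame {
        xlator_t *this;    /* translator that issued the request */
} call_frame_t;

static void
reply_cbk (call_frame_t *frame)
{
        (void) frame;      /* state is reached via THIS here */
        /* With THIS switched correctly, private is non-NULL. */
        printf ("cbk runs under %s, private=%p\n",
                THIS->name, THIS->private);
}

static void
handle_reply (call_frame_t *frame)
{
        xlator_t *old_THIS = THIS;

        THIS = frame->this;     /* make the right translator active */
        reply_cbk (frame);
        THIS = old_THIS;        /* restore whatever was active before */
}

int
main (void)
{
        int dummy_conf = 0;
        xlator_t global = { "glusterfs/global", NULL };
        xlator_t client = { "vol-client-0", &dummy_conf };
        call_frame_t frame = { &client };

        THIS = &global;         /* what the completion thread starts with */
        handle_reply (&frame);
        return 0;
}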
Patches have been pushed (http://review.gluster.com/3421 and http://review.gluster.com/3420).
Verified with the patches above.
CHANGE: http://review.gluster.com/3447 (rpc-transport/rdma: logging enhancements) merged in release-3.3 by Vijay Bellur (vijay)
CHANGE: http://review.gluster.com/3463 (protocol/client: provide a buffer for storing reply of readlink.) merged in release-3.3 by Vijay Bellur (vijay)
*** Bug 849132 has been marked as a duplicate of this bug. ***
*** Bug 787258 has been marked as a duplicate of this bug. ***
*** Bug 858453 has been marked as a duplicate of this bug. ***
*** Bug 772880 has been marked as a duplicate of this bug. ***
*** Bug 858452 has been marked as a duplicate of this bug. ***