Bug 822337

Summary: Crash in decrement_reopen_fd_count
Product: [Community] GlusterFS
Reporter: Anush Shetty <ashetty>
Component: rdma
Assignee: Raghavendra G <rgowdapp>
Status: CLOSED CURRENTRELEASE
QA Contact: Anush Shetty <ashetty>
Severity: high
Docs Contact:
Priority: high
Version: mainline
CC: amarts, brs, gluster-bugs, rfortier, shaines, vbellur, vbhat
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-07-24 13:52:47 EDT
Type: Bug
Regression: ---
Mount Type: fuse
Bug Depends On:    
Bug Blocks: 817967, 849131, 849132, 858452, 858453    
Attachments:
  Description: properly setting the 'THIS' variable with right translator.
  Flags: none

Description Anush Shetty 2012-05-17 02:12:45 EDT
Description of problem: The GlusterFS client crashed on the rdma transport while running the LTP test suite.


Version-Release number of selected component (if applicable): 3.3.0qa41


How reproducible: Consistently


Steps to Reproduce:
1. /opt/qa/tools/system_light/run.sh -l /tmp/san2.log -t ltp
  
Actual results:

end ltp tests: 01:59:58
total 18 tests were successful out of 20 tests
rm: cannot remove `ltp': Transport endpoint is not connected
1
Total 1 tests were successful
Switching over to the previous working directory
/opt/qa/tools/system_light/run.sh: line 96: cd: /mnt: Transport endpoint is not connected
Removing /mnt/run8129/
rmdir: failed to remove `/mnt/run8129/': Transport endpoint is not connected
rmdir failed:Directory not empty



Expected results:


Additional info:

(gdb) bt
#0  pthread_spin_lock (lock=0x90) at ../nptl/sysdeps/i386/pthread_spin_lock.c:35
#1  0x00007fb55e133d5d in decrement_reopen_fd_count (this=0x7fb562b32d60, conf=0x0) at client-lk.c:593
#2  0x00007fb55e130142 in clnt_release_reopen_fd_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>,
    myframe=0x7fb56150479c) at client-handshake.c:599
#3  0x00007fb5626b24e5 in rpc_clnt_handle_reply (clnt=0x2dd11d0, pollin=0x7fb54c01d0d0) at rpc-clnt.c:788
#4  0x00007fb5626b2ce0 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x2dd1200, event=<value optimized out>,
    data=<value optimized out>) at rpc-clnt.c:907
#5  0x00007fb5626adec8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>)
    at rpc-transport.c:489
#6  0x00007fb55c1ceaaa in gf_rdma_pollin_notify (peer=0x2dd16d8, post=<value optimized out>) at rdma.c:3100
#7  0x00007fb55c1cee5c in gf_rdma_recv_reply (peer=0x2dd16d8, post=0x24697e0) at rdma.c:3187
#8  0x00007fb55c1cf60c in gf_rdma_process_recv (peer=0x2dd16d8, wc=<value optimized out>) at rdma.c:3277
#9  0x00007fb55c1cfa70 in gf_rdma_recv_completion_proc (data=0x18a2850) at rdma.c:3362
#10 0x0000003fe10077f1 in start_thread (arg=0x7fb553578700) at pthread_create.c:301
#11 0x0000003fe0ce570d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) f 1
#1  0x00007fb55e133d5d in decrement_reopen_fd_count (this=0x7fb562b32d60, conf=0x0) at client-lk.c:593
593             LOCK (&conf->rec_lock);
(gdb) p *conf
Cannot access memory at address 0x0
(gdb) p conf
$1 = (clnt_conf_t *) 0x0

(gdb) up
#2  0x00007fb55e130142 in clnt_release_reopen_fd_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>,
    myframe=0x7fb56150479c) at client-handshake.c:599
599             decrement_reopen_fd_count (this, conf);
(gdb) p *this
$2 = {name = 0x7fb5629192c9 "glusterfs", type = 0x7fb56291d797 "global", next = 0x0, prev = 0x0, parents = 0x0, children = 0x0,
  options = 0x0, dlhandle = 0x0, fops = 0x0, cbks = 0x0, dumpops = 0x0, volume_options = {next = 0x1851fc0, prev = 0x1851fc0}, fini = 0,
  init = 0, reconfigure = 0, mem_acct_init = 0, notify = 0, loglevel = GF_LOG_NONE, latencies = {{min = 0, max = 0, total = 0, std = 0,
      mean = 0, count = 0} <repeats 46 times>}, history = 0x0, ctx = 0x182a010, graph = 0x0, itable = 0x0, init_succeeded = 0 '\000',
  private = 0x0, mem_acct = {num_types = 0, rec = 0x0}, winds = 0, switched = 0 '\000', local_pool = 0x0}

(gdb) p this->private
$3 = (void *) 0x0
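
The faulting address in frame #0 (lock=0x90) is consistent with `&conf->rec_lock` being computed from the NULL `conf` seen in frame #1: with a base of 0, the address is just the member's offset. Below is a minimal sketch of that arithmetic, plus a defensive check. `fake_conf_t`, `member_addr` and `decrement_reopen_fd_count_guarded` are hypothetical names invented for illustration; the padding is picked only to reproduce the 0x90 offset and is not the real `clnt_conf_t` layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for clnt_conf_t. The padding is chosen only so
 * that rec_lock lands at offset 0x90, matching the backtrace; the real
 * structure has different members. */
typedef struct {
    char other_fields[0x90];
    int  rec_lock;            /* stand-in for the real lock */
    int  reopen_fd_count;
} fake_conf_t;

/* Address that a member access touches: base pointer plus member offset. */
static uintptr_t member_addr(uintptr_t base, size_t offset)
{
    return base + offset;
}

/* Hypothetical defensive variant of the decrement helper: refuse to
 * touch the lock when conf is NULL instead of dereferencing it.
 * Returns 0 on success, -1 when conf is missing. */
static int decrement_reopen_fd_count_guarded(fake_conf_t *conf)
{
    if (conf == NULL)
        return -1;
    conf->reopen_fd_count--;  /* real code would do this under the lock */
    return 0;
}
```

A guard like this would only turn the crash into an error return; it would not address why `this->private` is NULL in the first place.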
Comment 1 Amar Tumballi 2012-05-23 03:56:36 EDT
Created attachment 586267 [details]
properly setting the 'THIS' variable with right translator.

Anush/Du,

Can you please test with the attached patch? It should probably fix the issue you saw in this bug.
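
As a rough illustration of what "setting the 'THIS' variable with right translator" means, here is a minimal sketch. It is not the attached patch. It assumes `THIS` behaves like a per-thread "current translator" pointer that defaults to the global "glusterfs" xlator, whose `private` is NULL as in the gdb output. `xlator_t` is reduced to two fields, and `this_xl`, `cbk_get_conf` and `call_with_this` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for xlator_t. */
typedef struct xlator {
    const char *name;
    void       *private_data;   /* the real field is named `private` */
} xlator_t;

/* Default translator, like the "glusterfs"/"global" one in the gdb dump. */
static xlator_t global_xl = { "glusterfs", NULL };

/* Per-thread current translator; stand-in for the THIS macro (assumption). */
static _Thread_local xlator_t *this_xl = &global_xl;

/* A callback that trusts the thread's current translator. On a thread
 * that never switched away from the global xlator this yields NULL. */
static void *cbk_get_conf(void)
{
    return this_xl->private_data;
}

/* Run cbk with the current translator set to `owner`, then restore the
 * previous value so the caller's context is left unchanged. */
static void *call_with_this(xlator_t *owner, void *(*cbk)(void))
{
    xlator_t *saved = this_xl;
    void *ret;

    assert(owner != NULL);
    this_xl = owner;
    ret = cbk();
    this_xl = saved;
    return ret;
}
```

With a client translator whose `private_data` points at its configuration, calling `cbk_get_conf()` directly on a fresh thread returns NULL, while `call_with_this(&client, cbk_get_conf)` returns the configuration pointer and leaves the thread's current translator as it was.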
Comment 2 Amar Tumballi 2012-05-24 01:39:08 EDT
Patches have been pushed (http://review.gluster.com/3421 and 3420)
Comment 3 Anush Shetty 2012-05-29 05:50:59 EDT
Verified with the patches above.
Comment 4 Vijay Bellur 2012-07-03 08:11:18 EDT
CHANGE: http://review.gluster.com/3447 (rpc-transport/rdma: logging enhancements) merged in release-3.3 by Vijay Bellur (vijay@gluster.com)
Comment 5 Vijay Bellur 2012-07-03 15:36:43 EDT
CHANGE: http://review.gluster.com/3463 (protocol/client: provide a buffer for storing reply of readlink.) merged in release-3.3 by Vijay Bellur (vijay@gluster.com)
Comment 6 Raghavendra G 2012-12-26 05:09:00 EST
*** Bug 849132 has been marked as a duplicate of this bug. ***
Comment 7 Raghavendra G 2012-12-26 05:09:08 EST
*** Bug 787258 has been marked as a duplicate of this bug. ***
Comment 8 Raghavendra G 2012-12-26 05:10:28 EST
*** Bug 858453 has been marked as a duplicate of this bug. ***
Comment 9 Raghavendra G 2013-01-01 23:59:28 EST
*** Bug 772880 has been marked as a duplicate of this bug. ***
Comment 10 Raghavendra G 2013-01-02 00:01:13 EST
*** Bug 858452 has been marked as a duplicate of this bug. ***