Bug 1450565 - glfsheal: crashed(segfault) with disperse volume in RDMA
Summary: glfsheal: crashed(segfault) with disperse volume in RDMA
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: rdma
Version: 3.11
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Ji-Hyeon Gim
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-05-13 07:43 UTC by Ji-Hyeon Gim
Modified: 2017-05-30 18:52 UTC
CC List: 2 users

Fixed In Version: glusterfs-3.11.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1449495
Environment:
OS : Linux Distro : CentOS 6.7 Kernel : 2.6.32-573.el6.x86_64 #1 SMP x86_64 GNU/Linux Network : IB(RDMA) OFED Driver Version : Mellanox OFED 3.4-1.0.0.0
Last Closed: 2017-05-30 18:52:18 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Ji-Hyeon Gim 2017-05-13 07:43:41 UTC
+++ This bug was initially created as a clone of Bug #1449495 +++

Description of problem:

In 3.10.1, glfsheal always crashes (segfaults) with a disperse volume in an RDMA environment.

Version-Release number of selected component (if applicable): v3.10.1

How reproducible:

Steps to Reproduce:

1. install Mellanox OFED packages(librdmacm, libibverbs, etc.)
2. gluster volume create <vol> disperse 3 server{1..4}:<vol> transport rdma
3. run glfsheal

Actual results:

[root@server-1 ~]# glfsheal IBTEST
Segmentation fault (core dumped)

Expected results:

[root@server-1 ~]# glfsheal IBTEST
Brick 10.10.1.220:/volume/IBTEST
<gfid:d338c46e-bff6-4da0-b962-590ef3a19102> - Is in split-brain

...

Additional info:

- coredump with gdb

Core was generated by `/usr/sbin/glfsheal IBTEST'.                                                                     
Program terminated with signal 11, Segmentation fault.                                                                                                                                               
#0  0x00007efc1467b56c in __gf_rdma_teardown (this=0x7efc08032740) at rdma.c:3255
3255            if (peer->cm_id->qp != NULL) {                                                                                                                                   
(gdb) bt                                                                                                                                                                                                   
#0  0x00007efc1467b56c in __gf_rdma_teardown (this=0x7efc08032740) at rdma.c:3255                                                                        
#1  0x00007efc1467b6b0 in gf_rdma_teardown (this=0x7efc08032740, port=<value optimized out>) at rdma.c:3287    
#2  gf_rdma_connect (this=0x7efc08032740, port=<value optimized out>) at rdma.c:4769                                           
#3  0x00007efc23df24e9 in rpc_clnt_reconnect (conn_ptr=0x7efc080325d0) at rpc-clnt.c:422                                                                   
#4  0x00007efc23df25d6 in rpc_clnt_start (rpc=0x7efc080325a0) at rpc-clnt.c:1210                                 
#5  0x00007efc16b55c53 in notify (this=0x7efc0801cec0, event=1, data=0x7efc0801e930) at client.c:2354                                                                                    
#6  0x00007efc2423f592 in xlator_notify (xl=0x7efc0801cec0, event=1, data=0x7efc0801e930) at xlator.c:566                                                                  
#7  0x00007efc242c07c7 in default_notify (this=0x7efc0801e930, event=1, data=0x7efc08020270) at defaults.c:3090
#8  0x00007efc155d5d18 in notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at snapview-client.c:2393
#9  0x00007efc2423f592 in xlator_notify (xl=0x7efc0801e930, event=1, data=0x7efc08020270) at xlator.c:566          
#10 0x00007efc242c07c7 in default_notify (this=0x7efc08020270, event=1, data=0x7efc08021ea0) at defaults.c:3090                                                                
#11 0x00007efc153ba69e in notify (this=0x7efc08020270, event=<value optimized out>, data=0x7efc08021ea0) at io-stats.c:3991                                                                  
#12 0x00007efc2423f592 in xlator_notify (xl=0x7efc08020270, event=1, data=0x7efc08021ea0) at xlator.c:566              
#13 0x00007efc242c07c7 in default_notify (this=0x7efc08021ea0, event=1, data=0x7efc08021ea0) at defaults.c:3090                
#14 0x00007efc2423f592 in xlator_notify (xl=0x7efc08021ea0, event=1, data=0x7efc08021ea0) at xlator.c:566                                                                            
#15 0x00007efc24279bae in glusterfs_graph_parent_up (graph=<value optimized out>) at graph.c:442        
#16 0x00007efc24279fb2 in glusterfs_graph_activate (graph=0x7efc08003990, ctx=0x7eb1a0) at graph.c:711                           
#17 0x00007efc23bc9f91 in glfs_process_volfp (fs=<value optimized out>, fp=0x7efc08003710) at glfs-mgmt.c:79                                                                     
#18 0x00007efc23bca3aa in glfs_mgmt_getspec_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, myframe=0x7efc080012d0) at glfs-mgmt.c:665
#19 0x00007efc23df2ad5 in rpc_clnt_handle_reply (clnt=0x876540, pollin=0x7efc080028c0) at rpc-clnt.c:793
#20 0x00007efc23df3c85 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x876570, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7efc080028c0) at rpc-clnt.c:986
#21 0x00007efc23deed68 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:538
#22 0x00007efc16fd19bd in socket_event_poll_in (this=0x876740) at socket.c:2268                                          
#23 0x00007efc16fd2cbe in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x876740, poll_in=1, poll_out=0, poll_err=0) at socket.c:2398                  
#24 0x00007efc242a0716 in event_dispatch_epoll_handler (data=0x7efc10000920) at event-epoll.c:572
#25 event_dispatch_epoll_worker (data=0x7efc10000920) at event-epoll.c:675                                                 
#26 0x00007efc2351daa1 in start_thread (arg=0x7efc17be0700) at pthread_create.c:301                                                                                                
#27 0x00007efc2326abcd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
#0  0x00007f5f9a28956c in __gf_rdma_teardown (this=0x7f5f94032780) at rdma.c:3255
(gdb) list
3250            gf_rdma_peer_t    *peer = NULL;
3251
3252            priv = this->private;
3253            peer = &priv->peer;
3254
3255            if (peer->cm_id->qp != NULL) {
3256                    __gf_rdma_destroy_qp (this);
3257            }
3258
3259            if (!list_empty (&priv->peer.ioq)) {
(gdb) print peer->cm_id
$4 = (struct rdma_cm_id *) 0x0

In my opinion, it is caused by incorrect error handling in __gf_rdma_teardown() (https://github.com/gluster/glusterfs/blob/master/rpc/rpc-transport/rdma/src/rdma.c#L3256).

If an error occurs with rdma_create_id() in gf_rdma_connect(), the process jumps to the 'unlock' label and then calls gf_rdma_teardown(), which calls __gf_rdma_teardown().

At present, __gf_rdma_teardown() checks the InfiniBand QP by dereferencing peer->cm_id->qp.

Unfortunately, in this situation cm_id was never allocated, so the process crashes.

I have attached a patch to resolve this issue.
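
Essentially, the fix is to guard the dereference. Below is a minimal sketch of the kind of NULL check the patch adds (based on the solution described in the commit message further down and the variable names in the gdb listing above; it is not the exact upstream diff):

        priv = this->private;
        peer = &priv->peer;

        /* If rdma_create_id() failed in gf_rdma_connect(), peer->cm_id is
         * still NULL by the time __gf_rdma_teardown() runs, so dereferencing
         * it to reach ->qp segfaults. Check cm_id itself first. */
        if (peer->cm_id != NULL && peer->cm_id->qp != NULL) {
                __gf_rdma_destroy_qp (this);
        }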

--- Additional comment from Worker Ant on 2017-05-11 05:02:51 EDT ---

REVIEW: https://review.gluster.org/17249 (rpc: fix a routine to destory RDMA qp(queue-pair)) posted (#1) for review on master by Ji-Hyeon Gim

--- Additional comment from Worker Ant on 2017-05-11 05:18:05 EDT ---

REVIEW: https://review.gluster.org/17250 (rpc: fix a routine to destory RDMA qp(queue-pair)) posted (#1) for review on release-3.10 by Ji-Hyeon Gim

--- Additional comment from Worker Ant on 2017-05-11 05:34:16 EDT ---

REVIEW: https://review.gluster.org/17246 (rpc: fix a routine to destory RDMA qp(queue-pair)) posted (#2) for review on release-3.10 by Ji-Hyeon Gim

--- Additional comment from Worker Ant on 2017-05-11 23:09:12 EDT ---

COMMIT: https://review.gluster.org/17249 committed in master by Shyamsundar Ranganathan (srangana) 
------
commit ccfa06767f1282d9a3783e37555515a63cc62e69
Author: Ji-Hyeon Gim <potatogim>
Date:   Thu May 11 18:05:21 2017 +0900

    rpc: fix a routine to destory RDMA qp(queue-pair)
    
    Problem: If an error has occured with rdma_create_id() in gf_rdma_connect(),
             process will jump to the 'unlock' label and then call gf_rdma_teardown()
             which call __gf_rdma_teardown().
             Presently, __gf_rdma_teardown() checks InifiniBand QP with peer->cm_id->qp!
             Unfortunately, cm_id is not allocated and will be crushed in this situation :)
    
    Solution: If 'this->private->peer->cm_id' member is null, do not check
              'this->private->peer->cm_id->qp'.
    
    Change-Id: Ie321b8cf175ef4f1bdd9733d73840f03ddff8c3b
    BUG: 1449495
    Signed-off-by: Ji-Hyeon Gim <potatogim>
    Reviewed-on: https://review.gluster.org/17249
    Reviewed-by: Amar Tumballi <amarts>
    Reviewed-by: Prashanth Pai <ppai>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Tested-by: Ji-Hyeon Gim
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Jeff Darcy <jeff.us>

--- Additional comment from Worker Ant on 2017-05-13 03:30:10 EDT ---

REVIEW: https://review.gluster.org/17281 (rpc: fix a routine to destory RDMA qp(queue-pair)) posted (#1) for review on release-3.10 by Ji-Hyeon Gim

--- Additional comment from Worker Ant on 2017-05-13 03:34:28 EDT ---

REVIEW: https://review.gluster.org/17282 (rpc: fix a routine to destory RDMA qp(queue-pair)) posted (#1) for review on release-3.11 by Ji-Hyeon Gim

Comment 1 Worker Ant 2017-05-13 07:47:50 UTC
REVIEW: https://review.gluster.org/17282 (rpc: fix a routine to destory RDMA qp(queue-pair)) posted (#2) for review on release-3.11 by Ji-Hyeon Gim

Comment 2 Worker Ant 2017-05-13 07:53:06 UTC
REVIEW: https://review.gluster.org/17282 (rpc: fix a routine to destory RDMA qp(queue-pair)) posted (#3) for review on release-3.11 by Ji-Hyeon Gim

Comment 3 Worker Ant 2017-05-16 00:30:38 UTC
COMMIT: https://review.gluster.org/17282 committed in release-3.11 by Shyamsundar Ranganathan (srangana) 
------
commit 6c809f40a1a0b500aa09ccaa597dc7d95b1e5146
Author: Ji-Hyeon Gim <potatogim>
Date:   Fri May 5 16:51:19 2017 +0900

    rpc: fix a routine to destory RDMA qp(queue-pair)
    
        This is backport of https://review.gluster.org/#/c/17249/
    
    Problem: If an error has occured with rdma_create_id() in gf_rdma_connect(),
             process will jump to the 'unlock' label and then call gf_rdma_teardown()
             which call __gf_rdma_teardown().
             Presently, __gf_rdma_teardown() checks InifiniBand QP with peer->cm_id->qp!
             Unfortunately, cm_id is not allocated and will be crushed in this situation :)
    
    Solution: If 'this->private->peer->cm_id' member is null, do not check
              'this->private->peer->cm_id->qp'.
    
    > Change-Id: Ie321b8cf175ef4f1bdd9733d73840f03ddff8c3b
    > BUG: 1449495
    > Signed-off-by: Ji-Hyeon Gim <potatogim>
    > Reviewed-on: https://review.gluster.org/17249
    > Reviewed-by: Amar Tumballi <amarts>
    > Reviewed-by: Prashanth Pai <ppai>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > Tested-by: Ji-Hyeon Gim
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Smoke: Gluster Build System <jenkins.org>
    > Reviewed-by: Jeff Darcy <jeff.us>
    
    (cherry picked from commit ccfa06767f1282d9a3783e37555515a63cc62e69)
    
    Change-Id: Ie321b8cf175ef4f1bdd9733d73840f03ddff8c3b
    BUG: 1450565
    Signed-off-by: Ji-Hyeon Gim <potatogim>
    Reviewed-on: https://review.gluster.org/17282
    Smoke: Gluster Build System <jenkins.org>
    Tested-by: Ji-Hyeon Gim
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Amar Tumballi <amarts>

Comment 4 Mohammed Rafi KC 2017-05-23 12:19:45 UTC
Hi Ji-Hyeon Gim,

Thank you for taking the time to fix this issue; your contribution is really valuable to the project.

Comment 5 Shyamsundar 2017-05-30 18:52:18 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.11.0, please open a new bug report.

glusterfs-3.11.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-May/000073.html
[2] https://www.gluster.org/pipermail/gluster-users/

