Bug 1421937 - [Replicate] "RPC call decoding failed" leading to IO hang & mount inaccessible
Summary: [Replicate] "RPC call decoding failed" leading to IO hang & mount inaccessible
Alias: None
Product: GlusterFS
Classification: Community
Component: rpc
Version: mainline
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Poornima G
QA Contact:
: 1340361 (view as bug list)
Depends On:
Blocks: 1341182 1341183 1341184 1409135 1422363 1422787 1422788
TreeView+ depends on / blocked
Reported: 2017-02-14 06:30 UTC by Poornima G
Modified: 2017-09-06 10:48 UTC (History)
10 users (show)

Fixed In Version: glusterfs-3.11.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1409135
: 1422363 (view as bug list)
Last Closed: 2017-05-30 18:43:01 UTC
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:

Attachments (Terms of Use)

Description Poornima G 2017-02-14 06:30:12 UTC
+++ This bug was initially created as a clone of Bug #1409135 +++

Description of problem:
RPC failed to decode the msg on nfs mount leading to  Mounted volume on NFS mount and started sequential write with o-direct flag

Version-Release number of selected component (if applicable):
Logs are placed at 

How reproducible:
Tried Once

Steps to Reproduce:
1. 4 Servers And 4 Clients , Mount 1:1 with gnfs
2. Daemonize FIO on 4 Client and start sequential write with 0 direct flag
3. After the inception of IO's the tool got hanged.
4. Multiple errors and warnings in nfs.log

2016-12-29 10:27:29.424871] W [xdr-rpc.c:55:xdr_to_rpc_call] 0-rpc: failed to decode call msg
[2016-12-29 10:27:29.425032] W [rpc-clnt.c:717:rpc_clnt_handle_cbk] 0-testvol-client-0: RPC call decoding failed
[2016-12-29 10:27:29.443275] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f9ccdb40682] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f9ccd90675e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f9ccd90686e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f9ccd907fd4] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x94)[0x7f9ccd908864] ))))) 0-testvol-client-0: forced unwinding frame type(GlusterFS 3.3) op(FINODELK(30)) called at 2016-12-29 10:26:55.289465 (xid=0xa8ddd)
[2016-12-29 10:27:29.443308] E [MSGID: 114031] [client-rpc-fops.c:1601:client3_3_finodelk_cbk] 0-testvol-client-0: remote operation failed [Transport endpoint is not connected]
[2016-12-29 10:27:29.443571] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f9ccdb40682] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f9ccd90675e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f9ccd90686e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f9ccd907fd4] (-->


Actual results:
IO tool got hung and multiple error and warning in the logs

Expected results:
NO IO hang should be observed

Additional info:

[root@gqas004 ~]# gluster v info
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: e2212c28-f04a-4f08-9f17-b0fb74434bbf
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Brick1: ip1:/bricks/testvol_brick0
Brick2: ip2:/bricks/testvol_brick1
Brick3: ip3:/bricks/testvol_brick2
Brick4: ip4:/bricks/testvol_brick3
Options Reconfigured:
cluster.use-compound-fops: off
network.remote-dio: off
performance.strict-o-direct: on
network.inode-lru-limit: 90000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
server.allow-insecure: on
performance.stat-prefetch: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: off

Comment 1 Worker Ant 2017-02-14 07:20:59 UTC
REVIEW: https://review.gluster.org/16613 (rpcsvc: Add rpchdr and proghdr to iobref before submitting to transport) posted (#1) for review on master by Poornima G (pgurusid@redhat.com)

Comment 2 Worker Ant 2017-02-15 06:19:23 UTC
REVIEW: https://review.gluster.org/16613 (rpcsvc: Add rpchdr and proghdr to iobref before submitting to transport) posted (#2) for review on master by Poornima G (pgurusid@redhat.com)

Comment 3 Worker Ant 2017-02-16 04:09:40 UTC
COMMIT: https://review.gluster.org/16613 committed in master by Raghavendra G (rgowdapp@redhat.com) 
commit 8607f22dcd1bc9b84e452ae90102fa9d345ad3db
Author: Poornima G <pgurusid@redhat.com>
Date:   Tue Feb 14 12:45:36 2017 +0530

    rpcsvc: Add rpchdr and proghdr to iobref before submitting to transport
    When fio is run on multiple clients (each client writes to its own files),
    and meanwhile the clients does a readdirp, thus the client which did
    a readdirp will now recieve the upcalls. In this scenario the client
    disconnects with rpc decode failed error.
    Upcall calls rpcsvc_request_submit to submit the request to socket:
    rpcsvc_request_submit currently:
    rpcsvc_request_submit () {
       iobuf = iobuf_new
       iov = iobuf->ptr
       fill iobuf to contain xdrised upcall content - proghdr
       rpcsvc_callback_submit (..iov..)
       if (iobuf)
           iobuf_unref (iobuf)
    rpcsvc_callback_submit (... iov...) {
       iobuf = iobuf_new
       iov1 = iobuf->ptr
       fill iobuf to contain xdrised rpc header - rpchdr
       msg.rpchdr = iov1
       msg.proghdr = iov
       rpc_transport_submit_request (msg)
       if (iobuf)
           iobuf_unref (iobuf)
    rpcsvc_callback_submit assumes that once rpc_transport_submit_request()
    returns the msg is written on to socket and thus the buffers(rpchdr, proghdr)
    can be freed, which is not the case. In especially high workload,
    rpc_transport_submit_request() may not be able to write to socket immediately
    and hence adds it to its own queue and returns as successful. Thus, we have
    use after free, for rpchdr and proghdr. Hence the clients gets garbage rpchdr
    and proghdr and thus fails to decode the rpc, resulting in disconnect.
    To prevent this, we need to add the rpchdr and proghdr to a iobref and send
    it in msg:
       iobref_add (iobref, iobufs)
       msg.iobref = iobref;
    The socket layer takes a ref on msg.iobref, if it cannot write to socket and
    is adding to the queue. Thus we do not have use after free.
    Thank You for discussing, debugging and fixing along:
    Prashanth Pai <ppai@redhat.com>
    Raghavendra G <rgowdapp@redhat.com>
    Rajesh Joseph <rjoseph@redhat.com>
    Kotresh HR <khiremat@redhat.com>
    Mohammed Rafi KC <rkavunga@redhat.com>
    Soumya Koduri <skoduri@redhat.com>
    Change-Id: Ifa6bf6f4879141f42b46830a37c1574b21b37275
    BUG: 1421937
    Signed-off-by: Poornima G <pgurusid@redhat.com>
    Reviewed-on: https://review.gluster.org/16613
    Reviewed-by: Prashanth Pai <ppai@redhat.com>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: soumya k <skoduri@redhat.com>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Raghavendra G <rgowdapp@redhat.com>

Comment 4 Shyamsundar 2017-05-30 18:43:01 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.11.0, please open a new bug report.

glusterfs-3.11.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-May/000073.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 5 Raghavendra G 2017-09-06 10:48:18 UTC
*** Bug 1340361 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.