Bug 1422363
Summary: [Replicate] "RPC call decoding failed" leading to IO hang & mount inaccessible

| Field | Value | Field | Value |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Poornima G <pgurusid> |
| Component: | rpc | Assignee: | bugs <bugs> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.10 | CC: | amukherj, bugs, ksandha, nchilaka, rcyriac, rhinduja, rhs-bugs, rjoseph, skoduri, srangana |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.10.0 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1421937 | | |
| | 1422787 | Environment: | |
| Last Closed: | 2017-02-27 15:29:15 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1421937 | | |
| Bug Blocks: | 1409135, 1416031, 1422787, 1422788 | | |
Description
Poornima G
2017-02-15 06:31:07 UTC
REVIEW: https://review.gluster.org/16623 (rpcsvc: Add rpchdr and proghdr to iobref before submitting to transport) posted (#1) for review on release-3.10 by Poornima G (pgurusid)

REVIEW: https://review.gluster.org/16623 (rpcsvc: Add rpchdr and proghdr to iobref before submitting to transport) posted (#2) for review on release-3.10 by Poornima G (pgurusid)

COMMIT: https://review.gluster.org/16623 committed in release-3.10 by Shyamsundar Ranganathan (srangana)

------
commit 69ab6b963585f3080771221c3a0cc4549e6eebb1
Author: Poornima G <pgurusid>
Date: Tue Feb 14 12:45:36 2017 +0530

rpcsvc: Add rpchdr and proghdr to iobref before submitting to transport

Backport of https://review.gluster.org/16613

Issue:
When fio is run on multiple clients (each client writing to its own files) and meanwhile a client does a readdirp, the client that did the readdirp starts receiving upcalls. In this scenario the client disconnects with an "rpc decode failed" error.

RCA:
Upcall calls rpcsvc_request_submit to submit the request to the socket. rpcsvc_request_submit currently does the following:

    rpcsvc_request_submit ()
    {
            iobuf = iobuf_new
            iov = iobuf->ptr
            fill iobuf to contain xdrised upcall content - proghdr
            rpcsvc_callback_submit (..iov..)
            ...
            if (iobuf)
                    iobuf_unref (iobuf)
    }

    rpcsvc_callback_submit (... iov...)
    {
            ...
            iobuf = iobuf_new
            iov1 = iobuf->ptr
            fill iobuf to contain xdrised rpc header - rpchdr
            msg.rpchdr = iov1
            msg.proghdr = iov
            ...
            rpc_transport_submit_request (msg)
            ...
            if (iobuf)
                    iobuf_unref (iobuf)
    }

rpcsvc_callback_submit assumes that once rpc_transport_submit_request() returns, the msg has been written to the socket and the buffers (rpchdr, proghdr) can therefore be freed, which is not the case. Especially under high workload, rpc_transport_submit_request() may not be able to write to the socket immediately; it then adds the msg to its own queue and returns success. Thus we have a use-after-free of rpchdr and proghdr.
Because of this use-after-free, the client receives a garbage rpchdr and proghdr, fails to decode the RPC, and disconnects.

To prevent this, we need to add the rpchdr and proghdr to an iobref and send it in the msg:

    iobref_add (iobref, iobufs)
    msg.iobref = iobref;

If the socket layer cannot write to the socket and adds the msg to its queue, it takes a ref on msg.iobref. Thus there is no use-after-free.

Thank you for discussing, debugging and fixing along:
Prashanth Pai <ppai>
Raghavendra G <rgowdapp>
Rajesh Joseph <rjoseph>
Kotresh HR <khiremat>
Mohammed Rafi KC <rkavunga>
Soumya Koduri <skoduri>

> Reviewed-on: https://review.gluster.org/16613
> Reviewed-by: Prashanth Pai <ppai>
> Smoke: Gluster Build System <jenkins.org>
> Reviewed-by: soumya k <skoduri>
> NetBSD-regression: NetBSD Build System <jenkins.org>
> CentOS-regression: Gluster Build System <jenkins.org>
> Reviewed-by: Raghavendra G <rgowdapp>

Change-Id: Ifa6bf6f4879141f42b46830a37c1574b21b37275
BUG: 1422363
Signed-off-by: Poornima G <pgurusid>
Reviewed-on: https://review.gluster.org/16623
NetBSD-regression: NetBSD Build System <jenkins.org>
Smoke: Gluster Build System <jenkins.org>
Reviewed-by: Prashanth Pai <ppai>
CentOS-regression: Gluster Build System <jenkins.org>
Reviewed-by: Shyamsundar Ranganathan <srangana>

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-devel/2017-February/052173.html
[2] https://www.gluster.org/pipermail/gluster-users/

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/