Bug 1421937
| Field | Value |
|---|---|
| Summary | [Replicate] "RPC call decoding failed" leading to IO hang & mount inaccessible |
| Product | [Community] GlusterFS |
| Component | rpc |
| Reporter | Poornima G <pgurusid> |
| Assignee | Poornima G <pgurusid> |
| Status | CLOSED CURRENTRELEASE |
| Severity | high |
| Priority | unspecified |
| Version | mainline |
| CC | amukherj, bugs, ksandha, nchilaka, rcyriac, rgowdapp, rhinduja, rhs-bugs, rjoseph, skoduri |
| Keywords | Triaged |
| Hardware | All |
| OS | Linux |
| Fixed In Version | glusterfs-3.11.0 |
| Doc Type | If docs needed, set a value |
| Clone Of | 1409135 |
| Clones | 1422363 (view as bug list) |
| Last Closed | 2017-05-30 18:43:01 UTC |
| Type | Bug |
| Bug Blocks | 1341182, 1341183, 1341184, 1409135, 1422363, 1422787, 1422788 |
Description
Poornima G
2017-02-14 06:30:12 UTC
REVIEW: https://review.gluster.org/16613 (rpcsvc: Add rpchdr and proghdr to iobref before submitting to transport) posted (#1) for review on master by Poornima G (pgurusid)

REVIEW: https://review.gluster.org/16613 (rpcsvc: Add rpchdr and proghdr to iobref before submitting to transport) posted (#2) for review on master by Poornima G (pgurusid)

COMMIT: https://review.gluster.org/16613 committed in master by Raghavendra G (rgowdapp)

------

commit 8607f22dcd1bc9b84e452ae90102fa9d345ad3db
Author: Poornima G <pgurusid>
Date:   Tue Feb 14 12:45:36 2017 +0530

    rpcsvc: Add rpchdr and proghdr to iobref before submitting to transport

Issue: When fio is run on multiple clients (each client writing to its own files) and the clients meanwhile issue a readdirp, the client that did the readdirp then receives the upcalls. In this scenario that client disconnects with an "rpc decode failed" error.

RCA: Upcall calls rpcsvc_request_submit to submit the request to the socket. rpcsvc_request_submit currently:

    rpcsvc_request_submit ()
    {
        iobuf = iobuf_new
        iov = iobuf->ptr
        fill iobuf to contain xdrised upcall content - proghdr
        rpcsvc_callback_submit (..iov..)
        ...
        if (iobuf)
            iobuf_unref (iobuf)
    }

    rpcsvc_callback_submit (... iov...)
    {
        ...
        iobuf = iobuf_new
        iov1 = iobuf->ptr
        fill iobuf to contain xdrised rpc header - rpchdr
        msg.rpchdr = iov1
        msg.proghdr = iov
        ...
        rpc_transport_submit_request (msg)
        ...
        if (iobuf)
            iobuf_unref (iobuf)
    }

rpcsvc_callback_submit assumes that once rpc_transport_submit_request() returns, the msg has been written to the socket and the buffers (rpchdr, proghdr) can therefore be freed. That is not the case: especially under high workload, rpc_transport_submit_request() may not be able to write to the socket immediately, so it adds the message to its own queue and returns success. Thus we have a use after free of rpchdr and proghdr. The client receives garbage rpchdr and proghdr, fails to decode the RPC, and disconnects.
To prevent this, we need to add the rpchdr and proghdr to an iobref and send it in msg:

    iobref_add (iobref, iobufs)
    msg.iobref = iobref;

The socket layer takes a ref on msg.iobref if it cannot write to the socket and has to queue the message. Thus we no longer have a use after free.

Thank you for discussing, debugging and fixing along:
Prashanth Pai <ppai>
Raghavendra G <rgowdapp>
Rajesh Joseph <rjoseph>
Kotresh HR <khiremat>
Mohammed Rafi KC <rkavunga>
Soumya Koduri <skoduri>

Change-Id: Ifa6bf6f4879141f42b46830a37c1574b21b37275
BUG: 1421937
Signed-off-by: Poornima G <pgurusid>
Reviewed-on: https://review.gluster.org/16613
Reviewed-by: Prashanth Pai <ppai>
Smoke: Gluster Build System <jenkins.org>
Reviewed-by: soumya k <skoduri>
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
Reviewed-by: Raghavendra G <rgowdapp>

This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.11.0, please open a new bug report.

glusterfs-3.11.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-May/000073.html
[2] https://www.gluster.org/pipermail/gluster-users/

*** Bug 1340361 has been marked as a duplicate of this bug. ***