Bug 1340361 - Call_bail of a frame due to not able to find a saved frame in reply
Summary: Call_bail of a frame due to not able to find a saved frame in reply
Keywords:
Status: CLOSED DUPLICATE of bug 1421937
Alias: None
Product: GlusterFS
Classification: Community
Component: rpc
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Raghavendra G
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1341182 1341183 1341184
 
Reported: 2016-05-27 07:32 UTC by Raghavendra G
Modified: 2017-09-06 10:51 UTC

Fixed In Version:
Clone Of:
: 1340362 1341182 1341183 1341184 (view as bug list)
Environment:
Last Closed: 2017-09-06 10:48:18 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Raghavendra G 2016-05-27 07:32:31 UTC
Description of problem:

This was observed on a user's production setup.

[raghu@unused 01610290]$ grep "cannot lookup the saved" <client-log>
[2016-02-15 22:40:02.575925] C [rpc-clnt.c:452:rpc_clnt_fill_request_info] <client-log>: cannot lookup the saved frame corresponding to xid (14161323)

The log message above indicates that rpc-clnt could not find a saved frame for a reply from the server, so the response could not be unwound. The xid (14161323, i.e. 0xd815ab) matches one of the unaccounted call-bails. A similar message was seen in another log.
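The log prints the xid in decimal, while the call-bail is referred to by its hexadecimal xid. A quick check that the two values name the same request (a minimal Python sketch, using only the two numbers quoted in this report):

```python
# xid as printed in the client log (decimal) and as quoted in the text (hex).
log_xid = 14161323
quoted_xid = 0xd815ab

print(hex(log_xid))            # hexadecimal form of the logged xid
print(log_xid == quoted_xid)   # whether both values name the same request
```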

As for the RCA of the failed saved-frame lookup, this is the order of operations I saw in rpc-clnt when a request is submitted:

1. Submit the request to transport for transmission to brick.
2. Save the frame for future reference while processing reply.

Now, if the reply arrives between 1 and 2 (before we were able to save the frame), the lookup for its saved frame fails, and we then save the frame of a request whose reply has already been received. That frame is never unwound, which can result in a call-bail.
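The suspected ordering problem can be sketched as a toy model. This is not GlusterFS code: the class, method, and field names are hypothetical, and the "transport" delivers the reply synchronously to force the worst-case interleaving. It only illustrates why saving the frame after submitting could leave a frame stranded, and why saving it first would not.

```python
class Client:
    """Toy model of an RPC client; all names here are hypothetical."""

    def __init__(self, save_first):
        self.save_first = save_first
        self.saved_frames = {}   # xid -> frame awaiting a reply
        self.unwound = []        # xids whose reply found a frame
        self.lost_replies = []   # xids whose reply found no frame

    def on_reply(self, xid):
        frame = self.saved_frames.pop(xid, None)
        if frame is None:
            self.lost_replies.append(xid)   # "cannot lookup the saved frame"
        else:
            self.unwound.append(xid)

    def transport_submit(self, xid):
        # Worst case for the race: the reply is delivered before submit returns.
        self.on_reply(xid)

    def call(self, xid, frame):
        if self.save_first:
            self.saved_frames[xid] = frame
            self.transport_submit(xid)
        else:
            self.transport_submit(xid)
            self.saved_frames[xid] = frame   # saved too late; never unwound


racy = Client(save_first=False)
racy.call(1, "frame-1")
fixed = Client(save_first=True)
fixed.call(1, "frame-1")
print(racy.lost_replies, racy.saved_frames)
print(fixed.unwound, fixed.saved_frames)
```

In the submit-first variant the reply is lost and the frame stays in the table until something times it out; in the save-first variant the reply finds its frame and the table ends up empty.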


Version-Release number of selected component (if applicable):
Zero day bug, present in all releases

How reproducible:
Racy; not consistently reproducible.

Steps to Reproduce:
1.
2.
3.

Actual results:
A reply didn't have an associated saved frame in the rpc-clnt layer.

Expected results:
A reply should always have a saved frame in the rpc-clnt layer.


Additional info:

Comment 1 Vijay Bellur 2016-05-27 07:36:29 UTC
REVIEW: http://review.gluster.org/14547 (rpc-clnt: save the frame before submitting request to transport) posted (#1) for review on master by Raghavendra G (rgowdapp)

Comment 2 Raghavendra G 2017-09-06 04:43:19 UTC
> As for the RCA for not able to lookup a saved frame for response, I saw in
> rpc-clnt:
> 
> 1. Submit the request to transport for transmission to brick.
> 2. Save the frame for future reference while processing reply.
> 
> Now, if we get a response between 1 and 2 (before we were able to save the
> frame), we would be saving the frame of a request whose reply is already
> received. This can result in call-bail.

This RCA is incorrect, since both 1 and 2 happen atomically under a lock. When looking up a saved frame during reply processing, we acquire the same lock, and hence atomicity is preserved.
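A small threaded sketch of that argument, again with hypothetical names rather than the actual rpc-clnt code: the reply handler is started as soon as the request is "submitted", but it has to take the same lock as the submitter, so it cannot run its lookup until the frame has been saved.

```python
import threading
import time

lock = threading.Lock()      # hypothetical stand-in for the connection lock
saved_frames = {}            # xid -> frame awaiting a reply
results = []                 # what each reply lookup found


def on_reply(xid):
    # Reply processing takes the same lock before looking up the frame.
    with lock:
        results.append(saved_frames.pop(xid, None))


def call(xid, frame):
    with lock:
        # Step 1: "submit"; the reply thread starts at once but blocks on the lock.
        replier = threading.Thread(target=on_reply, args=(xid,))
        replier.start()
        time.sleep(0.05)     # widen the window between the two steps
        # Step 2: save the frame, still under the lock.
        saved_frames[xid] = frame
    return replier


call(1, "frame-1").join()
print(results)
```

Even with the artificial delay between the two steps, the lookup only runs after the lock is released, so it always sees the saved frame.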

Comment 3 Raghavendra G 2017-09-06 10:48:18 UTC

*** This bug has been marked as a duplicate of bug 1421937 ***

Comment 4 Raghavendra G 2017-09-06 10:51:59 UTC
bz 1421937 describes a corruption in the call-back submit codepath. This can corrupt an rpc reply in general, and its xid in particular. If the xid is corrupted, we cannot map the reply to its saved frame, which results in a call-bail.
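The effect of a corrupted xid can be illustrated with a minimal sketch. The function names, the table layout, and the timeout value are assumptions made for illustration only; they are not taken from rpc-clnt.

```python
def process_reply(saved_frames, xid):
    """Return the frame saved for xid, or None if no saved frame matches."""
    entry = saved_frames.pop(xid, None)
    return None if entry is None else entry[0]


def bail_expired(saved_frames, now, timeout):
    """Remove and return the xids of frames that waited longer than timeout."""
    expired = [xid for xid, (_, sent_at) in saved_frames.items()
               if now - sent_at > timeout]
    for xid in expired:
        del saved_frames[xid]
    return expired


saved = {42: ("frame-42", 0.0)}   # xid -> (frame, time the request was sent)
corrupted_xid = 42 ^ 0x100        # one flipped bit, purely illustrative

found = process_reply(saved, corrupted_xid)            # lookup misses
bailed = bail_expired(saved, now=100.0, timeout=30.0)  # frame is bailed later
print(found, bailed)
```

The reply carrying the corrupted xid matches nothing, the original frame stays in the table, and the timeout sweep eventually bails it.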

Since there is no reproducer, we cannot confirm the above hypothesis. Closing this bug as a duplicate of bz 1421937. Please re-open if it is reproduced in versions later than 3.8.11.

