Bug 1341184

Summary: Call_bail of a frame due to not able to find a saved frame in reply
Product: [Community] GlusterFS Reporter: Niels de Vos <ndevos>
Component: rpc Assignee: bugs <bugs>
Status: CLOSED NEXTRELEASE QA Contact:
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 3.6.10 CC: bugs, rgowdapp, rkavunga
Target Milestone: --- Keywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1340361 Environment:
Last Closed: 2016-08-23 13:09:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1340361, 1421937    
Bug Blocks:    

Description Niels de Vos 2016-05-31 12:29:03 UTC
+++ This bug was initially created as a clone of Bug #1340361 +++

Description of problem:

This was observed on a user's production setup.

[raghu@unused 01610290]$ grep "cannot lookup the saved" <client-log>
[2016-02-15 22:40:02.575925] C [rpc-clnt.c:452:rpc_clnt_fill_request_info] <client-log>: cannot lookup the saved frame corresponding to xid (14161323)

The log message above indicates that a reply from the server could not be matched to a saved frame, so the response could not be unwound. The xid (0xd815ab) matches one of the unaccounted call-bails. A similar message was seen in another log.
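
The log line prints the xid in decimal (14161323), while the text above quotes it in hex (0xd815ab). A minimal sketch of that cross-check follows; xid_matches_hex is a hypothetical helper for illustration, not a GlusterFS function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not a GlusterFS function): true if a decimal xid
 * taken from one log line equals an xid written in hex elsewhere. */
static bool xid_matches_hex(uint32_t xid, const char *hex)
{
    assert(hex != NULL);
    /* With base 16, strtoul accepts an optional "0x" prefix. */
    return strtoul(hex, NULL, 16) == xid;
}
```

Calling xid_matches_hex(14161323u, "0xd815ab") returns true, which confirms that both notations refer to the same request.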

As for the root cause of the failed saved-frame lookup, rpc-clnt handles a request in this order:

1. Submit the request to transport for transmission to brick.
2. Save the frame for future reference while processing reply.

If the response arrives between steps 1 and 2 (before the frame has been saved), the reply finds no saved frame and cannot be unwound. The frame is then saved for a request whose reply has already been received, so no further reply will ever match it. This can result in a call-bail.
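
The ordering problem can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions, not the rpc-clnt code: struct frame_table, save_frame, handle_reply, submit_request and the two send_* functions are all hypothetical names, and the transport is simulated so that the reply always arrives before submit returns (the worst case of the race).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FRAMES 16

/* Hypothetical stand-in for the client's table of saved frames. */
struct frame_table {
    uint32_t xids[MAX_FRAMES];
    size_t   count;              /* frames still waiting for a reply */
    unsigned unmatched_replies;  /* replies that found no saved frame */
};

static void save_frame(struct frame_table *t, uint32_t xid)
{
    assert(t->count < MAX_FRAMES);
    t->xids[t->count++] = xid;
}

/* Reply path: find the saved frame for this xid and remove it. */
static bool handle_reply(struct frame_table *t, uint32_t xid)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->xids[i] == xid) {
            t->xids[i] = t->xids[--t->count];
            return true;
        }
    }
    t->unmatched_replies++;  /* the "cannot lookup the saved frame" case */
    return false;
}

/* Simulated transport: the reply is delivered before submit returns. */
static void submit_request(struct frame_table *t, uint32_t xid)
{
    (void)handle_reply(t, xid);
}

/* Racy order: submit first, save afterwards.
 * Returns the number of frames left waiting for a reply. */
static size_t send_submit_then_save(struct frame_table *t, uint32_t xid)
{
    submit_request(t, xid);
    save_frame(t, xid);
    return t->count;
}

/* Reversed order: save first, submit afterwards. */
static size_t send_save_then_submit(struct frame_table *t, uint32_t xid)
{
    save_frame(t, xid);
    submit_request(t, xid);
    return t->count;
}
```

In this model, send_submit_then_save leaves one frame in the table with one unmatched reply, which is the stale frame that would later be call-bailed. send_save_then_submit leaves the table empty with no unmatched replies.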


Version-Release number of selected component (if applicable):
Zero day bug, present in all releases

How reproducible:
Racy; not consistently reproducible.

Steps to Reproduce:
1.
2.
3.

Actual results:
A reply didn't have an associated saved-frame in rpc-clnt layer.

Expected results:
A reply should always have a saved-frame in rpc-clnt layer.


Additional info:

--- Additional comment from Vijay Bellur on 2016-05-27 09:36:29 CEST ---

REVIEW: http://review.gluster.org/14547 (rpc-clnt: save the frame before submitting request to transport) posted (#1) for review on master by Raghavendra G (rgowdapp)

Comment 1 Mohammed Rafi KC 2016-08-23 13:06:29 UTC
master patch : http://review.gluster.org/14547

Comment 2 Mohammed Rafi KC 2016-08-23 13:09:09 UTC
This bug is being closed as GlusterFS-3.6 is nearing its End-Of-Life and only important security bugs will be fixed. This bug has been fixed in more recent GlusterFS releases. If you still face this bug with newer GlusterFS versions, please open a new bug.