Bug 764889 (GLUSTER-3157)

Summary: race in rpc reply submit in glusterd
Product: [Community] GlusterFS Reporter: Anand Avati <aavati>
Component: glusterd Assignee: krishnan parthasarathi <kparthas>
Status: CLOSED NOTABUG QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: mainline CC: amarts, chrisw, gluster-bugs, jdarcy, nsathyan, vijay
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-07-11 02:44:55 EDT Type: ---

Description Anand Avati 2011-07-12 05:22:17 EDT
A reference to the reply message must be kept until churning of the reply buffer is complete.
Comment 1 Jeff Darcy 2011-10-18 17:56:18 EDT
Thinking that this might be related to problems I had seen using the SSL/multi-threaded transport code with glusterd, I did some tests to see if I could find a repeatable test case.  I was unsuccessful.  With only the multi-threading part enabled, everything seemed to run fine.  With the SSL part also enabled, I ran into some problems, but none attributable to this (mostly they had to do with mismatches between code paths that were trying to use SSL and other code paths that weren't yet).

That doesn't mean there's not a bug here, just that it's one I haven't been able to hit.  From my understanding of the socket code, it is possible that a reply will be enqueued for later transmission, and could be freed before it's actually transmitted.  Same with requests BTW.  In either case it would require that the socket be blocked (e.g. window full) which implies a level of activity I wouldn't expect to see.

Also, I looked at how iobrefs are handled in server_submit_reply and glusterd_submit_reply (at KP's suggestion).  AFAICT these functions rightly expect that the transport will take refs if it needs to hang on to the reply for later, and the socket transport (didn't look at RDMA) does so.  Avati, can you elaborate on what sequence of events you're concerned about that would lead to either a premature free or a memory leak?
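
For illustration only, the ownership convention described above (the caller drops its reference after submitting, and the transport takes its own reference if it has to queue the reply) can be sketched roughly as follows. This is not the GlusterFS code: `reply_buf`, `transport`, `transport_submit` and `transport_flush` are hypothetical names, the pending queue is a single slot, and the counter is not atomic, so it says nothing about the multi-threaded case.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted reply buffer (stand-in for an iobref). */
struct reply_buf {
    int refs;          /* plain int: this sketch is single-threaded */
    size_t len;
    char *data;
    int *freed_flag;   /* set to 1 when the buffer is really freed */
};

struct reply_buf *reply_new(const char *payload, int *freed_flag)
{
    struct reply_buf *r = malloc(sizeof(*r));
    if (!r)
        return NULL;
    r->len = strlen(payload);
    r->data = malloc(r->len + 1);
    if (!r->data) {
        free(r);
        return NULL;
    }
    memcpy(r->data, payload, r->len + 1);
    r->refs = 1;       /* the creator holds the first reference */
    r->freed_flag = freed_flag;
    return r;
}

struct reply_buf *reply_ref(struct reply_buf *r)
{
    r->refs++;
    return r;
}

void reply_unref(struct reply_buf *r)
{
    if (--r->refs == 0) {
        if (r->freed_flag)
            *r->freed_flag = 1;
        free(r->data);
        free(r);
    }
}

/* Hypothetical transport with a one-slot queue for a delayed reply. */
struct transport {
    int blocked;               /* e.g. socket window full */
    struct reply_buf *pending; /* reply waiting to be written */
    size_t sent_bytes;
};

/* Returns 1 if sent immediately, 0 if queued, -1 if the slot is taken. */
int transport_submit(struct transport *t, struct reply_buf *r)
{
    if (t->blocked) {
        if (t->pending)
            return -1;
        /* The transport takes its own reference before queueing, so the
         * caller may drop its reference as soon as submit returns. */
        t->pending = reply_ref(r);
        return 0;
    }
    t->sent_bytes += r->len;   /* pretend the bytes went out */
    return 1;
}

/* Called when the socket becomes writable again. */
void transport_flush(struct transport *t)
{
    if (!t->pending)
        return;
    t->blocked = 0;
    t->sent_bytes += t->pending->len;
    reply_unref(t->pending);   /* release the transport's reference */
    t->pending = NULL;
}
```

In this sketch a premature free would correspond to queueing `r` without the `reply_ref` call, and a leak to never calling `reply_unref` in `transport_flush`.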