Bug 765336 (GLUSTER-3604)

Summary: Streamline translator-local structure allocation
Product: [Community] GlusterFS Reporter: Jeff Darcy <jdarcy>
Component: coreAssignee: Amar Tumballi <amarts>
Status: CLOSED CURRENTRELEASE QA Contact: Raghavendra Bhat <rabhat>
Severity: low Docs Contact:
Priority: medium    
Version: pre-releaseCC: gluster-bugs, vraman
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.4.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-07-24 17:36:07 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: glusterfs-3.3.0qa43 Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 817967    

Description Jeff Darcy 2011-09-21 17:26:43 UTC
Right now, the translator API includes a "local" field in the
call_frame_t structure, which many translators use to hold their own
state across asynchronous calls/callbacks.  Typically, they GF_CALLOC a
translator-specific structure and stick the pointer into frame->local.
This does get freed automatically when the frame itself is freed, which
is ever so slightly convenient, but otherwise we have a situation in
which every single translator that touches a request incurs a whole
system-level malloc/free pair - not even benefiting from the pool
allocation used for the frames themselves.  Besides the cycles
consumed, this also has implications for page and cache warmth.

What I'm proposing is that each translator can provide an *optional*
new entry point which the translator infrastructure can use to query
the size of the structure that the translator will be using "local" to
hold.  That infrastructure can then walk the translator graph to
determine the total size of all such structures in the longest path, and
use a single malloc/free for all at once.  This might in some cases
result in allocating space for a structure that won't be needed, but in
the great majority of cases it would be more efficient than what we have
now.  It would also be more convenient, since translators that
implement the new query function wouldn't have to manage the
local-structure life cycle themselves.  Other translators could
continue to use the current method indefinitely.

I'd be willing to write the infrastructure code for this, and modify
some of the most commonly used translators to use it as examples.  Any
interest?

Comment 1 Amar Tumballi 2011-09-22 01:13:43 UTC
We do had plans of bringing in mempools for the local allocations. Currently Shehjar started that effort (ref: http://review.gluster.com/#dashboard,1000032 )

We also want to bring in more important logs in statedump about how many times a given type of structure is allocated (so we can decide if it is high %, make it mempool or variable sized iobuf). (ref: http://review.gluster.com/376)

Considering this, I think starting on this effort right now would be bit of duplicate work. Let me know your thoughts.

Comment 2 Anand Avati 2012-02-21 10:42:15 UTC
CHANGE: http://review.gluster.com/2772 (core: utilize mempool for frame->local allocations) merged in master by Anand Avati (avati)

Comment 3 Anand Avati 2012-02-22 12:23:54 UTC
CHANGE: http://review.gluster.com/2784 (mempool: adjustments in pool sizes) merged in master by Vijay Bellur (vijay)

Comment 4 Raghavendra Bhat 2012-05-24 10:05:28 UTC
Now the local structure for the xlators are allocated from the mempool instead of doing malloc/free.

pool-name=mirror-replicate-1:afr_local_t
hot-count=1
cold-count=511
padded_sizeof=10612
alloc-count=48865
max-alloc=5
pool-misses=0
max-stdalloc=0
-----=-----
pool-name=mirror-replicate-2:afr_local_t
hot-count=1
cold-count=511
padded_sizeof=10612
alloc-count=42560
max-alloc=7
pool-misses=0
max-stdalloc=0
-----=-----
pool-name=mirror-dht:dht_local_t
hot-count=1
cold-count=511
padded_sizeof=1972
alloc-count=66128
max-alloc=6
pool-misses=0
max-stdalloc=0
-----=-----
pool-name=mirror-quota:quota_local_t
hot-count=1
cold-count=63
padded_sizeof=356
alloc-count=33124
max-alloc=6
pool-misses=0
max-stdalloc=0
-----=-----
pool-name=mirror-write-behind:wb_local_t