Bug 765336 (GLUSTER-3604) - Streamline translator-local structure allocation
Summary: Streamline translator-local structure allocation
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-3604
Product: GlusterFS
Classification: Community
Component: core
Version: pre-release
Hardware: x86_64
OS: Linux
medium
low
Target Milestone: ---
Assignee: Amar Tumballi
QA Contact: Raghavendra Bhat
URL:
Whiteboard:
Depends On:
Blocks: 817967
TreeView+ depends on / blocked
 
Reported: 2011-09-21 17:26 UTC by Jeff Darcy
Modified: 2013-12-19 00:06 UTC (History)
2 users (show)

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:36:07 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: glusterfs-3.3.0qa43
Embargoed:


Attachments (Terms of Use)

Description Jeff Darcy 2011-09-21 17:26:43 UTC
Right now, the translator API includes a "local" field in the
call_frame_t structure, which many translators use to hold their own
state across asynchronous calls/callbacks.  Typically, they GF_CALLOC a
translator-specific structure and stick the pointer into frame->local.
This does get freed automatically when the frame itself is freed, which
is ever so slightly convenient, but otherwise we have a situation in
which every single translator that touches a request incurs a whole
system-level malloc/free pair - not even benefiting from the pool
allocation used for the frames themselves.  Besides the cycles
consumed, this also has implications for page and cache warmth.

What I'm proposing is that each translator can provide an *optional*
new entry point which the translator infrastructure can use to query
the size of the structure that the translator will be using "local" to
hold.  That infrastructure can then walk the translator graph to
determine the total size of all such structures in the longest path, and
use a single malloc/free for all at once.  This might in some cases
result in allocating space for a structure that won't be needed, but in
the great majority of cases it would be more efficient than what we have
now.  It would also be more convenient, since translators that
implement the new query function wouldn't have to manage the
local-structure life cycle themselves.  Other translators could
continue to use the current method indefinitely.

I'd be willing to write the infrastructure code for this, and modify
some of the most commonly used translators to use it as examples.  Any
interest?

Comment 1 Amar Tumballi 2011-09-22 01:13:43 UTC
We do had plans of bringing in mempools for the local allocations. Currently Shehjar started that effort (ref: http://review.gluster.com/#dashboard,1000032 )

We also want to bring in more important logs in statedump about how many times a given type of structure is allocated (so we can decide if it is high %, make it mempool or variable sized iobuf). (ref: http://review.gluster.com/376)

Considering this, I think starting on this effort right now would be bit of duplicate work. Let me know your thoughts.

Comment 2 Anand Avati 2012-02-21 10:42:15 UTC
CHANGE: http://review.gluster.com/2772 (core: utilize mempool for frame->local allocations) merged in master by Anand Avati (avati)

Comment 3 Anand Avati 2012-02-22 12:23:54 UTC
CHANGE: http://review.gluster.com/2784 (mempool: adjustments in pool sizes) merged in master by Vijay Bellur (vijay)

Comment 4 Raghavendra Bhat 2012-05-24 10:05:28 UTC
Now the local structure for the xlators are allocated from the mempool instead of doing malloc/free.

pool-name=mirror-replicate-1:afr_local_t
hot-count=1
cold-count=511
padded_sizeof=10612
alloc-count=48865
max-alloc=5
pool-misses=0
max-stdalloc=0
-----=-----
pool-name=mirror-replicate-2:afr_local_t
hot-count=1
cold-count=511
padded_sizeof=10612
alloc-count=42560
max-alloc=7
pool-misses=0
max-stdalloc=0
-----=-----
pool-name=mirror-dht:dht_local_t
hot-count=1
cold-count=511
padded_sizeof=1972
alloc-count=66128
max-alloc=6
pool-misses=0
max-stdalloc=0
-----=-----
pool-name=mirror-quota:quota_local_t
hot-count=1
cold-count=63
padded_sizeof=356
alloc-count=33124
max-alloc=6
pool-misses=0
max-stdalloc=0
-----=-----
pool-name=mirror-write-behind:wb_local_t


Note You need to log in before you can comment on or make changes to this bug.