Right now, the translator API includes a "local" field in the call_frame_t structure, which many translators use to hold their own state across asynchronous calls/callbacks. Typically, they GF_CALLOC a translator-specific structure and stick the pointer into frame->local. This does get freed automatically when the frame itself is freed, which is ever so slightly convenient, but otherwise we have a situation in which every single translator that touches a request incurs a whole system-level malloc/free pair - not even benefiting from the pool allocation used for the frames themselves. Besides the cycles consumed, this also has implications for page and cache warmth. What I'm proposing is that each translator can provide an *optional* new entry point which the translator infrastructure can use to query the size of the structure that the translator will be using "local" to hold. That infrastructure can then walk the translator graph to determine the total size of all such structures in the longest path, and use a single malloc/free for all at once. This might in some cases result in allocating space for a structure that won't be needed, but in the great majority of cases it would be more efficient than what we have now. It would also be more convenient, since translators that implement the new query function wouldn't have to manage the local-structure life cycle themselves. Other translators could continue to use the current method indefinitely. I'd be willing to write the infrastructure code for this, and modify some of the most commonly used translators to use it as examples. Any interest?
We do had plans of bringing in mempools for the local allocations. Currently Shehjar started that effort (ref: http://review.gluster.com/#dashboard,1000032 ) We also want to bring in more important logs in statedump about how many times a given type of structure is allocated (so we can decide if it is high %, make it mempool or variable sized iobuf). (ref: http://review.gluster.com/376) Considering this, I think starting on this effort right now would be bit of duplicate work. Let me know your thoughts.
CHANGE: http://review.gluster.com/2772 (core: utilize mempool for frame->local allocations) merged in master by Anand Avati (avati)
CHANGE: http://review.gluster.com/2784 (mempool: adjustments in pool sizes) merged in master by Vijay Bellur (vijay)
Now the local structure for the xlators are allocated from the mempool instead of doing malloc/free. pool-name=mirror-replicate-1:afr_local_t hot-count=1 cold-count=511 padded_sizeof=10612 alloc-count=48865 max-alloc=5 pool-misses=0 max-stdalloc=0 -----=----- pool-name=mirror-replicate-2:afr_local_t hot-count=1 cold-count=511 padded_sizeof=10612 alloc-count=42560 max-alloc=7 pool-misses=0 max-stdalloc=0 -----=----- pool-name=mirror-dht:dht_local_t hot-count=1 cold-count=511 padded_sizeof=1972 alloc-count=66128 max-alloc=6 pool-misses=0 max-stdalloc=0 -----=----- pool-name=mirror-quota:quota_local_t hot-count=1 cold-count=63 padded_sizeof=356 alloc-count=33124 max-alloc=6 pool-misses=0 max-stdalloc=0 -----=----- pool-name=mirror-write-behind:wb_local_t