Red Hat Bugzilla – Bug 867132
Add STACK_WIND_TAIL for default functions
Last modified: 2013-07-24 14:02:31 EDT
Currently, we allocate (and free) a new call frame for every translator we pass through, including those that don't actually implement the current operation type. For any given operation type xxx, this case would be better handled as a tail call: default_xxx calls straight through to FIRST_CHILD(this)->fops->xxx without allocating a new frame, and on the way back we skip the call to default_xxx_cbk entirely. This might all seem academic, but I measured the effect on the following configuration (same as for hsrepl tests a while ago).
two ramdisks on two servers
one client via GigE
synchronous random 4KB writes with varying thread counts
At higher thread counts, I was reliably able to reproduce a 2-5% performance difference. Peak CPU load seemed to drop from ~14% to ~13%, but that's probably within measurement error. I theorize that part of the difference is due to actual CPU cycles saved, and part to lower memory usage and less cache thrashing. In any case there seems little reason *not* to do such a well-understood kind of optimization, even though there might be implementation details or stack assumptions to work through (like we saw with synctasks and inner functions).
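To make the tail-call idea described above concrete, here is a minimal toy model in C. It is only a sketch and does not use the real GlusterFS types or macros: frame_t, xlator_t, wind(), unwind(), default_write_wind(), default_write_tail() and run_stack() are all hypothetical names invented for illustration. It counts how many frames are allocated and how many pass-through callbacks run when a write travels through a stack of translators that do not implement the operation.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: every name below is hypothetical, not the GlusterFS API. */
typedef struct xlator xlator_t;
typedef struct frame frame_t;
typedef void (*write_cbk_t)(frame_t *frame, int ret);
typedef void (*write_fn_t)(frame_t *frame, xlator_t *this, int len);

struct frame {
    frame_t *parent;   /* frame of the translator that wound to us */
    write_cbk_t cbk;   /* callback to run on the parent when we unwind */
};

struct xlator {
    xlator_t *child;
    write_fn_t write;
};

static int frames_allocated;   /* frames created by wind() */
static int default_cbk_calls;  /* pass-through callbacks executed */
static int last_result;

/* Allocate a child frame and call into the child translator. */
static void wind(frame_t *frame, write_cbk_t cbk, xlator_t *child, int len)
{
    frame_t *new_frame = malloc(sizeof(*new_frame));
    if (!new_frame)
        abort();
    new_frame->parent = frame;
    new_frame->cbk = cbk;
    frames_allocated++;
    child->write(new_frame, child, len);
}

/* Free the current frame and hand the result to the parent's callback. */
static void unwind(frame_t *frame, int ret)
{
    frame_t *parent = frame->parent;
    write_cbk_t cbk = frame->cbk;
    free(frame);
    cbk(parent, ret);
}

static void default_write_cbk(frame_t *frame, int ret)
{
    default_cbk_calls++;
    unwind(frame, ret);
}

/* Pass-through today: new frame on the way down, extra callback on the way up. */
static void default_write_wind(frame_t *frame, xlator_t *this, int len)
{
    wind(frame, default_write_cbk, this->child, len);
}

/* Pass-through as a tail call: reuse the caller's frame, no callback of our own. */
static void default_write_tail(frame_t *frame, xlator_t *this, int len)
{
    this->child->write(frame, this->child, len);
}

/* Bottom translator: "performs" the write and unwinds with the byte count. */
static void storage_write(frame_t *frame, xlator_t *this, int len)
{
    (void)this;
    unwind(frame, len);
}

static void done_cbk(frame_t *frame, int ret)
{
    (void)frame;
    last_result = ret;
}

/* Build `depth` pass-through translators above one storage translator,
 * send a single write through them, and return the result seen at the top. */
static int run_stack(int depth, write_fn_t passthrough, int len)
{
    xlator_t *xl = calloc((size_t)depth + 1, sizeof(*xl));
    frame_t *root = malloc(sizeof(*root));
    if (!xl || !root)
        abort();
    for (int i = 0; i < depth; i++) {
        xl[i].child = &xl[i + 1];
        xl[i].write = passthrough;
    }
    xl[depth].write = storage_write;
    root->parent = NULL;
    root->cbk = done_cbk;
    frames_allocated = 0;
    default_cbk_calls = 0;
    last_result = -1;
    xl[0].write(root, &xl[0], len);  /* the final unwind frees root */
    free(xl);
    return last_result;
}
```

In this model, run_stack(3, default_write_wind, 4096) allocates three extra frames and runs three pass-through callbacks, while run_stack(3, default_write_tail, 4096) allocates none and runs none; both deliver the same result to the top of the stack. The real change has to deal with details this toy ignores, such as the stack assumptions mentioned above.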
Jeff posted a patch @ http://review.gluster.org/4092
CHANGE: http://review.gluster.org/4092 (core: add STACK_WIND_TAIL for more efficient default_xxx.) merged in master by Anand Avati (firstname.lastname@example.org)