When running read tests with dd of=/dev/null if=test.file bs=1048576 count=4192, memory usage on every client keeps growing (it went all the way up to 8 GB in a few minutes, reading at around 500 MB/s). I'm using a stripe 16 setting with 16 independent disks, no RAID, and a 10GbE link between server and clients. Mounting the partition with valgrind --leak-check=yes glusterfs /mnt/stripe produced the attached file, captured after the glusterfs process had reached 1 GB of memory used. Compiled from a fresh git checkout (git describe reports v3.1.1-45-g0cc2b35) on an Ubuntu 10.10 Linux machine, using gcc 4.4.5.
This is fixed now. In stripe_readv, the copied frame was not destroyed if an error occurred while allocating the local structure for it. In addition, STRIPE_STACK_UNWIND and STRIPE_STACK_DESTROY did not free the frame's local structure. This is the valgrind output after making the appropriate changes:

LEAK SUMMARY:
==16628== definitely lost: 240 bytes in 4 blocks
==16628== indirectly lost: 152 bytes in 1 blocks
==16628== possibly lost: 25,233,638 bytes in 188 blocks
==16628== still reachable: 1,105,635 bytes in 121 blocks
==16628== suppressed: 0 bytes in 0 blocks
==16628== Reachable blocks (those to which a pointer was found) are not shown.
==16628== To see them, rerun with: --leak-check=full --show-reachable=yes
==16628==
==16628== For counts of detected and suppressed errors, rerun with: -v
==16628== Use --track-origins=yes to see where uninitialised values come from
==16628== ERROR SUMMARY: 110 errors from 106 contexts (suppressed: 32 from 5)
PATCH: http://patches.gluster.com/patch/5947 in master (stripe: fix memory leak)