Description of problem:
Directory recursion causes significant memory usage in the client when the md-cache translator is in use. This appears to happen because md-cache caches a large amount of dictionary data keyed off the inode cache, which is not aggressively evicted on the client side.

Version-Release number of selected component (if applicable):
3.3.x

How reproducible:
100%

Steps to Reproduce:
1. Create and mount a gluster volume.
2. Copy a relatively large directory tree (e.g., a kernel source tree) onto it.
3. Run ls -alR on the resulting tree and observe the RES/RSS of the glusterfs client process.
4. Replicate the directory tree and repeat step 3.

Actual results:
With 4 replicated kernel source trees, the client process grows to about 8GB of resident memory.

Expected results:
Some kind of cap on memory usage.

Additional info:
It is expected that md-cache uses significantly more memory than a graph without the translator, but the user should be able to keep the cache from growing beyond a certain point.
I'll try to add an LRU mechanism to md-cache (similar to the one in quick-read) to try and control this behavior...
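A minimal sketch of the LRU idea mentioned above, under stated assumptions: this is not md-cache's actual data structure, just a fixed-capacity table (capacity, key type, and the logical-clock eviction policy are all hypothetical) showing how a cap forces eviction of the least-recently-used entry instead of unbounded growth.

```c
#include <string.h>

/* Hypothetical fixed-capacity LRU table; a real translator would hang
 * cached dict data off each slot and size the cap from a volume option. */
#define LRU_CAP 4

struct lru_entry {
    int key;
    int in_use;
    unsigned long stamp;   /* logical clock; lowest stamp = least recently used */
};

struct lru_cache {
    struct lru_entry slot[LRU_CAP];
    unsigned long clock;
    int evictions;
};

void lru_init(struct lru_cache *c) { memset(c, 0, sizeof(*c)); }

/* Touch a key: mark it most-recently-used, inserting it if absent and
 * evicting the least-recently-used slot when the table is full. */
void lru_touch(struct lru_cache *c, int key)
{
    int free_i = -1, oldest_i = 0;
    for (int i = 0; i < LRU_CAP; i++) {
        if (c->slot[i].in_use && c->slot[i].key == key) {
            c->slot[i].stamp = ++c->clock;   /* refresh on access */
            return;
        }
        if (!c->slot[i].in_use && free_i < 0)
            free_i = i;
        if (c->slot[i].stamp < c->slot[oldest_i].stamp)
            oldest_i = i;
    }
    int i = free_i;
    if (i < 0) {            /* table full: reclaim the LRU entry */
        i = oldest_i;
        c->evictions++;
    }
    c->slot[i].key = key;
    c->slot[i].in_use = 1;
    c->slot[i].stamp = ++c->clock;
}

int lru_contains(const struct lru_cache *c, int key)
{
    for (int i = 0; i < LRU_CAP; i++)
        if (c->slot[i].in_use && c->slot[i].key == key)
            return 1;
    return 0;
}
```

With a cap of 4, inserting a fifth key evicts the oldest one; touching a key first protects it from the next eviction.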
Avati's comments: "You make a good observation. The problem, I suspect, is because of the loaded quick-read translator, which results in file data being returned in the lookup_cbk's xattr dict. The md-cache xattr cache is a superset of what is needed (basically the entire xattr dictionary, which also holds quick-read's response, not just the keys md-cache is interested in). What is observed may very well be the same overrun behavior as quick-read's, and for the same data element (file content rather than extended attributes). It would be interesting to see if we fix the *xatt_set() in md-cache to selectively dup just the required attributes and cache only those, rather than blindly storing the entire dict as-is (and thereby caching file contents that were present only to satisfy quick-read's request)." --- I reproduce a reduction in glusterfs RSS from 28GB to 1.3GB with a particular dataset by removing quick-read from the client graph, which validates the above.
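A sketch of the "selectively dup just the required attributes" idea, under stated assumptions: this is not glusterfs' dict_t API, just a minimal key/value list, and the "interested" key names are hypothetical placeholders. The point is that filtering by a whitelist of keys keeps the large file-content value out of the cached copy.

```c
#include <stddef.h>
#include <string.h>

/* Minimal key/value pair standing in for one glusterfs dict entry. */
struct kv {
    const char *key;
    size_t value_len;
};

/* Hypothetical whitelist of keys md-cache actually needs. */
static const char *interested[] = { "gfid", "iatt", "selinux", NULL };

static int is_interested(const char *key)
{
    for (int i = 0; interested[i]; i++)
        if (strcmp(key, interested[i]) == 0)
            return 1;
    return 0;
}

/* Copy only the interested keys from src into dst (capacity dst_cap).
 * Returns the total bytes cached, so the saving vs. a blind dup of the
 * whole dict (which would include quick-read's file content) is visible. */
size_t selective_dup(const struct kv *src, int n,
                     struct kv *dst, int dst_cap, int *dst_n)
{
    size_t bytes = 0;
    *dst_n = 0;
    for (int i = 0; i < n && *dst_n < dst_cap; i++) {
        if (!is_interested(src[i].key))
            continue;            /* skip e.g. quick-read's cached file data */
        dst[(*dst_n)++] = src[i];
        bytes += src[i].value_len;
    }
    return bytes;
}
```

A 64KB content value riding along in the lookup reply is dropped, and only a couple hundred bytes of metadata survive into the cache.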
Suggestion from Amar: < amarts> and about md-cache + quick-read consuming extra memory, we were thinking of just doing a 'dict_del(GF_CONTENT_KEY);' in quick-read's lookup_cbk(), before the unwind
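A minimal model of that suggestion, under stated assumptions: this is not glusterfs' dict_t, just a toy dictionary (mini_dict, mini_dict_del, and the byte counts are all hypothetical) showing how deleting the content key before the reply is handed upward removes the bulky value while leaving the metadata entries intact.

```c
#include <stddef.h>
#include <string.h>

#define MAX_KEYS 8

/* Toy dictionary standing in for dict_t in this sketch. */
struct mini_dict {
    const char *keys[MAX_KEYS];
    size_t lens[MAX_KEYS];
    int count;
};

void mini_dict_set(struct mini_dict *d, const char *key, size_t len)
{
    if (d->count < MAX_KEYS) {
        d->keys[d->count] = key;
        d->lens[d->count] = len;
        d->count++;
    }
}

/* Remove key if present; order is not preserved, as in a hash-based dict.
 * This models the proposed dict_del(GF_CONTENT_KEY) before unwind. */
void mini_dict_del(struct mini_dict *d, const char *key)
{
    for (int i = 0; i < d->count; i++) {
        if (strcmp(d->keys[i], key) == 0) {
            d->keys[i] = d->keys[d->count - 1];
            d->lens[i] = d->lens[d->count - 1];
            d->count--;
            return;
        }
    }
}

size_t mini_dict_bytes(const struct mini_dict *d)
{
    size_t total = 0;
    for (int i = 0; i < d->count; i++)
        total += d->lens[i];
    return total;
}
```

Deleting one key drops the dict from ~128KB to under 200 bytes in this toy example, which is the shape of the saving md-cache would see per cached inode.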
http://review.gluster.com/3268
CHANGE: http://review.gluster.com/3268 (quick-read, md-cache: selectively cache xattr data to conserve memory) merged in master by Anand Avati (avati)
Tested with more than 10 replicated kernel source trees and metadata-intensive operations. Memory usage remained more or less constant at around 310M and never came close to 1G. (Earlier, memory usage hit around 8G with just a couple of replicated kernel source trees.)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html