Description of problem:
rsync (up to and including 2.5.6) does not seem to release memory properly once a task is completed. See the following example:

free (before rsync)
             total       used       free     shared    buffers     cached
Mem:           501        279        221          0         39        177

rsync -rpa /home /tmp/test
(for the sake of the test, /home was 453MB)

free (after rsync)
             total       used       free     shared    buffers     cached
Mem:           501        490         10          0         43        383

Here is where things get interesting. IF the destination directory is NOT on a remote machine, i.e. if it is local, part of the used memory is returned IF you delete the destination directory, e.g.:

rm -rf /tmp/test

free (after removing destination directory)
             total       used       free     shared    buffers     cached
Mem:           501        294        207          0         43        184
-/+ buffers/cache:         65        436

IF you do not remove the destination directory, the memory does not seem to be returned at all. If you are syncing directories to a remote machine, the memory is used and not returned until reboot.

Version-Release number of selected component (if applicable):
Tested on RH 8.0 and 9.0 with rsync 2.5.5 and 2.5.6, on a single-processor Athlon 1700 with 512MB RAM, a Dell Precision 650 (dual Xeon 2.7, 3GB RAM) and a single-processor Pentium IV with 1GB RAM.

How reproducible:
Very reproducible, every time for me.

Steps to Reproduce:
1. free -m, cat /proc/meminfo
2. rsync -rpva <decent sized directory> /<destination directory>
3. free -m (available memory much lower)

Actual results:
Memory is used and not returned.

Expected results:
I would expect the memory to be returned to the system once rsync completes.

Additional info:
I have tested this on various platforms and the results seem to be consistent; however, I doubt that I am the only one who would notice such an issue. The other platforms tested were Solaris, BSD and OS X, none of which consumed memory in this manner when running rsync. Memory usage was monitored with vmstat, free and by watching /proc/meminfo.
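The reporter watched /proc/meminfo as well as free. As a rough sketch, the classic "-/+ buffers/cache" row can be derived from the MemTotal, MemFree, Buffers and Cached fields: memory used by programs is "used" minus buffers and cached, and memory effectively available is "free" plus buffers and cached. The function names below are hypothetical. Newer kernels also report MemAvailable, and newer versions of free lay out their columns differently, so treat this as an illustration of the older output shown above.

```python
import os

def parse_meminfo(text):
    """Map each 'Name: value kB' line of /proc/meminfo text to an int (kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip lines without a numeric value, e.g. the column header older kernels print.
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields

def buffers_cache_view(fields):
    """Return (used_by_programs_kb, free_incl_cache_kb), like the '-/+ buffers/cache' row."""
    cache = fields["Buffers"] + fields["Cached"]
    used = fields["MemTotal"] - fields["MemFree"]
    return used - cache, fields["MemFree"] + cache

if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        programs, avail = buffers_cache_view(parse_meminfo(f.read()))
    print(f"used by programs:         {programs // 1024} MB")
    print(f"free incl. buffers/cache: {avail // 1024} MB")
```

Right after a large rsync, the first figure should stay roughly where it was before the copy even though free's plain "used" column looks alarmingly high.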
Difference in used RAM (from the data above): (490 - 279) = 211 MB
Difference in cache sizes (from the data above): (383 - 177) + (43 - 39) = 210 MB

It didn't swallow your RAM; it just moved most of the free RAM somewhere useful, in this case the buffer and page caches (the "buffers" and "cached" columns) along with the inode/dentry caches. Additionally, internal kernel data structures (VM, socket structures, cache entries, etc.) are allocated and freed all the time; even running 'free' affects them.

I probably won't explain this perfectly, but here goes: data is cached on the principle "You just accessed this file's data, so I'm going to hold on to it in case you need it again." When you access new files or start other applications, I believe the least-recently-used cache data gets freed to make space for the incoming data or application.

The reason the 'free' RAM 'jumps' when you remove the directory is simple: when you unlink (remove) a file, the cached data for that file is freed because it is no longer needed. If the file doesn't exist, you certainly won't be using its data any time soon. In this case you're unlinking a lot of files, so a lot of cached information (which the system thought you might need) is freed.

FYI: running "cp -a <src> <dest>" should have a similar effect to using rsync.
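The arithmetic above can be checked mechanically. This sketch simply re-enters the two "Mem:" rows quoted in the report (values in MB) and compares the growth in "used" with the growth in buffers plus cached; the FreeRow type is hypothetical and only exists for readability.

```python
from typing import NamedTuple

class FreeRow(NamedTuple):
    """One 'Mem:' row of free output, in MB."""
    total: int
    used: int
    free: int
    shared: int
    buffers: int
    cached: int

before = FreeRow(501, 279, 221, 0, 39, 177)
after = FreeRow(501, 490, 10, 0, 43, 383)

used_growth = after.used - before.used  # 211
cache_growth = (after.buffers - before.buffers) + (after.cached - before.cached)  # 210

print(f"used grew by {used_growth} MB")
print(f"buffers + cached grew by {cache_growth} MB")
print(f"not explained by caching: {used_growth - cache_growth} MB")
```

Only 1 MB of the apparent loss is not accounted for by the caches, which is within the noise of kernel bookkeeping and of rounding in free's output.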
Clarification - rsync doesn't allocate/remove cache entries; this is one of the things the kernel does.
Lon, your explanation makes sense to me; thanks for that. (I pretty much knew that if there were a problem, it would have been noticed some time ago.) This does, however, raise a question that has been put to me numerous times. According to your description, the least-recently-used cache data gets freed to make room for new applications. So if 95% of memory seemed to be used and an application such as Matlab were launched, the least-recently-used cache memory would be reclaimed and handed to Matlab, which would explain why swap space did not seem to be used. This all makes perfect sense now; thanks for the explanation.
Indeed, Lon's explanation accounts for this behaviour. The kernel will always try to keep file data cached in RAM: if a user rsynced a file, there's a chance they will want to use that file later on, and it is much faster if the file is already in RAM and doesn't need to be read from disk. If an application needs the RAM, we can always reclaim it very easily. The data is already on disk, so we just forget that we had it cached and use the RAM for something else.
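The mechanism described in this thread can be illustrated with a toy model. In it, file data is cached when accessed, dropped when the file is unlinked, and reclaimed in least-recently-used order when a program needs RAM. This is a simplified sketch under stated assumptions, not how the Linux VM actually works. ToyMemory and its methods are hypothetical, every cached entry is treated as clean (already on disk), and the real kernel uses more elaborate page lists and can still choose to swap out program memory.

```python
from collections import OrderedDict

class ToyMemory:
    """Toy model of RAM: program memory plus an LRU file cache (units: MB)."""

    def __init__(self, total):
        self.total = total
        self.app_used = 0
        self.cache = OrderedDict()  # file name -> cached MB, least recent first

    def cached(self):
        return sum(self.cache.values())

    def free(self):
        return self.total - self.app_used - self.cached()

    def access_file(self, name, size):
        """Reading or writing a file keeps its data cached, using otherwise idle RAM."""
        self.cache.pop(name, None)
        if size > self.total - self.app_used:
            return  # too large to cache at all in this toy model
        self._reclaim(size)
        self.cache[name] = size  # most recently used goes to the end

    def unlink(self, name):
        """Deleting a file makes its cached data useless, so it is dropped."""
        self.cache.pop(name, None)

    def allocate(self, size):
        """A program asks for memory; cache is reclaimed before giving up."""
        self._reclaim(size)
        if self.free() < size:
            raise MemoryError("toy model: would have to swap")
        self.app_used += size

    def _reclaim(self, needed):
        # Evict least-recently-used entries until `needed` MB are free. Every
        # entry is assumed clean, so it is simply forgotten; the real kernel
        # must write dirty pages back to disk before reusing them.
        while self.free() < needed and self.cache:
            self.cache.popitem(last=False)

mem = ToyMemory(total=500)
mem.allocate(60)                             # programs already running
for i in range(4):
    mem.access_file(f"/tmp/test/file{i}", 100)  # an rsync writes 400 MB
print(mem.free(), mem.cached())              # 40 400: RAM looks "full"
mem.unlink("/tmp/test/file3")                # remove part of the copy
print(mem.free())                            # 140
mem.allocate(200)                            # a big program such as Matlab starts
print(mem.free(), list(mem.cache))           # 40 ['/tmp/test/file1', '/tmp/test/file2']
```

The final allocation succeeds without "swapping" because the oldest cache entry is discarded to make room, which matches the Matlab observation above.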