glusterfs client 3.13.2 is comsuming a lot of memory; here is the output of pmap: you see a lot of [anon] 0000562944f30000 92K r-x-- glusterfsd 0000562945147000 4K r---- glusterfsd 0000562945148000 8K rw--- glusterfsd 0000562946528000 464K rw--- [ anon ] 000056294659c000 132K rw--- [ anon ] 00007fb388000000 184K rw--- [ anon ] 00007fb38802e000 65352K ----- [ anon ] 00007fb390000000 15396K rw--- [ anon ] 00007fb390f09000 50140K ----- [ anon ] 00007fb394000000 6164K rw--- [ anon ] 00007fb394605000 59372K ----- [ anon ] 00007fb398000000 25304K rw--- [ anon ] 00007fb3998b6000 40232K ----- [ anon ] 00007fb39c000000 16588K rw--- [ anon ] 00007fb39d033000 48948K ----- [ anon ] 00007fb3a0000000 47672K rw--- [ anon ] 00007fb3a2e8e000 17864K ----- [ anon ] 00007fb3a4000000 31112K rw--- [ anon ] 00007fb3a5e62000 34424K ----- [ anon ] 00007fb3a8000000 65536K rw--- [ anon ] 00007fb3ac000000 45996K rw--- [ anon ] 00007fb3aeceb000 19540K ----- [ anon ] 00007fb3b0000000 62272K rw--- [ anon ] 00007fb3b3cd0000 3264K ----- [ anon ] 00007fb3b4000000 57056K rw--- [ anon ] 00007fb3b77b8000 8480K ----- [ anon ] 00007fb3b8000000 65536K rw--- [ anon ] 00007fb3bc000000 65536K rw--- [ anon ] 00007fb3c0000000 65536K rw--- [ anon ] 00007fb3c4000000 65536K rw--- [ anon ] 00007fb3c8000000 65536K rw--- [ anon ] 00007fb3cc000000 65536K rw--- [ anon ] 00007fb3d0000000 65536K rw--- [ anon ] 00007fb3d4000000 65536K rw--- [ anon ] 00007fb3d8000000 65536K rw--- [ anon ] 00007fb3dc000000 65536K rw--- [ anon ] 00007fb3e0000000 65536K rw--- [ anon ] 00007fb3e4000000 65536K rw--- [ anon ] 00007fb3e8000000 65536K rw--- [ anon ] 00007fb3ec000000 65536K rw--- [ anon ] 00007fb3f0000000 65536K rw--- [ anon ] 00007fb3f4000000 65536K rw--- [ anon ] 00007fb3f8000000 65536K rw--- [ anon ] 00007fb3fc000000 65524K rw--- [ anon ] 00007fb3ffffd000 12K ----- [ anon ] 00007fb400000000 65536K rw--- [ anon ] 00007fb404000000 65536K rw--- [ anon ] 00007fb408000000 65536K rw--- [ anon ] 00007fb40c000000 65536K rw--- [ anon ] 00007fb410000000 65536K rw--- [ anon ] 00007fb414000000 65536K rw--- [ anon ] 00007fb418000000 65536K rw--- [ anon ] 00007fb41c000000 65536K rw--- [ anon ] 00007fb420000000 65536K rw--- [ anon ] 00007fb424000000 65536K rw--- [ anon ] 00007fb428000000 65536K rw--- [ anon ] 00007fb42c000000 65536K rw--- [ anon ] 00007fb430000000 65536K rw--- [ anon ] 00007fb434000000 65536K rw--- [ anon ] 00007fb438000000 65536K rw--- [ anon ] 00007fb43c000000 65536K rw--- [ anon ] 00007fb440000000 65536K rw--- [ anon ] 00007fb444000000 65536K rw--- [ anon ] 00007fb448000000 65536K rw--- [ anon ] 00007fb44c000000 65536K rw--- [ anon ] 00007fb450000000 65536K rw--- [ anon ] 00007fb454000000 65536K rw--- [ anon ] 00007fb458000000 65536K rw--- [ anon ] 00007fb45c000000 65536K rw--- [ anon ] 00007fb460000000 65536K rw--- [ anon ] 00007fb464000000 65536K rw--- [ anon ] 00007fb468000000 65536K rw--- [ anon ] 00007fb46c000000 65536K rw--- [ anon ] 00007fb470000000 65536K rw--- [ anon ] 00007fb474000000 65536K rw--- [ anon ] 00007fb478000000 65536K rw--- [ anon ] 00007fb47c000000 65536K rw--- [ anon ] 00007fb480000000 65536K rw--- [ anon ] 00007fb484000000 65536K rw--- [ anon ] 00007fb488000000 65536K rw--- [ anon ] 00007fb48c000000 65536K rw--- [ anon ] 00007fb490000000 65536K rw--- [ anon ] 00007fb494000000 65536K rw--- [ anon ] 00007fb498000000 65536K rw--- [ anon ] 00007fb49c000000 65536K rw--- [ anon ] 00007fb4a0000000 65536K rw--- [ anon ] 00007fb4a4000000 65536K rw--- [ anon ] 00007fb4a8000000 65536K rw--- [ anon ] 00007fb4ac000000 65536K rw--- [ 
anon ] 00007fb4b0000000 65536K rw--- [ anon ] 00007fb4b4000000 65536K rw--- [ anon ] 00007fb4b8000000 65536K rw--- [ anon ] 00007fb4bc000000 65536K rw--- [ anon ] 00007fb4c0000000 65528K rw--- [ anon ] 00007fb4c3ffe000 8K ----- [ anon ] 00007fb4c4000000 65536K rw--- [ anon ] 00007fb4c8000000 65536K rw--- [ anon ] 00007fb4cc000000 65536K rw--- [ anon ] 00007fb4d0000000 65536K rw--- [ anon ] 00007fb4d4000000 65536K rw--- [ anon ] 00007fb4d8000000 65536K rw--- [ anon ] 00007fb4dc000000 65536K rw--- [ anon ] 00007fb4e0000000 65536K rw--- [ anon ] 00007fb4e4000000 65536K rw--- [ anon ] 00007fb4e8000000 65536K rw--- [ anon ] 00007fb4ec000000 65532K rw--- [ anon ] 00007fb4effff000 4K ----- [ anon ] 00007fb4f0000000 65536K rw--- [ anon ] 00007fb4f4000000 65536K rw--- [ anon ] 00007fb4f8000000 65536K rw--- [ anon ] 00007fb4fc000000 65536K rw--- [ anon ] 00007fb500000000 65536K rw--- [ anon ] 00007fb504000000 65536K rw--- [ anon ] 00007fb508000000 65536K rw--- [ anon ] 00007fb50c000000 65536K rw--- [ anon ] 00007fb510000000 65536K rw--- [ anon ] 00007fb514000000 65536K rw--- [ anon ] 00007fb518000000 65536K rw--- [ anon ] 00007fb51c000000 65536K rw--- [ anon ] 00007fb520000000 65536K rw--- [ anon ] 00007fb524000000 65536K rw--- [ anon ] 00007fb528000000 65536K rw--- [ anon ] 00007fb52c000000 65536K rw--- [ anon ] 00007fb530000000 65536K rw--- [ anon ] 00007fb534000000 65536K rw--- [ anon ] 00007fb538000000 65536K rw--- [ anon ] 00007fb53c000000 65536K rw--- [ anon ] 00007fb540000000 65536K rw--- [ anon ] 00007fb544000000 65536K rw--- [ anon ] 00007fb548000000 65536K rw--- [ anon ] 00007fb54c000000 65536K rw--- [ anon ] 00007fb550000000 65536K rw--- [ anon ] 00007fb554000000 65536K rw--- [ anon ] 00007fb558000000 65536K rw--- [ anon ] 00007fb55c000000 65536K rw--- [ anon ] 00007fb560000000 65536K rw--- [ anon ] 00007fb564000000 65536K rw--- [ anon ] 00007fb568000000 65536K rw--- [ anon ] 00007fb56c000000 65536K rw--- [ anon ] 00007fb570000000 65536K rw--- [ anon ] 00007fb574000000 65536K rw--- [ anon ] 00007fb578000000 65536K rw--- [ anon ] 00007fb57c000000 65536K rw--- [ anon ] 00007fb580000000 65536K rw--- [ anon ] 00007fb584000000 65536K rw--- [ anon ] 00007fb588000000 65536K rw--- [ anon ] 00007fb58c000000 65536K rw--- [ anon ] 00007fb590000000 65536K rw--- [ anon ] 00007fb594000000 65536K rw--- [ anon ] 00007fb598000000 65536K rw--- [ anon ] 00007fb59c000000 65536K rw--- [ anon ] 00007fb5a0000000 65536K rw--- [ anon ] 00007fb5a4000000 65536K rw--- [ anon ] 00007fb5a8000000 1084K rw--- [ anon ] 00007fb5a810f000 64452K ----- [ anon ] 00007fb5ac000000 65536K rw--- [ anon ] 00007fb5b0000000 1236K rw--- [ anon ] 00007fb5b0135000 64300K ----- [ anon ] 00007fb5b8000000 1096K rw--- [ anon ] 00007fb5b8112000 64440K ----- [ anon ] 00007fb5bc000000 1172K rw--- [ anon ] 00007fb5bc125000 64364K ----- [ anon ] 00007fb5c0000000 1128K rw--- [ anon ] 00007fb5c011a000 64408K ----- [ anon ] 00007fb5c4000000 1068K rw--- [ anon ] 00007fb5c410b000 64468K ----- [ anon ] 00007fb5c8000000 1124K rw--- [ anon ] 00007fb5c8119000 64412K ----- [ anon ] 00007fb5cc000000 1480K rw--- [ anon ] 00007fb5cc172000 64056K ----- [ anon ] 00007fb5d0000000 1540K rw--- [ anon ] 00007fb5d0181000 63996K ----- [ anon ] 00007fb5d4000000 1540K rw--- [ anon ] 00007fb5d4181000 63996K ----- [ anon ] 00007fb5d8000000 1608K rw--- [ anon ] 00007fb5d8192000 63928K ----- [ anon ] 00007fb5dc000000 1696K rw--- [ anon ] 00007fb5dc1a8000 63840K ----- [ anon ] 00007fb5e0000000 1604K rw--- [ anon ] 00007fb5e0191000 63932K ----- [ anon ] 00007fb5e4000000 
1624K rw--- [ anon ] 00007fb5e4196000 63912K ----- [ anon ] 00007fb5e8000000 1664K rw--- [ anon ] 00007fb5e81a0000 63872K ----- [ anon ] 00007fb5ec000000 1728K rw--- [ anon ] 00007fb5ec1b0000 63808K ----- [ anon ] 00007fb5f0000000 132K rw--- [ anon ] 00007fb5f0021000 65404K ----- [ anon ] 00007fb5f4000000 132K rw--- [ anon ] 00007fb5f4021000 65404K ----- [ anon ] 00007fb5f8000000 65536K rw--- [ anon ] 00007fb5fc000000 892K rw--- [ anon ] 00007fb5fc0df000 64644K ----- [ anon ] 00007fb600000000 65536K rw--- [ anon ] 00007fb604000000 65536K rw--- [ anon ] 00007fb608000000 65536K rw--- [ anon ] 00007fb60c000000 65536K rw--- [ anon ] 00007fb6113fb000 4096K rw--- [ anon ] 00007fb6117fb000 4K ----- [ anon ] 00007fb6117fc000 8192K rw--- [ anon ] 00007fb611ffc000 4K ----- [ anon ] 00007fb611ffd000 8192K rw--- [ anon ] 00007fb6127fd000 4K ----- [ anon ] 00007fb6127fe000 8192K rw--- [ anon ] 00007fb612ffe000 4K ----- [ anon ] 00007fb612fff000 8192K rw--- [ anon ] 00007fb6137ff000 4K ----- [ anon ] 00007fb613800000 8192K rw--- [ anon ] 00007fb614000000 65536K rw--- [ anon ] 00007fb618000000 65536K rw--- [ anon ] 00007fb61c000000 65536K rw--- [ anon ] 00007fb620000000 65536K rw--- [ anon ] 00007fb624000000 1592K rw--- [ anon ] 00007fb62418e000 63944K ----- [ anon ] 00007fb6281a4000 1024K rw--- [ anon ] 00007fb6283a4000 4K ----- [ anon ] 00007fb6283a5000 1284K rw--- [ anon ] 00007fb6284e6000 4K ----- [ anon ] 00007fb6284e7000 8192K rw--- [ anon ] 00007fb628ce7000 4K ----- [ anon ] 00007fb628ce8000 8192K rw--- [ anon ] 00007fb6294e8000 4K ----- [ anon ] 00007fb6294e9000 8192K rw--- [ anon ] 00007fb629ce9000 4K ----- [ anon ] 00007fb629cea000 8192K rw--- [ anon ] 00007fb62a4ea000 4K ----- [ anon ] 00007fb62a4eb000 8192K rw--- [ anon ] 00007fb62aceb000 4K ----- [ anon ] 00007fb62acec000 17408K rw--- [ anon ] 00007fb62bdec000 44K r-x-- meta.so 00007fb62bdf7000 2048K ----- meta.so 00007fb62bff7000 4K r---- meta.so 00007fb62bff8000 32K rw--- meta.so 00007fb62c000000 65536K rw--- [ anon ] 00007fb630034000 4K ----- [ anon ] 00007fb630035000 256K rw--- [ anon ] 00007fb630075000 4K ----- [ anon ] 00007fb630076000 1504K rw--- [ anon ] 00007fb6301ee000 160K r-x-- io-stats.so 00007fb630216000 2048K ----- io-stats.so 00007fb630416000 4K r---- io-stats.so 00007fb630417000 32K rw--- io-stats.so 00007fb63041f000 36K r-x-- io-threads.so 00007fb630428000 2044K ----- io-threads.so 00007fb630627000 4K r---- io-threads.so 00007fb630628000 12K rw--- io-threads.so 00007fb63062b000 84K r-x-- md-cache.so 00007fb630640000 2044K ----- md-cache.so 00007fb63083f000 4K r---- md-cache.so 00007fb630840000 16K rw--- md-cache.so 00007fb630844000 36K r-x-- open-behind.so 00007fb63084d000 2044K ----- open-behind.so 00007fb630a4c000 4K r---- open-behind.so 00007fb630a4d000 8K rw--- open-behind.so 00007fb630a4f000 28K r-x-- quick-read.so 00007fb630a56000 2044K ----- quick-read.so 00007fb630c55000 4K r---- quick-read.so 00007fb630c56000 8K rw--- quick-read.so 00007fb630c58000 76K r-x-- io-cache.so 00007fb630c6b000 2044K ----- io-cache.so 00007fb630e6a000 4K r---- io-cache.so 00007fb630e6b000 8K rw--- io-cache.so 00007fb630e6d000 20K r-x-- readdir-ahead.so 00007fb630e72000 2044K ----- readdir-ahead.so 00007fb631071000 4K r---- readdir-ahead.so 00007fb631072000 8K rw--- readdir-ahead.so 00007fb631074000 52K r-x-- read-ahead.so 00007fb631081000 2044K ----- read-ahead.so 00007fb631280000 4K r---- read-ahead.so 00007fb631281000 8K rw--- read-ahead.so 00007fb631283000 72K r-x-- write-behind.so 00007fb631295000 2044K ----- write-behind.so 
00007fb631494000 4K r---- write-behind.so 00007fb631495000 12K rw--- write-behind.so 00007fb631498000 632K r-x-- dht.so 00007fb631536000 2044K ----- dht.so 00007fb631735000 4K r---- dht.so 00007fb631736000 48K rw--- dht.so 00007fb631742000 492K r-x-- afr.so 00007fb6317bd000 2048K ----- afr.so 00007fb6319bd000 4K r---- afr.so 00007fb6319be000 60K rw--- afr.so 00007fb6319cd000 352K r-x-- client.so 00007fb631a25000 2048K ----- client.so 00007fb631c25000 4K r---- client.so 00007fb631c26000 24K rw--- client.so 00007fb631c2c000 4K ----- [ anon ] 00007fb631c2d000 8192K rw--- [ anon ] 00007fb63242d000 44K r-x-- libnss_files-2.23.so 00007fb632438000 2044K ----- libnss_files-2.23.so 00007fb632637000 4K r---- libnss_files-2.23.so 00007fb632638000 4K rw--- libnss_files-2.23.so 00007fb632639000 24K rw--- [ anon ] 00007fb63263f000 376K r-x-- libssl.so.1.0.0 00007fb63269d000 2048K ----- libssl.so.1.0.0 00007fb63289d000 16K r---- libssl.so.1.0.0 00007fb6328a1000 28K rw--- libssl.so.1.0.0 00007fb6328a8000 76K r-x-- socket.so 00007fb6328bb000 2044K ----- socket.so 00007fb632aba000 4K r---- socket.so 00007fb632abb000 44K rw--- socket.so 00007fb632ac6000 4K ----- [ anon ] 00007fb632ac7000 8192K rw--- [ anon ] 00007fb6332c7000 4K ----- [ anon ] 00007fb6332c8000 8192K rw--- [ anon ] 00007fb633ac8000 4K ----- [ anon ] 00007fb633ac9000 8192K rw--- [ anon ] 00007fb6342c9000 4K ----- [ anon ] 00007fb6342ca000 8192K rw--- [ anon ] 00007fb634aca000 4K ----- [ anon ] 00007fb634acb000 8192K rw--- [ anon ] 00007fb6352cb000 164K r-x-- fuse.so 00007fb6352f4000 2048K ----- fuse.so 00007fb6354f4000 4K r---- fuse.so 00007fb6354f5000 32K rw--- fuse.so 00007fb6354fd000 1632K r---- locale-archive 00007fb635695000 4096K rw--- [ anon ] 00007fb635b89000 4K ----- [ anon ] 00007fb635b8a000 256K rw--- [ anon ] 00007fb635bca000 4K ----- [ anon ] 00007fb635bcb000 256K rw--- [ anon ] 00007fb635c0b000 4K ----- [ anon ] 00007fb635c0c000 256K rw--- [ anon ] 00007fb635c4c000 4K ----- [ anon ] 00007fb635c4d000 256K rw--- [ anon ] 00007fb635c8d000 4K ----- [ anon ] 00007fb635c8e000 256K rw--- [ anon ] 00007fb635cce000 4K ----- [ anon ] 00007fb635ccf000 256K rw--- [ anon ] 00007fb635d0f000 4K ----- [ anon ] 00007fb635d10000 256K rw--- [ anon ] 00007fb635d50000 4K ----- [ anon ] 00007fb635d51000 256K rw--- [ anon ] 00007fb635d91000 4K ----- [ anon ] 00007fb635d92000 256K rw--- [ anon ] 00007fb635dd2000 4K ----- [ anon ] 00007fb635dd3000 256K rw--- [ anon ] 00007fb635e13000 4K ----- [ anon ] 00007fb635e14000 256K rw--- [ anon ] 00007fb635e54000 4K ----- [ anon ] 00007fb635e55000 2304K rw--- [ anon ] 00007fb636154000 4K ----- [ anon ] 00007fb636155000 256K rw--- [ anon ] 00007fb636195000 2152K r-x-- libcrypto.so.1.0.0 00007fb6363af000 2044K ----- libcrypto.so.1.0.0 00007fb6365ae000 112K r---- libcrypto.so.1.0.0 00007fb6365ca000 48K rw--- libcrypto.so.1.0.0 00007fb6365d6000 12K rw--- [ anon ] 00007fb6365d9000 28K r-x-- librt-2.23.so 00007fb6365e0000 2044K ----- librt-2.23.so 00007fb6367df000 4K r---- librt-2.23.so 00007fb6367e0000 4K rw--- librt-2.23.so 00007fb6367e1000 16K r-x-- libuuid.so.1.3.0 00007fb6367e5000 2044K ----- libuuid.so.1.3.0 00007fb6369e4000 4K r---- libuuid.so.1.3.0 00007fb6369e5000 4K rw--- libuuid.so.1.3.0 00007fb6369e6000 100K r-x-- libz.so.1.2.8 00007fb6369ff000 2044K ----- libz.so.1.2.8 00007fb636bfe000 4K r---- libz.so.1.2.8 00007fb636bff000 4K rw--- libz.so.1.2.8 00007fb636c00000 1792K r-x-- libc-2.23.so 00007fb636dc0000 2048K ----- libc-2.23.so 00007fb636fc0000 16K r---- libc-2.23.so 00007fb636fc4000 8K rw--- libc-2.23.so 
00007fb636fc6000 16K rw--- [ anon ] 00007fb636fca000 96K r-x-- libpthread-2.23.so 00007fb636fe2000 2044K ----- libpthread-2.23.so 00007fb6371e1000 4K r---- libpthread-2.23.so 00007fb6371e2000 4K rw--- libpthread-2.23.so 00007fb6371e3000 16K rw--- [ anon ] 00007fb6371e7000 12K r-x-- libdl-2.23.so 00007fb6371ea000 2044K ----- libdl-2.23.so 00007fb6373e9000 4K r---- libdl-2.23.so 00007fb6373ea000 4K rw--- libdl-2.23.so 00007fb6373eb000 92K r-x-- libgfxdr.so.0.0.1 00007fb637402000 2044K ----- libgfxdr.so.0.0.1 00007fb637601000 4K r---- libgfxdr.so.0.0.1 00007fb637602000 4K rw--- libgfxdr.so.0.0.1 00007fb637603000 104K r-x-- libgfrpc.so.0.0.1 00007fb63761d000 2048K ----- libgfrpc.so.0.0.1 00007fb63781d000 4K r---- libgfrpc.so.0.0.1 00007fb63781e000 4K rw--- libgfrpc.so.0.0.1 00007fb63781f000 940K r-x-- libglusterfs.so.0.0.1 00007fb63790a000 2048K ----- libglusterfs.so.0.0.1 00007fb637b0a000 4K r---- libglusterfs.so.0.0.1 00007fb637b0b000 4K rw--- libglusterfs.so.0.0.1 00007fb637b0c000 16K rw--- [ anon ] 00007fb637b10000 152K r-x-- ld-2.23.so 00007fb637b4c000 220K rw--- [ anon ] 00007fb637b83000 4K ----- [ anon ] 00007fb637b84000 256K rw--- [ anon ] 00007fb637bc4000 1436K rw--- [ anon ] 00007fb637d2e000 28K r--s- gconv-modules.cache 00007fb637d35000 4K r---- ld-2.23.so 00007fb637d36000 4K rw--- ld-2.23.so 00007fb637d37000 4K rw--- [ anon ] 00007ffe46f54000 132K rw--- [ stack ] 00007ffe46f93000 12K r---- [ anon ] 00007ffe46f96000 8K r-x-- [ anon ] ffffffffff600000 4K r-x-- [ anon ] total 11113916K
By the time we found this, 3.13 had reached EOL. https://review.gluster.org/19647 is the fix; it has also been merged into the 4.0 release: https://review.gluster.org/19654. Could you try the fixed versions and let us know if you still see the issue?

Pranith
I just tested the patch against 3.13.2 on a Debian 8 machine (client side only). Without the patch the FUSE 3.13.2 client reached >6G of virtual size (and stayed there) after extracting a kernel tar.gz in a loop inside the FS about 20 times. With the patch, VSZ has so far grown from ~400M to ~600M and seems to be stable (6 loops performed so far). I am letting the tests run to be sure, but in any case it is far better than before. Thanks.
--
Y.
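A minimal sketch of such a test loop (not necessarily the exact commands used here), assuming the volume is already FUSE-mounted on /root/MNT — the mount point and client command line are taken from the ps output later in this report, and the tarball path is a placeholder:

MNT=/root/MNT
TARBALL=/root/linux-4.16-rc5.tar.gz   # placeholder path to the kernel tarball
PID=$(pgrep -f 'glusterfs --process-name fuse' | head -n1)

for i in $(seq 1 20); do
    tar -C "$MNT" -xzf "$TARBALL"     # extract over the previous content
    ps -o vsz=,rss= -p "$PID"         # log the client's VSZ/RSS after each pass
done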
So, more tests.

I performed 16 loops of extracting the same archive into the same place (overwriting the content) in a FUSE mountpoint with the patched 3.13.2 client. The volume is a 2-way replica (default configuration) across 2 servers running the same OS (fresh Debian 8, 64-bit) and the same glusterfs version (not patched on the servers).

Without the patch the memory size of the client process grows after *each* FS operation. With the patch it grows up to a certain point and then stays stable. The archive is the latest Linux kernel tarball (~154M).

Here is the client process VSZ/RSS over time:
427796 10788 (initial memory, just after mounting)
427796 10788
493332 21020 (starting to extract the archive)
493332 27772
493332 45620
493332 63072
(…)
493332 88904
493332 104484
493332 128672
(…)
689940 223832
689940 228404
At this point the memory size is stable.

Later I started another extraction of the same archive into another target directory, while the main loop was still running. Memory increased again a little:
689940 232172
(…)
757612 363916
757612 373788
757612 383672
757612 394316
757612 404792
(…)
888684 455848
At this point the memory size is again stable.

So clearly the memory leak tied to every operation is fixed, at least for my configuration/options (note: without the patch even listing content increased the memory size).

In my opinion there is still a question: why does the memory never shrink? All operations on the mountpoint have been over for ~4 hours now and the memory size is still exactly the same. I also then deleted all content from the mountpoint, without any change.

Is it another kind of memory leak? Is it some kind of cached data? If it is cached data it should have expired by now, especially after deleting all content (caching non-existing nodes does not seem useful).

I can of course perform more tests if that helps, please let me know. On my side I will run more copies into other target directories, to see whether memory still grows (a little, now) and then stays there.

Thanks,
--
Y.
The issue is that I cannot update the clients until I'm sure that the patch is stable. Unfortunately I'm on a live system and it was updated to 3.13 instead of 3.12. All help is highly appreciated.
(In reply to Yannick Perret from comment #3)
> So, more tests.
> 
> I performed 16 loops of extracting the same archive onto the same place
> (overwriting content) in a FUSE mountpoint with patched 3.13.2 client.
> The volume is a x2 replica (default configuration) from 2 servers with the
> same OS (fresh Debian 8 64b) and the same glusterfs version (but not patched
> on servers).
> 
> Without the patch after *each* FS operation the memory size of the client
> process grows. With the patch it grows until a certain point and then stay
> stable. The archive is the latest linux kernel tarball (~154M).
> 
> Here is the client process VSZ/RSS over time:
> 
> 427796 10788 (initial memory, just after mounting)
> 427796 10788
> 493332 21020 (starting extracting archive)
> 493332 27772
> 493332 45620
> 493332 63072
> (…)
> 493332 88904
> 493332 104484
> 493332 128672
> (…)
> 689940 223832
> 689940 228404
> At this point memory size is stable.
> 
> Later I started an other extraction of the same archive in an other target
> directory, while the main loop was still running. Memory increase again a
> little:
> 689940 232172
> (…)
> 757612 363916
> 757612 373788
> 757612 383672
> 757612 394316
> 757612 404792
> (…)
> 888684 455848
> At this point memory size is again stable.
> 
> So clearly the memory leak related to every operations is corrected, at
> least for my configuration / options (note: without the patch even listing
> content increased the memory size).
> 
> In my point of view there is still a question: why the memory never reduce?
> Now all operations are over on the mountpoint (for ~4 hours now) and memory
> size is still exactly the same.
> I also then deleted all content from the mountpoint without any change.
> 
> Is it an other kind of memory leak? Is it some kind of cached data? But if
> it is cached data it should have expired now, moreover after deleting all
> content (caching non-existing nodes don't seems useful).
> 
> I can of course perform more tests if it can help, please let me know.
> By my side I will run other copies with other directory targets, in order to
> see if memory will still grows (a little now) and stay like this.
> 
> Thanks,
> --
> Y.

What you can give me is the statedump of the client before running the test and the statedump of the client after running these tests and deleting all the content you created as part of the test. With these two files, I can compare what grew to see whether it is expected or whether something more needs to be fixed.

kill -USR1 <pid-of-client-process>

generates a statedump at "/var/run/gluster" with the pattern glusterdump.<pid>.<timestamp>. Upload these two files and we will have some data to analyse.
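A minimal sketch of that sequence, assuming a single FUSE client on the machine (the pgrep pattern matches the client command line shown later in this report):

PID=$(pgrep -f 'glusterfs --process-name fuse' | head -n1)
kill -USR1 "$PID"                                # ask the client to write a statedump
sleep 2                                          # give it a moment to finish writing
ls -l /var/run/gluster/glusterdump."$PID".*      # dumps are named glusterdump.<pid>.<timestamp>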
(In reply to MA from comment #4)
> The issue is that I cannot update the clients until I'm sure that the patch
> is stable.
> Unfortunately I'm on live system and it was updated to 3.13 instead of 3.12.
> All help is highly appreciated.

I'm afraid the clients will be OOM-killed at some point if you don't downgrade to 3.12.x. This bug is not present in 3.12.x.
Created attachment 1411062 [details]
Statedump after mount

Statedump just after mount
Created attachment 1411065 [details]
Statedump after archive extraction

Statedump after extracting an archive into the FS
Created attachment 1411074 [details]
Statedump after cleanup

Statedump after deleting all content
I performed the following steps:
1. clean up the FS content and unmount
2. mount the glusterfs volume (see comment #3 for details about the OS and volume configuration)
3. create statedump "step1"
4. extract linux-4.16-rc5.tar.gz into the FS
5. create statedump "step2"
6. rm -rf all FS content
7. create statedump "step3"

From the 'ps' command I monitored VSZ and RSS during these steps:
- just after mounting the FS: 427796/10488
- just after extracting the tarball: 757612/237544 (in between, memory grew slowly from 427796 to 757612)
- just after removing all content: 757612/241000

This last value is still the same ~15 minutes after the 'rm -rf' was performed, and did not change after running 'sync ; echo 3 >/proc/sys/vm/drop_caches' (to be sure).

Here is the full 'ps aux' line:
root 12118 8.7 5.9 757612 241000 ? Ssl 10:02 3:17 /usr/local/sbin/glusterfs --process-name fuse --volfile-server=xx.yy.zz.ww --volfile-id=test-volume /root/MNT

As said previously, the initial bug that made memory grow on each syscall seems corrected. But after removing all files (the FS is currently empty) I would expect memory to fall back to a value similar to the one at start time. I would also expect the same after a long period of inactivity.

Please let me know if I can help.

Regards,
--
Y.
At this time, comparing the two statedumps, I can see 3 sections with "high" values:

[debug/io-stats.test-volume - usage-type gf_common_mt_strdup memusage]
size=17450
num_allocs=393
max_size=3467963
max_num_allocs=67310
total_allocs=134621

[debug/io-stats.test-volume - usage-type gf_io_stats_mt_ios_stat memusage]
size=198392
num_allocs=403
max_size=33924056
max_num_allocs=67319
total_allocs=67320

[debug/io-stats.test-volume - usage-type gf_io_stats_mt_ios_stat_list memusage]
size=12896
num_allocs=403
max_size=12928
max_num_allocs=404
total_allocs=69767

These three seem related to "debug": I did not activate any debug option, neither on the client nor on the servers.

Note: I found exactly the same sections again in the statedump file.
--
Y.
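For readers doing the same comparison, a rough sketch for pulling the largest memusage sections out of a statedump so that a "before" and an "after" dump can be compared (the awk logic is illustrative and the dump file names are placeholders):

show_top() {
    # print "size  [section header]" for every memusage section, largest first
    awk '/ memusage\]$/ { sect = $0 }
         /^size=/       { n = $0; sub(/^size=/, "", n); print n, sect }' "$1" |
    sort -rn | head -n 10
}
show_top /var/run/gluster/glusterdump.12118.step1   # before (placeholder name)
show_top /var/run/gluster/glusterdump.12118.step2   # after  (placeholder name)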
The malloc implementation in glibc (the one gluster uses) does return the memory to the system only if the released memory belongs to the top of the heap. This means that a single allocation on the top of the heap prevents all other released memory from being returned to the system.

Anyway, the memory is still free from the point of view of the application, so it can be reused when more memory is requested.

One way to test if this is really the case would be to repeat the same test you did in comment #3, but before untaring to another directory, remove all previous data. This should release a lot of cached data that should be reused by the next untar, keeping memory usage (almost) constant. For example:

mkdir /gluster/dir1
tar -C /gluster/dir1 -xf linux.tgz
# check memory usage
rm -rf /gluster/dir1
# or dropping caches
mkdir /gluster/dir2
tar -C /gluster/dir2 -xf linux.tgz
# check memory usage

There shouldn't be a significant increase in memory usage after the second untar.
(In reply to Xavi Hernandez from comment #12)
> The malloc implementation in glibc (the one gluster uses) does return the
> memory to the system only if the released memory belongs to the top of the
> heap. This means that a single allocation on the top of the heap prevents
> all other released memory from being returned to the system.
> 
> Anyway, the memory is still free from the point of view of the application,
> so it can be reused when more memory is requested.

I understand this. And, so far, it seems to be confirmed by the fact that I need to perform *new additional* operations to make memory increase.

But when I remove all files from the FS − and I still performed several 'rm -rf MNT/*' − the top of the heap (which should hold data about the last file operations) should be free, and this memory should return to the system. This is not the case so far: after a total cleanup of the FS content the glusterfs process stays at the same memory value.

In the context of few operations on the FS this is clearly not a problem, nor for temporary mounts (e.g. some of our backups that are automounted). But at my office some of the volumes are users' HOMEs, with more than 300 people. They are mounted using an NFS export (so without real failover). Currently, just from dealing with ~20 different subdirectories today (with various archives), I reached a VSZ/RSS of 1216364/754568. How high would it grow with the activity of 300+ directories on a permanent mount?

Regards,
--
Y.
Hmmm… please disregard my previous comment.

I tried something: the glusterfs process was still using 1216364/754568 (VSZ/RSS) and I started a small program that performed 'malloc' in a loop (writing into the allocated areas to actually use the memory) until allocation failed. After that I checked the memory used by the glusterfs process (for the record: all content had been destroyed by a 'rm -rf MNT/*'):

root 12118 3.0 0.0 1216364 0 ? Ssl 10:02 20:59 /usr/local/sbin/glusterfs --process-name fuse --volfile-server=xx.yy.zz.ww --volfile-id=test-volume /root/MNT

Resident memory fell to 0. After a single 'ls -la' I got 1216364/2080. So VSZ did not change, but resident memory is returned to the OS when the OS needs fresh memory.

So at this point everything is fine for me (VSZ is not really pertinent): if RSS takes care of free-but-not-released memory under memory pressure, it is fine. From my point of view this bug is solved.

Thanks to you guys.

Regards,
--
Y.
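A minimal sketch of such a memory-pressure test — this is not the actual program used above, just a shell wrapper around a tiny C allocator; it touches every allocated page, so on a system with default overcommit it may wake the OOM killer instead of having malloc fail, and should only be run on a test machine:

cat > /tmp/eatmem.c <<'EOF'
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    size_t chunk = 64UL * 1024 * 1024;   /* 64 MiB per allocation */
    size_t total = 0;
    void *p;

    while ((p = malloc(chunk)) != NULL) {
        memset(p, 1, chunk);             /* touch the pages so they are really backed */
        total += chunk;
    }
    printf("allocated ~%zu MiB before malloc failed\n", total >> 20);
    return 0;
}
EOF
cc -O2 -o /tmp/eatmem /tmp/eatmem.c && /tmp/eatmem
ps -o vsz=,rss= -p "$(pgrep -f 'glusterfs --process-name fuse' | head -n1)"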
(In reply to Yannick Perret from comment #13)
> But when I remove all files from the FS − and I still performed several
> 'rm -rf MNT/*' − the top of the heap (which should hold data about the last
> file operations) should be free, and this memory should return to the
> system. This is not the case so far: after a total cleanup of the FS content
> the glusterfs process stays at the same memory value.

This is not always true. While gluster is running, it uses memory for many internal operations that are not directly related to cached data. So even after having deleted all files, Gluster will still be using some memory, and it can happen that one of these blocks of memory comes from the top of the heap if it was allocated (or reallocated) while data was being processed.
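If the free-but-unreturned heap memory really needs to be handed back without waiting for memory pressure, glibc's malloc_trim() can in principle be invoked in the running process; a rough sketch only (attaching gdb briefly pauses the client, so treat this as a test-box experiment, not a production procedure):

PID=$(pgrep -f 'glusterfs --process-name fuse' | head -n1)
gdb --batch -p "$PID" -ex 'call (int) malloc_trim(0)' -ex 'detach'
ps -o vsz=,rss= -p "$PID"   # VSZ/RSS after trimming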