Description of problem:

I launched a rebalance operation about two days ago on a 25x2 distributed-replicate volume. Ever since then, some of the rebalance processes' memory usage has been growing by roughly 1 MB/minute, with no sign of stopping. Following http://gluster.org/community/documentation/index.php/High_Memory_Usage, I tried "echo 2 > /proc/sys/vm/drop_caches"; this had no effect on the processes' memory usage. I had the same problem with 3.3.1.

Here you can see ml01 growing at ~1 MB/minute:

[root@ml01 ~]# while true; do echo $(date) $(ps aux --sort -rss|grep rebalance|head -n1); sleep 1m; done
Thu Jul 18 10:56:21 EDT 2013 root 14974 7.0 13.2 2489900 2173076 ? Ssl Jul16 158:08 /usr/sbin/glusterfs -s localhost --volfile-id bigdata --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *replicate*.readdir-failover=off --xlator-option *dht.rebalance-cmd=1 --xlator-option *dht.node-uuid=5c338e03-28ff-429b-b702-0a04e25565f8 --socket-file /var/lib/glusterd/vols/bigdata/rebalance/5c338e03-28ff-429b-b702-0a04e25565f8.sock --pid-file /var/lib/glusterd/vols/bigdata/rebalance/5c338e03-28ff-429b-b702-0a04e25565f8.pid -l /var/log/glusterfs/bigdata-rebalance.log
Thu Jul 18 10:57:21 EDT 2013 root 14974 7.0 13.1 2489900 2169976 ? Ssl Jul16 158:10 /usr/sbin/glusterfs -s localhost --volfile-id bigdata --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *replicate*.readdir-failover=off --xlator-option *dht.rebalance-cmd=1 --xlator-option *dht.node-uuid=5c338e03-28ff-429b-b702-0a04e25565f8 --socket-file /var/lib/glusterd/vols/bigdata/rebalance/5c338e03-28ff-429b-b702-0a04e25565f8.sock --pid-file /var/lib/glusterd/vols/bigdata/rebalance/5c338e03-28ff-429b-b702-0a04e25565f8.pid -l /var/log/glusterfs/bigdata-rebalance.log
Thu Jul 18 10:58:21 EDT 2013 root 14974 7.0 13.2 2491880 2173472 ? Ssl Jul16 158:12 /usr/sbin/glusterfs -s localhost --volfile-id bigdata --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *replicate*.readdir-failover=off --xlator-option *dht.rebalance-cmd=1 --xlator-option *dht.node-uuid=5c338e03-28ff-429b-b702-0a04e25565f8 --socket-file /var/lib/glusterd/vols/bigdata/rebalance/5c338e03-28ff-429b-b702-0a04e25565f8.sock --pid-file /var/lib/glusterd/vols/bigdata/rebalance/5c338e03-28ff-429b-b702-0a04e25565f8.pid -l /var/log/glusterfs/bigdata-rebalance.log
Thu Jul 18 10:59:21 EDT 2013 root 14974 7.0 13.2 2493200 2174860 ? Ssl Jul16 158:14 /usr/sbin/glusterfs -s localhost --volfile-id bigdata --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *replicate*.readdir-failover=off --xlator-option *dht.rebalance-cmd=1 --xlator-option *dht.node-uuid=5c338e03-28ff-429b-b702-0a04e25565f8 --socket-file /var/lib/glusterd/vols/bigdata/rebalance/5c338e03-28ff-429b-b702-0a04e25565f8.sock --pid-file /var/lib/glusterd/vols/bigdata/rebalance/5c338e03-28ff-429b-b702-0a04e25565f8.pid -l /var/log/glusterfs/bigdata-rebalance.log
Thu Jul 18 11:00:21 EDT 2013 root 14974 7.0 13.2 2494124 2177076 ? Ssl Jul16 158:15 /usr/sbin/glusterfs -s localhost --volfile-id bigdata --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *replicate*.readdir-failover=off --xlator-option *dht.rebalance-cmd=1 --xlator-option *dht.node-uuid=5c338e03-28ff-429b-b702-0a04e25565f8 --socket-file /var/lib/glusterd/vols/bigdata/rebalance/5c338e03-28ff-429b-b702-0a04e25565f8.sock --pid-file /var/lib/glusterd/vols/bigdata/rebalance/5c338e03-28ff-429b-b702-0a04e25565f8.pid -l /var/log/glusterfs/bigdata-rebalance.log
Thu Jul 18 11:01:21 EDT 2013 root 14974 7.0 13.2 2494916 2177832 ? Ssl Jul16 158:16 /usr/sbin/glusterfs -s localhost --volfile-id bigdata --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *replicate*.readdir-failover=off --xlator-option *dht.rebalance-cmd=1 --xlator-option *dht.node-uuid=5c338e03-28ff-429b-b702-0a04e25565f8 --socket-file /var/lib/glusterd/vols/bigdata/rebalance/5c338e03-28ff-429b-b702-0a04e25565f8.sock --pid-file /var/lib/glusterd/vols/bigdata/rebalance/5c338e03-28ff-429b-b702-0a04e25565f8.pid -l /var/log/glusterfs/bigdata-rebalance.log
Thu Jul 18 11:02:21 EDT 2013 root 14974 7.0 13.2 2496104 2179144 ? Ssl Jul16 158:18 /usr/sbin/glusterfs -s localhost --volfile-id bigdata --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *replicate*.readdir-failover=off --xlator-option *dht.rebalance-cmd=1 --xlator-option *dht.node-uuid=5c338e03-28ff-429b-b702-0a04e25565f8 --socket-file /var/lib/glusterd/vols/bigdata/rebalance/5c338e03-28ff-429b-b702-0a04e25565f8.sock --pid-file /var/lib/glusterd/vols/bigdata/rebalance/5c338e03-28ff-429b-b702-0a04e25565f8.pid -l /var/log/glusterfs/bigdata-rebalance.log

It doesn't affect all servers, though. ml59 looks fine:

[root@ml59 ~]# while true; do echo $(date) $(ps aux --sort -rss|grep rebalance|head -n1); sleep 1m; done
Thu Jul 18 11:00:24 EDT 2013 root 10448 2.3 0.6 1215004 900456 ? Ssl Jul16 52:45 /usr/sbin/glusterfs -s localhost --volfile-id bigdata --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *replicate*.readdir-failover=off --xlator-option *dht.rebalance-cmd=1 --xlator-option *dht.node-uuid=3e02cee9-5a22-4408-8263-baa0dabb3a13 --socket-file /var/lib/glusterd/vols/bigdata/rebalance/3e02cee9-5a22-4408-8263-baa0dabb3a13.sock --pid-file /var/lib/glusterd/vols/bigdata/rebalance/3e02cee9-5a22-4408-8263-baa0dabb3a13.pid -l /var/log/glusterfs/bigdata-rebalance.log
Thu Jul 18 11:01:24 EDT 2013 root 10448 2.3 0.6 1215004 900464 ? Ssl Jul16 52:45 /usr/sbin/glusterfs -s localhost --volfile-id bigdata --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *replicate*.readdir-failover=off --xlator-option *dht.rebalance-cmd=1 --xlator-option *dht.node-uuid=3e02cee9-5a22-4408-8263-baa0dabb3a13 --socket-file /var/lib/glusterd/vols/bigdata/rebalance/3e02cee9-5a22-4408-8263-baa0dabb3a13.sock --pid-file /var/lib/glusterd/vols/bigdata/rebalance/3e02cee9-5a22-4408-8263-baa0dabb3a13.pid -l /var/log/glusterfs/bigdata-rebalance.log
Thu Jul 18 11:02:25 EDT 2013 root 10448 2.3 0.6 1215004 900464 ? Ssl Jul16 52:45 /usr/sbin/glusterfs -s localhost --volfile-id bigdata --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *replicate*.readdir-failover=off --xlator-option *dht.rebalance-cmd=1 --xlator-option *dht.node-uuid=3e02cee9-5a22-4408-8263-baa0dabb3a13 --socket-file /var/lib/glusterd/vols/bigdata/rebalance/3e02cee9-5a22-4408-8263-baa0dabb3a13.sock --pid-file /var/lib/glusterd/vols/bigdata/rebalance/3e02cee9-5a22-4408-8263-baa0dabb3a13.pid -l /var/log/glusterfs/bigdata-rebalance.log
Thu Jul 18 11:03:25 EDT 2013 root 10448 2.3 0.6 1215004 900456 ? Ssl Jul16 52:46 /usr/sbin/glusterfs -s localhost --volfile-id bigdata --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *replicate*.readdir-failover=off --xlator-option *dht.rebalance-cmd=1 --xlator-option *dht.node-uuid=3e02cee9-5a22-4408-8263-baa0dabb3a13 --socket-file /var/lib/glusterd/vols/bigdata/rebalance/3e02cee9-5a22-4408-8263-baa0dabb3a13.sock --pid-file /var/lib/glusterd/vols/bigdata/rebalance/3e02cee9-5a22-4408-8263-baa0dabb3a13.pid -l /var/log/glusterfs/bigdata-rebalance.log
Thu Jul 18 11:04:25 EDT 2013 root 10448 2.3 0.6 1215004 900460 ? Ssl Jul16 52:46 /usr/sbin/glusterfs -s localhost --volfile-id bigdata --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *replicate*.readdir-failover=off --xlator-option *dht.rebalance-cmd=1 --xlator-option *dht.node-uuid=3e02cee9-5a22-4408-8263-baa0dabb3a13 --socket-file /var/lib/glusterd/vols/bigdata/rebalance/3e02cee9-5a22-4408-8263-baa0dabb3a13.sock --pid-file /var/lib/glusterd/vols/bigdata/rebalance/3e02cee9-5a22-4408-8263-baa0dabb3a13.pid -l /var/log/glusterfs/bigdata-rebalance.log
Thu Jul 18 11:05:25 EDT 2013 root 10448 2.3 0.6 1215004 900456 ? Ssl Jul16 52:46 /usr/sbin/glusterfs -s localhost --volfile-id bigdata --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *replicate*.readdir-failover=off --xlator-option *dht.rebalance-cmd=1 --xlator-option *dht.node-uuid=3e02cee9-5a22-4408-8263-baa0dabb3a13 --socket-file /var/lib/glusterd/vols/bigdata/rebalance/3e02cee9-5a22-4408-8263-baa0dabb3a13.sock --pid-file /var/lib/glusterd/vols/bigdata/rebalance/3e02cee9-5a22-4408-8263-baa0dabb3a13.pid -l /var/log/glusterfs/bigdata-rebalance.log
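To put a number on the growth rate, here is a minimal monitoring sketch rather than eyeballing ps; it assumes a single rebalance pid-file for this volume, using the pid-file path visible in the ps output above:

#!/bin/bash
# Sample the rebalance process's resident set size once a minute and
# print the per-minute delta in kB. The pid-file glob is an assumption
# based on the --pid-file argument shown above; adjust for your volume.
PID=$(cat /var/lib/glusterd/vols/bigdata/rebalance/*.pid)
prev=$(awk '/^VmRSS:/ {print $2}' /proc/$PID/status)
while sleep 60; do
    rss=$(awk '/^VmRSS:/ {print $2}' /proc/$PID/status)
    echo "$(date '+%F %T') VmRSS=${rss} kB (+$((rss - prev)) kB/min)"
    prev=$rss
done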
[root@ml01 glusterfs]# gluster volume rebalance bigdata status
Node        Rebalanced-files   size      scanned    failures   status        run time in secs
---------   ----------------   -------   --------   --------   -----------   ----------------
localhost   670                9.4MB     5091979    30813      in progress   135199.00
ml57        0                  0Bytes    5092048    0          in progress   135198.00
ml59        56037              39.4GB    1574040    55162      in progress   135198.00
ml47        40431              14.5GB    1446407    60641      in progress   135198.00
ml56        10383              1.5GB     5008633    80063      in progress   135198.00
ml55        55960              24.0GB    1695961    27452      in progress   135198.00
ml26        0                  0Bytes    5092045    0          in progress   135198.00
ml30        55851              27.5GB    1732056    41349      in progress   135198.00
ml29        0                  0Bytes    5000584    0          in progress   135198.00
ml46        0                  0Bytes    5000581    0          in progress   135198.00
ml44        0                  0Bytes    5000819    0          in progress   135198.00
ml31        0                  0Bytes    5001018    0          in progress   135198.00
ml25        3829               1.1GB     5003797    48727      in progress   135198.00
ml43        46461              13.3GB    1297866    20939      in progress   135198.00
ml54        0                  0Bytes    5092055    5          in progress   135198.00
ml45        54215              10.5GB    1158990    32371      in progress   135198.00
ml40        50738              24.6GB    2666714    90141      in progress   135198.00
ml52        0                  0Bytes    5000662    0          in progress   135198.00
ml48        0                  0Bytes    5092022    0          in progress   135198.00
ml41        0                  0Bytes    5092064    0          in progress   135198.00
ml51        59019              12.2GB    1072878    4101       in progress   135198.00
How reproducible:
Always, but it doesn't happen across all servers.

Steps to Reproduce:
1. gluster volume rebalance $volname start
2. Watch the memory usage of the resulting glusterfs processes.

Actual results:
Memory usage grows by ~1 MB per minute on some servers.

Expected results:
Memory usage should remain relatively constant.
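For step 2, a sketch that polls every server at once; it assumes passwordless ssh and the hostnames used in this report, and the pid-file path is taken from the ps output above:

#!/bin/bash
# Print each server's rebalance-process RSS (kB); rerun periodically
# to see which nodes are growing.
for host in ml01 ml25 ml26 ml29 ml30 ml31 ml40 ml41 ml43 ml44 \
            ml45 ml46 ml47 ml48 ml51 ml52 ml54 ml55 ml56 ml57 ml59; do
    rss=$(ssh "$host" 'ps -o rss= -p "$(cat /var/lib/glusterd/vols/bigdata/rebalance/*.pid)"' 2>/dev/null)
    echo "$host: ${rss:-no rebalance process}"
done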
Additional info:

# gluster volume info
Volume Name: bigdata
Type: Distributed-Replicate
Volume ID: 56498956-7b4b-4ee3-9d2b-4c8cfce26051
Status: Started
Number of Bricks: 25 x 2 = 50
Transport-type: tcp
Bricks:
Brick1: ml43:/mnt/donottouch/localb/brick
Brick2: ml44:/mnt/donottouch/localb/brick
Brick3: ml43:/mnt/donottouch/localc/brick
Brick4: ml44:/mnt/donottouch/localc/brick
Brick5: ml45:/mnt/donottouch/localb/brick
Brick6: ml46:/mnt/donottouch/localb/brick
Brick7: ml45:/mnt/donottouch/localc/brick
Brick8: ml46:/mnt/donottouch/localc/brick
Brick9: ml47:/mnt/donottouch/localb/brick
Brick10: ml48:/mnt/donottouch/localb/brick
Brick11: ml47:/mnt/donottouch/localc/brick
Brick12: ml48:/mnt/donottouch/localc/brick
Brick13: ml45:/mnt/donottouch/locald/brick
Brick14: ml46:/mnt/donottouch/locald/brick
Brick15: ml47:/mnt/donottouch/locald/brick
Brick16: ml48:/mnt/donottouch/locald/brick
Brick17: ml51:/mnt/donottouch/localb/brick
Brick18: ml52:/mnt/donottouch/localb/brick
Brick19: ml51:/mnt/donottouch/localc/brick
Brick20: ml52:/mnt/donottouch/localc/brick
Brick21: ml51:/mnt/donottouch/locald/brick
Brick22: ml52:/mnt/donottouch/locald/brick
Brick23: ml59:/mnt/donottouch/locald/brick
Brick24: ml54:/mnt/donottouch/locald/brick
Brick25: ml59:/mnt/donottouch/localc/brick
Brick26: ml54:/mnt/donottouch/localc/brick
Brick27: ml59:/mnt/donottouch/localb/brick
Brick28: ml54:/mnt/donottouch/localb/brick
Brick29: ml55:/mnt/donottouch/localb/brick
Brick30: ml29:/mnt/donottouch/localb/brick
Brick31: ml55:/mnt/donottouch/localc/brick
Brick32: ml29:/mnt/donottouch/localc/brick
Brick33: ml30:/mnt/donottouch/localc/brick
Brick34: ml31:/mnt/donottouch/localc/brick
Brick35: ml30:/mnt/donottouch/localb/brick
Brick36: ml31:/mnt/donottouch/localb/brick
Brick37: ml40:/mnt/donottouch/localb/brick
Brick38: ml41:/mnt/donottouch/localb/brick
Brick39: ml40:/mnt/donottouch/localc/brick
Brick40: ml41:/mnt/donottouch/localc/brick
Brick41: ml56:/mnt/donottouch/localb/brick
Brick42: ml57:/mnt/donottouch/localb/brick
Brick43: ml56:/mnt/donottouch/localc/brick
Brick44: ml57:/mnt/donottouch/localc/brick
Brick45: ml25:/mnt/donottouch/localb/brick
Brick46: ml26:/mnt/donottouch/localb/brick
Brick47: ml01:/mnt/donottouch/localb/brick
Brick48: ml25:/mnt/donottouch/localc/brick
Brick49: ml01:/mnt/donottouch/localc/brick
Brick50: ml26:/mnt/donottouch/localc/brick
Options Reconfigured:
performance.quick-read: on
nfs.disable: on
nfs.register-with-portmap: OFF

# gluster volume status
Status of volume: bigdata
Gluster process                                Port    Online   Pid
------------------------------------------------------------------------------
Brick ml43:/mnt/donottouch/localb/brick        49152   Y        1202
Brick ml44:/mnt/donottouch/localb/brick        49152   Y        12997
Brick ml43:/mnt/donottouch/localc/brick        49153   Y        1206
Brick ml44:/mnt/donottouch/localc/brick        49153   Y        13003
Brick ml45:/mnt/donottouch/localb/brick        49152   Y        18330
Brick ml46:/mnt/donottouch/localb/brick        49152   Y        5408
Brick ml45:/mnt/donottouch/localc/brick        49153   Y        18336
Brick ml46:/mnt/donottouch/localc/brick        49153   Y        5412
Brick ml47:/mnt/donottouch/localb/brick        49152   Y        4188
Brick ml48:/mnt/donottouch/localb/brick        49152   Y        19622
Brick ml47:/mnt/donottouch/localc/brick        49153   Y        4192
Brick ml48:/mnt/donottouch/localc/brick        49153   Y        19626
Brick ml45:/mnt/donottouch/locald/brick        49154   Y        18341
Brick ml46:/mnt/donottouch/locald/brick        49154   Y        5418
Brick ml47:/mnt/donottouch/locald/brick        49154   Y        4197
Brick ml48:/mnt/donottouch/locald/brick        49154   Y        19632
Brick ml51:/mnt/donottouch/localb/brick        49152   Y        14905
Brick ml52:/mnt/donottouch/localb/brick        49152   Y        17792
Brick ml51:/mnt/donottouch/localc/brick        49153   Y        14909
Brick ml52:/mnt/donottouch/localc/brick        49153   Y        17796
Brick ml51:/mnt/donottouch/locald/brick        49154   Y        14914
Brick ml52:/mnt/donottouch/locald/brick        49154   Y        17801
Brick ml59:/mnt/donottouch/locald/brick        49152   Y        9806
Brick ml54:/mnt/donottouch/locald/brick        49152   Y        31252
Brick ml59:/mnt/donottouch/localc/brick        49153   Y        9810
Brick ml54:/mnt/donottouch/localc/brick        49153   Y        31257
Brick ml59:/mnt/donottouch/localb/brick        49154   Y        9816
Brick ml54:/mnt/donottouch/localb/brick        49154   Y        31271
Brick ml55:/mnt/donottouch/localb/brick        49152   Y        8592
Brick ml29:/mnt/donottouch/localb/brick        49152   Y        26350
Brick ml55:/mnt/donottouch/localc/brick        49153   Y        8593
Brick ml29:/mnt/donottouch/localc/brick        49153   Y        26356
Brick ml30:/mnt/donottouch/localc/brick        49152   Y        29093
Brick ml31:/mnt/donottouch/localc/brick        49152   Y        26159
Brick ml30:/mnt/donottouch/localb/brick        49153   Y        29099
Brick ml31:/mnt/donottouch/localb/brick        49153   Y        26164
Brick ml40:/mnt/donottouch/localb/brick        49152   Y        11005
Brick ml41:/mnt/donottouch/localb/brick        49152   Y        20418
Brick ml40:/mnt/donottouch/localc/brick        49153   Y        11011
Brick ml41:/mnt/donottouch/localc/brick        49153   Y        20424
Brick ml56:/mnt/donottouch/localb/brick        49152   Y        1704
Brick ml57:/mnt/donottouch/localb/brick        49152   Y        1326
Brick ml56:/mnt/donottouch/localc/brick        49153   Y        1708
Brick ml57:/mnt/donottouch/localc/brick        49153   Y        1330
Brick ml25:/mnt/donottouch/localb/brick        49152   Y        6761
Brick ml26:/mnt/donottouch/localb/brick        49152   Y        590
Brick ml01:/mnt/donottouch/localb/brick        49152   Y        13431
Brick ml25:/mnt/donottouch/localc/brick        49153   Y        6765
Brick ml01:/mnt/donottouch/localc/brick        49153   Y        13435
Brick ml26:/mnt/donottouch/localc/brick        49153   Y        596
Self-heal Daemon on localhost                  N/A     Y        9824
Self-heal Daemon on ml40                       N/A     Y        11019
Self-heal Daemon on ml45                       N/A     Y        18350
Self-heal Daemon on ml41                       N/A     Y        20432
Self-heal Daemon on ml43                       N/A     Y        2128
Self-heal Daemon on ml52                       N/A     Y        17810
Self-heal Daemon on ml54                       N/A     Y        31267
Self-heal Daemon on ml44                       N/A     Y        13011
Self-heal Daemon on ml29                       N/A     Y        26364
Self-heal Daemon on ml57                       N/A     Y        1340
Self-heal Daemon on ml47                       N/A     Y        4206
Self-heal Daemon on ml30                       N/A     Y        29107
Self-heal Daemon on ml56                       N/A     Y        1716
Self-heal Daemon on ml51                       N/A     Y        14923
Self-heal Daemon on ml55                       N/A     Y        8604
Self-heal Daemon on ml48                       N/A     Y        19640
Self-heal Daemon on ml31                       N/A     Y        26172
Self-heal Daemon on 138.15.169.24              N/A     Y        13445
Self-heal Daemon on ml46                       N/A     Y        5426
Self-heal Daemon on ml26                       N/A     Y        604
Self-heal Daemon on ml25                       N/A     Y        6773

Task         ID                                     Status
----         --                                     ------
Rebalance    1f4a8910-17ed-41a3-b10e-06fe32e4b517   1
# gluster volume rebalance bigdata status
Node        Rebalanced-files   size      scanned    failures   status        run time in secs
---------   ----------------   -------   --------   --------   -----------   ----------------
localhost   670                9.4MB     4979650    30813      in progress   133405.00
ml57        0                  0Bytes    4979576    0          in progress   133404.00
ml59        55176              39.1GB    1573179    55162      in progress   133404.00
ml47        39732              14.4GB    1435089    60148      in progress   133404.00
ml56        10383              1.5GB     4967580    80063      in progress   133404.00
ml55        55091              24.0GB    1694495    26855      in progress   133404.00
ml26        0                  0Bytes    4979445    0          in progress   133404.00
ml30        55103              27.5GB    1713284    40820      in progress   133404.00
ml29        0                  0Bytes    4959041    0          in progress   133404.00
ml46        0                  0Bytes    4958944    0          in progress   133404.00
ml44        0                  0Bytes    4959757    0          in progress   133404.00
ml31        0                  0Bytes    4959638    0          in progress   133404.00
ml25        3829               1.1GB     4962426    48727      in progress   133404.00
ml43        46134              13.3GB    1291150    20734      in progress   133404.00
ml54        0                  0Bytes    4979725    5          in progress   133404.00
ml45        53783              10.5GB    1153392    32224      in progress   133404.00
ml40        50688              24.6GB    2544998    86675      in progress   133404.00
ml52        0                  0Bytes    4959380    0          in progress   133404.00
ml48        0                  0Bytes    4979771    0          in progress   133404.00
ml41        0                  0Bytes    4979592    0          in progress   133404.00
ml51        58354              12.0GB    1066889    4092       in progress   133404.00
volume rebalance: bigdata: success:

# cat /etc/system-release
Scientific Linux release 6.1 (Carbon)

# uname -a
Linux ml59 2.6.32-131.17.1.el6.x86_64 #1 SMP Wed Oct 5 17:19:54 CDT 2011 x86_64 x86_64 x86_64 GNU/Linux

# rpm -qa|grep gluster
glusterfs-server-3.4.0-1.el6.x86_64
glusterfs-fuse-3.4.0-1.el6.x86_64
glusterfs-debuginfo-3.4.0-1.el6.x86_64
glusterfs-3.4.0-1.el6.x86_64

# ssh ml01 tail /var/log/glusterfs/bigdata-rebalance.log
[2013-07-18 14:58:14.649927] E [dht-helper.c:1052:dht_inode_ctx_get] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_lookup_linkfile_create_cbk+0x75) [0x7f282da50025] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_layout_preset+0x5e) [0x7f282da3701e] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x34) [0x7f282da38364]))) 0-bigdata-dht: invalid argument: inode
[2013-07-18 14:58:14.649959] E [dht-helper.c:1071:dht_inode_ctx_set] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_lookup_linkfile_create_cbk+0x75) [0x7f282da50025] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_layout_preset+0x5e) [0x7f282da3701e] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x52) [0x7f282da38382]))) 0-bigdata-dht: invalid argument: inode
[2013-07-18 14:58:14.650018] E [dht-common.c:2157:dht_getxattr] 0-bigdata-dht: layout is NULL
[2013-07-18 14:58:14.650047] E [dht-rebalance.c:1167:gf_defrag_migrate_data] 0-bigdata-dht: Failed to get node-uuid for /user/nope/lalala/foobar.baz
[2013-07-18 14:58:14.709095] W [client-rpc-fops.c:259:client3_3_mknod_cbk] 0-bigdata-client-5: remote operation failed: File exists. Path: /user/nope/lalala/top-neg-loss (00000000-0000-0000-0000-000000000000)
[2013-07-18 14:58:14.709189] W [client-rpc-fops.c:259:client3_3_mknod_cbk] 0-bigdata-client-4: remote operation failed: File exists. Path: /user/nope/lalala/top-neg-loss (00000000-0000-0000-0000-000000000000)
[2013-07-18 14:58:14.709262] E [dht-helper.c:1052:dht_inode_ctx_get] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_lookup_linkfile_create_cbk+0x75) [0x7f282da50025] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_layout_preset+0x5e) [0x7f282da3701e] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x34) [0x7f282da38364]))) 0-bigdata-dht: invalid argument: inode
[2013-07-18 14:58:14.709294] E [dht-helper.c:1071:dht_inode_ctx_set] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_lookup_linkfile_create_cbk+0x75) [0x7f282da50025] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_layout_preset+0x5e) [0x7f282da3701e] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x52) [0x7f282da38382]))) 0-bigdata-dht: invalid argument: inode
[2013-07-18 14:58:14.709350] E [dht-common.c:2157:dht_getxattr] 0-bigdata-dht: layout is NULL
[2013-07-18 14:58:14.709371] E [dht-rebalance.c:1167:gf_defrag_migrate_data] 0-bigdata-dht: Failed to get node-uuid for /user/nope/lalala/top-neg-loss
[2013-07-18 14:58:14.729213] W [client-rpc-fops.c:259:client3_3_mknod_cbk] 0-bigdata-client-5: remote operation failed: File exists. Path: /user/nope/lalala/early-011-epi-nope-pairs-softmax-16.trn.allidx (00000000-0000-0000-0000-000000000000)
[2013-07-18 14:58:14.729304] W [client-rpc-fops.c:259:client3_3_mknod_cbk] 0-bigdata-client-4: remote operation failed: File exists. Path: /user/nope/lalala/early-011-epi-nope-pairs-softmax-16.trn.allidx (00000000-0000-0000-0000-000000000000)
[2013-07-18 14:58:14.729373] E [dht-helper.c:1052:dht_inode_ctx_get] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_lookup_linkfile_create_cbk+0x75) [0x7f282da50025] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_layout_preset+0x5e) [0x7f282da3701e] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x34) [0x7f282da38364]))) 0-bigdata-dht: invalid argument: inode
[2013-07-18 14:58:14.729405] E [dht-helper.c:1071:dht_inode_ctx_set] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_lookup_linkfile_create_cbk+0x75) [0x7f282da50025] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_layout_preset+0x5e) [0x7f282da3701e] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x52) [0x7f282da38382]))) 0-bigdata-dht: invalid argument: inode
[2013-07-18 14:58:14.729461] E [dht-common.c:2157:dht_getxattr] 0-bigdata-dht: layout is NULL
[2013-07-18 14:58:14.729482] E [dht-rebalance.c:1167:gf_defrag_migrate_data] 0-bigdata-dht: Failed to get node-uuid for /user/nope/lalala/early-011-epi-nope-pairs-softmax-16.trn.allidx
[2013-07-18 14:58:14.768502] W [client-rpc-fops.c:259:client3_3_mknod_cbk] 0-bigdata-client-5: remote operation failed: File exists. Path: /user/nope/lalala/early-015-epi-nope-pairs-boost1_.2-16-1.fulltrn.combine.pos.repo.data.ascii (00000000-0000-0000-0000-000000000000)
[2013-07-18 14:58:14.768592] W [client-rpc-fops.c:259:client3_3_mknod_cbk] 0-bigdata-client-4: remote operation failed: File exists. Path: /user/nope/lalala/early-015-epi-nope-pairs-boost1_.2-16-1.fulltrn.combine.pos.repo.data.ascii (00000000-0000-0000-0000-000000000000)
[2013-07-18 14:58:14.768662] E [dht-helper.c:1052:dht_inode_ctx_get] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_lookup_linkfile_create_cbk+0x75) [0x7f282da50025] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_layout_preset+0x5e) [0x7f282da3701e] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x34) [0x7f282da38364]))) 0-bigdata-dht: invalid argument: inode
[2013-07-18 14:58:14.768693] E [dht-helper.c:1071:dht_inode_ctx_set] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_lookup_linkfile_create_cbk+0x75) [0x7f282da50025] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_layout_preset+0x5e) [0x7f282da3701e] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x52) [0x7f282da38382]))) 0-bigdata-dht: invalid argument: inode
[2013-07-18 14:58:14.768748] E [dht-common.c:2157:dht_getxattr] 0-bigdata-dht: layout is NULL
[2013-07-18 14:58:14.768789] E [dht-rebalance.c:1167:gf_defrag_migrate_data] 0-bigdata-dht: Failed to get node-uuid for /user/nope/lalala/early-015-epi-nope-pairs-boost1_.2-16-1.fulltrn.combine.pos.repo.data.ascii
[2013-07-18 14:58:14.840386] W [client-rpc-fops.c:259:client3_3_mknod_cbk] 0-bigdata-client-5: remote operation failed: File exists. Path: /user/nope/lalala/early-013-epi-nope-pairs-boost1_.2-16-2.123.txt (00000000-0000-0000-0000-000000000000)
[2013-07-18 14:58:14.840485] W [client-rpc-fops.c:259:client3_3_mknod_cbk] 0-bigdata-client-4: remote operation failed: File exists. Path: /user/nope/lalala/early-013-epi-nope-pairs-boost1_.2-16-2.123.txt (00000000-0000-0000-0000-000000000000)
[2013-07-18 14:58:14.840555] E [dht-helper.c:1052:dht_inode_ctx_get] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_lookup_linkfile_create_cbk+0x75) [0x7f282da50025] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_layout_preset+0x5e) [0x7f282da3701e] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x34) [0x7f282da38364]))) 0-bigdata-dht: invalid argument: inode
[2013-07-18 14:58:14.840587] E [dht-helper.c:1071:dht_inode_ctx_set] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_lookup_linkfile_create_cbk+0x75) [0x7f282da50025] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_layout_preset+0x5e) [0x7f282da3701e] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x52) [0x7f282da38382]))) 0-bigdata-dht: invalid argument: inode
[2013-07-18 14:58:14.840643] E [dht-common.c:2157:dht_getxattr] 0-bigdata-dht: layout is NULL
[2013-07-18 14:58:14.840664] E [dht-rebalance.c:1167:gf_defrag_migrate_data] 0-bigdata-dht: Failed to get node-uuid for /user/nope/lalala/early-013-epi-nope-pairs-boost1_.2-16-2.123.txt
[2013-07-18 14:58:14.881825] W [client-rpc-fops.c:259:client3_3_mknod_cbk] 0-bigdata-client-5: remote operation failed: File exists. Path: /user/nope/lalala/early-015-epi-nope-pairs-boost1_.2-16-1.boosttrn.out (00000000-0000-0000-0000-000000000000)
[2013-07-18 14:58:14.881915] W [client-rpc-fops.c:259:client3_3_mknod_cbk] 0-bigdata-client-4: remote operation failed: File exists. Path: /user/nope/lalala/early-015-epi-nope-pairs-boost1_.2-16-1.boosttrn.out (00000000-0000-0000-0000-000000000000)
[2013-07-18 14:58:14.881982] E [dht-helper.c:1052:dht_inode_ctx_get] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_lookup_linkfile_create_cbk+0x75) [0x7f282da50025] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_layout_preset+0x5e) [0x7f282da3701e] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x34) [0x7f282da38364]))) 0-bigdata-dht: invalid argument: inode
[2013-07-18 14:58:14.882014] E [dht-helper.c:1071:dht_inode_ctx_set] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_lookup_linkfile_create_cbk+0x75) [0x7f282da50025] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_layout_preset+0x5e) [0x7f282da3701e] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x52) [0x7f282da38382]))) 0-bigdata-dht: invalid argument: inode
[2013-07-18 14:58:14.882076] E [dht-common.c:2157:dht_getxattr] 0-bigdata-dht: layout is NULL
[2013-07-18 14:58:14.882097] E [dht-rebalance.c:1167:gf_defrag_migrate_data] 0-bigdata-dht: Failed to get node-uuid for /user/nope/lalala/early-015-epi-nope-pairs-boost1_.2-16-1.boosttrn.out
[2013-07-18 14:58:15.594996] W [client-rpc-fops.c:259:client3_3_mknod_cbk] 0-bigdata-client-11: remote operation failed: File exists. Path: /user/nope/lalala/early-029-epi-nope-pairs-boost1_.2-16-2.123.txt (00000000-0000-0000-0000-000000000000)
[2013-07-18 14:58:15.595085] W [client-rpc-fops.c:259:client3_3_mknod_cbk] 0-bigdata-client-10: remote operation failed: File exists. Path: /user/nope/lalala/early-029-epi-nope-pairs-boost1_.2-16-2.123.txt (00000000-0000-0000-0000-000000000000)
[2013-07-18 14:58:15.595151] E [dht-helper.c:1052:dht_inode_ctx_get] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_lookup_linkfile_create_cbk+0x75) [0x7f282da50025] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_layout_preset+0x5e) [0x7f282da3701e] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x34) [0x7f282da38364]))) 0-bigdata-dht: invalid argument: inode
[2013-07-18 14:58:15.595184] E [dht-helper.c:1071:dht_inode_ctx_set] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_lookup_linkfile_create_cbk+0x75) [0x7f282da50025] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_layout_preset+0x5e) [0x7f282da3701e] (-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x52) [0x7f282da38382]))) 0-bigdata-dht: invalid argument: inode
[2013-07-18 14:58:15.595251] E [dht-common.c:2157:dht_getxattr] 0-bigdata-dht: layout is NULL
[2013-07-18 14:58:15.595276] E [dht-rebalance.c:1167:gf_defrag_migrate_data] 0-bigdata-dht: Failed to get node-uuid for /user/nope/lalala/early-029-epi-nope-pairs-boost1_.2-16-2.123.txt
[2013-07-18 14:58:15.608610] I [dht-rebalance.c:1268:gf_defrag_migrate_data] 0-bigdata-dht: Migration operation on dir /user/nope/lalala took 1.97 secs
[2013-07-18 14:58:15.676502] I [dht-common.c:2615:dht_setxattr] 0-bigdata-dht: fixing the layout of /user/nope/lalala/breast-gregoire
[2013-07-18 14:58:15.720579] I [dht-rebalance.c:1069:gf_defrag_migrate_data] 0-bigdata-dht: migrate data called on /user/nope/lalala/breast-gregoire
[2013-07-18 14:58:16.241863] I [dht-rebalance.c:1268:gf_defrag_migrate_data] 0-bigdata-dht: Migration operation on dir /user/nope/lalala/breast-gregoire took 0.52 secs
[2013-07-18 14:58:16.623712] I [dht-common.c:2615:dht_setxattr] 0-bigdata-dht: fixing the layout of /user/nope/epi-nope-pairs-real4
[2013-07-18 14:58:16.673556] I [dht-rebalance.c:1069:gf_defrag_migrate_data] 0-bigdata-dht: migrate data called on /user/nope/epi-nope-pairs-real4
[2013-07-18 14:58:16.740587] I [dht-rebalance.c:1268:gf_defrag_migrate_data] 0-bigdata-dht: Migration operation on dir /user/nope/epi-nope-pairs-real4 took 0.07 secs
[2013-07-18 14:58:16.865435] I [dht-common.c:2615:dht_setxattr] 0-bigdata-dht: fixing the layout of /user/nope/epi-nope-patches
[2013-07-18 14:58:17.150570] I [dht-rebalance.c:1069:gf_defrag_migrate_data] 0-bigdata-dht: migrate data called on /user/nope/epi-nope-patches
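The same trio of errors (mknod "File exists", "layout is NULL", "Failed to get node-uuid") repeats for file after file. A quick sketch, written against the log format shown above, to see how often each signature occurs:

# Tally error/warning signatures in the rebalance log by source location.
grep -oE '\] [EW] \[[^]]+\]' /var/log/glusterfs/bigdata-rebalance.log |
    sort | uniq -c | sort -rn

# Total number of files whose migration failed on the node-uuid lookup:
grep -c 'Failed to get node-uuid' /var/log/glusterfs/bigdata-rebalance.log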
This issue already existed for me on Gluster 3.3.x: any brick action, such as a rebalance or a remove-brick, causes the rebalance process to eat more and more memory. After the file-handle bug, this is another serious issue; I have never been able to complete a rebalance, because my filesystem holds 36 million inodes. As an example, this process was at 22% memory consumption yesterday and is now at 30%. I just have to wait until the kernel's OOM killer terminates it....

root 21446 16.5 30.6 10380448 10061080 ? Ssl Oct15 481:04 /usr/sbin/glusterfs -s localhost --volfile-id content1 --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *replicate*.readdir-failover=off --xlator-option *dht.rebalance-cmd=5 --xlator-option *dht.node-uuid=88a7a0a0-91e6-4228-8d66-5b11b3b4e7b2 --socket-file /var/lib/glusterd/vols/content1/rebalance/88a7a0a0-91e6-4228-8d66-5b11b3b4e7b2.sock --pid-file /var/lib/glusterd/vols/content1/rebalance/88a7a0a0-91e6-4228-8d66-5b11b3b4e7b2.pid -l /var/log/glusterfs/content1-rebalance.log
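One way to see where the memory is going before the OOM killer fires: glusterfs processes write a statedump, including their memory-accounting tables, when they receive SIGUSR1. A minimal sketch; the dump directory varies by version (commonly /var/run/gluster or /tmp), so the paths below are assumptions:

# Take two statedumps ~10 minutes apart; diffing their memory sections
# shows which allocation types keep growing.
PID=$(cat /var/lib/glusterd/vols/content1/rebalance/*.pid)
kill -USR1 "$PID"
sleep 600
kill -USR1 "$PID"
ls -lt /var/run/gluster/*"$PID"* /tmp/glusterdump."$PID"* 2>/dev/null | head

Attaching a pair of such dumps to the bug would let the developers pinpoint the leaking data type.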
This is still a problem in 3.4.1.
This bug was fixed as part of https://bugzilla.redhat.com/show_bug.cgi?id=1144792, and the fix will be available in glusterfs-3.4.6. Hence, moving the state of the bug to MODIFIED.
GlusterFS 3.7.0 has been released (http://www.gluster.org/pipermail/gluster-users/2015-May/021901.html), and the Gluster project maintains N-2 supported releases. The last two releases before 3.7 are still maintained; at the moment these are 3.6 and 3.5. This bug has been filed against the 3.4 release and will not get fixed in a 3.4 version any more. Please verify whether newer versions are affected by the reported problem. If that is the case, update the bug with a note, and update the version if you can. In case updating the version is not possible, leave a comment in this bug report with the version you tested, and set the "Need additional information the selected bugs from" below the comment box to "bugs". If there is no response by the end of the month, this bug will get closed automatically.
GlusterFS 3.4.x has reached end-of-life. If this bug still exists in a later release, please reopen it and change the version, or open a new bug.