This issue affects a setup where Gluster is used as the root file system. The files being heavily accessed in this use case are the shared libraries (there was a previous issue with leaks related to this) and the RPM database (which is accessed many, many times). The procedure that makes the leak particularly obvious is making the initrd for Open Shared Root (what I use for booting Gluster as root). Note that logging appears to be broken in this case - log files never appear on the initial root file system, so I cannot provide them.

This configuration does NOT result in a memory leak:

Mount with:
  mount -t glusterfs -o defaults,noatime,nodiratime,direct-io-mode=off,log-file=/dev/null,log-level=NONE /etc/glusterfs/root.vol /mnt/newroot
Peers: fully connected to peer.

This configuration DOES result in a leak:

Mount with:
  mount -t glusterfs -o defaults,noatime,nodiratime /etc/glusterfs/root.vol /mnt/newroot
Peers: completely disconnected, network unplugged.

The leak therefore _appears_ to be related to one of these options: direct-io-mode=off,log-file=/dev/null,log-level=NONE. Disabling direct-io or logging appears to suppress the leak.

When the leak is occurring, the root fs gluster process grows to over 350MB during a single test pass (a single mkinitrd). Another symptom is very elevated CPU usage: normally the glusterfs process would use about 25-30% CPU during this operation (the non-leaky case above), but in the leaky case CPU usage goes to 100% for the duration of the operation. It also runs slower, since the glusterfs process becomes the bottleneck.

The operation that causes major leakage is making the initrd for Open Shared Root (much more involved than a normal mkinitrd, as it makes repeated calls to the RPM database, which in this instance is in SQLite format because BDB flatly refuses to work on GlusterFS). Leakage is approximately 1MB every 5 seconds during the operation.

The volume file used to mount the gluster root file system is attached.
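For readers without access to the attachment: a client-side volume file for this kind of root-on-GlusterFS setup would typically look something like the sketch below. This is an illustrative reconstruction, not the attached root.vol; the server address, export name, and cache size are placeholders.

  volume root-client
    type protocol/client
    option transport-type tcp
    option remote-host 10.2.0.10        # placeholder server address
    option remote-subvolume brick       # placeholder export name
  end-volume

  volume root-iocache
    type performance/io-cache
    option cache-size 64MB              # placeholder value
    subvolumes root-client
  end-volume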
There was a patch submitted by gowda and accepted regarding a memory leak in logging: http://patches.gluster.com/patch/679/ (commit id 8d74fe9ba6c3c2f68f71cacd56fad9328b8b0090). Can this fix the leak observed above?

regards,
Raghavendra
The patch is for the 2.1 development branch; 2.0.3 does not contain the code in which the leak was fixed. Sorry for the confusion.
I did a bit more research, and I have to apologize - the mount parameters were a complete red herring. I just reproduced the issue with both sets of parameters; that isn't what causes the leak.

The condition that appears to cause the leak is actually a misconfiguration. Have a look at the attached spec file. Run this spec file on the machine that has the IP address 10.2.0.11: this causes the machine to talk to the local file system and to talk to itself as the peer. This is clearly an error condition, but rather than failing or ignoring this, glusterfsd proceeds to bloat until it uses up all memory and OOMs.

So, it looks like this edge/error case both works when it should abort and leaks copious amounts of memory in the process.
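To make the loop concrete, a misconfiguration of this shape would look roughly like the sketch below when the spec file is run on 10.2.0.11 itself: the host exports a local directory through protocol/server while simultaneously connecting to itself through protocol/client. This is an illustrative reconstruction, not the attached spec file; the volume names and export path are placeholders.

  volume posix
    type storage/posix
    option directory /data/export       # placeholder export path
  end-volume

  volume server
    type protocol/server
    option transport-type tcp
    option auth.addr.posix.allow *
    subvolumes posix
  end-volume

  volume client
    type protocol/client
    option transport-type tcp
    option remote-host 10.2.0.11        # the host this spec file runs on - the loop
    option remote-subvolume posix
  end-volume

In the good case the client points at a different peer; here it points back at the exporting host, and glusterfsd keeps running and leaking instead of rejecting the configuration.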
> This is clearly an error condition, but rather than failing or ignoring this,
> glusterfsd proceeds to bloat until it uses up all memory and OOMs. So, it looks
> like this edge/error case both works when it should abort and leaks copious
> amounts of memory in the process.

Fixing this would be a bit tricky, since GlusterFS legally permits local connections (and in fact optimizes such connections by converting them to local function calls). It is hard to differentiate legitimate local connections from such mistakes.

Avati
How about giving each server instance a (pseudo-unique) "ID", and rejecting the connection when two "server" instances have the same ID? Perhaps something based on the MAC of the first NIC, concatenated with the PID of the server process, and perhaps with the underlying storage path or similar? I'm not sure how this would all interact in the case where a single process serves multiple server and/or client instances, though.
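As a rough illustration of the idea only (this is not something GlusterFS does today; the interface name and export path are placeholders), such an ID could be assembled from values like these:

  # hypothetical: build a pseudo-unique server instance ID from
  # the first NIC's MAC address, the server process PID, and the export path
  MAC=$(cat /sys/class/net/eth0/address)    # placeholder interface name
  ID="${MAC}:$$:/data/export"               # $$ is this shell's PID; /data/export is a placeholder
  echo "server-id: ${ID}"

Two server instances advertising the same ID would then indicate a self-connection and could be rejected at handshake time.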
Hi Gordon, We're now recommending the use of glusterfs-volgen to generate volume specs, which does a better job of preventing such loops. Are you able to avoid the problematic config by using the volgen tool? I am trying to determine if this bug can be closed with the introduction of glusterfs-volgen tool. Thanks
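For reference, a sketch of the volgen invocation (from memory - exact options may differ between releases; hostnames and export paths are placeholders) for a two-server replicated volume:

  glusterfs-volgen --name rootvol --raid 1 10.2.0.11:/data/export 10.2.0.12:/data/export

This writes out matching server and client volume files, so the client spec is generated rather than hand-edited, which avoids this kind of loop.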
I guess volgen is a somewhat mitigating circumstance, but I'm concerned about configuration error conditions like this not getting caught at start time.
(In reply to comment #7)
> I guess volgen is a somewhat mitigating circumstance, but I'm concerned about
> configuration error conditions like this not getting caught at start time.

Since 3.1, we support only configurations generated by the gluster command line, and hence there is no possibility of having loops in the configuration.
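For anyone hitting this on 3.1 or later: the CLI-driven workflow replaces hand-written volume files entirely. A minimal sketch, with placeholder hostnames, volume name, and brick paths:

  # on one of the servers
  gluster peer probe server2
  gluster volume create rootvol replica 2 server1:/export/brick server2:/export/brick
  gluster volume start rootvol

  # on the client
  mount -t glusterfs server1:/rootvol /mnt/newroot

The generated volume files are managed by glusterd (under /etc/glusterd in 3.1, /var/lib/glusterd in later releases) and are not meant to be edited by hand.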