+++ This bug was initially created as a clone of Bug #1227996 +++

Description of problem:
truncate() [note: _not_ ftruncate()] on an object does not trigger signing. Furthermore, the file is _never_ signed upon subsequent modifications.

Version-Release number of selected component (if applicable):
mainline

How reproducible:
always

Steps to Reproduce:
1. Create and start a Gluster volume
2. Enable bitrot
3. Mount the volume and follow the steps below

# echo "ZZZ" > f0
  -> wait for the object to get signed
# echo "ZZZ" > f0    <-- truncate()
  -> "f0" is never signed unless the brick(s) are restarted.

Cause:
An fd leak in the truncate() code path in stub(): the fd is never release()'d, so the object never gets signed.

Additional info:

# ls -l /proc/2658/fd
total 0
lr-x------ 1 root root 64 Jun 4 09:02 0 -> /dev/null
l-wx------ 1 root root 64 Jun 4 09:02 1 -> /dev/null
lrwx------ 1 root root 64 Jun 4 09:02 10 -> socket:[63116]
lrwx------ 1 root root 64 Jun 4 09:02 11 -> socket:[63094]
lr-x------ 1 root root 64 Jun 4 09:02 12 -> /dev/urandom
lrwx------ 1 root root 64 Jun 4 09:02 13 -> socket:[63105]
lr-x------ 1 root root 64 Jun 4 09:02 14 -> /export2/reznor
lrwx------ 1 root root 64 Jun 4 09:02 15 -> socket:[130558]
lrwx------ 1 root root 64 Jun 4 09:02 16 -> socket:[130559]
lrwx------ 1 root root 64 Jun 4 09:03 17 -> /export2/reznor/.glusterfs/65/33/6533c392-e0e5-43e6-857f-31620ce0c2a4
lrwx------ 1 root root 64 Jun 4 09:02 18 -> socket:[130561]
l-wx------ 1 root root 64 Jun 4 09:02 2 -> /dev/null
lrwx------ 1 root root 64 Jun 4 09:02 20 -> socket:[64160]
lrwx------ 1 root root 64 Jun 4 09:02 3 -> anon_inode:[eventpoll]
lrwx------ 1 root root 64 Jun 4 09:02 4 -> socket:[130463]
l-wx------ 1 root root 64 Jun 4 09:02 5 -> /var/log/glusterfs/bricks/export2-reznor.log
lrwx------ 1 root root 64 Jun 4 09:02 6 -> /var/lib/glusterd/vols/reznor/run/h3ckers-pride-export2-reznor.pid
lrwx------ 1 root root 64 Jun 4 09:02 7 -> socket:[130465]
lrwx------ 1 root root 64 Jun 4 09:02 8 -> socket:[63112]
lrwx------ 1 root root 64 Jun 4 09:02 9 -> socket:[130474]

fd number 17 is the leaked fd.

--- Additional comment from Venky Shankar on 2015-06-03 23:42:56 EDT ---

This was observed by Arthy (aloganat@) in the test setup.

--- Additional comment from Venky Shankar on 2015-06-04 00:35:32 EDT ---

In comment #1, the truncate() call [second "echo" command] needs to be timed with the bitrot daemon sending a "reopen" call (internal bitd logic) for things to get tripped. If this timing is not met, what happens behind the scenes is a memory leak (as a side effect of the fd leak).

--- Additional comment from Anand Avati on 2015-06-04 00:52:10 EDT ---

REVIEW: http://review.gluster.org/11077 (features/bitrot: fix fd leak in truncate (stub)) posted (#1) for review on master by Venky Shankar (vshankar)

--- Additional comment from Anand Avati on 2015-06-04 09:00:19 EDT ---

REVIEW: http://review.gluster.org/11077 (features/bitrot: fix fd leak in truncate (stub)) posted (#2) for review on master by Venky Shankar (vshankar)

--- Additional comment from Anand Avati on 2015-06-15 11:48:43 EDT ---

REVIEW: http://review.gluster.org/11077 (features/bitrot: fix fd leak in truncate (stub)) posted (#3) for review on master by Venky Shankar (vshankar)
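
For reference, the distinction the description draws between truncate() and ftruncate() can be sketched in plain POSIX C. This is only an illustration and not GlusterFS code: the helper names truncate_by_path(), truncate_by_fd() and file_size() are made up for this example. The point is that the path-based call gives the caller no fd to close, so any fd the filesystem uses internally for it has to be dropped by the filesystem itself.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: path-based truncation.
 * The caller never holds an fd for the file. */
int truncate_by_path(const char *path, off_t len)
{
    return truncate(path, len);
}

/* Hypothetical helper: fd-based truncation.
 * The caller opens, truncates, and closes the fd explicitly. */
int truncate_by_fd(const char *path, off_t len)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    int ret = ftruncate(fd, len);

    /* The fd is owned by the caller, so the caller closes it. */
    if (close(fd) < 0)
        ret = -1;
    return ret;
}

/* Hypothetical helper: file size in bytes, or -1 on error. */
off_t file_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? st.st_size : (off_t)-1;
}
```

With ftruncate() the application's own close() ends the life of the fd; with truncate() there is no such call from the application side.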
REVIEW: http://review.gluster.org/11300 (features/bitrot: fix fd leak in truncate (stub)) posted (#1) for review on release-3.7 by Venky Shankar (vshankar)
COMMIT: http://review.gluster.org/11300 committed in release-3.7 by Raghavendra Bhat (raghavendra)
------
commit c79977c23f6108128043986995fe2eacf35dc6ac
Author: Venky Shankar <vshankar>
Date:   Thu Jun 4 10:07:38 2015 +0530

    features/bitrot: fix fd leak in truncate (stub)

    Backport of http://review.gluster.org/#/c/11077

    The need to perform object versioning in the truncate() code path
    required an fd to reuse existing versioning infrastructure that's
    used by fd based operations (such as writev(), ftruncate(), etc.).
    This tempted the use of an anonymous fd which was never unref()'d
    after use, resulting in an fd and/or memory leak depending on the
    code path taken. Versioning resulted in a dangling file descriptor
    left open in the filesystem, affecting the signing process of a
    given object (no release() would be triggered, hence no signing
    would be performed). On the other hand, in cases where the object
    need not be versioned, the anonymous fd is still ref()'d, resulting
    in a memory leak (NOTE: there's no "dangling" file descriptor in
    this case).

    Change-Id: I29c3d2af9bbc5cd4b8ddf38954080e3c7a44ba61
    BUG: 1232179
    Signed-off-by: Venky Shankar <vshankar>
    Reviewed-on: http://review.gluster.org/11300
    Tested-by: Gluster Build System <jenkins.com>
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: Raghavendra Bhat <raghavendra>
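
The leak described in the commit message can be illustrated with a toy reference-counted fd. This is a minimal sketch under stated assumptions, not the actual patch and not the GlusterFS fd API: struct toy_fd and every toy_* function are hypothetical names invented here, and the "release" step is modelled as setting a flag when the last reference is dropped.

```c
#include <stdlib.h>

/* Hypothetical ref-counted fd object (NOT the GlusterFS fd_t). */
struct toy_fd {
    int refcount;
    int *released; /* set to 1 when the last reference is dropped */
};

/* Create an fd holding one reference, owned by the caller. */
struct toy_fd *toy_fd_new(int *released)
{
    struct toy_fd *fd = malloc(sizeof(*fd));
    if (fd == NULL)
        return NULL;
    fd->refcount = 1;
    fd->released = released;
    return fd;
}

void toy_fd_ref(struct toy_fd *fd)
{
    fd->refcount++;
}

/* Drop one reference; "release" fires only when the count hits zero. */
void toy_fd_unref(struct toy_fd *fd)
{
    if (--fd->refcount == 0) {
        *fd->released = 1; /* stands in for release() and signing */
        free(fd);
    }
}

/* Leaky variant: takes a reference for versioning and never drops it. */
void toy_truncate_leaky(struct toy_fd *fd)
{
    toy_fd_ref(fd);
    /* ... versioning work would use fd here ... */
    /* BUG: missing toy_fd_unref(fd) */
}

/* Fixed variant: every reference taken is dropped after use. */
void toy_truncate_fixed(struct toy_fd *fd)
{
    toy_fd_ref(fd);
    /* ... versioning work would use fd here ... */
    toy_fd_unref(fd);
}
```

In this model, after toy_truncate_leaky() the owner's final toy_fd_unref() leaves the count at 1, so the release step never runs; after toy_truncate_fixed() the same final unref brings the count to 0 and release fires.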
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.2, please reopen this bug report.

glusterfs-3.7.2 has been announced on the Gluster Packaging mailing list [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://www.gluster.org/pipermail/packaging/2015-June/000006.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user