Description of problem:

During a rebalance, the number of open files grows and grows until the rebalance is stopped. I had to set fs.file-max to an insanely high number, and when I run lsof on the glusterfsd process the open descriptors mostly belong to files that were rebalanced to a new brick.

Kernel settings:

$ massh -f hq-dfs -oc 'sudo -i sysctl -a |grep fs.file-nr'
hq-dfs-001 : fs.file-nr = 54528  0  10000000
hq-dfs-002 : fs.file-nr = 69280  0  10000000
hq-dfs-003 : fs.file-nr = 31424  0  10000000
hq-dfs-004 : fs.file-nr = 70496  0  10000000

$ massh -f hq-dfs -oc 'sudo -i sysctl -a |grep fs.file-nr'
hq-dfs-001 : fs.file-nr = 54592  0  10000000
hq-dfs-002 : fs.file-nr = 69344  0  10000000
hq-dfs-003 : fs.file-nr = 31552  0  10000000
hq-dfs-004 : fs.file-nr = 70592  0  10000000

lsof snippet:

glusterfs 1999 root *415u REG 253,1 0 2149736337 /vol/content1/3fatchicks/3fatchicks/admin3fc/public_html.bak/proto/wp-content/uploads/2008/08/gallblad-300x185.jpg (deleted)
glusterfs 1999 root *416u REG 253,1 0 2149738632 /vol/content1/3fatchicks/3fatchicks/admin3fc/public_html.bak/proto/wp-includes/general-template.php (deleted)
glusterfs 1999 root *417u REG 253,1 0 2149738638 /vol/content1/3fatchicks/3fatchicks/admin3fc/public_html.bak/proto/wp-includes/post-template.php (deleted)
glusterfs 1999 root *418u REG 253,1 0 2152158338 /vol/content1/3fatchicks/3fatchicks/admin3fc/public_html/diet-blogs/wp-content/blogs.dir/4326/files/2008/04/weighin2.jpg
glusterfs 1999 root *419u REG 253,1 0 3222171248 /vol/content1/3fatchicks/3fatchicks/admin3fc/public_html.bak/proto/wp-includes/images/wlw/wp-icon.png (deleted)
glusterfs 1999 root *420u REG 253,1 0 3017973 /vol/content1/3fatchicks/3fatchicks/admin3fc/public_html.bak/proto/wp-includes/images/smilies/icon_rolleyes.gif (deleted)
glusterfs 1999 root *421u REG 253,1 0 2149740408 /vol/content1/3fatchicks/3fatchicks/admin3fc/public_html.bak/weight-tracker2/statusimg.php (deleted)
glusterfs 1999 root *422u REG 253,1 0 1086685710 /vol/content1/3fatchicks/3fatchicks/admin3fc/public_html/diet-blogs/wp-content/blogs.dir/4199/files/2008/03/workout-photo1jpg.thumbnail.JPG
glusterfs 1999 root *423u REG 253,1 0 1086685709 /vol/content1/3fatchicks/3fatchicks/admin3fc/public_html/diet-blogs/wp-content/blogs.dir/4199/files/2008/03/workout-photo1jpg.JPG
glusterfs 1999 root *424u REG 253,1 0 2152158343 /vol/content1/3fatchicks/3fatchicks/admin3fc/public_html/diet-blogs/wp-content/blogs.dir/4326/files/2008/10/halloween2.jpg
glusterfs 1999 root *425u REG 253,1 0 2149738652 /vol/content1/3fatchicks/3fatchicks/admin3fc/public_html.bak/proto/wp-includes/images/blank.gif (deleted)
glusterfs 1999 root *426u REG 253,1 0 3017961 /vol/content1/3fatchicks/3fatchicks/admin3fc/public_html.bak/proto/wp-includes/images/smilies/icon_lol.gif (deleted)
glusterfs 1999 root *427u REG 253,1 0 2152158328 /vol/content1/3fatchicks/3fatchicks/admin3fc/public_html/diet-blogs/wp-content/blogs.dir/4199/files/2008/05/jaalaandvan.jpg

$ sudo gluster volume rebalance content1 status
         Node   Rebalanced-files     size   scanned   failures        status
    ---------   ----------------   ------   -------   --------   -----------
    localhost              44096    3.8GB    376956          6   in progress
10.160.112.11              56876    6.2GB    331431          0   in progress
   hq-dfs-004                  0   0Bytes    347574          0   in progress
   hq-dfs-003               2442    5.6MB    669449      25946   in progress

Version-Release number of selected component (if applicable):
3.3.1

How reproducible:
Just start a rebalance and monitor open file handles.

Steps to Reproduce:
1. Start rebalancing
2. sudo -i sysctl -a | grep fs.file-nr
3. lsof -p <pid> | wc -l

Actual results:
File handles are held open by glusterfsd.

Expected results:
File handles are released after each file has been moved.

Additional info:
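To make the growth easier to watch over time, a minimal monitoring sketch (the PID 1999 is just the process from the lsof snippet above and is an assumption for your system; counting /proc/<pid>/fd is a cheaper alternative to running a full lsof every minute):

PID=1999                          # assumed glusterfs/glusterfsd PID, adjust as needed
while sleep 60; do
    date
    ls /proc/$PID/fd | wc -l      # number of file descriptors currently held open
done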
Looks like I got a "corruption", mostly due to this bug:

I couldn't go into the directory, mostly because one node had a symlink pointing to nowhere, and NFS clients got an I/O error:

[2013-04-02 18:10:33.894179] W [client3_1-fops.c:2546:client3_1_opendir_cbk] 0-content1-client-1: remote operation failed: Too many open files. Path: /groovejob (00000000-0000-0000-0000-000000000000)

What I had in the file system:

hq-dfs-001 :   File: `/vol/content1/groovejob'
hq-dfs-001 :   Size: 4096        Blocks: 8          IO Block: 4096   directory
hq-dfs-001 : Device: fd01h/64769d   Inode: 3367109578   Links: 20
hq-dfs-001 : Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
hq-dfs-001 : Access: 2013-03-28 07:13:32.402112000 -0700
hq-dfs-001 : Modify: 2013-04-05 13:52:29.404069612 -0700
hq-dfs-001 : Change: 2013-04-05 13:52:29.404069612 -0700
hq-dfs-002 :   File: `/vol/content1/groovejob'
hq-dfs-002 :   Size: 4096        Blocks: 8          IO Block: 4096   directory
hq-dfs-002 : Device: fd01h/64769d   Inode: 210042751   Links: 20
hq-dfs-002 : Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
hq-dfs-002 : Access: 2013-03-28 07:13:32.402112000 -0700
hq-dfs-002 : Modify: 2013-04-05 13:52:29.404456841 -0700
hq-dfs-002 : Change: 2013-04-05 13:52:29.404456841 -0700
hq-dfs-003 :   File: `/vol/content1/groovejob' -> `../../9d/15/9d153449-21a4-4107-943c-57db5b65e0e0/groovejob'
hq-dfs-003 :   Size: 58          Blocks: 0          IO Block: 4096   symbolic link
hq-dfs-003 : Device: fd01h/64769d   Inode: 229656214   Links: 1
hq-dfs-003 : Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
hq-dfs-003 : Access: 2013-04-05 09:55:44.185843408 -0700
hq-dfs-003 : Modify: 2013-04-05 09:55:44.185843408 -0700
hq-dfs-003 : Change: 2013-04-05 11:19:12.013368867 -0700
hq-dfs-004 :   File: `/vol/content1/groovejob'
hq-dfs-004 :   Size: 4096        Blocks: 8          IO Block: 4096   directory
hq-dfs-004 : Device: fd02h/64770d   Inode: 2150209673   Links: 20
hq-dfs-004 : Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
hq-dfs-004 : Access: 2013-03-28 07:13:32.402112000 -0700
hq-dfs-004 : Modify: 2013-04-05 13:52:29.402760259 -0700
hq-dfs-004 : Change: 2013-04-05 13:52:29.402760259 -0700

Gluster error message:

[2013-04-05 14:36:45.263278] E [dht-common.c:736:dht_lookup_everywhere_done] 0-content1-dht: path /groovejob exists as a file on one subvolume and directory on another. Please fix it manually
[2013-04-05 14:36:45.263371] E [dht-rebalance.c:1142:gf_defrag_migrate_data] 0-content1-dht: /groovejob lookup failed

I replaced the symlink with a directory and everything works fine.
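For anyone wanting to gather the same per-brick view, a rough sketch (assumes SSH access to each brick host; the host names and the /vol/content1/groovejob path are the ones from this report):

for host in hq-dfs-001 hq-dfs-002 hq-dfs-003 hq-dfs-004; do
    echo "== $host =="
    ssh "$host" stat /vol/content1/groovejob   # a file/symlink vs. directory mismatch across bricks is what DHT complains about
done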
Hi Bjoern,

Please provide us the volume info. Additionally, why was the rebalance done? Were you trying to expand the volume? Rebalance does open files in order to migrate them.

I see failures reported by rebalance on hq-dfs-003. Can you please provide the rebalance logs from that machine?
Hi,

Yes, I expanded from 2 nodes to 3 and then 4. I'll try to attach the log, but it is huge even when compressed. How can I transmit it to you?

Here is the volume info:

Volume Name: content1
Type: Distribute
Volume ID: 319a3f36-3369-4626-be07-99201d89abd6
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: hq-dfs-001:/vol/content1
Brick2: hq-dfs-002:/vol/content1
Brick3: hq-dfs-003:/vol/content1
Brick4: hq-dfs-004:/vol/content1
Options Reconfigured:
performance.cache-size: 268435456
performance.io-thread-count: 32
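For reference, a sketch of the usual add-brick plus rebalance sequence for this layout (brick names taken from the volume info above; not necessarily the exact commands that were run):

gluster volume add-brick content1 hq-dfs-003:/vol/content1
gluster volume add-brick content1 hq-dfs-004:/vol/content1
gluster volume rebalance content1 start
gluster volume rebalance content1 status    # watch per-node progress and failures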
(In reply to comment #1)
> [2013-04-05 14:36:45.263278] E [dht-common.c:736:dht_lookup_everywhere_done]
> 0-content1-dht: path /groovejob exists as a file on one subvolume and
> directory on another. Please fix it manually
>
> I replaced the symlink with a directory and everything works fine.

OK, you can ignore the "corruption" issue: that was human error, created by someone who did file operations directly on one node instead of going through the translator. So only the open file handle issue remains open.
I have exactly the same symptoms, with an identical way to reproduce and the same results. I had a single-brick volume (so that clients could be mounted with the native FUSE client from the very beginning) and added another brick to make it a distributed setup. After running the rebalance for roughly 24 hours, with ~800k files moved over, I got an alert for low free file handles on both systems. Rebalance started to report failures as the new server ran out of handles, which even blocked me from logging in.

Stopping the rebalance freed up the handles on the new server pretty much immediately, but they stayed reserved on the old server for an additional 12 hours after the rebalance was stopped (I also unmounted the clients and stopped/started the whole volume). I'm not sure whether the latter was some kernel/lsof/reporting "issue", but either way it all originates from the rebalance.
REVIEW: http://review.gluster.org/4888 (distribute: Fix fds being leaked during rebalance) posted (#1) for review on release-3.3 by Kaushal M (kaushal)
*** Bug 956781 has been marked as a duplicate of this bug. ***
COMMIT: http://review.gluster.org/4888 committed in release-3.3 by Vijay Bellur (vbellur)
------
commit 45a9d1e70e0d3e0ab504643a3cd1bf3f0a3a43fd
Author: Kaushal M <kaushal>
Date:   Fri Apr 26 11:39:15 2013 +0530

    distribute: Fix fds being leaked during rebalance

    This patch is a backport of 2 patches from master branch which fixes
    the leak of fds during a rebalance process. The patches are,

    * libglusterfs/syncop: do not hold ref on the fd in cbk
      (e979c0de9dde14fe18d0ad7298c6da9cc878bbab)
    * cluster/distribute: Remove suprious fd_unref call
      (5d29e598665456b2b7250fdca14de7409098877a)

    Change-Id: Icea1d0b32cb3670f7decc24261996bca3fe816dc
    BUG: 928631
    Signed-off-by: Kaushal M <kaushal>
    Reviewed-on: http://review.gluster.org/4888
    Reviewed-by: Vijay Bellur <vbellur>
    Tested-by: Gluster Build System <jenkins.com>
Hello Kaushal,

Thanks for the reply (in the duplicate). Will you release a 3.3.2 version? I would appreciate new EPEL-5/6 RPM packages to upgrade my current version. Do you know when 3.3.2/3.4 will be released?
Hi Laurent,

There is going to be a 3.3.2 release and it will contain this fix. There have been a couple of QA releases of it already, but I don't know a possible release date for this version yet.
Thanks, I will try to be patient ;)
I'm testing this patch on glusterfs 3.3git-v3.3.2qa2-1-g45a9d1e and can confirm that there are no file descriptor leaks (during the first few hours of my test, at least ;-). However, I'm seeing a disturbing number of claimed failures:

     Node   Rebalanced-files     size   scanned   failures        status
---------   ----------------   ------   -------   --------   -----------
    host1                  0   0Bytes     84830      60238   in progress
    host2                  0   0Bytes    123279          0   in progress
    host3                  0   0Bytes    123279          0   in progress
    host4                  0   0Bytes    123280          0   in progress

And every brick logs line pairs like this:

E [posix-helpers.c:721:posix_handle_pair] 0-vol01-posix: /bricks/a/testcopy50/50/98/06/somefile: key:trusted.glusterfs.dht.linkto error:File exists
E [posix.c:1755:posix_create] 0-vol01-posix: setting xattrs on /bricks/a/testcopy50/50/98/06/somefile failed (File exists)

Is this to be expected? Is it related to this ticket?
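In case it helps the triage, the linkto xattr the brick log complains about can be inspected directly on the brick; a sketch (run as root on the brick host, path is the example one from the log, getfattr comes from the attr package):

getfattr -m . -d -e hex /bricks/a/testcopy50/50/98/06/somefile                            # dump all xattrs, including trusted.*
getfattr -n trusted.glusterfs.dht.linkto -e text /bricks/a/testcopy50/50/98/06/somefile   # just the DHT link-to key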
@hans: regarding the failures... I always get failures when rebalancing files without using the "force" option. Every "rebalance start" finishes with failures and absolutely no files transferred. But if I then use (as documented, after the "start") "start force", the data is moved correctly and no more failures occur. Why? I really don't know. What is the difference between "start" and "start force" in rebalance mode?

I think I also get the same "File exists" error, but in my case it is a warning line rather than an error line:

----------
[2013-04-25 14:26:21.239650] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-gv1-client-3: remote operation failed: File exists. Path: /6 (00000000-0000-0000-0000-000000000000)
[2013-04-25 14:26:21.239792] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-gv1-client-2: remote operation failed: File exists. Path: /6 (00000000-0000-0000-0000-000000000000)
[2013-04-25 14:26:21.239877] E [dht-common.c:1911:dht_getxattr] 0-gv1-dht: layout is NULL
[2013-04-25 14:26:21.239969] E [dht-rebalance.c:1150:gf_defrag_migrate_data] 0-gv1-dht: Failed to get node-uuid for /6
----------

And, only to try to help, here is a list of errors I discovered while testing Gluster 3.3.1:

----------
[2013-04-23 18:12:48.637318] E [afr-self-heal-common.c:2160:afr_self_heal_completion_cbk] 0-gv1-replicate-1: background meta-data data entry missing-entry gfid self-heal failed on /gluster_test2_old/<hidden>.pdf
[2013-04-23 18:12:49.226701] E [dht-common.c:1911:dht_getxattr] 0-gv1-dht: layout is NULL
[2013-04-23 18:12:49.226740] E [dht-rebalance.c:1150:gf_defrag_migrate_data] 0-gv1-dht: Failed to get node-uuid for /gluster_test2_old/<hidden>.pdf
[2013-04-23 18:18:29.720733] E [rpcsvc.c:1155:rpcsvc_program_unregister_portmap] 0-rpc-service: Could not unregister with portmap
----------
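For reference, the CLI forms being compared above (GlusterFS 3.3 syntax; gv1 is the volume name from the log lines):

gluster volume rebalance gv1 start         # as I understand it, skips a file when the destination brick has less free space than the source
gluster volume rebalance gv1 start force   # performs the migration regardless of that free-space check
gluster volume rebalance gv1 status
gluster volume rebalance gv1 stop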