Bug 928631 - Rebalance leaves file handles open
Summary: Rebalance leaves file handles open
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: 3.3.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Assignee: Kaushal
QA Contact:
URL:
Whiteboard:
Duplicates: 956781
Depends On:
Blocks:
 
Reported: 2013-03-28 05:57 UTC by Bjoern Teipel
Modified: 2013-07-24 17:59 UTC
CC List: 7 users

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:59:02 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Bjoern Teipel 2013-03-28 05:57:12 UTC
Description of problem:
During a rebalance, the number of open files grows and grows until the rebalance is stopped.
I had to set fs.file-max to an insanely high number, and when I run lsof on the glusterfsd process, the open handles mostly belong to files that were rebalanced to a new brick.

Kernel settings (fs.file-nr sampled twice, a short while apart):
$ massh -f hq-dfs -oc 'sudo -i sysctl -a |grep fs.file-nr'
hq-dfs-001 : fs.file-nr = 54528 0       10000000
hq-dfs-002 : fs.file-nr = 69280 0       10000000
hq-dfs-003 : fs.file-nr = 31424 0       10000000
hq-dfs-004 : fs.file-nr = 70496 0       10000000

$ massh -f hq-dfs -oc 'sudo -i sysctl -a |grep fs.file-nr'
hq-dfs-001 : fs.file-nr = 54592 0       10000000
hq-dfs-002 : fs.file-nr = 69344 0       10000000
hq-dfs-003 : fs.file-nr = 31552 0       10000000
hq-dfs-004 : fs.file-nr = 70592 0       10000000

lsof snippet:

glusterfs 1999 root *415u   REG              253,1         0 2149736337 /vol/content1/3fatchicks/3fatchicks/admin3fc/public_html.bak/proto/wp-content/uploads/2008/08/gallblad-300x185.jpg (deleted)
glusterfs 1999 root *416u   REG              253,1         0 2149738632 /vol/content1/3fatchicks/3fatchicks/admin3fc/public_html.bak/proto/wp-includes/general-template.php (deleted)
glusterfs 1999 root *417u   REG              253,1         0 2149738638 /vol/content1/3fatchicks/3fatchicks/admin3fc/public_html.bak/proto/wp-includes/post-template.php (deleted)
glusterfs 1999 root *418u   REG              253,1         0 2152158338 /vol/content1/3fatchicks/3fatchicks/admin3fc/public_html/diet-blogs/wp-content/blogs.dir/4326/files/2008/04/weighin2.jpg
glusterfs 1999 root *419u   REG              253,1         0 3222171248 /vol/content1/3fatchicks/3fatchicks/admin3fc/public_html.bak/proto/wp-includes/images/wlw/wp-icon.png (deleted)
glusterfs 1999 root *420u   REG              253,1         0    3017973 /vol/content1/3fatchicks/3fatchicks/admin3fc/public_html.bak/proto/wp-includes/images/smilies/icon_rolleyes.gif (deleted)
glusterfs 1999 root *421u   REG              253,1         0 2149740408 /vol/content1/3fatchicks/3fatchicks/admin3fc/public_html.bak/weight-tracker2/statusimg.php (deleted)
glusterfs 1999 root *422u   REG              253,1         0 1086685710 /vol/content1/3fatchicks/3fatchicks/admin3fc/public_html/diet-blogs/wp-content/blogs.dir/4199/files/2008/03/workout-photo1jpg.thumbnail.JPG
glusterfs 1999 root *423u   REG              253,1         0 1086685709 /vol/content1/3fatchicks/3fatchicks/admin3fc/public_html/diet-blogs/wp-content/blogs.dir/4199/files/2008/03/workout-photo1jpg.JPG
glusterfs 1999 root *424u   REG              253,1         0 2152158343 /vol/content1/3fatchicks/3fatchicks/admin3fc/public_html/diet-blogs/wp-content/blogs.dir/4326/files/2008/10/halloween2.jpg
glusterfs 1999 root *425u   REG              253,1         0 2149738652 /vol/content1/3fatchicks/3fatchicks/admin3fc/public_html.bak/proto/wp-includes/images/blank.gif (deleted)
glusterfs 1999 root *426u   REG              253,1         0    3017961 /vol/content1/3fatchicks/3fatchicks/admin3fc/public_html.bak/proto/wp-includes/images/smilies/icon_lol.gif (deleted)
glusterfs 1999 root *427u   REG              253,1         0 2152158328 /vol/content1/3fatchicks/3fatchicks/admin3fc/public_html/diet-blogs/wp-content/blogs.dir/4199/files/2008/05/jaalaandvan.jpg
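
A quick way to quantify the leak per brick process without a full lsof pass (a sketch, not from the original report; <pid> stands for the glusterfsd PID) is to count the fd table under /proc:

$ ls /proc/<pid>/fd | wc -l                     # total open fds
$ ls -l /proc/<pid>/fd | grep -c '(deleted)'    # fds still pointing at deleted files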

$ sudo gluster volume rebalance content1 status
                                    Node Rebalanced-files          size       scanned      failures         status
                               ---------      -----------   -----------   -----------   -----------   ------------
                               localhost            44096         3.8GB        376956             6    in progress
                           10.160.112.11            56876         6.2GB        331431             0    in progress
                              hq-dfs-004                0        0Bytes        347574             0    in progress
                              hq-dfs-003             2442         5.6MB        669449         25946    in progress


Version-Release number of selected component (if applicable):
3.3.1

How reproducible:
Just start a rebalance and monitor the open file handles.

Steps to Reproduce:
1. Start rebalancing
2. sudo -i sysctl -a |grep fs.file-nr
3. lsof -p <pid> | wc -l   (or use the watch loop sketched below)
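
A minimal watch loop for step 3 (a sketch; <pid> is the glusterfsd PID and the 60-second interval is arbitrary):

$ while sleep 60; do echo "$(date '+%F %T') $(ls /proc/<pid>/fd | wc -l)"; done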
  
Actual results:
File handles are held open by glusterfsd

Expected results:
File handles are released after a file has been moved

Additional info:

Comment 1 Bjoern Teipel 2013-04-05 21:57:11 UTC
Looks like I got a "corruption", most likely due to this bug:

Couldn't go into the directory because one node had a symlink pointing nowhere, and NFS clients got an I/O error:

[2013-04-02 18:10:33.894179] W [client3_1-fops.c:2546:client3_1_opendir_cbk] 0-content1-client-1: remote operation failed: Too many open files. Path: /groovejob (00000000-0000-0000-0000-000000000000)

What I had in the file system:

hq-dfs-001 : File: `/vol/content1/groovejob'
hq-dfs-001 : Size: 4096      	Blocks: 8          IO Block: 4096   directory
hq-dfs-001 : Device: fd01h/64769d	Inode: 3367109578  Links: 20
hq-dfs-001 : Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
hq-dfs-001 : Access: 2013-03-28 07:13:32.402112000 -0700
hq-dfs-001 : Modify: 2013-04-05 13:52:29.404069612 -0700
hq-dfs-001 : Change: 2013-04-05 13:52:29.404069612 -0700
hq-dfs-002 : File: `/vol/content1/groovejob'
hq-dfs-002 : Size: 4096      	Blocks: 8          IO Block: 4096   directory
hq-dfs-002 : Device: fd01h/64769d	Inode: 210042751   Links: 20
hq-dfs-002 : Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
hq-dfs-002 : Access: 2013-03-28 07:13:32.402112000 -0700
hq-dfs-002 : Modify: 2013-04-05 13:52:29.404456841 -0700
hq-dfs-002 : Change: 2013-04-05 13:52:29.404456841 -0700
hq-dfs-003 : File: `/vol/content1/groovejob' -> `../../9d/15/9d153449-21a4-4107-943c-57db5b65e0e0/groovejob'
hq-dfs-003 : Size: 58        	Blocks: 0          IO Block: 4096   symbolic link
hq-dfs-003 : Device: fd01h/64769d	Inode: 229656214   Links: 1
hq-dfs-003 : Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
hq-dfs-003 : Access: 2013-04-05 09:55:44.185843408 -0700
hq-dfs-003 : Modify: 2013-04-05 09:55:44.185843408 -0700
hq-dfs-003 : Change: 2013-04-05 11:19:12.013368867 -0700
hq-dfs-004 : File: `/vol/content1/groovejob'
hq-dfs-004 : Size: 4096      	Blocks: 8          IO Block: 4096   directory
hq-dfs-004 : Device: fd02h/64770d	Inode: 2150209673  Links: 20
hq-dfs-004 : Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
hq-dfs-004 : Access: 2013-03-28 07:13:32.402112000 -0700
hq-dfs-004 : Modify: 2013-04-05 13:52:29.402760259 -0700
hq-dfs-004 : Change: 2013-04-05 13:52:29.402760259 -0700

Gluster error message:

[2013-04-05 14:36:45.263278] E [dht-common.c:736:dht_lookup_everywhere_done] 0-content1-dht: path /groovejob exists as a file on one subvolume and directory on another. Please fix it manually
[2013-04-05 14:36:45.263371] E [dht-rebalance.c:1142:gf_defrag_migrate_data] 0-content1-dht: /groovejob lookup failed



I replaced the symlink with a directory and everything works fine.
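
For anyone hitting the same file-vs-directory mismatch: the DHT metadata of a path can be inspected directly on each brick (a sketch, not from the original report; getfattr comes from the attr package). A real directory carries a trusted.glusterfs.dht layout xattr, while a rebalance link file is marked with trusted.glusterfs.dht.linkto:

$ getfattr -m . -d -e hex /vol/content1/groovejob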

Comment 2 shishir gowda 2013-04-08 05:16:21 UTC
Hi Bjoern,

Please provide us the volume info. Additionally, why was the rebalance done? Were you trying to expand the volume?

Rebalance does open files in order to migrate them. I see failures reported by rebalance on hq-dfs-003. Can you please provide the rebalance logs from that machine?

Comment 3 Bjoern Teipel 2013-04-08 05:25:09 UTC
Hi,

Yes, I expanded from 2 nodes to 3 and then to 4.
I'll try to attach the log but it is huge, even compressed.
How can I transmit it to you?

Here is the volume info:


Volume Name: content1
Type: Distribute
Volume ID: 319a3f36-3369-4626-be07-99201d89abd6
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: hq-dfs-001:/vol/content1
Brick2: hq-dfs-002:/vol/content1
Brick3: hq-dfs-003:/vol/content1
Brick4: hq-dfs-004:/vol/content1
Options Reconfigured:
performance.cache-size: 268435456
performance.io-thread-count: 32

Comment 4 Bjoern Teipel 2013-04-08 05:27:50 UTC
(In reply to comment #1)
> [...]


OK, you can ignore the "corruption" issue; that was human error, caused by someone who did file operations directly on one node instead of going through the translator.
So the open file handle issue is the only one still open.

Comment 5 Joonas Vilenius 2013-04-24 21:02:39 UTC
I have exactly the same symptoms, an identical way to reproduce, and the same results.

I had a single-brick volume (for the sake of getting clients mounted with the native FUSE client from the very beginning) and added another brick for a distributed setup. After running the rebalance for roughly 24 hours, with ~800k files moved over, I got an alert for low file handles on both systems. Rebalance started to report failures as the new server ran out of handles, blocking me even from logging in. Stopping the rebalance freed up the handles on the new server pretty much immediately, but they stayed reserved on the old server for an additional 12 hours after the rebalance was stopped (I also unmounted the clients and stopped/started the whole volume). Not sure if the latter was some kernel/lsof/reporting "issue", but nevertheless it all originates from the rebalance action.

Comment 6 Anand Avati 2013-04-26 06:35:21 UTC
REVIEW: http://review.gluster.org/4888 (distribute: Fix fds being leaked during rebalance) posted (#1) for review on release-3.3 by Kaushal M (kaushal)

Comment 7 Kaushal 2013-04-26 06:40:03 UTC
*** Bug 956781 has been marked as a duplicate of this bug. ***

Comment 8 Anand Avati 2013-04-26 07:35:30 UTC
COMMIT: http://review.gluster.org/4888 committed in release-3.3 by Vijay Bellur (vbellur) 
------
commit 45a9d1e70e0d3e0ab504643a3cd1bf3f0a3a43fd
Author: Kaushal M <kaushal>
Date:   Fri Apr 26 11:39:15 2013 +0530

    distribute: Fix fds being leaked during rebalance
    
    This patch is a backport of 2 patches from master branch which fixes the
    leak of fds during a rebalance process.
    
    The patches are,
    * libglusterfs/syncop: do not hold ref on the fd in cbk
      (e979c0de9dde14fe18d0ad7298c6da9cc878bbab)
    * cluster/distribute: Remove suprious fd_unref call
      (5d29e598665456b2b7250fdca14de7409098877a)
    
    Change-Id: Icea1d0b32cb3670f7decc24261996bca3fe816dc
    BUG: 928631
    Signed-off-by: Kaushal M <kaushal>
    Reviewed-on: http://review.gluster.org/4888
    Reviewed-by: Vijay Bellur <vbellur>
    Tested-by: Gluster Build System <jenkins.com>
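
To confirm the fix on a patched build, the counters from the original report can be sampled while a rebalance runs; with the leak fixed, the open-file count should plateau instead of growing monotonically (a sketch; sample count and interval are arbitrary):

$ for i in $(seq 12); do sudo sysctl fs.file-nr; sleep 300; done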

Comment 9 loopx 2013-04-26 08:07:40 UTC
Hello Kaushal, 


Thanks for the reply (in the duplicate).

Will you release a 3.3.2 version? I would appreciate new EPEL-5/6 RPM packages to upgrade my current version.


Do you know when 3.3.2/3.4 will be released?

Comment 10 Kaushal 2013-04-26 09:54:52 UTC
Hi Laurent,

There is going to be a 3.3.2 release and it will contain this fix. There have been a couple of QA releases of it already, but I don't know a release date for this version yet.

Comment 11 loopx 2013-04-26 11:59:35 UTC
Thanks, I will try to be patient ;)

Comment 12 hans 2013-04-26 15:54:51 UTC
I'm testing this patch on glusterfs 3.3git-v3.3.2qa2-1-g45a9d1e and can confirm that there are no file descriptor leaks (during the first few hours of my test at least ;-)


However, I'm seeing a disturbing number of claimed failures:

                                    Node Rebalanced-files          size       scanned      failures         status
                               ---------      -----------   -----------   -----------   -----------   ------------
                                    host1                0        0Bytes         84830         60238    in progress
                                    host2                0        0Bytes        123279             0    in progress
                                    host3                0        0Bytes        123279             0    in progress
                                    host4                0        0Bytes        123280             0    in progress

And every brick logs line pairs like this:

E [posix-helpers.c:721:posix_handle_pair] 0-vol01-posix: /bricks/a/testcopy50/50/98/06/somefile: key:trusted.glusterfs.dht.linkto error:File exists
E [posix.c:1755:posix_create] 0-vol01-posix: setting xattrs on /bricks/a/testcopy50/50/98/06/somefile failed (File exists)

Is this to be expected? Is it related to this ticket?

Comment 13 loopx 2013-04-26 16:05:13 UTC
@hans: regarding failures... I always get failures when rebalancing files without using the "force" option.

So, every "rebalance start" finishes with failures and no files (absolutely 0 files) transferred.

But if I use (as documented, after the "start") the "start force", then data is moved properly and no more failures occur.

Why? I really don't know. What is the difference between "start" and "start force" in rebalance mode?


I think I also get the same "File exists" error... but in my logs it's a warning, not an error line:
----------
[2013-04-25 14:26:21.239650] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-gv1-client-3: remote operation failed: File exists. Path: /6 (00000000-0000-0000-0000-000000000000)
[2013-04-25 14:26:21.239792] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-gv1-client-2: remote operation failed: File exists. Path: /6 (00000000-0000-0000-0000-000000000000)
[2013-04-25 14:26:21.239877] E [dht-common.c:1911:dht_getxattr] 0-gv1-dht: layout is NULL
[2013-04-25 14:26:21.239969] E [dht-rebalance.c:1150:gf_defrag_migrate_data] 0-gv1-dht: Failed to get node-uuid for /6
----------

And, just to try to help, here is a list of errors discovered when testing Gluster 3.3.1:
----------

[2013-04-23 18:12:48.637318] E [afr-self-heal-common.c:2160:afr_self_heal_completion_cbk] 0-gv1-replicate-1: background  meta-data data entry missing-entry gfid self-heal failed on /gluster_test2_old/<hidden>.pdf
[2013-04-23 18:12:49.226701] E [dht-common.c:1911:dht_getxattr] 0-gv1-dht: layout is NULL
[2013-04-23 18:12:49.226740] E [dht-rebalance.c:1150:gf_defrag_migrate_data] 0-gv1-dht: Failed to get node-uuid for /gluster_test2_old/<hidden>.pdf
[2013-04-23 18:18:29.720733] E [rpcsvc.c:1155:rpcsvc_program_unregister_portmap] 0-rpc-service: Could not unregister with portmap
----------

