Bug 956781 - On volume rebalance, max file limit is reached because files don't seem to be closed after the move (too many open files)
Summary: On volume rebalance, max file limit is reached because files don't seem to ...
Keywords:
Status: CLOSED DUPLICATE of bug 928631
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: 3.3.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kaushal
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-04-25 15:40 UTC by loopx
Modified: 2013-04-26 06:40 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-04-26 06:40:03 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description loopx 2013-04-25 15:40:15 UTC
Description of problem:
With a 4-node / 4-brick distributed-replicated Gluster volume, I have an issue when rebalancing data (after expanding from 2 nodes to 4).

So, when "rebalance start force" is used, files are moved from the source bricks to the destination bricks (the new nodes). Unfortunately, the moved files remain open on both sides:
- on the source side: the file is still open, and "lsof" shows it as "(deleted)"
- on the destination side: the file is also open (but without "(deleted)", since the file is new)
- this is not a socket problem

Once the file balancing is complete:
- source: files are still open
- destination: files are closed

So, on the destination nodes, the maximum number of open files can be reached, which crashes other applications:
------------
VFS: file-max limit 64000 reached
VFS: file-max limit 64000 reached
gmond[2964]: segfault at ffffffffffffffff rip 000000000040acc4 rsp 00007fff88a7db60 error 4
VFS: file-max limit 64000 reached
VFS: file-max limit 64000 reached
------------
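
As a quick way to see how close a node is to the limit while the rebalance runs, the kernel exposes the handle counters directly (a minimal sketch; PID 16840 is the serveur1 brick process shown in the status output below, substitute your own):
------------
# Allocated handles, free handles, and the fs.file-max limit:
cat /proc/sys/fs/file-nr
sysctl fs.file-max

# Descriptors currently held by one brick process
# (16840 = glusterfs brick PID on serveur1; adjust as needed):
ls /proc/16840/fd | wc -l
------------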

On the source, files are still open; lsof output:
----------------
glusterfs 16840      root *258u      REG                8,5        0   55644726 /mnt/data1/brick1/gluster_test2_old/<hidden>.pdf (deleted)
glusterfs 16840      root *259u      REG                8,5        0   55642740 /mnt/data1/brick1/gluster_test2_old/<hidden2>.pdf (deleted)
glusterfs 16840      root *260u      REG                8,5        0   55648839 /mnt/data1/brick1/gluster_test2_old/<hidden3>.pdf (deleted)
glusterfs 16840      root *261u      REG                8,5        0   55644412 /mnt/data1/brick1/gluster_test2_old/<hidden4>.pdf (deleted)
glusterfs 16840      root *262u      REG                8,5        0   55646524 /mnt/data1/brick1/gluster_test2_old/<hidden5>.pdf (deleted)
glusterfs 16840      root *263u      REG                8,5        0   55645671 /mnt/data1/brick1/gluster_test2_old/<hidden5>.pdf (deleted)
----------------
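
To count only the deleted-but-still-open files held by that brick process, something like this works (a small sketch using the same PID 16840 as above):
----------------
# Deleted files the brick still keeps open, via lsof:
lsof -p 16840 | grep -c '(deleted)'

# Same thing without lsof, straight from /proc:
ls -l /proc/16840/fd | grep -c '(deleted)'
----------------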


Please note that there is another big issue: sometimes, at the end of the rebalance, CPU usage goes to 100% (or even more), because "glusterfsd" runs several processes that eat all the CPU (I can see it with Ganglia and htop).

I also get this error in the log (/var/log/glusterfs/gv1-rebalance.log) on the source:
--------------
[2013-04-25 17:09:30.499705] I [dht-rebalance.c:647:dht_migrate_file] 0-gv1-dht: /gluster_test2_old/<hidden>.pdf: attempting to move from gv1-replicate-0 to gv1-replicate-1
[2013-04-25 17:09:30.564466] I [dht-rebalance.c:856:dht_migrate_file] 0-gv1-dht: completed migration of /gluster_test2_old/<hidden2>.pdf from subvolume gv1-replicate-0 to gv1-replicate-1
[2013-04-25 17:09:41.476468] I [dht-common.c:2337:dht_setxattr] 0-gv1-dht: fixing the layout of /gluster_test2_old/<hidden3>
[2013-04-25 17:09:41.477772] I [dht-rebalance.c:1063:gf_defrag_migrate_data] 0-gv1-dht: migrate data called on /gluster_test2_old/<hidden4>
[2013-04-25 17:09:41.488814] I [dht-rebalance.c:1626:gf_defrag_status_get] 0-glusterfs: Rebalance is completed
[2013-04-25 17:09:41.488872] I [dht-rebalance.c:1629:gf_defrag_status_get] 0-glusterfs: Files migrated: 58478, size: 2809361015, lookups: 206117, failures: 0
[2013-04-25 17:09:41.489467] W [glusterfsd.c:831:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x36236d500d] (-->/lib64/libpthread.so.0 [0x3623e0683d] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0x17c) [0x4052ec]))) 0-: received signum (15), shutting down
[2013-04-25 17:09:41.489714] E [rpcsvc.c:1155:rpcsvc_program_unregister_portmap] 0-rpc-service: Could not unregister with portmap
--------------


Note: these nodes are not clients => no FUSE mounts on them.



Version-Release number of selected component (if applicable):
glusterfs-3.3.1-1.el5
glusterfs-fuse-3.3.1-1.el5
glusterfs-server-3.3.1-1.el5


Steps to Reproduce:
- create a replicated volume
- fill it with 140,000 small files (PDFs)
- extend it with 2 more nodes (add-brick)
- run fix-layout, then "rebalance start" (which always fails), then "rebalance start force" (see the example commands after this list)
- use "lsof" to check open files (the count increases from the beginning until the end of the rebalance)
- watch the CPU (Ganglia/htop): it can go to 100% at the end of the rebalance
- for now, 20 minutes at 100% CPU with a load of 20 on both source nodes
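
For reference, roughly the commands behind these steps (a sketch reconstructed from the volume info below; server and brick names are the ones listed there):
----------------
# Create and start the original 2-node replicated volume:
gluster volume create gv1 replica 2 serveur1:/mnt/data1/brick1 serveur2:/mnt/data1/brick2
gluster volume start gv1

# Extend it with 2 more nodes:
gluster volume add-brick gv1 serveur3:/mnt/data1/brick3 serveur4:/mnt/data1/brick4

# Fix the layout, then rebalance (plain start, then start force):
gluster volume rebalance gv1 fix-layout start
gluster volume rebalance gv1 start
gluster volume rebalance gv1 start force

# Watch progress:
gluster volume rebalance gv1 status
----------------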


 
Actual results:
- the number of open files grows on both the source and destination nodes
- at the end, files are still open on the source node, and the CPU is at 100% with a high load


Expected results:
- the number of open files should NOT increase like that (these files are not even in use)
- lsof should NOT show a crazy number of open files (thousands)
- the CPU should not end up at 100% at the end


Additional info:
[0][root@serveur1 brick1]$ gluster volume info

Volume Name: gv1
Type: Distributed-Replicate
Volume ID: 7ef3f923-3dd2-4817-b5a7-5e946d8661fb
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: serveur1:/mnt/data1/brick1
Brick2: serveur2:/mnt/data1/brick2
Brick3: serveur3:/mnt/data1/brick3
Brick4: serveur4:/mnt/data1/brick4


[255][root@serveur1 brick1]$ gluster volume status gv1 detail
Status of volume: gv1                                       
------------------------------------------------------------------------------
Brick                : Brick serveur1:/mnt/data1/brick1
Port                 : 24009
Online               : Y
Pid                  : 16840
File System          : ext3
Device               : /dev/sda5
Mount Options        : rw
Inode Size           : 128
Disk Space Free      : 252.0GB
Total Disk Space     : 260.9GB
Inode Count          : 70615040
Free Inodes          : 70442582
------------------------------------------------------------------------------
Brick                : Brick serveur2:/mnt/data1/brick2
Port                 : 24009
Online               : Y
Pid                  : 29293
File System          : ext3
Device               : /dev/sda5
Mount Options        : rw
Inode Size           : 128
Disk Space Free      : 110.9GB
Total Disk Space     : 122.5GB
Inode Count          : 33193984
Free Inodes          : 33019070
------------------------------------------------------------------------------
Brick                : Brick serveur3:/mnt/data1/brick3
Port                 : 24009
Online               : Y
Pid                  : 17189
File System          : ext3
Device               : /dev/sda5
Mount Options        : rw
Inode Size           : 128
Disk Space Free      : 49.3GB
Total Disk Space     : 56.4GB
Inode Count          : 15287712
Free Inodes          : 15141537
------------------------------------------------------------------------------
Brick                : Brick serveur4:/mnt/data1/brick4
Port                 : 24009
Online               : Y
Pid                  : 28968
File System          : ext3
Device               : /dev/sda5
Mount Options        : rw
Inode Size           : 128
Disk Space Free      : 50.1GB
Total Disk Space     : 56.3GB
Inode Count          : 15254976
Free Inodes          : 15125681

Comment 1 loopx 2013-04-25 15:44:04 UTC
More information on the "source" side:
- files are still open
- but, at the end, the CPU goes very high
- at the same time, the number of open files starts decreasing
- so, while the CPU is high, the open file count decreases
- I suppose the number of open files will eventually return to a "normal" value
- and then the CPU will go back to a normal load


The client has trouble accessing the volume: it is very slow.
The "gluster" command is also very slow (and can fail).



If you need more information, just ask.


Please also note that I'm running RHEL5.9 on these 4 nodes.

Comment 2 loopx 2013-04-25 16:00:14 UTC
On the source nodes:
- they both hit the problem, but do not necessarily finish at the same time
- one is done and its open file count is back to normal
- but the second source node has not finished yet and still has more than 20,000 open files


For now, the second source node is still at 100% (50 minutes so far) and will probably take an hour before calming down.



How many open files does Gluster require?
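
In the meantime, raising the system-wide limit is a possible stop-gap (a sketch only; 200000 is an arbitrary example value, not a GlusterFS recommendation):
----------------
# Current limit and usage:
sysctl fs.file-max
cat /proc/sys/fs/file-nr

# Raise the limit for the running system (example value):
sysctl -w fs.file-max=200000

# Make it persistent across reboots:
echo 'fs.file-max = 200000' >> /etc/sysctl.conf
----------------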

Comment 3 Kaushal 2013-04-26 06:40:03 UTC
Thanks for reporting this. This issue has been reported earlier and is being tracked in another bug, for which we have a fix.
Closing this as a duplicate.

*** This bug has been marked as a duplicate of bug 928631 ***

