Bug 1528571 - glusterd Too many open files
Summary: glusterd Too many open files
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: 3.10
Hardware: x86_64
OS: Linux
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-12-22 07:47 UTC by stefan_bo
Modified: 2018-06-20 18:28 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-20 18:28:52 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description stefan_bo 2017-12-22 07:47:32 UTC
Description of problem:
The maximum number of open files (RLIMIT_NOFILE) for glusterd cannot be changed.

Version-Release number of selected component (if applicable):
3.10.3 (CentOS7.3) production environment
3.12.3 (CentOS7.3) test environment

How reproducible:
Install the glusterd service on CentOS, increase the kernel parameters `fs.file-max` and `fs.nr_open`, raise LimitNOFILE in the systemd unit, start the glusterd service, and check /proc/<pid>/limits.

Steps to Reproduce:
1. yum install -y glusterfs glusterfs-server glusterfs-fuse glusterfs-rdma
2. sysctl -w fs.file-max=4194304
3. sysctl -w fs.nr_open=2097152
4. make sure /usr/lib/systemd/system/glusterd.service contains LimitNOFILE=2097152
5. systemctl start glusterd
6. check the max open files of each gluster process with `cat /proc/<pid>/limits` (the same check can also be done programmatically; see the sketch below)
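
For reference, step 6 reads the limits from /proc/<pid>/limits; the same value can also be queried programmatically. A minimal sketch, assuming a Linux system with glibc's prlimit(2) wrapper (not part of Gluster):

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>

/* Read another process's RLIMIT_NOFILE, much like
 * `grep "open files" /proc/<pid>/limits` does. Illustrative only. */
int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }

    pid_t pid = (pid_t)atol(argv[1]);
    struct rlimit lim;

    /* A NULL new_limit means "only read the current limits". */
    if (prlimit(pid, RLIMIT_NOFILE, NULL, &lim) != 0) {
        perror("prlimit");
        return 1;
    }

    printf("Max open files            %llu                %llu\n",
           (unsigned long long)lim.rlim_cur,
           (unsigned long long)lim.rlim_max);
    return 0;
}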

Actual results:
[root@gfs1 ~]# ps -ef|grep gluster
root       23602       1  0 15:13 ?        00:00:01 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level DEBUG
root       23613       1  0 15:13 ?        00:00:01 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/187faae78e098adf3f277b8be93e2e7f.socket --xlator-option *replicate*.node-uuid=9728e5d2-5135-47bb-ab7a-d186be9e804a
root       23622       1  0 15:13 ?        00:00:01 /usr/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p /var/run/gluster/quotad/quotad.pid -l /var/log/glusterfs/quotad.log -S /var/run/gluster/89152d51ca7da373002a84bcab99e3e2.socket --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off
root       23631       1  0 15:13 ?        00:00:00 /usr/sbin/glusterfsd -s 172.17.18.61 --volfile-id douyu-volume.172.17.18.61.opt-gluster-data -p /var/run/gluster/vols/douyu-volume/172.17.18.61-opt-gluster-data.pid -S /var/run/gluster/53e32eeabb2d21c972cfca0e3fd17e49.socket --brick-name /opt/gluster/data -l /var/log/glusterfs/bricks/opt-gluster-data.log --xlator-option *-posix.glusterd-uuid=9728e5d2-5135-47bb-ab7a-d186be9e804a --brick-port 49152 --xlator-option douyu-volume-server.listen-port=49152

[root@gfs1 ~]# cat /proc/23602/limits |grep "open files"
Max open files            65536                65536                files     
[root@gfs1 ~]# cat /proc/23613/limits |grep "open files"
Max open files            65536                65536                files     
[root@gfs1 ~]# cat /proc/23631/limits |grep "open files"
Max open files            1048576              1048576              files     
[root@gfs1 ~]# cat /proc/23622/limits |grep "open files"
Max open files            65536                65536                files

Expected results:
Max open files            2097152              2097152             files

Additional info:
No matter what fs.file-max and fs.nr_open are set to, the output is the same:
1. glusterd  max open files is always 65536
2. glusterfs max open files is always 65536
3. glusterfsd max open files is always 1048576


My real problem is that self-heal on one brick opens a huge number of file descriptors and never releases them.
/proc/<pid>/fd/ for that brick contains close to 1048576 entries, and the brick data path holds roughly 3,000,000+ files.

Eventually the brick hits `Too many open files`: the health-check thread can no longer open its health_check file, the health check fails, and the brick goes down.
`/var/log/glusterfs/bricks/{data_path}.log` shows:
[2017-12-22 05:39:00.399010] E [MSGID: 115056] [server-rpc-fops.c:627:server_readdir_cbk] 0-douyu-volume-server: 544076318: READDIR -2 (bf4ac1aa-4b4c-48e7-9f57-a3b9e848ffce), client: Dy-JXQ-4-18-2731592-2017/08/15-22:13:09:109245-douyu-volume-client-3-0-30, error-xlator: douyu-volume-posix [Too many open files]
[2017-12-22 05:39:00.425369] W [MSGID: 113006] [posix.c:6368:posix_do_readdir] 0-douyu-volume-posix: pfd is NULL, fd=0x7fde7384f3e0 [Operation not permitted]
[2017-12-22 05:39:00.425428] E [MSGID: 115056] [server-rpc-fops.c:627:server_readdir_cbk] 0-douyu-volume-server: 544076359: READDIR -2 (bf4ac1aa-4b4c-48e7-9f57-a3b9e848ffce), client: Dy-JXQ-4-18-2731592-2017/08/15-22:13:09:109245-douyu-volume-client-3-0-30, error-xlator: douyu-volume-posix [Too many open files]
[2017-12-22 05:39:02.436663] W [MSGID: 113075] [posix-helpers.c:1777:posix_fs_health_check] 0-douyu-volume-posix: open() on /home/gluster/data_36/.glusterfs/health_check returned [Too many open files]

But no matter how I change the kernel parameters, the open files limit of the gluster daemons stays the same.

Comment 1 Mohit Agrawal 2017-12-22 10:16:02 UTC
Hi,

 As per the current code this is expected behaviour: at the time glusterd is spawned we set a hard-coded limit of 65536 for RLIMIT_NOFILE, and in the current code we don't ask the user to configure that limit. We could, however, provide a configurable option for it.

Regards
Mohit Agrawal
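
For context, the behaviour described in comment 1 comes down to a setrlimit(2) call made by the daemon at startup, which replaces whatever LimitNOFILE value systemd handed it. A minimal illustrative sketch, not the actual Gluster source (the 65536 constant simply mirrors what /proc/<pid>/limits reports above):

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    /* Hypothetical constant -- it mirrors the 65536 reported in
     * /proc/<pid>/limits for glusterd and glusterfs above. */
    struct rlimit lim = { .rlim_cur = 65536, .rlim_max = 65536 };

    /* A daemon that does this at startup replaces whatever soft/hard
     * limit it inherited from systemd (LimitNOFILE) or the shell (ulimit). */
    if (setrlimit(RLIMIT_NOFILE, &lim) != 0)
        perror("setrlimit(RLIMIT_NOFILE)");

    if (getrlimit(RLIMIT_NOFILE, &lim) == 0)
        printf("Max open files: %llu / %llu\n",
               (unsigned long long)lim.rlim_cur,
               (unsigned long long)lim.rlim_max);
    return 0;
}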

Comment 2 stefan_bo 2017-12-24 08:59:16 UTC
Thanks for the reply.
Is there a configurable option to increase RLIMIT_NOFILE?
One node in my cluster cannot recover because of this too-many-open-files problem.

Comment 3 Mohit Agrawal 2017-12-27 05:38:28 UTC
Hi,

 Currently we don't provide any option to configure RLIMIT_NOFILE for any gluster daemon, but I can offer a workaround to control it:
 1) Can you please confirm whether you have configured any limit for cluster.shd-max-threads? By default the value of this option is 1. To control the number of file descriptors shd uses, you can reduce the value of this option.
 2) You can kill the shd process and restart it from the command line with the same arguments that glustershd was showing before you killed it. When starting the new shd process you can tune RLIMIT_NOFILE through ulimit or systemd; I think that should work.


Regards
Mohit Agrawal
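
Why workaround 2) can help: RLIMIT_NOFILE is inherited across fork() and execve(), so a process started from a shell (or a systemd unit) with a raised ulimit keeps that limit unless the program lowers it itself. A small sketch of the mechanism, assuming root privileges and fs.nr_open at least as large as the requested value; the daemon path is a placeholder, not a real binary:

#include <stdio.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    /* Equivalent of running `ulimit -n 2097152` before starting the daemon.
     * Raising the hard limit needs root, and the value must not exceed the
     * kernel's fs.nr_open. */
    struct rlimit lim = { .rlim_cur = 2097152, .rlim_max = 2097152 };

    if (setrlimit(RLIMIT_NOFILE, &lim) != 0) {
        perror("setrlimit");
        return 1;
    }

    pid_t pid = fork();
    if (pid == 0) {
        /* The exec'd program starts with the raised limit and keeps it
         * unless it calls setrlimit() on its own.
         * "/usr/sbin/some-daemon" is a placeholder, not a real path. */
        execl("/usr/sbin/some-daemon", "some-daemon", (char *)NULL);
        perror("execl");   /* reached only if exec fails */
        _exit(127);
    }

    waitpid(pid, NULL, 0);
    return 0;
}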

Comment 4 stefan_bo 2017-12-28 09:01:21 UTC
(In reply to Mohit Agrawal from comment #3)
> Hi,
> 
>  Currently we don't provide any option to configure RLIMIT_NOFILE for any
> gluster daemon, but I can offer a workaround to control it:
>  1) Can you please confirm whether you have configured any limit for
> cluster.shd-max-threads? By default the value of this option is 1. To
> control the number of file descriptors shd uses, you can reduce the value
> of this option.

cluster.shd-max-threads is at its default value of 1.
No matter whether I set it high or low, the nofile limit stays at 65536.


>  2) You can kill the shd process and restart it from the command line with
> the same arguments that glustershd was showing before you killed it. When
> starting the new shd process you can tune RLIMIT_NOFILE through ulimit or
> systemd; I think that should work.
> 
> 
> Regards
> Mohit Agrawal


I did exactly what you suggested.

[root@gfs1 ~]# ulimit -a|grep "open files"
open files                      (-n) 2097100

The glusterfs daemon does pick up the ulimit I set:

[root@gfs1 ~]# cat /proc/$(ps -ef|grep "/usr/sbin/glusterfs "|grep -v grep |awk '{print $2}')/limits|grep "open files"
Max open files            2097100              2097100              files 

The glusterfsd daemon does not pick up the ulimit I set:
[root@gfs1 ~]# cat /proc/$(ps -ef|grep "/usr/sbin/glusterfsd "|grep -v grep |awk '{print $2}')/limits|grep "open files"
Max open files            1048576              1048576              files 

It is always 1048576 and cannot be increased any further.


My problem is with the glusterfsd daemon: after /var/log/glusterfs/bricks/home-gluster-data_36.log has been logging "[Too many open files]" for a while, the systemd status shows:
Dec 24 15:36:55 gfs home-gluster-data_36[181536]: [2017-12-24 07:36:55.579134] M [MSGID: 113075] [posix-helpers.c:1837:posix_health_check_thread_proc] 0-volume-posix: health-check failed, going down
Dec 24 15:37:25 gfs home-gluster-data_36[181536]: [2017-12-24 07:37:25.579509] M [MSGID: 113075] [posix-helpers.c:1844:posix_health_check_thread_proc] 0-volume-posix: still alive! -> SIGTERM

gluster volume status shows:
[root@gfs1 ~]# gluster volume status
Status of volume: volume
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.11.0.33:/home/gluster/data_33      49152     0          Y       228792
Brick 10.11.0.34:/home/gluster/data_34      49152     0          Y       231688
Brick 10.11.0.35:/home/gluster/data_35      49152     0          Y       229844
Brick 10.11.0.36:/home/gluster/data_36      N/A       N/A        N       N/A  
Brick 10.11.0.37:/home/gluster/data_37      49152     0          Y       440437
Brick 10.11.0.38:/home/gluster/data_38      49152     0          Y       349833
Brick 10.11.0.39:/home/gluster/data_39      49152     0          Y       482584
Brick 10.11.0.40:/home/gluster/data_40      49152     0          Y       483945
....
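
As a possible mitigation for bricks that are still running, Linux also allows changing the limit of an already-running process, either with the prlimit(1) utility from util-linux or via prlimit(2), the same call used read-only in the earlier sketch, this time with a non-NULL new limit. A sketch only; it assumes root privileges, a target value within fs.nr_open, and it is not verified here whether the gluster daemons re-apply their own limit afterwards:

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>

/* Raise RLIMIT_NOFILE of an already-running process. Illustrative only;
 * needs root (CAP_SYS_RESOURCE) and <nofile> must stay within fs.nr_open. */
int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <pid> <nofile>\n", argv[0]);
        return 1;
    }

    pid_t pid = (pid_t)atol(argv[1]);
    unsigned long n = strtoul(argv[2], NULL, 10);
    struct rlimit new_lim = { .rlim_cur = n, .rlim_max = n };
    struct rlimit old_lim;

    if (prlimit(pid, RLIMIT_NOFILE, &new_lim, &old_lim) != 0) {
        perror("prlimit");
        return 1;
    }

    printf("pid %ld: max open files %llu -> %lu\n", (long)pid,
           (unsigned long long)old_lim.rlim_cur, n);
    return 0;
}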

Comment 5 Atin Mukherjee 2018-01-02 07:57:57 UTC
This doesn't look like a glusterd-specific issue. Changing the component to core.

Comment 6 Shyamsundar 2018-06-20 18:28:52 UTC
This bug is reported against a version of Gluster that is no longer maintained
(or has been EOL'd). See https://www.gluster.org/release-schedule/ for the
versions currently maintained.

As a result this bug is being closed.

If the bug persists on a maintained version of gluster or against the mainline
gluster repository, request that it be reopened and the Version field be marked
appropriately.

