Description of problem:
glusterd max open files can't be changed.

Version-Release number of selected component (if applicable):
3.10.3 (CentOS 7.3) production environment
3.12.3 (CentOS 7.3) test environment

How reproducible:
After installing the glusterd service on CentOS, increase the kernel parameters `fs.file-max` and `fs.nr_open`, raise LimitNOFILE, start the glusterd service, then check /proc/<pid>/limits.

Steps to Reproduce:
1. yum install -y glusterfs glusterfs-server glusterfs-fuse glusterfs-rdma
2. sysctl -w fs.file-max=4194304
3. sysctl -w fs.nr_open=2097152
4. make sure /usr/lib/systemd/system/glusterd.service has LimitNOFILE=2097152
5. systemctl start glusterd
6. check each gluster process's max open files with `cat /proc/<pid>/limits`

Actual results:
[root@gfs1 ~]# ps -ef | grep gluster
root 23602 1 0 15:13 ? 00:00:01 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level DEBUG
root 23613 1 0 15:13 ? 00:00:01 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/187faae78e098adf3f277b8be93e2e7f.socket --xlator-option *replicate*.node-uuid=9728e5d2-5135-47bb-ab7a-d186be9e804a
root 23622 1 0 15:13 ? 00:00:01 /usr/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p /var/run/gluster/quotad/quotad.pid -l /var/log/glusterfs/quotad.log -S /var/run/gluster/89152d51ca7da373002a84bcab99e3e2.socket --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off
root 23631 1 0 15:13 ? 00:00:00 /usr/sbin/glusterfsd -s 172.17.18.61 --volfile-id douyu-volume.172.17.18.61.opt-gluster-data -p /var/run/gluster/vols/douyu-volume/172.17.18.61-opt-gluster-data.pid -S /var/run/gluster/53e32eeabb2d21c972cfca0e3fd17e49.socket --brick-name /opt/gluster/data -l /var/log/glusterfs/bricks/opt-gluster-data.log --xlator-option *-posix.glusterd-uuid=9728e5d2-5135-47bb-ab7a-d186be9e804a --brick-port 49152 --xlator-option douyu-volume-server.listen-port=49152

[root@gfs1 ~]# cat /proc/23602/limits | grep "open files"
Max open files            65536                65536                files
[root@gfs1 ~]# cat /proc/23613/limits | grep "open files"
Max open files            65536                65536                files
[root@gfs1 ~]# cat /proc/23631/limits | grep "open files"
Max open files            1048576              1048576              files
[root@gfs1 ~]# cat /proc/23622/limits | grep "open files"
Max open files            65536                65536                files

Expected results:
Max open files            2097152              2097152              files

Additional info:
No matter what fs.file-max and fs.nr_open are set to, the output is the same:
1. glusterd max open files is always 65536
2. glusterfs max open files is always 65536
3. glusterfsd max open files is always 1048576

My problem is that one gluster brick's self-heal opens a huge number of file descriptors and does not release them. I can see that /proc/<pid>/fd/ holds a very large number of fds, almost 1048576.
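The per-process check in step 6 can be scripted. Below is a small sketch; the `max_open_files` helper name is mine, not part of gluster, and it only reads the "Max open files" row from /proc/<pid>/limits:

```shell
#!/bin/sh
# Hypothetical helper: print the soft/hard "Max open files" limits
# for a given PID, as recorded by the kernel in /proc/<pid>/limits.
max_open_files() {
    grep "Max open files" "/proc/$1/limits"
}

# Check every running gluster daemon (glusterd, glusterfs, glusterfsd).
for pid in $(pgrep -d ' ' 'gluster(d|fs|fsd)'); do
    echo "PID $pid: $(max_open_files "$pid")"
done
```

Run as root on the gluster node; on the system described above this prints 65536 for glusterd/glusterfs and 1048576 for glusterfsd.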
The brick data path holds almost 3,000,000+ files, and the `Too many open files` errors last until the health-check process can no longer open the health_check file; in the end the health check fails and the brick goes down. /var/log/glusterfs/bricks/{data_path}.log displays:

[2017-12-22 05:39:00.399010] E [MSGID: 115056] [server-rpc-fops.c:627:server_readdir_cbk] 0-douyu-volume-server: 544076318: READDIR -2 (bf4ac1aa-4b4c-48e7-9f57-a3b9e848ffce), client: Dy-JXQ-4-18-2731592-2017/08/15-22:13:09:109245-douyu-volume-client-3-0-30, error-xlator: douyu-volume-posix [Too many open files]
[2017-12-22 05:39:00.425369] W [MSGID: 113006] [posix.c:6368:posix_do_readdir] 0-douyu-volume-posix: pfd is NULL, fd=0x7fde7384f3e0 [Operation not permitted]
[2017-12-22 05:39:00.425428] E [MSGID: 115056] [server-rpc-fops.c:627:server_readdir_cbk] 0-douyu-volume-server: 544076359: READDIR -2 (bf4ac1aa-4b4c-48e7-9f57-a3b9e848ffce), client: Dy-JXQ-4-18-2731592-2017/08/15-22:13:09:109245-douyu-volume-client-3-0-30, error-xlator: douyu-volume-posix [Too many open files]
[2017-12-22 05:39:02.436663] W [MSGID: 113075] [posix-helpers.c:1777:posix_fs_health_check] 0-douyu-volume-posix: open() on /home/gluster/data_36/.glusterfs/health_check returned [Too many open files]

But no matter how I change the kernel parameters, the gluster open-files limit is always the same.
Hi,

As per the current code this is expected behavior: at the time glusterd is spawned, we set a hardcoded RLIMIT_NOFILE limit of 65536, and the current code does not ask the user to configure this limit. We could, however, provide a configurable option for it.

Regards
Mohit Agrawal
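A quick way to see why raising LimitNOFILE in the unit file has no effect here: when a process sets its own RLIMIT_NOFILE at startup, that setting overrides whatever limit it inherited from systemd or its parent. The one-liner below sketches the same mechanism with bash's `ulimit` standing in for the actual setrlimit() call in glusterd:

```shell
# The child lowers its own open-files limit to 1024 at "startup", so
# the parent's (or systemd's) higher limit never shows up -- analogous
# to glusterd hardcoding RLIMIT_NOFILE to 65536 when it spawns.
bash -c 'ulimit -n 1024; grep "Max open files" /proc/$$/limits'
```

The grep reports 1024/1024 regardless of the limit the parent shell had, just as /proc/<pid>/limits for glusterd reports 65536 regardless of LimitNOFILE.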
Thanks for the reply. Is there a configurable option to increase RLIMIT_NOFILE? One node in my cluster cannot recover because of the "too many open files" problem.
Hi,

Currently we don't provide any option to configure RLIMIT_NOFILE for any gluster daemon, but I can offer some workarounds to control it:

1) Can you please confirm whether you have configured any limit for cluster.shd-max-threads? By default the value of this option is 1. To control the number of file descriptors shd uses, you can reduce the value of this option.
2) You can kill the shd process and start it again from the command line with the same arguments that glustershd was showing before you killed it. When starting the new shd process, you can tune RLIMIT_NOFILE through ulimit or systemd; I think that should work.

Regards
Mohit Agrawal
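Workaround 2 can be sketched as a small shell helper. `restart_with_nofile` is a hypothetical name (not a gluster tool), and the actual glustershd command line must be copied verbatim from `ps -ef` before killing the old process:

```shell
#!/bin/sh
# Hypothetical helper: run a command under a given open-files limit by
# re-execing it from a fresh shell that sets ulimit -n first.
restart_with_nofile() {
    limit=$1; shift
    bash -c 'ulimit -n "$0"; exec "$@"' "$limit" "$@"
}

# Usage sketch (arguments are illustrative -- copy yours from ps -ef):
# kill <glustershd-pid>
# restart_with_nofile 2097152 /usr/sbin/glusterfs -s localhost \
#     --volfile-id gluster/glustershd ...
```

The new limit applies both soft and hard, so the restarted daemon inherits it unless it later calls setrlimit() itself.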
(In reply to Mohit Agrawal from comment #3)
> 1) Can you please confirm have you configured any limit for
>    cluster.shd-max-threads, by default the value of this option is 1.
>    To control the no. of file limits usage for shd u can reduce the
>    value of this option.

cluster.shd-max-threads is at its default of 1. No matter whether its value is high or low, nofile stays at 65536.

> 2) You can kill the shd process and from command-line, you can start the
>    process with the same argument as it(glustershd)is showing before
>    killing the process. At the time of start new shd process, you can
>    tune the value of RLIMIT_NOFILE through ulimit or systemd, I think
>    it should work.

I did just what you said:

[root@gfs1 ~]# ulimit -a | grep "open files"
open files                      (-n) 2097100

The glusterfs daemon can reach the ulimit I set:

[root@gfs1 ~]# cat /proc/$(ps -ef | grep "/usr/sbin/glusterfs " | grep -v grep | awk '{print $2}')/limits | grep "open files"
Max open files            2097100              2097100              files

But the glusterfsd daemon cannot reach the ulimit I set:

[root@gfs1 ~]# cat /proc/$(ps -ef | grep "/usr/sbin/glusterfsd " | grep -v grep | awk '{print $2}')/limits | grep "open files"
Max open files            1048576              1048576              files

It is always 1048576 and can't be increased any further. My problem is the glusterfsd daemon: after /var/log/glusterfs/bricks/home-gluster-data_36.log has been reporting "[Too many open files]" for a while, systemd status displays:

Dec 24 15:36:55 gfs home-gluster-data_36[181536]: [2017-12-24 07:36:55.579134] M [MSGID: 113075] [posix-helpers.c:1837:posix_health_check_thread_proc] 0-volume-posix: health-check failed, going down
Dec 24 15:37:25 gfs home-gluster-data_36[181536]: [2017-12-24 07:37:25.579509] M [MSGID: 113075] [posix-helpers.c:1844:posix_health_check_thread_proc] 0-volume-posix: still alive!
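For the systemd side of this tuning, a drop-in override is safer than editing the packaged unit file (the path and file name below are an illustrative sketch; and as the developer's reply above notes, a daemon that hardcodes its own setrlimit() will still clamp the limit regardless):

```ini
# /etc/systemd/system/glusterd.service.d/nofile.conf (hypothetical drop-in)
[Service]
LimitNOFILE=2097152
```

Apply it with `systemctl daemon-reload && systemctl restart glusterd`.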
-> SIGTERM

gluster volume status shows:

[root@gfs1 ~]# gluster volume status
Status of volume: volume
Gluster process                            TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.11.0.33:/home/gluster/data_33     49152     0          Y       228792
Brick 10.11.0.34:/home/gluster/data_34     49152     0          Y       231688
Brick 10.11.0.35:/home/gluster/data_35     49152     0          Y       229844
Brick 10.11.0.36:/home/gluster/data_36     N/A       N/A        N       N/A
Brick 10.11.0.37:/home/gluster/data_37     49152     0          Y       440437
Brick 10.11.0.38:/home/gluster/data_38     49152     0          Y       349833
Brick 10.11.0.39:/home/gluster/data_39     49152     0          Y       482584
Brick 10.11.0.40:/home/gluster/data_40     49152     0          Y       483945
....
This doesn't look like a glusterd-specific issue. Changing the component to core.
This bug was reported against a version of Gluster that is no longer maintained (or has been EOL'd). See https://www.gluster.org/release-schedule/ for the versions currently maintained. As a result, this bug is being closed. If the bug persists on a maintained version of Gluster or against the mainline Gluster repository, please request that it be reopened and mark the Version field appropriately.