Description of problem:
nova list fails with an HTTP 500; in the nova-api.log we see: https://gist.github.com/jtaleric/a5f9036da5fc7a65c865

Version-Release number of selected component (if applicable):

How reproducible:
100% (at scale)

Steps to Reproduce:
1. Deploy with ospd - run multiple scales; eventually you will hit this.

Actual results:

[stack@rackspace ~]$ nova list
ERROR (ClientException): The server has either erred or is incapable of performing the requested operation. (HTTP 500) (Request-ID: req-f0c15265-c3a4-46fa-b1a8-ba2f7a0cd232)

mariadb log:
160301 17:48:09 [ERROR] Error in accept: Too many open files

[stack@rackspace ~]$ sysctl fs.file-max
fs.file-max = 26225232

[stack@rackspace ~]$ cat /proc/$(pgrep mysqld$)/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             1030571              1030571              processes
Max open files            1024                 4096                 files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       1030571              1030571              signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

Expected results:
mariadb should not run out of file descriptors.
The soft limit of 1024 is too low, and mysqld_safe must be run as root so that it can raise the file handle limit on startup. When it is not run as root, we see this warning in the logs: "160216 15:13:29 [Warning] Could not increase number of max_open_files to more than 1024 (request: 4907)".
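For context, the number the server requests is not arbitrary. The sketch below shows roughly how MySQL/MariaDB-family servers size their descriptor needs from max_connections and table_open_cache; both the formula and the config values here are illustrative assumptions, not taken from this bug's logs.

```shell
# Hedged sketch: the server derives its requested open-files limit
# roughly from max_connections and table_open_cache. The exact formula
# and these example config values are assumptions for illustration.
max_connections=1000
table_open_cache=2000

limit_a=$((10 + max_connections + table_open_cache * 2))  # fds for tables plus connections
limit_b=$((max_connections * 5))                          # alternative lower bound
wanted=$(( limit_a > limit_b ? limit_a : limit_b ))

echo "server would request roughly $wanted file descriptors"
```

Whatever the exact computation, the point is that the request can easily exceed the default soft limit of 1024.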
Alternatively, for a systemd-run service, LimitNOFILE needs to be set to a number higher than what mysqld requests, in this case 4907.
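For deployments that do run mariadb under systemd, a sketch of the override follows. The drop-in path and the 16384 value are illustrative assumptions; pick a value above whatever mysqld actually requests (here, above 4907).

```shell
# Hypothetical systemd drop-in for mariadb.service (illustrative paths
# and value; requires root). The drop-in file would contain:
#
#   [Service]
#   LimitNOFILE=16384
#
mkdir -p /etc/systemd/system/mariadb.service.d
printf '[Service]\nLimitNOFILE=16384\n' > /etc/systemd/system/mariadb.service.d/limits.conf
systemctl daemon-reload
systemctl restart mariadb
```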
This is a very simple fix for OSP8 and unblocks our scale deployments, which are failing due to this issue. Though this is not a blocker for the release, we are asking for it to be fixed in OSP8. I am requesting the blocker flag mainly so that, if we don't manage to land the fix in time, we have doc_text for an OSP8 workaround.
I would note that I think LimitNOFILE needs to be set to an actual number, not the special "Infinity" value, though I have not confirmed this. Joe T's experiments seemed to suggest this is the case, however.
Since we're not using systemd, I think we'd need to set this in /etc/security/limits.conf, per: https://ma.ttias.be/increase-open-files-limit-in-mariadb-on-centos-7-with-systemd/

Can we get confirmation on the exact value that we should be using?
(In reply to James Slagle from comment #7)
> since we're not using systemd, i think we'd need to set this in
> /etc/security/limits.conf per:
> https://ma.ttias.be/increase-open-files-limit-in-mariadb-on-centos-7-with-systemd/

Sorry, wrong link. It should be: https://mariadb.com/kb/en/mariadb/server-system-variables/#open_files_limit

> can we get confirmation on the exact value that we should be using?
If you're not using systemd, then mysqld_safe should be run as root. That will allow it to set the file limits that are optimal for its configuration before it spawns off the mysqld service itself, which it runs as the mysql user.
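The mechanism this relies on can be demonstrated without a database at all: rlimit changes made by a parent shell are inherited by the children it spawns. Raising a limit past the current hard limit requires root, which is why mysqld_safe must start as root; lowering the soft limit does not, so this unprivileged sketch shows the inheritance itself.

```shell
# A subshell lowers its own soft fd limit, then spawns a child; the child
# inherits the lowered limit. (Assumes the current hard limit is at least
# 1024, which holds on any stock Linux.)
child_limit=$( ulimit -S -n 1024; sh -c 'ulimit -S -n' )
echo "child sees soft limit: $child_limit"
```

A root-run mysqld_safe does the same thing in the other direction: it raises the limit, then drops to the mysql user when it execs mysqld.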
OK, I think we can call this one fixed then. Using the latest osp-d 8 packages, I see:

[root@overcloud-controller-0 ~]# ps aux | grep mysqld
root     14065  0.0  0.0 112644    924 pts/0 R+ 21:39 0:00 grep --color=auto mysqld
root     14607  0.0  0.0  11768   1624 ?     S  21:33 0:00 /bin/sh /usr/bin/mysqld_safe --defaults-file=/etc/my.cnf --pid-file=/var/run/mysql/mysqld.pid --socket=/var/lib/mysql/mysql.sock --datadir=/var/lib/mysql --log-error=/var/log/mysqld.log --user=mysql --open-files-limit=16384 --wsrep-cluster-address=gcomm://overcloud-controller-0,overcloud-controller-1,overcloud-controller-2
mysql    15529  2.2  4.6 1506672 181804 ?   Sl 21:33 0:08 /usr/libexec/mysqld --defaults-file=/etc/my.cnf --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --wsrep-provider=/usr/lib64/galera/libgalera_smm.so --wsrep-cluster-address=gcomm://overcloud-controller-0,overcloud-controller-1,overcloud-controller-2 --log-error=/var/log/mysqld.log --open-files-limit=16384 --pid-file=/var/run/mysql/mysqld.pid --socket=/var/lib/mysql/mysql.sock --port=3306 --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1

mysqld_safe is started as the root user, we actually pass --open-files-limit=16384, and the limit is set to that value:

[root@overcloud-controller-0 ~]# cat /proc/14607/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             15011                15011                processes
Max open files            16384                16384                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       15011                15011                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

I'm not entirely sure where the 16384 value is coming from. It could be either the galera resource agent or the puppet module, but it seems like a reasonable default.
OK take a look at /proc/15529/limits, because that's the actual mysqld process. mysqld_safe is just a wrapper script.
I've actually rebooted the environment, so the pids are different:

[root@overcloud-controller-0 ~]# ps aux | grep mysql
root     14608  0.0  0.0  11768   1624 ?     S  14:33 0:00 /bin/sh /usr/bin/mysqld_safe --defaults-file=/etc/my.cnf --pid-file=/var/run/mysql/mysqld.pid --socket=/var/lib/mysql/mysql.sock --datadir=/var/lib/mysql --log-error=/var/log/mysqld.log --user=mysql --open-files-limit=16384 --wsrep-cluster-address=gcomm://overcloud-controller-0,overcloud-controller-1,overcloud-controller-2
mysql    15517  1.2  1.7 1720380 143364 ?   Sl 14:33 0:41 /usr/libexec/mysqld --defaults-file=/etc/my.cnf --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --wsrep-provider=/usr/lib64/galera/libgalera_smm.so --wsrep-cluster-address=gcomm://overcloud-controller-0,overcloud-controller-1,overcloud-controller-2 --log-error=/var/log/mysqld.log --open-files-limit=16384 --pid-file=/var/run/mysql/mysqld.pid --socket=/var/lib/mysql/mysql.sock --port=3306 --wsrep_start_position=82c57d9f-eaf5-11e5-8ea4-365f2dfd153b:60857
root     17570  0.0  0.0 112648    924 pts/0 R+ 15:28 0:00 grep --color=auto mysql

[root@overcloud-controller-0 ~]# cat /proc/14608/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             31139                31139                processes
Max open files            16384                16384                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       31139                31139                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

[root@overcloud-controller-0 ~]# cat /proc/15517/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             31139                31139                processes
Max open files            20485                20485                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       31139                31139                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

Looks like the limit is 20485 for the actual mysqld process.
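To make the wrapper-vs-server comparison above less error-prone, a small helper can pull the soft value out of a limits listing. The function name is made up for this sketch; nothing here ships with MariaDB.

```shell
# open_files_of (hypothetical helper): extract the soft "Max open files"
# value from a /proc/<pid>/limits-style listing fed on stdin.
open_files_of() {
  awk '/^Max open files/ { print $4 }'
}

# Usage against live processes (pid lookups are illustrative):
#   open_files_of < /proc/$(pgrep -f mysqld_safe | head -1)/limits  # the wrapper
#   open_files_of < /proc/$(pgrep -x mysqld)/limits                 # the real server

# Demonstration on a sample line from the listing above:
sample='Max open files            20485                20485                files'
printf '%s\n' "$sample" | open_files_of
```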
Mike, does the last comment make it look like this is resolved?
Yes, you can see that mysqld_safe allowed the open-files limit to be set to a value custom-tailored for mysqld itself.
Alright, so the limit is already 16384 for the overcloud, but this bug is actually for the undercloud (sorry I missed that). I submitted a patch to address that: https://review.openstack.org/293675