Description of problem: /var/log filesystem gets full, even if logrotate is enabled. Causes place for DoS. Version-Release number of selected component (if applicable): Any RDO/RHOS packages for openstack, in our case:https://rhn.redhat.com/errata/RHBA-2014-0912.html See "rpm -qf /etc/logrotate.d/openstack.*" How reproducible: Happens after logrotate. Steps to Reproduce: 1. start openstack 2. set logrotate scritps to have small size for logs 3. do api calls, e.g. "nova list" 4. see logs grow 5. run logrotate 6. see that logs got rotated 7. see that logs are using already deleted files: ls -la /proc/[0-9]*/fd|grep deleted Actual results: The daemons keep logging to deleted file's inodes, thus filling the disk still l-wx------. 1 nova nova 64 25.7. 12:14 3 -> /var/log/nova/api.log-20140725 (deleted) Expected results: The daemons should use the log file ment for them, e.g. /var/log/nova/api.log Additional info: This happens in Icehouse, and could be fixed by having coputruncate option for logrotate configs in /etc/logrotate.d/openstack* Applies to both RDO and RHOS
ah, faulty cut'n paste. s,case:https://rhn.redhat.com/errata/RHBA-2014-0912.html,case: openstack-nova-common-2014.1-2.el6.noarch,
btw, seeing this likely needs several logrotates, to get the "*.log-date" file deleted first.
Can you please clarify what version of OpenStack you are using? It sounds like you're using RDO, is that right?
Sorry for the delay, I was on holiday. We are internally using RDO, but I see the same configs in RHOSP5 too at our customer.
I made a Red Hat Support ticket about this too. Understand guys, that this is a DoS issue. If one bombs APIs, thus causing log writings, you end up filling the log disks. In many cases that is the /var partition, which might have SQL and stuff there, and thus kills your OpenStack. https://access.redhat.com/support/cases/01213626
Proposed lines to add to logrotate files: ------ copytruncate minsize 100k size 200M ------
BTW RDO el6 builds are missing http://pkgs.fedoraproject.org/cgit/openstack-nova.git/commit/nova.logrotate?id=14f05c2a5ebcf4b76dc526a1fa6d79d66484de94 ... Note we don't need to use copytruncate or delaycompress here, as openstack services use logging.handlers.WatchedFileHandler ... https://docs.python.org/2/library/logging.handlers.html#watchedfilehandler So looks like that's not working as advertised?
yes, it definately does not.
[Bug triaging here, wonder if this is still reproducible with Juno release?]
I can not confirm, as we still run icehouse.
Running RDO icehouse (e.g., openstack-nova-common-2014.1.3-2.fc21.noarch), as soon as I rename a log file (mv nova-api.log nova-api.log.old), nova-api immediately starts logging to a *new* file, and no further updates are made to the renamed file. If I delete the log file (rm nova-api.log) I see the same behavioe -- nova-api creates a *new* log file, rather than continuing to log to the deleted log. However: Looking at /proc/<nova-api_pid>/fd, I can see that nova is holding open a reference to the original log file. E.g., after the `mv` operation: # ls -l /proc/452/fd total 0 lr-x------. 1 nova nova 64 Mar 27 17:19 0 -> /dev/null lrwx------. 1 nova nova 64 Mar 27 17:19 1 -> socket:[13321] lrwx------. 1 nova nova 64 Mar 27 17:19 16 -> anon_inode:[eventpoll] lrwx------. 1 nova nova 64 Mar 27 17:19 2 -> socket:[13321] l-wx------. 1 nova nova 64 Mar 27 17:19 3 -> /var/log/nova/nova-api.log.old lr-x------. 1 nova nova 64 Mar 27 17:19 4 -> pipe:[19145] l-wx------. 1 nova nova 64 Mar 27 17:19 5 -> pipe:[19145] lrwx------. 1 nova nova 64 Mar 27 17:19 6 -> socket:[20355] lrwx------. 1 nova nova 64 Mar 27 17:19 7 -> socket:[21239] lrwx------. 1 nova nova 64 Mar 27 17:19 8 -> socket:[22059] In fact, of the four 'nova-api' processes running on my system, three of them still have that open: # ps -e -o pid,cmd | grep nova-api | awk '/python/ {print $1}' | xargs -iPID ls -l /proc/PID/fd/3 l-wx------. 1 nova nova 64 Mar 27 17:19 /proc/452/fd/3 -> /var/log/nova/nova-api.log.old l-wx------. 1 nova nova 64 Mar 27 17:23 /proc/2062/fd/3 -> /var/log/nova/nova-api.log.old l-wx------. 1 nova nova 64 Mar 27 17:23 /proc/2161/fd/3 -> /var/log/nova/nova-api.log l-wx------. 1 nova nova 64 Mar 27 17:23 /proc/2244/fd/3 -> /var/log/nova/nova-api.log.old This means that even if that log file is deleted, the space will not be recovered until the nova-api processes exit. So, while I can't reproduce the problem described in this bz (in all my tests, nova would start logging to a new file after renaming or deleting the old file), it still looks like using copytruncate in the logrotate configuration would be a good idea (and wouldn't hurt in any case). The same behavior can be seen in our Juno packages.
I've opened https://bugs.launchpad.net/oslo.log/+bug/1437444 upstream on this issue, in case there is a graceful way of solving this in the code rather than requiring particular behavior in a log rotation script.
Yep I think the orig problem is not an issue as mentioned in http://pkgs.fedoraproject.org/cgit/openstack-nova.git/commit/?id=14f05c2a5e due to the use of logging.handlers.WatchedFileHandler
Fix for nova: https://review.gerrithub.io/#/c/229295/
Fix for glance: https://review.gerrithub.io/229296/
Fix for cinder: https://review.gerrithub.io/229297/
Fix for neutron: https://review.gerrithub.io/229298/
From upstream bug: "With latest nova trunk, we have a new feature "kill -SIGHUP 5321" where 5321 is the parent process of all the nova-api processes and you will see all the subprocesses switch immediately to the new fds." Looks like this should solve the issue at least from Nova, without copytruncate.
Fix was merged, closing.