Bug 1123298
| Summary: | logrotate should copytruncate to avoid openstack logging to deleted files | ||
|---|---|---|---|
| Product: | [Community] RDO | Reporter: | Ilkka Tengvall <ikke> |
| Component: | openstack-nova | Assignee: | Lars Kellogg-Stedman <lars> |
| Status: | CLOSED ERRATA | QA Contact: | Toure Dunnon <tdunnon> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | unspecified | CC: | apevec, chris.brown, eglynn, ikke, kchamart, lars, pneedle, rbryant, srevivo |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-06-22 15:33:38 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1148031, 1297008, 1297009, 1297010, 1298215 | ||
|
Description
Ilkka Tengvall
2014-07-25 09:15:42 UTC
ah, faulty cut'n paste. s,case:https://rhn.redhat.com/errata/RHBA-2014-0912.html,case: openstack-nova-common-2014.1-2.el6.noarch, btw, seeing this likely needs several logrotates, to get the "*.log-date" file deleted first. Can you please clarify what version of OpenStack you are using? It sounds like you're using RDO, is that right? Sorry for the delay, I was on holiday. We are internally using RDO, but I see the same configs in RHOSP5 too at our customer. I made a Red Hat Support ticket about this too. Understand guys, that this is a DoS issue. If one bombs APIs, thus causing log writings, you end up filling the log disks. In many cases that is the /var partition, which might have SQL and stuff there, and thus kills your OpenStack. https://access.redhat.com/support/cases/01213626 Proposed lines to add to logrotate files:
------
copytruncate
minsize 100k
size 200M
------
BTW RDO el6 builds are missing http://pkgs.fedoraproject.org/cgit/openstack-nova.git/commit/nova.logrotate?id=14f05c2a5ebcf4b76dc526a1fa6d79d66484de94 ... Note we don't need to use copytruncate or delaycompress here, as openstack services use logging.handlers.WatchedFileHandler ... https://docs.python.org/2/library/logging.handlers.html#watchedfilehandler So looks like that's not working as advertised? yes, it definately does not. [Bug triaging here, wonder if this is still reproducible with Juno release?] I can not confirm, as we still run icehouse. Running RDO icehouse (e.g., openstack-nova-common-2014.1.3-2.fc21.noarch), as soon as I rename a log file (mv nova-api.log nova-api.log.old), nova-api immediately starts logging to a *new* file, and no further updates are made to the renamed file.
If I delete the log file (rm nova-api.log) I see the same behavioe -- nova-api creates a *new* log file, rather than continuing to log to the deleted log.
However:
Looking at /proc/<nova-api_pid>/fd, I can see that nova is holding open a reference to the original log file. E.g., after the `mv` operation:
# ls -l /proc/452/fd
total 0
lr-x------. 1 nova nova 64 Mar 27 17:19 0 -> /dev/null
lrwx------. 1 nova nova 64 Mar 27 17:19 1 -> socket:[13321]
lrwx------. 1 nova nova 64 Mar 27 17:19 16 -> anon_inode:[eventpoll]
lrwx------. 1 nova nova 64 Mar 27 17:19 2 -> socket:[13321]
l-wx------. 1 nova nova 64 Mar 27 17:19 3 -> /var/log/nova/nova-api.log.old
lr-x------. 1 nova nova 64 Mar 27 17:19 4 -> pipe:[19145]
l-wx------. 1 nova nova 64 Mar 27 17:19 5 -> pipe:[19145]
lrwx------. 1 nova nova 64 Mar 27 17:19 6 -> socket:[20355]
lrwx------. 1 nova nova 64 Mar 27 17:19 7 -> socket:[21239]
lrwx------. 1 nova nova 64 Mar 27 17:19 8 -> socket:[22059]
In fact, of the four 'nova-api' processes running on my system, three
of them still have that open:
# ps -e -o pid,cmd |
grep nova-api |
awk '/python/ {print $1}' |
xargs -iPID ls -l /proc/PID/fd/3
l-wx------. 1 nova nova 64 Mar 27 17:19 /proc/452/fd/3 -> /var/log/nova/nova-api.log.old
l-wx------. 1 nova nova 64 Mar 27 17:23 /proc/2062/fd/3 -> /var/log/nova/nova-api.log.old
l-wx------. 1 nova nova 64 Mar 27 17:23 /proc/2161/fd/3 -> /var/log/nova/nova-api.log
l-wx------. 1 nova nova 64 Mar 27 17:23 /proc/2244/fd/3 -> /var/log/nova/nova-api.log.old
This means that even if that log file is deleted, the space will not
be recovered until the nova-api processes exit.
So, while I can't reproduce the problem described in this bz (in all my tests, nova would start logging to a new file after renaming or deleting the old file), it still looks like using copytruncate in the logrotate configuration would be a good idea (and wouldn't hurt in any case).
The same behavior can be seen in our Juno packages.
I've opened https://bugs.launchpad.net/oslo.log/+bug/1437444 upstream on this issue, in case there is a graceful way of solving this in the code rather than requiring particular behavior in a log rotation script. Yep I think the orig problem is not an issue as mentioned in http://pkgs.fedoraproject.org/cgit/openstack-nova.git/commit/?id=14f05c2a5e due to the use of logging.handlers.WatchedFileHandler Fix for nova: https://review.gerrithub.io/#/c/229295/ Fix for glance: https://review.gerrithub.io/229296/ Fix for cinder: https://review.gerrithub.io/229297/ Fix for neutron: https://review.gerrithub.io/229298/ From upstream bug: "With latest nova trunk, we have a new feature "kill -SIGHUP 5321" where 5321 is the parent process of all the nova-api processes and you will see all the subprocesses switch immediately to the new fds." Looks like this should solve the issue at least from Nova, without copytruncate. Fix was merged, closing. |