1123298 – logrotate should copytruncate to avoid openstack logging to deleted files

RDO tickets are now tracked in Jira https://issues.redhat.com/projects/RDO/issues/

Bug 1123298 - logrotate should copytruncate to avoid openstack logging to deleted files

Summary: logrotate should copytruncate to avoid openstack logging to deleted files

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	RDO
Classification:	Community
Component:	openstack-nova
Sub Component:
Version:	unspecified
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	---
Assignee:	Lars Kellogg-Stedman
QA Contact:	Toure Dunnon
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1148031 1297008 1297009 1297010 1298215
TreeView+	depends on / blocked

Reported:	2014-07-25 09:15 UTC by Ilkka Tengvall
Modified:	2017-06-22 15:33 UTC (History)
CC List:	9 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2017-06-22 15:33:38 UTC
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Launchpad	1437444	0	None	None	None	Never
Red Hat Knowledge Base (Solution)	2770111	0	None	None	None	2016-11-16 11:54:57 UTC

Description Ilkka Tengvall 2014-07-25 09:15:42 UTC

Description of problem:

/var/log filesystem gets full, even if logrotate is enabled. Causes place for DoS.

Version-Release number of selected component (if applicable):

Any RDO/RHOS packages for openstack, in our case:https://rhn.redhat.com/errata/RHBA-2014-0912.html

See "rpm -qf /etc/logrotate.d/openstack.*"


How reproducible:

Happens after logrotate.

Steps to Reproduce:
1. start openstack
2. set logrotate scritps to have small size for logs
3. do api calls, e.g. "nova list"
4. see logs grow
5. run logrotate
6. see that logs got rotated
7. see that logs are using already deleted files: ls -la /proc/[0-9]*/fd|grep deleted


Actual results:

The daemons keep logging to deleted file's inodes, thus filling the disk still

l-wx------. 1 nova nova 64 25.7. 12:14 3 -> /var/log/nova/api.log-20140725 (deleted)



Expected results:

The daemons should use the log file ment for them, e.g. /var/log/nova/api.log 

Additional info:

This happens in Icehouse, and could be fixed by having coputruncate option for logrotate configs in /etc/logrotate.d/openstack*

Applies to both RDO and RHOS

Comment 1 Ilkka Tengvall 2014-07-25 09:18:13 UTC

ah, faulty cut'n paste.

s,case:https://rhn.redhat.com/errata/RHBA-2014-0912.html,case: openstack-nova-common-2014.1-2.el6.noarch,

Comment 3 Ilkka Tengvall 2014-07-25 09:34:53 UTC

btw, seeing this likely needs several logrotates, to get the "*.log-date" file deleted first.

Comment 4 Russell Bryant 2014-07-30 15:26:52 UTC

Can you please clarify what version of OpenStack you are using?  It sounds like you're using RDO, is that right?

Comment 5 Ilkka Tengvall 2014-08-11 12:13:24 UTC

Sorry for the delay, I was on holiday. We are internally using RDO, but I see the same configs in RHOSP5 too at our customer.

Comment 6 Ilkka Tengvall 2014-09-30 06:00:53 UTC

I made a Red Hat Support ticket about this too. Understand guys, that this is a DoS issue. If one bombs APIs, thus causing log writings, you end up filling the log disks. In many cases that is the /var partition, which might have SQL and stuff there, and thus kills your OpenStack.

https://access.redhat.com/support/cases/01213626

Comment 7 Alan Pevec 2014-10-08 15:29:12 UTC

Proposed lines to add to logrotate files:

------
    copytruncate
    minsize 100k
    size 200M
------

Comment 8 Alan Pevec 2014-10-08 18:11:45 UTC

BTW RDO el6 builds are missing
http://pkgs.fedoraproject.org/cgit/openstack-nova.git/commit/nova.logrotate?id=14f05c2a5ebcf4b76dc526a1fa6d79d66484de94

...
Note we don't need to use copytruncate or delaycompress here, as openstack services use logging.handlers.WatchedFileHandler
...

https://docs.python.org/2/library/logging.handlers.html#watchedfilehandler

So looks like that's not working as advertised?

Comment 9 Ilkka Tengvall 2014-10-21 07:35:55 UTC

yes, it definately does not.

Comment 10 Kashyap Chamarthy 2015-02-19 15:41:06 UTC

[Bug triaging here, wonder if this is still reproducible with Juno release?]

Comment 11 Ilkka Tengvall 2015-02-23 08:18:53 UTC

I can not confirm, as we still run icehouse.

Comment 12 Lars Kellogg-Stedman 2015-03-27 17:32:16 UTC

Running RDO icehouse (e.g., openstack-nova-common-2014.1.3-2.fc21.noarch), as soon as I rename a log file (mv nova-api.log nova-api.log.old), nova-api immediately starts logging to a *new* file, and no further updates are made to the renamed file.

If I delete the log file (rm nova-api.log) I see the same behavioe -- nova-api creates a *new* log file, rather than continuing to log to the deleted log.

However:

Looking at /proc/<nova-api_pid>/fd, I can see that nova is holding open a reference to the original log file.  E.g., after the `mv` operation:

    # ls -l /proc/452/fd
    total 0
    lr-x------. 1 nova nova 64 Mar 27 17:19 0 -> /dev/null
    lrwx------. 1 nova nova 64 Mar 27 17:19 1 -> socket:[13321]
    lrwx------. 1 nova nova 64 Mar 27 17:19 16 -> anon_inode:[eventpoll]
    lrwx------. 1 nova nova 64 Mar 27 17:19 2 -> socket:[13321]
    l-wx------. 1 nova nova 64 Mar 27 17:19 3 -> /var/log/nova/nova-api.log.old
    lr-x------. 1 nova nova 64 Mar 27 17:19 4 -> pipe:[19145]
    l-wx------. 1 nova nova 64 Mar 27 17:19 5 -> pipe:[19145]
    lrwx------. 1 nova nova 64 Mar 27 17:19 6 -> socket:[20355]
    lrwx------. 1 nova nova 64 Mar 27 17:19 7 -> socket:[21239]
    lrwx------. 1 nova nova 64 Mar 27 17:19 8 -> socket:[22059]

In fact, of the four 'nova-api' processes running on my system, three
of them still have that open:

    # ps -e -o pid,cmd |
      grep nova-api |
      awk '/python/ {print $1}' |
      xargs -iPID ls -l /proc/PID/fd/3
    l-wx------. 1 nova nova 64 Mar 27 17:19 /proc/452/fd/3 -> /var/log/nova/nova-api.log.old
    l-wx------. 1 nova nova 64 Mar 27 17:23 /proc/2062/fd/3 -> /var/log/nova/nova-api.log.old
    l-wx------. 1 nova nova 64 Mar 27 17:23 /proc/2161/fd/3 -> /var/log/nova/nova-api.log
    l-wx------. 1 nova nova 64 Mar 27 17:23 /proc/2244/fd/3 -> /var/log/nova/nova-api.log.old
      
This means that even if that log file is deleted, the space will not
be recovered until the nova-api processes exit.

So, while I can't reproduce the problem described in this bz (in all my tests, nova would start logging to a new file after renaming or deleting the old file), it still looks like using copytruncate in the logrotate configuration would be a good idea (and wouldn't hurt in any case).

The same behavior can be seen in our Juno packages.

Comment 13 Lars Kellogg-Stedman 2015-03-27 18:05:04 UTC

I've opened https://bugs.launchpad.net/oslo.log/+bug/1437444 upstream on this issue, in case there is a graceful way of solving this in the code rather than requiring particular behavior in a log rotation script.

Comment 14 Pádraig Brady 2015-03-27 18:19:08 UTC

Yep I think the orig problem is not an issue as mentioned in http://pkgs.fedoraproject.org/cgit/openstack-nova.git/commit/?id=14f05c2a5e
due to the use of logging.handlers.WatchedFileHandler

Comment 15 Lars Kellogg-Stedman 2015-04-03 14:03:25 UTC

Fix for nova: https://review.gerrithub.io/#/c/229295/

Comment 16 Lars Kellogg-Stedman 2015-04-03 14:09:44 UTC

Fix for glance: https://review.gerrithub.io/229296/

Comment 17 Lars Kellogg-Stedman 2015-04-03 14:12:59 UTC

Fix for cinder: https://review.gerrithub.io/229297/

Comment 18 Lars Kellogg-Stedman 2015-04-03 14:15:42 UTC

Fix for neutron: https://review.gerrithub.io/229298/

Comment 19 Alan Pevec 2015-04-26 15:02:07 UTC

From upstream bug:
"With latest nova trunk, we have a new feature "kill -SIGHUP 5321" where 5321 is the parent process of all the nova-api processes and you will see all the subprocesses switch immediately to the new fds."

Looks like this should solve the issue at least from Nova, without copytruncate.

Comment 20 Christopher Brown 2017-06-22 15:33:38 UTC

Fix was merged, closing.

Note You need to log in before you can comment on or make changes to this bug.