Bug 1557933

Summary: big qemu logs vs log rotation
Product: Red Hat Enterprise Linux Advanced Virtualization Reporter: Dr. David Alan Gilbert <dgilbert>
Component: libvirtAssignee: Daniel Berrangé <berrange>
Status: CLOSED ERRATA QA Contact: chhu
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 8.1CC: berrange, dgilbert, dyuan, fjin, jdenemar, xuzhang
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: libvirt-5.6.0-1.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-11-06 07:11:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Dr. David Alan Gilbert 2018-03-19 09:54:16 UTC
Description of problem:

I've got a set of repeated migrations happening, so the logs for a particular guest are getting quite long;   after leaving it doing that all weekend I'm noticing something odd:
[root@virtlab403 qemu]# ls -l /var/log/libvirt/qemu/rhel6*
-rw-------. 1 root root 1827041 Mar 19 05:49 /var/log/libvirt/qemu/rhel6.log
-rw-------. 1 root root 2097152 Mar 19 01:53 /var/log/libvirt/qemu/rhel6.log.0
-rw-------. 1 root root 2097152 Mar 18 21:19 /var/log/libvirt/qemu/rhel6.log.1
-rw-------. 1 root root 2097152 Mar 18 16:48 /var/log/libvirt/qemu/rhel6.log.2
-rw-------. 1 root root    5080 Mar  1 03:37 /var/log/libvirt/qemu/rhel6.log-20180301.gz
-rw-------. 1 root root      20 Mar  4 03:12 /var/log/libvirt/qemu/rhel6.log-20180304.gz
-rw-------. 1 root root      20 Mar 15 15:23 /var/log/libvirt/qemu/rhel6.log-20180315.gz
-rw-------. 1 root root  841447 Mar 18 03:11 /var/log/libvirt/qemu/rhel6.log-20180318

so each log file seems to be renamed to .0,.1,... when it hits 2MB - OK
but then the rhel6.log-2018-0318 seems shorter than that - so I'm guessing that the log rotation has taken the current rhel6.log and renamed that to add the date; but that's lost the other .1/.1/.2 etc - so anything that happened on 17th is gone.   The same pattern seems to be in the other host which it's ping-ponging to.

Velrsion number of selected component (if applicable):
libvirt-3.9.0-13.el7.x86_64

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Daniel Berrangé 2018-03-19 11:37:38 UTC
Historically we relied on logrotate scripts to rollover to the guest log files. This is barely acceptable because by the time logrotate runs, the guest could already have inflicted a DOS by filling up all disk space with logs.

Thus we introduced virtlogd which owns the logfile, and QEMU just writes to a pipe to virtlogd. Virtlogd does rollover at a specific file size, to guarantee the guest can't perform a DoS.

The trouble comes because the user can optionally turn off virtlogd, so we still ship the logrotate config files in the RPM to cope with that scenario.

We could mitigate most of the problem by setting a minimum size in the log rotate config file, that is larger than the default rollover size used by virtlogd. That would turn logrotate into a no-op with the out of the box config.

Comment 3 Erik Skultety 2019-03-11 12:35:14 UTC
(In reply to Dr. David Alan Gilbert from comment #0)
> Description of problem:
> 
> I've got a set of repeated migrations happening, so the logs for a
> particular guest are getting quite long;   after leaving it doing that all
> weekend I'm noticing something odd:
> [root@virtlab403 qemu]# ls -l /var/log/libvirt/qemu/rhel6*
> -rw-------. 1 root root 1827041 Mar 19 05:49 /var/log/libvirt/qemu/rhel6.log
> -rw-------. 1 root root 2097152 Mar 19 01:53
> /var/log/libvirt/qemu/rhel6.log.0
> -rw-------. 1 root root 2097152 Mar 18 21:19
> /var/log/libvirt/qemu/rhel6.log.1
> -rw-------. 1 root root 2097152 Mar 18 16:48
> /var/log/libvirt/qemu/rhel6.log.2
> -rw-------. 1 root root    5080 Mar  1 03:37
> /var/log/libvirt/qemu/rhel6.log-20180301.gz
> -rw-------. 1 root root      20 Mar  4 03:12
> /var/log/libvirt/qemu/rhel6.log-20180304.gz
> -rw-------. 1 root root      20 Mar 15 15:23
> /var/log/libvirt/qemu/rhel6.log-20180315.gz
> -rw-------. 1 root root  841447 Mar 18 03:11
> /var/log/libvirt/qemu/rhel6.log-20180318
> 
> so each log file seems to be renamed to .0,.1,... when it hits 2MB - OK
> but then the rhel6.log-2018-0318 seems shorter than that - so I'm guessing
> that the log rotation has taken the current rhel6.log and renamed that to
> add the date; but that's lost the other .1/.1/.2 etc - so anything that
> happened on 17th is gone.   The same pattern seems to be in the other host
> which it's ping-ponging to.

Can you confirm you're not using virtlogd as the logging backend? If so, I'd suggest giving virtlogd a try whether it helps your use case. I don't think it's worth doing anything from libvirt's POV unless this happens with virtlogd (which is the default) too.

Comment 4 Dr. David Alan Gilbert 2019-03-19 11:39:47 UTC
Hmm this is a long time ago; I'm not sure what the config was at the time.

However, even on a rhel8 I still see that we've got logrotate happening on all the files in /var/log/libvirt/qemu/ and that's got virtlogd configured - so I think it will still happen.

[The other bad logrotate interaction we have is that if you started a VM once, say a guestfs with temporary name, logrotate will create a 0 length file when it rotates
the file, and so you permenantly get 5 approximately empty files for a VM you've not used in years]

Comment 5 Daniel Berrangé 2019-03-19 11:46:02 UTC
(In reply to Erik Skultety from comment #3)
> Can you confirm you're not using virtlogd as the logging backend? If so, I'd
> suggest giving virtlogd a try whether it helps your use case. I don't think
> it's worth doing anything from libvirt's POV unless this happens with
> virtlogd (which is the default) too.

It is exactly the interaction between virtlogd & logrotate that is the problem - they're both trying to rotate the same files per my comment #2

Comment 6 Daniel Berrangé 2019-09-11 12:11:32 UTC
Since this ticket we did an improvement in upstream to try to stop logrotate interfering with virtlogd:

commit 975b004d07e472a4f7557f923fed228f3fa02f86
Author: Daniel P. Berrangé <berrange>
Date:   Fri Jul 5 10:32:53 2019 +0100

    logging: ensure virtlogd rollover takes priority over logrotate
    
    The virtlogd config is set to rollover logs every 2 MB.
    
    Normally a logrotate config file is also installed to handle cases where
    virtlogd is disabled. This is set to rollover weekly with no size
    constraint.
    
    As a result logrotate can interfere with virtlogd's, rolling over files
    that virtlogd has already taken care of.
    
    This changes logrotate configs to rollover based on a max size
    constraint of 2 MB + 1 byte. When virtlogd is running the log files will
    never get this large, making logrotate a no-op.
    
    If the user changes the size in virtlogd's config to something larger,
    they are responsible for also changing the logrotate config suitably.
    
    The LXC/libxl drivers don't use virtlogd, but there logrotate config is
    altered to match the QEMU driver config, for the sake of consistency.
    
    Reviewed-by: Jim Fehlig <jfehlig>
    Signed-off-by: Daniel P. Berrangé <berrange>

This merged in v5.6.0, so we got it via the recent rebase

Comment 8 chhu 2019-09-25 01:04:54 UTC
Reproduced on package:
libvirt-daemon-kvm-4.5.0-35.module+el8.1.0+4227+b2722cb3.x86_64

Test steps:
1. Check the lograte conf file libvirtd.qemu. It's weekly and without file size.
# cat /etc/logrotate.d/libvirtd.qemu
/var/log/libvirt/qemu/*.log {
        weekly
        missingok
        rotate 4
        compress
        delaycompress
        copytruncate
}

Tested on package:
libvirt-daemon-kvm-5.6.0-6.module+el8.1.0+4244+9aa4e6bb.x86_64

Test steps:
1. Check the lograte conf file libvirtd.qemu. It's with size 2097153,
which is max_size [in virtlogd.conf]+1.

# cat /etc/libvirt/virtlogd.conf |grep max_size
#max_size = 2097152

# cat /etc/logrotate.d/libvirtd.qemu
/var/log/libvirt/qemu/*.log {
        size 2097153
        missingok
        rotate 4
        compress
        delaycompress
        copytruncate
}

2. Start/stop VM for 5000 times, and check the logs: the virtlogd max_size work well,
   no logrotate log file created.
 # for i in {1..5000}; do virsh destroy r8t; sleep 2; virsh start r8t; done
 # ls -lh /var/log/libvirt/qemu/ | grep r8t
-rw-------. 1 root root 954K Sep 24 12:35 r8t.log
-rw-------. 1 root root 2.0M Sep 24 12:21 r8t.log.0
-rw-------. 1 root root 2.0M Sep 24 11:51 r8t.log.1
-rw-------. 1 root root 2.0M Sep 24 11:22 r8t.log.2

Comment 9 chhu 2019-09-25 01:12:22 UTC
According to comment8, set the bug status to verified.

Comment 11 errata-xmlrpc 2019-11-06 07:11:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3723