Bug 1730211 - Log rotation for rsyslog doesn't work
Summary: Log rotation for rsyslog doesn't work
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: 4.2.0
Assignee: Noriko Hosoi
QA Contact: Qiaoling Tang
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-07-16 08:01 UTC by Qiaoling Tang
Modified: 2019-10-16 06:32 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:29:44 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
rsyslog.log-20190717-1563406201 (249.91 KB, text/plain)
2019-07-18 00:27 UTC, Qiaoling Tang
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Github https://github.com/openshift origin-aggregated-logging pull 1685 0 None None None 2019-12-04 08:01:20 UTC
Red Hat Product Errata RHBA-2019:2922 0 None None None 2019-10-16 06:32:03 UTC

Description Qiaoling Tang 2019-07-16 08:01:41 UTC
Description of problem:
The log rotation for rsyslog doesn't work.

$ oc exec rsyslog-gkqvh env |grep LOGGING_FILE
Defaulting container name to rsyslog.
Use 'oc describe pod/rsyslog-gkqvh -n openshift-logging' to see all of the containers in this pod.
LOGGING_FILE_SIZE=1024000
LOGGING_FILE_AGE=5

$  oc exec rsyslog-gkqvh -- ls /var/log/rsyslog -l
Defaulting container name to rsyslog.
Use 'oc describe pod/rsyslog-gkqvh -n openshift-logging' to see all of the containers in this pod.
total 65472
-rw-r--r--. 1 root root 35792065 Jul 16 07:56 rsyslog.log

I checked logrotate container, no log.

Version-Release number of selected component (if applicable):
ose-logging-rsyslog-v4.2.0-201907141446
ose-cluster-logging-operator-v4.2.0-201907121819



How reproducible:
Always

Steps to Reproduce:
1. Deploy logging via OLM, set log collector to rsyslog
2. oc set env ds/rsyslog LOGGING_FILE_SIZE=1024000 LOGGING_FILE_AGE=5
3. wait for a while, check files in /var/log/rsyslog in rsyslog pods

Actual results:
The log rotation for rsyslog doesn't work.

Expected results:
The log rotation should work.

Additional info:

Comment 1 Rich Megginson 2019-07-16 14:10:35 UTC
Note that rsyslog does not have log rotation built-in to it - there is no way we can ever make it rotate exactly at the LOGGING_FILE_SIZE - rsyslog size based log rotation is "best effort" or "approximate".

rsyslog logrotation runs every day: https://github.com/richm/cluster-logging-operator/blob/master/files/logrotate/logrotate

# run logrotate (/var/log/rsyslog/*.log) at 2am everyday
# run logrotate_pod (/var/lib/rsyslog.pod) at 3am everyday
0 2 * * *       root /usr/bin/bash /opt/app-root/bin/logrotate.sh
0 3 * * * root /usr/bin/bash /opt/app-root/bin/logrotate_pod.sh

With this configuration, the file can exceed 10M in size before logrotate runs.

If this is not acceptable, we can change logrotate to run every hour or every 10 minutes.  Or, leave this configuration as the default, and document for customers how it works and how to change it.  The configuration is in cron format.  See `man 5 crontab`.

Comment 2 Qiaoling Tang 2019-07-17 03:42:36 UTC
Thanks for the note. 

I tried to set LOGGING_FILE_AGE=5, LOGGING_FILE_PATH and LOGGING_FILE_SIZE were using the default value,  and set `*/5 * * * *       root /usr/bin/bash /opt/app-root/bin/logrotate.sh` in cm/logrotate-crontab, here are what I found:

1. after the rotate finished, the log file rsyslog.log was renamed to rsyslog.log-$timestamp, and a new file rsyslog.log  was created at the same time, but the logs were still written to rsyslog.log-$timestamp, not in the rsyslog.log.  

$ oc rsh rsyslog-wztfl
Defaulting container name to rsyslog.
Use 'oc describe pod/rsyslog-wztfl -n openshift-logging' to see all of the containers in this pod.
sh-4.2# ls /var/log/rsyslog/ -l
total 75072
-rw-r--r--. 1 root root      467 Jul 17 03:40 logrotate.log
-rw-r--r--. 1 root root     1007 Jul 17 03:25 rsyslog.log
-rw-r--r--. 1 root root 10382396 Jul 17 02:49 rsyslog.log-20190717-1563331501
-rw-r--r--. 1 root root 12649621 Jul 17 03:02 rsyslog.log-20190717-1563332101
-rw-r--r--. 1 root root 11345789 Jul 17 03:11 rsyslog.log-20190717-1563332701
-rw-r--r--. 1 root root 12271294 Jul 17 03:21 rsyslog.log-20190717-1563333302
-rw-r--r--. 1 root root 23122678 Jul 17 03:40 rsyslog.log-20190717-1563333901


If delete rsyslog pods, then wait until new pods start, the logs were written to rsyslog.log.

I checked the content in logrotate.log, it will only rotate file "*.log", it won't rotate file "rsyslog.log-$timestamp". If the rsyslog pods were not deleted, the log files rsyslog.log-$timestamp will never be deleted, and it can be very large.

# cat logrotate.log 
reading config file /tmp/logrotate.conf
Allocating hash table for state file, size 15360 B

Handling 1 logs

rotating pattern: "/var/log/rsyslog/*.log"
 1024000 bytes (5 rotations)
empty log files are not rotated, old logs are removed
considering log /var/log/rsyslog/logrotate.log
  log does not need rotating (log size is below the 'size' threshold)
considering log /var/log/rsyslog/rsyslog.log
  log does not need rotating (log size is below the 'size' threshold)


I thought the rsyslog pod's log should always be written to rsyslog.log.


2. I have set LOGGING_FILE_AGE=5, there always has 5 files named as rsyslog.log-$timestamp

Use 'oc describe pod/rsyslog-wztfl -n openshift-logging' to see all of the containers in this pod.
total 198984
-rw-r--r--. 1 root root       467 Jul 17 03:20 logrotate.log
-rw-r--r--. 1 root root   2142615 Jul 17 03:22 rsyslog.log
-rw-r--r--. 1 root root 152907148 Jul 17 02:41 rsyslog.log-20190717-1563330901
-rw-r--r--. 1 root root  10382396 Jul 17 02:49 rsyslog.log-20190717-1563331501
-rw-r--r--. 1 root root  12649621 Jul 17 03:02 rsyslog.log-20190717-1563332101
-rw-r--r--. 1 root root  11345789 Jul 17 03:11 rsyslog.log-20190717-1563332701
-rw-r--r--. 1 root root  12271294 Jul 17 03:21 rsyslog.log-20190717-1563333302


Use 'oc describe pod/rsyslog-wztfl -n openshift-logging' to see all of the containers in this pod.
total 58688
-rw-r--r--. 1 root root      921 Jul 17 03:25 logrotate.log
-rw-r--r--. 1 root root     1007 Jul 17 03:25 rsyslog.log
-rw-r--r--. 1 root root 10382396 Jul 17 02:49 rsyslog.log-20190717-1563331501
-rw-r--r--. 1 root root 12649621 Jul 17 03:02 rsyslog.log-20190717-1563332101
-rw-r--r--. 1 root root 11345789 Jul 17 03:11 rsyslog.log-20190717-1563332701
-rw-r--r--. 1 root root 12271294 Jul 17 03:21 rsyslog.log-20190717-1563333302
-rw-r--r--. 1 root root  6091556 Jul 17 03:25 rsyslog.log-20190717-1563333901

Comment 3 Noriko Hosoi 2019-07-17 22:51:56 UTC
(In reply to Qiaoling Tang from comment #2)
> 1. after the rotate finished, the log file rsyslog.log was renamed to
> rsyslog.log-$timestamp, and a new file rsyslog.log  was created at the same
> time, but the logs were still written to rsyslog.log-$timestamp, not in the
> rsyslog.log.

Currently, rsyslog.log is shared between the main rsyslogd process and its mmexternal plugin undefined_field.
Obviously, undefined_field is not handling SIGHUP.  I'm afraid we need to include and use this kind of library...
https://github.com/client9/reopen

To make sure my guess is right, could you please find out these?
1) Once logrotation happens, the size of rsyslog.log remains the same and nothing is added?  Or something is added even if it is very small?
2) What rsyslog.log-$timestamp is getting?  Could you share some sample lines?

Thanks.

Comment 4 Qiaoling Tang 2019-07-18 00:26:41 UTC
(In reply to Noriko Hosoi from comment #3)

> To make sure my guess is right, could you please find out these?
> 1) Once logrotation happens, the size of rsyslog.log remains the same and
> nothing is added?  Or something is added even if it is very small?

sh-4.2# ls -lh
total 218M
-rw-r--r--. 1 root root  467 Jul 18 00:15 logrotate.log
-rw-r--r--. 1 root root 1010 Jul 17 23:46 rsyslog.log
-rw-r--r--. 1 root root 4.1M Jul 17 23:10 rsyslog.log-20190717-1563405001
-rw-r--r--. 1 root root 3.9M Jul 17 23:15 rsyslog.log-20190717-1563405301
-rw-r--r--. 1 root root 7.9M Jul 17 23:23 rsyslog.log-20190717-1563405601
-rw-r--r--. 1 root root 1.1M Jul 17 23:24 rsyslog.log-20190717-1563405901
-rw-r--r--. 1 root root 144M Jul 18 00:18 rsyslog.log-20190717-1563406201
sh-4.2# cat rsyslog.log
{"@timestamp":"2019-07-17T23:30:01.789081+00:00", "host":"rsyslog-5lkwz", "level":"info", "facility":"syslog", "syslog-tag":"rsyslogd:", "source":"rsyslogd", "procid":"-", "msgid":"-", "message":" [origin software=\"rsyslogd\" swVersion=\"8.37.0-9.el7\" x-pid=\"80546\" x-info=\"http:\/\/www.rsyslog.com\"] rsyslogd was HUPed"}
{"@timestamp":"2019-07-17T23:30:01.789280+00:00", "host":"rsyslog-5lkwz", "level":"info", "facility":"syslog", "syslog-tag":"rsyslogd0:", "source":"rsyslogd0", "procid":"-", "msgid":"-", "message":"lookup table 'prio_to_level' reloaded from file '\/etc\/rsyslog.d\/prio_to_level.json' [v8.37.0-9.el7 try http:\/\/www.rsyslog.com\/e\/0 ]"}
{"@timestamp":"2019-07-17T23:30:01.789296+00:00", "host":"rsyslog-5lkwz", "level":"info", "facility":"syslog", "syslog-tag":"rsyslogd0:", "source":"rsyslogd0", "procid":"-", "msgid":"-", "message":"lookup table 'normalize_level' reloaded from file '\/etc\/rsyslog.d\/normalize_level.json' [v8.37.0-9.el7 try http:\/\/www.rsyslog.com\/e\/0 ]"}


> 2) What rsyslog.log-$timestamp is getting?  Could you share some sample
> lines?

The whole file is in the attachment. and I deleted lines contain "No Action Needed for xxxxxxxxx".

sh-4.2# cat rsyslog.log-20190717-1563406201
mmexternal: merge_json_log:  false
mmexternal: use_undefined:  false
mmexternal: default_keep_fields:  CEE,time,@timestamp,aushape,ci_job,collectd,docker,fedora-ci,file,foreman,geoip,hostname,ipaddr4,ipaddr6,kubernetes,level,message,namespace_name,namespace_uuid,offset,openstack,ovirt,pid,pipeline_metadata,rsyslog,service,systemd,tags,testcase,tlog,viaq_msg_id
mmexternal: extra_keep_fields:
mmexternal: undefined_name:  undefined
mmexternal: keep_empty_fields:
mmexternal: undefined_to_string:  false
mmexternal: undefined_dot_replace_char:  UNUSED
mmexternal: undefined_max_num_fields:  9223372036854775807
mmexternal: noaction:  true
{"@timestamp":"2019-07-17T23:05:33.140008+00:00", "host":"rsyslog-5lkwz", "level":"err", "facility":"syslog", "syslog-tag":"rsyslogd:", "source":"rsyslogd", "procid":"-", "msgid":"-", "message":"program '\/usr\/local\/bin\/undefined_field' exited normally, state 0 [v8.37.0-9.el7]"}
{"@timestamp":"2019-07-17T23:05:33.186499+00:00", "host":"rsyslog-5lkwz", "level":"err", "facility":"syslog", "syslog-tag":"rsyslogd:", "source":"rsyslogd", "procid":"-", "msgid":"-", "message":"program '\/usr\/local\/bin\/undefined_field' exited normally, state 0 [v8.37.0-9.el7]"}
{"@timestamp":"2019-07-17T23:05:33.232430+00:00", "host":"rsyslog-5lkwz", "level":"err", "facility":"syslog", "syslog-tag":"rsyslogd:", "source":"rsyslogd", "procid":"-", "msgid":"-", "message":"program '\/usr\/local\/bin\/undefined_field' exited normally, state 0 [v8.37.0-9.el7]"}
{"@timestamp":"2019-07-17T23:05:33.281407+00:00", "host":"rsyslog-5lkwz", "level":"err", "facility":"syslog", "syslog-tag":"rsyslogd:", "source":"rsyslogd", "procid":"-", "msgid":"-", "message":"program '\/usr\/local\/bin\/undefined_field' exited normally, state 0 [v8.37.0-9.el7]"}
{"@timestamp":"2019-07-17T23:05:33.329929+00:00", "host":"rsyslog-5lkwz", "level":"err", "facility":"syslog", "syslog-tag":"rsyslogd:", "source":"rsyslogd", "procid":"-", "msgid":"-", "message":"program '\/usr\/local\/bin\/undefined_field' exited normally, state 0 [v8.37.0-9.el7]"}
{"@timestamp":"2019-07-17T23:05:33.376303+00:00", "host":"rsyslog-5lkwz", "level":"err", "facility":"syslog", "syslog-tag":"rsyslogd:", "source":"rsyslogd", "procid":"-", "msgid":"-", "message":"program '\/usr\/local\/bin\/undefined_field' exited normally, state 0 [v8.37.0-9.el7]"}

Comment 5 Qiaoling Tang 2019-07-18 00:27:36 UTC
Created attachment 1591650 [details]
rsyslog.log-20190717-1563406201

Comment 6 Noriko Hosoi 2019-07-18 03:38:21 UTC
Thank you, @Qialling!  Your data proved my theory.

Comment 9 Qiaoling Tang 2019-07-26 05:38:21 UTC
Verified with ose-logging-rsyslog-v4.2.0-201907251819.

Comment 10 errata-xmlrpc 2019-10-16 06:29:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922


Note You need to log in before you can comment on or make changes to this bug.