Bug 2180883

Summary: rsyslog stops sending logs to elasticsearch
Product: Red Hat OpenStack
Reporter: Darin Sorrentino <dsorrent>
Component: openstack-tripleo-heat-templates
Assignee: Martin Magr <mmagr>
Status: ON_DEV
QA Contact: Leonid Natapov <lnatapov>
Severity: high
Docs Contact: mgeary <mgeary>
Priority: high
Version: 17.0 (Wallaby)
CC: cjanisze, jelynch, lmadsen, mburns, mmagr, mrunge, pgrist
Target Milestone: z2
Keywords: Triaged
Target Release: 17.1
Flags: astillma: needinfo? (mmagr)
       pgrist: needinfo? (mmagr)
       jelynch: needinfo? (mmagr)
Hardware: Unspecified
OS: Unspecified
Doc Type: Known Issue
Doc Text:
Currently, Logrotate archives all log files once a day, after which Rsyslog stops sending logs to Elasticsearch. Workaround: Add "RsyslogReopenOnTruncate: true" to your environment file during deployment so that Rsyslog reopens all log files on log rotation. Currently, RHOSP 17.1 uses an older puppet-rsyslog module, which results in an incorrectly configured Rsyslog. Workaround: Manually apply patch [1] in `/usr/share/openstack-tripleo-heat-templates/deployment/logging/rsyslog-container-puppet.yaml` before deployment to configure Rsyslog correctly.
Type: Bug

Description Darin Sorrentino 2023-03-22 14:07:40 UTC
Description of problem:

After a stack update/deployment, logs appear to be sent to Elasticsearch correctly. At some point within the next 24 hours, they stop for no apparent reason. Restarting rsyslog has no effect.

Focused on controller-0 to troubleshoot the issue.

Noticed the following message at startup:

imfile: no working or state file directory set, imfile will create state files in the current working directory

Checked the default config for rsyslog; it shows that rsyslog should be using /var/spool/rsyslog, but that directory was empty. Digging into the TripleO rsyslog-container-puppet.yaml, I saw this:

        - name: create persistent state directory for rsyslog
          file:
            path: /var/lib/rsyslog.container
            state: directory
            setype: container_file_t

Added line:

global(workDirectory="/var/lib/rsyslog")

to 50_openstack_logs.conf and restarted the rsyslog container. This resulted in imfile state files being created in /var/lib/rsyslog, but the following message now showed up multiple times in the podman logs:

rsyslogd: imfile error: message received is larger than max msg size; message will be split and processed as another message [v8.2102.0-101.el9_0.1]
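For reference, a minimal RainerScript sketch of the kind of configuration involved, assuming 50_openstack_logs.conf defines imfile inputs. The input path and tag below are hypothetical placeholders, not taken from this environment. With workDirectory set, imfile records its read position for each monitored file in a state file under that directory, which is consistent with the state files appearing in /var/lib/rsyslog after the change.

```rsyslog
# Directory where imfile keeps its per-file state (read offsets).
global(workDirectory="/var/lib/rsyslog")

# Load imfile only if the existing configuration does not already load it.
module(load="imfile")

# Hypothetical input: the path and tag are placeholders.
input(type="imfile"
      file="/var/log/containers/example/example.log"
      tag="example")
```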

Added the following directive to rsyslog.conf:

$MaxMessageSize 8k

Restarted the container, and the logs finally started showing up in Elasticsearch.
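The legacy $MaxMessageSize directive can also be expressed in RainerScript syntax. This is a minimal sketch; placing it near the top of rsyslog.conf, before any module(load=...) statements, is a precaution rather than something confirmed in this bug. The parameter could equally be added to the existing global() statement that sets workDirectory.

```rsyslog
# RainerScript equivalent of the legacy "$MaxMessageSize 8k" directive.
# Kept near the top of rsyslog.conf, before module(load=...) lines, as a precaution.
global(maxMessageSize="8k")
```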

Made the same changes to controller-1 and controller-2.

After 24 hours, controller-0 continues to send logs to Elasticsearch. The other two controllers have stopped again, and I am unsure why. Restarting them seems to have no effect.

Version-Release number of selected component (if applicable):

The environment was recently upgraded to 17.0.1; however, this was not working on 17.0 either.

How reproducible:

100%

Steps to Reproduce:
1. Configure rsyslog to send logs to Elasticsearch
2. Wait 24 hours

Actual results:
Logs stop showing up in Elasticsearch

Expected results:
Logs continue to show up in Elasticsearch

Additional info:

Comment 11 Matthias Runge 2023-04-20 15:18:55 UTC
The proposed upstream patch fails in the gate. Martin, can you please take a look?

Comment 18 Leonid Natapov 2023-05-30 14:34:09 UTC
Failed QA.
Upstream has a newer puppet-rsyslog, so using the rsyslog::config class parameter works there, but downstream we need to use rsyslog::server instead.

Comment 20 Matthias Runge 2023-06-06 09:21:07 UTC
A workaround exists; moving this to z1.

Comment 23 Jenny-Anne Lynch 2023-08-15 16:56:18 UTC
Hi Martin,

The target milestone is z2, but can we include this doc text as a Known Issue in 17.1 GA? And then update it to a Bug Fix in z2? 

The workaround says to apply "patch [1]", but I don't see a link for [1]. Is it available downstream for 17.1 GA?

Original doc text:

Issue one:

Cause: Logrotate archives all log files once a day.
Consequence: Rsyslog stops sending logs to Elasticsearch
Workaround (if any): Add "RsyslogReopenOnTruncate: true" to environment file during deployment. 
Result: Rsyslog reopens all log files on log rotation.

Issue two:
Cause: OSP-17.1 uses older puppet-rsyslog module
Consequence: Rsyslog is incorrectly configured
Workaround: Manually apply patch [1] in /usr/share/openstack-tripleo-heat-templates/deployment/logging/rsyslog-container-puppet.yaml before deployment
Result: Rsyslog is configured correctly

New doc text:

Currently, Logrotate archives all log files once a day, after which Rsyslog stops sending logs to Elasticsearch.
Workaround: Add "RsyslogReopenOnTruncate: true" to your environment file during deployment so that Rsyslog reopens all log files on log rotation.
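Background sketch, assuming the RsyslogReopenOnTruncate parameter maps onto imfile's reopenOnTruncate input option (this mapping is an assumption, not confirmed in this bug). If Logrotate rotates a file by truncating it in place (its copytruncate mode), the offset imfile has stored now points past the end of the shorter file; reopenOnTruncate tells imfile to reopen the file in that situation. The path and tag are hypothetical placeholders.

```rsyslog
# Hypothetical input; path and tag are placeholders.
# reopenOnTruncate="on" makes imfile reopen the file when its size on disk
# drops below the offset it has already read, as happens after truncation.
input(type="imfile"
      file="/var/log/containers/example/example.log"
      tag="example"
      reopenOnTruncate="on")
```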

Currently, RHOSP 17.1 uses an older puppet-rsyslog module, which results in an incorrectly configured Rsyslog. Workaround: Manually apply patch [1] in `/usr/share/openstack-tripleo-heat-templates/deployment/logging/rsyslog-container-puppet.yaml` before deployment to configure Rsyslog correctly.