Bug 1507712 - Fluentd logging Issues after patch 3.6 patch
Summary: Fluentd logging Issues after patch 3.6 patch
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.7.z
Assignee: ewolinet
QA Contact: Qiaoling Tang
URL:
Whiteboard:
Depends On: 1555367
Blocks:
Reported: 2017-10-31 01:28 UTC by Miheer Salunke
Modified: 2021-09-09 12:46 UTC
CC List: 10 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-08-09 22:14:04 UTC
Target Upstream Version:
Embargoed:


Links
Red Hat Knowledge Base (Solution) 3230951 (last updated 2017-11-02 16:47:17 UTC)
Red Hat Product Errata RHBA-2018:2337 (last updated 2018-08-09 22:15:05 UTC)

Description Miheer Salunke 2017-10-31 01:28:16 UTC
Description of problem:

I patched our 3.6 cluster to the latest errata (RHBA-2017:3049 - OpenShift Container Platform 3.6.173.0.49) and something seems off with logging. We set up a secure forwarder that forwards our logs to Splunk, which had been working fine, but after this upgrade a couple of things happened.

The configmaps I modified seem to have been wiped out, which has never happened before during an upgrade.

Fluentd seems to be writing a lot of logs to stdout, and the metadata around container logs (kubernetes_namespace, pod name, container name, etc.) seems to be gone.

Please let us know if any further details are required.

Version-Release number of selected component (if applicable):
OCP 3.6

How reproducible:
Always 

Steps to Reproduce:
1. As mentioned in the description

Actual results:


Expected results:


Additional info:

Comment 1 Rich Megginson 2017-10-31 02:02:35 UTC
(In reply to Miheer Salunke from comment #0)
> Description of problem:
> 
> I patched our 3.6 to the latest errata(RHBA-2017:3049 - OpenShift Container
> Platform 3.6.173.0.49) and something seems off with logging. We setup a
> secure forwarder which forwards our logs to splunk which has been working
> fine but after this upgrade a couple things happened. 
> 
> The configmaps I modified seemed to have been wiped out which has never
> happened before during and upgrade. 

Which configmaps did you modify, and what were those modifications?

> 
> It seems like fluentd is writing a lot of logs to stdout

Like what?

> and seems like the
> metadata around container logs is gone like kubernetes_namespace, pod name ,
> container name, etc.....  

Can you provide example Elasticsearch searches that demonstrate this? For example:

https://docs.google.com/document/d/1MHvHwVSkkO5ohus2Pl3aFcvxXfSAY7qVEblIM1xgcXk/edit#heading=h.c0kdwi7yimxo
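As a rough illustration of the kind of search being requested, the sketch below builds an Elasticsearch query body that matches records missing the Kubernetes metadata fields. The field names `kubernetes.namespace_name`, `kubernetes.pod_name` and `kubernetes.container_name` are an assumption based on what the Kubernetes metadata filter typically adds, so adjust them to your index mapping. `is_missing_metadata` is a hypothetical local equivalent for checking records exported as JSON.

```python
"""Sketch: find log records that lack Kubernetes metadata.

Assumption: the metadata lives under the field names below; they are
illustrative and may differ in your index mapping.
"""
import json

# Hypothetical list of metadata fields expected on every container log record.
METADATA_FIELDS = (
    "kubernetes.namespace_name",
    "kubernetes.pod_name",
    "kubernetes.container_name",
)


def missing_metadata_query(fields=METADATA_FIELDS, size=10):
    """Build an Elasticsearch query body matching documents where any of
    the given fields is absent (bool/should over must_not + exists)."""
    return {
        "size": size,
        "query": {
            "bool": {
                # A document matches if at least one field is missing.
                "should": [
                    {"bool": {"must_not": {"exists": {"field": f}}}}
                    for f in fields
                ],
                "minimum_should_match": 1,
            }
        },
    }


def _lookup(record, dotted):
    """Resolve a dotted path in a nested dict; return None if absent."""
    node = record
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def is_missing_metadata(record, fields=METADATA_FIELDS):
    """Local equivalent of the query for records exported as JSON."""
    return any(_lookup(record, f) is None for f in fields)


if __name__ == "__main__":
    print(json.dumps(missing_metadata_query(), indent=2))
```

The printed body can be sent to an index's `_search` endpoint; a non-zero hit count would show records arriving without namespace, pod or container metadata.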

> 
> Please let us know if any further details are required.

Please run this script and attach its output:
https://github.com/openshift/origin-aggregated-logging/blob/master/hack/logging-dump.sh


Comment 2 Jeff Cantrill 2017-11-01 19:50:10 UTC
@Eric are we preserving configmaps at all now?

Comment 3 ewolinet 2017-11-01 21:49:27 UTC
That is planned as a 3.8 feature... 

For now, a customer can preserve their changes by setting one of the following variables to the contents of the corresponding config file:

fluentd_config_contents
fluentd_throttle_contents
fluentd_secureforward_contents

Comment 4 ewolinet 2018-02-13 19:19:19 UTC
We should now be preserving the configmap changes in 3.9

Comment 5 Steven Walter 2018-02-15 21:32:45 UTC
(In reply to ewolinet from comment #3)
> That is planned as a 3.8 feature... 
> 
> For now a customer can set the contents of their specific config files to
> one of the following variables to maintain it:
> 
> fluentd_config_contents
> fluentd_throttle_contents
> fluentd_secureforward_contents

Can you explain what you mean by this? I can't find these variables documented anywhere, and the customer is looking for a way to preserve configmaps in their upgrade to 3.7.

Comment 6 ewolinet 2018-02-15 21:51:17 UTC
(In reply to Steven Walter from comment #5)
> Can you explain what you mean by this? These variables are not documented
> anywhere I can find, and customer is looking for a way to preserve
> configmaps in their upgrade to 3.7

Sure. We don't document those variables since they could cause a misconfiguration within your cluster in situations where we are providing necessary configmap changes. They are commented out at the bottom of the defaults/main.yml for the openshift-logging role [1].


Those variables each correspond to a file within the fluentd configmap data section. The intent is that you set the value of the variable equal to the contents of that configmap section, and the installer then uses the variable contents instead of the files provided within the role when building the configmap it passes to `oc apply`.
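A minimal sketch of that override behaviour, assuming each variable replaces exactly one data key of the `logging-fluentd` configmap. The mapping of variables to the keys `fluent.conf`, `throttle-config.yaml` and `secure-forward.conf` is an assumption for illustration, and `build_configmap_data` is a hypothetical stand-in for what the role does in Ansible, not its actual code.

```python
"""Sketch: role defaults overridden by *_contents variables.

Assumption: each variable replaces exactly one configmap data key; the
mapping below is illustrative, not taken from the role.
"""

# Hypothetical mapping from inventory variable to configmap data key.
OVERRIDE_VARS = {
    "fluentd_config_contents": "fluent.conf",
    "fluentd_throttle_contents": "throttle-config.yaml",
    "fluentd_secureforward_contents": "secure-forward.conf",
}


def build_configmap_data(role_files, inventory_vars):
    """Return the data section to hand to `oc apply`.

    role_files: dict of key -> file contents shipped with the role.
    inventory_vars: dict of variables set by the user.
    A variable, when set, wins over the shipped file for its key.
    """
    data = dict(role_files)  # copy so the shipped defaults stay untouched
    for var, key in OVERRIDE_VARS.items():
        value = inventory_vars.get(var)
        if value is not None:
            data[key] = value
    return data
```

With no variables set, the shipped files are applied unchanged, which matches the symptom in the original report: manual edits to the live configmap disappear on upgrade.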


We phased those out for 3.9 in favor of patching the configmap changes on an existing system onto the files we provide. At this time there isn't a plan to backport that, but it seems reasonable to do so, so that the variables linked below are not needed.



[1] https://github.com/openshift/openshift-ansible/blob/release-3.7/roles/openshift_logging/defaults/main.yml#L179
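To make the 3.9 idea concrete, here is a deliberately simplified stand-in: a per-key three-way merge between the previously shipped files, the newly shipped files, and the live configmap. It is not the installer's actual patch mechanism, which is not shown in this thread; `merge_configmap_data` is a hypothetical name.

```python
"""Sketch: preserve user edits across upgrades with a per-key three-way merge.

Simplified stand-in for patching live configmap changes onto newly
shipped files; not the installer's actual mechanism.
"""


def merge_configmap_data(old_default, new_default, live):
    """Merge configmap data per key.

    old_default: files the previous installer shipped.
    new_default: files the new installer ships.
    live: data currently in the cluster's configmap.
    Returns (merged, conflicts); conflicts lists keys changed both by
    the user and by the new release in different ways.
    """
    merged = {}
    conflicts = []
    for key in sorted(set(old_default) | set(new_default) | set(live)):
        old = old_default.get(key)
        new = new_default.get(key)
        cur = live.get(key)
        if cur == old:
            # User left this key alone: take the newly shipped version.
            value = new
        elif new == old:
            # Only the user changed it: keep the user's version.
            value = cur
        else:
            # Both changed: keep the user's version but report it.
            value = cur
            if cur != new:
                conflicts.append(key)
        if value is not None:
            merged[key] = value
    return merged, conflicts
```

Working per key keeps the sketch short; a real patch-based approach would merge within each file so that upstream fixes and user edits to the same file can coexist.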

Comment 12 Anping Li 2018-03-28 05:17:52 UTC
The configmaps are overwritten when logging is redeployed with openshift3/ose-ansible/images/v3.7.40-1.

Comment 13 ewolinet 2018-03-28 17:00:09 UTC
@anli,

can you please provide the process you used to test this for 3.7.40?

I am unable to reproduce this locally with the following steps:
1) Deploy logging
2) oc edit configmap/logging-fluentd
3) Rerun openshift-logging.yml playbook
4) Check contents of configmap/logging-fluentd

After rerunning the playbook, the contents are still what I manually edited them to be.
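The check in steps 1 to 4 can be scripted roughly as follows. The sketch shells out to `oc get configmap/logging-fluentd -o json` before and after the playbook run and compares the `data` sections. The default namespace `logging` and the function names are assumptions, and rerunning the playbook itself is left to the operator.

```python
"""Sketch: check that configmap edits survive a playbook rerun.

Assumptions: `oc` is logged in to the cluster and the configmap is
logging-fluentd in the given namespace (default "logging").
"""
import json
import subprocess


def get_configmap_data(name="logging-fluentd", namespace="logging"):
    """Return the data section of a configmap via `oc get -o json`."""
    out = subprocess.run(
        ["oc", "get", "configmap/" + name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out).get("data", {})


def changed_keys(before, after):
    """Keys whose contents differ, including added or removed keys."""
    return sorted(k for k in set(before) | set(after)
                  if before.get(k) != after.get(k))
```

Capture `before = get_configmap_data()` after step 2, rerun the playbook, capture `after` the same way, and expect `changed_keys(before, after)` to be empty.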

Comment 14 Anping Li 2018-03-29 02:37:05 UTC
@ewolinet,
The root cause may be the logging namespace: I am using a different namespace, openshift-logging.

1. Deploy logging with the following inventory variables:
openshift_logging_es_pvc_dynamic=true
openshift_logging_es_number_of_shards=1
openshift_logging_es_number_of_replicas=0
openshift_logging_es_memory_limit=2Gi
openshift_logging_es_cluster_size=1
openshift_logging_purge_logging=true
openshift_logging_namespace=openshift-logging
openshift_logging_install_logging=true

2. Enable throttle-config.yaml in the logging-fluentd configmap.

3. Redeploy logging with the same inventory file.
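One plausible way a non-default namespace could defeat preservation can be shown with a small model. Suppose the lookup of the existing configmap ignored `openshift_logging_namespace` and always read from the default `logging` namespace. In a cluster deployed to `openshift-logging`, that lookup would find nothing and silently fall back to the shipped files. The thread does not state the exact cause, and `existing_configmap` and `resolve_data` are hypothetical helpers.

```python
"""Sketch: why the lookup namespace matters when preserving configmaps.

Hypothetical model: `cluster` maps (namespace, name) -> configmap data.
"""

DEFAULT_NAMESPACE = "logging"


def existing_configmap(cluster, name, namespace):
    """Return the live configmap data, or None if it is not found."""
    return cluster.get((namespace, name))


def resolve_data(cluster, shipped, inventory):
    """Prefer the live configmap; fall back to shipped files if absent.

    The namespace comes from the inventory, defaulting to "logging".
    Hard-coding DEFAULT_NAMESPACE here instead would miss a deployment
    in any other namespace and overwrite the user's edits.
    """
    namespace = inventory.get("openshift_logging_namespace", DEFAULT_NAMESPACE)
    live = existing_configmap(cluster, "logging-fluentd", namespace)
    return live if live is not None else shipped
```

When the inventory does not pass the namespace through, the second case below falls back to the shipped file, reproducing the overwrite from the steps above.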

Comment 15 ewolinet 2018-03-29 14:10:59 UTC
Thanks @anli,

I can recreate this when using a different namespace.
I will have a PR today to resolve this.

Comment 19 Anping Li 2018-03-30 05:53:13 UTC
The fix hasn't been merged into ose-ansible/images/v3.7.42-2.

Comment 23 Qiaoling Tang 2018-08-01 09:44:41 UTC
Verified on logging-curator-v3.7.61-1
logging-elasticsearch-v3.7.61-1
logging-fluentd-v3.7.61-1

Comment 24 Qiaoling Tang 2018-08-02 01:46:19 UTC
(In reply to Qiaoling Tang from comment #23)
> Verified on logging-curator-v3.7.61-1
> logging-elasticsearch-v3.7.61-1
> logging-fluentd-v3.7.61-1

Verified on ose-ansible v3.7.61

Comment 26 errata-xmlrpc 2018-08-09 22:14:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2337

