Bug 1849188

Summary:	logStore stanza is required to deploy fluentd standalone
Product:	OpenShift Container Platform	Reporter:	Chad Scribner <cscribne>
Component:	Logging	Assignee:	Periklis Tsirakidis <periklis>
Status:	CLOSED ERRATA	QA Contact:	Anping Li <anli>
Severity:	urgent	Docs Contact:
Priority:	urgent
Version:	4.4	CC:	aos-bugs, cvogel, dseals, ewolinet, jcantril, oarribas, periklis
Target Milestone:	---
Target Release:	4.6.0
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Cause: The operator added the ES-related fluentd init container without checking for a logStore stanza. Consequence: The fluentd init container spun up and failed because no log store was available. Fix: Check whether the logStore stanza exists before adding fluentd init containers to the spec. Result: A ClusterLogging CR with a collection-only stanza works as expected. Fluentd pods are started with an empty configuration.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2020-10-27 16:08:12 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1850076

Description Chad Scribner 2020-06-19 19:04:52 UTC

Description of problem:
The customer desires the ability to deploy fluentd standalone using the CLO.  However, this requires that additional pieces be defined even if they're not necessary.


Version-Release number of selected component (if applicable):
OpenShift 4.4

How reproducible:
Always

Steps to Reproduce:
1. Install CLO
2. Use the following spec for the ClusterLogging object
```
spec:
  collection:
    logs:
      fluentd: {}
      type: fluentd
```

Actual results:
No fluentd pods are started

Expected results:
fluentd pods are started


Additional info:
I was able to get this to work using the following spec

```
spec:
  collection:
    logs:
      fluentd:
        resources:
          limits:
            memory: 736Mi
          requests:
            cpu: 100m
            memory: 736Mi
      type: fluentd
  logStore:
    elasticsearch:
      nodeCount: 0
    type: elasticsearch
  managementState: Managed
  visualization:
    kibana:
      replicas: 0
    type: kibana
```

Comment 2 Jeff Cantrill 2020-06-19 19:11:53 UTC

Bumping priority to urgent, as this is desired for log forwarding.

Comment 5 Periklis Tsirakidis 2020-06-22 11:10:37 UTC

@jcantrill & @ewolinetz

The fix for 4.6 and 4.5 is the attached PR. Basically, this worked until we introduced the fluentd init container. As a follow-up, we also need to check that a logStore of type elasticsearch is present in the ClusterLogging CR spec. However, 4.4 and 4.3 seem to be a different story, because the underlying fix for the crash-looping fluentd pods is [1], where we replaced the CrashLoopBackOff (CLBO) with a CollectorDeadEnd condition in ClusterLogging's status field. Do you think we should backport this to 4.4 and 4.3, even though it is only a minor improvement to the log forwarding (LF) Tech Preview (TP)?

@Chad Scribner
Is there any prospect of updating to 4.5? Also, can you elaborate on how you are going to configure the underlying fluentd? Do you intend to use Unmanaged mode?


[1] https://github.com/openshift/cluster-logging-operator/pull/422
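
For readers following along, the check described in this comment can be sketched roughly as below. This is only an illustration and not the code from the attached PR: the types ClusterLoggingSpec and LogStoreSpec, the helper names, and the init container name "fluentd-init" are all hypothetical, simplified stand-ins. The sketch also folds in the follow-up check for a log store of type elasticsearch.

```go
package main

import "fmt"

// LogStoreSpec is a hypothetical, simplified stand-in for the logStore
// stanza of the ClusterLogging CR.
type LogStoreSpec struct {
	Type string
}

// ClusterLoggingSpec is a hypothetical, simplified stand-in for the CR spec.
// LogStore is nil when the logStore stanza is omitted (collection-only CR).
type ClusterLoggingSpec struct {
	LogStore *LogStoreSpec
}

const logStoreTypeElasticsearch = "elasticsearch"

// needsLogStoreInitContainer reports whether the collector should wait for
// a log store: only when a logStore stanza exists and is of type
// elasticsearch.
func needsLogStoreInitContainer(spec ClusterLoggingSpec) bool {
	return spec.LogStore != nil && spec.LogStore.Type == logStoreTypeElasticsearch
}

// initContainerNames returns the init containers to add to the fluentd pod
// spec. The name "fluentd-init" is an assumption made for this sketch.
func initContainerNames(spec ClusterLoggingSpec) []string {
	if !needsLogStoreInitContainer(spec) {
		return nil
	}
	return []string{"fluentd-init"}
}

func main() {
	collectionOnly := ClusterLoggingSpec{}
	withStore := ClusterLoggingSpec{
		LogStore: &LogStoreSpec{Type: logStoreTypeElasticsearch},
	}

	fmt.Println("collection only:", initContainerNames(collectionOnly))
	fmt.Println("with log store:", initContainerNames(withStore))
}
```

With a guard of this shape, a collection-only spec such as the one in the reproduction steps would get no log-store init container, so the fluentd pods would not block on an Elasticsearch instance that is never deployed.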

Comment 7 Chad Scribner 2020-06-23 13:21:37 UTC

(In reply to Periklis Tsirakidis from comment #5)
> @jcantrill & @ewolinetz
> 
> The fix for 4.6 and 4.5 is the attached PR. Basically it was working until
> we introduced the fluentd init container. For later we need to check also
> that a logstore of type elasticsearch is present in the ClusterLogging CR
> spec. However, 4.4 and 4.3 seem to be a different story, because the
> underlying fix for the crash looping fluentd pods is [1], where we
> transitioned CLBO to a Condition CollectorDeadEnd in ClusterLogging's status
> field. Do you think we should backport this to 4.4 and 4.3 although it is a
> LF TP minor improvement?
> 
> @Chad Scribner
> Any prospect to update to 4.5? Besides, can you elaborate how you are going
> to configure the underlying fluentd? Do you intend to use Unmanaged mode?
> 
> 
> [1] https://github.com/openshift/cluster-logging-operator/pull/422

It'll be managed mode using secure forward to an external (customer managed) fluentd instance.  The customer only needs a fix for this in OCP 4.5 and later.

Comment 8 Periklis Tsirakidis 2020-06-23 13:49:33 UTC

Alright, if we want it only in 4.5 and later, we can backport my fix. 4.6 needs to wait until we merge our LF GA.

Comment 13 Periklis Tsirakidis 2020-06-25 15:29:04 UTC

@jcantrill

This is working sufficiently well on 4.6 after merging LF GA. I will send it to QA to enable backporting.

Comment 14 Anping Li 2020-07-01 14:00:20 UTC

Verified on master branch

Comment 19 errata-xmlrpc 2020-10-27 16:08:12 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196