Bug 1911477

Summary: Using legacy Log Forwarding does not send logs to the internal Elasticsearch
Product: OpenShift Container Platform Reporter: Oscar Casal Sanchez <ocasalsa>
Component: Logging    Assignee: Jeff Cantrill <jcantril>
Status: CLOSED ERRATA QA Contact: Anping Li <anli>
Severity: high Docs Contact:
Priority: medium    
Version: 4.5    CC: aos-bugs, dkulkarn, fan-wxa, jcantril, jniu, mfuruta, mhatanak, naygupta, rh-container, sasagarw
Target Milestone: ---   
Target Release: 4.5.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: logging-core
Fixed In Version: Doc Type: Bug Fix
Doc Text:
* Previously, if you enabled legacy log forwarding, logs were not sent to managed storage. This issue occurred because the generated log forwarding configuration improperly chose between either log forwarding or legacy log forwarding. The current release fixes this issue. If the `ClusterLogging` CR defines a `logstore`, logs are sent to managed storage. Additionally, if legacy log forwarding is enabled, logs are sent to legacy log forwarding regardless of whether managed storage is enabled. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1911477[*1911477*])
Story Points: ---
Clone Of:
: 1921263 (view as bug list) Environment:
Last Closed: 2021-03-25 12:31:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1928949    
Bug Blocks:    
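For reference, the fixed behavior described in the Doc Text above assumes a `ClusterLogging` CR that defines a `logStore`. A minimal sketch of such a CR (the node count, redundancy policy, and other values are illustrative placeholders, not taken from this bug):

~~~
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
spec:
  managementState: Managed
  logStore:
    type: elasticsearch              # managed storage target for cluster logs
    elasticsearch:
      nodeCount: 3                   # illustrative value
      redundancyPolicy: SingleRedundancy
  collection:
    logs:
      type: fluentd
      fluentd: {}
~~~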

Description Oscar Casal Sanchez 2020-12-29 16:42:21 UTC
[Description of problem]
In previous versions, for example 3.x and up to 4.3, when using secure_forward following the documentation [1], logs were sent both to the internal Elasticsearch and to the external instance.

Now, in 4.5, logs are only sent to the external instance and no longer to the internal Elasticsearch.

Verifying the documentation, it has not changed [2], so it is expected to keep working like it did in the past (a sketch of the legacy configmap is included after this list):

- Sending logs to the internal Elasticsearch
- Sending logs to the external instance configured in the secure-forward configmap
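For context, the legacy setup from the documentation [1] comes down to creating a `secure-forward` configmap in the `openshift-logging` namespace whose `secure-forward.conf` key holds a fluentd forward store. A minimal sketch (the host and port are placeholders, and the documented example adds TLS and shared-key settings that are omitted here):

~~~
apiVersion: v1
kind: ConfigMap
metadata:
  name: secure-forward
  namespace: openshift-logging
data:
  secure-forward.conf: |
    <store>
      @type forward
      <server>
        host external-fluentd.example.com   # placeholder receiver host
        port 24224                          # placeholder receiver port
      </server>
    </store>
~~~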

One thing that has changed is the configuration generated for fluentd. In 4.3, the fluentd configuration generated after configuring secure_forward following the documentation looks like this:

~~~
$ oc rsh <fluentd pod> cat /etc/fluent/fluent.conf
...
<label @_LOGS_APP>
  <match **>
    @type copy

    <store>
      @type relabel
      @label @CLO_DEFAULT_APP_PIPELINE
    </store>

    <store>
      @type relabel
      @label @_LEGACY_SECUREFORWARD
    </store>

  </match>
</label>
<label @_LOGS_INFRA>
  <match **>
    @type copy

    <store>
      @type relabel
      @label @CLO_DEFAULT_INFRA_PIPELINE
    </store>

    <store>
      @type relabel
      @label @_LEGACY_SECUREFORWARD
    </store>

  </match>
</label>

# Relabel specific pipelines to multiple, outputs (e.g. ES, kafka stores)

<label @CLO_DEFAULT_APP_PIPELINE>
  <match **>
    @type copy

    <store>
      @type relabel
      @label @CLO_DEFAULT_OUTPUT_ES
    </store>
  </match>
</label>

<label @CLO_DEFAULT_INFRA_PIPELINE>
  <match **>
    @type copy

    <store>
      @type relabel
      @label @CLO_DEFAULT_OUTPUT_ES
    </store>
  </match>
</label>
...
~~~

As we can see above, logs are sent both to the CLO_DEFAULT pipelines and to LEGACY_SECUREFORWARD. However, the configuration generated in OCP 4.5 after configuring secure forward looks like this:

~~~
$ oc rsh <fluentd pod> cat /etc/fluent/fluent.conf
...
<label @_LOGS_APP>
  <match **>
    @type copy


    <store>
      @type relabel
      @label @_LEGACY_SECUREFORWARD
    </store>

  </match>
</label>
<label @_LOGS_AUDIT>
  <match **>
    @type copy


    <store>
      @type relabel
      @label @_LEGACY_SECUREFORWARD
    </store>

  </match>
</label>
<label @_LOGS_INFRA>
  <match **>
    @type copy


    <store>
      @type relabel
      @label @_LEGACY_SECUREFORWARD
    </store>

  </match>
</label>
...
~~~

As we can see, logs are only relabeled to "_LEGACY_SECUREFORWARD"; there is no relabeling to the CLO_DEFAULT_XXX pipelines.
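Based on the 4.3 configuration above, the store that appears to be missing from each label block in 4.5 would look roughly like this (shown for the APP label; the INFRA label would reference @CLO_DEFAULT_INFRA_PIPELINE instead):

~~~
<store>
  @type relabel
  @label @CLO_DEFAULT_APP_PIPELINE
</store>
~~~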


[Version-Release number of selected component (if applicable):]

Version used for OCP 4.5:
~~~
$ oc version
Client Version: 4.5.23
Server Version: 4.5.23
$ oc get csv -n openshift-logging
NAME                                           DISPLAY                  VERSION                 REPLACES   PHASE
clusterlogging.4.5.0-202012120433.p0           Cluster Logging          4.5.0-202012120433.p0              Failed
elasticsearch-operator.4.5.0-202012120433.p0   Elasticsearch Operator   4.5.0-202012120433.p0              Succeeded
~~~

Version used for OCP 4.3:
~~~
$ oc version
Client Version: 4.3.38
Server Version: 4.3.40
Kubernetes Version: v1.16.2+853223d
$ oc get csv -n openshift-logging
NAME                                            DISPLAY                  VERSION                  REPLACES   PHASE
clusterlogging.4.3.40-202010141211.p0           Cluster Logging          4.3.40-202010141211.p0              Succeeded
elasticsearch-operator.4.3.40-202010141211.p0   Elasticsearch Operator   4.3.40-202010141211.p0              Succeeded
~~~

[How reproducible]
Always


Steps to Reproduce:
1. Install Cluster Logging
2. Configure secure_forward
3. Observe that logs are not sent to the internal Elasticsearch (see the quick check sketched below)
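One quick way to confirm the regression, reusing only the commands already shown above (illustrative, not an official check), is to look for the CLO_DEFAULT relabel targets in the generated fluentd configuration:

~~~
# Returns the @CLO_DEFAULT_*_PIPELINE labels on 4.3, but nothing on 4.5
# with legacy secure_forward enabled, matching the configs pasted above.
$ oc rsh <fluentd pod> grep CLO_DEFAULT /etc/fluent/fluent.conf
~~~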

[Actual results]
Logs are not sent to the internal Elasticsearch

[Expected results]
Logs should be sent to the internal Elasticsearch at the same time as to the external instance configured in the secure-forward configmap.


We are aware that this is deprecated, but the 4.3 documentation says the same as the 4.4 and 4.5 documentation, and it was working in 4.3 and earlier versions, just as in 3.x. Therefore, it is expected to keep working, with logs sent in parallel to the internal Elasticsearch and to the configured external instance.


[1] https://docs.openshift.com/container-platform/4.3/logging/config/cluster-logging-external.html
[2] https://docs.openshift.com/container-platform/4.3/logging/config/cluster-logging-external.html#cluster-logging-collector-fluentd_cluster-logging-external

Comment 6 weiguo fan 2021-01-28 04:09:54 UTC
Hi, Team,

We verified that we can work around the issue with the following steps.
Could Red Hat support this as an official workaround until the problem is fixed?

Step 1: Set the ClusterLogging instance's spec.managementState to "Unmanaged".

       $ oc patch clusterlogging instance -n openshift-logging --type='json' -p='[{"op": "replace", "path": "/spec/managementState", "value":"Unmanaged"}]'

Step 2: Edit the fluentd configmap as follows.

~~~
$ oc edit configmap fluentd -n openshift-logging
....
    <label @_LOGS_APP>
      <match **>
        @type copy
        <store>                             <====== Add those lines
          @type relabel                     <======
          @label @CLO_DEFAULT_APP_PIPELINE  <======
        </store>                            <======
        <store>
          @type relabel
          @label @_LEGACY_SYSLOG
        </store>
      </match>
    </label>
    <label @_LOGS_AUDIT>
      <match **>
        @type copy

        <store>
          @type relabel
          @label @_LEGACY_SYSLOG
        </store>
      </match>
    </label>
    <label @_LOGS_INFRA>
      <match **>
        @type copy
        <store>                              <====== Add those lines
          @type relabel                      <======
          @label @CLO_DEFAULT_INFRA_PIPELINE <======
        </store>                             <======
        <store>
          @type relabel
          @label @_LEGACY_SYSLOG
        </store>
      </match>
    </label>
...
~~~

Step 3: Delete all existing fluentd pods to restart fluentd.

       $ oc delete pods -n openshift-logging -l component=fluentd
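After the pods are recreated, a quick sanity check (illustrative, reusing commands already shown in this bug) is to confirm that a new collector pod picked up the edited configmap and now contains the added CLO_DEFAULT relabel stores:

~~~
$ oc get pods -n openshift-logging -l component=fluentd
$ oc rsh <new fluentd pod> grep -A 2 CLO_DEFAULT /etc/fluent/fluent.conf
~~~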

Comment 7 Masaki Furuta ( RH ) 2021-01-28 07:38:11 UTC
(Replying to weiguo fan's comment #6)
Adding needinfo to the engineering team, as per NEC's request for a double check, to see if NEC's workaround is suitable from the point of view of the RH engineering team.

/Masaki

Comment 10 Jeff Cantrill 2021-01-29 14:44:46 UTC
(In reply to Masaki Furuta from comment #7)
> (Replying to weiguo fan's comment #6)
> Adding needinfo to the engineering team, as per NEC's request for a double check,
> to see if NEC's workaround is suitable from the point of view of the RH engineering team.

Yes.  This is exactly the same change as referenced in the associated fix.

Comment 18 Anping Li 2021-03-16 05:48:42 UTC
Verified on
clusterserviceversion.operators.coreos.com/clusterlogging.4.5.0-202103150243.p0
clusterserviceversion.operators.coreos.com/elasticsearch-operator.4.5.0-202103150243.p0

Comment 20 errata-xmlrpc 2021-03-25 12:31:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.36 extras update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0842