Bug 1519679

Summary: logging-fluentd not using output-ops-extra-localfile.conf after update from v3.6.173.0.21 to v3.6.173.0.49.
Product: OpenShift Container Platform
Reporter: Jatan Malde <jmalde>
Component: Logging
Assignee: Noriko Hosoi <nhosoi>
Status: CLOSED ERRATA
QA Contact: Anping Li <anli>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.6.1
CC: aos-bugs, nhosoi, rmeggins, ronny.pettersen, rromerom
Target Milestone: ---
Target Release: 3.6.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: There was a logic error in the fluentd startup script: when an ops cluster was first disabled and then enabled, the proper ops configuration file was not enabled.
Consequence: Sub configuration files whose names start with output-ops-extra- were never included from the ops configuration file.
Fix: The logic error was fixed.
Result: When an ops cluster is first disabled and then enabled, the proper ops configuration file is enabled and its sub configuration files are also enabled.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-01-23 17:58:09 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Jatan Malde 2017-12-01 07:32:22 UTC
Description of problem:

logging-fluentd not using output-ops-extra-localfile.conf after update from v3.6.173.0.21 to v3.6.173.0.49. The logs were not written to the local file inside the fluentd pod.

Version-Release number of selected component (if applicable):

-OCP v3.6
-RHEL 7.4

How reproducible:


Steps to Reproduce:
1. Deploy the logging project using the ansible playbook files.
2. Initially, do not set the variables 'openshift_logging_es_ops_host=logging-es-ops' and 'openshift_logging_use_ops=true' in the inventory file (see the inventory sketch after these steps).
3. Once deployed, check the environment variables ES_HOST and OPS_HOST; both have the same value.
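For illustration only (not part of the original report), the later redeploy with the ops cluster enabled might carry something like the following in the inventory; the [OSEv3:vars] group name is assumed, and the host value comes from step 2:

[OSEv3:vars]
# omitted on the first deploy, added when redeploying with the ops cluster
openshift_logging_use_ops=true
openshift_logging_es_ops_host=logging-es-ops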

Actual results:

Because both variables have the same value, the fluentd-ops output file is not created inside the fluentd pod.

$ oc rsh logging-fluentd-jvt5h
sh-4.2# ls -ltr /var/fluentd-out/
total 3584
-rw-r--r--. 1 root root 1710120 Nov 29 13:25 fluentd.20171129.b55f1e3152b0f0b44
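To confirm the collision, the daemonset environment can be listed; this is an illustrative check, and the output below is abbreviated to the two relevant variables:

$ oc set env daemonset/logging-fluentd --list | grep -E '^(ES_HOST|OPS_HOST)='
ES_HOST=logging-es
OPS_HOST=logging-es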

Expected results:

Both files should be created inside the fluentd pod:

sh-4.2# ls -l /var/fluentd-out/
total 1792
-rw-r--r--.  1 root root 957428 Nov 29 03:43 fluentd-ops.20171129.b55f1b0aabe17a1c6
-rw-r--r--.  1 root root 235110 Nov 29 03:43 fluentd.20171129.b55f1b0aacb5bda36


Additional info:

The configmap is attached; its content is shown below.
- apiVersion: v1
  data:
    fluent.conf: |
      # This file is the fluentd configuration entrypoint. Edit with care.

      @include configs.d/openshift/system.conf

      # In each section below, pre- and post- includes don't include anything initially;
      # they exist to enable future additions to openshift conf as needed.

      ## sources
      ## ordered so that syslog always runs last...
      @include configs.d/openshift/input-pre-*.conf
      @include configs.d/dynamic/input-docker-*.conf
      @include configs.d/dynamic/input-syslog-*.conf
      @include configs.d/openshift/input-post-*.conf
      ##

      <label @INGRESS>
      ## filters
        @include configs.d/openshift/filter-pre-*.conf
        @include configs.d/openshift/filter-retag-journal.conf
        @include configs.d/openshift/filter-k8s-meta.conf
        @include configs.d/openshift/filter-kibana-transform.conf
        @include configs.d/openshift/filter-k8s-flatten-hash.conf
        @include configs.d/openshift/filter-k8s-record-transform.conf
        @include configs.d/openshift/filter-syslog-record-transform.conf
        @include configs.d/openshift/filter-viaq-data-model.conf
        @include configs.d/openshift/filter-post-*.conf
      ##
      </label>

      <label @OUTPUT>
      ## matches
        @include configs.d/openshift/output-pre-*.conf
        @include configs.d/openshift/output-operations.conf
        @include configs.d/openshift/output-applications.conf
        # no post - applications.conf matches everything left
      ##
      </label>
    output-extra-localfile.conf: |
      <store>
        @type file
        path /var/fluentd-out/fluentd
        format json
        time_slice_format %Y%m%d
        time_slice_wait 1m
        buffer_chunk_limit 256m
        time_format %Y%m%dT%H:%M:%S%z
        compress gzip
        utc
      </store>
    output-ops-extra-localfile.conf: |
      <store>
        @type file
        path /var/fluentd-out/fluentd-ops
        format json
        time_slice_format %Y%m%d
        time_slice_wait 1m
        buffer_chunk_limit 256m
        time_format %Y%m%dT%H:%M:%S%z
        compress gzip
        utc
      </store>
    secure-forward.conf: |
      # @type secure_forward

      # self_hostname ${HOSTNAME}
      # shared_key <SECRET_STRING>

      # secure yes
      # enable_strict_verification yes

      # ca_cert_path /etc/fluent/keys/your_ca_cert
      # ca_private_key_path /etc/fluent/keys/your_private_key
        # for private CA secret key
      # ca_private_key_passphrase passphrase

      # <server>
        # or IP
      #   host server.fqdn.example.com
      #   port 24284
      # </server>
      # <server>
        # ip address to connect
      #   host xxx.xx.xx.x
        # specify hostlabel for FQDN verification if ipaddress is used for host
      #   hostlabel server.fqdn.example.com
      # </server>
    throttle-config.yaml: |
      # Logging example fluentd throttling config file

      #example-project:
      #  read_lines_limit: 10
      #
      #.operations:
      #  read_lines_limit: 100
  kind: ConfigMap
  metadata:
    creationTimestamp: null
    name: logging-fluentd
kind: List
metadata: {}

Comment 2 Ruben Romero Montes 2017-12-01 08:58:30 UTC
The problem lies in the fact that once the daemonset/logging-fluentd exists it is not updated or replaced (not even the environment variables), as seen here:
  https://github.com/openshift/openshift-ansible/blob/release-3.6/roles/openshift_logging_fluentd/tasks/main.yaml#L154-L186

Therefore, if aggregated logging is first deployed without the OPS cluster and later redeployed with the OPS cluster (i.e. `openshift_logging_use_ops=true`), the OPS_HOST env variable will remain set to `logging-es`.

That causes the fluentd start script to conclude that OPS is not deployed, so it installs filter-post-z-retag-one.conf instead of filter-post-z-retag-two.conf.

The consequence is that all logs (ops and non-ops) will go to the non-ops outputs, ignoring the ops ones.
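Roughly, the decision looks like the sketch below. This is a simplified illustration, not the actual run.sh logic: the CFG_DIR variable and the exact comparison are assumptions, though the filter file names match those above.

# Simplified sketch of how the start script picks the retag filter.
# With OPS_HOST stuck at the non-ops value, the first branch is taken,
# so the ops output (and its output-ops-extra-*.conf includes) is never wired in.
if [ "$ES_HOST" = "$OPS_HOST" ]; then
    cp $CFG_DIR/openshift/filter-post-z-retag-one.conf $CFG_DIR/dynamic/
else
    cp $CFG_DIR/openshift/filter-post-z-retag-two.conf $CFG_DIR/dynamic/
fi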

Comment 5 Anping Li 2017-12-18 09:16:48 UTC
Verified with openshift3/logging-fluentd images v3.6.173.0.83-2.
After adding the ops stack:
1) The fluentd environment variable OPS_HOST=logging-es-ops.
2) filter-post-z-retag-one.conf was replaced with filter-post-z-retag-two.conf.

The following configuration is added to route system-level logs to the ops Elasticsearch stack.

<match journal.** system.var.log** **_default_** **_openshift_** **_openshift-infra_**>
  @type rewrite_tag_filter
  @label @OUTPUT
  rewriterule1 message .+ output_ops_tag
  rewriterule2 message !.+ output_ops_tag
</match>

3) Kibana can view the project logs and the operations logs generated before the change.
   Kibana-ops can view the operations logs generated after the change.

Comment 6 Noriko Hosoi 2017-12-18 18:10:08 UTC
(In reply to Ruben Romero Montes from comment #2)
> The problem lays in the fact that once the daemonset/logging-fluent exists
> it is not updated or replaced (not even the env variables) as seen here:
>  
> https://github.com/openshift/openshift-ansible/blob/release-3.6/roles/
> openshift_logging_fluentd/tasks/main.yaml#L154-L186
> 
> Therefore if the aggregated logging is deployed without OPS cluster and
> later on with the OPS cluster (i.e. `openshift_logging_use_ops=true`) the
> OPS_HOST env variable will remain with value `logging-es`. 

Hi @Ruben,

I revisited your comment #c2 and am worried that the customer's case may not be addressed by PR #774.

The customer's system is configured with OPS, but both the application logs and the system logs are sent to the same Elasticsearch, logging-es.  Right?

Now I wonder: is the problem that 1) deploying with no ops, then 2) redeploying with ops via ansible with `openshift_logging_use_ops=true`, leaves the OPS_HOST value at `logging-es`?

The customer expects it to be updated to `logging-es-ops`, but that did not happen?
Thanks.

Comment 7 Noriko Hosoi 2017-12-19 01:41:12 UTC
(In reply to Anping Li from comment #5)
Thank you, Anping, for the verification.  I'd assume the behaviour is acceptable for the customer.

Comment 16 errata-xmlrpc 2018-01-23 17:58:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0113