Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1848186

Summary: Web console generates bad YAML for default clusterlogging CR - results in bad retentionPolicy configuration where infra and app indices never get created.
Product: OpenShift Container Platform
Reporter: Mike Fiedler <mifiedle>
Component: Logging
Assignee: Periklis Tsirakidis <periklis>
Status: CLOSED ERRATA
QA Contact: Anping Li <anli>
Severity: high
Priority: unspecified
Version: 4.5
CC: anli, aos-bugs, periklis, qitang
Target Milestone: ---
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Last Closed: 2020-10-27 16:08:03 UTC
Type: Bug
Cloned As: 1864284 (view as bug list)
Bug Blocks: 1864284

Description Mike Fiedler 2020-06-17 21:24:09 UTC
Description of problem:

ELO (elasticsearch-operator): 4.5.0-202006161654 and CLO (cluster-logging-operator): 4.5.0-202006161654

On the latest ART builds the app-* and infra-* indices are never created. Only the .security and .kibana indices exist:

# oc exec -n openshift-logging -c elasticsearch $POD -- curl --connect-timeout 2 -s -k --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://localhost:9200/_cat/indices?v                                                               
health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size                                                                                                                                                                                                                      
green  open   .security HAp4A55hQluOFGSDcZicQA   1   1          5            0     59.2kb         29.6kb                                                                                                                                                                                                                      
green  open   .kibana_1 ngZnLKi0TDif_KmKgpOPTA   1   1          0            0       460b           230b  

Operations and app pod logging are occurring, and the fluentd logs look reasonably normal (initial connection failures while ES starts, followed by the retry-succeeded message), but the indices are never created.

fluentd log excerpt (full logs in must-gather):

2020-06-17 21:07:26 +0000 [warn]: suppressed same stacktrace
2020-06-17 21:07:43 +0000 [warn]: [clo_default_output_es] failed to flush the buffer. retry_time=6 next_retry_seconds=2020-06-17 21:08:13 +0000 chunk="5a84e038f011085568ca17f6789532da" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.openshift-logging.svc.cluster.local\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): Connection refused - connect(2) for 172.30.233.113:9200 (Errno::ECONNREFUSED)"
2020-06-17 21:07:43 +0000 [warn]: suppressed same stacktrace
2020-06-17 21:07:49 +0000 [warn]: [clo_default_output_es] failed to flush the buffer. retry_time=7 next_retry_seconds=2020-06-17 21:08:57 +0000 chunk="5a84e038d9b8856f22109530caf228b2" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.openshift-logging.svc.cluster.local\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): Connection refused - connect(2) for 172.30.233.113:9200 (Errno::ECONNREFUSED)"
2020-06-17 21:07:49 +0000 [warn]: suppressed same stacktrace
2020-06-17 21:08:57 +0000 [warn]: [clo_default_output_es] retry succeeded. chunk_id="5a84e038d9b8856f22109530caf228b2"



Version-Release number of selected component (if applicable):
ELO: 4.5.0-202006161654  and CLO: 4.5.0-202006161654 on latest 4.5 nightly

How reproducible: Always so far


Steps to Reproduce:
1. Install the latest 4.5 nightly on an AWS cluster. Install ELO and CLO at the versions above.
2. Create clusterlogging with yaml below

spec:
  collection:
    logs:
      type: fluentd
  curation:
    curator:
      schedule: 30 3 * * *
    type: curator
  logStore:
    elasticsearch:
      nodeCount: 3
      redundancyPolicy: SingleRedundancy
      resources:
        requests:
          cpu: 28
          memory: 61Gi
      storage:
        size: 400G
        storageClassName: gp2
    retentionPolicy:
      logs.app:
        maxAge: 1d
    type: elasticsearch
  managementState: Managed
  visualization:
    kibana:
      replicas: 1
    type: kibana


3. Start app logging traffic
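
For step 3, any workload that writes to stdout serves as app logging traffic. A minimal, hypothetical generator (the `logtest` project name and `busybox` image are assumptions, not from the report):

```shell
# Create a throwaway project and a pod that emits one log line per second,
# giving fluentd application logs to forward to Elasticsearch.
oc new-project logtest
oc run log-writer --image=busybox --restart=Never -- \
  sh -c 'while true; do echo "app log $(date)"; sleep 1; done'
```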

Actual results:

ES cluster and all fluentd pods start and come Ready but only the .kibana and .security indices are created.



Additional info:

Will add link to must-gather

Comment 2 Qiaoling Tang 2020-06-18 00:56:30 UTC
Hi @Mike, I didn't hit this issue.

I noticed in your clusterlogging yaml file, it has:
    retentionPolicy:
      logs.app:
        maxAge: 1d

I'm afraid the format is not correct; could you please try again with the format below?
    retentionPolicy: 
      application:
        maxAge: 1d

Besides, if you only specify retentionPolicy for app logs, then only app logs will be received; details are in https://bugzilla.redhat.com/show_bug.cgi?id=1845788#c0 and https://bugzilla.redhat.com/show_bug.cgi?id=1845788#c4 .
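
For an already-created CR, one way to swap the bad key for the correct one in place is a JSON patch along these lines (a sketch; the CR name `instance` and the `1d` value follow the examples above, and the patch assumes `logs.app` is present exactly as shown):

```shell
# Remove the invalid "logs.app" key and add the correct "application" key
# under spec.logStore.retentionPolicy in a single atomic patch.
oc -n openshift-logging patch clusterlogging instance --type=json -p '[
  {"op": "remove", "path": "/spec/logStore/retentionPolicy/logs.app"},
  {"op": "add", "path": "/spec/logStore/retentionPolicy/application",
   "value": {"maxAge": "1d"}}
]'
```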

Comment 3 Mike Fiedler 2020-06-18 01:26:19 UTC
The default YAML generated in the console turned out to be the issue (as hinted at by Qiaoling in comment 2). Changing this bz to reflect the root issue - let me know if you prefer a new bz. I think this is a must-fix for 4.5 GA.

After installing CLO and ELO, going to Installed Operators -> Cluster Logging-> YAML view presents the user with this ClusterLogging yaml:

apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  namespace: openshift-logging
  name: instance
  labels: {}
spec:
  collection:
    logs:
      type: fluentd
  curation:
    curator:
      schedule: 30 3 * * *
    type: curator
  logStore:
    elasticsearch:
      nodeCount: 3
      redundancyPolicy: SingleRedundancy
      storage:
        size: 200G
        storageClassName: gp2
    retentionPolicy:
      logs.app:
        maxAge: 7d
    type: elasticsearch
  managementState: Managed
  visualization:
    kibana:
      replicas: 1
    type: kibana

The retentionPolicy is incorrect (API change since 4.4?  upgrade issue?) and results in a logging config where the infra-* and app-* indices never get created.   See  https://bugzilla.redhat.com/show_bug.cgi?id=1845788#c0 and https://bugzilla.redhat.com/show_bug.cgi?id=1845788#c4.

Changing the retentionPolicy to the one below allows logging to work correctly.


    retentionPolicy: 
      application:
        maxAge: 1d
      infra:
        maxAge: 3h
      audit:
        maxAge: 2w
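
To confirm the corrected retentionPolicy takes effect, the index listing from the description can be re-run and filtered for the previously missing indices (a sketch reusing the original command; `$POD` is any elasticsearch pod, as before):

```shell
# Re-run the _cat/indices check from the description; with the fixed
# retentionPolicy, app-* and infra-* indices should now be listed.
oc exec -n openshift-logging -c elasticsearch $POD -- \
  curl --connect-timeout 2 -s -k \
       --cert /etc/elasticsearch/secret/admin-cert \
       --key /etc/elasticsearch/secret/admin-key \
       'https://localhost:9200/_cat/indices?v' | grep -E '(app|infra)-'
```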

Comment 6 Anping Li 2020-06-24 02:15:05 UTC
Tested in 4.6 with the spec created from the console:
# oc get clusterlogging instance  -o json | jq '.spec'
{
  "collection": {
    "logs": {
      "type": "fluentd"
    }
  },
  "curation": {
    "curator": {
      "schedule": "30 3 * * *"
    },
    "type": "curator"
  },
  "logStore": {
    "elasticsearch": {
      "nodeCount": 3,
      "redundancyPolicy": "SingleRedundancy",
      "storage": {
        "size": "200G",
        "storageClassName": "gp2"
      }
    },
    "retentionPolicy": {
      "application": {
        "maxAge": "7d"
      }
    },
    "type": "elasticsearch"
  },
  "managementState": "Managed",
  "visualization": {
    "kibana": {
      "replicas": 1
    },
    "type": "kibana"
  }
}
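
The same check can be reduced to the one field that matters: listing the retentionPolicy keys (a sketch using the `oc`/`jq` pattern above; after the fix this should print `application`, never the invalid `logs.app`):

```shell
# Print just the retentionPolicy key names from the live CR.
oc get clusterlogging instance -o json \
  | jq -r '.spec.logStore.retentionPolicy | keys[]'
```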

Comment 8 errata-xmlrpc 2020-10-27 16:08:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196