Description of problem:

ELO: 4.5.0-202006161654 and CLO: 4.5.0-202006161654

On the latest ART builds the app-* and infra-* indices are never created. Only .security and .kibana exist:

# oc exec -n openshift-logging -c elasticsearch $POD -- curl --connect-timeout 2 -s -k --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://localhost:9200/_cat/indices?v
health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .security HAp4A55hQluOFGSDcZicQA   1   1          5            0     59.2kb         29.6kb
green  open   .kibana_1 ngZnLKi0TDif_KmKgpOPTA   1   1          0            0       460b           230b

Operations and app pod logging is occurring, and the fluentd logs look reasonably normal (initial connection failures while ES starts, followed by the "retry succeeded" message), but the indices are never created.

fluentd log excerpt (full logs in must-gather):

2020-06-17 21:07:26 +0000 [warn]: suppressed same stacktrace
2020-06-17 21:07:43 +0000 [warn]: [clo_default_output_es] failed to flush the buffer. retry_time=6 next_retry_seconds=2020-06-17 21:08:13 +0000 chunk="5a84e038f011085568ca17f6789532da" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.openshift-logging.svc.cluster.local\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): Connection refused - connect(2) for 172.30.233.113:9200 (Errno::ECONNREFUSED)"
2020-06-17 21:07:43 +0000 [warn]: suppressed same stacktrace
2020-06-17 21:07:49 +0000 [warn]: [clo_default_output_es] failed to flush the buffer. retry_time=7 next_retry_seconds=2020-06-17 21:08:57 +0000 chunk="5a84e038d9b8856f22109530caf228b2" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.openshift-logging.svc.cluster.local\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): Connection refused - connect(2) for 172.30.233.113:9200 (Errno::ECONNREFUSED)"
2020-06-17 21:07:49 +0000 [warn]: suppressed same stacktrace
2020-06-17 21:08:57 +0000 [warn]: [clo_default_output_es] retry succeeded. chunk_id="5a84e038d9b8856f22109530caf228b2"

Version-Release number of selected component (if applicable):
ELO: 4.5.0-202006161654 and CLO: 4.5.0-202006161654 on the latest 4.5 nightly

How reproducible:
Always so far

Steps to Reproduce:
1. Install the latest 4.5 nightly on an AWS cluster. Install ELO and CLO at the specified versions.
2. Create a clusterlogging instance with the yaml below:

spec:
  collection:
    logs:
      type: fluentd
  curation:
    curator:
      schedule: 30 3 * * *
    type: curator
  logStore:
    elasticsearch:
      nodeCount: 3
      redundancyPolicy: SingleRedundancy
      resources:
        requests:
          cpu: 28
          memory: 61Gi
      storage:
        size: 400G
        storageClassName: gp2
    retentionPolicy:
      logs.app:
        maxAge: 1d
    type: elasticsearch
  managementState: Managed
  visualization:
    kibana:
      replicas: 1
    type: kibana

3. Start app logging traffic.

Actual results:
The ES cluster and all fluentd pods start and become Ready, but only the .kibana and .security indices are created.

Additional info:
Will add link to must-gather
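As a quick sanity check, the `_cat/indices` output can be scanned for the expected index-name prefixes. A minimal sketch, with the sample output from this report inlined (on a live cluster the variable would be filled from the `oc exec ... curl` command shown above):

```shell
# Sketch: scan captured `_cat/indices` output for app-*/infra-* indices.
# The sample lines below are inlined from this report for illustration.
indices='green open .security HAp4A55hQluOFGSDcZicQA 1 1 5 0 59.2kb 29.6kb
green open .kibana_1 ngZnLKi0TDif_KmKgpOPTA 1 1 0 0 460b 230b'

# Column 3 of the cat API output is the index name; look for the
# app-* or infra-* prefixes that the collector is expected to create.
if echo "$indices" | awk '{print $3}' | grep -qE '^(app|infra)-'; then
  echo "app-*/infra-* indices present"
else
  echo "app-*/infra-* indices missing"
fi
```

With the sample data above this reports the indices as missing, matching the symptom described in this bug.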
Hi @Mike,

I didn't hit this issue. I noticed that your clusterlogging yaml file has:

retentionPolicy:
  logs.app:
    maxAge: 1d

I'm afraid that format is not correct; could you please try again with the format below?

retentionPolicy:
  application:
    maxAge: 1d

Also note that if you only specify a retentionPolicy for app logs, then only app logs will be received; details are in https://bugzilla.redhat.com/show_bug.cgi?id=1845788#c0 and https://bugzilla.redhat.com/show_bug.cgi?id=1845788#c4 .
The default yaml generated in the console turned out to be the issue (as hinted at by Qiaoling in comment 2). Changing this bz to reflect the root issue - let me know if you prefer a new bz. I think this is a mustfix for 4.5 GA.

After installing CLO and ELO, going to Installed Operators -> Cluster Logging -> YAML view presents the user with this ClusterLogging yaml:

apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  namespace: openshift-logging
  name: instance
  labels: {}
spec:
  collection:
    logs:
      type: fluentd
  curation:
    curator:
      schedule: 30 3 * * *
    type: curator
  logStore:
    elasticsearch:
      nodeCount: 3
      redundancyPolicy: SingleRedundancy
      storage:
        size: 200G
        storageClassName: gp2
    retentionPolicy:
      logs.app:
        maxAge: 7d
    type: elasticsearch
  managementState: Managed
  visualization:
    kibana:
      replicas: 1
    type: kibana

The retentionPolicy is incorrect (API change since 4.4? upgrade issue?) and results in a logging config where the infra-* and app-* indices never get created. See https://bugzilla.redhat.com/show_bug.cgi?id=1845788#c0 and https://bugzilla.redhat.com/show_bug.cgi?id=1845788#c4. Changing the retentionPolicy to the one below allows logging to work correctly:

retentionPolicy:
  application:
    maxAge: 1d
  infra:
    maxAge: 3h
  audit:
    maxAge: 2w
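For anyone checking whether an existing deployment was created from the bad console default, a crude grep over the saved manifest is enough to spot the stale key. A sketch, with a stale-style manifest inlined via heredoc for illustration (in practice the yaml could be fetched with `oc get clusterlogging instance -n openshift-logging -o yaml`):

```shell
# Sketch: detect the stale "logs.app"-style retentionPolicy keys in a
# ClusterLogging manifest. The file content is inlined for illustration;
# the path /tmp/clusterlogging.yaml is an arbitrary scratch location.
cat > /tmp/clusterlogging.yaml <<'EOF'
retentionPolicy:
  logs.app:
    maxAge: 7d
EOF

# The current API expects application/infra/audit, not logs.app etc.
if grep -qE '^[[:space:]]*logs\.(app|infra|audit):' /tmp/clusterlogging.yaml; then
  echo "stale retentionPolicy keys found; rename to application/infra/audit"
fi
```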
Tested in 4.6 with the spec created from the console:

# oc get clusterlogging instance -o json | jq '.spec'
{
  "collection": {
    "logs": {
      "type": "fluentd"
    }
  },
  "curation": {
    "curator": {
      "schedule": "30 3 * * *"
    },
    "type": "curator"
  },
  "logStore": {
    "elasticsearch": {
      "nodeCount": 3,
      "redundancyPolicy": "SingleRedundancy",
      "storage": {
        "size": "200G",
        "storageClassName": "gp2"
      }
    },
    "retentionPolicy": {
      "application": {
        "maxAge": "7d"
      }
    },
    "type": "elasticsearch"
  },
  "managementState": "Managed",
  "visualization": {
    "kibana": {
      "replicas": 1
    },
    "type": "kibana"
  }
}
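The same verification can be scripted without jq by checking the retrieved spec for the corrected key. A sketch, with the relevant JSON fragment inlined from the 4.6 output above (on a live cluster it would come from `oc get clusterlogging instance -o json`):

```shell
# Sketch: confirm a retrieved spec carries the corrected retentionPolicy
# key. The JSON fragment is inlined here for illustration only.
spec='{"retentionPolicy": {"application": {"maxAge": "7d"}}}'

# Pass if the new "application" key is present and the stale
# "logs.app" key is absent.
if echo "$spec" | grep -q '"application"' && ! echo "$spec" | grep -q 'logs\.app'; then
  echo "retentionPolicy uses the corrected key"
fi
```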
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196