Created attachment 1626027 [details]
Diagnostics output

This bug was initially created as a copy of Bug #1676720

I am copying this bug because:

Description of problem:
Diagnostics fails when checking the status of the logging curator. This appears to be because curator is controlled by a cronjob in 3.11, so no curator pod may be running at the time of the health check.

How reproducible:
Consistently

Steps to Reproduce:
# oc adm diagnostics AggregatedLogging on a 3.11.146 cluster.

Actual results:
The curator health check fails with an error message even though no pod is currently scheduled to run.

Expected results:
The curator health check passes based on the newer cronjob implementation.

---

[root@master01 ~]# rpm -qa | grep openshift
atomic-openshift-node-3.11.146-1.git.0.4aab273.el7.x86_64
atomic-openshift-docker-excluder-3.11.146-1.git.0.4aab273.el7.noarch
atomic-openshift-hyperkube-3.11.146-1.git.0.4aab273.el7.x86_64
atomic-openshift-clients-3.11.146-1.git.0.4aab273.el7.x86_64
atomic-openshift-excluder-3.11.146-1.git.0.4aab273.el7.noarch
atomic-openshift-3.11.146-1.git.0.4aab273.el7.x86_64
[root@master01 ~]# rpm -qa | grep ansible
ansible-2.6.19-1.el7ae.noarch
[root@master01 ~]#
---
This is in fact not the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1676720, as indicated by the output of the diagnostics. Please provide a snapshot of the environment: https://github.com/openshift/origin-aggregated-logging/blob/release-3.11/hack/logging-dump.sh
*** Bug 1776778 has been marked as a duplicate of this bug. ***
*** Bug 1736825 has been marked as a duplicate of this bug. ***
*** Bug 1801613 has been marked as a duplicate of this bug. ***
Diagnostics pass although the last cronjob failed.

# oc adm diagnostics AggregatedLogging
[Note] Determining if client configuration exists for client/cluster diagnostics
Info:  Successfully read a client config file at '/root/.kube/config'
Info:  Using context for cluster-admin access: 'openshift-logging/ip-172-18-13-190-ec2-internal:8443/system:admin'
[Note] Running diagnostic: AggregatedLogging
       Description: Check aggregated logging integration for proper configuration

Info:  Did not find a DeploymentConfig to support optional component 'mux'. If you require this component, please re-install or update logging and specify the appropriate variable to enable it.
Info:  Looked for 'logging-mux' among the logging services for the project but did not find it. This optional component may not have been specified by logging install options.
ERROR: [AGL0147 from diagnostic AggregatedLogging@openshift/origin/pkg/oc/cli/admin/diagnostics/diagnostics/cluster/aggregated_logging/diagnostic.go:138]
       OauthClient 'kibana-proxy' does not include a redirectURI for route 'logging-es' which is 'es.apps.0312-2ns.qe.rhcloud.com'

[Note] Summary of diagnostics execution (version v3.11.187):
[Note] Errors seen: 1

[root@ip-172-18-13-190 ~]# oc get pods
NAME                                          READY     STATUS      RESTARTS   AGE
logging-curator-1584021720-2mw58              0/1       Error       0          11m
logging-curator-ops-1584021600-nztbh          0/1       Completed   0          13m
logging-es-data-master-kdnkz1v2-4-s5v67       2/2       Running     0          4m
logging-es-ops-data-master-ixzlwxsq-2-qlgph   2/2       Running     0          11m
logging-fluentd-4t7ph                         1/1       Running     0          12m
logging-fluentd-6n9ss                         1/1       Running     0          12m
logging-fluentd-9pcnw                         1/1       Running     0          12m
logging-fluentd-msp7m                         1/1       Running     0          12m
logging-fluentd-tfxvw                         1/1       Running     0          12m
logging-kibana-1-kkvtr                        2/2       Running     0          2h
logging-kibana-ops-1-s4dm9                    2/2       Running     0          36m
rsyslogserver-6648c55975-vdrlt                1/1       Running     0          44m
(In reply to Anping Li from comment #11)
> diagnostics pass although the last cronjob failed.

Diagnostics really only checks that the topology of your cluster logging is correct, not necessarily that everything is functional, though it may do some rudimentary checks for certs. For curator specifically, it only checks for the existence of the cronjobs.
Verified on v3.11.188 as per comment 12.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0793
I did not find anything similar, so I created KCS https://access.redhat.com/solutions/4919681 and linked this bug to it.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days