Bug 1761930 - Diagnostics fails when checking curator status
Summary: Diagnostics fails when checking curator status
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.11.z
Assignee: Jeff Cantrill
QA Contact: Anping Li
URL:
Whiteboard: groom
Duplicates: 1736825 1776778 1801613
Depends On:
Blocks:
Reported: 2019-10-15 15:51 UTC by hgomes
Modified: 2024-01-06 04:26 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-20 00:12:40 UTC
Target Upstream Version:
Embargoed:


Attachments
Diagnostics output (3.01 KB, text/plain)
2019-10-15 15:51 UTC, hgomes


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift origin pull 24462 | 0 | None | closed | Bug 1761930: Fix ClusterLogging curator diagnostic check | 2020-08-25 11:36:29 UTC
Red Hat Knowledge Base (Solution) | 4919681 | 0 | None | None | None | 2020-03-22 11:37:41 UTC
Red Hat Product Errata | RHBA-2020:0793 | 0 | None | None | None | 2020-03-20 00:12:54 UTC

Description hgomes 2019-10-15 15:51:20 UTC
Created attachment 1626027
Diagnostics output

This bug was initially created as a copy of Bug #1676720

I am copying this bug because: 

Description of problem:

Diagnostics fails when checking the status of the logging curator. This appears to be because the curator is run by a cronjob in 3.11, so a curator pod may not be running at the time of the health check.

How reproducible: Consistently

Steps to Reproduce:

Run the following on a 3.11.146 cluster:

# oc adm diagnostics AggregatedLogging

Actual results:

The curator health check fails with an error message even though no pod is currently scheduled to run.

Expected results:

The curator health check passes, taking the newer cronjob implementation into account.
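
To illustrate the difference between the two kinds of check, here is a minimal Python sketch. It is not the origin diagnostics code (which is written in Go); the `ClusterSnapshot` type, both function names, and the cronjob names `logging-curator` and `logging-curator-ops` are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterSnapshot:
    """Hypothetical stand-in for what a diagnostic would read from the API."""
    running_pods: List[str]
    cronjobs: List[str]


def pod_based_check(snapshot: ClusterSnapshot, component: str = "logging-curator") -> bool:
    # Assumes a long-lived curator pod: passes only if one is running right now.
    return any(name.startswith(component) for name in snapshot.running_pods)


def cronjob_based_check(snapshot: ClusterSnapshot, component: str = "logging-curator") -> bool:
    # Passes if the cronjob object exists, whether or not a pod is running.
    return component in snapshot.cronjobs


# State of a cluster between two scheduled curator runs (illustrative names).
between_runs = ClusterSnapshot(
    running_pods=["logging-fluentd-4t7ph", "logging-kibana-1-kkvtr"],
    cronjobs=["logging-curator", "logging-curator-ops"],
)

print(pod_based_check(between_runs))      # False
print(cronjob_based_check(between_runs))  # True
```

Between scheduled runs the pod-based check fails even though nothing is wrong, which matches the behavior reported here; the cronjob-based check does not depend on timing.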


---
[root@master01 ~]#  rpm -qa | grep openshift
atomic-openshift-node-3.11.146-1.git.0.4aab273.el7.x86_64
atomic-openshift-docker-excluder-3.11.146-1.git.0.4aab273.el7.noarch
atomic-openshift-hyperkube-3.11.146-1.git.0.4aab273.el7.x86_64
atomic-openshift-clients-3.11.146-1.git.0.4aab273.el7.x86_64
atomic-openshift-excluder-3.11.146-1.git.0.4aab273.el7.noarch
atomic-openshift-3.11.146-1.git.0.4aab273.el7.x86_64
[root@master01 ~]# rpm -qa | grep ansible
ansible-2.6.19-1.el7ae.noarch
[root@master01 ~]# 
---

Comment 1 Jeff Cantrill 2019-10-15 18:05:40 UTC
This is in fact not the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1676720, as indicated by the output of the diagnostics. Please provide a snapshot of the environment using: https://github.com/openshift/origin-aggregated-logging/blob/release-3.11/hack/logging-dump.sh

Comment 4 Jeff Cantrill 2020-01-31 16:08:54 UTC
*** Bug 1776778 has been marked as a duplicate of this bug. ***

Comment 5 Jeff Cantrill 2020-01-31 17:09:17 UTC
*** Bug 1736825 has been marked as a duplicate of this bug. ***

Comment 7 Jeff Cantrill 2020-02-11 13:46:06 UTC
*** Bug 1801613 has been marked as a duplicate of this bug. ***

Comment 11 Anping Li 2020-03-12 14:16:12 UTC
diagnostics pass although the last cronjob failed.

# oc adm diagnostics AggregatedLogging
[Note] Determining if client configuration exists for client/cluster diagnostics
Info:  Successfully read a client config file at '/root/.kube/config'
Info:  Using context for cluster-admin access: 'openshift-logging/ip-172-18-13-190-ec2-internal:8443/system:admin'

[Note] Running diagnostic: AggregatedLogging
       Description: Check aggregated logging integration for proper configuration
       
Info:  Did not find a DeploymentConfig to support optional component 'mux'. If you require
       this component, please re-install or update logging and specify the appropriate
       variable to enable it.
       
Info:  Looked for 'logging-mux' among the logging services for the project but did not find it.
       This optional component may not have been specified by logging install options.
       
ERROR: [AGL0147 from diagnostic AggregatedLogging@openshift/origin/pkg/oc/cli/admin/diagnostics/diagnostics/cluster/aggregated_logging/diagnostic.go:138]
       OauthClient 'kibana-proxy' does not include a redirectURI for route 'logging-es' which is 'es.apps.0312-2ns.qe.rhcloud.com'
       
[Note] Summary of diagnostics execution (version v3.11.187):
[Note] Errors seen: 1


[root@ip-172-18-13-190 ~]# oc get pods
NAME                                          READY     STATUS      RESTARTS   AGE
logging-curator-1584021720-2mw58              0/1       Error       0          11m
logging-curator-ops-1584021600-nztbh          0/1       Completed   0          13m
logging-es-data-master-kdnkz1v2-4-s5v67       2/2       Running     0          4m
logging-es-ops-data-master-ixzlwxsq-2-qlgph   2/2       Running     0          11m
logging-fluentd-4t7ph                         1/1       Running     0          12m
logging-fluentd-6n9ss                         1/1       Running     0          12m
logging-fluentd-9pcnw                         1/1       Running     0          12m
logging-fluentd-msp7m                         1/1       Running     0          12m
logging-fluentd-tfxvw                         1/1       Running     0          12m
logging-kibana-1-kkvtr                        2/2       Running     0          2h
logging-kibana-ops-1-s4dm9                    2/2       Running     0          36m
rsyslogserver-6648c55975-vdrlt                1/1       Running     0          44m

Comment 12 Jeff Cantrill 2020-03-12 18:46:24 UTC
(In reply to Anping Li from comment #11)
> diagnostics pass although the last cronjob failed.
> 

Diagnostics really only checks that the topology of your cluster logging is correct, not necessarily that everything is functional, though there may be some rudimentary checks for certs. For curator specifically, it only checks for the existence of the cronjobs.
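
Because the diagnostic does not cover whether curator runs actually succeed, that has to be checked separately. The Python sketch below is one hedged way to do it by scanning saved `oc get pods` output; the function names and the choice to treat only `Completed` and `Running` as healthy are assumptions, not part of any OpenShift tool.

```python
from typing import Dict, List

# Excerpt of the `oc get pods` listing from comment 11.
SAMPLE = """\
NAME                                          READY     STATUS      RESTARTS   AGE
logging-curator-1584021720-2mw58              0/1       Error       0          11m
logging-curator-ops-1584021600-nztbh          0/1       Completed   0          13m
logging-fluentd-4t7ph                         1/1       Running     0          12m
"""


def curator_pod_statuses(listing: str) -> Dict[str, str]:
    """Map each curator pod name to the value in its STATUS column."""
    statuses = {}
    for line in listing.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 3 and fields[0].startswith("logging-curator"):
            statuses[fields[0]] = fields[2]
    return statuses


def curator_pods_needing_attention(listing: str) -> List[str]:
    """Return curator pods whose status is neither Completed nor Running."""
    healthy = {"Completed", "Running"}  # assumption: everything else is suspect
    return [name for name, status in curator_pod_statuses(listing).items()
            if status not in healthy]


print(curator_pods_needing_attention(SAMPLE))
# ['logging-curator-1584021720-2mw58']
```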

Comment 13 Anping Li 2020-03-13 10:00:17 UTC
Verified on v3.11.188, per comment 12.

Comment 15 errata-xmlrpc 2020-03-20 00:12:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0793

Comment 16 hgomes 2020-03-22 11:39:07 UTC
I did not find anything similar, so I created a KCS article and linked this bug to it: https://access.redhat.com/solutions/4919681

Comment 19 Red Hat Bugzilla 2024-01-06 04:26:53 UTC
The needinfo request[s] on this closed bug have been removed because they have been unresolved for 120 days.

