Bug 1761930

Summary: Diagnostics fails when checking curator status
Product: OpenShift Container Platform
Component: Logging
Reporter: hgomes
Assignee: Jeff Cantrill <jcantril>
QA Contact: Anping Li <anli>
Docs Contact:
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Version: 3.11.0
Target Milestone: ---
Target Release: 3.11.z
Hardware: Unspecified
OS: Unspecified
CC: aos-bugs, apurty, arghosh, cvogel, dahernan, fmarting, jcantril, lmartinh, rmeggins
Whiteboard: groom
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-03-20 00:12:40 UTC
Type: ---
Attachments:
Diagnostics output (flags: none)

Description hgomes 2019-10-15 15:51:20 UTC
Created attachment 1626027 [details]
Diagnostics output

This bug was initially created as a copy of Bug #1676720

I am copying this bug because: 

Description of problem:

The Diagnostics fails when checking the status of the logging curator. This appears to be due to the fact that the curator is now controlled with a cronjob in 3.11 and as a result pods may not be running during the time of the health check.

How reproducible: Consistently

Steps to Reproduce:

On a 3.11.146 cluster, run:
# oc adm diagnostics AggregatedLogging

Actual results:

The Curator health check fails with an error message even though no Curator pod is expected to be running at that moment (the cronjob has not scheduled one).

Expected results:

The Curator health check should pass, taking the newer cronjob-based implementation into account.
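
To confirm the cronjob-based setup described above, the curator objects can be listed directly; the openshift-logging namespace and the component=curator label are assumptions based on a default 3.11 logging install:

# oc -n openshift-logging get cronjobs
# oc -n openshift-logging get jobs
# oc -n openshift-logging get pods -l component=curator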


---
[root@master01 ~]#  rpm -qa | grep openshift
atomic-openshift-node-3.11.146-1.git.0.4aab273.el7.x86_64
atomic-openshift-docker-excluder-3.11.146-1.git.0.4aab273.el7.noarch
atomic-openshift-hyperkube-3.11.146-1.git.0.4aab273.el7.x86_64
atomic-openshift-clients-3.11.146-1.git.0.4aab273.el7.x86_64
atomic-openshift-excluder-3.11.146-1.git.0.4aab273.el7.noarch
atomic-openshift-3.11.146-1.git.0.4aab273.el7.x86_64
[root@master01 ~]# rpm -qa | grep ansible
ansible-2.6.19-1.el7ae.noarch
[root@master01 ~]# 
---

Comment 1 Jeff Cantrill 2019-10-15 18:05:40 UTC
This is in fact not the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1676720, as indicated by the output of the diagnostics. Please provide a snapshot of the environment: https://github.com/openshift/origin-aggregated-logging/blob/release-3.11/hack/logging-dump.sh
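
For reference, one way to fetch and run that dump script is sketched below; the exact invocation is an assumption, so check the script's usage notes before running it against the cluster:

# curl -LO https://raw.githubusercontent.com/openshift/origin-aggregated-logging/release-3.11/hack/logging-dump.sh
# bash logging-dump.sh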

Comment 4 Jeff Cantrill 2020-01-31 16:08:54 UTC
*** Bug 1776778 has been marked as a duplicate of this bug. ***

Comment 5 Jeff Cantrill 2020-01-31 17:09:17 UTC
*** Bug 1736825 has been marked as a duplicate of this bug. ***

Comment 7 Jeff Cantrill 2020-02-11 13:46:06 UTC
*** Bug 1801613 has been marked as a duplicate of this bug. ***

Comment 11 Anping Li 2020-03-12 14:16:12 UTC
The diagnostics pass even though the last curator cronjob run failed.

# oc adm diagnostics AggregatedLogging
[Note] Determining if client configuration exists for client/cluster diagnostics
Info:  Successfully read a client config file at '/root/.kube/config'
Info:  Using context for cluster-admin access: 'openshift-logging/ip-172-18-13-190-ec2-internal:8443/system:admin'

[Note] Running diagnostic: AggregatedLogging
       Description: Check aggregated logging integration for proper configuration
       
Info:  Did not find a DeploymentConfig to support optional component 'mux'. If you require
       this component, please re-install or update logging and specify the appropriate
       variable to enable it.
       
Info:  Looked for 'logging-mux' among the logging services for the project but did not find it.
       This optional component may not have been specified by logging install options.
       
ERROR: [AGL0147 from diagnostic AggregatedLogging@openshift/origin/pkg/oc/cli/admin/diagnostics/diagnostics/cluster/aggregated_logging/diagnostic.go:138]
       OauthClient 'kibana-proxy' does not include a redirectURI for route 'logging-es' which is 'es.apps.0312-2ns.qe.rhcloud.com'
       
[Note] Summary of diagnostics execution (version v3.11.187):
[Note] Errors seen: 1


[root@ip-172-18-13-190 ~]# oc get pods
NAME                                          READY     STATUS      RESTARTS   AGE
logging-curator-1584021720-2mw58              0/1       Error       0          11m
logging-curator-ops-1584021600-nztbh          0/1       Completed   0          13m
logging-es-data-master-kdnkz1v2-4-s5v67       2/2       Running     0          4m
logging-es-ops-data-master-ixzlwxsq-2-qlgph   2/2       Running     0          11m
logging-fluentd-4t7ph                         1/1       Running     0          12m
logging-fluentd-6n9ss                         1/1       Running     0          12m
logging-fluentd-9pcnw                         1/1       Running     0          12m
logging-fluentd-msp7m                         1/1       Running     0          12m
logging-fluentd-tfxvw                         1/1       Running     0          12m
logging-kibana-1-kkvtr                        2/2       Running     0          2h
logging-kibana-ops-1-s4dm9                    2/2       Running     0          36m
rsyslogserver-6648c55975-vdrlt                1/1       Running     0          44m
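
As a follow-up on the output above, the failed curator run and the AGL0147 redirectURI error can be dug into with commands along these lines (assuming the openshift-logging project shown above):

# oc -n openshift-logging logs logging-curator-1584021720-2mw58
# oc -n openshift-logging get route logging-es
# oc get oauthclient kibana-proxy -o yaml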

Comment 12 Jeff Cantrill 2020-03-12 18:46:24 UTC
(In reply to Anping Li from comment #11)
> diagnostics pass although the last cronjob failed.
> 

Diagnostics really only checks that your cluster logging topology is correct, not necessarily that everything is functional, though it may do some rudimentary checks for certs. For Curator specifically, it only checks for the existence of the cronjobs.
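
The equivalent manual check, given that explanation, is simply confirming the cronjobs exist; the names below are inferred from the pod names in comment 11:

# oc -n openshift-logging get cronjob logging-curator logging-curator-ops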

Comment 13 Anping Li 2020-03-13 10:00:17 UTC
Verified in v3.11.188, per comment 12.

Comment 15 errata-xmlrpc 2020-03-20 00:12:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0793

Comment 16 hgomes 2020-03-22 11:39:07 UTC
I did not find anything similar, so I created KCS https://access.redhat.com/solutions/4919681 and linked this bug to it.

Comment 19 Red Hat Bugzilla 2024-01-06 04:26:53 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.