Bug 1394716

Summary: [Logging diagnostics] Error shows "no Pods found for DeploymentConfig 'logging-curator-ops'" after deploying with the ops cluster enabled.
Product: OpenShift Container Platform
Reporter: Junqi Zhao <juzhao>
Component: Logging
Assignee: Jeff Cantrill <jcantril>
Status: CLOSED ERRATA
QA Contact: Junqi Zhao <juzhao>
Severity: medium
Priority: medium
Version: 3.4.0
CC: aos-bugs, jcantril, tdawson
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Doc Type: Bug Fix
Doc Text:
Cause: curator-ops was not in the list of DeploymentConfigs to investigate.
Consequence: the diagnostic tool did not properly evaluate curator for an ops logging cluster.
Fix: add curator-ops to the list of DeploymentConfigs to investigate.
Result: the diagnostic tool now properly evaluates curator for an ops logging cluster.
Last Closed: 2017-04-12 19:16:45 UTC
Type: Bug

Description Junqi Zhao 2016-11-14 10:02:24 UTC
Description of problem:
When running logging diagnostics on a healthy OpenShift environment with the ops cluster enabled, a spurious error about the absence of the curator-ops pod is reported. This issue does not reproduce when logging is deployed without the ops cluster.

Version-Release number of selected component (if applicable):
openshift v3.4.0.25+1f36858
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

logging images from ops registry:
ops.*/logging-deployer   3.4.0               08eaf2753130        2 days ago          764.3 MB

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging with "enable-ops-cluster=true" and make sure all the pods are running, especially the logging-curator-ops pod:
# oc get pods
NAME                              READY     STATUS      RESTARTS   AGE
logging-curator-1-vbu8a           1/1       Running     1          1h
logging-curator-ops-1-usnqm       1/1       Running     0          38m
logging-deployer-07p82            0/1       Completed   0          1h
logging-es-ops-hol5nqp6-1-qik5z   1/1       Running     0          1h
logging-es-vx1qswbu-1-ca7g9       1/1       Running     0          1h
logging-fluentd-sbijb             1/1       Running     0          1h
logging-kibana-1-ynrtn            2/2       Running     0          1h
logging-kibana-ops-1-apym8        2/2       Running     0          1h

# oc get dc
NAME                      REVISION   DESIRED   CURRENT   TRIGGERED BY
logging-curator           1          1         1         
logging-curator-ops       1          1         1         
logging-es-ops-hol5nqp6   1          1         1         
logging-es-vx1qswbu       1          1         1         
logging-kibana            1          1         1         
logging-kibana-ops        1          1         1  

2. Diagnose aggregated logging:
# oadm diagnostics AggregatedLogging
3. Check the diagnostic results.

Actual results:
Error shows:
There were no Pods found for DeploymentConfig 'logging-curator-ops'

But the logging-curator-ops pod is running and the logging-curator-ops dc exists.

# oadm diagnostics AggregatedLogging
[Note] Determining if client configuration exists for client/cluster diagnostics
Info:  Successfully read a client config file at '/root/.kube/config'
Info:  Using context for cluster-admin access: 'logging/ip-xx:8443/system:admin'

[Note] Running diagnostic: AggregatedLogging
       Description: Check aggregated logging integration for proper configuration
       
Info:  Found route 'logging-kibana' matching logging URL 'kibana.xx.com' in project: 'logging'

ERROR: [AGL0095 from diagnostic
AggregatedLogging@openshift/origin/pkg/diagnostics/cluster/aggregated_logging/diagnostic.go:96]
       There were no Pods found for DeploymentConfig 'logging-curator-ops'.  Try running
       the following commands for additional information:
       
         oc describe dc logging-curator-ops -n logging
         oc get events -n logging
       
WARN:  [AGL0425 from diagnostic
AggregatedLogging@openshift/origin/pkg/diagnostics/cluster/aggregated_logging/diagnostic.go:104]
       There are some nodes that match the selector for DaemonSet 'logging-fluentd'.  
       A list of matching nodes can be discovered by running:
       
         oc get nodes -l logging-infra-fluentd=true




Expected results:
Aggregated logging diagnostics should report that the logging system is running healthy and no issues were found.

Additional info:
This issue does not reproduce when logging is deployed without the ops cluster.

Comment 1 Jeff Cantrill 2016-12-01 19:30:58 UTC
Fixed in https://github.com/openshift/origin/pull/12099
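
For reference, a minimal sketch (in Go) of the kind of change described in the doc text above: the diagnostic keeps a fixed list of DeploymentConfig names to inspect, and "logging-curator-ops" was not among them, so ops-enabled deployments triggered the false AGL0095 error. The names and helper below are illustrative only; the real identifiers live under pkg/diagnostics/cluster/aggregated_logging in openshift/origin and may differ from this sketch.

package main

import (
	"fmt"
	"strings"
)

// Illustrative list of logging DeploymentConfigs the diagnostic inspects.
// The fix described in this bug amounts to including the curator "-ops"
// entry so that ops-enabled deployments are evaluated as well.
var loggingDeploymentConfigs = []string{
	"logging-kibana",
	"logging-kibana-ops",
	"logging-curator",
	"logging-curator-ops", // previously missing, causing the false AGL0095 error
}

// expectedDeploymentConfigs returns the DC names to check, skipping the
// "-ops" variants when the ops cluster is not deployed.
func expectedDeploymentConfigs(opsEnabled bool) []string {
	dcs := make([]string, 0, len(loggingDeploymentConfigs))
	for _, name := range loggingDeploymentConfigs {
		if !opsEnabled && strings.HasSuffix(name, "-ops") {
			continue
		}
		dcs = append(dcs, name)
	}
	return dcs
}

func main() {
	fmt.Println(expectedDeploymentConfigs(true))
}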

Comment 2 Junqi Zhao 2016-12-14 10:26:11 UTC
Tested on an Origin environment; this issue no longer occurs.
Please merge to OSE, then we can verify and close this issue.

Image Id:
openshift/origin-logging-curator    e2acbe1e04b6
openshift/origin-logging-fluentd    0e106c37e804
openshift/origin-logging-auth-proxy    c4bb5b5d17cf
openshift/origin-logging-deployer    45e11bcdbc0a
openshift/origin-logging-elasticsearch    125e6f97435c
openshift/origin-logging-kibana    614d0c989e42

Comment 3 Troy Dawson 2017-01-20 22:57:06 UTC
This has been merged into OCP and is in OCP v3.5.0.7 or newer.

Comment 4 Junqi Zhao 2017-01-22 06:47:53 UTC
Verified on openshift v3.5.0.7 with logging 3.5.0: set enable-ops-cluster=true and deployed logging 3.5.0, and the error "no Pods found for DeploymentConfig 'logging-curator-ops'" is no longer thrown.

oc get po
NAME                              READY     STATUS      RESTARTS   AGE
logging-curator-1-x42sd           1/1       Running     0          8m
logging-curator-ops-1-lmzc3       1/1       Running     0          8m
logging-deployer-x1vjj            0/1       Completed   0          9m
logging-es-ops-pr20rovo-1-lvfbk   1/1       Running     0          8m
logging-es-ywgs4cs7-1-40618       1/1       Running     0          8m
logging-fluentd-7bxg6             1/1       Running     0          8m
logging-kibana-1-j53p2            2/2       Running     0          8m
logging-kibana-ops-1-bgbk5        2/2       Running     0          8m

openshift version:
openshift v3.5.0.7+390ef18
kubernetes v1.5.2+43a9be4
etcd 3.1.0-rc.0

Image id:
openshift3/logging-deployer    1c7f8f5bb5cc
openshift3/logging-kibana    b5f8fe3fa247
openshift3/logging-auth-proxy    139f7943475e
openshift3/logging-fluentd    e0b004b486b4
openshift3/logging-elasticsearch    7015704dc0f8
openshift3/logging-curator    7f034fdf7702

Set it to VERIFIED and close it.

Comment 6 errata-xmlrpc 2017-04-12 19:16:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0884

Comment 7 Junqi Zhao 2017-05-12 07:15:48 UTC
@Jeff,

The same issue happens on OCP 3.4.1. Since this defect was fixed in OCP 3.5.0, I have one question: should we backport the fix to OCP 3.4.0 and OCP 3.4.1, even though the severity of this defect is not serious?


Thanks

Comment 8 Junqi Zhao 2017-05-12 07:48:12 UTC
(In reply to Junqi Zhao from comment #7)
> @Jeff,
> 
> The same issue happens on OCP 3.4.1. Since this defect was fixed in OCP 3.5.0,
> I have one question: should we backport the fix to OCP 3.4.0 and OCP 3.4.1,
> even though the severity of this defect is not serious?
> 
> 
> Thanks

Correcting my wording:

The same issue happens on Logging 3.4.1. Since this defect was fixed in Logging 3.5.0, I have one question: should we backport the fix to Logging 3.4.0 and Logging 3.4.1, even though the severity of this defect is not serious?

Comment 9 Jeff Cantrill 2017-10-02 12:24:41 UTC
Given the severity of the issue and the fact that we are moving into the 3.7 release, we will only backport if directed by PM.