Bug 1465718

Summary: Logging services exist but "oadm diagnostics AggregatedLogging" reports that the services are not found
Product: OpenShift Container Platform
Reporter: Junqi Zhao <juzhao>
Component: oc
Assignee: Luke Meyer <lmeyer>
Status: CLOSED CURRENTRELEASE
QA Contact: Junqi Zhao <juzhao>
Severity: medium
Priority: medium
Version: 3.6.0
CC: aos-bugs, jcantril, jokerman, lmeyer, mmccomas, pportant, pweil, rcarvalh, rmeggins
Keywords: Reopened
Hardware: Unspecified   
OS: Unspecified   
Doc Type: No Doc Update
Last Closed: 2017-07-10 13:07:43 UTC
Type: Bug
Attachments:
  logging diagnostics info (flags: none)
  logging diagnostics info (flags: none)
  issue reproduced (flags: none)

Description Junqi Zhao 2017-06-28 02:22:14 UTC
Description of problem:
After deploying logging, running "oadm diagnostics AggregatedLogging" reports that it cannot find the 'logging-es', 'logging-es-cluster', and 'logging-kibana' services, even though these services exist.
The "Did not find ServiceAccounts: logging-deployer" error is already reported in https://bugzilla.redhat.com/show_bug.cgi?id=1421623
# oadm diagnostics AggregatedLogging
[Note] Determining if client configuration exists for client/cluster diagnostics
Info:  Successfully read a client config file at '/root/.kube/config'
Info:  Using context for cluster-admin access: 'logging/host-8-175-206-host-centralci-eng-rdu2-redhat-com:8443/system:admin'

[Note] Running diagnostic: AggregatedLogging
       Description: Check aggregated logging integration for proper configuration
       
Info:  Found route 'logging-kibana' matching logging URL 'kibana.0627-z5z.qe.rhcloud.com' in project: 'logging'

ERROR: [AGL0515 from diagnostic AggregatedLogging@openshift/origin/pkg/diagnostics/cluster/aggregated_logging/diagnostic.go:97]
       Did not find ServiceAccounts: logging-deployer.  The logging infrastructure will not function 
       properly without them.  You may need to re-run the installer.
       
ERROR: [AGL0217 from diagnostic AggregatedLogging@openshift/origin/pkg/diagnostics/cluster/aggregated_logging/diagnostic.go:97]
       Expected to find 'logging-es' among the logging services for the project but did not.
       
ERROR: [AGL0217 from diagnostic AggregatedLogging@openshift/origin/pkg/diagnostics/cluster/aggregated_logging/diagnostic.go:97]
       Expected to find 'logging-es-cluster' among the logging services for the project but did not.
       
WARN:  [AGL0215 from diagnostic AggregatedLogging@openshift/origin/pkg/diagnostics/cluster/aggregated_logging/diagnostic.go:105]
       Expected to find 'logging-es-ops' among the logging services for the project but did not. This
       may not matter if you chose not to install a separate logging stack to support operations.
       
WARN:  [AGL0215 from diagnostic AggregatedLogging@openshift/origin/pkg/diagnostics/cluster/aggregated_logging/diagnostic.go:105]
       Expected to find 'logging-es-ops-cluster' among the logging services for the project but did not. This
       may not matter if you chose not to install a separate logging stack to support operations.
       
ERROR: [AGL0217 from diagnostic AggregatedLogging@openshift/origin/pkg/diagnostics/cluster/aggregated_logging/diagnostic.go:97]
       Expected to find 'logging-kibana' among the logging services for the project but did not.
       
WARN:  [AGL0215 from diagnostic AggregatedLogging@openshift/origin/pkg/diagnostics/cluster/aggregated_logging/diagnostic.go:105]
       Expected to find 'logging-kibana-ops' among the logging services for the project but did not. This
       may not matter if you chose not to install a separate logging stack to support operations.
       
[Note] Summary of diagnostics execution (version v3.6.122):
[Note] Warnings seen: 3
[Note] Errors seen: 4
# oc get svc
NAME                     CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
logging-es               172.30.201.14    <none>        9200/TCP   16h
logging-es-cluster       172.30.210.198   <none>        9300/TCP   16h
logging-es-ops           172.30.176.229   <none>        9200/TCP   39m
logging-es-ops-cluster   172.30.220.66    <none>        9300/TCP   39m
logging-kibana           172.30.214.18    <none>        443/TCP    16h
logging-kibana-ops       172.30.46.91     <none>        443/TCP    38m
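As a quick cross-check that the listing above really contains every service name the diagnostic complains about, here is a minimal Python sketch. It does not query a cluster: it parses the `oc get svc` text pasted from this report and compares its NAME column against the six names taken from the diagnostic's ERROR/WARN messages. The helper names `service_names` and `missing_services` are hypothetical.

```python
# Minimal sketch: parse a pasted `oc get svc` listing (not a live query) and
# check it for the service names the AggregatedLogging diagnostic reported.
OC_GET_SVC = """\
NAME                     CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
logging-es               172.30.201.14    <none>        9200/TCP   16h
logging-es-cluster       172.30.210.198   <none>        9300/TCP   16h
logging-es-ops           172.30.176.229   <none>        9200/TCP   39m
logging-es-ops-cluster   172.30.220.66    <none>        9300/TCP   39m
logging-kibana           172.30.214.18    <none>        443/TCP    16h
logging-kibana-ops       172.30.46.91     <none>        443/TCP    38m
"""

# Names copied from the diagnostic's ERROR/WARN messages in this report.
EXPECTED = [
    "logging-es", "logging-es-cluster", "logging-kibana",
    "logging-es-ops", "logging-es-ops-cluster", "logging-kibana-ops",
]


def service_names(listing: str) -> set[str]:
    """Return the NAME column of a plain-text `oc get svc` listing."""
    lines = listing.strip().splitlines()
    # Skip the header row; the first whitespace-separated field is the name.
    return {line.split()[0] for line in lines[1:] if line.strip()}


def missing_services(listing: str, expected: list[str]) -> list[str]:
    """Return the expected names that do not appear in the listing."""
    names = service_names(listing)
    return [name for name in expected if name not in names]


if __name__ == "__main__":
    print("missing:", missing_services(OC_GET_SVC, EXPECTED))
```

Run against the listing from this report, nothing is missing, which is consistent with the report's claim that the services exist despite the diagnostic's errors.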

Version-Release number of selected component (if applicable):
Images from brew registry
logging-elasticsearch      v3.6                19ad6f8e4738        29 minutes ago      404.2 MB
logging-auth-proxy         v3.6                d94bddb3dcba        8 hours ago         214.8 MB
logging-kibana             v3.6                4eabc3acd717        21 hours ago        342.4 MB
logging-fluentd            v3.6                08e8a59602fe        21 hours ago        232.5 MB
logging-curator            v3.6                a0148dd96b8d        2 weeks ago         221.5 MB

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging via ansible
2. Run "oadm diagnostics AggregatedLogging" command

Actual results:
"oadm diagnostics AggregatedLogging" reports that it cannot find the 'logging-es', 'logging-es-cluster', and 'logging-kibana' services, even though these services exist.

Expected results:
The logging system is reported as healthy and no issues are found.

Additional info:

Comment 1 Junqi Zhao 2017-06-28 03:26:18 UTC
# oc version
oc v3.6.122
kubernetes v1.6.1+5115d708d7
features: Basic-Auth GSSAPI Kerberos SPNEGO

Comment 2 Rodolfo Carvalho 2017-06-29 10:14:54 UTC
Jeff, is there a difference in the logging components for 3.6 that would require changes to `oadm diagnostics`?

Comment 3 Rodolfo Carvalho 2017-06-29 10:16:07 UTC
Luke, could you please have a look at this too?

Comment 4 Luke Meyer 2017-06-29 13:08:27 UTC
Probably should not check for the logging-deployer service account anymore, but the other ERRORs seem like things that should be there.

Comment 5 Luke Meyer 2017-06-29 18:43:47 UTC
If I add `-d` I can see why it's not finding the services. The services are no longer created with the expected label.

============
debug: Checking for services in project 'logging' with selector 'logging-infra=support'

ERROR: [AGL0217 from diagnostic AggregatedLogging@openshift/origin/pkg/diagnostics/cluster/aggregated_logging/diagnostic.go:96]
       Expected to find 'logging-es' among the logging services for the project but did not.
===========

In fact the created services no longer have any labels whatsoever.
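The mechanism in this comment (list services by label selector, then look for expected names among the matches) can be modeled in a few lines. This is a hypothetical Python sketch, not origin's Go implementation: it assumes equality-only selectors, reuses the selector 'logging-infra=support' from the debug output, and mirrors the severity split seen in the report (AGL0217 ERROR for the main stack, AGL0215 WARN for the optional ops stack). It also assumes the fix restores the `logging-infra=support` label on the services. The names `parse_selector`, `matches`, and `check` are illustrative only.

```python
# Hypothetical model of the diagnostic's label-based service lookup.
# Not origin's code: equality selectors only, severities copied from the report.
SELECTOR = "logging-infra=support"

MAIN = ["logging-es", "logging-es-cluster", "logging-kibana"]
OPS = ["logging-es-ops", "logging-es-ops-cluster", "logging-kibana-ops"]


def parse_selector(selector: str) -> dict[str, str]:
    """Parse a simple equality selector such as 'a=b,c=d' into a dict."""
    pairs = (term.split("=", 1) for term in selector.split(",") if term)
    return {key.strip(): value.strip() for key, value in pairs}


def matches(labels: dict[str, str], selector: dict[str, str]) -> bool:
    """True only if every key=value pair of the selector is in the labels."""
    return all(labels.get(k) == v for k, v in selector.items())


def check(services: dict[str, dict[str, str]], selector: str) -> list[tuple[str, str]]:
    """Report expected services that the selector does not return."""
    wanted = parse_selector(selector)
    # A service that exists but lacks the label is invisible to this lookup.
    found = {name for name, labels in services.items() if matches(labels, wanted)}
    findings = [("ERROR", name) for name in MAIN if name not in found]
    findings += [("WARN", name) for name in OPS if name not in found]
    return findings


# All six services exist but carry no labels, as observed in this comment.
unlabeled = {name: {} for name in MAIN + OPS}
# The same services carrying the label the selector expects.
labeled = {name: {"logging-infra": "support"} for name in MAIN + OPS}

print(check(unlabeled, SELECTOR))  # every expected service is reported missing
print(check(labeled, SELECTOR))    # []
```

The model shows why existence alone is not enough: the lookup never sees a service whose labels do not satisfy the selector, so unlabeled services are reported exactly as if they were absent.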

Comment 6 Luke Meyer 2017-06-29 20:01:14 UTC
The services not being found should be addressed by https://github.com/openshift/openshift-ansible/pull/4649

I suggest we leave https://bugzilla.redhat.com/show_bug.cgi?id=1421623 for the remaining items as those will be origin changes.

Comment 7 Rodolfo Carvalho 2017-06-30 08:36:47 UTC
Thanks, Luke. IIUC there is no released version where the labels are missing and this was a regression in 3.6, correct?

Comment 8 Luke Meyer 2017-06-30 11:20:01 UTC
Assuming the PR makes it into 3.6.0, yes.

Comment 9 Rodolfo Carvalho 2017-07-03 11:08:03 UTC
https://github.com/openshift/openshift-ansible/pull/4649 was merged; this is ready for testing.

Comment 10 Luke Meyer 2017-07-03 13:19:34 UTC
Not certain it's been built yet but probably by the time QE gets to this.

Comment 11 Junqi Zhao 2017-07-04 01:39:05 UTC
Tested. The ops cluster was not enabled, but warnings were still shown saying the diagnostic expected to find 'logging-es-ops', 'logging-es-ops-cluster', and 'logging-kibana-ops' among the logging services for the project but did not.

See the attached file. If we decide to ignore these warnings, I think we can close this defect.

Comment 12 Junqi Zhao 2017-07-04 01:40:08 UTC
Created attachment 1294019
logging diagnostics info

Comment 13 Junqi Zhao 2017-07-04 01:44:43 UTC
Created attachment 1294021
logging diagnostics info

Comment 14 Luke Meyer 2017-07-04 15:46:28 UTC
From your output it looks like this particular bug has been addressed. I'm using the other bug 1421623 and PR to downgrade the ops warnings to info and get rid of the error about the deployer service account.

Comment 15 Junqi Zhao 2017-07-07 09:02:15 UTC
It's reproduced; see the attached file.
# openshift version
openshift v3.6.136
kubernetes v1.6.1+5115d708d7
etcd 3.2.1

Comment 16 Junqi Zhao 2017-07-07 09:02:46 UTC
Created attachment 1295231
issue reproduced

Comment 17 Luke Meyer 2017-07-07 11:07:05 UTC
(In reply to Junqi Zhao from comment #15)
> It's reproduced

What was the version of openshift-ansible used to produce this?

The issue was that the services were created without the right metadata that the diagnostic is looking for. The latest openshift-ansible 3.6 should be creating these correctly.

Comment 18 Luke Meyer 2017-07-07 13:53:41 UTC
Not a blocker, so moving to 3.6.1. But I think it's likely fixed anyway unless the fix got reverted somehow.

Comment 19 Junqi Zhao 2017-07-10 04:22:49 UTC
(In reply to Luke Meyer from comment #17)
> (In reply to Junqi Zhao from comment #15)
> > It's reproduced
> 
> What was the version of openshift-ansible used to produce this?
> 
> The issue was that the services were created without the right metadata that
> the diagnostic is looking for. The latest openshift-ansible 3.6 should be
> creating these correctly.

Tested with the latest openshift-ansible version, 3.6.139-1; this issue no longer occurs.

# rpm -qa | grep openshift-ansible
openshift-ansible-3.6.139-1.git.0.4ff49c6.el7.noarch
openshift-ansible-roles-3.6.139-1.git.0.4ff49c6.el7.noarch
openshift-ansible-docs-3.6.139-1.git.0.4ff49c6.el7.noarch
openshift-ansible-lookup-plugins-3.6.139-1.git.0.4ff49c6.el7.noarch
openshift-ansible-callback-plugins-3.6.139-1.git.0.4ff49c6.el7.noarch
openshift-ansible-playbooks-3.6.139-1.git.0.4ff49c6.el7.noarch
openshift-ansible-filter-plugins-3.6.139-1.git.0.4ff49c6.el7.noarch