Bug 1421623

| Field | Value |
|---|---|
| Summary: | Diagnostics for a healthy logging system failed via ansible installation |
| Product: | OpenShift Container Platform |
| Component: | Logging |
| Version: | 3.5.0 |
| Status: | CLOSED ERRATA |
| Severity: | low |
| Priority: | low |
| Reporter: | Junqi Zhao <juzhao> |
| Assignee: | Luke Meyer <lmeyer> |
| QA Contact: | Junqi Zhao <juzhao> |
| CC: | aos-bugs, knakayam, lmeyer, nhosoi, orhan.biyiklioglu, rmeggins, smunilla, xtian |
| Target Milestone: | --- |
| Target Release: | --- |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Type: | Bug |
| Regression: | --- |
| Doc Type: | Bug Fix |
| Fixed In Version: | |
| Last Closed: | 2017-08-10 05:17:28 UTC |

Doc Text:

Cause: The AggregatedLogging diagnostic was not updated to reflect changes made to the logging deployment.
Consequence: The diagnostic incorrectly reported errors for an unnecessary ServiceAccount and (if present) for the mux deployment.
Fix: These errors are no longer reported. In addition, warnings about missing optional components were all downgraded to Info level.
Result: The diagnostic no longer needlessly alarms the user about these issues.
Description (Junqi Zhao, 2017-02-13 09:44:29 UTC)
Same issue with Logging 3.6.0

# oc version
oc v3.6.65
kubernetes v1.6.1+5115d708d7
features: Basic-Auth GSSAPI Kerberos SPNEGO

# docker images | grep logging
openshift3/logging-kibana          v3.6   dc571aa09d26   8 hours ago   342.4 MB
openshift3/logging-elasticsearch   v3.6   d2709cc1e16a   8 hours ago   404.5 MB
openshift3/logging-fluentd         v3.6   aafaf8787b29   8 hours ago   232.5 MB
openshift3/logging-auth-proxy      v3.6   11f731349ff9   2 days ago    229.6 MB
openshift3/logging-curator         v3.6   028e689a3276   6 days ago    211.1 MB

The problem is still observed in the latest origin/origin-aggregated-logging code.
# oadm diagnostics AggregatedLogging --diaglevel=0
debug: Checking ServiceAccounts in project 'logging'...
ERROR: [AGL0515 from diagnostic AggregatedLogging@openshift/origin/pkg/diagnostics/cluster/aggregated_logging/diagnostic.go:97]
Did not find ServiceAccounts: logging-deployer. The logging infrastructure will not function
properly without them. You may need to re-run the installer.
The problem is likely that the ServiceAccount name for the deployer is "deployer", not "logging-deployer".
# oc get serviceaccounts
NAME SECRETS AGE
aggregated-logging-curator 2 23h
aggregated-logging-elasticsearch 2 23h
aggregated-logging-fluentd 2 23h
aggregated-logging-kibana 2 23h
builder 2 23h
default 2 23h
deployer 2 23h
[origin]
diff --git a/pkg/diagnostics/cluster/aggregated_logging/serviceaccounts.go b/pkg/diagnostics/cluster/aggregated_logging/serviceaccounts.go
index 779ced8..a73e83d 100644
--- a/pkg/diagnostics/cluster/aggregated_logging/serviceaccounts.go
+++ b/pkg/diagnostics/cluster/aggregated_logging/serviceaccounts.go
@@ -8,7 +8,7 @@ import (
"k8s.io/apimachinery/pkg/util/sets"
)
-var serviceAccountNames = sets.NewString("logging-deployer", "aggregated-logging-kibana", "aggregated-logging-curator", "aggregated-logging-elasticsearch", fluentdServiceAccountName)
+var serviceAccountNames = sets.NewString("deployer", "aggregated-logging-kibana", "aggregated-logging-curator", "aggregated-logging-elasticsearch", fluentdServiceAccountName)
const serviceAccountsMissing = `
Did not find ServiceAccounts: %s. The logging infrastructure will not function
There is another error reported for logging-mux in the recent code.
# oadm diagnostics AggregatedLogging --diaglevel=0
[...]
debug: Checking for DeploymentConfigs in project 'logging' with selector 'logging-infra'
[...]
debug: Found DeploymentConfig 'logging-kibana-ops' for component 'kibana-ops'
debug: Found DeploymentConfig 'logging-mux' for component 'mux'
debug: Getting pods that match selector 'component in (curator,curator-ops,es,es-ops,kibana,kibana-ops),provider=openshift'
debug: Checking status of Pod 'logging-curator-1-dm8bf'...
[...]
ERROR: [AGL0095 from diagnostic AggregatedLogging@openshift/origin/pkg/diagnostics/cluster/aggregated_logging/diagnostic.go:97]
There were no Pods found for DeploymentConfig 'logging-mux'. Try running
the following commands for additional information:
$ oc describe dc logging-mux -n logging
$ oc get events -n logging
logging-mux is supposed to be generated from the fluentd daemonset(?)
But listing deploymentconfigs with --selector logging-infra does return logging-mux, as follows.
# oc get dc --selector logging-infra
NAME REVISION DESIRED CURRENT TRIGGERED BY
logging-curator 1 1 1 config
logging-curator-ops 1 1 1 config
logging-es-ok6sold4 1 1 1 config
logging-es-ops-7cmmv0rj 1 1 1 config
logging-kibana 1 1 1 config
logging-kibana-ops 1 1 1 config
logging-mux 1 1 1 config
Should this behaviour be fixed, or could we just add logging-mux to loggingComponents in origin/pkg/diagnostics/cluster/aggregated_logging/deploymentconfigs.go?
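For reference, a rough sketch of what that second option might look like. This is illustrative only, not the actual origin code: componentNameMux is a hypothetical constant, and the label values are inferred from the selector the diagnostic prints above. The rest of the thread discusses whether this is the right fix at all.

package loggingdiagsketch

import "k8s.io/apimachinery/pkg/util/sets"

// Hypothetical sketch only. The label values are inferred from the selector
// 'component in (curator,curator-ops,es,es-ops,kibana,kibana-ops)' shown in
// the diagnostic output; "mux" is the addition this question contemplates.
const (
	componentNameEs         = "es"
	componentNameEsOps      = "es-ops"
	componentNameKibana     = "kibana"
	componentNameKibanaOps  = "kibana-ops"
	componentNameCurator    = "curator"
	componentNameCuratorOps = "curator-ops"
	componentNameMux        = "mux" // hypothetical constant, not in deploymentconfigs.go today
)

// With the extra entry, the pod query would also cover the logging-mux dc.
var loggingComponents = sets.NewString(
	componentNameEs, componentNameEsOps,
	componentNameKibana, componentNameKibanaOps,
	componentNameCurator, componentNameCuratorOps,
	componentNameMux,
)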
In 3.5 and later there is no deployer any more. We should just get rid of all references to logging-deployer or logging-deployment or deployer.
> logging-mux is supposed to be generated from the fluentd daemonset(?)
Not exactly. setup-mux.sh will create a deploymentconfig (dc) based on the fluentd daemonset.
Is there a mux pod running? If so, then "There were no Pods found for DeploymentConfig 'logging-mux'." is correct.
I don't think we should add mux to the origin code.
(In reply to Rich Megginson from comment #4)
> In 3.5 and later there is no deployer any more. We should just get rid of
> all references to logging-deployer or logging-deployment or deployer.
>
> > logging-mux is supposed to be generated from the fluentd daemonset(?)
>
> Not exactly. setup-mux.sh will create a deploymentconfig (dc) based on the
> fluentd daemonset.

Ah, I see.

> Is there a mux pod running?

Yes, it is.

> If so, then "There were no Pods found for
> DeploymentConfig 'logging-mux'." is correct.
>
> I don't think we should add mux to the origin code.

Ok. Now, could there be any way to downgrade this "ERROR" to "INFO" or something less scary?

ERROR: [AGL0095 from diagnostic AggregatedLogging@openshift/origin/pkg/diagnostics/cluster/aggregated_logging/diagnostic.go:97]
       There were no Pods found for DeploymentConfig 'logging-mux'. Try running
       the following commands for additional information:
       $ oc describe dc logging-mux -n logging
       $ oc get events -n logging

Hmm - I would rather know why the code can find the mux dc, but cannot find the mux pod?

(In reply to Rich Megginson from comment #6)
> Hmm - I would rather know why the code can find the mux dc, but cannot find
> the mux pod?

Isn't it because the pods "that match selector 'component in (curator,curator-ops,es,es-ops,kibana,kibana-ops),provider=openshift' are retrieved? The mux pod is not in the list (the line 26 in deploymentconfigs.go below).

debug: Getting pods that match selector 'component in (curator,curator-ops,es,es-ops,kibana,kibana-ops),provider=openshift'
debug: Checking status of Pod 'logging-curator-1-dm8bf'...
debug: Checking status of Pod 'logging-curator-ops-1-bng6s'...
debug: Checking status of Pod 'logging-es-ok6sold4-1-xrx0l'...
debug: Checking status of Pod 'logging-es-ops-7cmmv0rj-1-hfbvv'...
debug: Checking status of Pod 'logging-kibana-1-358kz'...
debug: Checking status of Pod 'logging-kibana-ops-1-sxczs'...
ERROR: [AGL0095 from diagnostic AggregatedLogging@openshift/origin/pkg/diagnostics/cluster/aggregated_logging/diagnostic.go:97]
       There were no Pods found for DeploymentConfig 'logging-mux'.

"deploymentconfigs.go"
25 // loggingComponents are those 'managed' by rep controllers (e.g. fluentd is deployed with a DaemonSet)
26 var loggingComponents = sets.NewString(componentNameEs, componentNameEsOps, componentNameKibana, componentNameKibanaOps, componentNameCurator, componentNameCuratorOps)
27

> Isn't it because the pods "that match selector 'component in (curator,curator-ops,es,es-ops,kibana,kibana-ops),provider=openshift' are retrieved? The mux pod is not in the list (the line 26 in deploymentconfigs.go below).
OK. We need to add that to setup-mux.sh and the mux ansible code.
(In reply to Rich Megginson from comment #8)
> > Isn't it because the pods "that match selector 'component in (curator,curator-ops,es,es-ops,kibana,kibana-ops),provider=openshift' are retrieved? The mux pod is not in the list (the line 26 in deploymentconfigs.go below).
>
> OK. We need to add that to setup-mux.sh and the mux ansible code.

Do you have an idea how it could be done in setup-mux.sh / the mux ansible code? In the current diagnostic code in origin, it looks to me like the selector 'component in (curator,curator-ops,es,es-ops,kibana,kibana-ops),provider=openshift' is hardcoded in "deploymentconfigs.go"...

This is the oc get pods output when that selector is given. (Note: no mux)

$ oc get pods -l 'component in (curator,curator-ops,es,es-ops,kibana,kibana-ops),provider=openshift'
NAME                              READY     STATUS    RESTARTS   AGE
logging-curator-1-dm8bf           1/1       Running   0          2d
logging-curator-ops-1-bng6s       1/1       Running   0          2d
logging-es-ok6sold4-1-xrx0l       1/1       Running   0          2d
logging-es-ops-7cmmv0rj-1-hfbvv   1/1       Running   0          2d
logging-kibana-1-358kz            2/2       Running   7          2d
logging-kibana-ops-1-sxczs        2/2       Running   7          2d

The following selector returns what we want, but I'm not sure whether it is always correct, or whether this change could be made in setup-mux.sh, either...

$ oc get pods -l 'component!=fluentd,provider=openshift'
NAME                              READY     STATUS    RESTARTS   AGE
logging-curator-1-dm8bf           1/1       Running   0          2d
logging-curator-ops-1-bng6s       1/1       Running   0          2d
logging-es-ok6sold4-1-xrx0l       1/1       Running   0          2d
logging-es-ops-7cmmv0rj-1-hfbvv   1/1       Running   0          2d
logging-kibana-1-358kz            2/2       Running   7          2d
logging-kibana-ops-1-sxczs        2/2       Running   7          2d
logging-mux-1-fb981               1/1       Running   0          2d

Ok. This sounds like a bug in the go code. It shouldn't be looking for dcs using --selector logging-infra, then finding all of the pods in that list by using a different selector 'component in (curator,curator-ops,es,es-ops,kibana,kibana-ops),provider=openshift'. I don't know why it can't just do a query "give me all of the pods for dc $dc". It should not have a hard coded selector for the pod query.

Same error on logging 3.6.0:

Did not find ServiceAccounts: logging-deployer. The logging infrastructure will not function
properly without them. You may need to re-run the installer.

PR merged

Tested with the latest openshift-ansible version 3.6.139-1; the issue was not fixed, and the "Did not find ServiceAccounts: logging-deployer" error is still reported. See the attached file.

Created attachment 1295702 [details]
logging diagnostics info
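As an aside on the earlier selector discussion: the per-DC query Rich suggests ("give me all of the pods for dc $dc") could, in principle, reuse each DeploymentConfig's own spec.selector instead of the hardcoded component list. The sketch below is illustrative only and is not the actual origin diagnostic code; it assumes a client-go style clientset with the older, pre-context List signature, and the dcName/dcSelector values would come from the DeploymentConfigs already found via --selector logging-infra.

package loggingdiagsketch

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/kubernetes"
)

// checkPodsForDC sketches the "give me all of the pods for dc $dc" idea:
// the pod query reuses the DeploymentConfig's own spec.selector (passed in
// as dcSelector) instead of a hardcoded 'component in (...)' selector, so a
// dc like logging-mux would be covered without touching the diagnostic.
func checkPodsForDC(kc kubernetes.Interface, project, dcName string, dcSelector map[string]string) error {
	sel := labels.SelectorFromSet(labels.Set(dcSelector)).String()
	pods, err := kc.CoreV1().Pods(project).List(metav1.ListOptions{LabelSelector: sel})
	if err != nil {
		return err
	}
	if len(pods.Items) == 0 {
		// Roughly where an AGL0095-style message would be reported.
		return fmt.Errorf("there were no Pods found for DeploymentConfig '%s'", dcName)
	}
	for _, pod := range pods.Items {
		fmt.Printf("Checking status of Pod '%s'...\n", pod.Name)
	}
	return nil
}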
Output has:

[Note] Summary of diagnostics execution (version v3.6.126.14):

Seems like something bundled an older build of the openshift binaries. It's not clear to me that ose-ansible 3.6.139-1 ended up being an accepted build. It looks like it has the matching version of oc, but somehow you ended up running the 3.6.126.14 client. The fix for this bug isn't on that branch.

In any case 3.6.140-1 is available. Can you retest with that? Just make sure of the client version you end up running. You could even run a newer client against an older cluster if you have one handy. It's the client that runs the diagnostic, so that's the version that matters here.

Verified this issue with the same OCP version and openshift-ansible version; the issue was fixed. For the diagnostics info, see the attached file.

Testing env:

# openshift version
openshift v3.6.140
kubernetes v1.6.1+5115d708d7
etcd 3.2.1

# rpm -qa | grep openshift-ansible
openshift-ansible-callback-plugins-3.6.140-1.git.0.4a02427.el7.noarch
openshift-ansible-playbooks-3.6.140-1.git.0.4a02427.el7.noarch
openshift-ansible-3.6.140-1.git.0.4a02427.el7.noarch
openshift-ansible-lookup-plugins-3.6.140-1.git.0.4a02427.el7.noarch
openshift-ansible-roles-3.6.140-1.git.0.4a02427.el7.noarch
openshift-ansible-docs-3.6.140-1.git.0.4a02427.el7.noarch
openshift-ansible-filter-plugins-3.6.140-1.git.0.4a02427.el7.noarch

Created attachment 1296007 [details]
issue is fixed, logging diagnostics info
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716