Description of problem:
Deploy logging 3.5.0 stacks with the ansible scripts. After the EFK pods are all running, run the "oadm diagnostics AggregatedLogging" command; the error "Did not find ServiceAccounts: logging-deployer" is thrown.

PS: Although I did not set openshift_logging_use_ops=true, there are warnings for the ops logging services; I think these should not appear.

Version-Release number of selected component (if applicable):
# openshift version
openshift v3.5.0.18+9a5d1aa
kubernetes v1.5.2+43a9be4
etcd 3.1.0

Image IDs:
openshift3/logging-elasticsearch   d715f4d34ad4
openshift3/logging-kibana          e0ab09c2cbeb
openshift3/logging-fluentd         47057624ecab
openshift3/logging-auth-proxy      139f7943475e
openshift3/logging-curator         7f034fdf7702

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging 3.5.0 stacks with ansible. The inventory file:

[oo_first_master]
$master ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="~/libra.pem" openshift_public_hostname=$master

[oo_first_master:vars]
deployment_type=openshift-enterprise
openshift_release=v3.5.0
openshift_logging_install_logging=true
openshift_logging_kibana_hostname=kibana.$subdomain
openshift_logging_kibana_ops_hostname=kibana-ops.$subdomain
public_master_url=https://$master:8443
openshift_logging_fluentd_hosts=$node
openshift_logging_image_prefix=$registry/openshift3/
openshift_logging_image_version=3.5.0
openshift_logging_namespace=logging
openshift_logging_fluentd_use_journal=false

2. Wait until the EFK pods are all running and ES is in green status.
3. oadm diagnostics AggregatedLogging

Actual results:
# oadm diagnostics AggregatedLogging
[Note] Determining if client configuration exists for client/cluster diagnostics
Info:  Successfully read a client config file at '/root/.kube/config'
Info:  Using context for cluster-admin access: 'juzhao/ip-172-18-9-26-ec2-internal:8443/system:admin'

[Note] Running diagnostic: AggregatedLogging
       Description: Check aggregated logging integration for proper configuration

Info:  Found route 'logging-kibana' matching logging URL 'kibana.0213-1ht.qe.rhcloud.com' in project: 'juzhao'

WARN:  [AGL0030 from diagnostic AggregatedLogging@openshift/origin/pkg/diagnostics/cluster/aggregated_logging/diagnostic.go:207]
       The project 'juzhao' was found with either a missing or non-empty node selector annotation.
       This could keep Fluentd from running on certain nodes and collecting logs from the entire cluster.
       You can correct it by editing the project:
         $ oc edit namespace juzhao
       and updating the annotation:
         'openshift.io/node-selector' : ""

ERROR: [AGL0515 from diagnostic AggregatedLogging@openshift/origin/pkg/diagnostics/cluster/aggregated_logging/diagnostic.go:96]
       Did not find ServiceAccounts: logging-deployer. The logging infrastructure will not function
       properly without them. You may need to re-run the installer.

Info:  Did not find a DeploymentConfig to support component 'curator-ops'. If you require a separate ElasticSearch
       cluster to aggregate operations logs, please re-install or update logging and specify the appropriate switch
       to enable the ops cluster.

Info:  Did not find a DeploymentConfig to support component 'es-ops'. If you require a separate ElasticSearch
       cluster to aggregate operations logs, please re-install or update logging and specify the appropriate switch
       to enable the ops cluster.

Info:  Did not find a DeploymentConfig to support component 'kibana-ops'. If you require a separate ElasticSearch
       cluster to aggregate operations logs, please re-install or update logging and specify the appropriate switch
       to enable the ops cluster.

WARN:  [AGL0425 from diagnostic AggregatedLogging@openshift/origin/pkg/diagnostics/cluster/aggregated_logging/diagnostic.go:104]
       There are some nodes that match the selector for DaemonSet 'logging-fluentd'.
       A list of matching nodes can be discovered by running:
         $ oc get nodes -l logging-infra-fluentd=true

WARN:  [AGL0215 from diagnostic AggregatedLogging@openshift/origin/pkg/diagnostics/cluster/aggregated_logging/diagnostic.go:104]
       Expected to find 'logging-es-ops' among the logging services for the project but did not.
       This may not matter if you chose not to install a separate logging stack to support operations.

WARN:  [AGL0215 from diagnostic AggregatedLogging@openshift/origin/pkg/diagnostics/cluster/aggregated_logging/diagnostic.go:104]
       Expected to find 'logging-es-ops-cluster' among the logging services for the project but did not.
       This may not matter if you chose not to install a separate logging stack to support operations.

WARN:  [AGL0215 from diagnostic AggregatedLogging@openshift/origin/pkg/diagnostics/cluster/aggregated_logging/diagnostic.go:104]
       Expected to find 'logging-kibana-ops' among the logging services for the project but did not.
       This may not matter if you chose not to install a separate logging stack to support operations.

[Note] Summary of diagnostics execution (version v3.5.0.19+199197c):
[Note] Warnings seen: 5
[Note] Errors seen: 1

Expected results:
Logging system running healthy and no issues found.

Additional info:
Same issue with Logging 3.6.0

# oc version
oc v3.6.65
kubernetes v1.6.1+5115d708d7
features: Basic-Auth GSSAPI Kerberos SPNEGO

# docker images | grep logging
openshift3/logging-kibana          v3.6   dc571aa09d26   8 hours ago   342.4 MB
openshift3/logging-elasticsearch   v3.6   d2709cc1e16a   8 hours ago   404.5 MB
openshift3/logging-fluentd         v3.6   aafaf8787b29   8 hours ago   232.5 MB
openshift3/logging-auth-proxy      v3.6   11f731349ff9   2 days ago    229.6 MB
openshift3/logging-curator         v3.6   028e689a3276   6 days ago    211.1 MB
The problem is still observed in the latest origin/origin-aggregated-logging code.

# oadm diagnostics AggregatedLogging --diaglevel=0
debug: Checking ServiceAccounts in project 'logging'...
ERROR: [AGL0515 from diagnostic AggregatedLogging@openshift/origin/pkg/diagnostics/cluster/aggregated_logging/diagnostic.go:97]
       Did not find ServiceAccounts: logging-deployer. The logging infrastructure will not function
       properly without them. You may need to re-run the installer.

The problem is likely that the deployer ServiceAccount is named "deployer", not "logging-deployer".

# oc get serviceaccounts
NAME                               SECRETS   AGE
aggregated-logging-curator         2         23h
aggregated-logging-elasticsearch   2         23h
aggregated-logging-fluentd         2         23h
aggregated-logging-kibana          2         23h
builder                            2         23h
default                            2         23h
deployer                           2         23h

[origin]
diff --git a/pkg/diagnostics/cluster/aggregated_logging/serviceaccounts.go b/pkg/diagnostics/cluster/aggregated_logging/serviceaccounts.go
index 779ced8..a73e83d 100644
--- a/pkg/diagnostics/cluster/aggregated_logging/serviceaccounts.go
+++ b/pkg/diagnostics/cluster/aggregated_logging/serviceaccounts.go
@@ -8,7 +8,7 @@ import (
 	"k8s.io/apimachinery/pkg/util/sets"
 )
 
-var serviceAccountNames = sets.NewString("logging-deployer", "aggregated-logging-kibana", "aggregated-logging-curator", "aggregated-logging-elasticsearch", fluentdServiceAccountName)
+var serviceAccountNames = sets.NewString("deployer", "aggregated-logging-kibana", "aggregated-logging-curator", "aggregated-logging-elasticsearch", fluentdServiceAccountName)
 
 const serviceAccountsMissing = `
 Did not find ServiceAccounts: %s. The logging infrastructure will not function
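For context, here is a minimal sketch of the kind of check behind AGL0515: list the ServiceAccounts in the logging namespace and report any expected names that are missing. This is not the actual origin implementation; the helper name, the literal account names (the fluentd one stands in for the fluentdServiceAccountName constant), and the client-go List signature (which varies by client-go version) are assumptions for illustration.

// serviceaccount_check.go - illustrative sketch only.
package aggregatedlogging

import (
    "context"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/util/sets"
    "k8s.io/client-go/kubernetes"
)

// expectedServiceAccounts mirrors the patched set proposed in the diff above
// ("deployer" instead of "logging-deployer").
var expectedServiceAccounts = sets.NewString(
    "deployer", // the renamed entry from the diff above (was "logging-deployer")
    "aggregated-logging-kibana",
    "aggregated-logging-curator",
    "aggregated-logging-elasticsearch",
    "aggregated-logging-fluentd", // stands in for the fluentdServiceAccountName constant
)

// missingServiceAccounts returns every expected account that is absent from the
// namespace; a non-empty result is what produces an AGL0515-style error.
func missingServiceAccounts(client kubernetes.Interface, namespace string) ([]string, error) {
    saList, err := client.CoreV1().ServiceAccounts(namespace).List(context.TODO(), metav1.ListOptions{})
    if err != nil {
        return nil, fmt.Errorf("listing ServiceAccounts in %q: %v", namespace, err)
    }
    found := sets.NewString()
    for _, sa := range saList.Items {
        found.Insert(sa.Name)
    }
    return expectedServiceAccounts.Difference(found).List(), nil
}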
There is another error reported for logging-mux in the recent code.

# oadm diagnostics AggregatedLogging --diaglevel=0
[...]
debug: Checking for DeploymentConfigs in project 'logging' with selector 'logging-infra'
[...]
debug: Found DeploymentConfig 'logging-kibana-ops' for component 'kibana-ops'
debug: Found DeploymentConfig 'logging-mux' for component 'mux'
debug: Getting pods that match selector 'component in (curator,curator-ops,es,es-ops,kibana,kibana-ops),provider=openshift'
debug: Checking status of Pod 'logging-curator-1-dm8bf'...
[...]
ERROR: [AGL0095 from diagnostic AggregatedLogging@openshift/origin/pkg/diagnostics/cluster/aggregated_logging/diagnostic.go:97]
       There were no Pods found for DeploymentConfig 'logging-mux'. Try running the following commands for additional information:
         $ oc describe dc logging-mux -n logging
         $ oc get events -n logging

logging-mux is supposed to be generated from the fluentd daemonset(?)
But "oc get dc --selector logging-infra" returns logging-mux as follows:

# oc get dc --selector logging-infra
NAME                      REVISION   DESIRED   CURRENT   TRIGGERED BY
logging-curator           1          1         1         config
logging-curator-ops       1          1         1         config
logging-es-ok6sold4       1          1         1         config
logging-es-ops-7cmmv0rj   1          1         1         config
logging-kibana            1          1         1         config
logging-kibana-ops        1          1         1         config
logging-mux               1          1         1         config

Should this behaviour be fixed, or could we just add logging-mux to the loggingComponents set in origin/pkg/diagnostics/cluster/aggregated_logging/deploymentconfigs.go?
In 3.5 and later there is no deployer any more. We should just get rid of all references to logging-deployer or logging-deployment or deployer.

> logging-mux is supposed to be generated from the fluentd daemonset(?)

Not exactly. setup-mux.sh will create a deploymentconfig (dc) based on the fluentd daemonset.

Is there a mux pod running? If so, then "There were no Pods found for DeploymentConfig 'logging-mux'." is correct.

I don't think we should add mux to the origin code.
(In reply to Rich Megginson from comment #4)
> In 3.5 and later there is no deployer any more. We should just get rid of
> all references to logging-deployer or logging-deployment or deployer.
>
> > logging-mux is supposed to be generated from the fluentd daemonset(?)
>
> Not exactly. setup-mux.sh will create a deploymentconfig (dc) based on the
> fluentd daemonset.

Ah, I see.

> Is there a mux pod running?

Yes, it is.

> If so, then "There were no Pods found for
> DeploymentConfig 'logging-mux'." is correct.
>
> I don't think we should add mux to the origin code.

Ok. Now, could there be any way to downgrade this "ERROR" to "INFO" or something less scary?

ERROR: [AGL0095 from diagnostic AggregatedLogging@openshift/origin/pkg/diagnostics/cluster/aggregated_logging/diagnostic.go:97]
       There were no Pods found for DeploymentConfig 'logging-mux'. Try running the following commands for additional information:
         $ oc describe dc logging-mux -n logging
         $ oc get events -n logging
Hmm - I would rather know why the code can find the mux dc, but cannot find the mux pod?
(In reply to Rich Megginson from comment #6)
> Hmm - I would rather know why the code can find the mux dc, but cannot find
> the mux pod?

Isn't it because only the pods that match the selector 'component in (curator,curator-ops,es,es-ops,kibana,kibana-ops),provider=openshift' are retrieved? The mux pod is not in that list (see line 26 of deploymentconfigs.go below).

debug: Getting pods that match selector 'component in (curator,curator-ops,es,es-ops,kibana,kibana-ops),provider=openshift'
debug: Checking status of Pod 'logging-curator-1-dm8bf'...
debug: Checking status of Pod 'logging-curator-ops-1-bng6s'...
debug: Checking status of Pod 'logging-es-ok6sold4-1-xrx0l'...
debug: Checking status of Pod 'logging-es-ops-7cmmv0rj-1-hfbvv'...
debug: Checking status of Pod 'logging-kibana-1-358kz'...
debug: Checking status of Pod 'logging-kibana-ops-1-sxczs'...
ERROR: [AGL0095 from diagnostic AggregatedLogging@openshift/origin/pkg/diagnostics/cluster/aggregated_logging/diagnostic.go:97]
       There were no Pods found for DeploymentConfig 'logging-mux'.

"deploymentconfigs.go"
 25 // loggingComponents are those 'managed' by rep controllers (e.g. fluentd is deployed with a DaemonSet)
 26 var loggingComponents = sets.NewString(componentNameEs, componentNameEsOps, componentNameKibana, componentNameKibanaOps, componentNameCurator, componentNameCuratorOps)
 27
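To make the mismatch concrete, here is a rough sketch of how a 'component in (...)' pod selector derived from that hard-coded set behaves. This is not the actual origin code; the helper name and the literal component strings (spelled out in place of the componentName* constants) are assumptions for illustration.

package aggregatedlogging

import (
    "k8s.io/apimachinery/pkg/labels"
    "k8s.io/apimachinery/pkg/selection"
    "k8s.io/apimachinery/pkg/util/sets"
)

// The hard-coded component set from deploymentconfigs.go line 26, written as
// string literals here for illustration.
var loggingComponents = sets.NewString("es", "es-ops", "kibana", "kibana-ops", "curator", "curator-ops")

// componentPodSelector builds the same kind of selector the debug output shows:
// 'component in (curator,curator-ops,es,es-ops,kibana,kibana-ops),provider=openshift'.
// A pod labeled component=mux can never satisfy the "in" requirement, which is
// why the mux pod is absent from the pod list even though its dc was found.
func componentPodSelector() (labels.Selector, error) {
    inComponents, err := labels.NewRequirement("component", selection.In, loggingComponents.List())
    if err != nil {
        return nil, err
    }
    provider, err := labels.NewRequirement("provider", selection.Equals, []string{"openshift"})
    if err != nil {
        return nil, err
    }
    return labels.NewSelector().Add(*inComponents, *provider), nil
}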
> Isn't it because only the pods that match the selector 'component in (curator,curator-ops,es,es-ops,kibana,kibana-ops),provider=openshift' are retrieved? The mux pod is not in that list (see line 26 of deploymentconfigs.go below).

OK. We need to add that to setup-mux.sh and the mux ansible code.
(In reply to Rich Megginson from comment #8)
> > Isn't it because only the pods that match the selector 'component in (curator,curator-ops,es,es-ops,kibana,kibana-ops),provider=openshift' are retrieved? The mux pod is not in that list (see line 26 of deploymentconfigs.go below).
>
> OK. We need to add that to setup-mux.sh and the mux ansible code.

Do you have an idea how it could be done in setup-mux.sh or the mux ansible code? In the current diagnostic code in origin, it looks to me like the selector 'component in (curator,curator-ops,es,es-ops,kibana,kibana-ops),provider=openshift' is hardcoded in deploymentconfigs.go...

This is the "oc get pods" output when that selector is given (note: no mux):

$ oc get pods -l 'component in (curator,curator-ops,es,es-ops,kibana,kibana-ops),provider=openshift'
NAME                              READY     STATUS    RESTARTS   AGE
logging-curator-1-dm8bf           1/1       Running   0          2d
logging-curator-ops-1-bng6s       1/1       Running   0          2d
logging-es-ok6sold4-1-xrx0l       1/1       Running   0          2d
logging-es-ops-7cmmv0rj-1-hfbvv   1/1       Running   0          2d
logging-kibana-1-358kz            2/2       Running   7          2d
logging-kibana-ops-1-sxczs        2/2       Running   7          2d

The following selector returns what we want, but I'm not sure whether it is always correct, or whether this change could even be made in setup-mux.sh:

$ oc get pods -l 'component!=fluentd,provider=openshift'
NAME                              READY     STATUS    RESTARTS   AGE
logging-curator-1-dm8bf           1/1       Running   0          2d
logging-curator-ops-1-bng6s       1/1       Running   0          2d
logging-es-ok6sold4-1-xrx0l       1/1       Running   0          2d
logging-es-ops-7cmmv0rj-1-hfbvv   1/1       Running   0          2d
logging-kibana-1-358kz            2/2       Running   7          2d
logging-kibana-ops-1-sxczs        2/2       Running   7          2d
logging-mux-1-fb981               1/1       Running   0          2d
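A small illustration of how the two selectors above treat the mux pod. The function name is hypothetical and the mux pod labels (component=mux, provider=openshift) are an assumption based on the debug output; the selector strings themselves are copied from the commands above.

package aggregatedlogging

import (
    "fmt"

    "k8s.io/apimachinery/pkg/labels"
)

// compareSelectors shows why the hard-coded selector misses the mux pod while
// the 'component!=fluentd,provider=openshift' variant picks it up.
func compareSelectors() error {
    muxPodLabels := labels.Set{"component": "mux", "provider": "openshift"} // assumed labels

    current, err := labels.Parse("component in (curator,curator-ops,es,es-ops,kibana,kibana-ops),provider=openshift")
    if err != nil {
        return err
    }
    proposed, err := labels.Parse("component!=fluentd,provider=openshift")
    if err != nil {
        return err
    }

    fmt.Println(current.Matches(muxPodLabels))  // false - mux is excluded by the hard-coded list
    fmt.Println(proposed.Matches(muxPodLabels)) // true  - mux is included
    return nil
}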
Ok. This sounds like a bug in the Go code. It shouldn't be looking for dcs using --selector logging-infra and then finding all of the pods in that list by using a different selector 'component in (curator,curator-ops,es,es-ops,kibana,kibana-ops),provider=openshift'. I don't know why it can't just do a query "give me all of the pods for dc $dc". It should not have a hard-coded selector for the pod query.
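For illustration, a minimal sketch of that kind of per-dc query: reuse the dc's own spec.selector (a plain label map on DeploymentConfigs) instead of a fixed component list. The helper name, the idea of passing the selector map directly, and the client-go usage are assumptions, not the actual origin fix.

package aggregatedlogging

import (
    "context"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/labels"
    "k8s.io/client-go/kubernetes"
)

// podsForDeploymentConfig lists the pods matching a DeploymentConfig's own
// spec.selector, e.g. the selector of 'logging-mux', so no component has to be
// enumerated in the diagnostic code.
func podsForDeploymentConfig(client kubernetes.Interface, namespace string, dcSelector map[string]string) ([]corev1.Pod, error) {
    selector := labels.SelectorFromSet(labels.Set(dcSelector)).String()
    pods, err := client.CoreV1().Pods(namespace).List(context.TODO(), metav1.ListOptions{LabelSelector: selector})
    if err != nil {
        return nil, err
    }
    return pods.Items, nil
}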
Same error on logging 3.6.0:

Did not find ServiceAccounts: logging-deployer. The logging infrastructure will not function
properly without them. You may need to re-run the installer.
https://github.com/openshift/origin/pull/14991
PR merged
Tested with the latest openshift-ansible version 3.6.139-1; the issue was not fixed. The "Did not find ServiceAccounts: logging-deployer" error is still thrown; see the attached file.
Created attachment 1295702 [details] logging diagnostics info
Output has:

[Note] Summary of diagnostics execution (version v3.6.126.14):

Seems like something bundled an older build of the openshift binaries.
It's not clear to me that ose-ansible 3.6.139-1 ended up being an accepted build. It looks like it has the matching version of oc but somehow you ended up running the 3.6.126.14 client. The fix for this bug isn't on that branch. In any case 3.6.140-1 is available. Can you retest with that? Just make sure of the client version you end up running. You could even run a newer client against an older cluster if you have one handy. It's the client that runs the diagnostic so that's the version that matters here.
Verified this issue with the same OCP version and openshift-ansible version; the issue was fixed. Diagnostics info is in the attached file.

Testing env:
# openshift version
openshift v3.6.140
kubernetes v1.6.1+5115d708d7
etcd 3.2.1

# rpm -qa | grep openshift-ansible
openshift-ansible-callback-plugins-3.6.140-1.git.0.4a02427.el7.noarch
openshift-ansible-playbooks-3.6.140-1.git.0.4a02427.el7.noarch
openshift-ansible-3.6.140-1.git.0.4a02427.el7.noarch
openshift-ansible-lookup-plugins-3.6.140-1.git.0.4a02427.el7.noarch
openshift-ansible-roles-3.6.140-1.git.0.4a02427.el7.noarch
openshift-ansible-docs-3.6.140-1.git.0.4a02427.el7.noarch
openshift-ansible-filter-plugins-3.6.140-1.git.0.4a02427.el7.noarch
Created attachment 1296007 [details] issue is fixed, logging diagnostics info
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1716