Bug 1536651
| Field | Value |
|---|---|
| Summary: | logging-mux not working in 3.7.z when logging installed with openshift_logging_use_mux=true |
| Product: | OpenShift Container Platform |
| Component: | Logging |
| Version: | 3.7.1 |
| Target Release: | 3.11.0 |
| Hardware: | x86_64 |
| OS: | Linux |
| Status: | CLOSED CURRENTRELEASE |
| Severity: | medium |
| Priority: | low |
| Reporter: | Mike Fiedler <mifiedle> |
| Assignee: | Noriko Hosoi <nhosoi> |
| QA Contact: | Qiaoling Tang <qitang> |
| CC: | anli, aos-bugs, jcantril, mifiedle, nhosoi, rmeggins |
| Type: | Bug |
| Doc Type: | Bug Fix |
| Last Closed: | 2018-10-08 18:09:29 UTC |
| Bug Blocks: | 1502764 |

Doc Text:

When mux is configured, the client fluentd is expected to have the environment variable MUX_CLIENT_MODE set to either maximal or minimal. Without the variable set to one of those values, the client fluentd does not forward logs to the mux server but sends them directly to Elasticsearch.

The environment variable is set from the ansible variable openshift_logging_mux_client_mode. That variable had no default value, and setting it was the responsibility of the person deploying the logging system. If it was not set to minimal or maximal, the mux server, although installed, was not used; the client fluentd sent logs directly to Elasticsearch.

To reduce this confusion, openshift_logging_mux_client_mode now defaults to maximal.
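The Doc Text above notes that openshift_logging_mux_client_mode previously had no default. On releases without the fix, the mux client mode can be selected explicitly in the ansible inventory. A minimal fragment (only the mux-related lines; all other required inventory settings are elided):

```ini
[OSEv3:vars]
openshift_logging_install_logging=true
openshift_logging_use_mux=true
# Without the fix this line is required; with the fix it becomes the default.
openshift_logging_mux_client_mode=maximal
```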
Description by Mike Fiedler, 2018-01-19 20:33:05 UTC
Hi Mike, could you retry adding this to your inventory file? `openshift_logging_mux_client_mode=maximal`

---

Setting `openshift_logging_mux_client_mode=maximal` worked. Maybe make that the default?

---

Hi Rich,

We'd like to make `openshift_logging_mux_client_mode`/`MUX_CLIENT_MODE=maximal` the default on the mux client fluentd.

For instance, if `LOGGING_MUX_SERVICE_HOST` (and/or `_PORT`, etc.) is set and `USE_MUX` is not set or false, do you think it's safe to assume the fluentd is a mux client and set `MUX_CLIENT_MODE=maximal`?

---

(In reply to Noriko Hosoi from comment #3)

Yes. If using mux, the default should be `MUX_CLIENT_MODE=maximal`.

---

(In reply to Rich Megginson from comment #4)

https://github.com/openshift/origin-aggregated-logging/pull/960
https://github.com/openshift/openshift-ansible/pull/7192

---

Correction. Updated the openshift-ansible PR: https://github.com/openshift/openshift-ansible/pull/7192. Closed the origin-aggregated-logging PR since it's not necessary: https://github.com/openshift/origin-aggregated-logging/pull/960

---

Commits pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/013da2143348dbd23761bcf9ac86912f9903181f
Bug 1536651 - logging-mux not working in 3.7.z when logging installed with openshift_logging_use_mux=true
Setting openshift_logging_use_mux=maximal by default. If the cluster is not configured with mux, this default value is going to be ignored.

https://github.com/openshift/openshift-ansible/commit/1576f39dcf5865578e2baffd2a4af8120469f679
Merge pull request #7192 from nhosoi/bz1536651
Automatic merge from submit-queue. Bug 1536651 - logging-mux not working in 3.7.z when logging installed with openshift_logging_use_mux=true
To set MUX_CLIENT_MODE to maximal by default for the mux client, changing the /etc/fluent/muxkeys mounting condition so that if openshift_logging_use_mux or openshift_logging_mux_allow_external is set to true, /etc/fluent/muxkeys is mounted on the collector fluentd. This openshift-ansible PR is needed for https://github.com/openshift/origin-aggregated-logging/pull/960

---

The fix isn't in openshift3/logging-fluentd:3.7.42-3.

---

(In reply to Qiaoling Tang from comment #9)

Indeed, this PR failed in the CI tests and was not merged into the openshift-ansible release-3.7 branch: https://github.com/openshift/openshift-ansible/pull/7562. Note: it was merged into the master and release-3.9 branches.

---

Thanks to @ewolinetz, pr/7562 has been merged into the upstream git.

---

Tested with logging v3.7.60.
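The heuristic proposed in comment #3 and endorsed in comment #4 can be sketched as shell logic. This is an illustration of the decision rule only, not the actual openshift-ansible or image startup code; the example values assigned to LOGGING_MUX_SERVICE_HOST and USE_MUX below are assumptions.

```shell
# Sketch: if the mux service is discoverable via the env vars Kubernetes
# injects for services, and this pod is not the mux server itself, treat
# the pod as a mux client and default MUX_CLIENT_MODE to maximal.
LOGGING_MUX_SERVICE_HOST="logging-mux.logging.svc"  # example: injected when the mux service exists
USE_MUX="false"                                     # example: true only on the mux server pod

if [ -n "${LOGGING_MUX_SERVICE_HOST:-}" ] && [ "${USE_MUX:-false}" != "true" ]; then
  # Only a default: an explicitly configured MUX_CLIENT_MODE is preserved.
  MUX_CLIENT_MODE="${MUX_CLIENT_MODE:-maximal}"
fi
echo "MUX_CLIENT_MODE=${MUX_CLIENT_MODE:-unset}"
```

With the example values above, the pod looks like a mux client and the mode defaults to maximal.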
The logging-mux pod is stuck in CrashLoopBackoff:

```
2018-07-30 11:06:15 -0400 [info]: reading config file path="/etc/fluent/fluent.conf"
2018-07-30 11:06:16 -0400 [error]: unexpected error error="No route to host - connect(2)"
2018-07-30 11:06:16 -0400 [error]: /usr/share/ruby/net/http.rb:878:in `initialize'
2018-07-30 11:06:16 -0400 [error]: /usr/share/ruby/net/http.rb:878:in `open'
2018-07-30 11:06:16 -0400 [error]: /usr/share/ruby/net/http.rb:878:in `block in connect'
2018-07-30 11:06:16 -0400 [error]: /usr/share/ruby/timeout.rb:52:in `timeout'
2018-07-30 11:06:16 -0400 [error]: /usr/share/ruby/net/http.rb:877:in `connect'
2018-07-30 11:06:16 -0400 [error]: /usr/share/ruby/net/http.rb:862:in `do_start'
2018-07-30 11:06:16 -0400 [error]: /usr/share/ruby/net/http.rb:851:in `start'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/rest-client-2.0.2/lib/restclient/request.rb:715:in `transmit'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/rest-client-2.0.2/lib/restclient/request.rb:145:in `execute'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/rest-client-2.0.2/lib/restclient/request.rb:52:in `execute'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/rest-client-2.0.2/lib/restclient/resource.rb:51:in `get'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:328:in `block in api'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:58:in `handle_exception'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:327:in `api'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:322:in `api_valid?'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluent-plugin-kubernetes_metadata_filter-1.0.1/lib/fluent/plugin/filter_kubernetes_metadata.rb:227:in `configure'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:145:in `add_filter'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:62:in `block in configure'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:57:in `each'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:57:in `configure'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/root_agent.rb:83:in `block in configure'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/root_agent.rb:83:in `each'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/root_agent.rb:83:in `configure'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/engine.rb:129:in `configure'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/engine.rb:103:in `run_configure'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:489:in `run_configure'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:174:in `block in start'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:366:in `call'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:366:in `main_process'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:170:in `start'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/command/fluentd.rb:173:in `<top (required)>'
2018-07-30 11:06:16 -0400 [error]: /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
2018-07-30 11:06:16 -0400 [error]: /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/bin/fluentd:8:in `<top (required)>'
2018-07-30 11:06:16 -0400 [error]: /usr/bin/fluentd:23:in `load'
2018-07-30 11:06:16 -0400 [error]: /usr/bin/fluentd:23:in `<main>'
```

Inventory:

```ini
[OSEv3:children]
masters
etcd

[masters]
ip-172-18-10-80

[etcd]
ip-172-18-10-80

[OSEv3:vars]
deployment_type=openshift-enterprise
openshift_deployment_type=openshift-enterprise
openshift_release=v3.7
openshift_docker_additional_registries=registry.reg-aws.openshift.com
openshift_logging_install_logging=true
openshift_logging_master_url=https://ec2-34-230-25-109.compute-1.amazonaws.com:8443
openshift_logging_master_public_url=https://ec2-34-230-25-109.compute-1.amazonaws.com:8443
openshift_logging_kibana_hostname=kibana.apps.0730-osb.qe.rhcloud.com
openshift_logging_namespace=logging
openshift_logging_image_prefix=registry.reg-aws.openshift.com:443/openshift3/
openshift_logging_image_version=v3.7
openshift_logging_es_cluster_size=1
openshift_logging_es_pvc_dynamic=true
openshift_logging_es_pvc_size=25Gi
openshift_logging_es_pvc_storage_class_name=gp2
openshift_logging_fluentd_read_from_head=false
openshift_logging_use_mux=true
```

---

This means the mux pod cannot talk to the Kubernetes API server. Try this: `oc rsh` to the mux pod, or `oc debug` if that doesn't work. Then:

```
echo $K8S_HOST_URL
```

It is usually something like https://kubernetes.default.svc.cluster.local. Then:

```
getent hosts kubernetes.default.svc.cluster.local
```

If that doesn't work, then this isn't a logging issue; it is a pod networking/DNS issue. Then:

```
curl -s -v --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt https://kubernetes.default.svc.cluster.local
```

If that doesn't work, then hopefully the -v output will give us a clue.
```
sh-4.2# echo $K8S_HOST_URL
https://ec2-54-175-214-36.compute-1.amazonaws.com:8443
```

That's the public URL of the cluster on the load balancer.

`curl -s -v --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt https://kubernetes.default.svc.cluster.local` works fine (returns the list of endpoints). curl-ing $K8S_HOST_URL fails:

```
sh-4.2# curl -s -v --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt $K8S_HOST_URL
* About to connect() to ec2-54-175-214-36.compute-1.amazonaws.com port 8443 (#0)
*   Trying 172.18.15.142...
* No route to host
* Failed connect to ec2-54-175-214-36.compute-1.amazonaws.com:8443; No route to host
* Closing connection 0
```

Is K8S_HOST_URL being set wrong? Should it be the internal hostname?

---

Created attachment 1471609 [details]: inventory to install 3.7.60 with openshift-ansible 3.7.60

Comment 16 was a new cluster, different from comment 14. Attaching the inventory.

---

(In reply to Mike Fiedler from comment #16)

I think it is being set wrong. The value of K8S_HOST_URL should almost always be https://kubernetes.default.svc.cluster.local, the internal hostname.

---

(In reply to Rich Megginson from comment #18)

Hi Mike, did you happen to have a chance to retry with the suggested value? Any updates? Thanks!

---

This works fine in 3.11. I am marking this fixed in 3.11; if it needs to be cloned to a previous release, a copy can be made.
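Following the guidance in comment #18 that K8S_HOST_URL should almost always be the internal service hostname, a quick sanity check on the value can be sketched in shell. The example value and the K8S_URL_KIND variable name are illustrative assumptions, not part of the product:

```shell
# Classify a K8S_HOST_URL value: internal service DNS name vs. anything else
# (for example, a public load-balancer hostname, as seen in comment #16).
K8S_HOST_URL="https://kubernetes.default.svc.cluster.local"  # example value

case "$K8S_HOST_URL" in
  https://kubernetes.default.svc*) K8S_URL_KIND="internal" ;;
  *)                               K8S_URL_KIND="external" ;;
esac
echo "K8S_HOST_URL is $K8S_URL_KIND"
```

An "external" result here would suggest the misconfiguration diagnosed in comments #16 and #18.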