Description of problem:

Installed logging via openshift-ansible 3.7.23 with openshift_logging_use_mux=true in the inventory (full inventory below). The logging-mux dc is created correctly and the logging-mux pod is running after the install completes. However, the logging-fluentd pods are not configured to forward through the logging-mux service; pod logs still go directly to Elasticsearch. Proof was setting the number of logging-mux replicas to 0 and verifying that all pod logs were still indexed in ES.

I was trying to run logging-mux in 3.7 to verify the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1502764. Verifying this fix in 3.9 is currently blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1531157

Version-Release number of selected component (if applicable):
3.7.23

How reproducible:
Always

Steps to Reproduce:
1. Install a cluster with 3.7.23 and openshift-ansible 3.7.23.
2. Install logging with the inventory below, adjusting hostnames as needed.
3. Verify the logging-mux pod is running and that there are no errors in its log.
4. oc scale --replicas=0 dc/logging-mux and verify no mux pod is running.
5. Create pods which log stdout messages and verify the messages are indexed in Elasticsearch.

Note: ss -tnpi in the logging-fluentd pods used to be a reliable way to view connections from fluentd to ES or from fluentd to logging-mux, but it no longer seems to work. Investigating that issue separately.

Actual results:
Pod logs go directly from logging-fluentd to Elasticsearch; logging-mux is bypassed.

Expected results:
Logs are forwarded through logging-mux to Elasticsearch.

Additional info:

[OSEv3:children]
masters
etcd

[masters]
ip-172-31-19-165

[etcd]
ip-172-31-19-165

[OSEv3:vars]
deployment_type=openshift-enterprise
openshift_deployment_type=openshift-enterprise
openshift_release=v3.8
openshift_docker_additional_registries=registry.reg-aws.openshift.com
openshift_logging_install_logging=true
openshift_logging_master_url=https://ec2-54-149-169-9.us-west-2.compute.amazonaws.com:8443
openshift_logging_master_public_url=https://ec2-54-149-169-9.us-west-2.compute.amazonaws.com:8443
openshift_logging_kibana_hostname=kibana.apps.0115-yc8.qe.rhcloud.com
openshift_logging_namespace=logging
openshift_logging_image_prefix=registry.reg-aws.openshift.com:443/openshift3/
openshift_logging_image_version=v3.9
openshift_logging_es_cluster_size=1
openshift_logging_es_pvc_dynamic=true
openshift_logging_es_pvc_size=50Gi
openshift_logging_fluentd_read_from_head=false
openshift_logging_use_mux=true
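For reference, a minimal shell sketch of the bypass verification described in the steps above. This assumes the default logging namespace and object names; the es pod label and the admin secret paths are the usual 3.x defaults but may differ on your cluster:

# Check whether the collector daemonset has any mux client configuration at all:
oc set env daemonset/logging-fluentd --list -n logging | grep MUX

# Bypass test: scale mux to 0, generate some pod logs, then check whether
# documents still arrive in Elasticsearch (which means mux was bypassed):
oc scale --replicas=0 dc/logging-mux -n logging
espod=$(oc get pods -n logging -l component=es -o jsonpath='{.items[0].metadata.name}')
oc exec -n logging $espod -- curl -s \
  --cacert /etc/elasticsearch/secret/admin-ca \
  --cert /etc/elasticsearch/secret/admin-cert \
  --key /etc/elasticsearch/secret/admin-key \
  'https://localhost:9200/_cat/indices?v'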
Hi Mike, could you retry after adding this to your inventory file?

openshift_logging_mux_client_mode=maximal
Setting openshift_logging_mux_client_mode=maximal worked. Maybe make that the default?
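For anyone hitting this on an already-installed cluster, a hedged sketch of applying the same setting without re-running the playbook (assumes the default daemonset name and pod labels):

oc set env daemonset/logging-fluentd MUX_CLIENT_MODE=maximal -n logging
# if the pods do not restart on their own, recreate them so they pick up the new env:
oc delete pods -n logging -l component=fluentd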
Hi Rich,

We'd like to set openshift_logging_mux_client_mode/MUX_CLIENT_MODE=maximal as the default on the mux client fluentd.

For instance, if LOGGING_MUX_SERVICE_HOST (and/or _PORT, etc.) is set and USE_MUX is not set or false, do you think it's safe to assume the fluentd is a mux client and set MUX_CLIENT_MODE=maximal?
(In reply to Noriko Hosoi from comment #3)
> Hi Rich,
>
> We'd like to set openshift_logging_mux_client_mode/MUX_CLIENT_MODE=maximal
> as the default on the mux client fluentd.
>
> For instance, if LOGGING_MUX_SERVICE_HOST (and/or _PORT, etc.) is set and
> USE_MUX is not set or false, do you think it's safe to assume the fluentd is
> a mux client and set MUX_CLIENT_MODE=maximal?

Yes. If using mux, the default should be MUX_CLIENT_MODE=maximal.
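In shell terms, the defaulting rule confirmed above might look like the following. This is a hypothetical sketch of collector startup logic, not the actual run.sh code:

# If a mux service is visible in the environment but this pod is not
# itself the mux (USE_MUX unset or false), assume we are a mux client
# and default MUX_CLIENT_MODE to maximal unless the user set it explicitly.
if [ -n "${LOGGING_MUX_SERVICE_HOST:-}" ] && [ "${USE_MUX:-false}" != "true" ] ; then
    export MUX_CLIENT_MODE="${MUX_CLIENT_MODE:-maximal}"
fi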
(In reply to Rich Megginson from comment #4)
> (In reply to Noriko Hosoi from comment #3)
> > Hi Rich,
> >
> > We'd like to set openshift_logging_mux_client_mode/MUX_CLIENT_MODE=maximal
> > as the default on the mux client fluentd.
> >
> > For instance, if LOGGING_MUX_SERVICE_HOST (and/or _PORT, etc.) is set and
> > USE_MUX is not set or false, do you think it's safe to assume the fluentd is
> > a mux client and set MUX_CLIENT_MODE=maximal?
>
> Yes. If using mux, the default should be MUX_CLIENT_MODE=maximal.

https://github.com/openshift/origin-aggregated-logging/pull/960
https://github.com/openshift/openshift-ansible/pull/7192
Correction: updated this openshift-ansible PR:
https://github.com/openshift/openshift-ansible/pull/7192

Closed this origin-aggregated-logging PR since it's not necessary:
https://github.com/openshift/origin-aggregated-logging/pull/960
Commits pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/013da2143348dbd23761bcf9ac86912f9903181f
Bug 1536651 - logging-mux not working in 3.7.z when logging installed with openshift_logging_use_mux=true

Setting openshift_logging_mux_client_mode=maximal by default. If the cluster is not configured with mux, this default value is ignored.

https://github.com/openshift/openshift-ansible/commit/1576f39dcf5865578e2baffd2a4af8120469f679
Merge pull request #7192 from nhosoi/bz1536651

Automatic merge from submit-queue.

Bug 1536651 - logging-mux not working in 3.7.z when logging installed with openshift_logging_use_mux=true

To set MUX_CLIENT_MODE to maximal by default for the mux client, this changes the /etc/fluent/muxkeys mounting condition so that if openshift_logging_use_mux or openshift_logging_mux_allow_external is set to true, /etc/fluent/muxkeys is mounted on the collector fluentd. This openshift-ansible PR is needed for https://github.com/openshift/origin-aggregated-logging/pull/960
The fix isn't in the openshift3/logging-fluentd:3.7.42-3 image.
(In reply to Qiaoling Tang from comment #9)
> The fix isn't in the openshift3/logging-fluentd:3.7.42-3 image.

Indeed. PR 7562 failed the CI tests and was not merged into the openshift-ansible release-3.7 branch:
https://github.com/openshift/openshift-ansible/pull/7562

Note: it was merged into the master and release-3.9 branches.
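For reference, one way to check which openshift-ansible branches actually carry the change, using standard git and the commit hash from the push notification above:

git clone https://github.com/openshift/openshift-ansible
cd openshift-ansible
# list remote branches containing the fix commit:
git branch -r --contains 013da2143348dbd23761bcf9ac86912f9903181f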
Thanks to @ewolinetz, PR 7562 has now been merged into the upstream repository.
Tested with logging v3.7.60. The logging-mux pod is stuck in CrashLoopBackOff:

2018-07-30 11:06:15 -0400 [info]: reading config file path="/etc/fluent/fluent.conf"
2018-07-30 11:06:16 -0400 [error]: unexpected error error="No route to host - connect(2)"
2018-07-30 11:06:16 -0400 [error]: /usr/share/ruby/net/http.rb:878:in `initialize'
2018-07-30 11:06:16 -0400 [error]: /usr/share/ruby/net/http.rb:878:in `open'
2018-07-30 11:06:16 -0400 [error]: /usr/share/ruby/net/http.rb:878:in `block in connect'
2018-07-30 11:06:16 -0400 [error]: /usr/share/ruby/timeout.rb:52:in `timeout'
2018-07-30 11:06:16 -0400 [error]: /usr/share/ruby/net/http.rb:877:in `connect'
2018-07-30 11:06:16 -0400 [error]: /usr/share/ruby/net/http.rb:862:in `do_start'
2018-07-30 11:06:16 -0400 [error]: /usr/share/ruby/net/http.rb:851:in `start'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/rest-client-2.0.2/lib/restclient/request.rb:715:in `transmit'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/rest-client-2.0.2/lib/restclient/request.rb:145:in `execute'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/rest-client-2.0.2/lib/restclient/request.rb:52:in `execute'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/rest-client-2.0.2/lib/restclient/resource.rb:51:in `get'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:328:in `block in api'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:58:in `handle_exception'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:327:in `api'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:322:in `api_valid?'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluent-plugin-kubernetes_metadata_filter-1.0.1/lib/fluent/plugin/filter_kubernetes_metadata.rb:227:in `configure'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:145:in `add_filter'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:62:in `block in configure'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:57:in `each'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:57:in `configure'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/root_agent.rb:83:in `block in configure'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/root_agent.rb:83:in `each'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/root_agent.rb:83:in `configure'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/engine.rb:129:in `configure'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/engine.rb:103:in `run_configure'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:489:in `run_configure'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:174:in `block in start'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:366:in `call'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:366:in `main_process'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:170:in `start'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/command/fluentd.rb:173:in `<top (required)>'
2018-07-30 11:06:16 -0400 [error]: /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
2018-07-30 11:06:16 -0400 [error]: /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/bin/fluentd:8:in `<top (required)>'
2018-07-30 11:06:16 -0400 [error]: /usr/bin/fluentd:23:in `load'
2018-07-30 11:06:16 -0400 [error]: /usr/bin/fluentd:23:in `<main>'

Inventory:

[OSEv3:children]
masters
etcd

[masters]
ip-172-18-10-80

[etcd]
ip-172-18-10-80

[OSEv3:vars]
deployment_type=openshift-enterprise
openshift_deployment_type=openshift-enterprise
openshift_release=v3.7
openshift_docker_additional_registries=registry.reg-aws.openshift.com
openshift_logging_install_logging=true
openshift_logging_master_url=https://ec2-34-230-25-109.compute-1.amazonaws.com:8443
openshift_logging_master_public_url=https://ec2-34-230-25-109.compute-1.amazonaws.com:8443
openshift_logging_kibana_hostname=kibana.apps.0730-osb.qe.rhcloud.com
openshift_logging_namespace=logging
openshift_logging_image_prefix=registry.reg-aws.openshift.com:443/openshift3/
openshift_logging_image_version=v3.7
openshift_logging_es_cluster_size=1
openshift_logging_es_pvc_dynamic=true
openshift_logging_es_pvc_size=25Gi
openshift_logging_es_pvc_storage_class_name=gp2
openshift_logging_fluentd_read_from_head=false
openshift_logging_use_mux=true
This means the mux pod cannot talk to the Kubernetes API server. Try this: oc rsh to the mux pod (or oc debug if that doesn't work), then:

echo $K8S_HOST_URL

It is usually something like https://kubernetes.default.svc.cluster.local. Then:

getent hosts kubernetes.default.svc.cluster.local

If that doesn't work, then this isn't a logging issue; it is a pod networking/DNS issue. Then:

curl -s -v --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt https://kubernetes.default.svc.cluster.local

If that doesn't work, then hopefully the -v output will give us a clue.
sh-4.2# echo $K8S_HOST_URL
https://ec2-54-175-214-36.compute-1.amazonaws.com:8443

That's the public URL of the cluster on the load balancer.

curl -s -v --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt https://kubernetes.default.svc.cluster.local

works fine (returns the list of endpoints). curl-ing $K8S_HOST_URL fails:

sh-4.2# curl -s -v --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt $K8S_HOST_URL
* About to connect() to ec2-54-175-214-36.compute-1.amazonaws.com port 8443 (#0)
*   Trying 172.18.15.142...
* No route to host
* Failed connect to ec2-54-175-214-36.compute-1.amazonaws.com:8443; No route to host
* Closing connection 0

Is K8S_HOST_URL being set wrong? Should it be the internal hostname?
Created attachment 1471609 [details]
inventory to install 3.7.60 with openshift-ansible 3.7.60

Comment 16 was a new cluster - different from comment 14. Attaching the inventory.
(In reply to Mike Fiedler from comment #16)
> sh-4.2# echo $K8S_HOST_URL
> https://ec2-54-175-214-36.compute-1.amazonaws.com:8443
>
> That's the public URL of the cluster on the load balancer.
>
> curl -s -v --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt https://kubernetes.default.svc.cluster.local
>
> works fine (returns the list of endpoints). curl-ing $K8S_HOST_URL fails:
>
> sh-4.2# curl -s -v --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt $K8S_HOST_URL
> * About to connect() to ec2-54-175-214-36.compute-1.amazonaws.com port 8443 (#0)
> *   Trying 172.18.15.142...
> * No route to host
> * Failed connect to ec2-54-175-214-36.compute-1.amazonaws.com:8443; No route to host
> * Closing connection 0
>
> Is K8S_HOST_URL being set wrong? Should it be the internal hostname?

I think it is being set wrong. The value of K8S_HOST_URL should almost always be https://kubernetes.default.svc.cluster.local, the internal hostname.
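Until the installer sets it correctly, a possible workaround is to override the value on the dc directly. A sketch, assuming the dc is named logging-mux and reads K8S_HOST_URL from its container environment:

oc set env dc/logging-mux -n logging \
  K8S_HOST_URL=https://kubernetes.default.svc.cluster.local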
(In reply to Rich Megginson from comment #18)
> (In reply to Mike Fiedler from comment #16)
> > Is K8S_HOST_URL being set wrong? Should it be the internal hostname?
>
> I think it is being set wrong. The value of K8S_HOST_URL should almost
> always be https://kubernetes.default.svc.cluster.local, the internal
> hostname.

Hi Mike,

Did you happen to have a chance to retry with the suggested value? Any updates? Thanks!
This works fine in 3.11. I am marking this as fixed in 3.11; if it needs to be cloned to a previous release, a copy can be made.