Description of problem:

Installed logging via openshift-ansible 3.7.23 with openshift_logging_use_mux=true in the inventory (full inventory below). The logging-mux dc is created correctly and the logging-mux pod is running after the install completes. However, the logging-fluentd pods are not configured to forward through the logging-mux service; pod logs still go directly to Elasticsearch. Proof was setting the number of logging-mux replicas to 0 and verifying that all pod logs were still indexed in ES.

I was trying to run logging-mux in 3.7 to verify the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1502764. Verifying this fix in 3.9 is currently blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1531157

Version-Release number of selected component (if applicable):
3.7.23

How reproducible:
Always

Steps to Reproduce:
1. Install a cluster with 3.7.23 and openshift-ansible 3.7.23.
2. Install logging with the inventory below, adjusting hostnames as needed.
3. Verify the logging-mux pod is running and that there are no errors in its log.
4. oc scale --replicas=0 dc/logging-mux and verify no mux pod is running.
5. Create pods which log stdout messages and verify the messages are indexed in Elasticsearch.

Note: ss -tnpi in the logging-fluentd pods used to be a reliable way to view connections from fluentd to ES or from fluentd to logging-mux, but it no longer seems to work. Investigating that issue separately.

Actual results:
Pod logs go directly from logging-fluentd to Elasticsearch; logging-mux is bypassed.

Expected results:
Logs are forwarded through logging-mux to Elasticsearch.

Additional info:

[OSEv3:children]
masters
etcd

[masters]
ip-172-31-19-165

[etcd]
ip-172-31-19-165

[OSEv3:vars]
deployment_type=openshift-enterprise
openshift_deployment_type=openshift-enterprise
openshift_release=v3.8
openshift_docker_additional_registries=registry.reg-aws.openshift.com
openshift_logging_install_logging=true
openshift_logging_master_url=https://ec2-54-149-169-9.us-west-2.compute.amazonaws.com:8443
openshift_logging_master_public_url=https://ec2-54-149-169-9.us-west-2.compute.amazonaws.com:8443
openshift_logging_kibana_hostname=kibana.apps.0115-yc8.qe.rhcloud.com
openshift_logging_namespace=logging
openshift_logging_image_prefix=registry.reg-aws.openshift.com:443/openshift3/
openshift_logging_image_version=v3.9
openshift_logging_es_cluster_size=1
openshift_logging_es_pvc_dynamic=true
openshift_logging_es_pvc_size=50Gi
openshift_logging_fluentd_read_from_head=false
openshift_logging_use_mux=true
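For reference, a minimal shell sketch of the bypass verification described in the steps above. This assumes the default logging namespace and object names; the es pod label and the admin secret paths are the usual 3.x defaults but may differ on your cluster:

# Check whether the collector daemonset has any mux client configuration at all:
oc set env daemonset/logging-fluentd --list -n logging | grep MUX

# Bypass test: scale mux to 0, generate some pod logs, then check whether
# documents still arrive in Elasticsearch (which means mux was bypassed):
oc scale --replicas=0 dc/logging-mux -n logging
espod=$(oc get pods -n logging -l component=es -o jsonpath='{.items[0].metadata.name}')
oc exec -n logging $espod -- curl -s \
  --cacert /etc/elasticsearch/secret/admin-ca \
  --cert /etc/elasticsearch/secret/admin-cert \
  --key /etc/elasticsearch/secret/admin-key \
  'https://localhost:9200/_cat/indices?v'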
Hi Mike, could you retry after adding this to your inventory file?

openshift_logging_mux_client_mode=maximal
Setting openshift_logging_mux_client_mode=maximal worked. Maybe make that the default?
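For anyone hitting this on an already-installed cluster, a hedged sketch of applying the same setting without re-running the playbook (assumes the default daemonset name and pod labels):

oc set env daemonset/logging-fluentd MUX_CLIENT_MODE=maximal -n logging
# if the pods do not restart on their own, recreate them so they pick up the new env:
oc delete pods -n logging -l component=fluentd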
Hi Rich,

We'd like to set openshift_logging_mux_client_mode/MUX_CLIENT_MODE=maximal as the default on the mux client fluentd.

For instance, if LOGGING_MUX_SERVICE_HOST (and/or _PORT, etc.) is set and USE_MUX is not set or false, do you think it's safe to assume the fluentd is a mux client and set MUX_CLIENT_MODE=maximal?
(In reply to Noriko Hosoi from comment #3)
> Hi Rich,
>
> We'd like to set openshift_logging_mux_client_mode/MUX_CLIENT_MODE=maximal
> as the default on the mux client fluentd.
>
> For instance, if LOGGING_MUX_SERVICE_HOST (and/or _PORT, etc.) is set and
> USE_MUX is not set or false, do you think it's safe to assume the fluentd is
> a mux client and set MUX_CLIENT_MODE=maximal?

Yes. If using mux, the default should be MUX_CLIENT_MODE=maximal.
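In shell terms, the defaulting rule confirmed above might look like the following. This is a hypothetical sketch of collector startup logic, not the actual run.sh code:

# If a mux service is visible in the environment but this pod is not
# itself the mux (USE_MUX unset or false), assume we are a mux client
# and default MUX_CLIENT_MODE to maximal unless the user set it explicitly.
if [ -n "${LOGGING_MUX_SERVICE_HOST:-}" ] && [ "${USE_MUX:-false}" != "true" ] ; then
    export MUX_CLIENT_MODE="${MUX_CLIENT_MODE:-maximal}"
fi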
(In reply to Rich Megginson from comment #4)
> (In reply to Noriko Hosoi from comment #3)
> > Hi Rich,
> >
> > We'd like to set openshift_logging_mux_client_mode/MUX_CLIENT_MODE=maximal
> > as the default on the mux client fluentd.
> >
> > For instance, if LOGGING_MUX_SERVICE_HOST (and/or _PORT, etc.) is set and
> > USE_MUX is not set or false, do you think it's safe to assume the fluentd is
> > a mux client and set MUX_CLIENT_MODE=maximal?
>
> Yes. If using mux, the default should be MUX_CLIENT_MODE=maximal.

https://github.com/openshift/origin-aggregated-logging/pull/960
https://github.com/openshift/openshift-ansible/pull/7192
Correction: updated this openshift-ansible PR:
https://github.com/openshift/openshift-ansible/pull/7192

Closed this origin-aggregated-logging PR since it's not necessary:
https://github.com/openshift/origin-aggregated-logging/pull/960
Commits pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/013da2143348dbd23761bcf9ac86912f9903181f
Bug 1536651 - logging-mux not working in 3.7.z when logging installed with openshift_logging_use_mux=true

Setting openshift_logging_mux_client_mode=maximal by default. If the cluster is not configured with mux, this default value is ignored.

https://github.com/openshift/openshift-ansible/commit/1576f39dcf5865578e2baffd2a4af8120469f679
Merge pull request #7192 from nhosoi/bz1536651

Automatic merge from submit-queue.

Bug 1536651 - logging-mux not working in 3.7.z when logging installed with openshift_logging_use_mux=true

To set MUX_CLIENT_MODE to maximal by default for the mux client, this changes the /etc/fluent/muxkeys mounting condition so that if openshift_logging_use_mux or openshift_logging_mux_allow_external is set to true, /etc/fluent/muxkeys is mounted on the collector fluentd. This openshift-ansible PR is needed for https://github.com/openshift/origin-aggregated-logging/pull/960
The fix isn't in the openshift3/logging-fluentd:3.7.42-3 image.
(In reply to Qiaoling Tang from comment #9)
> The fix isn't in the openshift3/logging-fluentd:3.7.42-3 image.

Indeed. PR 7562 failed the CI tests and was not merged into the openshift-ansible release-3.7 branch:
https://github.com/openshift/openshift-ansible/pull/7562

Note: it was merged into the master and release-3.9 branches.
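For reference, one way to check which openshift-ansible branches actually carry the change, using standard git and the commit hash from the push notification above:

git clone https://github.com/openshift/openshift-ansible
cd openshift-ansible
# list remote branches containing the fix commit:
git branch -r --contains 013da2143348dbd23761bcf9ac86912f9903181f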
Thanks to @ewolinetz, PR 7562 has now been merged into the upstream repository.
Tested with logging v3.7.60. The logging-mux pod is stuck in CrashLoopBackOff:

2018-07-30 11:06:15 -0400 [info]: reading config file path="/etc/fluent/fluent.conf"
2018-07-30 11:06:16 -0400 [error]: unexpected error error="No route to host - connect(2)"
2018-07-30 11:06:16 -0400 [error]: /usr/share/ruby/net/http.rb:878:in `initialize'
2018-07-30 11:06:16 -0400 [error]: /usr/share/ruby/net/http.rb:878:in `open'
2018-07-30 11:06:16 -0400 [error]: /usr/share/ruby/net/http.rb:878:in `block in connect'
2018-07-30 11:06:16 -0400 [error]: /usr/share/ruby/timeout.rb:52:in `timeout'
2018-07-30 11:06:16 -0400 [error]: /usr/share/ruby/net/http.rb:877:in `connect'
2018-07-30 11:06:16 -0400 [error]: /usr/share/ruby/net/http.rb:862:in `do_start'
2018-07-30 11:06:16 -0400 [error]: /usr/share/ruby/net/http.rb:851:in `start'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/rest-client-2.0.2/lib/restclient/request.rb:715:in `transmit'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/rest-client-2.0.2/lib/restclient/request.rb:145:in `execute'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/rest-client-2.0.2/lib/restclient/request.rb:52:in `execute'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/rest-client-2.0.2/lib/restclient/resource.rb:51:in `get'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:328:in `block in api'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:58:in `handle_exception'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:327:in `api'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:322:in `api_valid?'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluent-plugin-kubernetes_metadata_filter-1.0.1/lib/fluent/plugin/filter_kubernetes_metadata.rb:227:in `configure'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:145:in `add_filter'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:62:in `block in configure'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:57:in `each'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:57:in `configure'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/root_agent.rb:83:in `block in configure'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/root_agent.rb:83:in `each'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/root_agent.rb:83:in `configure'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/engine.rb:129:in `configure'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/engine.rb:103:in `run_configure'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:489:in `run_configure'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:174:in `block in start'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:366:in `call'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:366:in `main_process'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:170:in `start'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/command/fluentd.rb:173:in `<top (required)>'
2018-07-30 11:06:16 -0400 [error]: /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
2018-07-30 11:06:16 -0400 [error]: /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/bin/fluentd:8:in `<top (required)>'
2018-07-30 11:06:16 -0400 [error]: /usr/bin/fluentd:23:in `load'
2018-07-30 11:06:16 -0400 [error]: /usr/bin/fluentd:23:in `<main>'

Inventory:

[OSEv3:children]
masters
etcd

[masters]
ip-172-18-10-80

[etcd]
ip-172-18-10-80

[OSEv3:vars]
deployment_type=openshift-enterprise
openshift_deployment_type=openshift-enterprise
openshift_release=v3.7
openshift_docker_additional_registries=registry.reg-aws.openshift.com
openshift_logging_install_logging=true
openshift_logging_master_url=https://ec2-34-230-25-109.compute-1.amazonaws.com:8443
openshift_logging_master_public_url=https://ec2-34-230-25-109.compute-1.amazonaws.com:8443
openshift_logging_kibana_hostname=kibana.apps.0730-osb.qe.rhcloud.com
openshift_logging_namespace=logging
openshift_logging_image_prefix=registry.reg-aws.openshift.com:443/openshift3/
openshift_logging_image_version=v3.7
openshift_logging_es_cluster_size=1
openshift_logging_es_pvc_dynamic=true
openshift_logging_es_pvc_size=25Gi
openshift_logging_es_pvc_storage_class_name=gp2
openshift_logging_fluentd_read_from_head=false
openshift_logging_use_mux=true
This means the mux pod cannot talk to the Kubernetes API server. Try this: oc rsh to the mux pod (or oc debug if that doesn't work), then:

echo $K8S_HOST_URL

It is usually something like https://kubernetes.default.svc.cluster.local. Then:

getent hosts kubernetes.default.svc.cluster.local

If that doesn't work, then this isn't a logging issue; it is a pod networking/DNS issue. Then:

curl -s -v --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt https://kubernetes.default.svc.cluster.local

If that doesn't work, then hopefully the -v output will give us a clue.
sh-4.2# echo $K8S_HOST_URL
https://ec2-54-175-214-36.compute-1.amazonaws.com:8443

That's the public URL of the cluster on the load balancer.

curl -s -v --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt https://kubernetes.default.svc.cluster.local

works fine (returns the list of endpoints). curl-ing $K8S_HOST_URL fails:

sh-4.2# curl -s -v --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt $K8S_HOST_URL
* About to connect() to ec2-54-175-214-36.compute-1.amazonaws.com port 8443 (#0)
*   Trying 172.18.15.142...
* No route to host
* Failed connect to ec2-54-175-214-36.compute-1.amazonaws.com:8443; No route to host
* Closing connection 0

Is K8S_HOST_URL being set wrong? Should it be the internal hostname?
Created attachment 1471609 [details]
inventory to install 3.7.60 with openshift-ansible 3.7.60

Comment 16 was a new cluster - different from comment 14. Attaching the inventory.
(In reply to Mike Fiedler from comment #16)
> sh-4.2# echo $K8S_HOST_URL
> https://ec2-54-175-214-36.compute-1.amazonaws.com:8443
>
> That's the public URL of the cluster on the load balancer.
>
> curl -s -v --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt https://kubernetes.default.svc.cluster.local
>
> works fine (returns the list of endpoints). curl-ing $K8S_HOST_URL fails:
>
> sh-4.2# curl -s -v --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt $K8S_HOST_URL
> * About to connect() to ec2-54-175-214-36.compute-1.amazonaws.com port 8443 (#0)
> *   Trying 172.18.15.142...
> * No route to host
> * Failed connect to ec2-54-175-214-36.compute-1.amazonaws.com:8443; No route to host
> * Closing connection 0
>
> Is K8S_HOST_URL being set wrong? Should it be the internal hostname?

I think it is being set wrong. The value of K8S_HOST_URL should almost always be https://kubernetes.default.svc.cluster.local, the internal hostname.
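Until the installer sets it correctly, a possible workaround is to override the value on the dc directly. A sketch, assuming the dc is named logging-mux and reads K8S_HOST_URL from its container environment:

oc set env dc/logging-mux -n logging \
  K8S_HOST_URL=https://kubernetes.default.svc.cluster.local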
(In reply to Rich Megginson from comment #18)
> (In reply to Mike Fiedler from comment #16)
> > Is K8S_HOST_URL being set wrong? Should it be the internal hostname?
>
> I think it is being set wrong. The value of K8S_HOST_URL should almost
> always be https://kubernetes.default.svc.cluster.local, the internal
> hostname.

Hi Mike,

Did you happen to have a chance to retry with the suggested value? Any updates? Thanks!
This works fine in 3.11. I am marking this as fixed in 3.11; if it needs to be cloned to a previous release, a copy can be made.