Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1536651

Summary: logging-mux not working in 3.7.z when logging installed with openshift_logging_use_mux=true
Product: OpenShift Container Platform
Component: Logging
Version: 3.7.1
Hardware: x86_64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: low
Target Milestone: ---
Target Release: 3.11.0
Reporter: Mike Fiedler <mifiedle>
Assignee: Noriko Hosoi <nhosoi>
QA Contact: Qiaoling Tang <qitang>
CC: anli, aos-bugs, jcantril, mifiedle, nhosoi, rmeggins
Doc Type: Bug Fix
Doc Text:
When mux is configured, the client fluentd is expected to have the environment variable MUX_CLIENT_MODE set to either maximal or minimal. Without the variable set to one of those values, the client fluentd does not forward logs to the mux server but sends them directly to Elasticsearch. The environment variable is set via the ansible variable openshift_logging_mux_client_mode, which had no default value; setting it was the responsibility of whoever deployed the logging stack. If it was not set to minimal or maximal, the mux server was installed but never used, because the client fluentd sent logs directly to Elasticsearch. To reduce the confusion, openshift_logging_mux_client_mode now defaults to maximal.
Story Points: ---
Last Closed: 2018-10-08 18:09:29 UTC
Type: Bug
Bug Blocks: 1502764
Attachments: inventory to install 3.7.60 with openshift-ansible 3.7.60

Description Mike Fiedler 2018-01-19 20:33:05 UTC
Description of problem:

Installed logging via openshift-ansible 3.7.23 with openshift_logging_use_mux=true in the inventory (full inventory below).

The logging dcs are created correctly and the logging-mux pod is running after the install completes. However, the logging-fluentd pods are not configured to forward through the logging-mux service; pod logs still go directly to Elasticsearch. As proof, I scaled logging-mux to 0 replicas and verified that all pod logs were still indexed in ES.

I was trying to run logging-mux in 3.7 to verify the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1502764. Verifying that fix in 3.9 is currently blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1531157.

Version-Release number of selected component (if applicable): 3.7.23


How reproducible: Always


Steps to Reproduce:
1. Install a cluster with 3.7.23. Install openshift-ansible 3.7.23.
2. Install logging with the inventory below, adjusting hostnames as needed.
3. Verify the logging-mux pod is running and shows no errors.
4. Run oc scale --replicas=0 dc/logging-mux and verify no mux pod is running.
5. Create pods that log stdout messages and verify the messages are indexed in Elasticsearch (a scripted version of steps 4-5 is sketched below).

Note: ss -tnpi in the logging-fluentd pods used to be a reliable way to view connections from fluentd to ES or from fluentd to logging-mux, but it no longer seems to work. Investigating that separately.
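
For reference, a sketch of the check in steps 4-5, assuming the default logging namespace and the standard 3.x Elasticsearch admin secret paths (verify the names in your cluster; this is an illustration added for clarity, not part of the original report):

# scale mux down, then confirm whether new log documents still arrive in ES
oc -n logging scale --replicas=0 dc/logging-mux
es_pod=$(oc -n logging get pods -l component=es -o jsonpath='{.items[0].metadata.name}')
# watch the per-project index doc counts grow while mux is down
oc -n logging exec $es_pod -- curl -s \
    --cacert /etc/elasticsearch/secret/admin-ca \
    --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key \
    'https://localhost:9200/_cat/indices?v'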


Actual results:

Pod logs go directly from logging-fluentd to Elasticsearch; logging-mux is bypassed.


Expected results:

Logs are forwarded through logging-mux to Elasticsearch.

Additional info:


[OSEv3:children]                             
masters                                      
etcd                                         

[masters]                                    
ip-172-31-19-165                             

[etcd]                                       
ip-172-31-19-165                             

[OSEv3:vars]                                 
deployment_type=openshift-enterprise         

openshift_deployment_type=openshift-enterprise                                             
openshift_release=v3.8                       
openshift_docker_additional_registries=registry.reg-aws.openshift.com                      


openshift_logging_install_logging=true       
openshift_logging_master_url=https://ec2-54-149-169-9.us-west-2.compute.amazonaws.com:8443 
openshift_logging_master_public_url=https://ec2-54-149-169-9.us-west-2.compute.amazonaws.com:8443                                                                                     
openshift_logging_kibana_hostname=kibana.apps.0115-yc8.qe.rhcloud.com                      
openshift_logging_namespace=logging          
openshift_logging_image_prefix=registry.reg-aws.openshift.com:443/openshift3/              
openshift_logging_image_version=v3.9         
openshift_logging_es_cluster_size=1          
openshift_logging_es_pvc_dynamic=true        
openshift_logging_es_pvc_size=50Gi                                                    
openshift_logging_fluentd_read_from_head=false                                             
openshift_logging_use_mux=true

Comment 1 Noriko Hosoi 2018-01-20 00:33:28 UTC
Hi Mike, could you retry adding this to your inventory file?
openshift_logging_mux_client_mode=maximal

Comment 2 Mike Fiedler 2018-01-22 18:45:24 UTC
Setting openshift_logging_mux_client_mode=maximal worked.   Maybe make that the default?

Comment 3 Noriko Hosoi 2018-02-14 21:32:11 UTC
Hi Rich,

We'd like to set openshift_logging_mux_client_mode/MUX_CLIENT_MODE=maximal by default on the mux client fluentd.

For instance, if LOGGING_MUX_SERVICE_HOST (and/or _PORT, etc.) is set and USE_MUX is not set or false, do you think it's safe to assume the fluentd is a mux client and set MUX_CLIENT_MODE=maximal?
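
In shell terms, the proposed defaulting could look like this (a hypothetical sketch using the variable names from this comment; the actual run script in origin-aggregated-logging may differ):

# if a mux service is visible but this pod is not the mux itself,
# treat it as a mux client and default MUX_CLIENT_MODE to maximal
if [ -n "${LOGGING_MUX_SERVICE_HOST:-}" ] && [ "${USE_MUX:-false}" != "true" ] ; then
    export MUX_CLIENT_MODE="${MUX_CLIENT_MODE:-maximal}"
fi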

Comment 4 Rich Megginson 2018-02-14 21:54:31 UTC
(In reply to Noriko Hosoi from comment #3)
> Hi Rich,
> 
> We'd like to set openshift_logging_mux_client_mode/MUX_CLIENT_MODE=maximal by
> default on the mux client fluentd.
> 
> For instance, if LOGGING_MUX_SERVICE_HOST (and/or _PORT, etc.) is set and
> USE_MUX is not set or false, do you think it's safe to assume the fluentd is
> a mux client and set MUX_CLIENT_MODE=maximal?

Yes.  If using mux, the default should be MUX_CLIENT_MODE=maximal

Comment 5 Noriko Hosoi 2018-02-16 22:16:38 UTC
(In reply to Rich Megginson from comment #4)
> Yes.  If using mux, the default should be MUX_CLIENT_MODE=maximal

https://github.com/openshift/origin-aggregated-logging/pull/960
https://github.com/openshift/openshift-ansible/pull/7192

Comment 6 Noriko Hosoi 2018-02-20 01:46:53 UTC
Correction.

Updated this O_A pr:
https://github.com/openshift/openshift-ansible/pull/7192

Closed this O_A_L pr since it's not necessary.
https://github.com/openshift/origin-aggregated-logging/pull/960

Comment 7 openshift-github-bot 2018-02-26 23:53:34 UTC
Commits pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/013da2143348dbd23761bcf9ac86912f9903181f
Bug 1536651 - logging-mux not working in 3.7.z when logging installed with openshift_logging_use_mux=true

Setting openshift_logging_mux_client_mode=maximal by default.
If the cluster is not configured with mux, this default value is ignored.

https://github.com/openshift/openshift-ansible/commit/1576f39dcf5865578e2baffd2a4af8120469f679
Merge pull request #7192 from nhosoi/bz1536651

Automatic merge from submit-queue.

Bug 1536651 - logging-mux not working in 3.7.z when logging installed with openshift_logging_use_mux=true

To set MUX_CLIENT_MODE to maximal by default for the mux client, this changes the /etc/fluent/muxkeys mount condition so that /etc/fluent/muxkeys is mounted on the collector fluentd whenever openshift_logging_use_mux or openshift_logging_mux_allow_external is set to true.

This openshift-ansible pr is needed for https://github.com/openshift/origin-aggregated-logging/pull/960
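
After running the updated playbooks, a quick sanity check is to confirm the muxkeys secret is actually mounted on the collector (a sketch; the namespace and daemonset name assume a default install):

# show the /etc/fluent/muxkeys volume mount on the collector fluentd
oc -n logging get daemonset logging-fluentd -o yaml | grep -B 2 -A 2 muxkeys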

Comment 9 Qiaoling Tang 2018-04-17 09:21:14 UTC
The fix isn't in the openshift3/logging-fluentd:3.7.42-3

Comment 10 Noriko Hosoi 2018-04-17 17:28:41 UTC
(In reply to Qiaoling Tang from comment #9)
> The fix isn't in the openshift3/logging-fluentd:3.7.42-3

Indeed. PR 7562 failed the CI tests and was not merged into the openshift-ansible release-3.7 branch.
https://github.com/openshift/openshift-ansible/pull/7562

Note: merged into the master and release-3.9 branches.

Comment 11 Noriko Hosoi 2018-04-17 20:28:25 UTC
Thanks to @ewolinetz, PR 7562 has been merged upstream.

Comment 14 Mike Fiedler 2018-07-30 15:11:26 UTC
Tested with logging v3.7.60. The logging-mux pod is stuck in CrashLoopBackOff.

2018-07-30 11:06:15 -0400 [info]: reading config file path="/etc/fluent/fluent.conf"
2018-07-30 11:06:16 -0400 [error]: unexpected error error="No route to host - connect(2)"
  2018-07-30 11:06:16 -0400 [error]: /usr/share/ruby/net/http.rb:878:in `initialize'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/ruby/net/http.rb:878:in `open'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/ruby/net/http.rb:878:in `block in connect'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/ruby/timeout.rb:52:in `timeout'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/ruby/net/http.rb:877:in `connect'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/ruby/net/http.rb:862:in `do_start'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/ruby/net/http.rb:851:in `start'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/rest-client-2.0.2/lib/restclient/request.rb:715:in `transmit'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/rest-client-2.0.2/lib/restclient/request.rb:145:in `execute'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/rest-client-2.0.2/lib/restclient/request.rb:52:in `execute'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/rest-client-2.0.2/lib/restclient/resource.rb:51:in `get'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:328:in `block in api'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:58:in `handle_exception'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:327:in `api'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/kubeclient-1.1.4/lib/kubeclient/common.rb:322:in `api_valid?'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluent-plugin-kubernetes_metadata_filter-1.0.1/lib/fluent/plugin/filter_kubernetes_metadata.rb:227:in `configure'                                                                                                              
  2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:145:in `add_filter'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:62:in `block in configure'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:57:in `each'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/agent.rb:57:in `configure'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/root_agent.rb:83:in `block in configure'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/root_agent.rb:83:in `each'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/root_agent.rb:83:in `configure'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/engine.rb:129:in `configure'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/engine.rb:103:in `run_configure'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:489:in `run_configure'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:174:in `block in start'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:366:in `call'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:366:in `main_process'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/supervisor.rb:170:in `start'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/command/fluentd.rb:173:in `<top (required)>'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
  2018-07-30 11:06:16 -0400 [error]: /usr/share/gems/gems/fluentd-0.12.42/bin/fluentd:8:in `<top (required)>'
  2018-07-30 11:06:16 -0400 [error]: /usr/bin/fluentd:23:in `load'
  2018-07-30 11:06:16 -0400 [error]: /usr/bin/fluentd:23:in `<main>'

Inventory:

[OSEv3:children]                                                      
masters                                                               
etcd                                                                  


[masters]                                                             
ip-172-18-10-80

[etcd]                                                                
ip-172-18-10-80



[OSEv3:vars]                                                          
deployment_type=openshift-enterprise                                  

openshift_deployment_type=openshift-enterprise                        
openshift_release=v3.7                                                
openshift_docker_additional_registries=registry.reg-aws.openshift.com 


openshift_logging_install_logging=true                                
openshift_logging_master_url=https://ec2-34-230-25-109.compute-1.amazonaws.com:8443
openshift_logging_master_public_url=https://ec2-34-230-25-109.compute-1.amazonaws.com:8443
openshift_logging_kibana_hostname=kibana.apps.0730-osb.qe.rhcloud.com
openshift_logging_namespace=logging                                   
openshift_logging_image_prefix=registry.reg-aws.openshift.com:443/openshift3/                                                                
openshift_logging_image_version=v3.7                                  
openshift_logging_es_cluster_size=1                                   
openshift_logging_es_pvc_dynamic=true                                 
openshift_logging_es_pvc_size=25Gi                                    
openshift_logging_es_pvc_storage_class_name=gp2                       
openshift_logging_fluentd_read_from_head=false                        
openshift_logging_use_mux=true

Comment 15 Rich Megginson 2018-07-30 15:55:57 UTC
This means the mux pod cannot talk to the Kubernetes API server.

Try this: oc rsh into the mux pod (or oc debug if that doesn't work).

Then

echo $K8S_HOST_URL 

it is usually something like

https://kubernetes.default.svc.cluster.local

then

getent hosts kubernetes.default.svc.cluster.local

If that doesn't work, then this isn't a logging issue, it is a pod networking/dns issue.

Then

curl -s -v --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt https://kubernetes.default.svc.cluster.local

If that doesn't work, then hopefully the -v output will give us a clue.
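
Put together, the checks above can be run as one block from inside the mux pod (same commands, just consolidated):

# inside the mux pod, via oc rsh or oc debug
echo "$K8S_HOST_URL"
getent hosts kubernetes.default.svc.cluster.local
curl -s -v --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
    https://kubernetes.default.svc.cluster.local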

Comment 16 Mike Fiedler 2018-07-30 17:40:44 UTC
sh-4.2# echo $K8S_HOST_URL
https://ec2-54-175-214-36.compute-1.amazonaws.com:8443

That's the public URL of the cluster on the load balancer

curl -s -v --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt https://kubernetes.default.svc.cluster.local

works fine (returns the list of endpoints)

curl-ing $K8S_HOST_URL fails

sh-4.2# curl -s -v --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt $K8S_HOST_URL                               
* About to connect() to ec2-54-175-214-36.compute-1.amazonaws.com port 8443 (#0)
*   Trying 172.18.15.142...
* No route to host
* Failed connect to ec2-54-175-214-36.compute-1.amazonaws.com:8443; No route to host
* Closing connection 0

Is K8S_HOST_URL being set wrong?   Should be the internal hostname?

Comment 17 Mike Fiedler 2018-07-30 17:44:20 UTC
Created attachment 1471609 [details]
inventory to install 3.7.60 with openshift-ansible 3.7.60

Comment 16 was from a new cluster, different from comment 14. Attaching the inventory.

Comment 18 Rich Megginson 2018-07-30 17:52:45 UTC
(In reply to Mike Fiedler from comment #16)
> Is K8S_HOST_URL being set wrong?   Should be the internal hostname?

I think it is being set wrong.  The value of K8S_HOST_URL should almost always be https://kubernetes.default.svc.cluster.local, the internal hostname.
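
One way to test this without reinstalling would be to override the variable on the mux dc and let it redeploy (a workaround sketch, not the eventual ansible fix):

oc -n logging set env dc/logging-mux \
    K8S_HOST_URL=https://kubernetes.default.svc.cluster.local
oc -n logging rollout status dc/logging-mux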

Comment 20 Noriko Hosoi 2018-08-10 15:19:32 UTC
(In reply to Rich Megginson from comment #18)
> (In reply to Mike Fiedler from comment #16)
> > Is K8S_HOST_URL being set wrong?   Should be the internal hostname?
> 
> I think it is being set wrong.  The value of K8S_HOST_URL should almost
> always be https://kubernetes.default.svc.cluster.local, the internal
> hostname.

Hi Mike,

Have you had a chance to retry with the suggested value? Any updates?

Thanks!

Comment 21 Mike Fiedler 2018-10-08 18:09:29 UTC
This works fine in 3.11. I am marking it fixed in 3.11; if the fix is needed in a previous release, this bug can be cloned.