The test cluster had HTTP_PROXY enabled, and there was no trustedCA in the proxy. Regarding the trustedCA, it seems this field is always present in the proxy resource; I checked a cluster without any proxy enabled, and it still has a trustedCA field.

$ oc get proxy cluster -oyaml
apiVersion: config.openshift.io/v1
kind: Proxy
metadata:
  creationTimestamp: "2019-11-05T23:23:02Z"
  generation: 1
  name: cluster
  resourceVersion: "433"
  selfLink: /apis/config.openshift.io/v1/proxies/cluster
  uid: d47d9096-65c2-409c-ba94-989a034d6caf
spec:
  trustedCA:
    name: ""
status: {}
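For comparison, on a cluster where an additional CA bundle is supplied, the trustedCA field would reference a ConfigMap by name rather than being empty; an empty name just means no extra CA is configured. A minimal sketch of what that could look like (the ConfigMap name "user-ca-bundle" is only an illustrative example; it would have to exist in the openshift-config namespace with the certificates under the ca-bundle.crt key):

spec:
  trustedCA:
    # hypothetical ConfigMap in openshift-config holding ca-bundle.crt
    name: user-ca-bundle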
Fluentd could not connect to Elasticsearch within the expected number of retries. I didn't see the HTTP_PROXY env vars set in the fluentd pod, and the config map fluentd-trusted-ca-bundle was not mounted into the pod.

[anli@preserve-docker-slave anli43gcp]$ oc exec fluentd-p4jlb -- logs
2019-11-25 12:40:21 +0000 [warn]: [retry_clo_default_output_es] Could not communicate to Elasticsearch, resetting connection and trying again. Connection refused - connect(2) for 172.30.73.24:9200 (Errno::ECONNREFUSED)
2019-11-25 12:40:21 +0000 [warn]: [retry_clo_default_output_es] Remaining retry: 14. Retry to communicate after 2 second(s).
2019-11-25 12:40:26 +0000 [warn]: [retry_clo_default_output_es] Could not communicate to Elasticsearch, resetting connection and trying again. Connection refused - connect(2) for 172.30.73.24:9200 (Errno::ECONNREFUSED)
2019-11-25 12:40:26 +0000 [warn]: [retry_clo_default_output_es] Remaining retry: 13. Retry to communicate after 4 second(s).
2019-11-25 12:40:35 +0000 [warn]: [retry_clo_default_output_es] Could not communicate to Elasticsearch, resetting connection and trying again. Connection refused - connect(2) for 172.30.73.24:9200 (Errno::ECONNREFUSED)
2019-11-25 12:40:35 +0000 [warn]: [retry_clo_default_output_es] Remaining retry: 12. Retry to communicate after 8 second(s).
2019-11-25 12:40:52 +0000 [warn]: [retry_clo_default_output_es] Could not communicate to Elasticsearch, resetting connection and trying again. Connection refused - connect(2) for 172.30.73.24:9200 (Errno::ECONNREFUSED)
2019-11-25 12:40:52 +0000 [warn]: [retry_clo_default_output_es] Remaining retry: 11. Retry to communicate after 16 second(s).
2019-11-25 12:42:13 +0000 [warn]: dump an error event: error_class=Fluent::Plugin::ConcatFilter::TimeoutError error="Timeout flush: kubernetes.var.log.containers.kube-controller-manager-qeanli-zg8tp-m-1.c.openshift-qe.internal_openshift-kube-controller-manager_wait-for-cpc-host-port-943a07f12db859b475b8cab9138dd89ee45fa3c7471e1d565b0a09c71eab9036.log:default" location=nil tag="kubernetes.var.log.containers.kube-controller-manager-qeanli-zg8tp-m-1.c.openshift-qe.internal_openshift-kube-controller-manager_wait-for-cpc-host-port-943a07f12db859b475b8cab9138dd89ee45fa3c7471e1d565b0a09c71eab9036.log" time=2019-11-25 12:42:13.066705727 +0000 record={"time"=>"2019-11-25T12:16:22.288733868+00:00", "stream"=>"stdout", "logtag"=>"P", "log"=>"Waiting for port :10357 to be released."}
2019-11-25 12:42:14 +0000 [warn]: dump an error event: error_class=Fluent::Plugin::ConcatFilter::TimeoutError error="Timeout flush: kubernetes.var.log.containers.kube-apiserver-qeanli-zg8tp-m-1.c.openshift-qe.internal_openshift-kube-apiserver_setup-061062f41b239d2501c350c533ef9b88302d061d70f999092c54a9cceb520d55.log:default" location=nil tag="kubernetes.var.log.containers.kube-apiserver-qeanli-zg8tp-m-1.c.openshift-qe.internal_openshift-kube-apiserver_setup-061062f41b239d2501c350c533ef9b88302d061d70f999092c54a9cceb520d55.log" time=2019-11-25 12:42:14.065368339 +0000 record={"time"=>"2019-11-25T12:29:16.346910916+00:00", "stream"=>"stdout", "logtag"=>"P", "log"=>"Fixing audit permissions.\\'\\'Waiting for port :6443 and :6080 to be released."}
2019-11-25 12:42:14 +0000 [warn]: dump an error event: error_class=Fluent::Plugin::ConcatFilter::TimeoutError error="Timeout flush: kubernetes.var.log.containers.openshift-kube-scheduler-qeanli-zg8tp-m-1.c.openshift-qe.internal_openshift-kube-scheduler_wait-for-host-port-bf98d93bd2c648a0d4e4a2fc6ff3711ea7d62c1cb06e666ce9ffd998adb860f7.log:default" location=nil tag="kubernetes.var.log.containers.openshift-kube-scheduler-qeanli-zg8tp-m-1.c.openshift-qe.internal_openshift-kube-scheduler_wait-for-host-port-bf98d93bd2c648a0d4e4a2fc6ff3711ea7d62c1cb06e666ce9ffd998adb860f7.log" time=2019-11-25 12:42:14.065368339 +0000 record={"time"=>"2019-11-25T12:15:23.698894271+00:00", "stream"=>"stdout", "logtag"=>"P", "log"=>"Waiting for port :10259 and :10251 to be released."}
2019-11-25 12:42:14 +0000 [warn]: dump an error event: error_class=Fluent::Plugin::ConcatFilter::TimeoutError error="Timeout flush: kubernetes.var.log.containers.kube-controller-manager-qeanli-zg8tp-m-1.c.openshift-qe.internal_openshift-kube-controller-manager_wait-for-host-port-c411e5e2c2fa1ded3e330a22829d69cb5ccbf0629c92b72ce95bb9255b7d58cd.log:default" location=nil tag="kubernetes.var.log.containers.kube-controller-manager-qeanli-zg8tp-m-1.c.openshift-qe.internal_openshift-kube-controller-manager_wait-for-host-port-c411e5e2c2fa1ded3e330a22829d69cb5ccbf0629c92b72ce95bb9255b7d58cd.log" time=2019-11-25 12:42:14.065368339 +0000 record={"time"=>"2019-11-25T12:15:34.254766719+00:00", "stream"=>"stdout", "logtag"=>"P", "log"=>"Waiting for port :10257 to be released."}
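For reference, one way the missing proxy env vars and the missing CA bundle mount could be confirmed directly on the pod (the namespace and container index here are assumptions, and the jsonpath queries are just one way to dump this):

$ oc get pod fluentd-p4jlb -n openshift-logging -o jsonpath='{.spec.containers[0].env}'
$ oc get pod fluentd-p4jlb -n openshift-logging -o jsonpath='{.spec.containers[0].volumeMounts}'
$ oc get configmap fluentd-trusted-ca-bundle -n openshift-logging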
Created attachment 1639470 [details]
The logs and resource output
When shall we configure HTTP_PROXY and mount the configmap fluentd-trusted-ca-bundle?
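For what it's worth, here is a rough sketch of what I would expect the operator to inject into the fluentd daemonset once this is supported; the container name, the mount path, and the env values (abbreviated here) are my assumptions, not the actual implementation:

spec:
  template:
    spec:
      containers:
      - name: fluentd
        env:
        # assumed: values copied by the operator from the cluster proxy status
        - name: HTTP_PROXY
          value: http://proxy-user1:...@10.0.0.2:3128
        - name: HTTPS_PROXY
          value: http://proxy-user1:...@10.0.0.2:3128
        - name: NO_PROXY
          value: .cluster.local,.svc,172.30.0.0/16,...
        volumeMounts:
        # assumed mount path for the trusted CA bundle
        - name: fluentd-trusted-ca-bundle
          mountPath: /etc/pki/ca-trust/extracted/pem/
      volumes:
      - name: fluentd-trusted-ca-bundle
        configMap:
          name: fluentd-trusted-ca-bundle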
> 2019-11-25 12:40:21 +0000 [warn]: [retry_clo_default_output_es] Could not communicate to Elasticsearch, resetting connection and trying again. Connection refused - connect(2) for 172.30.73.24:9200 (Errno::ECONNREFUSED)

This connection should not be going through the proxy. There should be a NO_PROXY env var set on the fluentd pods that includes the 172.30.0.0/16 network.

1) Need to confirm this is getting set on the fluentd pods correctly (the CL operator gets the value from the proxy config resource, so you can also confirm the proxy config resource on the cluster has the correct value in status.noProxy).

2) Need to confirm the fluentd transport libraries support CIDR-format noProxy values. If they do not, we are going to need to find another way to ensure the requests from fluentd to Elasticsearch are not proxied.

The fact that you didn't see an HTTP_PROXY var or configmap mounted is definitely a problem though. Can you share the proxy config resource from your cluster? (oc get proxy -o yaml)
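A couple of commands that could be used for those two checks (the daemonset name "fluentd" and the openshift-logging namespace are the defaults and are assumptions here):

$ oc get proxy cluster -o jsonpath='{.status.noProxy}'
$ oc set env ds/fluentd --list -n openshift-logging | grep -i proxy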
The proxy is as follows. By the way, the proxy env vars have been configured in the CLO.

[anli@preserve-docker-slave anli43gcp]$ oc get proxy -o yaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: Proxy
  metadata:
    creationTimestamp: "2019-11-25T11:57:59Z"
    generation: 1
    name: cluster
    resourceVersion: "431"
    selfLink: /apis/config.openshift.io/v1/proxies/cluster
    uid: a5f9e2c6-d55f-4deb-96a4-6995fe1cb6a3
  spec:
    httpProxy: http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.0.2:3128
    httpsProxy: http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.0.2:3128
    trustedCA:
      name: ""
  status:
    httpProxy: http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.0.2:3128
    httpsProxy: http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.0.2:3128
    noProxy: .cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.qeanlip2.qe.gcp.devcluster.openshift.com,etcd-0.qeanlip2.qe.gcp.devcluster.openshift.com,etcd-1.qeanlip2.qe.gcp.devcluster.openshift.com,etcd-2.qeanlip2.qe.gcp.devcluster.openshift.com,localhost,metadata,metadata.google.internal,metadata.google.internal.
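The env vars on the CLO itself can be double-checked with something like the following (the deployment name and namespace are the defaults; the jsonpath is just one way to dump the env):

$ oc get deployment cluster-logging-operator -n openshift-logging -o jsonpath='{.spec.template.spec.containers[0].env}'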
Verified with ose-cluster-logging-operator-v4.4.0-201912080435
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581