Bug 1866156
| Summary: | Fluentd can't connect to default ES after running for a few minutes. | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Qiaoling Tang <qitang> |
| Component: | Logging | Assignee: | Jeff Cantrill <jcantril> |
| Status: | CLOSED DUPLICATE | QA Contact: | Anping Li <anli> |
| Severity: | urgent | Priority: | unspecified |
| Version: | 4.6 | CC: | aos-bugs, mifiedle |
| Target Milestone: | --- | Keywords: | TestBlocker |
| Target Release: | 4.6.0 | Type: | Bug |
| Hardware: | Unspecified | OS: | Unspecified |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Story Points: | --- | Regression: | --- |
| Last Closed: | 2020-08-05 19:06:13 UTC | | |
Created attachment 1710453 [details]
ES logs

Please provide more information and a complete configuration [1]. Is there a cluster proxy enabled possibly?

[1] https://github.com/openshift/cluster-logging-operator/tree/master/must-gather#usage

Likely duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1864397

*** This bug has been marked as a duplicate of bug 1864397 ***

No cluster proxy enabled.
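For reference, the information requested above can be collected along these lines (a sketch only; the must-gather image reference is an assumption and should be taken from the README linked in [1]):

```sh
# Collect cluster-logging diagnostics (image reference is an assumption; use the one from the must-gather README)
oc adm must-gather --image=quay.io/openshift/origin-cluster-logging-operator:latest

# Check whether a cluster-wide proxy is configured
oc get proxy/cluster -o yaml
```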
Description of problem:

Fluentd can't connect to the ES cluster after running for a few minutes:

```
$ oc logs fluentd-jqz9b
Setting each total_size_limit for 2 buffers to 9621286195 bytes
Setting queued_chunks_limit_size for each buffer to 1146
Setting chunk_limit_size for each buffer to 8388608
2020-08-05 02:34:40 +0000 [warn]: parameter 'pos_file_compaction_interval' in <source> @type tail @id container-input path "/var/log/containers/*.log" exclude_path ["/var/log/containers/fluentd-*_openshift-logging_*.log","/var/log/containers/elasticsearch-*_openshift-logging_*.log","/var/log/containers/kibana-*_openshift-logging_*.log"] pos_file "/var/log/es-containers.log.pos" pos_file_compaction_interval 1800 refresh_interval 5 rotate_wait 5 tag "kubernetes.*" read_from_head true @label @CONCAT <parse> @type "multi_format" <pattern> format json time_format "%Y-%m-%dT%H:%M:%S.%N%Z" keep_time_key true time_type string </pattern> <pattern> format regexp expression /^(?<time>.+) (?<stream>stdout|stderr)( (?<logtag>.))? (?<log>.*)$/ time_format "%Y-%m-%dT%H:%M:%S.%N%:z" keep_time_key true </pattern> </parse> </source> is not used.
2020-08-05 02:34:40 +0000 [warn]: parameter 'undefined_to_replace_char' in <filter **> @type viaq_data_model elasticsearch_index_prefix_field "viaq_index_name" default_keep_fields CEE,time,@timestamp,aushape,ci_job,collectd,docker,fedora-ci,file,foreman,geoip,hostname,ipaddr4,ipaddr6,kubernetes,level,message,namespace_name,namespace_uuid,offset,openstack,ovirt,pid,pipeline_metadata,rsyslog,service,systemd,tags,testcase,tlog,viaq_msg_id extra_keep_fields keep_empty_fields message use_undefined false undefined_name "undefined" rename_time true rename_time_if_missing false src_time_name "time" dest_time_name "@timestamp" pipeline_type collector undefined_to_string false undefined_to_replace_char UNUSED undefined_max_num_fields -1 process_kubernetes_events false <formatter> tag "system.var.log**" type sys_var_log remove_keys "host,pid,ident" </formatter> <formatter> tag "journal.system**" type sys_journal remove_keys "log,stream,MESSAGE,_SOURCE_REALTIME_TIMESTAMP,__REALTIME_TIMESTAMP,CONTAINER_ID,CONTAINER_ID_FULL,CONTAINER_NAME,PRIORITY,_BOOT_ID,_CAP_EFFECTIVE,_CMDLINE,_COMM,_EXE,_GID,_HOSTNAME,_MACHINE_ID,_PID,_SELINUX_CONTEXT,_SYSTEMD_CGROUP,_SYSTEMD_SLICE,_SYSTEMD_UNIT,_TRANSPORT,_UID,_AUDIT_LOGINUID,_AUDIT_SESSION,_SYSTEMD_OWNER_UID,_SYSTEMD_SESSION,_SYSTEMD_USER_UNIT,CODE_FILE,CODE_FUNCTION,CODE_LINE,ERRNO,MESSAGE_ID,RESULT,UNIT,_KERNEL_DEVICE,_KERNEL_SUBSYSTEM,_UDEV_SYSNAME,_UDEV_DEVNODE,_UDEV_DEVLINK,SYSLOG_FACILITY,SYSLOG_IDENTIFIER,SYSLOG_PID" </formatter> <formatter> tag "kubernetes.journal.container**" type k8s_journal remove_keys "log,stream,MESSAGE,_SOURCE_REALTIME_TIMESTAMP,__REALTIME_TIMESTAMP,CONTAINER_ID,CONTAINER_ID_FULL,CONTAINER_NAME,PRIORITY,_BOOT_ID,_CAP_EFFECTIVE,_CMDLINE,_COMM,_EXE,_GID,_HOSTNAME,_MACHINE_ID,_PID,_SELINUX_CONTEXT,_SYSTEMD_CGROUP,_SYSTEMD_SLICE,_SYSTEMD_UNIT,_TRANSPORT,_UID,_AUDIT_LOGINUID,_AUDIT_SESSION,_SYSTEMD_OWNER_UID,_SYSTEMD_SESSION,_SYSTEMD_USER_UNIT,CODE_FILE,CODE_FUNCTION,CODE_LINE,ERRNO,MESSAGE_ID,RESULT,UNIT,_KERNEL_DEVICE,_KERNEL_SUBSYSTEM,_UDEV_SYSNAME,_UDEV_DEVNODE,_UDEV_DEVLINK,SYSLOG_FACILITY,SYSLOG_IDENTIFIER,SYSLOG_PID" </formatter> <formatter> tag "kubernetes.var.log.containers.eventrouter-** kubernetes.var.log.containers.cluster-logging-eventrouter-** k8s-audit.log** openshift-audit.log**" type k8s_json_file remove_keys "log,stream,CONTAINER_ID_FULL,CONTAINER_NAME" process_kubernetes_events true </formatter> <formatter> tag "kubernetes.var.log.containers**" type k8s_json_file remove_keys "log,stream,CONTAINER_ID_FULL,CONTAINER_NAME" </formatter> <elasticsearch_index_name> enabled true tag "journal.system** system.var.log** **_default_** **_kube-*_** **_openshift-*_** **_openshift_**" name_type static static_index_name "infra-write" </elasticsearch_index_name> <elasticsearch_index_name> enabled true tag "linux-audit.log** k8s-audit.log** openshift-audit.log**" name_type static static_index_name "audit-write" </elasticsearch_index_name> <elasticsearch_index_name> enabled true tag "**" name_type static static_index_name "app-write" </elasticsearch_index_name> </filter> is not used.
2020-08-05 02:36:00 +0000 [warn]: [default] failed to flush the buffer. retry_time=0 next_retry_seconds=2020-08-05 02:36:01 +0000 chunk="5ac16f257911daea9f5412ad9bd8b4c8" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.openshift-logging.svc.cluster.local\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): hostname \"10.131.0.69\" does not match the server certificate (OpenSSL::SSL::SSLError)"
2020-08-05 02:36:00 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-4.1.1/lib/fluent/plugin/out_elasticsearch.rb:1015:in `rescue in send_bulk'
2020-08-05 02:36:00 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-4.1.1/lib/fluent/plugin/out_elasticsearch.rb:977:in `send_bulk'
2020-08-05 02:36:00 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-4.1.1/lib/fluent/plugin/out_elasticsearch.rb:804:in `block in write'
2020-08-05 02:36:00 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-4.1.1/lib/fluent/plugin/out_elasticsearch.rb:803:in `each'
2020-08-05 02:36:00 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-4.1.1/lib/fluent/plugin/out_elasticsearch.rb:803:in `write'
2020-08-05 02:36:00 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/output.rb:1125:in `try_flush'
2020-08-05 02:36:00 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/output.rb:1431:in `flush_thread_run'
2020-08-05 02:36:00 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/output.rb:461:in `block (2 levels) in start'
2020-08-05 02:36:00 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2020-08-05 02:36:01 +0000 [warn]: [default] retry succeeded. chunk_id="5ac16f25308b7394671c8ba4274c76ac"
2020-08-05 02:36:01 +0000 [warn]: [default] failed to flush the buffer. retry_time=0 next_retry_seconds=2020-08-05 02:36:02 +0000 chunk="5ac16f257911daea9f5412ad9bd8b4c8" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.openshift-logging.svc.cluster.local\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): hostname \"10.128.2.35\" does not match the server certificate (OpenSSL::SSL::SSLError)"
2020-08-05 02:36:01 +0000 [warn]: suppressed same stacktrace
2020-08-05 02:36:02 +0000 [warn]: [default] failed to flush the buffer. retry_time=1 next_retry_seconds=2020-08-05 02:36:03 +0000 chunk="5ac16f257911daea9f5412ad9bd8b4c8" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.openshift-logging.svc.cluster.local\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): hostname \"10.131.0.69\" does not match the server certificate (OpenSSL::SSL::SSLError)"
2020-08-05 02:36:02 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-4.1.1/lib/fluent/plugin/out_elasticsearch.rb:1015:in `rescue in send_bulk'
2020-08-05 02:36:02 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-4.1.1/lib/fluent/plugin/out_elasticsearch.rb:977:in `send_bulk'
2020-08-05 02:36:02 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-4.1.1/lib/fluent/plugin/out_elasticsearch.rb:804:in `block in write'
2020-08-05 02:36:02 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-4.1.1/lib/fluent/plugin/out_elasticsearch.rb:803:in `each'
2020-08-05 02:36:02 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-4.1.1/lib/fluent/plugin/out_elasticsearch.rb:803:in `write'
2020-08-05 02:36:02 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/output.rb:1125:in `try_flush'
2020-08-05 02:36:02 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/output.rb:1431:in `flush_thread_run'
2020-08-05 02:36:02 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/output.rb:461:in `block (2 levels) in start'
2020-08-05 02:36:02 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2020-08-05 02:36:02 +0000 [warn]: [default] failed to flush the buffer. retry_time=2 next_retry_seconds=2020-08-05 02:36:04 +0000 chunk="5ac16f24d44fa557a1ec167826f1ec4a" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.openshift-logging.svc.cluster.local\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): hostname \"10.129.2.72\" does not match the server certificate (OpenSSL::SSL::SSLError)"
2020-08-05 02:36:02 +0000 [warn]: suppressed same stacktrace
2020-08-05 02:36:04 +0000 [warn]: [default] failed to flush the buffer. retry_time=3 next_retry_seconds=2020-08-05 02:36:09 +0000 chunk="5ac16f24d44fa557a1ec167826f1ec4a" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.openshift-logging.svc.cluster.local\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): hostname \"10.128.2.35\" does not match the server certificate (OpenSSL::SSL::SSLError)"
2020-08-05 02:36:04 +0000 [warn]: suppressed same stacktrace
2020-08-05 02:36:04 +0000 [warn]: [default] failed to flush the buffer. retry_time=4 next_retry_seconds=2020-08-05 02:36:13 +0000 chunk="5ac16f257911daea9f5412ad9bd8b4c8" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.openshift-logging.svc.cluster.local\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): hostname \"10.129.2.72\" does not match the server certificate (OpenSSL::SSL::SSLError)"
2020-08-05 02:36:04 +0000 [warn]: suppressed same stacktrace
2020-08-05 02:36:13 +0000 [warn]: [default] failed to flush the buffer. retry_time=5 next_retry_seconds=2020-08-05 02:36:27 +0000 chunk="5ac16f24d44fa557a1ec167826f1ec4a" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.openshift-logging.svc.cluster.local\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): hostname \"10.131.0.69\" does not match the server certificate (OpenSSL::SSL::SSLError)"
2020-08-05 02:36:13 +0000 [warn]: suppressed same stacktrace
2020-08-05 02:36:13 +0000 [warn]: [default] failed to flush the buffer. retry_time=6 next_retry_seconds=2020-08-05 02:36:41 +0000 chunk="5ac16f257911daea9f5412ad9bd8b4c8" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.openshift-logging.svc.cluster.local\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): hostname \"10.128.2.35\" does not match the server certificate (OpenSSL::SSL::SSLError)"
2020-08-05 02:36:13 +0000 [warn]: suppressed same stacktrace
2020-08-05 02:36:41 +0000 [warn]: [default] failed to flush the buffer. retry_time=7 next_retry_seconds=2020-08-05 02:37:49 +0000 chunk="5ac16f24d44fa557a1ec167826f1ec4a" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.openshift-logging.svc.cluster.local\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): hostname \"10.129.2.72\" does not match the server certificate (OpenSSL::SSL::SSLError)"
2020-08-05 02:36:41 +0000 [warn]: suppressed same stacktrace
2020-08-05 02:36:41 +0000 [warn]: [default] failed to flush the buffer. retry_time=8 next_retry_seconds=2020-08-05 02:38:59 +0000 chunk="5ac16f257911daea9f5412ad9bd8b4c8" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.openshift-logging.svc.cluster.local\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): hostname \"10.131.0.69\" does not match the server certificate (OpenSSL::SSL::SSLError)"
2020-08-05 02:36:41 +0000 [warn]: suppressed same stacktrace
```

Version-Release number of selected component (if applicable):

```
$ oc get csv
NAME                                           DISPLAY                  VERSION                 REPLACES   PHASE
clusterlogging.4.6.0-202008040110.p0           Cluster Logging          4.6.0-202008040110.p0              Succeeded
elasticsearch-operator.4.6.0-202008040915.p0   Elasticsearch Operator   4.6.0-202008040915.p0              Succeeded
```

cluster version: 4.6.0-0.nightly-2020-08-04-210224

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging with the following ClusterLogging CR:

```yaml
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  logStore:
    type: "elasticsearch"
    retentionPolicy:
      application:
        maxAge: 1d
      infra:
        maxAge: 3h
      audit:
        maxAge: 2w
    elasticsearch:
      nodeCount: 3
      redundancyPolicy: "SingleRedundancy"
      resources:
        requests:
          memory: "2Gi"
      storage:
        storageClassName: "gp2"
        size: "20Gi"
  visualization:
    type: "kibana"
    kibana:
      proxy:
        resources:
          limits:
            memory: "1Gi"
          requests:
            cpu: "100m"
            memory: "1Gi"
      resources:
        limits:
          cpu: "1000m"
          memory: "4Gi"
        requests:
          cpu: "800m"
          memory: "2Gi"
      replicas: 1
  collection:
    logs:
      type: "fluentd"
      fluentd: {}
```

2. Wait for several minutes, then check data in ES (see the index-check sketch under "Additional info" below).
3. Check the fluentd pod log.

Actual results:

Expected results:

Additional info:
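The flush failures in the log above are TLS hostname-verification errors: the check is being performed against Elasticsearch pod IPs (10.131.0.69, 10.128.2.35, 10.129.2.72) rather than the elasticsearch.openshift-logging.svc.cluster.local service name, and those IPs do not match the server certificate. As a diagnostic sketch, one way to compare the pod IPs behind the service with the names the certificate actually carries (the secret and key names are assumptions based on a default openshift-logging deployment):

```sh
# Pod IPs behind the elasticsearch service (the IPs appearing in the fluentd errors)
oc -n openshift-logging get endpoints elasticsearch -o wide

# SANs on the certificate Elasticsearch serves
# (secret/key names are assumptions based on a default deployment)
oc -n openshift-logging extract secret/elasticsearch --keys=elasticsearch.crt --to=- \
  | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'
```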
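For step 2 of the reproduce steps, a possible way to check data in ES from the Elasticsearch pod itself (es_util and the component=elasticsearch label are assumptions based on a default deployment):

```sh
# Pick one Elasticsearch pod (label selector is an assumption based on a default deployment)
ES_POD=$(oc -n openshift-logging get pods -l component=elasticsearch -o jsonpath='{.items[0].metadata.name}')

# List indices and overall cluster health (es_util is assumed to be present in the elasticsearch container)
oc -n openshift-logging exec -c elasticsearch "$ES_POD" -- es_util --query=_cat/indices?v
oc -n openshift-logging exec -c elasticsearch "$ES_POD" -- es_util --query=_cat/health?v
```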