Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1866156

Summary: Fluentd can't connect to default ES after running for a few minutes.
Product: OpenShift Container Platform
Reporter: Qiaoling Tang <qitang>
Component: Logging
Assignee: Jeff Cantrill <jcantril>
Status: CLOSED DUPLICATE
QA Contact: Anping Li <anli>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 4.6
CC: aos-bugs, mifiedle
Target Milestone: ---
Keywords: TestBlocker
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-08-05 19:06:13 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description    Flags
ES logs        none

Description Qiaoling Tang 2020-08-05 02:42:04 UTC
Description of problem:
After running for a few minutes, fluentd can no longer push logs to the ES cluster. Buffer flushes fail with an SSL hostname-mismatch error:
$ oc logs fluentd-jqz9b
Setting each total_size_limit for 2 buffers to 9621286195 bytes
Setting queued_chunks_limit_size for each buffer to 1146
Setting chunk_limit_size for each buffer to 8388608
2020-08-05 02:34:40 +0000 [warn]: parameter 'pos_file_compaction_interval' in <source>
  @type tail
  @id container-input
  path "/var/log/containers/*.log"
  exclude_path ["/var/log/containers/fluentd-*_openshift-logging_*.log","/var/log/containers/elasticsearch-*_openshift-logging_*.log","/var/log/containers/kibana-*_openshift-logging_*.log"]
  pos_file "/var/log/es-containers.log.pos"
  pos_file_compaction_interval 1800
  refresh_interval 5
  rotate_wait 5
  tag "kubernetes.*"
  read_from_head true
  @label @CONCAT
  <parse>
    @type "multi_format"
    <pattern>
      format json
      time_format "%Y-%m-%dT%H:%M:%S.%N%Z"
      keep_time_key true
      time_type string
    </pattern>
    <pattern>
      format regexp
      expression /^(?<time>.+) (?<stream>stdout|stderr)( (?<logtag>.))? (?<log>.*)$/
      time_format "%Y-%m-%dT%H:%M:%S.%N%:z"
      keep_time_key true
    </pattern>
  </parse>
</source> is not used.
2020-08-05 02:34:40 +0000 [warn]: parameter 'undefined_to_replace_char' in <filter **>
  @type viaq_data_model
  elasticsearch_index_prefix_field "viaq_index_name"
  default_keep_fields CEE,time,@timestamp,aushape,ci_job,collectd,docker,fedora-ci,file,foreman,geoip,hostname,ipaddr4,ipaddr6,kubernetes,level,message,namespace_name,namespace_uuid,offset,openstack,ovirt,pid,pipeline_metadata,rsyslog,service,systemd,tags,testcase,tlog,viaq_msg_id
  extra_keep_fields 
  keep_empty_fields message
  use_undefined false
  undefined_name "undefined"
  rename_time true
  rename_time_if_missing false
  src_time_name "time"
  dest_time_name "@timestamp"
  pipeline_type collector
  undefined_to_string false
  undefined_to_replace_char UNUSED
  undefined_max_num_fields -1
  process_kubernetes_events false
  <formatter>
    tag "system.var.log**"
    type sys_var_log
    remove_keys "host,pid,ident"
  </formatter>
  <formatter>
    tag "journal.system**"
    type sys_journal
    remove_keys "log,stream,MESSAGE,_SOURCE_REALTIME_TIMESTAMP,__REALTIME_TIMESTAMP,CONTAINER_ID,CONTAINER_ID_FULL,CONTAINER_NAME,PRIORITY,_BOOT_ID,_CAP_EFFECTIVE,_CMDLINE,_COMM,_EXE,_GID,_HOSTNAME,_MACHINE_ID,_PID,_SELINUX_CONTEXT,_SYSTEMD_CGROUP,_SYSTEMD_SLICE,_SYSTEMD_UNIT,_TRANSPORT,_UID,_AUDIT_LOGINUID,_AUDIT_SESSION,_SYSTEMD_OWNER_UID,_SYSTEMD_SESSION,_SYSTEMD_USER_UNIT,CODE_FILE,CODE_FUNCTION,CODE_LINE,ERRNO,MESSAGE_ID,RESULT,UNIT,_KERNEL_DEVICE,_KERNEL_SUBSYSTEM,_UDEV_SYSNAME,_UDEV_DEVNODE,_UDEV_DEVLINK,SYSLOG_FACILITY,SYSLOG_IDENTIFIER,SYSLOG_PID"
  </formatter>
  <formatter>
    tag "kubernetes.journal.container**"
    type k8s_journal
    remove_keys "log,stream,MESSAGE,_SOURCE_REALTIME_TIMESTAMP,__REALTIME_TIMESTAMP,CONTAINER_ID,CONTAINER_ID_FULL,CONTAINER_NAME,PRIORITY,_BOOT_ID,_CAP_EFFECTIVE,_CMDLINE,_COMM,_EXE,_GID,_HOSTNAME,_MACHINE_ID,_PID,_SELINUX_CONTEXT,_SYSTEMD_CGROUP,_SYSTEMD_SLICE,_SYSTEMD_UNIT,_TRANSPORT,_UID,_AUDIT_LOGINUID,_AUDIT_SESSION,_SYSTEMD_OWNER_UID,_SYSTEMD_SESSION,_SYSTEMD_USER_UNIT,CODE_FILE,CODE_FUNCTION,CODE_LINE,ERRNO,MESSAGE_ID,RESULT,UNIT,_KERNEL_DEVICE,_KERNEL_SUBSYSTEM,_UDEV_SYSNAME,_UDEV_DEVNODE,_UDEV_DEVLINK,SYSLOG_FACILITY,SYSLOG_IDENTIFIER,SYSLOG_PID"
  </formatter>
  <formatter>
    tag "kubernetes.var.log.containers.eventrouter-** kubernetes.var.log.containers.cluster-logging-eventrouter-** k8s-audit.log** openshift-audit.log**"
    type k8s_json_file
    remove_keys "log,stream,CONTAINER_ID_FULL,CONTAINER_NAME"
    process_kubernetes_events true
  </formatter>
  <formatter>
    tag "kubernetes.var.log.containers**"
    type k8s_json_file
    remove_keys "log,stream,CONTAINER_ID_FULL,CONTAINER_NAME"
  </formatter>
  <elasticsearch_index_name>
    enabled true
    tag "journal.system** system.var.log** **_default_** **_kube-*_** **_openshift-*_** **_openshift_**"
    name_type static
    static_index_name "infra-write"
  </elasticsearch_index_name>
  <elasticsearch_index_name>
    enabled true
    tag "linux-audit.log** k8s-audit.log** openshift-audit.log**"
    name_type static
    static_index_name "audit-write"
  </elasticsearch_index_name>
  <elasticsearch_index_name>
    enabled true
    tag "**"
    name_type static
    static_index_name "app-write"
  </elasticsearch_index_name>
</filter> is not used.
2020-08-05 02:36:00 +0000 [warn]: [default] failed to flush the buffer. retry_time=0 next_retry_seconds=2020-08-05 02:36:01 +0000 chunk="5ac16f257911daea9f5412ad9bd8b4c8" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.openshift-logging.svc.cluster.local\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): hostname \"10.131.0.69\" does not match the server certificate (OpenSSL::SSL::SSLError)"
  2020-08-05 02:36:00 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-4.1.1/lib/fluent/plugin/out_elasticsearch.rb:1015:in `rescue in send_bulk'
  2020-08-05 02:36:00 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-4.1.1/lib/fluent/plugin/out_elasticsearch.rb:977:in `send_bulk'
  2020-08-05 02:36:00 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-4.1.1/lib/fluent/plugin/out_elasticsearch.rb:804:in `block in write'
  2020-08-05 02:36:00 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-4.1.1/lib/fluent/plugin/out_elasticsearch.rb:803:in `each'
  2020-08-05 02:36:00 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-4.1.1/lib/fluent/plugin/out_elasticsearch.rb:803:in `write'
  2020-08-05 02:36:00 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/output.rb:1125:in `try_flush'
  2020-08-05 02:36:00 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/output.rb:1431:in `flush_thread_run'
  2020-08-05 02:36:00 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/output.rb:461:in `block (2 levels) in start'
  2020-08-05 02:36:00 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2020-08-05 02:36:01 +0000 [warn]: [default] retry succeeded. chunk_id="5ac16f25308b7394671c8ba4274c76ac"
2020-08-05 02:36:01 +0000 [warn]: [default] failed to flush the buffer. retry_time=0 next_retry_seconds=2020-08-05 02:36:02 +0000 chunk="5ac16f257911daea9f5412ad9bd8b4c8" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.openshift-logging.svc.cluster.local\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): hostname \"10.128.2.35\" does not match the server certificate (OpenSSL::SSL::SSLError)"
  2020-08-05 02:36:01 +0000 [warn]: suppressed same stacktrace
2020-08-05 02:36:02 +0000 [warn]: [default] failed to flush the buffer. retry_time=1 next_retry_seconds=2020-08-05 02:36:03 +0000 chunk="5ac16f257911daea9f5412ad9bd8b4c8" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.openshift-logging.svc.cluster.local\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): hostname \"10.131.0.69\" does not match the server certificate (OpenSSL::SSL::SSLError)"
  2020-08-05 02:36:02 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-4.1.1/lib/fluent/plugin/out_elasticsearch.rb:1015:in `rescue in send_bulk'
  2020-08-05 02:36:02 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-4.1.1/lib/fluent/plugin/out_elasticsearch.rb:977:in `send_bulk'
  2020-08-05 02:36:02 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-4.1.1/lib/fluent/plugin/out_elasticsearch.rb:804:in `block in write'
  2020-08-05 02:36:02 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-4.1.1/lib/fluent/plugin/out_elasticsearch.rb:803:in `each'
  2020-08-05 02:36:02 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluent-plugin-elasticsearch-4.1.1/lib/fluent/plugin/out_elasticsearch.rb:803:in `write'
  2020-08-05 02:36:02 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/output.rb:1125:in `try_flush'
  2020-08-05 02:36:02 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/output.rb:1431:in `flush_thread_run'
  2020-08-05 02:36:02 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/output.rb:461:in `block (2 levels) in start'
  2020-08-05 02:36:02 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2020-08-05 02:36:02 +0000 [warn]: [default] failed to flush the buffer. retry_time=2 next_retry_seconds=2020-08-05 02:36:04 +0000 chunk="5ac16f24d44fa557a1ec167826f1ec4a" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.openshift-logging.svc.cluster.local\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): hostname \"10.129.2.72\" does not match the server certificate (OpenSSL::SSL::SSLError)"
  2020-08-05 02:36:02 +0000 [warn]: suppressed same stacktrace
2020-08-05 02:36:04 +0000 [warn]: [default] failed to flush the buffer. retry_time=3 next_retry_seconds=2020-08-05 02:36:09 +0000 chunk="5ac16f24d44fa557a1ec167826f1ec4a" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.openshift-logging.svc.cluster.local\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): hostname \"10.128.2.35\" does not match the server certificate (OpenSSL::SSL::SSLError)"
  2020-08-05 02:36:04 +0000 [warn]: suppressed same stacktrace
2020-08-05 02:36:04 +0000 [warn]: [default] failed to flush the buffer. retry_time=4 next_retry_seconds=2020-08-05 02:36:13 +0000 chunk="5ac16f257911daea9f5412ad9bd8b4c8" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.openshift-logging.svc.cluster.local\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): hostname \"10.129.2.72\" does not match the server certificate (OpenSSL::SSL::SSLError)"
  2020-08-05 02:36:04 +0000 [warn]: suppressed same stacktrace
2020-08-05 02:36:13 +0000 [warn]: [default] failed to flush the buffer. retry_time=5 next_retry_seconds=2020-08-05 02:36:27 +0000 chunk="5ac16f24d44fa557a1ec167826f1ec4a" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.openshift-logging.svc.cluster.local\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): hostname \"10.131.0.69\" does not match the server certificate (OpenSSL::SSL::SSLError)"
  2020-08-05 02:36:13 +0000 [warn]: suppressed same stacktrace
2020-08-05 02:36:13 +0000 [warn]: [default] failed to flush the buffer. retry_time=6 next_retry_seconds=2020-08-05 02:36:41 +0000 chunk="5ac16f257911daea9f5412ad9bd8b4c8" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.openshift-logging.svc.cluster.local\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): hostname \"10.128.2.35\" does not match the server certificate (OpenSSL::SSL::SSLError)"
  2020-08-05 02:36:13 +0000 [warn]: suppressed same stacktrace
2020-08-05 02:36:41 +0000 [warn]: [default] failed to flush the buffer. retry_time=7 next_retry_seconds=2020-08-05 02:37:49 +0000 chunk="5ac16f24d44fa557a1ec167826f1ec4a" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.openshift-logging.svc.cluster.local\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): hostname \"10.129.2.72\" does not match the server certificate (OpenSSL::SSL::SSLError)"
  2020-08-05 02:36:41 +0000 [warn]: suppressed same stacktrace
2020-08-05 02:36:41 +0000 [warn]: [default] failed to flush the buffer. retry_time=8 next_retry_seconds=2020-08-05 02:38:59 +0000 chunk="5ac16f257911daea9f5412ad9bd8b4c8" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.openshift-logging.svc.cluster.local\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): hostname \"10.131.0.69\" does not match the server certificate (OpenSSL::SSL::SSLError)"
  2020-08-05 02:36:41 +0000 [warn]: suppressed same stacktrace
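
To see at a glance which peer names the client rejected, the warnings above can be tallied with a small script. This is only a minimal sketch and not part of the original report: the function name `mismatched_hosts` and the abbreviated sample lines are illustrative, and the regex assumes the exact error wording shown in the log above.

```python
import re
from collections import Counter

# Peer name quoted in the OpenSSL error. The backslashes are optional because
# fluentd escapes the inner quotes when it embeds the message in error="...".
MISMATCH_RE = re.compile(
    r'hostname \\?"([^"\\]+)\\?" does not match the server certificate'
)


def mismatched_hosts(log_text):
    """Count how many times each peer name failed certificate verification."""
    return Counter(MISMATCH_RE.findall(log_text))


# Abbreviated copies of three warnings from the log above (illustrative input).
sample = "\n".join([
    r'[warn]: [default] failed to flush the buffer. error="... hostname \"10.131.0.69\" does not match the server certificate (OpenSSL::SSL::SSLError)"',
    r'[warn]: [default] failed to flush the buffer. error="... hostname \"10.128.2.35\" does not match the server certificate (OpenSSL::SSL::SSLError)"',
    r'[warn]: [default] failed to flush the buffer. error="... hostname \"10.131.0.69\" does not match the server certificate (OpenSSL::SSL::SSLError)"',
])

for host, count in sorted(mismatched_hosts(sample).items()):
    print(host, count)
```

In practice the input would be the saved output of `oc logs fluentd-jqz9b`. In the log above, every rejected name is an IP address (10.131.0.69, 10.128.2.35, 10.129.2.72), while the configured host is elasticsearch.openshift-logging.svc.cluster.local.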


Version-Release number of selected component (if applicable):
$ oc get csv
NAME                                           DISPLAY                  VERSION                 REPLACES   PHASE
clusterlogging.4.6.0-202008040110.p0           Cluster Logging          4.6.0-202008040110.p0              Succeeded
elasticsearch-operator.4.6.0-202008040915.p0   Elasticsearch Operator   4.6.0-202008040915.p0              Succeeded

cluster version: 4.6.0-0.nightly-2020-08-04-210224 

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging with the following ClusterLogging CR:
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  logStore:
    type: "elasticsearch"
    retentionPolicy: 
      application:
        maxAge: 1d
      infra:
        maxAge: 3h
      audit:
        maxAge: 2w
    elasticsearch:
      nodeCount: 3
      redundancyPolicy: "SingleRedundancy"
      resources:
        requests:
          memory: "2Gi"
      storage:
        storageClassName: "gp2"
        size: "20Gi"
  visualization:
    type: "kibana"
    kibana:
      proxy:
        resources:
          limits:
            memory: "1Gi"
          requests:
            cpu: "100m"
            memory: "1Gi"
      resources:
        limits:
          cpu: "1000m"
          memory: "4Gi"
        requests:
          cpu: "800m"
          memory: "2Gi"
      replicas: 1
  collection:
    logs:
      type: "fluentd"
      fluentd: {}

2. Wait several minutes, then check whether data is arriving in ES.
3. Check the fluentd pod log.

Actual results:


Expected results:


Additional info:

Comment 1 Qiaoling Tang 2020-08-05 02:54:06 UTC
Created attachment 1710453
ES logs

Comment 2 Jeff Cantrill 2020-08-05 15:14:06 UTC
Please provide more information and a complete configuration [1]. Is a cluster proxy possibly enabled?

[1] https://github.com/openshift/cluster-logging-operator/tree/master/must-gather#usage

Comment 3 Mike Fiedler 2020-08-05 18:29:51 UTC
Likely duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1864397

Comment 4 Jeff Cantrill 2020-08-05 19:06:13 UTC

*** This bug has been marked as a duplicate of bug 1864397 ***

Comment 5 Qiaoling Tang 2020-08-06 01:31:42 UTC
No cluster proxy enabled.