Bug 1914408
| Summary: | Fluentd Pods remain Init:CrashLoopBackOff status when deploying fluentd standalone | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | OpenShift BugZilla Robot <openshift-bugzilla-robot> |
| Component: | Logging | Assignee: | Jeff Cantrill <jcantril> |
| Status: | CLOSED ERRATA | QA Contact: | Giriyamma <gkarager> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.5 | CC: | aos-bugs, gkarager, jcantril |
| Target Milestone: | --- | | |
| Target Release: | 4.6.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | logging-core | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-02-22 13:46:16 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1903912 | | |
| Bug Blocks: | | | |
(In reply to Giriyamma from comment #2)
> Verifying this issue using:
> Cluster version: 4.6.0-0.nightly-2021-01-28-234643
> CLO: clusterlogging.4.6.0-202101290552.p0
> EO: elasticsearch-operator.4.6.0-202101290552.p0
>
> The fluentd pods are stuck in CrashLoopBackOff state.

Please provide the fluentd.conf, as you may have unrelated issues.

> 3. oc get pods
> NAME                                        READY   STATUS             RESTARTS   AGE
> cluster-logging-operator-58c74cc595-ndgvv   1/1     Running            0          101m
> fluentd-64jlx                               0/1     CrashLoopBackOff   12         37m
> fluentd-67xk9                               0/1     CrashLoopBackOff   12         37m
> fluentd-79cg6                               0/1     CrashLoopBackOff   12         37m
> fluentd-9pp8k                               0/1     CrashLoopBackOff   12         37m
> fluentd-cdbjs                               0/1     CrashLoopBackOff   12         37m
> fluentd-nzd7h                               0/1     CrashLoopBackOff   12         37m

Note that the reported issue is 'Init:CrashLoopBackOff', which means the init container is in a crash loop; that is not what is happening here. Here the main container is crashing, which may be a separate issue, but on the surface the original issue was addressed.
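For reference, the 'Init:' prefix in the pod STATUS is what separates the two failure modes: 'Init:CrashLoopBackOff' means an init container is restarting, while plain 'CrashLoopBackOff' means the main collector container is. A quick way to confirm which container is failing (a sketch, assuming the openshift-logging namespace; the pod name is taken from the listing above):

# Describe the pod; the container statuses show which container is crashing
$ oc describe pod fluentd-64jlx -n openshift-logging

# Logs from the main collector container
$ oc logs fluentd-64jlx -n openshift-logging

# List any init containers, then pull logs from one with -c
$ oc get pod fluentd-64jlx -n openshift-logging -o jsonpath='{.spec.initContainers[*].name}'
$ oc logs fluentd-64jlx -n openshift-logging -c <init-container-name>

Here is the fluentd.conf: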
## CLO GENERATED CONFIGURATION ###
# This file is a copy of the fluentd configuration entrypoint
# which should normally be supplied in a configmap.
<system>
log_level "#{ENV['LOG_LEVEL'] || 'warn'}"
</system>
# In each section below, pre- and post- includes don't include anything initially;
# they exist to enable future additions to openshift conf as needed.
## sources
## ordered so that syslog always runs last...
<source>
@type prometheus
bind "#{ENV['POD_IP']}"
<ssl>
enable true
certificate_path "#{ENV['METRICS_CERT'] || '/etc/fluent/metrics/tls.crt'}"
private_key_path "#{ENV['METRICS_KEY'] || '/etc/fluent/metrics/tls.key'}"
</ssl>
</source>
<source>
@type prometheus_monitor
<labels>
hostname ${hostname}
</labels>
</source>
# excluding prometheus_tail_monitor
# since it leaks namespace/pod info
# via file paths
# This is considered experimental by the repo
<source>
@type prometheus_output_monitor
<labels>
hostname ${hostname}
</labels>
</source>
#journal logs to gather node
<source>
@type systemd
@id systemd-input
@label @MEASURE
path '/var/log/journal'
<storage>
@type local
persistent true
# NOTE: if this does not end in .json, fluentd will think it
# is the name of a directory - see fluentd storage_local.rb
path '/var/log/journal_pos.json'
</storage>
matches "#{ENV['JOURNAL_FILTERS_JSON'] || '[]'}"
tag journal
read_from_head "#{if (val = ENV.fetch('JOURNAL_READ_FROM_HEAD','')) && (val.length > 0); val; else 'false'; end}"
</source>
# container logs
<source>
@type tail
@id container-input
path "/var/log/containers/*.log"
exclude_path ["/var/log/containers/fluentd-*_openshift-logging_*.log", "/var/log/containers/elasticsearch-*_openshift-logging_*.log", "/var/log/containers/kibana-*_openshift-logging_*.log"]
pos_file "/var/log/es-containers.log.pos"
refresh_interval 5
rotate_wait 5
tag kubernetes.*
read_from_head "true"
@label @MEASURE
<parse>
@type multi_format
<pattern>
format json
time_format '%Y-%m-%dT%H:%M:%S.%N%Z'
keep_time_key true
</pattern>
<pattern>
format regexp
expression /^(?<time>.+) (?<stream>stdout|stderr)( (?<logtag>.))? (?<log>.*)$/
time_format '%Y-%m-%dT%H:%M:%S.%N%:z'
keep_time_key true
</pattern>
</parse>
</source>
# linux audit logs
<source>
@type tail
@id audit-input
@label @MEASURE
path "#{ENV['AUDIT_FILE'] || '/var/log/audit/audit.log'}"
pos_file "#{ENV['AUDIT_POS_FILE'] || '/var/log/audit/audit.log.pos'}"
tag linux-audit.log
<parse>
@type viaq_host_audit
</parse>
</source>
# k8s audit logs
<source>
@type tail
@id k8s-audit-input
@label @MEASURE
path "#{ENV['K8S_AUDIT_FILE'] || '/var/log/kube-apiserver/audit.log'}"
pos_file "#{ENV['K8S_AUDIT_POS_FILE'] || '/var/log/kube-apiserver/audit.log.pos'}"
tag k8s-audit.log
<parse>
@type json
time_key requestReceivedTimestamp
# In case folks want to parse based on the requestReceivedTimestamp key
keep_time_key true
time_format %Y-%m-%dT%H:%M:%S.%N%z
</parse>
</source>
# Openshift audit logs
<source>
@type tail
@id openshift-audit-input
@label @MEASURE
path /var/log/oauth-apiserver/audit.log,/var/log/openshift-apiserver/audit.log
pos_file /var/log/oauth-apiserver.audit.log
tag openshift-audit.log
<parse>
@type json
time_key requestReceivedTimestamp
# In case folks want to parse based on the requestReceivedTimestamp key
keep_time_key true
time_format %Y-%m-%dT%H:%M:%S.%N%z
</parse>
</source>
<label @MEASURE>
<filter **>
@type record_transformer
enable_ruby
<record>
msg_size ${record.to_s.length}
</record>
</filter>
<filter **>
@type prometheus
<metric>
name cluster_logging_collector_input_record_total
type counter
desc The total number of incoming records
<labels>
tag ${tag}
hostname ${hostname}
</labels>
</metric>
</filter>
<filter **>
@type prometheus
<metric>
name cluster_logging_collector_input_record_bytes
type counter
desc The total bytes of incoming records
key msg_size
<labels>
tag ${tag}
hostname ${hostname}
</labels>
</metric>
</filter>
<filter **>
@type record_transformer
remove_keys msg_size
</filter>
<match journal>
@type relabel
@label @INGRESS
</match>
<match *audit.log>
@type relabel
@label @INGRESS
</match>
<match kubernetes.**>
@type relabel
@label @CONCAT
</match>
</label>
<label @CONCAT>
<filter kubernetes.**>
@type concat
key log
partial_key logtag
partial_value P
separator ''
</filter>
<match kubernetes.**>
@type relabel
@label @INGRESS
</match>
</label>
#syslog input config here
<label @INGRESS>
## filters
<filter **>
@type record_modifier
char_encoding utf-8
</filter>
<filter journal>
@type grep
<exclude>
key PRIORITY
pattern ^7$
</exclude>
</filter>
<match journal>
@type rewrite_tag_filter
# skip to @INGRESS label section
@label @INGRESS
# see if this is a kibana container for special log handling
# looks like this:
# k8s_kibana.a67f366_logging-kibana-1-d90e3_logging_26c51a61-2835-11e6-ad29-fa163e4944d5_f0db49a2
# we filter these logs through the kibana_transform.conf filter
<rule>
key CONTAINER_NAME
pattern ^k8s_kibana\.
tag kubernetes.journal.container.kibana
</rule>
<rule>
key CONTAINER_NAME
pattern ^k8s_[^_]+_logging-eventrouter-[^_]+_
tag kubernetes.journal.container._default_.kubernetes-event
</rule>
# mark logs from default namespace for processing as k8s logs but stored as system logs
<rule>
key CONTAINER_NAME
pattern ^k8s_[^_]+_[^_]+_default_
tag kubernetes.journal.container._default_
</rule>
# mark logs from kube-* namespaces for processing as k8s logs but stored as system logs
<rule>
key CONTAINER_NAME
pattern ^k8s_[^_]+_[^_]+_kube-(.+)_
tag kubernetes.journal.container._kube-$1_
</rule>
# mark logs from openshift-* namespaces for processing as k8s logs but stored as system logs
<rule>
key CONTAINER_NAME
pattern ^k8s_[^_]+_[^_]+_openshift-(.+)_
tag kubernetes.journal.container._openshift-$1_
</rule>
# mark logs from openshift namespace for processing as k8s logs but stored as system logs
<rule>
key CONTAINER_NAME
pattern ^k8s_[^_]+_[^_]+_openshift_
tag kubernetes.journal.container._openshift_
</rule>
# mark fluentd container logs
<rule>
key CONTAINER_NAME
pattern ^k8s_.*fluentd
tag kubernetes.journal.container.fluentd
</rule>
# this is a kubernetes container
<rule>
key CONTAINER_NAME
pattern ^k8s_
tag kubernetes.journal.container
</rule>
# not kubernetes - assume a system log or system container log
<rule>
key _TRANSPORT
pattern .+
tag journal.system
</rule>
</match>
<filter kubernetes.**>
@type kubernetes_metadata
kubernetes_url 'https://kubernetes.default.svc'
cache_size '1000'
watch 'false'
use_journal 'nil'
ssl_partial_chain 'true'
</filter>
<filter kubernetes.journal.**>
@type parse_json_field
merge_json_log 'false'
preserve_json_log 'true'
json_fields 'log,MESSAGE'
</filter>
<filter kubernetes.var.log.containers.**>
@type parse_json_field
merge_json_log 'false'
preserve_json_log 'true'
json_fields 'log,MESSAGE'
</filter>
<filter kubernetes.var.log.containers.eventrouter-** kubernetes.var.log.containers.cluster-logging-eventrouter-**>
@type parse_json_field
merge_json_log true
preserve_json_log true
json_fields 'log,MESSAGE'
</filter>
<filter **kibana**>
@type record_transformer
enable_ruby
<record>
log ${record['err'] || record['msg'] || record['MESSAGE'] || record['log']}
</record>
remove_keys req,res,msg,name,level,v,pid,err
</filter>
<filter k8s-audit.log**>
@type record_transformer
enable_ruby
<record>
k8s_audit_level ${record['level']}
level info
</record>
</filter>
<filter **>
@type viaq_data_model
elasticsearch_index_prefix_field 'viaq_index_name'
default_keep_fields CEE,time,@timestamp,aushape,ci_job,collectd,docker,fedora-ci,file,foreman,geoip,hostname,ipaddr4,ipaddr6,kubernetes,level,message,namespace_name,namespace_uuid,offset,openstack,ovirt,pid,pipeline_metadata,rsyslog,service,systemd,tags,testcase,tlog,viaq_msg_id
extra_keep_fields ''
keep_empty_fields 'message'
use_undefined false
undefined_name 'undefined'
rename_time true
rename_time_if_missing false
src_time_name 'time'
dest_time_name '@timestamp'
pipeline_type 'collector'
undefined_to_string 'false'
undefined_dot_replace_char 'UNUSED'
undefined_max_num_fields '-1'
process_kubernetes_events 'false'
<formatter>
tag "system.var.log**"
type sys_var_log
remove_keys host,pid,ident
</formatter>
<formatter>
tag "journal.system**"
type sys_journal
remove_keys log,stream,MESSAGE,_SOURCE_REALTIME_TIMESTAMP,__REALTIME_TIMESTAMP,CONTAINER_ID,CONTAINER_ID_FULL,CONTAINER_NAME,PRIORITY,_BOOT_ID,_CAP_EFFECTIVE,_CMDLINE,_COMM,_EXE,_GID,_HOSTNAME,_MACHINE_ID,_PID,_SELINUX_CONTEXT,_SYSTEMD_CGROUP,_SYSTEMD_SLICE,_SYSTEMD_UNIT,_TRANSPORT,_UID,_AUDIT_LOGINUID,_AUDIT_SESSION,_SYSTEMD_OWNER_UID,_SYSTEMD_SESSION,_SYSTEMD_USER_UNIT,CODE_FILE,CODE_FUNCTION,CODE_LINE,ERRNO,MESSAGE_ID,RESULT,UNIT,_KERNEL_DEVICE,_KERNEL_SUBSYSTEM,_UDEV_SYSNAME,_UDEV_DEVNODE,_UDEV_DEVLINK,SYSLOG_FACILITY,SYSLOG_IDENTIFIER,SYSLOG_PID
</formatter>
<formatter>
tag "kubernetes.journal.container**"
type k8s_journal
remove_keys 'log,stream,MESSAGE,_SOURCE_REALTIME_TIMESTAMP,__REALTIME_TIMESTAMP,CONTAINER_ID,CONTAINER_ID_FULL,CONTAINER_NAME,PRIORITY,_BOOT_ID,_CAP_EFFECTIVE,_CMDLINE,_COMM,_EXE,_GID,_HOSTNAME,_MACHINE_ID,_PID,_SELINUX_CONTEXT,_SYSTEMD_CGROUP,_SYSTEMD_SLICE,_SYSTEMD_UNIT,_TRANSPORT,_UID,_AUDIT_LOGINUID,_AUDIT_SESSION,_SYSTEMD_OWNER_UID,_SYSTEMD_SESSION,_SYSTEMD_USER_UNIT,CODE_FILE,CODE_FUNCTION,CODE_LINE,ERRNO,MESSAGE_ID,RESULT,UNIT,_KERNEL_DEVICE,_KERNEL_SUBSYSTEM,_UDEV_SYSNAME,_UDEV_DEVNODE,_UDEV_DEVLINK,SYSLOG_FACILITY,SYSLOG_IDENTIFIER,SYSLOG_PID'
</formatter>
<formatter>
tag "kubernetes.var.log.containers.eventrouter-** kubernetes.var.log.containers.cluster-logging-eventrouter-** k8s-audit.log** openshift-audit.log**"
type k8s_json_file
remove_keys log,stream,CONTAINER_ID_FULL,CONTAINER_NAME
process_kubernetes_events 'true'
</formatter>
<formatter>
tag "kubernetes.var.log.containers**"
type k8s_json_file
remove_keys log,stream,CONTAINER_ID_FULL,CONTAINER_NAME
</formatter>
<elasticsearch_index_name>
enabled 'true'
tag "journal.system** system.var.log** **_default_** **_kube-*_** **_openshift-*_** **_openshift_**"
name_type static
static_index_name infra-write
</elasticsearch_index_name>
<elasticsearch_index_name>
enabled 'true'
tag "linux-audit.log** k8s-audit.log** openshift-audit.log**"
name_type static
static_index_name audit-write
</elasticsearch_index_name>
<elasticsearch_index_name>
enabled 'true'
tag "**"
name_type static
static_index_name app-write
</elasticsearch_index_name>
</filter>
<filter **>
@type elasticsearch_genid_ext
hash_id_key viaq_msg_id
alt_key kubernetes.event.metadata.uid
alt_tags 'kubernetes.var.log.containers.logging-eventrouter-*.** kubernetes.var.log.containers.eventrouter-*.** kubernetes.var.log.containers.cluster-logging-eventrouter-*.** kubernetes.journal.container._default_.kubernetes-event'
</filter>
#flatten labels to prevent field explosion in ES
<filter ** >
@type record_transformer
enable_ruby true
<record>
kubernetes ${!record['kubernetes'].nil? ? record['kubernetes'].merge({"flat_labels": (record['kubernetes']['labels']||{}).map{|k,v| "#{k}=#{v}"}}) : {} }
</record>
remove_keys $.kubernetes.labels
</filter>
# Relabel specific source tags to specific intermediary labels for copy processing
# Earlier matchers remove logs so they don't fall through to later ones.
# A log source matcher may be null if no pipeline wants that type of log.
<match **_default_** **_kube-*_** **_openshift-*_** **_openshift_** journal.** system.var.log**>
@type relabel
@label @_INFRASTRUCTURE
</match>
<match kubernetes.**>
@type relabel
@label @_APPLICATION
</match>
<match linux-audit.log** k8s-audit.log** openshift-audit.log**>
@type relabel
@label @_AUDIT
</match>
<match **>
@type stdout
</match>
</label>
# Relabel specific sources (e.g. logs.apps) to multiple pipelines
<label @_APPLICATION>
<match **>
@type copy
<store>
@type relabel
@label @_LEGACY_SYSLOG
</store>
</match>
</label>
<label @_AUDIT>
<match **>
@type copy
<store>
@type relabel
@label @_LEGACY_SYSLOG
</store>
</match>
</label>
<label @_INFRASTRUCTURE>
<match **>
@type copy
<store>
@type relabel
@label @_LEGACY_SYSLOG
</store>
</match>
</label>
# Relabel specific pipelines to multiple, outputs (e.g. ES, kafka stores)
<label @PIPELINE_0_>
<match **>
@type copy
<store>
@type relabel
@label @DEFAULT
</store>
</match>
</label>
# Ship logs to specific outputs
<label @DEFAULT>
<match retry_default>
@type copy
<store>
@type elasticsearch
@id retry_default
host elasticsearch.openshift-logging.svc.cluster.local
port 9200
verify_es_version_at_startup false
scheme https
ssl_version TLSv1_2
target_index_key viaq_index_name
id_key viaq_msg_id
remove_keys viaq_index_name
client_key '/var/run/ocp-collector/secrets/fluentd/tls.key'
client_cert '/var/run/ocp-collector/secrets/fluentd/tls.crt'
ca_file '/var/run/ocp-collector/secrets/fluentd/ca-bundle.crt'
type_name _doc
http_backend typhoeus
write_operation create
reload_connections 'true'
# https://github.com/uken/fluent-plugin-elasticsearch#reload-after
reload_after '200'
# https://github.com/uken/fluent-plugin-elasticsearch#sniffer-class-name
sniffer_class_name 'Fluent::Plugin::ElasticsearchSimpleSniffer'
reload_on_failure false
# 2 ^ 31
request_timeout 2147483648
<buffer>
@type file
path '/var/lib/fluentd/retry_default'
flush_mode interval
flush_interval 1s
flush_thread_count 2
flush_at_shutdown true
retry_type exponential_backoff
retry_wait 1s
retry_max_interval 60s
retry_forever true
queued_chunks_limit_size "#{ENV['BUFFER_QUEUE_LIMIT'] || '32' }"
total_limit_size "#{ENV['TOTAL_LIMIT_SIZE'] || 8589934592 }" #8G
chunk_limit_size "#{ENV['BUFFER_SIZE_LIMIT'] || '8m'}"
overflow_action block
</buffer>
</store>
</match>
<match **>
@type copy
<store>
@type elasticsearch
@id default
host elasticsearch.openshift-logging.svc.cluster.local
port 9200
verify_es_version_at_startup false
scheme https
ssl_version TLSv1_2
target_index_key viaq_index_name
id_key viaq_msg_id
remove_keys viaq_index_name
client_key '/var/run/ocp-collector/secrets/fluentd/tls.key'
client_cert '/var/run/ocp-collector/secrets/fluentd/tls.crt'
ca_file '/var/run/ocp-collector/secrets/fluentd/ca-bundle.crt'
type_name _doc
retry_tag retry_default
http_backend typhoeus
write_operation create
reload_connections 'true'
# https://github.com/uken/fluent-plugin-elasticsearch#reload-after
reload_after '200'
# https://github.com/uken/fluent-plugin-elasticsearch#sniffer-class-name
sniffer_class_name 'Fluent::Plugin::ElasticsearchSimpleSniffer'
reload_on_failure false
# 2 ^ 31
request_timeout 2147483648
<buffer>
@type file
path '/var/lib/fluentd/default'
flush_mode interval
flush_interval 1s
flush_thread_count 2
flush_at_shutdown true
retry_type exponential_backoff
retry_wait 1s
retry_max_interval 60s
retry_forever true
queued_chunks_limit_size "#{ENV['BUFFER_QUEUE_LIMIT'] || '32' }"
total_limit_size "#{ENV['TOTAL_LIMIT_SIZE'] || 8589934592 }" #8G
chunk_limit_size "#{ENV['BUFFER_SIZE_LIMIT'] || '8m'}"
overflow_action block
</buffer>
</store>
</match>
</label>
<label @_LEGACY_SYSLOG>
<match **>
@type copy
#include legacy Syslog
@include /etc/fluent/configs.d/syslog/syslog.conf
</match>
</label>
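The @include at the end pulls in the user-supplied legacy syslog configuration; the path suggests the 'syslog' configmap created by the reporter is mounted into the collector pods at /etc/fluent/configs.d/syslog. One way to confirm the file actually made it into a pod (a sketch; the pod name is taken from the listing above, and the mount path is assumed from the @include directive):

$ oc exec fluentd-64jlx -n openshift-logging -- cat /etc/fluent/configs.d/syslog/syslog.conf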
(In reply to Jeff Cantrill from comment #3)
> Note that the reported issue is 'Init:CrashLoopBackOff', which means the
> init container is in a crash loop; that is not what is happening here.
> Here the main container is crashing, which may be a separate issue, but
> on the surface the original issue was addressed.

Yes, it reports 'CrashLoopBackOff', not 'Init:CrashLoopBackOff'.
Shall we create a separate bug for this and close the current one?

(In reply to Giriyamma from comment #5)
> Yes, it reports 'CrashLoopBackOff', not 'Init:CrashLoopBackOff'.
> Shall we create a separate bug for this and close the current one?

If the init issue is resolved, then yes: we should close this issue and open a new one to address the other.

The issue is resolved.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.18 extras update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0512
Verifying this issue using:
Cluster version: 4.6.0-0.nightly-2021-01-28-234643
CLO: clusterlogging.4.6.0-202101290552.p0
EO: elasticsearch-operator.4.6.0-202101290552.p0

The fluentd pods are stuck in CrashLoopBackOff state.

Steps followed:

1. Created configmap/syslog to use the legacy syslog method.

syslog.conf:
<store>
@type syslog
remote_syslog rsyslogserver.openshift-logging.svc.cluster.local
port 514
hostname $hostname
remove_tag_prefix tag
facility local0
severity info
use_record true
payload_key message
rfc 3164
</store>

$ oc create configmap syslog --from-file=syslog.conf

2. Created ClusterLogging/instance to deploy fluentd standalone (a minimal example CR is sketched below).

3. oc get pods

NAME                                        READY   STATUS             RESTARTS   AGE
cluster-logging-operator-58c74cc595-ndgvv   1/1     Running            0          101m
fluentd-64jlx                               0/1     CrashLoopBackOff   12         37m
fluentd-67xk9                               0/1     CrashLoopBackOff   12         37m
fluentd-79cg6                               0/1     CrashLoopBackOff   12         37m
fluentd-9pp8k                               0/1     CrashLoopBackOff   12         37m
fluentd-cdbjs                               0/1     CrashLoopBackOff   12         37m
fluentd-nzd7h                               0/1     CrashLoopBackOff   12         37m
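The report does not include the exact ClusterLogging resource used in step 2. A minimal sketch of a collector-only CR that deploys fluentd standalone, without an Elasticsearch log store (field names per the logging.openshift.io/v1 API; the exact CR used in this verification run may differ):

$ oc apply -f - <<'EOF'
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
spec:
  managementState: Managed
  collection:
    logs:
      type: fluentd
      fluentd: {}
EOF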