Bug 1525415 - A lot of error messages in fluentd pod logs for deleted namespaces
Summary: A lot of error messages in fluentd pod logs for deleted namespaces
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.5.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.6.z
Assignee: Jeff Cantrill
QA Contact: Anping Li
URL:
Whiteboard:
Depends On: 1493022
Blocks:
 
Reported: 2017-12-13 09:52 UTC by Anping Li
Modified: 2021-09-09 12:56 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Messages for which the unique namespace ID could not be determined could not be properly indexed.
Consequence: Messages could be lost, and an error message appeared in the fluentd logs.
Fix: Modify the cache algorithm to provide the necessary data, or default the value to 'orphaned'.
Result: The error message is resolved, and messages are stored in an 'orphaned' index when a unique namespace ID cannot be determined.
Clone Of: 1493022
Environment:
Last Closed: 2018-04-12 05:59:59 UTC
Target Upstream Version:
Embargoed:


Attachments
logging dump file (74.16 KB, application/x-gzip)
2017-12-13 10:03 UTC, Anping Li


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3211281 0 None None None 2017-12-13 09:52:40 UTC
Red Hat Product Errata RHBA-2018:1106 0 None None None 2018-04-12 06:00:45 UTC

Description Anping Li 2017-12-13 09:52:40 UTC
+++ This bug was initially created as a clone of Bug #1493022 +++

Description of problem:
Deploy a few projects and populate logs before logging is deployed, then delete these projects and deploy logging 3.7 via ansible. After all the pods are ready,
there are a lot of error messages indicating that the deleted namespace is missing; see the attached file for more information.
# oc logs logging-fluentd-jf3bf
2017-09-19 02:34:49 -0400 [error]: record cannot use elasticsearch index name type project_full: record is missing kubernetes.namespace_id field: {"docker"=>{"container_id"=>"a6ef5e685b23f01bce0b2007d9afce35a479665d9431174ddee803cbdce46d09"}, "kubernetes"=>{"container_name"=>"java-mainclass", "namespace_name"=>"java", "pod_name"=>"java-mainclass-1-0xv2x"}, "systemd"=>{"t"=>{"BOOT_ID"=>"ee4d11fab528402da9e706e765b13897", "CAP_EFFECTIVE"=>"1fffffffff", "CMDLINE"=>"/usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --authorization-plugin=rhel-push-plugin --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current --selinux-enabled --log-driver=journald --storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/rhel-docker--pool --storage-opt dm.use_deferred_removal=true --storage-opt dm.use_deferred_deletion=true --mtu=1350 --add-registry registry.ops.openshift.com --add-registry registry.access.redhat.com --block-registry registry.hacker.com --insecure-registry brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888 --insecure-registry virt-openshift-05.lab.eng.nay.redhat.com:5000 --insecure-registry virt-openshift-05.lab.eng.nay.redhat.com:5001 --insecure-registry registry.ops.openshift.com --insecure-registry registry.access.stage.redhat.com --add-registry registry.access.redhat.com", "COMM"=>"dockerd-current", "EXE"=>"/usr/bin/dockerd-current", "GID"=>"0", "MACHINE_ID"=>"6ef6dbb635684161a0a994a678c7c5fc", "PID"=>"15572", "SELINUX_CONTEXT"=>"system_u:system_r:container_runtime_t:s0", "SYSTEMD_CGROUP"=>"/system.slice/docker.service", "SYSTEMD_SLICE"=>"system.slice", "SYSTEMD_UNIT"=>"docker.service", "TRANSPORT"=>"journal", "UID"=>"0"}}, "level"=>"info", "message"=>"Hello Fabric8! Here's your random string: VKmkh", "hostname"=>"host-8-241-102.host.centralci.eng.rdu2.redhat.com", "pipeline_metadata"=>{"collector"=>{"ipaddr4"=>
"10.129.0.54", "ipaddr6"=>"fe80::858:aff:fe81:36", "inputname"=>"fluent-plugin-systemd", "name"=>"fluentd", "received_at"=>"2017-09-19T06:34:49.670969+00:00", "version"=>"0.12.39 1.6.0"}}, "@timestamp"=>"2017-09-19T05:36:33.905616+00:00"}

Version-Release number of selected component (if applicable):
# rpm -qa | grep openshift-ansible
openshift-ansible-3.7.0-0.126.4.git.0.3fc2b9b.el7.noarch
openshift-ansible-roles-3.7.0-0.126.4.git.0.3fc2b9b.el7.noarch
openshift-ansible-docs-3.7.0-0.126.4.git.0.3fc2b9b.el7.noarch
openshift-ansible-callback-plugins-3.7.0-0.126.4.git.0.3fc2b9b.el7.noarch
openshift-ansible-filter-plugins-3.7.0-0.126.4.git.0.3fc2b9b.el7.noarch
openshift-ansible-playbooks-3.7.0-0.126.4.git.0.3fc2b9b.el7.noarch
openshift-ansible-lookup-plugins-3.7.0-0.126.4.git.0.3fc2b9b.el7.noarch

Image:
logging-fluentd:v3.7.0-0.126.4.0

How reproducible:
Always

Steps to Reproduce:
1. Deploy a few projects and populate logs before logging is deployed.
   # oc new-project java
   # oc new-app --docker-image=docker.io/chunyunchen/java-mainclass:2.2.94-SNAPSHOT
2. Delete these projects
   # oc delete project java
3. Deploy logging 3.7 via ansible; see the [Additional info] section for the inventory file.

Actual results:
There are a lot of error messages indicating that the namespace is missing.

Expected results:
The log should not be so noisy.

Additional info:
[OSEv3:children]
masters
etcd

[masters]
${MASTER_URL} openshift_public_hostname=${MASTER_URL}

[etcd]
${MASTER_URL} openshift_public_hostname=${MASTER_URL}


[OSEv3:vars]
ansible_ssh_user=root
ansible_ssh_private_key_file="~/libra.pem"
deployment_type=openshift-enterprise
openshift_docker_additional_registries=brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888

# Logging
openshift_logging_install_logging=true
openshift_logging_kibana_hostname=kibana.0919-tgp.qe.rhcloud.com
openshift_logging_kibana_ops_hostname=kibana-ops.0919-tgp.qe.rhcloud.com
public_master_url=https://${MASTER_URL}:8443
openshift_logging_image_prefix=${IMAGE_PREFIX}
openshift_logging_image_version=v3.7
openshift_logging_namespace=logging
openshift_logging_elasticsearch_cpu_limit=500m
openshift_logging_elasticsearch_memory_limit=1Gi
openshift_logging_es_ops_cpu_limit=300m
openshift_logging_es_ops_memory_limit=512Mi
openshift_logging_use_ops=true

--- Additional comment from Rich Megginson on 2017-09-19 09:10:13 EDT ---

I think this is related to https://bugzilla.redhat.com/show_bug.cgi?id=1486493 and possibly a dup.

--- Additional comment from Peter Portante on 2017-09-19 09:30:59 EDT ---

Should this message be a warning, and the data indexed into an index using the namespace_id value?

This way, we don't lose the data, and we'd have an index that the data could be recovered from if necessary.

--- Additional comment from Rich Megginson on 2017-09-19 10:18:51 EDT ---

(In reply to Peter Portante from comment #2)
> Should this message be a warning, and the data indexed into an index using
> the namespace_id value?
> 
> This way, we don't lose the data, and we'd have an index that the data could
> be recovered from if necessary.

I don't understand - the problem is that there is no namespace_id:

2017-09-19 02:34:49 -0400 [error]: record cannot use elasticsearch index name type project_full: record is missing kubernetes.namespace_id field: {"docker"=>{"container_id"=>"a6ef5e685b23f01bce0b2007d9afce35a479665d9431174ddee803cbdce46d09"}, "kubernetes"=>{"container_name"=>"java-mainclass", "namespace_name"=>"java", "pod_name"=>"java-mainclass-1-0xv2x"},

Do you mean namespace_name?

--- Additional comment from Junqi Zhao on 2017-09-19 20:18:28 EDT ---

(In reply to Rich Megginson from comment #3)
> (In reply to Peter Portante from comment #2)
> > Should this message be a warning, and the data indexed into an index using
> > the namespace_id value?
> > 
> > This way, we don't lose the data, and we'd have an index that the data could
> > be recovered from if necessary.
> 
> I don't understand - the problem is that there is no namespace_id:
> 
> 2017-09-19 02:34:49 -0400 [error]: record cannot use elasticsearch index
> name type project_full: record is missing kubernetes.namespace_id field:
> {"docker"=>{"container_id"=>"a6ef5e685b23f01bce0b2007d9afce35a479665d9431174d
> dee803cbdce46d09"}, "kubernetes"=>{"container_name"=>"java-mainclass",
> "namespace_name"=>"java", "pod_name"=>"java-mainclass-1-0xv2x"},
> 
> Do you mean namespace_name?

Yes, it is namespace_name. In this defect, I created the namespace java and populated logs, then deleted the namespace and deployed logging.

--- Additional comment from Peter Portante on 2017-09-19 21:31:57 EDT ---

(In reply to Rich Megginson from comment #3)
> Do you mean namespace_name?

Well, I did not mean namespace_name, but I was wrong to consider namespace_id.

How did we get kubernetes.namespace_name, though?  Was that an attribute coming from the journal directly?

In either case, if we are missing a field that prevents us from picking the proper project index, can we afford to discard the data?  It does not seem that we can.

Saving that data into an index capturing partial data would be preferred, where it could be accessed by operations if necessary.

--- Additional comment from Rich Megginson on 2017-09-19 21:45:18 EDT ---

(In reply to Peter Portante from comment #5)
> (In reply to Rich Megginson from comment #3)
> > Do you mean namespace_name?
> 
> Well, I did not mean namespace_name, but I was wrong to consider
> namespace_id.
> 
> How did we get kubernetes.namespace_name, though?  Was that an attribute
> coming from the journal directly?

Yes, from the CONTAINER_NAME journal field.  Also in the json-file filename.

> 
> In either case, if we are missing a field that prevents us from picking the
> proper project index, can we afford to discard the data?  It does not seem
> to be the case.
> 
> Saving that data into an index capturing partial data would be preferred,
> where it could be accessed by operations if necessary.

Would that be the .operations.* indices, or will we have to create another namespace?

--- Additional comment from Peter Portante on 2017-09-20 06:18:49 EDT ---

(In reply to Rich Megginson from comment #6)
> Yes, from the CONTAINER_NAME journal field.  Also in the json-file filename.

So the journal's CONTAINER_NAME field is mapped to namespace_name?
 
> Would that be the .operations.* indices, or will we have to create another
> namespace?

I think we have to create another index name for orphans like this.  Perhaps .orphans.

--- Additional comment from Rich Megginson on 2017-09-20 09:25:02 EDT ---

(In reply to Peter Portante from comment #7)
> (In reply to Rich Megginson from comment #6)
> > Yes, from the CONTAINER_NAME journal field.  Also in the json-file filename.
> 
> So the journal's CONTAINER_NAME field is mapped to namespace_name?

The CONTAINER_NAME field contains all of the k8s metadata like this:

https://github.com/openshift/origin-aggregated-logging/blob/master/fluentd/configs.d/openshift/filter-k8s-meta.conf#L8

^(?<name_prefix>[^_]+)_(?<container_name>[^\._]+)(\.(?<container_hash>[^_]+))?_(?<pod_name>[^_]+)_(?<namespace>[^_]+)_[^_]+_[^_]+$

The namespace capture here is really the namespace name.

e.g.

https://github.com/openshift/origin-aggregated-logging/blob/master/fluentd/configs.d/openshift/filter-retag-journal.conf#L8

CONTAINER_NAME=k8s_registry.a49f5318_docker-registry-1-hhoj0_default_ae3a9bdc-1f66-11e6-80a2-fa163e2fff3a_799e4035
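
For illustration, a minimal Ruby sketch (not the plugin code) of how that regex pulls the pieces out of the example CONTAINER_NAME value:

# Illustrative only: parse a CONTAINER_NAME journal field with the regex quoted above.
CONTAINER_NAME_RE = /^(?<name_prefix>[^_]+)_(?<container_name>[^\._]+)(\.(?<container_hash>[^_]+))?_(?<pod_name>[^_]+)_(?<namespace>[^_]+)_[^_]+_[^_]+$/

value = "k8s_registry.a49f5318_docker-registry-1-hhoj0_default_ae3a9bdc-1f66-11e6-80a2-fa163e2fff3a_799e4035"

if (m = CONTAINER_NAME_RE.match(value))
  m[:namespace]       # => "default"  (the namespace *name*, not its id)
  m[:pod_name]        # => "docker-registry-1-hhoj0"
  m[:container_hash]  # => "a49f5318"
end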


>  
> > Would that be the .operations.* indices, or will we have to create another
> > namespace?
> 
> I think we have to create another index name for orphans like this.  Perhaps
> .orphans.

Is there some reason it shouldn't be in .operations?  Adding another index is going to be a lot of work (dev/qe/docs/support) and that work needs to be justified.

--- Additional comment from Rich Megginson on 2017-09-20 10:51:32 EDT ---

We also need to change https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter

The way it works now:

When a record is processed, we look up the namespace_name in the @namespace_cache.  If it is not found, we make an API call to get it and store it in the cache.

https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/blob/master/lib/fluent/plugin/filter_kubernetes_metadata.rb#L249
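
In pseudocode, that lookup pattern is roughly the following (a simplified sketch, not the plugin source; fetch_namespace_from_api is a hypothetical stand-in for the real API call):

# Simplified sketch of the cache lookup with API fallback described above.
def namespace_metadata_for(namespace_name)
  cached = @namespace_cache[namespace_name]
  return cached if cached

  metadata = fetch_namespace_from_api(namespace_name)  # hypothetical stand-in for the API call
  @namespace_cache[namespace_name] = metadata if metadata
  metadata  # nil when the namespace no longer exists, so namespace_id stays missing
end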

There is also a namespace event watcher.  This listens only for MODIFIED and DELETED events:

https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/blob/master/lib/fluent/plugin/filter_kubernetes_metadata.rb#L409

When a namespace is deleted, it is immediately removed from the cache.  This is the cause of our problems, because there may be a lot of records left to process.

The cache is an LRU TTL cache.  What we could do instead is listen for CREATED events, and not listen for DELETED events.  For deleted namespaces, we would just rely on the LRU cache to evict the record after the TTL (default 1 hour).  That would give us an hour to process any records remaining for that namespace.  If a namespace is CREATED with the same name, the old one gets evicted from the cache and replaced by the new one.

Here is where it gets really tricky.  There is a race condition which will cause either records from the old namespace to end up in the indices for the new namespace, or vice versa.

What we need to do is cache a _list_ of namespaces with the same name, distinguished by their create timestamp (or cache by namespace name + timestamp).  Then, we would consult the original timestamp from the record, and figure out which namespace to use based on the timestamp.  There is always a chance that the clocks are not in sync, but at least this gives us a much smaller window of time in which the record could be assigned to the wrong index.
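
A minimal sketch of that idea (illustrative only, not a patch): keep a list of {uid, created} entries per namespace name and let the record timestamp pick the newest namespace created at or before it, defaulting to 'orphaned' when nothing matches.

require 'time'

# Illustrative sketch of caching a list of same-named namespaces, keyed by
# name and distinguished by creation timestamp (not the plugin implementation).
class NamespaceHistoryCache
  def initialize
    @by_name = Hash.new { |h, k| h[k] = [] }
  end

  def remember(name, uid, created_at)
    @by_name[name] << { uid: uid, created: Time.parse(created_at) }
  end

  # Most recent namespace created at or before the record's timestamp;
  # default to 'orphaned' when nothing matches (e.g. a long-deleted namespace).
  def uid_for(name, record_timestamp)
    ts = Time.parse(record_timestamp)
    candidate = @by_name[name].select { |ns| ns[:created] <= ts }
                              .max_by { |ns| ns[:created] }
    candidate ? candidate[:uid] : 'orphaned'
  end
end

cache = NamespaceHistoryCache.new
cache.remember('java', 'uid-of-old-java-ns', '2017-09-18T10:00:00Z')  # hypothetical uids
cache.remember('java', 'uid-of-new-java-ns', '2017-09-19T08:00:00Z')
cache.uid_for('java', '2017-09-19T05:36:33Z')  # => "uid-of-old-java-ns"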

--- Additional comment from Peter Portante on 2017-09-20 16:48:39 EDT ---

(In reply to Rich Megginson from comment #8)
> Is there some reason it shouldn't be in .operations?

Yes, there are situations where the operators should not be allowed to see sensitive log data.  This is the case for how we run OpenShift as a service for Dedicated customers.

--- Additional comment from Peter Portante on 2017-09-20 17:18:19 EDT ---

(In reply to Rich Megginson from comment #9)
> We also need to change
> https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter
> 
> The way it works now:
> 
> When a record is processed, we look up the namespace_name in the
> @namespace_cache.  If it is not found, we make an API call to get it and
> store it in the cache.

Hmm, should we consider storing the cache a bit differently?  Here is a proposal:

cache['container_hash'] = {
  'container_name': <container_name>  # from CONTAINER_NAME field
  'pod_name': <pod_name>   # from CONTAINER_NAME field
  'namespace_name': <namespace>  # from CONTAINER_NAME field
  'namespace_id': <namespace_id>  # from API return
  'first_seen/created': <timestamp>  # from time of first API call
  'deleted': <timestamp>  # from event
  'modified': <timestamp>  # from event
  'last seen': <timestamp>  # from message processed
  'count-after-deleted': number  # number of messages seen past deleted time, report count-after-deleted in a warning message if 'last seen' > 'deleted' when entry is reaped 
}

All logs with a timestamp (as received by fluentd, not from message payload) before deleted timestamp that have the same container hash flow through decorated with namespace_id.

Any log with the same container hash after the deleted timestamp should fall into the same namespace, since the user can't spoof the hash.

Cache entries age out once the deleted timestamp is more than 5 minutes old; any log received with an old hash gets looked up again, a negative entry is created for that hash ID if it is not found, and its logs are entered into the .orphan index.
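
Pulling that proposal together, a rough Ruby sketch (illustrative only, not a patch; lookup_namespace is a hypothetical stand-in for the API re-lookup, entry fields follow the shape proposed above with timestamps as Time objects, and the index name format is simplified):

REAP_AFTER_DELETED = 5 * 60  # seconds, per the 5-minute window proposed above

def index_for(record, cache, now = Time.now)
  hash  = record['container_hash']
  entry = cache[hash]

  # Reap entries whose namespace was deleted more than the window ago,
  # warning if messages were still arriving after the deletion.
  if entry && entry[:deleted] && now - entry[:deleted] > REAP_AFTER_DELETED
    warn "#{entry[:count_after_deleted]} records seen after namespace deletion" if entry[:count_after_deleted].to_i > 0
    cache.delete(hash)
    entry = nil
  end

  if entry.nil?
    entry = lookup_namespace(record)  # hypothetical re-lookup against the API
    entry ||= { negative: true }      # negative entry when the namespace is gone
    cache[hash] = entry
  end

  return '.orphaned' if entry[:negative]  # namespace can no longer be resolved

  entry[:count_after_deleted] = entry[:count_after_deleted].to_i + 1 if entry[:deleted]
  entry[:last_seen] = now
  "project.#{entry[:namespace_name]}.#{entry[:namespace_id]}"  # index name format simplified
end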

--- Additional comment from Rich Megginson on 2017-09-20 17:45:54 EDT ---

(In reply to Peter Portante from comment #10)
> (In reply to Rich Megginson from comment #8)
> > Is there some reason it shouldn't be in .operations?
> 
> Yes, there are situations where the operators should not be allowed to see
> sensitive log data.  This is the case for how we run OpenShift as a service
> for Dedicated customers.

Note that the .orphans index may contain records from _all_ namespaces/projects - that's ok?

--- Additional comment from Peter Portante on 2017-09-21 09:29:54 EDT ---

Yes, we'll want to be sure operations folks can see it, but the product should not create any special Kibana setup for it explicitly; just document that the "sink" is there and that, if logs are missing, they can be recovered from it.

--- Additional comment from Jeff Cantrill on 2017-12-04 13:44:34 EST ---

Comment 1 Anping Li 2017-12-13 10:03:43 UTC
Created attachment 1367223 [details]
logging dump file

Hit a similar issue in v3.5. Note that, in my environment, only logging-curator/images/v3.5.5.31.48 is a stage image.

OCP: atomic-openshift-3.5.5.31.48
Logging:
logging-kibana/images/3.5.0-51
logging-fluentd/images/3.5.0-46
logging-elasticsearch/images/3.5.0-58
logging-curator/images/v3.5.5.31.48
logging-auth-proxy/images/3.5.0-45

Comment 4 Jeff Cantrill 2018-02-23 20:56:31 UTC
I don't expect we would backport this fix to 3.5.

Comment 5 Noriko Hosoi 2018-02-23 21:45:01 UTC
You are right, @Jeff.  Thanks!

The environment variables in the Dockerfile are not used for choosing the gem files.
 14 ENV DATA_VERSION=1.6.0 \
        ....
 17     FLUENTD_FLATTEN_HASH=0.4.0-1 \
 18     FLUENTD_KUBE_METADATA=0.29.0-1 \
 19     FLUENTD_REWRITE_TAG=1.5.6-1 \
        ....

Probably, we could remove them...

Comment 7 Anping Li 2018-03-06 11:05:58 UTC
No error messages about deleted namespaces with /openshift3/logging-fluentd:v3.6.173.0.104, so moving the bug to verified.

Comment 10 errata-xmlrpc 2018-04-12 05:59:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1106

