Bug 1561510 - There is no guarantee a fluentd pod will start running on a node because Kube does not support pod preemption and priority
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.9.0
Hardware: All
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 3.11.0
Assignee: ewolinet
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-03-28 13:34 UTC by Peter Portante
Modified: 2018-05-14 18:16 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-14 18:16:52 UTC
Target Upstream Version:
Embargoed:


Description Peter Portante 2018-03-28 13:34:03 UTC
We have no way to guarantee a fluentd pod starts running on a host, because Kube does not support pod priority and preemption.

We need to consider providing a playbook that ensures every node in a cluster labeled to run fluentd actually has that pod running there.

Comment 1 Peter Portante 2018-04-09 02:10:15 UTC
Here is a gist for a script that currently works to get fluentd running on a given cluster node that is labeled for it: https://gist.github.com/portante/2b91dd7d49636c7e40fa53fb7ed1388b

We should either add this script to the product, or stop using fluentd under daemonsets and use system containers deployed from ansible instead.
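
For illustration only, here is a minimal sketch of the detection half of such a check: it lists the nodes that are labeled for fluentd but have no fluentd pod in the Running phase. This is not the script from the gist. The node label `logging-infra-fluentd=true`, the pod label `component=fluentd`, and the `logging` namespace are assumptions about a typical 3.9 logging deployment and may differ on a given cluster; the function names are hypothetical.

```python
import json
import subprocess


def nodes_missing_fluentd(nodes_json, pods_json):
    """Return sorted names of labeled nodes that have no Running fluentd pod.

    Both arguments are parsed `oc get ... -o json` list documents.
    """
    labeled = {item["metadata"]["name"] for item in nodes_json.get("items", [])}
    covered = {
        item.get("spec", {}).get("nodeName")
        for item in pods_json.get("items", [])
        if item.get("status", {}).get("phase") == "Running"
    }
    return sorted(labeled - covered)


def oc_json(*args):
    """Run an `oc` command with JSON output and return the parsed result."""
    out = subprocess.check_output(("oc",) + args + ("-o", "json"))
    return json.loads(out)


def main():
    # Assumed label selectors and namespace; adjust to match the cluster.
    nodes = oc_json("get", "nodes", "-l", "logging-infra-fluentd=true")
    pods = oc_json("get", "pods", "-n", "logging", "-l", "component=fluentd")
    for name in nodes_missing_fluentd(nodes, pods):
        print(name)
```

Calling `main()` from a host with a logged-in `oc` client prints one node name per line. The sketch only reports the gap; it does not attempt to start the missing pods, and it cannot detect a pod that is Running but not actually collecting logs.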

Comment 3 Peter Portante 2018-04-21 00:29:38 UTC
After considering this further, pod priority and preemption won't help if there are other issues starting the fluentd pod that affect the OpenShift node or docker.

We need to consider running the log collectors *outside* of OpenShift as part of the base RHEL services.

Until then, we also need a workaround based on the gist from comment #1.

This is a data-loss situation for customers.

Comment 4 Peter Portante 2018-04-21 00:37:28 UTC
Another BZ describing how fluentd pods fail to start: https://bugzilla.redhat.com/show_bug.cgi?id=1560428

