1561510 – There is no guarantee a fluentd pod will start running on a node because Kube does not support pod preemption and priority

Keywords:
Status:	CLOSED UPSTREAM
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Logging
Sub Component:
Version:	3.9.0
Hardware:	All
OS:	Linux
Priority:	high
Severity:	urgent
Target Milestone:	---
Target Release:	3.11.0
Assignee:	ewolinet
QA Contact:	Anping Li
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-03-28 13:34 UTC by Peter Portante
Modified:	2018-05-14 18:16 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-05-14 18:16:52 UTC
Target Upstream Version:
Embargoed:

Description Peter Portante 2018-03-28 13:34:03 UTC

We have no way to guarantee a fluentd pod starts running on a host, because Kube does not support pod priority and preemption.

We need to consider providing a playbook that ensures every node in a cluster labeled to run fluentd actually has that pod running on it.
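As a rough illustration of the check such a playbook would need, here is a minimal Python sketch that lists nodes labeled for fluentd that have no fluentd pod in the Running phase. It is not the script from the gist in comment #1. It reads the JSON printed by `oc get ... -o json` and assumes, hypothetically, that the fluentd node selector is logging-infra-fluentd=true, that the fluentd pods carry the label component=fluentd, and that the logging stack lives in the "logging" namespace; adjust these to match the actual install.

```python
import json
import subprocess
from typing import Any, Dict, List, Tuple

# Assumed label conventions (hypothetical defaults, adjust to your install):
# nodes eligible for fluentd carry logging-infra-fluentd=true, and the
# fluentd pods created by the DaemonSet carry component=fluentd.
NODE_LABEL: Tuple[str, str] = ("logging-infra-fluentd", "true")
POD_LABEL: Tuple[str, str] = ("component", "fluentd")


def oc_get_json(*args: str) -> Dict[str, Any]:
    """Run `oc get <args> -o json` and return the parsed List object."""
    out = subprocess.run(
        ["oc", "get", *args, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def has_label(obj: Dict[str, Any], label: Tuple[str, str]) -> bool:
    """True if the object's metadata.labels contains key=value."""
    labels = obj.get("metadata", {}).get("labels") or {}
    key, value = label
    return labels.get(key) == value


def nodes_missing_fluentd(nodes: Dict[str, Any], pods: Dict[str, Any],
                          node_label: Tuple[str, str] = NODE_LABEL,
                          pod_label: Tuple[str, str] = POD_LABEL) -> List[str]:
    """Return labeled nodes with no fluentd pod in the Running phase.

    Only the pod phase is checked; a stricter check could also look at
    container readiness in status.containerStatuses.
    """
    labeled = {n["metadata"]["name"] for n in nodes.get("items", [])
               if has_label(n, node_label)}
    running = {p.get("spec", {}).get("nodeName") for p in pods.get("items", [])
               if has_label(p, pod_label)
               and p.get("status", {}).get("phase") == "Running"}
    return sorted(labeled - running)


if __name__ == "__main__":
    # "logging" is an assumed namespace for the logging stack.
    nodes = oc_get_json("nodes")
    pods = oc_get_json("pods", "-n", "logging", "-l", "=".join(POD_LABEL))
    for name in nodes_missing_fluentd(nodes, pods):
        print(f"fluentd not running on {name}")
```

The sketch only detects the problem. What to do about each reported node (evicting other pods, fixing docker, restarting the node service) depends on why the pod failed to start, so a playbook would likely keep detection and remediation as separate steps.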

Comment 1 Peter Portante 2018-04-09 02:10:15 UTC

Here is a gist for a script that gets fluentd running on a given node of the cluster that is currently labeled for it: https://gist.github.com/portante/2b91dd7d49636c7e40fa53fb7ed1388b

We should either add this script to the product, or stop running fluentd as a DaemonSet and instead use system containers deployed from Ansible.

Comment 3 Peter Portante 2018-04-21 00:29:38 UTC

After considering this further, pod priority and preemption won't help if the fluentd pod fails to start because of other problems affecting the OpenShift node or docker.

We need to consider running the log collectors *outside* of OpenShift as part of the base RHEL services.

We also need a workaround until then, based on the gist from comment #1.

This is a data-loss situation for customers.

Comment 4 Peter Portante 2018-04-21 00:37:28 UTC

Another BZ describing how fluentd pods fail to start: https://bugzilla.redhat.com/show_bug.cgi?id=1560428