Bug 1291866 - Create a better deployment strategy for fluentd than simply scaling the pod.
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assigned To: ewolinet
QA Contact: chunchen
Depends On: 1291786 1337329
Blocks:
Reported: 2015-12-15 13:37 EST by Eric Rich
Modified: 2017-03-08 13 EST
CC: 10 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1291786
Environment:
Last Closed: 2016-09-27 05:34:38 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Eric Rich 2015-12-15 13:37:43 EST
+++ This bug was initially created as a clone of Bug #1291786 +++

Description of problem:

The configuration for logging and metrics should be more automated, and there should be a better way to deploy these components and ensure they are running on all nodes in the environment.

Using Kubernetes DaemonSets would be optimal.
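
For context, a DaemonSet schedules exactly one copy of a pod onto every node (or every node matching a selector), so a collector expressed as a DaemonSet automatically follows nodes as they are added, restarted, or replaced. A minimal sketch of such a manifest, using the extensions/v1beta1 API available at the time, is below; the names, image, and mount paths are illustrative assumptions, not the manifest OpenShift ships.

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: logging-fluentd               # hypothetical name
spec:
  template:
    metadata:
      labels:
        component: fluentd            # hypothetical label
    spec:
      containers:
      - name: fluentd
        image: example/fluentd:latest # placeholder image
        volumeMounts:
        - name: varlog
          mountPath: /var/log         # read the node's logs from the host
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log              # host log directory

Because the controller owns the per-node placement, no one has to re-scale anything when the node count changes.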
Comment 1 Eric Rich 2015-12-16 08:50:55 EST
Another use case for this is the Docker Registry.
Comment 2 Eric Rich 2015-12-16 08:51:30 EST
Another use case for this is the Router.
Comment 3 Eric Rich 2016-01-19 17:15:15 EST
The current solution, or "hack", for ensuring that fluentd is deployed to all nodes in the environment, scaling the pod to the number of nodes (https://docs.openshift.com/enterprise/3.1/install_config/aggregate_logging.html#fluentd), does not *ensure* that this component is on all nodes in the infrastructure.

As a result, it is possible that new nodes, or nodes that were stopped and restarted, do not get a fluentd pod deployed to them (or have their pods re-scheduled to nodes that already have fluentd running on them), and thus logs are not aggregated from these nodes.

The use of DaemonSets (Bug #1291786) for logging and metrics collection is a logical fit, as the DaemonSet functionality provided in Kubernetes was designed to fulfill exactly this purpose.
Comment 5 Luke Meyer 2016-01-21 10:23:53 EST
(In reply to Eric Rich from comment #3)

> As a result, it is possible that new nodes, or nodes that were stopped and
> restarted, do not get a fluentd pod deployed

As long as fluentd is scaled to match the number of nodes, this *shouldn't* happen - have you seen a node restart come back with no fluentd on it for more than a few minutes?

> (or have their
> pods re-scheduled to nodes that already have fluentd running on them)

They might be scheduled, but they'll fail to run due to the hack (port conflict; see the sketch at the end of this comment), so the situation should resolve itself in time.

All of this is not how we would like things to stay, of course. I'm just saying I don't think it's as bad as you describe.

> The use of DaemonSets (Bug #1291786) for logging and metrics collection is a
> logical fit, as the DaemonSet functionality provided in Kubernetes was
> designed to fulfill exactly this purpose.

Absolutely. We're waiting for this to be enabled in the product and considered stable enough for production.
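
For reference, the port conflict works because a container that requests a hostPort binds a port on the node itself, and Kubernetes will not run two pods claiming the same hostPort on one node; a surplus fluentd pod therefore stays pending or failing until it lands on a node that has none. A sketch of the relevant pod-spec fragment follows; the port number is an assumption, not necessarily the one the logging deployment actually uses.

spec:
  containers:
  - name: fluentd
    image: example/fluentd:latest   # placeholder image
    ports:
    - containerPort: 24220          # assumed port number
      hostPort: 24220               # binds the node's port 24220, so at
                                    # most one such pod can run per node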
Comment 8 Luke Meyer 2016-02-08 15:04:47 EST
With DaemonSets now available (https://github.com/openshift/origin/pull/6854), this is unblocked and is being worked on at https://trello.com/c/jjIFKzNU.
Comment 13 errata-xmlrpc 2016-09-27 05:34:38 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933
