Bug 1616171

Summary: logging-fluentd requires its own priority class to be scheduled on infra nodes
Product: OpenShift Container Platform Reporter: Junqi Zhao <juzhao>
Component: Logging Assignee: ewolinet
Status: CLOSED ERRATA QA Contact: Anping Li <anli>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 3.11.0 CC: aos-bugs, jcantril, jupierce, rmeggins
Target Milestone: ---   
Target Release: 3.11.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: If logging was not in a namespace that began with 'openshift-', Fluentd was not able to use the "system-cluster-critical" priority class.
Consequence: Fluentd would not be able to start up.
Fix: We create a priority class for Cluster Logging and configure Fluentd to use that instead.
Result: Fluentd is able to start up, even if not installed to an 'openshift-*' namespace.
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-10-11 07:24:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Junqi Zhao 2018-08-15 07:43:42 UTC
Description of problem:
On the free-int cluster, the logging-fluentd image version is v3.10.15; it should be v3.11.0.
image: registry.reg-aws.openshift.com:443/openshift3/logging-fluentd:v3.10.15

The other components are at version v3.11.0-0.10.0,
for example:
registry.reg-aws.openshift.com:443/openshift3/ose-logging-curator5:v3.11.0-0.10.0


Version-Release number of selected component (if applicable):

OpenShift Master:v3.11.0-0.10.0 
Kubernetes Master:v1.11.0+d4cacc0 
OpenShift Web Console:v3.11.0-0.10.0

logging version: v3.11.0-0.10.0
 

How reproducible:
Always

Steps to Reproduce:
1. Check logging component images version
2.
3.

Actual results:
logging-fluentd image version is v3.10.15

Expected results:
logging-fluentd image version should be v3.11.0

Additional info:

Comment 3 Jeff Cantrill 2018-08-18 01:04:25 UTC
Do we need a special provision here, or should we move logging to the default namespace for new deployments, 'openshift-logging'?

Comment 4 Justin Pierce 2018-08-20 13:20:59 UTC
@liggitt, it seems your name appears in this area of the code. Can you provide some guidance?

Comment 5 Jordan Liggitt 2018-08-20 13:57:22 UTC
system-priority level pods are allowed in kube-system and openshift-... namespaces only

Comment 6 Jeff Cantrill 2018-08-21 13:17:37 UTC
Modifying this bug to reflect the need for a priority class for scheduling. I'm certain the original report of a wrong version was related to the deployment timing out. Per #c5, fluentd cannot be scheduled because this is an upgrade where the logging stack still lives in a namespace that does not begin with 'openshift-'. At the recommendation of D.Carr, logging needs a priority class that is below the system-priority level:

* reason 1 - the restriction for priority class you are hitting is specific to cluster critical
* reason 2 - fluentd should be second tier to other cluster components

We should create a priority class for cluster-logging. We need this class regardless, but it is a stopgap until we can migrate deployments to 'openshift-logging' in the 4.0 upgrades.

Example from avesh:
apiVersion: scheduling.k8s.io/v1beta1
kind: PriorityClass
metadata:
  name: high-priority
value: 100000
globalDefault: false
description: "This priority class for high priority Applications"

ref: https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
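Adapted to this bug, a priority class for logging might look roughly like the sketch below. The name cluster-logging matches the class seen during verification in this bug; the value and description are assumptions chosen for illustration, not the values shipped in the fix. The only constraint relied on is that user-defined classes cannot exceed 1000000000, which keeps the class below system-cluster-critical (2000000000).

```yaml
# Hypothetical sketch: the value and description are assumed, not from the fix.
apiVersion: scheduling.k8s.io/v1beta1
kind: PriorityClass
metadata:
  name: cluster-logging
# Assumed value. Any value up to 1000000000 is accepted for a user-defined
# class and stays below system-cluster-critical (2000000000).
value: 1000000
# Do not make this the default for pods that omit priorityClassName.
globalDefault: false
description: "Priority class for cluster logging components such as Fluentd"
```

Because PriorityClass is a cluster-scoped object, a class like this is not tied to an 'openshift-' namespace the way the system-priority classes are.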

Comment 9 Anping Li 2018-09-07 08:29:25 UTC
Moving to verified. The logging applications in the logging namespace can be upgraded to v3.11. The daemonset is using priorityClassName: cluster-logging.

$ oc get ds -o yaml |grep priorityClassName
        priorityClassName: cluster-logging
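For context, the grep output above corresponds to a field in the daemonset's pod template. A minimal fragment, showing only the nesting of that field and omitting everything else in the object, would look something like this:

```yaml
# Fragment of a DaemonSet; only the path to the priority field is shown.
spec:
  template:
    spec:
      # Must name an existing PriorityClass, otherwise pods are rejected.
      priorityClassName: cluster-logging
```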

Comment 11 errata-xmlrpc 2018-10-11 07:24:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652