Bug 1399761

Summary: Logging fluentD daemon set does not set Memory limit for the pods
Product: OpenShift Container Platform
Reporter: Boris Kurktchiev <kurktchiev>
Component: Logging
Assignee: ewolinet
Status: CLOSED CURRENTRELEASE
QA Contact: Xia Zhao <xiazhao>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.3.1
CC: aos-bugs, erjones, ewolinet, tdawson
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Feature: Fluentd memory limit set by default to 512Mi.
Reason: Fluentd runs unbounded by memory and can grow to upwards of 3G on a node.
Result: By default, Fluentd is bound to 512Mi as part of the template change in the deployer.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-02-16 21:03:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Boris Kurktchiev 2016-11-29 16:48:51 UTC
Description of problem:
The logging deployer does not set a memory limit in the DaemonSet for the fluentd pods, which means the pods can consume as much memory as they like.

Version-Release number of selected component (if applicable):
3.3.1.5

How reproducible:
Deploy logging, let fluentd run on a busy system, and watch its memory usage grow.

Additional info:
Spoke with jcantril in #openshift-dev about this; there seems to be no real technical reason fluentd needs to run without a memory limit. The DaemonSet created by the logging-deployer sets a CPU limit but does not set a memory limit.

The problem with not setting one is that fluentd can end up consuming the entirety of a node's RAM (unlikely, but possible). I currently have pods sitting at 3GB utilization.

The DaemonSet needs to set a "sane" (whatever that may be) memory limit in its spec; otherwise unpleasant things can happen.
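As a sketch, a memory limit like the one requested could look like the following excerpt from the fluentd DaemonSet's container spec. This is illustrative only: the container name is taken from the verification output below, and the 512Mi value matches what the fix eventually shipped, but the exact structure of the deployer's template is an assumption here.

```yaml
# Hypothetical excerpt of a fluentd DaemonSet spec with a memory limit added.
# Container name and values are illustrative, not the deployer's literal template.
spec:
  template:
    spec:
      containers:
      - name: fluentd-elasticsearch
        resources:
          limits:
            cpu: 100m       # CPU limit the deployer already sets
            memory: 512Mi   # proposed memory cap; without it, usage is unbounded
```

With a limit in place, a fluentd pod that exceeds the cap is OOM-killed and restarted rather than being allowed to consume a node's entire RAM.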

Comment 4 Xia Zhao 2016-12-01 08:09:31 UTC
Verified with the latest deployer pod on ops registry:
openshift3/logging-deployer        3.4.0               5b8c3c9eb40d   

For fluentd daemonset:

        name: fluentd-elasticsearch
        resources:
          limits:
            cpu: 100m
            memory: 512Mi

For each fluentd pod:

    name: fluentd-elasticsearch
    resources:
      limits:
        cpu: 100m
        memory: 512Mi
      requests:
        cpu: 100m
        memory: 512Mi

# openshift version
openshift v3.4.0.32+d349492
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

Comment 5 Troy Dawson 2017-02-16 21:03:48 UTC
This bug was fixed in OCP 3.4.0, which has already been released.