Bug 1554407

Summary: [CRI-O] logging-fluentd throughput lower with CRI-O runtime vs Docker runtime
Product: OpenShift Container Platform
Reporter: Mike Fiedler <mifiedle>
Component: Containers
Assignee: Mrunal Patel <mpatel>
Status: CLOSED ERRATA
QA Contact: Mike Fiedler <mifiedle>
Severity: medium
Priority: unspecified
Version: 3.9.0
CC: anli, aos-bugs, decarr, dwalsh, jokerman, mifiedle, mmccomas, wjiang, wsun
Target Milestone: ---
Target Release: 3.9.z
Hardware: x86_64
OS: Linux
Last Closed: 2018-06-06 15:46:20 UTC
Type: Bug
Attachments: Configuration and performance result details.

Description Mike Fiedler 2018-03-12 15:52:13 UTC
Created attachment 1407281 [details]
Configuration and performance result details.

Description of problem:

Fluentd cannot reach the same message throughput when reading and forwarding pod logs to Elasticsearch with the CRI-O pod runtime as it can with Docker.

In OCP 3.9 fluentd performance testing, the data (see attached document) indicated that fluentd used considerably more CPU to process logs when the pod runtime was CRI-O. The I/O pattern of how CRI-O writes and batches logs to disk appears to be a primary cause.

Example: sending 1 KB messages at an aggregate rate of 500 messages/second across 100 pods on a single node. With a Docker runtime, fluentd CPU usage is about 60% of one core; with a CRI-O runtime it maxes out the core at 100%. As configured for use with OCP, fluentd can only use one core.

With a Docker runtime we normally see maximum throughput of around 750-1000 1 KB messages/second in our standard AWS m4.xlarge compute node configuration. With a CRI-O runtime the cap is somewhere below 500 1 KB messages/second.
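
For reference, the on-disk log formats fluentd has to tail differ between the two runtimes: Docker's json-file driver writes one JSON object per line, while CRI-O writes plain-text lines in the CRI format. The Python sketch below (with hypothetical sample lines, not taken from the attached results) only illustrates that format difference; it is not the profiling method used for this bug.

import json
import re

# Hypothetical sample lines, for illustration only.
docker_line = '{"log":"hello world\\n","stream":"stdout","time":"2018-03-12T15:52:13.000000000Z"}'
crio_line = "2018-03-12T15:52:13.000000000+00:00 stdout F hello world"

# Docker json-file driver: each line is a JSON object.
record = json.loads(docker_line)
print(record["time"], record["stream"], record["log"].rstrip("\n"))

# CRI-O: "<timestamp> <stream> <partial/full tag> <message>", parsed with a
# regex similar in spirit to the one fluentd's CRI log parsing uses.
m = re.match(r"^(\S+) (stdout|stderr) (\S) (.*)$", crio_line)
print(m.group(1), m.group(2), m.group(4))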

Version-Release number of selected component (if applicable): 3.9.4


How reproducible: Always


Steps to Reproduce:
1.  Install a cluster on AWS with 1 master, 3 infra and 3 compute nodes
2.  Deploy logging 3.9.4 with a 3-node Elasticsearch cluster
3.  Use the SVT logtest tool (https://github.com/openshift/svt/blob/master/openshift_scalability/content/logtest/ocp_logtest-README.md) with the configuration below to load the cluster with 300 projects and pods (100 pods/compute node) logging at an aggregate rate of 500 1 KB messages/second/node
4.  Monitor CPU usage for fluentd (a sampling sketch is included below)
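
One way to sample fluentd CPU usage on the node is sketched below in Python; the pgrep-based pid lookup and the 5-second interval are illustrative assumptions, not part of the original test procedure (the attached document describes the actual measurements).

import os
import subprocess
import time

CLK_TCK = os.sysconf("SC_CLK_TCK")   # kernel clock ticks per second
INTERVAL = 5                         # sampling interval in seconds

def cpu_seconds(pid):
    # user + system CPU time consumed by pid, in seconds
    with open("/proc/%d/stat" % pid) as f:
        fields = f.read().split()
    return (int(fields[13]) + int(fields[14])) / float(CLK_TCK)

# Assumes the oldest process whose command line matches "fluentd" is the collector.
pid = int(subprocess.check_output(["pgrep", "-o", "-f", "fluentd"]).decode().split()[0])
prev = cpu_seconds(pid)
while True:
    time.sleep(INTERVAL)
    cur = cpu_seconds(pid)
    # 100% means one full core was busy for the whole interval
    print("fluentd CPU: %.1f%% of one core" % (100.0 * (cur - prev) / INTERVAL))
    prev = cur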

Actual results:

fluentd uses 100% of a single core - the maximum it can use - with a CRI-O pod runtime. It uses ~60% of a core with a Docker pod runtime. See the attached document.

Expected results:

fluentd performance on a CRI-O runtime comparable to a Docker runtime

Additional info:

log test configuration 

- 300 projects and pods (100/node)
- 300 messages/minute/pod = 500 messages/second aggregate on the node (100 pods/node × 5 messages/second/pod).
- each pod sends 6000 1 KB messages, i.e. ~20 minutes of run time. Can run longer if desired.

projects:
  - num: 300
    basename: logtest
    ifexists: delete
    tuning: default
    templates:
      - num: 1
        file: ./content/logtest/logtest-rc.json
        parameters:
         - REPLICAS: "1"
         - INITIAL_FLAGS: "--num-lines 6000 --line-length 1024 --word-length 9 --rate 300 --fixed-line\n"

tuningsets:
  - name: default
    pods:
      stepping:
        stepsize: 5
        pause: 0 min
      rate_limit:
        delay: 0 ms
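
As a quick sanity check of the rates implied by this configuration (assuming the 100 logtest pods per compute node noted above), a small Python calculation:

pods_per_node = 100
rate_per_pod_per_min = 300          # --rate 300 (messages/minute/pod)
lines_per_pod = 6000                # --num-lines 6000
line_length = 1024                  # --line-length 1024 (~1 KB per message)

print(pods_per_node * rate_per_pod_per_min / 60.0)                        # 500 messages/second/node
print(pods_per_node * rate_per_pod_per_min / 60.0 * line_length / 1024.0) # ~500 KB/second/node
print(lines_per_pod / float(rate_per_pod_per_min))                        # 20 minutes of run time per pod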

Comment 5 Wei Sun 2018-05-31 02:18:35 UTC
Hi Mike, please help verify this bug. Thanks!

Comment 6 Mike Fiedler 2018-05-31 18:40:01 UTC
Verified on OCP 3.9.30 + cri-o.x86_64 1.9.12-1.gitfa11beb.el7

and also on OCP 3.10.0-0.54.0 + cri-o.x86_64 1.10.2-1.rhaos3.10.git095e88c.el7


The CPU and file I/O profile of fluentd + CRI-O is now virtually identical to that of fluentd + Docker.

Comment 9 errata-xmlrpc 2018-06-06 15:46:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1796