Bug 1896770 - Increased memory consumption by the FluentD pods in Cluster Logging instance on OCP 4.6.1 / s390x
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 4.6.z
Hardware: s390x
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Jeff Cantrill
QA Contact: Anping Li
URL:
Whiteboard: logging-core
Depends On:
Blocks: ocp-46-z-tracker
 
Reported: 2020-11-11 14:11 UTC by Lakshmi Ravichandran
Modified: 2021-01-20 15:44 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-01-20 15:44:55 UTC
Target Upstream Version:
Embargoed:



Description Lakshmi Ravichandran 2020-11-11 14:11:35 UTC
Description of problem:
Recent performance measurements on an OCP 4.6.1 / s390x cluster running a cluster logging instance show increased CPU consumption of about:
- 5 CPU cores for fluentd across 6 nodes (3 masters + 3 workers)
- 1 CPU core for Elasticsearch across 3 worker nodes


Version-Release number of selected component (if applicable):
# oc version
Client Version: 4.6.0-rc.4
Server Version: 4.6.1
Kubernetes Version: v1.19.0+d59ce34

How reproducible:
Every time

Steps to Reproduce:
1. Install an OCP 4.6.1 cluster on s390x.
2. Install the Elasticsearch, Cluster Logging, and Local Storage operators from the console.
3. Make local PVs available using the Local Storage Operator (a sketch follows this list).
4. Deploy the cluster logging instance to the cluster (definition below).
5. Measure the CPU consumption of the fluentd and elasticsearch processes with the top command on each cluster node (equivalent commands are sketched after the instance definition below).
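
For step 3, a LocalVolume resource along these lines can be used (a minimal sketch: the storage class name local-sc matches the instance definition below, while the resource name, namespace, and device path are assumptions for illustration):

apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
  name: "local-disks"                      # assumed name
  namespace: "openshift-local-storage"     # assumed LSO namespace
spec:
  storageClassDevices:
    - storageClassName: "local-sc"
      volumeMode: Filesystem
      fsType: ext4
      devicePaths:
        - /dev/vdb                         # placeholder; use the actual local disk on each node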

The ClusterLogging instance definition is as follows:
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  logStore:
    type: "elasticsearch"
    elasticsearch:
      nodeCount: 3
      storage:
        storageClassName: "local-sc"
        size: 7043Mi
      redundancyPolicy: "ZeroRedundancy"
      resources:
        requests:
          memory: 2Gi
  visualization:
    type: "kibana"
    kibana:
      replicas: 1
  curation:
    type: "curator"
    curator:
      schedule: "30 3 * * *"
  collection:
    logs:
      type: "fluentd"
      fluentd: {}
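
For step 5, roughly equivalent commands would be (the node name is a placeholder; the elasticsearch process appears as java in top):

# per node, via a debug pod:
oc debug node/<node-name> -- chroot /host top -b -n 1 | grep -i -E 'fluentd|java'

# per pod, via the metrics API (if the metrics stack is available):
oc adm top pods -n openshift-logging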
      

Actual results:
Increased CPU consumption as described above.

Expected results:
What could be the reason for the increased CPU consumption by the fluentd pods, and is there a way to reduce it?
What are the recommended CPU requests/limits for the Elasticsearch and fluentd pods?
Are there any performance reports or profiling results available for the cluster logging components (fluentd, Elasticsearch)?


Additional info:
The cluster's resource spec is:
- master nodes - 4 CPU / 16G,
- worker nodes 01 and 02 - 10 CPU / 32G (memory increased as needed for the ES pods),
- worker node 03 - 4 CPU / 16G.
In the ClusterLogging instance definition, the fluentd and ES pods do not have any resource requests or limits specified.
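
If requests/limits were to be set, the ClusterLogging CR accepts resources blocks for both the log store and the collector; a minimal sketch of the relevant fragment (the values are placeholders, not recommendations) would be:

spec:
  logStore:
    elasticsearch:
      resources:
        requests:
          cpu: "1"        # placeholder value
          memory: 16Gi    # placeholder value
  collection:
    logs:
      fluentd:
        resources:
          requests:
            cpu: 200m     # placeholder value
            memory: 1Gi   # placeholder value
          limits:
            memory: 1Gi   # placeholder value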

Because the performance measurements were taken with an internally developed tool, the raw results cannot be shared here.
Please let me know which other logs would be of interest; I can provide them.

Comment 1 wvoesch 2020-12-07 13:51:25 UTC
We expect this behavior to also occur on x86. Could someone please check this on x86? Thank you.

Comment 3 Lakshmi Ravichandran 2021-01-20 15:44:55 UTC
The fix for bug https://bugzilla.redhat.com/show_bug.cgi?id=1895385 was followed up and verified on OCP 4.6.0-0.nightly-s390x-2021-01-18-070324; hence, closing this bug.

