Bug 1974967 - Prometheus Memory Usage 50-100% higher on 4.8+ OVN when under load
Summary: Prometheus Memory Usage 50-100% higher on 4.8+ OVN when under load
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.9.0
Assignee: Antonio Ojea
QA Contact: Kedar Kulkarni
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-06-22 20:47 UTC by Keith
Modified: 2021-10-18 17:36 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: A cardinality explosion in the service controller metrics, which were tracked per created service. Consequence: High memory usage on the OVN master pods. Fix: Reduced the metric cardinality by removing the per-service label. Result: Lower memory usage on the OVN master pods.
Clone Of:
Environment:
Last Closed: 2021-10-18 17:35:57 UTC
Target Upstream Version:
Embargoed:


Attachments
grafana screenshot in case the url doesnt load (197.00 KB, image/png)
2021-06-22 20:47 UTC, Keith


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2021:3759 0 None None None 2021-10-18 17:36:22 UTC

Description Keith 2021-06-22 20:47:37 UTC
Created attachment 1793243 [details]
grafana screenshot in case the url doesnt load

Description of problem:

During our performance test runs we noticed that OVN clusters on version 4.8+ have Prometheus consuming significantly more memory when the cluster is subjected to cluster_density tests at 25- and 50-node scales. At the 50-node scale the memory usage is high enough that the Prometheus pods get OOMKilled and can potentially exhaust the underlying node's resources.

I'm not sure, but it might be related to this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1925061. However, we don't see the memory issue on 4.7 OVN.

Version-Release number of selected component (if applicable):

4.8/4.9 


How reproducible: Somewhat easily


Steps to Reproduce:
1. Have a 4.8 OVN Cluster at 50 nodes
2. Run a cluster_density test with kube-burner to generate k8s objects in the cluster
3. Observe Prometheus memory usage (see the query sketch below)
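
A quick way to make step 3 concrete is a standard PromQL cardinality check against the in-cluster Prometheus (illustrative query, not part of the original test runs):

    # top 10 metrics by series count; a cardinality explosion shows up here
    topk(10, count by (__name__) ({__name__=~".+"}))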

Actual results: Prometheus uses more memory on OVN than on SDN from 4.8 onward


Expected results: Prometheus memory usage would stay relatively even across all versions of OVN/SDN 


Additional info:

These clusters were tested as part of a workflow for performance testing OCP. They were all part of a scheduled test run, and each was given the exact same benchmark configuration, defined here:

https://github.com/whitleykeith/airflow-kubernetes/blob/master/dags/openshift_nightlies/tasks/benchmarks/defaults.json

The underlying scripts that run the benchmarks are here:
https://github.com/cloud-bulldozer/e2e-benchmarking

4.8 ovn install configs: https://github.com/whitleykeith/airflow-kubernetes/blob/master/dags/openshift_nightlies/releases/4.8/aws/ovn/install.json

4.9 ovn install configs: https://github.com/whitleykeith/airflow-kubernetes/blob/master/dags/openshift_nightlies/releases/4.9/aws/ovn/install.json

Grafana with prom metrics from the clusters: http://dittybopper-dittybopper.apps.keith-cluster.perfscale.devcluster.openshift.com/d/oWe9aYxmke23/workload-metrics-thanos-ds-v2?orgId=1&from=1624288003575&to=1624298803576&var-platform=aws&var-openshift_version=4.8.0-0.nightly-2021-06-19-005119&var-openshift_version=4.9.0-0.nightly-2021-06-21-084703&var-openshift_version=4.7.0-0.nightly-2021-06-20-093308&var-network_type=OVNKubernetes&var-network_type=OpenShiftSDN&var-cluster_name=whitleykeith-4-7-aws-8fzqs&var-cluster_name=whitleykeith-4-7-aws-jfrz7&var-cluster_name=whitleykeith-4-8-aws-2vf5n&var-cluster_name=whitleykeith-4-9-aws-cgpbt&var-cluster_name=whitleykeith-4-8-aws-dc4cl&var-cluster_name=whitleykeith-4-9-aws-hkb26&var-machinesets=All&var-Node=All&var-Deployment=All&var-Statefulset=prometheus-k8s&var-Daemonset=All

Comment 1 Jan Fajerski 2021-06-23 07:39:41 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1925061 is likely unrelated. We know about a memory usage increase on updates, and it's connected to the series churn caused by all containers restarting.

This one seems to be specific to the underlying network provider.

One straightforward theory would be that with OVN, Prometheus ingests more series than with SDN.
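
A quick way to check that theory on a live pair of clusters would be to compare the TSDB head series count on both (illustrative):

    # number of in-memory series held by each Prometheus replica
    prometheus_tsdb_head_series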

Is there a way to get an OVN/SDN pair of clusters, either 4.8 or 4.9, so we can investigate a bit?

Comment 2 Keith 2021-06-23 12:56:21 UTC
I would agree about OVN; however, I don't see this issue on 4.7 OVN, so it might be a relatively recent change there.


We have our regular workloads running today, so we should have 4.8/4.9 clusters up. They're almost done installing now, but the workloads that reproduce this issue take a bit longer to run. I'll get the clusters into the state they were in so we can see what the differences are.

Comment 3 Jan Fajerski 2021-06-24 14:33:51 UTC
From the monitoring perspective, serviceMonitor/openshift-ovn-kubernetes/monitor-ovn-master-metrics is the offender here. One metric in particular seems to cause the brunt of the resource usage: ovnkube_master_sync_service_latency_seconds_bucket.

This metric carries a label called name, which has a namespaced object name as its value (probably among other things). E.g. name="cluster-density-374ea166-191f-46ba-8626-5f7859567ab3-1/deployment-1pod-1-1".
The scaling test that exposed this creates many of these namespaces, and in turn we see a cardinality explosion for this metric (see screenshots attached).
This dramatically increases Prometheus' resource usage and slows down the exporter quite a bit.
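
For reference, the explosion can be quantified directly (illustrative queries, not from the original investigation):

    # total series produced by the offending metric
    count(ovnkube_master_sync_service_latency_seconds_bucket)
    # number of distinct values of the unbounded "name" label
    count(count by (name) (ovnkube_master_sync_service_latency_seconds_bucket))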

IDs that can grow without constraint should not be used as label values.
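
To illustrate the anti-pattern, here is a minimal client_golang sketch (hypothetical code, not the actual ovn-kubernetes implementation; only the metric and label names match the ones above):

    package metrics

    import "github.com/prometheus/client_golang/prometheus"

    // Anti-pattern: an unbounded ID as a label value. Every distinct value
    // materializes a full set of histogram series; with the default buckets
    // that is 14 series per service (12 "le" buckets including +Inf, plus
    // _sum and _count).
    var syncLatency = prometheus.NewHistogramVec(prometheus.HistogramOpts{
        Name: "ovnkube_master_sync_service_latency_seconds",
        Help: "Latency of syncing a single service.",
    }, []string{"name"}) // "name" = namespace/service -> unbounded

    func observeSync(name string, seconds float64) {
        // e.g. name = "cluster-density-374ea166-191f-46ba-8626-5f7859567ab3-1/deployment-1pod-1-1"
        syncLatency.WithLabelValues(name).Observe(seconds)
    }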

Comment 6 Jan Fajerski 2021-06-25 08:16:32 UTC
I suspect the main issue is that the ovnkube_master_sync_service_latency_seconds_bucket series, once created, never go away when the respective namespace/pod is deleted.

If I'm not mistaken the data is exported here https://github.com/ovn-org/ovn-kubernetes/blob/master/go-controller/pkg/ovn/controller/services/services_controller.go

I'm not sure how valuable this metric is, but either the metrics for deleted namespaces and pods must also be deleted, or it might be worth considering whether the name label is needed at all.
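
If the label were kept, the first option would mean explicitly dropping series from the service deletion path. A hedged sketch, reusing the hypothetical syncLatency HistogramVec from the sketch in comment 3 (DeleteLabelValues is standard client_golang API):

    // Called from the controller's service deletion handler (hypothetical).
    func onServiceDelete(namespacedName string) {
        // Removes all series (buckets, _sum, _count) for this label value.
        syncLatency.DeleteLabelValues(namespacedName)
    }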

Comment 7 Antonio Ojea 2021-06-25 10:09:22 UTC
This is my fault; I didn't fully understand the implications of labels on Prometheus metrics.
We can have just a single global metric; there is no need for such per-service granularity.

https://github.com/ovn-org/ovn-kubernetes/pull/2279
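
For reference, the shape of that change, sketched with client_golang (the authoritative change is the PR above; this is only an illustration):

    // (same package/import as the sketch in comment 3)
    // One global histogram with no per-service label: the series count stays
    // constant no matter how many services the cluster creates.
    var syncServiceLatency = prometheus.NewHistogram(prometheus.HistogramOpts{
        Name: "ovnkube_master_sync_service_latency_seconds",
        Help: "Latency of syncing services.",
    })

    func observeSync(seconds float64) {
        syncServiceLatency.Observe(seconds)
    }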

Comment 8 Andrew Stoycos 2021-07-19 15:31:33 UTC
This fix made it downstream in https://github.com/openshift/ovn-kubernetes/pull/600

Comment 10 Kedar Kulkarni 2021-07-29 14:07:35 UTC
Hi,

I tested OVN and SDN 4.8/4.9 side by side, with the exact same kind of workload at 50-node scale. Based on my observations, Prometheus memory usage for OVN 4.9 improved over OVN 4.8 by roughly ~16%.

Just to note: between OVN and SDN on 4.9, though, Prometheus on SDN used around ~12.6 GB of memory (averaged across both replicas) while on OVN it used ~22 GB (averaged across both replicas).

Since the merged fix was supposed to improve OVN, and that is what I observed, I am marking this as Verified.

@kwhitley please open a new BZ if you think this issue needs further improvements.

Thanks,
KK.

Comment 17 errata-xmlrpc 2021-10-18 17:35:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

