1955464 – [4.6] Drop container_memory_failures_total metric because of high cardinality

Summary: [4.6] Drop container_memory_failures_total metric because of high cardinality

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Monitoring
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.6.z
Assignee:	Jayapriya Pai
QA Contact:	Junqi Zhao
Docs Contact:
URL:
Whiteboard:
Depends On:	1955462
Blocks:	1955452

Reported:	2021-04-30 07:48 UTC by Simon Pasquier
Modified:	2021-09-17 06:15 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	1955462
Environment:
Last Closed:	2021-06-08 13:54:19 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-monitoring-operator pull 1180	0	None	open	Bug 1955464: drop container_memory_failures_total metric backport	2021-05-25 14:00:41 UTC
Red Hat Product Errata	RHBA-2021:2157	0	None	None	None	2021-06-08 13:54:39 UTC

Description Simon Pasquier 2021-04-30 07:48:35 UTC

+++ This bug was initially created as a clone of Bug #1955462 +++

+++ This bug was initially created as a clone of Bug #1955457 +++

Description of problem:
The container_memory_failures_total metric is in the top 10 of metrics with high cardinality. It isn't used in any rule or dashboard. Storing the metric in Prometheus increases memory usage for no benefit.

Version-Release number of selected component (if applicable):
4.6

How reproducible:
Always

Steps to Reproduce:
1. Open the Prometheus UI, go to the Status > TSDB status page and look at the "Top 10 series count by metric names" section.

Actual results:
container_memory_failures_total is listed.

Expected results:
container_memory_failures_total isn't present.

Additional info:
N/A
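
The check in the steps above can also be scripted against the Prometheus HTTP API, which serves the same data as the TSDB status page at /api/v1/status/tsdb. The following is only a minimal sketch: the helper name metric_in_tsdb_top is hypothetical, and it assumes the response lists the top metrics as {"name": ..., "value": ...} entries under seriesCountByMetricName. It matches the name with grep rather than parsing the JSON, which is a simplification.

```shell
# Hypothetical helper: reads a /api/v1/status/tsdb JSON response on stdin and
# returns 0 if the metric named in $1 appears as a "name" entry, 1 otherwise.
# Simplification: matches on text instead of parsing the JSON, so it relies on
# no other section of the response using the bare metric name as "name".
metric_in_tsdb_top() {
  metric="$1"
  grep -Eq "\"name\"[[:space:]]*:[[:space:]]*\"${metric}\""
}

# Example usage (assumes a logged-in oc session and a valid bearer token in $token):
#   oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- \
#     curl -sk -H "Authorization: Bearer $token" \
#     'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/status/tsdb' \
#     | metric_in_tsdb_top container_memory_failures_total \
#     && echo "still listed" || echo "not listed"
```

A metric that is absent from the top 10 may still exist with a lower series count, so "not listed" here only mirrors the UI check described in the steps.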

Comment 3 Junqi Zhao 2021-05-27 08:41:42 UTC

Checked with 4.6.0-0.nightly-2021-05-27-033913; the container_memory_failures_total metric is no longer present.
# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep container_memory_failures_total
no result

Comment 8 errata-xmlrpc 2021-06-08 13:54:19 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.32 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2157
