Bug 1929875 - prometheus memory spikes
Summary: prometheus memory spikes
Keywords:
Status: CLOSED DUPLICATE of bug 1918683
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.6
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Sergiusz Urbaniak
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-02-17 19:53 UTC by dtarabor
Modified: 2021-02-18 09:37 UTC
CC: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-18 09:37:50 UTC
Target Upstream Version:
Embargoed:



Description dtarabor 2021-02-17 19:53:20 UTC
Description of problem:
The prometheus-k8s-0 and prometheus-k8s-1 pods exhibit unexplained memory spikes (16GB+) after upgrading to 4.6.16. The spikes push the nodes into NotReady, and pods can no longer be scheduled on those nodes.

Workaround:
Deleting the wal/ directory appears to have worked as a workaround.
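The workaround above could be scripted roughly as follows. This is a sketch, not a procedure from the report: the namespace, container name, WAL path, and pod names are assumptions based on a default openshift-monitoring deployment, and every oc command is echoed as a dry run (drop the echo to actually execute against a cluster).

```shell
# Dry-run sketch of the WAL-deletion workaround (assumptions: default
# openshift-monitoring namespace, statefulset pod names, WAL stored at
# /prometheus/wal inside the 'prometheus' container).
NAMESPACE=openshift-monitoring
for POD in prometheus-k8s-0 prometheus-k8s-1; do
  # Remove the write-ahead log, then delete the pod so the statefulset
  # recreates it and Prometheus starts with a fresh WAL.
  echo oc -n "$NAMESPACE" exec "$POD" -c prometheus -- rm -rf /prometheus/wal
  echo oc -n "$NAMESPACE" delete pod "$POD"
done
```

Note that deleting the WAL discards any samples not yet compacted into a block, so this trades a window of data loss for avoiding the replay-time memory blowup.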

Version-Release number of selected component (if applicable):
OCP 4.6.16

How reproducible:
I was not able to reproduce this on my cluster.

Steps to Reproduce:
1. Upgrade the cluster to 4.6.16.
2. The prometheus pods spike to very high memory usage.
3. The nodes become overwhelmed.
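While reproducing, the spike and the resulting node pressure could be watched with something like the sketch below. The label selector is an assumption (not given in the report), and the oc commands are echoed as a dry run since they need a live cluster.

```shell
# Dry-run sketch: poll pod resource usage to catch the memory spike.
# 'oc adm top pods' requires cluster metrics to be available; the label
# selector 'app=prometheus' is an assumption about the pod labels.
NAMESPACE=openshift-monitoring
SELECTOR="app=prometheus"
echo oc -n "$NAMESPACE" adm top pods -l "$SELECTOR"
# Node pressure from the spike shows up as NotReady in node status:
echo oc get nodes
```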

Expected results:
No memory spikes after the upgrade.

Additional info:

Appears to be related to https://github.com/prometheus/prometheus/issues/6934

