Bug 1929875 - prometheus memory spikes
Summary: prometheus memory spikes
Keywords:
Status: CLOSED DUPLICATE of bug 1918683
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.6
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Sergiusz Urbaniak
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-02-17 19:53 UTC by dtarabor
Modified: 2021-02-18 09:37 UTC
CC: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-18 09:37:50 UTC
Target Upstream Version:
Embargoed:



Description dtarabor 2021-02-17 19:53:20 UTC
Description of problem:
The prometheus-k8s-0 and prometheus-k8s-1 pods exhibit unexplained memory spikes (16GB+) after upgrading to 4.6.16. The spikes push the nodes into NotReady, and pods can no longer be scheduled on those nodes.

Workaround:
Deleting the wal/ directory appears to have worked as a workaround.
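The workaround above could be scripted roughly as follows. This is a sketch, not a procedure from the report: the namespace, container name, WAL path, and pod names are assumptions based on a default openshift-monitoring deployment, and every oc command is echoed as a dry run (drop the echo to actually execute against a cluster).

```shell
# Dry-run sketch of the WAL-deletion workaround (assumptions: default
# openshift-monitoring namespace, statefulset pod names, WAL stored at
# /prometheus/wal inside the 'prometheus' container).
NAMESPACE=openshift-monitoring
for POD in prometheus-k8s-0 prometheus-k8s-1; do
  # Remove the write-ahead log, then delete the pod so the statefulset
  # recreates it and Prometheus starts with a fresh WAL.
  echo oc -n "$NAMESPACE" exec "$POD" -c prometheus -- rm -rf /prometheus/wal
  echo oc -n "$NAMESPACE" delete pod "$POD"
done
```

Note that deleting the WAL discards any samples not yet compacted into a block, so this trades a window of data loss for avoiding the replay-time memory blowup.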

Version-Release number of selected component (if applicable):
OCP 4.6.16

How reproducible:
I was not able to reproduce this on my cluster.

Steps to Reproduce:
1. Upgrade the cluster to 4.6.16.
2. The prometheus pods spike to very high memory usage.
3. The nodes become overwhelmed.
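While reproducing, the spike and the resulting node pressure could be watched with something like the sketch below. The label selector is an assumption (not given in the report), and the oc commands are echoed as a dry run since they need a live cluster.

```shell
# Dry-run sketch: poll pod resource usage to catch the memory spike.
# 'oc adm top pods' requires cluster metrics to be available; the label
# selector 'app=prometheus' is an assumption about the pod labels.
NAMESPACE=openshift-monitoring
SELECTOR="app=prometheus"
echo oc -n "$NAMESPACE" adm top pods -l "$SELECTOR"
# Node pressure from the spike shows up as NotReady in node status:
echo oc get nodes
```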

Expected results:
No memory spikes after the upgrade.

Additional info:

Appears to be related to https://github.com/prometheus/prometheus/issues/6934

