
Bug 1945017

Summary:	Insufficient system reserved memory in single node clusters
Product:	OpenShift Container Platform
Component:	Node
Node sub component:	Kubelet
Reporter:	Omer Tuchfeld <otuchfel>
Assignee:	Harshal Patil <harpatil>
QA Contact:	Sunil Choudhary <schoudha>
CC:	aos-bugs, rphillips
Status:	CLOSED NOTABUG
Severity:	unspecified
Priority:	unspecified
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Type:	Bug
Last Closed:	2021-04-09 03:12:33 UTC

Description Omer Tuchfeld 2021-03-31 08:41:24 UTC

Description of problem:
Running the E2E tests on a single-node cluster triggers a SystemMemoryExceedsReservation alert. This probably happens because the single node's kubelet, CRI-O, and other system processes carry load that would otherwise be spread across 3 nodes, so their memory usage crosses the roughly 1GiB threshold.

Increasing the system reserved memory from 1GiB to 2GiB seems to have made the alerts disappear (https://github.com/openshift/machine-config-operator/pull/2501):

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/2501/pull-ci-openshift-machine-config-operator-master-e2e-aws-single-node/1377015937324027904

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/2501/pull-ci-openshift-machine-config-operator-master-e2e-aws-single-node/1376985053849980928
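
To illustrate why doubling the reservation could silence the alert, here is a minimal sketch of the kind of comparison involved. It assumes the alert fires when system-process memory usage exceeds some fraction of the reserved amount; the function name, the 0.95 fraction, and the 1.4GiB usage figure are illustrative assumptions, not values taken from the alert rule or from these CI runs.

```python
GIB = 1024 ** 3


def exceeds_reservation(system_usage_bytes: int,
                        reserved_bytes: int,
                        fraction: float = 0.95) -> bool:
    """Return True when usage is above `fraction` of the reservation.

    `fraction` is an assumed threshold, not confirmed by the bug report.
    """
    return system_usage_bytes > reserved_bytes * fraction


# Hypothetical usage by kubelet, CRI-O and other system processes.
usage = int(1.4 * GIB)

for reserved_gib in (1, 2):
    fires = exceeds_reservation(usage, reserved_gib * GIB)
    print(f"reserved={reserved_gib}GiB alert_fires={fires}")
```

Under these assumed numbers the check trips with a 1GiB reservation and stays quiet with 2GiB, which matches the behaviour seen in the runs above.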

Version-Release number of selected component (if applicable):
4.8

How reproducible:
Happens on almost every single-node e2e test run on AWS


Steps to Reproduce:
1. Run the e2e-aws-single-node Prow job on any repo that has it
2. Look at the failing alerts test

Actual results:
The SystemMemoryExceedsReservation alert fires


Expected results:
Alert shouldn't fire


Additional info:
Might be slightly related to https://github.com/kubernetes/kubernetes/pull/100531, but I don't think it's what's happening here

Comment 1 Yu Qi Zhang 2021-03-31 16:35:47 UTC

Moving this over to the Node team to decide whether the system reserved memory should be modified for the single-node (SNO) case, or whether we should just not alert.

Comment 2 Omer Tuchfeld 2021-03-31 21:48:03 UTC

Proposed solution: https://github.com/openshift/machine-config-operator/pull/2504

Comment 4 Martin Sivák 2021-04-01 08:13:57 UTC

The memory manager is a NUMA-only component. Moving to kubelet for further triage.