Bug 1945017

Summary: Insufficient system reserved memory in single node clusters
Product: OpenShift Container Platform
Reporter: Omer Tuchfeld <otuchfel>
Component: Node
Assignee: Harshal Patil <harpatil>
Node sub component: Kubelet
QA Contact: Sunil Choudhary <schoudha>
Status: CLOSED NOTABUG
Severity: unspecified
Priority: unspecified
CC: aos-bugs, rphillips
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Last Closed: 2021-04-09 03:12:33 UTC
Type: Bug

Description Omer Tuchfeld 2021-03-31 08:41:24 UTC
Description of problem:
Running the E2E tests on a single-node cluster triggers the SystemMemoryExceedsReservation alert. The likely cause is that the node's kubelet, CRI-O, and other system processes carry a much heavier load when the work is not spread across 3 nodes, so their memory use crosses the roughly 1 GiB threshold.

Increasing the system reserved memory from 1GiB to 2GiB seems to have made the alerts disappear (https://github.com/openshift/machine-config-operator/pull/2501):

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/2501/pull-ci-openshift-machine-config-operator-master-e2e-aws-single-node/1377015937324027904

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/2501/pull-ci-openshift-machine-config-operator-master-e2e-aws-single-node/1376985053849980928
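
To illustrate why doubling the reservation silences the alert, here is a rough sketch of the comparison involved. It is not the actual alert rule, which is not quoted in this bug and may apply a safety margin. The function name and the 1.3 GiB usage figure are hypothetical and chosen only for illustration.

```python
GIB = 1024 ** 3


def exceeds_reservation(system_usage_bytes: int, reserved_bytes: int) -> bool:
    """Return True when system-process memory use is above the reservation."""
    return system_usage_bytes > reserved_bytes


# Hypothetical figure: system processes on the single node using 1.3 GiB.
usage = int(1.3 * GIB)

# Compare the old (1 GiB) and proposed (2 GiB) reservations.
for reserved_gib in (1, 2):
    fires = exceeds_reservation(usage, reserved_gib * GIB)
    state = "fires" if fires else "does not fire"
    print(f"reserved={reserved_gib}GiB: alert {state}")
```

With any usage between 1 GiB and 2 GiB, the check trips against the old reservation and stays quiet against the new one, which matches what the two runs above appear to show.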

Version-Release number of selected component (if applicable):
4.8

How reproducible:
Happens on almost every single-node e2e test run on AWS


Steps to Reproduce:
1. Run the e2e-aws-single-node Prow job on any repo that has it
2. Look at the failing alerts test

Actual results:
The SystemMemoryExceedsReservation alert fires


Expected results:
Alert shouldn't fire


Additional info:
Might be slightly related to https://github.com/kubernetes/kubernetes/pull/100531, but I don't think it's what's happening here

Comment 1 Yu Qi Zhang 2021-03-31 16:35:47 UTC
Moving this over to the Node team to decide whether this should be modified for the single-node OpenShift (SNO) case, or whether we should simply not alert.

Comment 2 Omer Tuchfeld 2021-03-31 21:48:03 UTC
Proposed solution: https://github.com/openshift/machine-config-operator/pull/2504

Comment 4 Martin Sivák 2021-04-01 08:13:57 UTC
Memory Manager is a NUMA-only component. Moving to kubelet for further triage.