1994277 – Changing the memory manager policy via the kubelet config will drop the node to NotReady state

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Node
Sub Component:
Version:	4.9
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.9.0
Assignee:	Artyom
QA Contact:	Walid A.
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-08-17 08:13 UTC by Artyom
Modified:	2021-10-18 17:47 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-10-18 17:46:44 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift machine-config-operator pull 2718	0	None	None	None	2021-08-19 14:11:54 UTC
Red Hat Product Errata	RHSA-2021:3759	0	None	None	None	2021-10-18 17:47:00 UTC

Description Artyom 2021-08-17 08:13:55 UTC

Description of problem:
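Changing the memory manager policy via the kubelet config drops the node to the NotReady state. The memory manager expects its state file to be deleted before the kubelet restarts with a different policy; if the file is still present, the configured policy no longer matches the checkpointed one and the kubelet fails to initialize the memory manager.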

Version-Release number of selected component (if applicable):
master

How reproducible:
Always

Steps to Reproduce:
1. Change the memory manager policy via the KubeletConfig and set the reserved memory.
2. Wait for the node to become Ready.
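
For reference, a rough sketch of the kind of KubeletConfig used in step 1, built as a Python dict and dumped as JSON. The object name, the pool selector label, the NUMA node and the memory size are placeholders and not the values from the affected cluster. The Static policy may need further kubelet settings (for example matching kubeReserved, systemReserved and eviction thresholds), which are omitted here.

```python
import json

# Hypothetical KubeletConfig: the name, pool label, NUMA node and size below
# are placeholders, not values taken from this report.
kubelet_config = {
    "apiVersion": "machineconfiguration.openshift.io/v1",
    "kind": "KubeletConfig",
    "metadata": {"name": "memory-manager-static"},
    "spec": {
        "machineConfigPoolSelector": {
            "matchLabels": {
                "pools.operator.machineconfiguration.openshift.io/worker": ""
            }
        },
        "kubeletConfig": {
            # Switch the memory manager away from the default policy.
            "memoryManagerPolicy": "Static",
            # Reserve memory on one NUMA node (placeholder amount).
            "reservedMemory": [
                {"numaNode": 0, "limits": {"memory": "1100Mi"}}
            ],
        },
    },
}

# Serialize the object so it can be saved to a file and applied.
manifest = json.dumps(kubelet_config, indent=2)
print(manifest)
```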

Actual results:
The node stays in the NotReady state indefinitely, with the following error in the kubelet logs:
Aug 16 18:04:51 alukiano-csbfk-worker-a-dcvzf.c.openshift-gce-devel.internal hyperkube[9402]: E0816 18:04:51.711228    9402 memory_manager.go:174] "Could not initialize checkpoint manager, please drain node and remove policy state file" err="could not restore state from checkpoint: [memorymanager] configured policy \"Static\" differs from state checkpoint policy \"None\", please drain this node and delete the memory manager checkpoint file \"/var/lib/kubelet/memory_manager_state\" before restarting Kubelet"
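
The check behind this error can be illustrated with a small sketch. This is not the kubelet code, only a simplified model of the comparison the log message describes. It assumes the state file is JSON and stores the policy under a `policyName` key; the function and exception names are made up for illustration.

```python
import json
from pathlib import Path

# Path taken from the log message above.
STATE_FILE = Path("/var/lib/kubelet/memory_manager_state")


class PolicyMismatchError(RuntimeError):
    """The checkpoint was written under a different memory manager policy."""


def check_checkpoint_policy(configured_policy: str,
                            state_file: Path = STATE_FILE) -> None:
    """Raise if an existing checkpoint names a different policy.

    Assumption: the checkpoint is a JSON object with a "policyName" key.
    """
    if not state_file.exists():
        # No checkpoint, so there is nothing to conflict with.
        return
    checkpoint = json.loads(state_file.read_text())
    saved_policy = checkpoint.get("policyName")
    if saved_policy != configured_policy:
        raise PolicyMismatchError(
            f"configured policy {configured_policy!r} differs from "
            f"checkpoint policy {saved_policy!r} in {state_file}"
        )
```

With a checkpoint written under `None` and a configured policy of `Static`, the function raises. Once the state file is removed, the same call passes, which matches the manual workaround the log message asks for.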


Expected results:
The kubelet should start with the new policy and the node should return to the Ready state.

Additional info:

Comment 4 errata-xmlrpc 2021-10-18 17:46:44 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759