Bug 1994277

Summary: Changing the memory manager policy via the kubelet config will drop the node to NotReady state
Product: OpenShift Container Platform Reporter: Artyom <alukiano>
Component: Node Assignee: Artyom <alukiano>
Node sub component: Memory manager QA Contact: Walid A. <wabouham>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: aos-bugs, rphillips
Version: 4.9   
Target Milestone: ---   
Target Release: 4.9.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-10-18 17:46:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Artyom 2021-08-17 08:13:55 UTC
Description of problem:
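Changing the memory manager policy via the kubelet config drops the node into the NotReady state. The memory manager expects the memory manager state file to be deleted before the kubelet restarts with a different policy. When the policy is changed through the kubelet config, the stale state file is still present at restart, so the kubelet fails to restore the checkpoint.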

Version-Release number of selected component (if applicable):
master

How reproducible:
Always

Steps to Reproduce:
1. Change the memory manager policy via the KubeletConfig and set the reserved memory.
2. Wait for the node to be ready.

Actual results:
The node stays in the NotReady state indefinitely, with the following error in the kubelet logs:
Aug 16 18:04:51 alukiano-csbfk-worker-a-dcvzf.c.openshift-gce-devel.internal hyperkube[9402]: E0816 18:04:51.711228    9402 memory_manager.go:174] "Could not initialize checkpoint manager, please drain node and remove policy state file" err="could not restore state from checkpoint: [memorymanager] configured policy \"Static\" differs from state checkpoint policy \"None\", please drain this node and delete the memory manager checkpoint file \"/var/lib/kubelet/memory_manager_state\" before restarting Kubelet"
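
The check behind this error can be illustrated with a minimal Go sketch. This is not the kubelet's actual code: the `checkpoint` type and the `restoreState` function are hypothetical, and the state file is reduced to a single assumed `policyName` field. It only shows why a checkpoint written under the "None" policy is rejected once the configured policy becomes "Static".

```go
package main

import (
	"encoding/json"
	"fmt"
)

// checkpoint is a hypothetical, simplified stand-in for the contents of
// /var/lib/kubelet/memory_manager_state. Only the policy name is modeled;
// the field name is an assumption made for this sketch.
type checkpoint struct {
	PolicyName string `json:"policyName"`
}

// restoreState mimics the validation described in the log message: the
// policy recorded in the checkpoint must match the configured policy,
// otherwise restoring the state fails.
func restoreState(raw []byte, configuredPolicy string) error {
	var cp checkpoint
	if err := json.Unmarshal(raw, &cp); err != nil {
		return fmt.Errorf("could not parse checkpoint: %w", err)
	}
	if cp.PolicyName != configuredPolicy {
		return fmt.Errorf("configured policy %q differs from state checkpoint policy %q",
			configuredPolicy, cp.PolicyName)
	}
	return nil
}

func main() {
	// State left behind by a kubelet that ran with the "None" policy.
	stale := []byte(`{"policyName":"None"}`)

	// Restart with the new policy: the stale checkpoint is rejected.
	fmt.Println(restoreState(stale, "Static"))

	// Restart with the unchanged policy: the checkpoint is accepted.
	fmt.Println(restoreState(stale, "None"))
}
```

In this sketch the mismatch surfaces as an error at startup. In the report above, the same kind of failure prevents the memory manager from initializing, so the node never leaves NotReady until the state file is removed.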


Expected results:
The kubelet should start successfully and the node should return to the Ready state.

Additional info:

Comment 4 errata-xmlrpc 2021-10-18 17:46:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759