Bug 2045972

Summary: etcd and api server cpu mask interferes with a guaranteed workload
Product: OpenShift Container Platform
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Node
Assignee: Artyom <alukiano>
Node sub component: CPU manager
QA Contact: Walid A. <wabouham>
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: urgent
CC: alukiano, anr, aos-bugs, cgaynor, fbaudin, scuppett, sparpate, yjoseph
Version: 4.9
Keywords: Reopened
Target Milestone: ---
Target Release: 4.9.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-02-23 21:25:39 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2041583
Bug Blocks: 2050131

Comment 6 errata-xmlrpc 2022-02-14 12:00:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.21 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0488

Comment 9 Martin Sivák 2022-02-22 14:12:16 UTC
> [core@ngdc-rcp-oe20-1 ~]$ kubectl -n openshift-cluster-node-tuning-operator exec  cluster-node-tuning-operator-647cbd9f67-vrqvl --  cat /sys/fs/cgroup/cpuset/cpuset.cpus
0-1,4-31,33,36-63

How is this the same issue? I clearly see that CPUs 2-3 and 34-35 were removed from the cpuset, likely because a guaranteed workload is consuming them.

Just to remind you, the original issue was etcd having:

0-63 - meaning guaranteed cpus were not removed at all from the cpu mask
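The difference between the two masks can be made explicit by expanding the cpuset list strings quoted above. A minimal sketch (the `parse_cpuset` helper is hypothetical, written here for illustration; it is not an OpenShift or kubectl utility):

```python
def parse_cpuset(spec: str) -> set[int]:
    """Expand a Linux cpuset list like '0-1,4-31,33,36-63' into a set of CPU ids."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

# Mask observed in comment 9 after the fix: some CPUs are excluded.
fixed = parse_cpuset("0-1,4-31,33,36-63")
# The buggy mask from the original report: all 64 CPUs, nothing removed.
buggy = parse_cpuset("0-63")

# CPUs absent from the fixed mask, i.e. reserved away from etcd/api server.
print(sorted(buggy - fixed))  # → [2, 3, 32, 34, 35]
```

Note that in the quoted string CPU 32 is absent as well as the 2-3 and 34-35 pairs called out in the comment.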
> Can someone confirm if the fix of this issue "etcd and api server cpu mask interferes with a guaranteed workload" is to remove the pod annotation: target.workload.openshift.io/management: {"effect":"PreferredDuringScheduling"} ?

No, do not touch this annotation, it is necessary for workload partitioning to work properly.

Workload partitioning works around this bug; without workload partitioning, it is fixed by using OCP 4.9.21.

Comment 10 Stephen Cuppett 2022-02-23 21:25:39 UTC
This has already been delivered. Please open a new bug (rather than re-opening this one) and link to where the issue (or a new issue) is believed to exist.

Comment 12 Artyom 2022-02-28 08:56:19 UTC
I think Martin already answered it under https://bugzilla.redhat.com/show_bug.cgi?id=2045972#c9.