
Bug 1985739

Summary:	Tuned affining containers to housekeeping CPUs
Product:	OpenShift Container Platform	Reporter:	Yanir Quinn <yquinn>
Component:	Performance Addon Operator	Assignee:	Yanir Quinn <yquinn>
Status:	CLOSED ERRATA	QA Contact:	Niranjan Mallapadi Raghavender <mniranja>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	4.9	CC:	grajaiya, jmencak, shajmakh, yjoseph
Target Milestone:	---
Target Release:	4.9.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Cause: The Tuned scheduler plugin did not recognize container processes as exceptions managed by OCP's CPU manager. Consequence: On restart, Tuned modified the pinning of containers and placed them all on the housekeeping (reserved) CPUs. This broke pinning guarantees and overloaded the reserved CPUs. Fix: A new Tuned feature, #1980715, allows defining exceptions by cgroup name, and PAO configures Tuned to ignore container pinning. Result: Container pinning is now managed only by OCP's CPU manager.	Story Points:	---
Clone Of:
Clones:	1998120		Environment:
Last Closed:	2022-08-26 14:51:04 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1998120

Comment 5 Niranjan Mallapadi Raghavender 2021-08-26 12:50:10 UTC

[root@bkr-hv03 performance]# /root/img-nvr.sh 
NVR=v4.9.0-19
[root@bkr-hv03 performance]# oc version
Client Version: 4.7.21
Server Version: 4.9.0-0.nightly-2021-08-23-224104
Kubernetes Version: v1.22.0-rc.0+5c2f7cd

Deployed PAO with the profile below:
spec:
  cpu:
    isolated: 5-19,45-59,20-39,60-79
    reserved: 0-4,40-44
  hugepages:
    defaultHugepagesSize: 1G
    pages:
    - count: 20
      size: 2M
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""
  numa:
    topologyPolicy: restricted
  realTimeKernel:
    enabled: true

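As a quick sanity check on the cpu section of the profile above, the following minimal Python sketch expands the two CPU lists and confirms that they do not overlap. The helper name `parse_cpu_list` is made up for this illustration and is not part of PAO or Tuned. The 80-CPU total is inferred from the `0-79` affinity reported later in this comment.

```python
def parse_cpu_list(spec):
    """Expand a kernel-style CPU list such as '0-4,40-44' into a set of ints."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


# Values copied from the profile in this comment.
isolated = parse_cpu_list("5-19,45-59,20-39,60-79")
reserved = parse_cpu_list("0-4,40-44")

overlap = isolated & reserved
all_cpus = isolated | reserved

print("overlap:", sorted(overlap))                  # overlap: []
print("CPUs covered:", len(all_cpus))               # CPUs covered: 80
print("covers 0-79:", all_cpus == set(range(80)))   # covers 0-79: True
```

The reserved set (0-4,40-44) is the housekeeping set that host processes are expected to be confined to. The isolated set is what the CPU manager hands out to pinned containers.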


[mniranja@mniranja test1 (master *%)]$ oc get pods -o wide
NAME                                          READY   STATUS    RESTARTS   AGE     IP            NODE                                      NOMINATED NODE   READINESS GATES
cluster-node-tuning-operator-ffbc7b65-gv9lm   1/1     Running   0          5h23m   10.129.0.23   hlxcl6-master-0.lab.eng.tlv2.redhat.com   <none>           <none>
tuned-2rxkq                                   1/1     Running   1          2d      10.46.56.67   hlxcl6-master-2.lab.eng.tlv2.redhat.com   <none>           <none>
tuned-9fvrh                                   1/1     Running   5          2d      10.46.56.2    helix02.lab.eng.tlv2.redhat.com           <none>           <none>
tuned-rxnnq                                   1/1     Running   1          2d      10.46.56.65   hlxcl6-master-0.lab.eng.tlv2.redhat.com   <none>           <none>
tuned-t7bdb                                   1/1     Running   1          2d      10.46.56.66   hlxcl6-master-1.lab.eng.tlv2.redhat.com   <none>           <none>
tuned-xdzpj                                   1/1     Running   4          2d      10.46.56.3    helix03.lab.eng.tlv2.redhat.com           <none>           <none>
[mniranja@mniranja test1 (master *%)]$ oc rsh tuned-9fvrh


sh-4.4# grep Cpus_allowed_list /proc/`pidof chronyd`/status
Cpus_allowed_list:      0-4,40-44
sh-4.4# grep ^Cpus_allowed_list /proc/`pgrep openshift-tuned`/status
Cpus_allowed_list:      0-79
sh-4.4#   grep  /proc/`pidof chronyd`/cgroup
^C
sh-4.4#  grep . /proc/`pidof chronyd`/cgroup 
12:cpuset:/
11:rdma:/
10:blkio:/system.slice/chronyd.service
9:net_cls,net_prio:/
8:cpu,cpuacct:/system.slice/chronyd.service
7:hugetlb:/
6:freezer:/
5:perf_event:/
4:memory:/system.slice/chronyd.service
3:devices:/system.slice/chronyd.service
2:pids:/system.slice/chronyd.service
1:name=systemd:/system.slice/chronyd.service
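The check performed by hand above can be sketched in Python. This is only an illustration of the reasoning, not the Tuned implementation: the function names are hypothetical, and the `/kubepods\.slice/` pattern is an assumption about how container cgroups are recognized, not something shown in this bug. The chronyd values are copied from the output above; the container cgroup line uses placeholder IDs and is hypothetical.

```python
import re


def parse_cpu_list(spec):
    """Expand a kernel-style CPU list such as '0-4,40-44' into a set of ints."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def parse_cgroup(text):
    """Map each controller in cgroup v1 /proc/<pid>/cgroup output to its path."""
    paths = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        _hierarchy_id, controllers, path = line.split(":", 2)
        for controller in controllers.split(","):
            paths[controller] = path
    return paths


RESERVED = parse_cpu_list("0-4,40-44")  # reserved set from the profile
# Assumption: container processes live under a kubepods.slice cgroup.
CONTAINER_CGROUP = re.compile(r"/kubepods\.slice/")


def classify(cpus_allowed_list, cgroup_text):
    """Describe who should own the pinning of a process."""
    paths = parse_cgroup(cgroup_text)
    if any(CONTAINER_CGROUP.search(p) for p in paths.values()):
        return "container: pinning left to the CPU manager"
    if parse_cpu_list(cpus_allowed_list) <= RESERVED:
        return "host process on reserved CPUs"
    return "host process not confined to reserved CPUs"


# chronyd, as observed above (abbreviated to three of the twelve lines).
chronyd_cgroup = """12:cpuset:/
8:cpu,cpuacct:/system.slice/chronyd.service
1:name=systemd:/system.slice/chronyd.service"""
print(classify("0-4,40-44", chronyd_cgroup))

# Hypothetical container process; <uid> and <id> are placeholders.
container_cgroup = "12:cpuset:/kubepods.slice/kubepods-pod<uid>.slice/crio-<id>.scope"
print(classify("0-4,40-44", container_cgroup))
```

With the chronyd data this reports a host process on the reserved CPUs, which is the expected outcome. A process in a container cgroup is skipped regardless of its current affinity, which mirrors the intent of the fix described in the Doc Text.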