1985739 – Tuned affining containers to house keeping cpus

Summary: Tuned affining containers to house keeping cpus

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Performance Addon Operator
Sub Component:
Version:	4.9
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	4.9.0
Assignee:	Yanir Quinn
QA Contact:	Niranjan Mallapadi Raghavender
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1998120

Reported:	2021-07-25 11:50 UTC by Yanir Quinn
Modified:	2022-08-26 14:51 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: The Tuned scheduler plugin did not recognize container processes as exceptions managed by OCP's CPU manager. Consequence: On restart, Tuned changed the CPU pinning of containers and placed them all on the housekeeping (reserved) CPUs. This broke pinning guarantees and overloaded the reserved CPUs. Fix: A new Tuned feature, #1980715, allows exceptions to be defined by cgroup name, and PAO configures Tuned to ignore container pinning. Result: Container pinning is now managed only by OCP's CPU manager.
Clone Of:
Clones:	1998120
Environment:
Last Closed:	2022-08-26 14:51:04 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-node-tuning-operator pull 258	0	None	None	None	2021-08-04 10:04:50 UTC

Comment 5 Niranjan Mallapadi Raghavender 2021-08-26 12:50:10 UTC

[root@bkr-hv03 performance]# /root/img-nvr.sh 
NVR=v4.9.0-19
[root@bkr-hv03 performance]# oc version
Client Version: 4.7.21
Server Version: 4.9.0-0.nightly-2021-08-23-224104
Kubernetes Version: v1.22.0-rc.0+5c2f7cd

Deployed PAO with the profile below:
spec:
  cpu:
    isolated: 5-19,45-59,20-39,60-79
    reserved: 0-4,40-44
  hugepages:
    defaultHugepagesSize: 1G
    pages:
    - count: 20
      size: 2M
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""
  numa:
    topologyPolicy: restricted
  realTimeKernel:
    enabled: true
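
The isolated and reserved CPU lists in the profile above are meant to split the node's CPUs between workloads and housekeeping. As a minimal sketch, the following Python snippet expands both lists and checks that they do not overlap. The helper name `parse_cpu_list` is hypothetical and not part of PAO or Tuned, and the total of 80 CPUs is an assumption inferred from the `0-79` mask reported for openshift-tuned in this comment.

```python
from typing import Set


def parse_cpu_list(spec: str) -> Set[int]:
    """Expand a kernel-style CPU list such as "0-4,40-44" into a set of CPU ids."""
    cpus: Set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


# Values copied from the profile above.
isolated = parse_cpu_list("5-19,45-59,20-39,60-79")
reserved = parse_cpu_list("0-4,40-44")

# The two sets should be disjoint.
overlap = isolated & reserved
# Assumption: the node has CPUs 0-79.
covers_all = (isolated | reserved) == set(range(80))

print("isolated:", len(isolated), "reserved:", len(reserved))
print("overlap:", sorted(overlap), "covers 0-79:", covers_all)
```

For this profile the two lists are disjoint and together account for all 80 CPUs, so every CPU is either isolated or reserved.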



[mniranja@mniranja test1 (master *%)]$ oc get pods -o wide
NAME                                          READY   STATUS    RESTARTS   AGE     IP            NODE                                      NOMINATED NODE   READINESS GATES
cluster-node-tuning-operator-ffbc7b65-gv9lm   1/1     Running   0          5h23m   10.129.0.23   hlxcl6-master-0.lab.eng.tlv2.redhat.com   <none>           <none>
tuned-2rxkq                                   1/1     Running   1          2d      10.46.56.67   hlxcl6-master-2.lab.eng.tlv2.redhat.com   <none>           <none>
tuned-9fvrh                                   1/1     Running   5          2d      10.46.56.2    helix02.lab.eng.tlv2.redhat.com           <none>           <none>
tuned-rxnnq                                   1/1     Running   1          2d      10.46.56.65   hlxcl6-master-0.lab.eng.tlv2.redhat.com   <none>           <none>
tuned-t7bdb                                   1/1     Running   1          2d      10.46.56.66   hlxcl6-master-1.lab.eng.tlv2.redhat.com   <none>           <none>
tuned-xdzpj                                   1/1     Running   4          2d      10.46.56.3    helix03.lab.eng.tlv2.redhat.com           <none>           <none>
[mniranja@mniranja test1 (master *%)]$ oc rsh tuned-9fvrh


sh-4.4# grep Cpus_allowed_list /proc/`pidof chronyd`/status
Cpus_allowed_list:      0-4,40-44
sh-4.4# grep ^Cpus_allowed_list /proc/`pgrep openshift-tuned`/status
Cpus_allowed_list:      0-79
sh-4.4#   grep  /proc/`pidof chronyd`/cgroup
^C
sh-4.4#  grep . /proc/`pidof chronyd`/cgroup 
12:cpuset:/
11:rdma:/
10:blkio:/system.slice/chronyd.service
9:net_cls,net_prio:/
8:cpu,cpuacct:/system.slice/chronyd.service
7:hugetlb:/
6:freezer:/
5:perf_event:/
4:memory:/system.slice/chronyd.service
3:devices:/system.slice/chronyd.service
2:pids:/system.slice/chronyd.service
1:name=systemd:/system.slice/chronyd.service
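
The checks above were done by hand with grep. A rough Python sketch of the same verification follows, run against text captured in this comment rather than a live /proc. All function names are hypothetical. Treating a cgroup path that contains `kubepods` as the marker of a container process is an assumption about how the node lays out pod cgroups; this bug does not confirm it.

```python
from typing import Dict, Set

RESERVED = "0-4,40-44"  # reserved CPUs from the profile in this comment


def parse_cpu_list(spec: str) -> Set[int]:
    """Expand a kernel-style CPU list such as "0-4,40-44" into a set of CPU ids."""
    cpus: Set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def allowed_cpus(status_text: str) -> Set[int]:
    """Extract Cpus_allowed_list from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Cpus_allowed_list:"):
            return parse_cpu_list(line.split(":", 1)[1])
    raise ValueError("no Cpus_allowed_list line found")


def parse_cgroup(cgroup_text: str) -> Dict[str, str]:
    """Map controllers to paths from /proc/<pid>/cgroup (cgroup v1: id:controllers:path)."""
    result: Dict[str, str] = {}
    for line in cgroup_text.splitlines():
        if not line.strip():
            continue
        _hierarchy_id, controllers, path = line.split(":", 2)
        result[controllers] = path
    return result


def looks_like_container(cgroups: Dict[str, str]) -> bool:
    """Assumption: pod and container processes live under a path containing 'kubepods'."""
    return any("kubepods" in path for path in cgroups.values())


# Sample text taken from the chronyd output in this comment (cgroup list abbreviated).
CHRONYD_STATUS = "Cpus_allowed_list:      0-4,40-44\n"
CHRONYD_CGROUP = (
    "12:cpuset:/\n"
    "8:cpu,cpuacct:/system.slice/chronyd.service\n"
    "1:name=systemd:/system.slice/chronyd.service\n"
)

cgroups = parse_cgroup(CHRONYD_CGROUP)
on_reserved_only = allowed_cpus(CHRONYD_STATUS) <= parse_cpu_list(RESERVED)
print("container:", looks_like_container(cgroups), "reserved only:", on_reserved_only)
```

On a live node the two sample strings would instead be read from `/proc/<pid>/status` and `/proc/<pid>/cgroup`. A system daemon such as chronyd is expected to stay on the reserved CPUs, while a process classified as a container should keep whatever mask the CPU manager assigned.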
