#### What happened:

Setup:
1. 3 master nodes and 2 worker nodes.
2. Kubernetes version: v1.22.0-rc.0+5c2f7cd.
3. Memory Manager configured on Kubernetes v1.22.0-rc.0+5c2f7cd, with hugepages of size 2M configured on both worker nodes.
4. Each worker node has 2 NUMA nodes (node0 and node1).
5. Each NUMA node has 10 hugepages of size 2M:

<snip>
sh-4.4# cat /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
10
sh-4.4# cat /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
10
</snip>

After configuring the Memory Manager, each NUMA node therefore has 20Mi of hugepages (10 * 2Mi). We create a Guaranteed QoS pod (pod1) that consumes 24Mi of 2Mi hugepages, more than a single NUMA node can supply.

Pod spec:
<snip>
apiVersion: v1
kind: Pod
metadata:
  name: pod1
spec:
  containers:
  - name: example
    image: fedora:latest
    command:
    - sleep
    - inf
    volumeMounts:
    - mountPath: /hugepages-2Mi
      name: hugepage-2mi
    resources:
      limits:
        hugepages-2Mi: 24Mi
        memory: "24Mi"
        cpu: "2"
      requests:
        hugepages-2Mi: 24Mi
        memory: "24Mi"
        cpu: "2"
  volumes:
  - name: hugepage-2mi
    emptyDir:
      medium: HugePages-2Mi
</snip>

List pods:
<snip>
$ oc get pods
NAME   READY   STATUS    RESTARTS   AGE   IP             NODE                  NOMINATED NODE   READINESS GATES
pod1   1/1     Running   0          13s   10.128.2.109   worker1.example.org   <none>           <none>
</snip>

Memory Manager state file:
<snip>
{"policyName":"Static","machineState":{"0":{"numberOfAssignments":2,"memoryMap":{"hugepages-1Gi":{"total":0,"systemReserved":0,"allocatable":0,"reserved":0,"free":0},"hugepages-2Mi":{"total":20971520,"systemReserved":0,"allocatable":20971520,"reserved":20971520,"free":0},"memory":{"total":270146174976,"systemReserved":1153433600,"allocatable":268971769856,"reserved":104857600,"free":268866912256}},"cells":[0,1]},"1":{"numberOfAssignments":2,"memoryMap":{"hugepages-1Gi":{"total":0,"systemReserved":0,"allocatable":0,"reserved":0,"free":0},"hugepages-2Mi":{"total":20971520,"systemReserved":0,"allocatable":20971520,"reserved":4194304,"free":16777216},"memory":{"total":270531874816,"systemReserved":0,"allocatable":270510903296,"reserved":0,"free":270510903296}},"cells":[0,1]}},"entries":{"bbf8fd78-3c9d-4924-b4a2-450caaca6da3":{"example":[{"numaAffinity":[0,1],"type":"hugepages-2Mi","size":25165824},{"numaAffinity":[0,1],"type":"memory","size":104857600}]}},"checksum":1759502831}
</snip>

Now create a second Guaranteed QoS pod (pod2) requesting 16Mi of 2Mi hugepages on the same worker node.

Pod spec:
<snip>
apiVersion: v1
kind: Pod
metadata:
  name: pod2
spec:
  containers:
  - name: example
    image: fedora:latest
    command:
    - sleep
    - inf
    volumeMounts:
    - mountPath: /hugepages-2Mi
      name: hugepage-2mi
    resources:
      limits:
        hugepages-2Mi: "16Mi"
        memory: "100Mi"
        cpu: "2"
      requests:
        hugepages-2Mi: "16Mi"
        memory: "100Mi"
        cpu: "2"
  nodeSelector:
    kubernetes.io/hostname: "worker1.example.org"
  volumes:
  - name: hugepage-2mi
    emptyDir:
      medium: HugePages-2Mi
</snip>

List pods:
<snip>
NAME   READY   STATUS    RESTARTS   AGE     IP             NODE                  NOMINATED NODE   READINESS GATES
pod1   1/1     Running   0          4m39s   10.128.2.109   worker1.example.org   <none>           <none>
pod2   1/1     Running   0          11s     10.128.2.111   worker1.example.org   <none>           <none>
</snip>
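At this point both pods are Running, although pod2 should not have been admitted. To make the arithmetic explicit: each NUMA node holds 10 * 2Mi = 20Mi of hugepages, so pod1's 24Mi request cannot fit on a single NUMA node and receives multi-NUMA affinity [0,1] (20Mi reserved on node 0, 4Mi on node 1). As we understand the Static policy design, NUMA nodes that already belong to a multi-NUMA group are not supposed to serve further allocations until that group is released, so pod2's 16Mi request should fail even though node 1 still reports 16Mi free. The sketch below is our own simplified model of that rule, not the kubelet implementation; the names `cell` and `admit` are invented for illustration:

<snip>
package main

import "fmt"

const mi = 1 << 20 // bytes per Mi

// cell models the 2Mi-hugepage accounting of one NUMA node, mirroring the
// memoryMap entries in the memory_manager_state dumps in this report.
type cell struct {
	id      int
	free    uint64
	grouped bool // already part of a multi-NUMA assignment
}

// admit is a simplified model of the Static policy's non-overlap rule:
// a request that fits a single NUMA node must be served by a node that is
// not already inside a multi-NUMA group, and a new multi-NUMA group may
// only be formed from nodes that are not grouped yet.
func admit(cells []*cell, request uint64) ([]int, bool) {
	// Prefer a single-NUMA placement.
	for _, c := range cells {
		if !c.grouped && c.free >= request {
			c.free -= request
			return []int{c.id}, true
		}
	}
	// Otherwise try to form a multi-NUMA group from ungrouped nodes.
	var group []int
	var total uint64
	for _, c := range cells {
		if !c.grouped {
			group = append(group, c.id)
			total += c.free
		}
	}
	if len(group) < 2 || total < request {
		return nil, false // would surface as an admission failure
	}
	for _, c := range cells {
		if c.grouped {
			continue
		}
		c.grouped = true
		take := c.free
		if take > request {
			take = request
		}
		c.free -= take
		request -= take
	}
	return group, true
}

func main() {
	cells := []*cell{{id: 0, free: 20 * mi}, {id: 1, free: 20 * mi}}
	fmt.Println(admit(cells, 24*mi)) // pod1 -> [0 1] true: spans both nodes
	fmt.Println(admit(cells, 16*mi)) // pod2 -> [] false: should be rejected
}
</snip>

Running this prints `[0 1] true` for pod1 and `[] false` for pod2; the `false` is the rejection we expected, instead of the successful admission recorded in the state file below.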
<snip> {"policyName":"Static","machineState":{"0":{"numberOfAssignments":4,"memoryMap":{"hugepages-1Gi":{"total":0,"systemReserved":0,"allocatable":0,"reserved":0,"free":0},"hugepages-2Mi":{"total":20971520,"systemReserved":0,"allocatable":20971520,"reserved":20971520,"free":0},"memory":{"total":270146174976,"systemReserved":1153433600,"allocatable":268971769856,"reserved":209715200,"free":268762054656}},"cells":[0,1]},"1":{"numberOfAssignments":4,"memoryMap":{"hugepages-1Gi":{"total":0,"systemReserved":0,"allocatable":0,"reserved":0,"free":0},"hugepages-2Mi":{"total":20971520,"systemReserved":0,"allocatable":20971520,"reserved":20971520,"free":0},"memory":{"total":270531874816,"systemReserved":0,"allocatable":270510903296,"reserved":0,"free":270510903296}},"cells":[0,1]}},"entries":{"90a01c04-cfc4-401b-bdb8-85667384f002":{"example":[{"numaAffinity":[0,1],"type":"hugepages-2Mi","size":16777216},{"numaAffinity":[0,1],"type":"memory","size":104857600}]},"bbf8fd78-3c9d-4924-b4a2-450caaca6da3":{"example":[{"numaAffinity":[0,1],"type":"hugepages-2Mi","size":25165824},{"numaAffinity":[0,1],"type":"memory","size":104857600}]}},"checksum":3930981289} </snip> #### What you expected to happen: Pod2 should be rejected. #### How to reproduce it (as minimally and precisely as possible): Steps provided above #### Anything else we need to know?: #### Environment: - Kubernetes version (use `kubectl version`): v1.22.0-rc.0+5c2f7cd - Cloud provider or hardware configuration: Openshift - OS (e.g: `cat /etc/os-release`):Red Hat Enterprise Linux CoreOS release 4.9 - Kernel (e.g. `uname -a`): Linux helix02.lab.eng.tlv2.redhat.com 4.18.0-305.12.1.rt7.84.el8_4.x86_64 #1 SMP PREEMPT_RT Thu Jul 29 14:18:12 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux Kubelet configuration <snip> spec: kubeletConfig: apiVersion: kubelet.config.k8s.io/v1beta1 authentication: anonymous: {} webhook: cacheTTL: 0s x509: {} authorization: webhook: cacheAuthorizedTTL: 0s cacheUnauthorizedTTL: 0s cpuManagerPolicy: static cpuManagerReconcilePeriod: 5s evictionHard: memory.available: 100Mi evictionPressureTransitionPeriod: 0s fileCheckFrequency: 0s httpCheckFrequency: 0s imageMinimumGCAge: 0s kind: KubeletConfiguration kubeReserved: cpu: 1000m memory: 500Mi logging: {} memoryManagerPolicy: Static nodeStatusReportFrequency: 0s nodeStatusUpdateFrequency: 0s reservedMemory: - limits: memory: 1100Mi numaNode: 0 reservedSystemCPUs: 0-4,40-44 runtimeRequestTimeout: 0s shutdownGracePeriod: 0s shutdownGracePeriodCriticalPods: 0s streamingConnectionIdleTimeout: 0s syncFrequency: 0s systemReserved: cpu: 1000m memory: 500Mi topologyManagerPolicy: restricted volumeStatsAggPeriod: 0s machineConfigPoolSelector: matchLabels: machineconfiguration.openshift.io/role: worker-cnf </snip>
The bug should be fixed once OpenShift is rebased on top of Kubernetes 1.23.
Not completed this sprint.
The rebase is complete, and the bug should be fixed as a result.
Versions:
========
<snip>
oc version
Client Version: 4.10.0-0.nightly-2022-02-02-000921
Server Version: 4.10.0-0.nightly-2022-02-03-220350
Kubernetes Version: v1.23.3+b63be7f
</snip>

PAO version:
<snip>
"msg": {
    "architecture": "x86_64",
    "build-date": "2022-02-02T19:59:27.762163",
    "com.redhat.build-host": "cpt-1005.osbs.prod.upshift.rdu2.redhat.com",
    "com.redhat.component": "performance-addon-operator-container",
    "com.redhat.license_terms": "https://www.redhat.com/agreements",
    "description": "performance-addon-operator",
    "distribution-scope": "public",
    "io.k8s.description": "performance-addon-operator",
    "io.k8s.display-name": "performance-addon-operator",
    "io.openshift.expose-services": "",
    "io.openshift.maintainer.component": "Performance Addon Operator",
    "io.openshift.maintainer.product": "OpenShift Container Platform",
    "io.openshift.tags": "operator",
    "maintainer": "openshift-operators",
    "name": "openshift4/performance-addon-rhel8-operator",
    "release": "28",
    "summary": "performance-addon-operator",
    "upstream-vcs-ref": "7e40c978acca61ea540fb10b34e826474d6a93cf",
    "upstream-vcs-type": "git",
    "upstream-version": "0.0.41001-2-g7e40c978",
    "url": "https://access.redhat.com/containers/#/registry.access.redhat.com/openshift4/performance-addon-rhel8-operator/images/v4.10.0-28",
    "vcs-ref": "8473aa2255f73db5523c2a665256ed6297a99025",
    "vcs-type": "git",
    "vendor": "Red Hat, Inc.",
    "version": "v4.10.0"
}
</snip>

Steps:

1. Create a performance profile as shown below:
<snip>
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: performance
spec:
  cpu:
    isolated: 5-19,45-59,20-39,60-79
    reserved: 0-4,40-44
  hugepages:
    defaultHugepagesSize: 1G
    pages:
    - count: 20
      size: 2M
  net:
    userLevelNetworking: true
  nodeSelector:
    node-role.kubernetes.io/workercnf: ""
  numa:
    topologyPolicy: restricted
  realTimeKernel:
    enabled: true
</snip>

2. Create a pod using the spec below:
<snip>
apiVersion: v1
kind: Pod
metadata:
  name: pod1
spec:
  containers:
  - name: example-pod1
    image: fedora:latest
    command:
    - sleep
    - inf
    volumeMounts:
    - mountPath: /hugepages-2Mi
      name: hugepage-2mi
    resources:
      limits:
        hugepages-2Mi: 24Mi
        memory: "24Mi"
        cpu: "2"
      requests:
        hugepages-2Mi: 24Mi
        memory: "24Mi"
        cpu: "2"
  nodeSelector:
    kubernetes.io/hostname: "worker-0"
  volumes:
  - name: hugepage-2mi
    emptyDir:
      medium: HugePages-2Mi
</snip>

3. Check pod status:
<snip>
[root@registry bz-1999603]# oc get pods -o wide
NAME   READY   STATUS    RESTARTS   AGE    IP            NODE       NOMINATED NODE   READINESS GATES
pod1   1/1     Running   0          3m2s   10.128.2.26   worker-0   <none>           <none>
[root@registry bz-1999603]# oc debug node/worker-0
</snip>
4. Get the Memory Manager state file:
<snip>
sh-4.4# cat memory_manager_state
{"policyName":"Static","machineState":{"0":{"numberOfAssignments":2,"memoryMap":{"hugepages-1Gi":{"total":0,"systemReserved":0,"allocatable":0,"reserved":0,"free":0},"hugepages-2Mi":{"total":20971520,"systemReserved":0,"allocatable":20971520,"reserved":20971520,"free":0},"memory":{"total":270146011136,"systemReserved":1153433600,"allocatable":268971606016,"reserved":25165824,"free":268946440192}},"cells":[0,1]},"1":{"numberOfAssignments":2,"memoryMap":{"hugepages-1Gi":{"total":0,"systemReserved":0,"allocatable":0,"reserved":0,"free":0},"hugepages-2Mi":{"total":20971520,"systemReserved":0,"allocatable":20971520,"reserved":4194304,"free":16777216},"memory":{"total":270531715072,"systemReserved":0,"allocatable":270510743552,"reserved":0,"free":270510743552}},"cells":[0,1]}},"entries":{"a9feb7f2-a1d4-4f7f-ae5f-ba5c2b60b254":{"example-pod1":[{"numaAffinity":[0,1],"type":"hugepages-2Mi","size":25165824},{"numaAffinity":[0,1],"type":"memory","size":25165824}]}},"checksum":279997132}
</snip>

5. Get the CPUs and NUMA nodes used by pod1:
<snip>
[root@registry bz-1999603]# oc exec -ti pods/pod1 -- bash -c "cat /sys/fs/cgroup/cpuset/cpuset.cpus"
5,45
[root@registry bz-1999603]# oc exec -ti pods/pod1 -- bash -c "cat /sys/fs/cgroup/cpuset/cpuset.mems"
0-1
</snip>

6. Create pod2 with the spec below:
<snip>
[root@registry bz-1999603]# cat test2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod2
spec:
  containers:
  - name: example-pod2
    image: fedora:latest
    command:
    - sleep
    - inf
    volumeMounts:
    - mountPath: /hugepages-2Mi
      name: hugepage-2mi
    resources:
      limits:
        hugepages-2Mi: 16Mi
        memory: "16Mi"
        cpu: "2"
      requests:
        hugepages-2Mi: 16Mi
        memory: "16Mi"
        cpu: "2"
  nodeSelector:
    kubernetes.io/hostname: "worker-0"
  volumes:
  - name: hugepage-2mi
    emptyDir:
      medium: HugePages-2Mi
</snip>

7. Creation of pod2 should fail:
<snip>
[root@registry bz-1999603]# oc get pods -o wide
NAME   READY   STATUS                   RESTARTS   AGE     IP            NODE       NOMINATED NODE   READINESS GATES
pod1   1/1     Running                  0          8m11s   10.128.2.26   worker-0   <none>           <none>
pod2   0/1     ContainerStatusUnknown   0          13s     10.128.2.27   worker-0   <none>           <none>
</snip>

8. Check the pod2 status:
<snip>
[root@registry bz-1999603]# oc describe pods/pod2
Name:         pod2
Namespace:    default
Priority:     0
Node:         worker-0/10.46.80.2
Start Time:   Fri, 04 Feb 2022 04:23:52 -0500
Labels:       <none>
Annotations:  k8s.ovn.org/pod-networks: {"default":{"ip_addresses":["10.128.2.27/23"],"mac_address":"0a:58:0a:80:02:1b","gateway_ips":["10.128.2.1"],"ip_address":"10.128.2.27/23"...
              k8s.v1.cni.cncf.io/network-status: [{ "name": "ovn-kubernetes", "interface": "eth0", "ips": [ "10.128.2.27" ], "mac": "0a:58:0a:80:02:1b", "default": true, "dns": {} }]
              k8s.v1.cni.cncf.io/networks-status: [{ "name": "ovn-kubernetes", "interface": "eth0", "ips": [ "10.128.2.27" ], "mac": "0a:58:0a:80:02:1b", "default": true, "dns": {} }]
Status:       Failed
Reason:       TopologyAffinityError
Message:      Pod Resources cannot be allocated with Topology locality
IP:           10.128.2.27
IPs:
  IP:  10.128.2.27
Containers:
  example-pod2:
    Container ID:
    Image:         fedora:latest
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      sleep
      inf
    State:         Terminated
      Reason:      ContainerStatusUnknown
      Message:     The container could not be located when the pod was terminated
      Exit Code:   137
      Started:     Mon, 01 Jan 0001 00:00:00 +0000
      Finished:    Mon, 01 Jan 0001 00:00:00 +0000
    Ready:         False
    Restart Count: 0
    Limits:
      cpu:            2
      hugepages-2Mi:  16Mi
      memory:         16Mi
    Requests:
      cpu:            2
      hugepages-2Mi:  16Mi
      memory:         16Mi
    Environment:   <none>
    Mounts:
      /hugepages-2Mi from hugepage-2mi (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5529z (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  hugepage-2mi:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     HugePages-2Mi
    SizeLimit:  <unset>
  kube-api-access-5529z:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:       Guaranteed
Node-Selectors:  kubernetes.io/hostname=worker-0
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                 Age   From               Message
  ----     ------                 ----  ----               -------
  Normal   Scheduled              22s   default-scheduler  Successfully assigned default/pod2 to worker-0
  Warning  TopologyAffinityError  23s   kubelet            Resources cannot be allocated with Topology locality
  Normal   AddedInterface         20s   multus             Add eth0 [10.128.2.27/23] from ovn-kubernetes
  Normal   Pulling                20s   kubelet            Pulling image "fedora:latest"
  Normal   Pulled                 16s   kubelet            Successfully pulled image "fedora:latest" in 3.597402945s
  Warning  Failed                 15s   kubelet            Error: container create failed: parent closed synchronisation channel
</snip>

As seen, pod2 is rejected with TopologyAffinityError.
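The rejection path is consistent with how the Topology Manager's restricted policy behaves: each resource manager returns NUMA-affinity hints, the Topology Manager merges them, and if the merged hint is not "preferred" the pod fails admission with TopologyAffinityError. The sketch below is a rough illustration of that decision under those assumptions, not the kubelet source; `hint` and `admitRestricted` are invented names:

<snip>
package main

import (
	"errors"
	"fmt"
)

// hint mirrors the shape of a Topology Manager hint: a set of NUMA nodes
// plus whether that affinity is the best the provider can offer.
type hint struct {
	numaNodes []int
	preferred bool
}

// admitRestricted sketches the restricted policy: the merged hint is only
// acceptable when it is preferred; otherwise admission fails, which the
// kubelet reports as TopologyAffinityError.
func admitRestricted(merged hint) error {
	if !merged.preferred {
		return errors.New("TopologyAffinityError: Resources cannot be allocated with Topology locality")
	}
	return nil
}

func main() {
	// After the fix, the only affinity available to pod2 is the {0,1}
	// group already occupied by pod1, which is not a preferred hint for a
	// request that fits a single NUMA node, so the restricted policy
	// rejects the pod, matching the events above.
	fmt.Println(admitRestricted(hint{numaNodes: []int{0, 1}, preferred: false}))
}
</snip>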
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056