This bug has been migrated to another issue tracking site. It has been closed here and may no longer be monitored.

If you would like to receive updates on this issue, or to participate in it, you may do so at the Red Hat Issue Tracker.
Bug 2228103 - virt-launcher does not start with isolateEmulator thread and an even CPU count
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 4.14.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.15.0
Assignee: Ram Lavi
QA Contact: Kedar Bidarkar
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-08-01 12:12 UTC by Orel Misan
Modified: 2024-07-31 06:13 UTC (History)
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-12-14 16:14:50 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github kubevirt kubevirt pull 10593 0 None Merged isolateEmulatorThread: Add full-pcpu-only support 2024-01-04 06:38:03 UTC
Github kubevirt kubevirt pull 10757 0 None Merged [release 1.1] isolateEmulatorThread: Add full-pcpu-only support 2024-01-04 06:38:05 UTC
Github kubevirt kubevirt pull 10783 0 None Merged Housekeeping cgroup: Add full-pcpu-only support 2023-12-03 08:06:28 UTC
Github kubevirt kubevirt pull 10839 0 None Merged virt-launcher, vcpu: Fix EmulatorThreadPin assign strategy 2024-01-04 06:38:06 UTC
Github kubevirt kubevirt pull 10872 0 None Merged IsolateEmulatorThread: Add cluster-wide parity completion setting 2024-01-04 06:38:07 UTC
Red Hat Issue Tracker   CNV-31584 0 None None None 2023-12-14 16:14:49 UTC

Description Orel Misan 2023-08-01 12:12:00 UTC
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
Create the following VMI:
```
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  annotations:
    cpu-load-balancing.crio.io: disable
    cpu-quota.crio.io: disable
    irq-load-balancing.crio.io: disable
  name: dpdk-vmi
spec:
  domain:
    cpu:
      cores: 8
      dedicatedCpuPlacement: true
      isolateEmulatorThread: true
      model: host-model
      sockets: 1
      threads: 1
    devices:
      disks:
        - disk:
            bus: virtio
          name: rootdisk
        - disk:
            bus: virtio
          name: cloudinitdisk
      networkInterfaceMultiqueue: true
      rng: {}
    memory:
      guest: 4Gi
      hugepages:
        pageSize: 1Gi
  terminationGracePeriodSeconds: 0
  volumes:
    - containerDisk:
        image: 'quay.io/kiagnose/kubevirt-dpdk-checkup-vm:main'
        imagePullPolicy: Always
      name: rootdisk
    - cloudInitNoCloud:
        userData: |
          #cloud-config
          user: cloud-user
          password: 0tli-pxem-xknu
          chpasswd:
            expire: false
      name: cloudinitdisk
```

Actual results:
virt-launcher pod does not start.
kubelet emits the following event:
```
SMT Alignment Error: requested 9 cpus not multiple cpus per core = 2
```
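The arithmetic behind the rejection is simple: 8 guest vCPUs plus the 1 CPU that isolateEmulatorThread adds makes 9, which is not a multiple of the node's 2 threads per core. A minimal sketch of the admission check (an illustration, not kubelet's actual code):

```python
def smt_alignment_ok(requested_cpus: int, cpus_per_core: int = 2) -> bool:
    """With the full-pcpus-only CPU manager option, a pod is admitted only
    if its CPU request is a whole number of physical cores."""
    return requested_cpus % cpus_per_core == 0

# 8 guest vCPUs + 1 isolated emulator thread = 9 requested CPUs
assert not smt_alignment_ok(8 + 1)  # rejected with an SMT Alignment Error
assert smt_alignment_ok(8)          # an even request would be admitted
```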


Expected results:
The VMI should start.

Additional info:

Comment 1 Ram Lavi 2023-09-06 11:34:11 UTC
Note that this problem occurs only on nodes where SMT is enabled (it is enabled by default on most Intel processors).

Comment 2 Itamar Holder 2023-09-07 10:45:51 UTC
I'm closing this as not-a-bug; let me explain why.

There are 3 levels of resource allocations:
1) The guest's virtual topology
2) The Pod's resources
3) The host's resources

As part of our VM manifests, we support the first two levels. For example, a VM creator can decide to define the guest with 1 socket, 4 cores and 2 threads, and the pod to be defined with 4 CPUs. However, we do not have any support for deciding how the host will provide these resources to the pod.

Actually, the error comes from Kubernetes' CPU Manager policies. I think it was added here [1][2][3]. In any case, components like the CPU Manager and OpenShift's Performance Addon Operator are responsible for configuring how pods are allocated host resources.

For this reason, this is outside of CNV's scope and needs to be fixed elsewhere. We could consider reaching out to upstream Kubernetes or OpenShift's PAO for a fix on their side.
I would personally love to help push this into Kubernetes if we think it's the right path forward. But in any case, this is IMO a feature request more than a bug.

In the meantime, the workaround is simple: allocate one more CPU. This is not a bad workaround IMO, as (with the right policy from the other components I mentioned) it would just mean that the second hyperthread of the same core stays empty, preserving the high performance needed for such VMs.
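Applied to the VMI from the description, the workaround amounts to requesting one more guest CPU, so that guest vCPUs plus the emulator-thread CPU come out even (9 + 1 = 10). An illustrative fragment (not a tested manifest):

```
spec:
  domain:
    cpu:
      cores: 9    # 9 guest vCPUs + 1 emulator-thread CPU = 10, a whole number of physical cores
      dedicatedCpuPlacement: true
      isolateEmulatorThread: true
```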

Another important note: there is no relation between the guest's virtual topology and the host's topology. These two are completely independent of one another.

[1] https://github.com/kubernetes/enhancements/issues/2625
[2] https://github.com/kubernetes/enhancements/pull/2626
[3] https://github.com/kubernetes/kubernetes/pull/101432

Comment 3 sgott 2023-10-02 13:45:29 UTC
Because both RT and DPDK workloads will allocate an odd number of pCPUs by default, this issue is unavoidable, and our out-of-the-box behavior is therefore problematic. Re-opening this BZ for that reason.

Comment 4 Orel Misan 2023-10-02 13:55:17 UTC
High-performance VMs, such as those running DPDK or real-time workloads, are sensitive to interference from neighboring workloads.

It is possible to set the kubelet's CPU manager policy to static with the `full-pcpus-only` option [1], so that it only allocates full physical cores; on nodes with SMT enabled, that means allocating both threads of each core.
The implication is that pods requesting an odd number of CPUs are rejected by the kubelet after being scheduled to the node, emitting the following event:

“SMT Alignment Error: requested 9 cpus not multiple cpus per core = 2”
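For context, the policy in question is enabled on the kubelet roughly as follows (a sketch of the upstream KubeletConfiguration fields; on OpenShift this is normally driven through a PerformanceProfile rather than set directly):

```
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static
cpuManagerPolicyOptions:
  full-pcpus-only: "true"
```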

We have created an automated test (kubevirt-dpdk-checkup) that creates two VirtualMachineInstances and runs DPDK traffic between them.

The two VMIs the checkup creates have an even number of CPUs.
Before we started using KubeVirt's IsolateEmulatorThread [2] option, the checkup suffered from packet loss.

When the IsolateEmulatorThread option is set, KubeVirt adds an additional CPU to the virt-launcher pod's requests and limits, to be used by QEMU.
This, in combination with the full-pcpus-only CPU manager policy, causes the virt-launcher pod to be rejected.

As a workaround, we disabled SMT on our worker nodes.
This workaround is unacceptable to our stakeholders.

We wish for a solution in which KubeVirt adds extra CPUs to the virt-launcher pod so that the total number of CPUs is even.
This behavior is wanted only when the full-pcpus-only CPU manager policy is enabled.
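The requested behavior can be sketched as a small padding rule (an illustration of the idea, not KubeVirt's actual implementation):

```python
def emulator_thread_cpus(guest_cpus: int, full_pcpus_only: bool) -> int:
    """isolateEmulatorThread normally adds 1 CPU to the pod; when the
    full-pcpus-only policy is active, pad so the pod total is even."""
    extra = 1  # the isolated emulator thread
    if full_pcpus_only and (guest_cpus + extra) % 2 != 0:
        extra += 1  # complete the physical core
    return extra

assert emulator_thread_cpus(8, full_pcpus_only=False) == 1  # pod total: 9
assert emulator_thread_cpus(8, full_pcpus_only=True) == 2   # pod total: 10, even
```

With an odd guest CPU count (e.g. 7), the single emulator-thread CPU already makes the total even, so no padding is needed.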


We use the following MCP:
```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-dpdk
  labels:
    machineconfiguration.openshift.io/role: worker-dpdk
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values:
          - worker
          - worker-dpdk
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-dpdk: ""
```

We use the following PerformanceProfile:
```
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: profile-1
spec:
  cpu:
    isolated: 4-39,44-79
    reserved: 0-3,40-43
  hugepages:
    defaultHugepagesSize: 1G
    pages:
    - count: 32
      node: 0
      size: 1G
  net:
    userLevelNetworking: true
  nodeSelector:
    node-role.kubernetes.io/worker-dpdk: ""
  numa:
    topologyPolicy: single-numa-node
```

[1] https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#static-policy-options

[2] https://kubevirt.io/user-guide/virtual_machines/dedicated_cpu_resources/#requesting-dedicated-cpu-for-qemu-emulator

Comment 5 Ram Lavi 2023-10-16 09:24:29 UTC
design draft: https://github.com/kubevirt/community/pull/247

Comment 6 Ram Lavi 2023-10-22 12:47:21 UTC
implementation PR draft: https://github.com/kubevirt/kubevirt/pull/10593

Comment 7 Ram Lavi 2023-11-21 09:24:27 UTC
backport PR to 4.15
https://github.com/kubevirt/kubevirt/pull/10757

