Bug 2074559

Summary: RFE core scheduling support in libvirt
Product: Red Hat Enterprise Linux 9
Reporter: Stefan Hajnoczi <stefanha>
Component: libvirt
Assignee: Michal Privoznik <mprivozn>
libvirt sub component: General
QA Contact: Luyao Huang <lhuang>
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: medium
CC: berrange, dzheng, jdenemar, jmario, jsuchane, lmen, mprivozn, smitterl, virt-maint, xuzhang
Version: 9.0
Keywords: FutureFeature, Triaged, Upstream
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: libvirt-8.9.0-1.el9
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-05-09 07:26:11 UTC
Type: Feature Request
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version: 8.9.0
Embargoed:

Description Stefan Hajnoczi 2022-04-12 13:43:29 UTC
Core scheduling has landed in Linux. Users running untrusted guests on SMT CPUs may wish to enable it to protect against information leaks.

Libvirt probably needs to configure core scheduling when launching QEMU:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/hw-vuln/core-scheduling.rst
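
Core scheduling is controlled per task through prctl(2). As a rough sketch of the mechanism only (this is an assumption about the approach, not libvirt's actual code), a management process can create a new cookie for itself before fork()+exec(), so that the QEMU process it launches inherits the cookie:

```c
/* Minimal sketch, assuming the PR_SCHED_CORE prctl(2) interface
 * (Linux >= 5.14).  Constants come from <linux/prctl.h>; they are
 * re-defined here in case the libc headers predate them. */
#include <sys/prctl.h>
#include <errno.h>

#ifndef PR_SCHED_CORE
# define PR_SCHED_CORE        62
# define PR_SCHED_CORE_GET    0
# define PR_SCHED_CORE_CREATE 1
#endif
#ifndef PR_SCHED_CORE_SCOPE_THREAD_GROUP
# define PR_SCHED_CORE_SCOPE_THREAD_GROUP 1
#endif

/* Tag the calling thread group with a fresh core-scheduling cookie.
 * Children created afterwards with fork()+exec() inherit the cookie,
 * so only tasks sharing it may run on SMT siblings of the same core.
 * Returns 0 on success, or errno (EINVAL on kernels without
 * CONFIG_SCHED_CORE). */
int sched_core_create_self(void)
{
    if (prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE,
              0 /* 0 == current task */,
              PR_SCHED_CORE_SCOPE_THREAD_GROUP, 0) < 0)
        return errno;
    return 0;
}
```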

Comment 4 Daniel Berrangé 2022-04-20 10:28:36 UTC
The current scenario:

 - The default configuration for QEMU is to not assign any CPU affinity mask for vCPUs, I/O threads, or emulator threads (or, equivalently, to explicitly set an all-1s mask)
 - The default configuration for hardware is usually to enable SMT
 - The default configuration for the Linux KVM host is to schedule across any host CPUs
 - The machine.slice may restrict VMs to some CPUs

Given these scenarios, the out-of-the-box deployment for KVM is vulnerable to information leakage attacks via various CPU side-channel/speculative-execution vulnerabilities.

IOW, core scheduling should be considered to be a security fix / mitigation from a KVM POV.

Until now the only mitigations were to disable SMT (which reduces capacity) or to pin VMs with CPU affinity (which impacts VM management flexibility & VM density).

In practice neither of these is especially viable, so I expect most customers have done neither and simply ignored the security risks inherent in SMT.

Core scheduling finally gives us a viable mitigation that I expect customers/layered products would be willing to deploy in most scenarios.

Ideally we would enable this by default out of the box; however, there are enough caveats in the kernel docs that I think this could be risky in terms of causing performance regressions for customers in some scenarios.

So, reluctantly, we probably need a config knob in libvirt, with management apps (OSP, CNV, virt-manager, virt-install, cockpit, etc.) explicitly opting in when provisioning new VMs.

Comment 9 Michal Privoznik 2022-05-09 15:02:57 UTC
RFC patches posted on the list:

https://listman.redhat.com/archives/libvir-list/2022-May/230902.html

Comment 13 Michal Privoznik 2022-10-06 13:50:52 UTC
Resend of v4:

https://listman.redhat.com/archives/libvir-list/2022-October/234710.html

Comment 14 Michal Privoznik 2022-10-20 07:06:59 UTC
And merged upstream as:

ab966b9d31 qemu: Enable for vCPUs on hotplug
d942422482 qemu: Enable SCHED_CORE for vCPUs
000477115e qemu: Enable SCHED_CORE for helper processes
279527334d qemu_process: Enable SCHED_CORE for QEMU process
4be75216be qemu_domain: Introduce qemuDomainSchedCoreStart()
6a1500b4ea qemu_conf: Introduce a knob to set SCHED_CORE
bd481a79d8 virCommand: Introduce APIs for core scheduling
c935cead2d virprocess: Core Scheduling support

v8.8.0-169-gab966b9d31

There's still some follow-up work needed before this can be enabled automatically, though.
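
Conceptually, the helper-process patch amounts to copying QEMU's cookie onto each helper (e.g. virtiofsd). A hedged sketch of the underlying kernel interface follows; this is not the merged libvirt code (which routes through the virCommand APIs named above), just the prctl(2) primitive it builds on:

```c
/* Sketch of the kernel primitive, assuming the PR_SCHED_CORE prctl(2)
 * interface (Linux >= 5.14).  Not the merged libvirt code. */
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>
#include <errno.h>

#ifndef PR_SCHED_CORE
# define PR_SCHED_CORE            62
# define PR_SCHED_CORE_SHARE_FROM 3
#endif
#ifndef PR_SCHED_CORE_SCOPE_THREAD_GROUP
# define PR_SCHED_CORE_SCOPE_THREAD_GROUP 1
#endif

/* "Pull" the core-scheduling cookie of task 'src' (e.g. the QEMU main
 * PID) onto the calling thread group, so the helper may share SMT
 * siblings with QEMU.  Returns 0 on success, errno on failure. */
int sched_core_share_from(pid_t src)
{
    if (prctl(PR_SCHED_CORE, PR_SCHED_CORE_SHARE_FROM, src,
              PR_SCHED_CORE_SCOPE_THREAD_GROUP, 0) < 0)
        return errno;
    return 0;
}
```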

Comment 17 Luyao Huang 2022-11-11 08:46:21 UTC
Verify this bug with libvirt-8.9.0-2.el9.x86_64:

S1: Test sched_core = "vcpus"

1. prepare a guest which has virtiofs and current vCPUs < maximum vCPUs

# virsh dumpxml vm1

<vcpu placement='static' current='2'>10</vcpu>

    <filesystem type='mount' accessmode='passthrough'>
      <driver type='virtiofs' queue='1024'/>
      <binary path='/usr/libexec/virtiofsd' xattr='on'/>
      <source dir='/mount/test'/>
      <target dir='test'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </filesystem>

2. set sched_core = "vcpus" in qemu.conf and restart virtqemud

# echo 'sched_core = "vcpus"' >> /etc/libvirt/qemu.conf

# service virtqemud restart
Redirecting to /bin/systemctl restart virtqemud.service

3. start guest
# virsh start vm1
Domain 'vm1' started

4. check the cookie values of the QEMU emulator, vCPUs, and helper processes; only the vCPU cookies are non-zero

emulator:
# ./get-cookie 85455
process 85455 cookie is 0

vcpus:
# ./get-cookie 85480 85481
process 85480 cookie is 4254838555
process 85481 cookie is 4254838555

helper processes:
# ./get-cookie 85444 85446
process 85444 cookie is 0
process 85446 cookie is 0
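
The ./get-cookie helper used above is not included in the report; a minimal equivalent (an assumption about the tool, not the tester's actual source) would query the cookie with PR_SCHED_CORE_GET:

```c
/* Hypothetical stand-in for the get-cookie helper used above:
 * fetch a task's core-scheduling cookie via prctl(2). */
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>
#include <errno.h>

#ifndef PR_SCHED_CORE
# define PR_SCHED_CORE     62
# define PR_SCHED_CORE_GET 0
#endif
#ifndef PR_SCHED_CORE_SCOPE_THREAD
# define PR_SCHED_CORE_SCOPE_THREAD 0
#endif

/* Store 'pid's cookie into *cookie; return 0 on success, errno on
 * error.  The kernel writes the cookie back through the 5th prctl()
 * argument; a task with no cookie reads back 0. */
int get_cookie(pid_t pid, unsigned long *cookie)
{
    *cookie = 0;
    if (prctl(PR_SCHED_CORE, PR_SCHED_CORE_GET, pid,
              PR_SCHED_CORE_SCOPE_THREAD,
              (unsigned long)cookie) < 0)
        return errno;
    return 0;
}
```

Wrapped in a small main() that loops over its PID arguments and prints "process <pid> cookie is <cookie>", this reproduces the output format shown in the steps above.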

5. hotplug vCPUs via setvcpus and setvcpu, and check the new vCPUs' cookies:
# virsh setvcpus vm1 4

# virsh setvcpu vm1 --enable 9

# ./get-cookie 85956 85958 85974
process 85956 cookie is 4254838555
process 85958 cookie is 4254838555
process 85974 cookie is 4254838555


Changing the sched_core value and retesting with the same steps gives:

 - sched_core = "none": the QEMU emulator, vCPU, and helper process cookies are all 0
 - sched_core = "emulator": the QEMU emulator and vCPUs share the same positive cookie; helper process cookies are 0
 - sched_core = "full": the QEMU emulator, vCPUs, and helper processes all share the same positive cookie

Comment 19 errata-xmlrpc 2023-05-09 07:26:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (libvirt bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2171