Bug 1850351 - RHEL8.3 Alpha - Stale libvirt cache leads to VM startup failures (kvm)
Summary: RHEL8.3 Alpha - Stale libvirt cache leads to VM startup failures (kvm)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: 8.2
Assignee: Jiri Denemark
QA Contact: yalzhang@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-06-24 06:37 UTC by Jiri Denemark
Modified: 2020-11-02 07:00 UTC (History)

Fixed In Version: libvirt-6.0.0-25.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1848997
Environment:
Last Closed: 2020-07-28 07:14:00 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:3172 0 None None None 2020-07-28 07:14:03 UTC

Description Jiri Denemark 2020-06-24 06:37:54 UTC
+++ This bug was initially created as a clone of Bug #1848997 +++

Stale libvirt cache leads to VM startup failures
  
---Additional Hardware Info---
Z15 with IBM Secure Execution
 
Machine Type = 8562 (IBM Z15) 
 
---Steps to Reproduce---
1. Install RHEL 8.3 in the LPAR
2. Install QEMU 4.2 with SE patches
3. Modify the host kernel command line to include prot_virt=1, run zipl and reboot.
4. Define at least one KVM guest with host CPU model and start and stop it
5. Define a secure KVM guest using the host CPU model and start and stop it.
6. Change back the host kernel command line, re-run zipl, reboot.
7. Try to start the first KVM guest, which fails with a message like:
error: internal error: qemu unexpectedly closed the monitor: 2020-04-23T13:55:30.889152Z qemu-system-s390x: Some features requested in the CPU model are not available in the configuration: unpack  

The reason is that libvirt caches the domain capabilities reported during the first boot and doesn't update them after the reboot in step 6, even though changing prot_virt= on the kernel command line changes the CPU features reported by domcapabilities. So even though the guest may not require the unpack feature, libvirt constructs a CPU model which can't be satisfied in this configuration.

The issue also occurs the other way around, going from prot_virt=0 to prot_virt=1, in which case the guest will fail to boot as it requires the unpack feature.

Manually removing the content of /var/cache/libvirt/qemu/capabilities/ will force libvirt to refresh its capabilities cache and temporarily resolve the situation.
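
The manual workaround above can be sketched as a small script. This is only an illustration, not part of libvirt: the helper name clear_capabilities_cache is hypothetical, the directory is the one quoted in this report, and running it against the real path needs root.

```python
from pathlib import Path
from typing import List

# Location quoted in the report; pass another directory to experiment safely.
CACHE_DIR = Path("/var/cache/libvirt/qemu/capabilities")


def clear_capabilities_cache(cache_dir: Path = CACHE_DIR) -> List[str]:
    """Delete regular files directly inside cache_dir and return their names, sorted.

    Subdirectories are left alone; a missing directory yields an empty list.
    """
    removed: List[str] = []
    if not cache_dir.is_dir():
        return removed
    for entry in cache_dir.iterdir():
        if entry.is_file():
            entry.unlink()
            removed.append(entry.name)
    return sorted(removed)


if __name__ == "__main__":
    for name in clear_capabilities_cache():
        print("removed", name)
```

As the report says, this only resolves the situation temporarily: the cache goes stale again the next time prot_virt= is changed.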


Info from Boris:

Please note that the patch series has been accepted upstream.
It will be released as part of libvirt v6.5.0.

Here is the list of the commit ids:
c5fffb959d util: Introduce a parser for kernel cmdline arguments
b611b620ce qemu: Check if s390 secure guest support is enabled
657365e74f qemu: Check if AMD secure guest support is enabled
0254ceab82 tools: Secure guest check on s390 in virt-host-validate
4b561d49ad tools: Secure guest check for AMD in virt-host-validate
2c3ffa3728 docs: Update AMD launch secure description
f0d0cd6179 docs: Describe protected virtualization guest setup

The issue described in this bugzilla (stale capability cache) should be resolved already by the first two patches of the series. Patch 4 extends virt-host-validate with checks of the host's readiness to support IBM Secure Execution, and patch 6 provides documentation on how to set up and use IBM Secure Execution with libvirt on Linux on Z.
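
To illustrate the idea behind the first two patches (parse the kernel command line, then treat a change in the secure guest setting as a reason to refresh the cache), here is a minimal sketch. It is not the libvirt implementation: the function names, the set of values treated as "enabled", and the rule that the last occurrence of a key wins are all assumptions made for this example, and quoted values containing spaces are not handled.

```python
from typing import Iterable, Optional

# Assumption: values treated as "enabled"; the set libvirt accepts may differ.
TRUTHY = ("1", "y", "Y", "on")


def cmdline_value(cmdline: str, key: str) -> Optional[str]:
    """Return the value of the last `key=value` token, "" for a bare `key`,
    or None when the key does not appear at all."""
    found = None
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if name == key:
            found = value if sep else ""
    return found


def secure_guest_enabled(cmdline: str, key: str = "prot_virt",
                         truthy: Iterable[str] = TRUTHY) -> bool:
    """True when the key is present with one of the 'enabled' values."""
    return cmdline_value(cmdline, key) in tuple(truthy)


def cache_is_stale(cached_enabled: bool, cmdline: str) -> bool:
    """A cache recorded under one setting is stale once the setting flips."""
    return cached_enabled != secure_guest_enabled(cmdline)


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        print("prot_virt enabled:", secure_guest_enabled(f.read()))
```

With this shape, the scenario from the reproducer (cache written with prot_virt=1, host rebooted without it) makes cache_is_stale return True, which is the condition under which the capabilities would be probed again.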

--- Additional comment from Hanns-Joachim Uhl on 2020-06-19 14:26:21 UTC ---

fyi ... IBM will do fix verification ... setting OtherQA ...

Comment 5 yalzhang@redhat.com 2020-06-30 02:13:40 UTC
Hi Jiri, I have done a simple test on an x86 AMD host and have some questions. Could you please take a look? Thank you!

1) Do you think the scenarios in item 2 below are enough for an AMD host?

2) Is it acceptable to show "Unknown if this platform has Secure Guest support" on an Intel host?

1. There may be a typo here:
diff --git a/docs/kbase/launch_security_sev.rst b/docs/kbase/launch_security_sev.rst
index 65f258587d..19b978481a 100644
--- a/docs/kbase/launch_security_sev.rst
+++ b/docs/kbase/launch_security_sev.rst
@@ -30,8 +30,11 @@ Enabling SEV on the host
 ========================
 
 Before VMs can make use of the SEV feature you need to make sure your
-AMD CPU does support SEV. You can check whether SEV is among the CPU
-flags with:
+AMD CPU does support SEV. You can run ``libvirt-host-validate``
=====> should be "virt-host-validate"


2.  Test on AMD x86_64 host with sev on libvirt-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64, the result is as expected:
# lscpu |grep sev
...ssbd mba sev …

# cat /sys/module/kvm_amd/parameters/sev
0
# ll /dev/sev
crw-------. 1 root root 10, 57 Jun 29  2020 /dev/sev

# virt-host-validate
  QEMU: Checking for hardware virtualization                                 : PASS
  QEMU: Checking if device /dev/kvm exists                                   : PASS
  QEMU: Checking if device /dev/kvm is accessible                            : PASS
  QEMU: Checking if device /dev/vhost-net exists                             : PASS
  QEMU: Checking if device /dev/net/tun exists                               : PASS
  QEMU: Checking for cgroup 'cpu' controller support                         : PASS
  QEMU: Checking for cgroup 'cpuacct' controller support                     : PASS
  QEMU: Checking for cgroup 'cpuset' controller support                      : PASS
  QEMU: Checking for cgroup 'memory' controller support                      : PASS
  QEMU: Checking for cgroup 'devices' controller support                     : PASS
  QEMU: Checking for cgroup 'blkio' controller support                       : PASS
  QEMU: Checking for device assignment IOMMU support                         : PASS
  QEMU: Checking if IOMMU is enabled by kernel                               : PASS
  QEMU: Checking for secure guest support                                    : WARN (AMD Secure Encrypted Virtualization appears to be disabled in kernel. Add kvm_amd.sev=1 to the kernel cmdline arguments)

After adding kvm_amd.sev=1 to the kernel cmdline and rebooting:
# virt-host-validate
  QEMU: Checking for hardware virtualization                                 : PASS
  QEMU: Checking if device /dev/kvm exists                                   : PASS
  QEMU: Checking if device /dev/kvm is accessible                            : PASS
  QEMU: Checking if device /dev/vhost-net exists                             : PASS
  QEMU: Checking if device /dev/net/tun exists                               : PASS
  QEMU: Checking for cgroup 'cpu' controller support                         : PASS
  QEMU: Checking for cgroup 'cpuacct' controller support                     : PASS
  QEMU: Checking for cgroup 'cpuset' controller support                      : PASS
  QEMU: Checking for cgroup 'memory' controller support                      : PASS
  QEMU: Checking for cgroup 'devices' controller support                     : PASS
  QEMU: Checking for cgroup 'blkio' controller support                       : PASS
  QEMU: Checking for device assignment IOMMU support                         : PASS
  QEMU: Checking if IOMMU is enabled by kernel                               : PASS
  QEMU: Checking for secure guest support                                    : PASS

# ll -Z  /dev/sev
crw-------. 1 root root system_u:object_r:sev_device_t:s0 10, 57 Jun 29 21:40 /dev/sev

# cat /sys/module/kvm_amd/parameters/sev
1
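
The sysfs check used in the transcripts above can also be scripted. This is a sketch only: the helper names are hypothetical, "1" is the value shown in this test, and accepting "Y" as well is an assumption for kernels that print boolean module parameters as Y/N.

```python
from pathlib import Path
from typing import Optional

# Path used in the transcripts above.
SEV_PARAM = Path("/sys/module/kvm_amd/parameters/sev")


def sev_param_enabled(text: str) -> bool:
    """Interpret the contents of the kvm_amd `sev` parameter file."""
    # "1" is what this report shows; "Y"/"y" are an assumption (see lead-in).
    return text.strip() in ("1", "Y", "y")


def read_sev_param(path: Path = SEV_PARAM) -> Optional[bool]:
    """Return True/False, or None when the file cannot be read
    (for example when the kvm_amd module is not loaded)."""
    try:
        return sev_param_enabled(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("kvm_amd sev parameter:", read_sev_param())
```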

3. Test on Intel host:
# virt-host-validate
...
  QEMU: Checking if IOMMU is enabled by kernel                               : PASS
  QEMU: Checking for secure guest support                                    : WARN (Unknown if this platform has Secure Guest support)

Comment 6 yalzhang@redhat.com 2020-06-30 03:40:42 UTC
4. Test on an AMD host which does not support SEV:
# lscpu | grep sev
#
[root@hp-dl385g10-16 ~]# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              32
On-line CPU(s) list: 0-31
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):           2
NUMA node(s):        8
Vendor ID:           AuthenticAMD
CPU family:          23
Model:               1
Model name:          AMD EPYC 7251 8-Core Processor

# ll -Z /dev/sev
crw-------. 1 root root system_u:object_r:sev_device_t:s0 10, 58 Jun 29 22:21 /dev/sev

# virt-host-validate
...
QEMU: Checking for secure guest support                                    : WARN (Unknown if this platform has Secure Guest support)

Please check whether this is expected. Thank you!

Comment 7 Hanns-Joachim Uhl 2020-06-30 07:39:49 UTC
(In reply to Jiri Denemark from comment #0)
...
> 
> --- Additional comment from Hanns-Joachim Uhl on 2020-06-19 14:26:21 UTC ---
> 
> fyi ... IBM will do fix verification ... setting OtherQA ...
.
fyi .. with this Red Hat bugzilla being morphed from an s390x into an x86_64 bugzilla
I am removing the OtherQA flag for IBM now ...

Comment 8 Erik Skultety 2020-07-08 10:46:35 UTC
(In reply to yalzhang from comment #5)
> Hi Jiri, I have done a simple test on x86 AMD host, and I have some
> questions, could you please help to have a look? Thank you!
> 
> 1) Do you think the scenarios in 2 is enough for AMD host?
> 
> 2) Is it acceptable to show "Unknown if this platform has Secure Guest
> support" on Intel host?
> 
> 1. There may be a typo here:
> diff --git a/docs/kbase/launch_security_sev.rst b/docs/kbase/launch_security_sev.rst
> index 65f258587d..19b978481a 100644
> --- a/docs/kbase/launch_security_sev.rst
> +++ b/docs/kbase/launch_security_sev.rst
> @@ -30,8 +30,11 @@ Enabling SEV on the host
>  ========================
>  
>  Before VMs can make use of the SEV feature you need to make sure your
> -AMD CPU does support SEV. You can check whether SEV is among the CPU
> -flags with:
> +AMD CPU does support SEV. You can run ``libvirt-host-validate``
> =====> should be "virt-host-validate"
> 

Sigh, I overlooked ^this during the review, good catch, I'll fix it in upstream.

> 
> 2.  Test on AMD x86_64 host with sev on
> libvirt-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64, the result is as
> expected:
> # lscpu |grep sev
> ...ssbd mba sev …
> 
> # cat /sys/module/kvm_amd/parameters/sev
> 0
> # ll /dev/sev
> crw-------. 1 root root 10, 57 Jun 29  2020 /dev/sev
> 
> # virt-host-validate
>   QEMU: Checking for hardware virtualization                                 : PASS
>   QEMU: Checking if device /dev/kvm exists                                   : PASS
>   QEMU: Checking if device /dev/kvm is accessible                            : PASS
>   QEMU: Checking if device /dev/vhost-net exists                             : PASS
>   QEMU: Checking if device /dev/net/tun exists                               : PASS
>   QEMU: Checking for cgroup 'cpu' controller support                         : PASS
>   QEMU: Checking for cgroup 'cpuacct' controller support                     : PASS
>   QEMU: Checking for cgroup 'cpuset' controller support                      : PASS
>   QEMU: Checking for cgroup 'memory' controller support                      : PASS
>   QEMU: Checking for cgroup 'devices' controller support                     : PASS
>   QEMU: Checking for cgroup 'blkio' controller support                       : PASS
>   QEMU: Checking for device assignment IOMMU support                         : PASS
>   QEMU: Checking if IOMMU is enabled by kernel                               : PASS
>   QEMU: Checking for secure guest support                                    : WARN (AMD Secure Encrypted Virtualization appears to be disabled in kernel. Add kvm_amd.sev=1 to the kernel cmdline arguments)

Correct.

> 
> after adding the kvm_amd.sev=1 in kernel cmdline and reboot:
> # virt-host-validate
>   QEMU: Checking for hardware virtualization                                 : PASS
>   QEMU: Checking if device /dev/kvm exists                                   : PASS
>   QEMU: Checking if device /dev/kvm is accessible                            : PASS
>   QEMU: Checking if device /dev/vhost-net exists                             : PASS
>   QEMU: Checking if device /dev/net/tun exists                               : PASS
>   QEMU: Checking for cgroup 'cpu' controller support                         : PASS
>   QEMU: Checking for cgroup 'cpuacct' controller support                     : PASS
>   QEMU: Checking for cgroup 'cpuset' controller support                      : PASS
>   QEMU: Checking for cgroup 'memory' controller support                      : PASS
>   QEMU: Checking for cgroup 'devices' controller support                     : PASS
>   QEMU: Checking for cgroup 'blkio' controller support                       : PASS
>   QEMU: Checking for device assignment IOMMU support                         : PASS
>   QEMU: Checking if IOMMU is enabled by kernel                               : PASS
>   QEMU: Checking for secure guest support                                    : PASS
> 
> # ll -Z  /dev/sev
> crw-------. 1 root root system_u:object_r:sev_device_t:s0 10, 57 Jun 29 21:40 /dev/sev
> 
> # cat /sys/module/kvm_amd/parameters/sev
> 1

Correct.

However, what the above is missing is an actual attempt to start a machine with SEV. We can't rely solely on what virt-host-validate says; the test should follow the reproducer in comment 0, adapted for an AMD EPYC CPU.

> 
> 3. Test on Intel host:
> # virt-host-validate
> ...
>   QEMU: Checking if IOMMU is enabled by kernel                               : PASS
>   QEMU: Checking for secure guest support                                    : WARN (Unknown if this platform has Secure Guest support)

Again, correct! This is because we don't check for Secure Guest support on Intel, so it would be incorrect to say "support=no" while Intel may have MKTME (or its predecessor) available in the CPU.

Comment 9 Erik Skultety 2020-07-08 10:54:18 UTC
(In reply to yalzhang from comment #6)
> 4.test on AMD host which do not support "SEV"
> # lscpu | grep sev
> #
> # lscpu | grep sev
> [root@hp-dl385g10-16 ~]# lscpu
> Architecture:        x86_64
> CPU op-mode(s):      32-bit, 64-bit
> Byte Order:          Little Endian
> CPU(s):              32
> On-line CPU(s) list: 0-31
> Thread(s) per core:  2
> Core(s) per socket:  8
> Socket(s):           2
> NUMA node(s):        8
> Vendor ID:           AuthenticAMD
> CPU family:          23
> Model:               1
> Model name:          AMD EPYC 7251 8-Core Processor
> 
> # ll -Z /dev/sev
> crw-------. 1 root root system_u:object_r:sev_device_t:s0 10, 58 Jun 29 22:21 /dev/sev
> 
> # virt-host-validate
> ...
> QEMU: Checking for secure guest support                                    : WARN (Unknown if this platform has Secure Guest support)

This one is a bit tricky. We do check AMD CPUs for SEV, so technically we should be able to say 'no' directly in this case. On the other hand, if a newer revision of SEV (newer than SNP) appears, the way SEV is detected could change slightly or require further steps, and in that case it's okay to say "Unknown". Personally, I'd stick with this unless someone complains that we should change it. This BZ is fixed by the first 2 commits after all, just like comment 0 mentions.
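
For reference, the four outcomes observed in comments 5 and 6 can be summarised as a small decision function. This is a sketch that merely reproduces the results reported in this thread on x86; the actual decision logic of virt-host-validate is not shown here, the function name is hypothetical, the s390 case is left out, and treating an unreadable kvm_amd parameter as "disabled" is an assumption.

```python
from typing import Optional

DISABLED_MSG = ("WARN (AMD Secure Encrypted Virtualization appears to be "
                "disabled in kernel. Add kvm_amd.sev=1 to the kernel "
                "cmdline arguments)")
UNKNOWN_MSG = "WARN (Unknown if this platform has Secure Guest support)"


def secure_guest_verdict(vendor: str, has_sev_flag: bool,
                         sev_param: Optional[bool]) -> str:
    """Map host facts to the message seen in the thread.

    vendor       -- lscpu "Vendor ID", e.g. "AuthenticAMD"
    has_sev_flag -- whether `sev` appears among the CPU flags
    sev_param    -- kvm_amd `sev` parameter, None if unreadable (assumption:
                    treated like False)
    """
    if vendor == "AuthenticAMD" and has_sev_flag:
        return "PASS" if sev_param else DISABLED_MSG
    # Intel hosts and AMD hosts without the sev flag both land here.
    return UNKNOWN_MSG


if __name__ == "__main__":
    print(secure_guest_verdict("AuthenticAMD", True, False))
```

The last branch is the one discussed above: an AMD CPU without the sev flag gets "Unknown" rather than a definite "no".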

Comment 10 yalzhang@redhat.com 2020-07-10 13:26:01 UTC
Started a VM with the steps below:
1. On the AMD system, enable SEV support:
# cat /sys/module/kvm_amd/parameters/sev
0
# rmmod kvm_amd
#  modprobe kvm_amd sev=1
# cat /sys/module/kvm_amd/parameters/sev
1
# virt-host-validate
...
  QEMU: Checking if IOMMU is enabled by kernel                               : PASS
  QEMU: Checking for secure guest support                                    : PASS

2. Start a VM with SEV settings:
# virsh console vm
[root@localhost ~]# dmesg | grep SEV
[    0.001000] AMD Secure Encrypted Virtualization (SEV) active
[    3.224586] software IO TLB: SEV is active and system is using DMA bounce buffers

Setting this bug to verified based on comments 5 to 9.
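
The in-guest check from step 2 can be automated by scanning the guest's dmesg output for the SEV line. This is a minimal sketch with a hypothetical helper name; it matches only the message wording shown in this comment, and other guest kernels may word it differently.

```python
import re

# Wording taken from the dmesg output quoted in this comment.
SEV_ACTIVE = re.compile(r"AMD Secure Encrypted Virtualization \(SEV\) active")


def guest_reports_sev(dmesg_text: str) -> bool:
    """True if any line of the guest's dmesg announces SEV as active."""
    return any(SEV_ACTIVE.search(line) for line in dmesg_text.splitlines())


if __name__ == "__main__":
    sample = (
        "[    0.001000] AMD Secure Encrypted Virtualization (SEV) active\n"
        "[    3.224586] software IO TLB: SEV is active and system is using "
        "DMA bounce buffers\n"
    )
    print(guest_reports_sev(sample))
```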

Comment 12 errata-xmlrpc 2020-07-28 07:14:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3172

