Bug 2159408

Summary: [s390x] VMs with ISM passthrough don't autostart after leapp upgrade from RHEL 8
Product: Red Hat Enterprise Linux 9 Reporter: smitterl
Component: qemu-kvm Assignee: Cédric Le Goater <clegoate>
qemu-kvm sub component: Machine Types QA Contact: smitterl
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: bfu, clegoate, mkluson, pstodulk, thuth, virt-maint
Version: 9.2 Keywords: Regression, Triaged
Target Milestone: rc   
Target Release: 9.2   
Hardware: s390x   
OS: Linux   
Whiteboard:
Fixed In Version: qemu-kvm-7.2.0-5.el9 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-05-09 07:23:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2144443    

Description smitterl 2023-01-09 14:16:34 UTC
Description of problem:

We expect VMs with node devices such as ISM to autostart after a major dist upgrade via leapp without any changes to the VM setup.

This doesn't work currently when an ISM is attached.

The root cause seems to be that the machine type s390-ccw-virtio-rhel8.6.0 behaves differently on RHEL 8 and on RHEL 9. On RHEL 8 it is the latest available machine type and it allows ISM passthrough. On RHEL 9, however, only s390-ccw-virtio-rhel9.2.0 currently supports ISM passthrough.
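
To make the root cause concrete, here is a minimal sketch of a pre-upgrade check that flags domain definitions likely to hit this problem. It assumes the usual libvirt domain XML layout, where the machine type is the `machine` attribute of `<os><type>` and passthrough devices appear as `<hostdev type='pci'>`. The sample XML, the PCI address, and the helper names are hypothetical, and a PCI hostdev is not necessarily an ISM device, so treat a positive result only as a hint.

```python
import xml.etree.ElementTree as ET

# Assumption taken from this report: before the fix, this is the only
# machine type that supports ISM passthrough on RHEL 9.
ISM_CAPABLE_ON_RHEL9 = {"s390-ccw-virtio-rhel9.2.0"}

# Hypothetical, heavily trimmed domain definition for illustration only.
SAMPLE_XML = """
<domain type='kvm'>
  <name>ism-guest</name>
  <os>
    <type arch='s390x' machine='s390-ccw-virtio-rhel8.6.0'>hvm</type>
  </os>
  <devices>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x00' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
  </devices>
</domain>
"""


def machine_type(domain_xml):
    """Return the machine attribute of <os><type>, or None if absent."""
    os_type = ET.fromstring(domain_xml).find("./os/type")
    return None if os_type is None else os_type.get("machine")


def has_pci_hostdev(domain_xml):
    """Return True if the domain passes through at least one PCI device."""
    root = ET.fromstring(domain_xml)
    return any(h.get("type") == "pci" for h in root.findall("./devices/hostdev"))


def may_fail_after_upgrade(domain_xml):
    """Flag domains with PCI passthrough on a machine type not listed above."""
    return (has_pci_hostdev(domain_xml)
            and machine_type(domain_xml) not in ISM_CAPABLE_ON_RHEL9)
```

In practice the XML would come from something like `virsh dumpxml` for each autostart domain on the RHEL 8 host before running leapp.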


Version-Release number of selected component (if applicable):
qemu-kvm-7.2.0-2.el9.s390x.rpm

How reproducible:
100%

Steps to Reproduce:
1. Define a VM with ISM passthrough device on RHEL 8. Confirm that the VM is correctly set up to autostart after host reboot.
2. Upgrade the host to RHEL 9 using leapp.
3. Reboot.

Actual results:
The VM is not autostarted.

Expected results:
The VM is autostarted.

Additional info:
After setting the machine type to -rhel9.2.0 the VM autostarts.
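
The workaround above can be sketched as a small XML rewrite. This is only an illustration, assuming the standard libvirt layout in which the machine type lives in the `machine` attribute of `<os><type>`; the helper name and the sample definition are hypothetical, and the rewritten XML would still have to be redefined in libvirt (for example with `virsh define`).

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed domain definition for illustration only.
OLD_DOMAIN_XML = """
<domain type='kvm'>
  <name>ism-guest</name>
  <os>
    <type arch='s390x' machine='s390-ccw-virtio-rhel8.6.0'>hvm</type>
  </os>
</domain>
"""


def set_machine_type(domain_xml, machine):
    """Return a copy of the domain XML with <os><type machine=...> replaced."""
    root = ET.fromstring(domain_xml)
    os_type = root.find("./os/type")
    if os_type is None:
        raise ValueError("domain XML has no <os><type> element")
    os_type.set("machine", machine)
    return ET.tostring(root, encoding="unicode")


new_xml = set_machine_type(OLD_DOMAIN_XML, "s390-ccw-virtio-rhel9.2.0")
```

Changing the machine type can alter other guest-visible defaults, so this is a manual workaround for the affected VM rather than a general recommendation.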

Comment 1 Thomas Huth 2023-01-10 10:26:59 UTC
To provide some additional information: This issue happens because the "enhanced zPCI interpretation feature" would normally require a new QEMU machine type to get enabled by default. In RHEL 9, we continue to rebase qemu-kvm and thus introduced a new QEMU machine type (rhel9.2.0) where we could enable this zPCI feature by default (see https://bugzilla.redhat.com/show_bug.cgi?id=1871117). We did not enable it there in the old rhel8.6.0 machine type, to keep that machine type stable.

In RHEL 8.8, we didn't get a new machine type in QEMU anymore. So the only option there was to enable it in the old rhel8.6.0 machine type, which could cause slightly different behavior in the guests. After lots of discussions, it was decided that it should be OK there since VMs with PCI devices could not be migrated on s390x anyway, so we did not expect a problem with live migration. We did not consider leapp updates, though.

So we're now in the situation that the zPCI feature is enabled in the rhel8.6.0 machine type on RHEL 8.8 by default, but it is disabled in the rhel8.6.0 machine type on RHEL 9.2 by default, which causes the trouble when updating from 8.8 to 9.2 via leapp.
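
The mismatch described above can be written down as a small table. The following is only a sketch that restates the defaults from this comment (the state before any fix); the dictionary layout and the helper name are made up for illustration and do not reflect how QEMU stores machine type properties.

```python
# (host release, machine type) -> zPCI interpretation enabled by default,
# as described in this comment, before any fix is applied.
ZPCI_DEFAULT = {
    ("RHEL 8.8", "rhel8.6.0"): True,
    ("RHEL 9.2", "rhel8.6.0"): False,
    ("RHEL 9.2", "rhel9.2.0"): True,
}


def inconsistent_machine_types(defaults):
    """Return machine types whose default differs between host releases."""
    seen = {}
    for (_host, machine), enabled in defaults.items():
        seen.setdefault(machine, set()).add(enabled)
    return sorted(m for m, values in seen.items() if len(values) > 1)


print(inconsistent_machine_types(ZPCI_DEFAULT))  # ['rhel8.6.0']
```

A machine type is only "stable" across a leapp upgrade if every host release gives it the same default, which is exactly what fails for rhel8.6.0 here.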

I basically see three options to deal with this issue now:

1) Revert the default behavior in RHEL 8.8 to not enable the zPCI interpretation feature by default.
2) Enable the zPCI interpretation feature by default in the rhel8.6.0 machine type in RHEL 9.2
3) Keep the current behavior and document it properly (i.e. that a leapp update with a zPCI passthrough device needs some manual tweaking after the update)

Option 1 sounds like a wise and appealing choice to really make sure that the rhel8.6.0 machine type is "consistent" in any case, also with leapp upgrades. However, after a quick glance at libvirt, it seems like there is no config knob there to easily enable the zPCI interpretation feature for guests manually (without messing with <qemu:commandline> tags), so the feature would effectively become useless in RHEL 8.8. Thus this is likely a no-go?

Option 2 would be an easy fix for *this* problem here ... but there is still a risk that it messes up other things in the machine type, so I'm not sure whether we should do this change, especially since we're in the middle of the current development cycle already.

Option 3 ... sounds OK to me, but we need to find a proper place where this information can be added.

Does anybody else have preferences or other opinions?

Comment 3 IBM Bug Proxy 2023-01-11 11:40:32 UTC
------- Comment From Niklas.Schnelle 2023-01-11 06:35 EDT-------
Hmm, I'd tend toward option 2. For me the machine type is most relevant from the guest
point of view, and there the zPCI interpretation enablement should only be visible
as better performance. The only other guest-visible changes, to some fields like "Max Store Block Length", should be limited to ISM devices, which just fail to attach
without zPCI interpretation and so don't become visible to the guest at all.

Another point is that ISM pass-through is currently in my opinion the most useful
PCI pass-through that we have as it is basically unaffected by the overhead of
the virtualized guest IOMMU and using an ISM device with SMC-D between guests and/or other LPARs should result in very attractive performance.

Comment 4 IBM Bug Proxy 2023-01-11 12:20:25 UTC
------- Comment From cborntra.com 2023-01-11 07:10 EDT-------
I agree. Option 2 seems to be the best way. That will make sure that the 8.6 machine is the same on 8.8 and 9.2.
FWIW, instances without interpretation support will not be able to use ISM devices. So we could argue that this is actually a fix.

Option 1 would make ISM unusable for RHEL8 and RHEL9 with an 8.6 machine.

Comment 6 Thomas Huth 2023-01-11 13:59:34 UTC
After many discussions, most people seem to be in favor of option 2, and the risk of changing the rhel8.6.0 machine type in RHEL9.2 is expected to be quite low. So I think we'll go with that option.

Comment 7 IBM Bug Proxy 2023-01-11 14:20:59 UTC
------- Comment From mjrosato.com 2023-01-11 09:12 EDT-------
Sorry, I'm a bit late to the party here -- but FWIW I also agree that Option 2 would make the most sense.

Comment 10 smitterl 2023-01-18 13:48:01 UTC
Pre-verified with qemu-kvm-7.2.0-5.el9

Comment 14 smitterl 2023-01-24 14:17:03 UTC
Verified with qemu-kvm-7.2.0-5.el9.s390x

Comment 16 errata-xmlrpc 2023-05-09 07:23:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2162