Bug 1467599 - Unable to start domain: the CPU is incompatible with host CPU: Host CPU does not provide required features: svm
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: qemu
Version: 26
Hardware: Unspecified Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Fedora Virtualization Maintainers
QA Contact: Fedora Extras Quality Assurance
Whiteboard: AcceptedBlocker
Keywords: Reopened
Duplicates: 1465747
Blocks: F26FinalBlocker
Reported: 2017-07-04 06:17 EDT by Kamil Páral
Modified: 2017-08-22 16:43 EDT (History)
Fixed In Version: qemu-2.9.0-1.fc26.1 qemu-2.9.0-5.fc26
Last Closed: 2017-08-22 16:43:19 EDT
Type: Bug


Description Kamil Páral 2017-07-04 06:17:27 EDT
Description of problem:
I did a fresh install of F26 Workstation and tried to run F25 Workstation Live in gnome-boxes. The ISO downloads, but the VM fails to start. Boxes says "Failed to start Fedora 25 Workstation". In the system journal, I see this:

Jul 04 12:08:05 localhost.localdomain gnome-boxes[5347]: machine.vala:611: Failed to start Fedora 25 Workstation: Unable to start domain: the CPU is incompatible with host CPU: Host CPU does not provide required features: svm

That error seems to be mentioned in other places as well, e.g. bug 1386223 or https://github.com/vagrant-libvirt/vagrant-libvirt/issues/667 .

My CPU is:
$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             AuthenticAMD
CPU family:            21
Model:                 56
Model name:            AMD A10-7870K Radeon R7, 12 Compute Cores 4C+8G
Stepping:              1
CPU MHz:               1700.000
CPU max MHz:           3900.0000
CPU min MHz:           1700.0000
BogoMIPS:              7780.92
Virtualization:        AMD-V
L1d cache:             16K
L1i cache:             96K
L2 cache:              2048K
NUMA node0 CPU(s):     0-3
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb bpext ptsc cpb hw_pstate proc_feedback vmmcall fsgsbase bmi1 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold overflow_recov



Version-Release number of selected component (if applicable):
gnome-boxes-3.24.0-3.fc26.x86_64
libvirt-daemon-3.2.1-3.fc26.x86_64

How reproducible:
always (on this machine)

Steps to Reproduce:
1. start boxes and create a new VM with URL http://download.fedoraproject.org/pub/fedora/linux/releases/25/Workstation/x86_64/iso/Fedora-Workstation-Live-x86_64-25-1.3.iso
2. see the image downloaded
3. see the VM fails to start, even if you attempt repeatedly, with message "failed to start ..."

Additional info:
This might be related only to AMD CPUs (although reports elsewhere on the internet also mention it happening on Intel CPUs); I will test.
Comment 1 Kamil Páral 2017-07-04 06:50:10 EDT
The problem doesn't occur on my Intel laptop:

$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  2
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               61
Model name:          Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
Stepping:            4
CPU MHz:             1700.219
CPU max MHz:         3200.0000
CPU min MHz:         500.0000
BogoMIPS:            5188.14
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            4096K
NUMA node0 CPU(s):   0-3
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt dtherm ida arat pln pts


On yet another Intel CPU box, the problem also doesn't happen:

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 58
Model name:            Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
Stepping:              9
CPU MHz:               1600.000
CPU max MHz:           3600.0000
CPU min MHz:           1600.0000
BogoMIPS:              6400.48
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              6144K
NUMA node0 CPU(s):     0-3
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts


But on another AMD box, I can again reproduce the failure:

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             AuthenticAMD
CPU family:            21
Model:                 1
Model name:            AMD FX(tm)-4100 Quad-Core Processor
Stepping:              2
CPU MHz:               1400.000
CPU max MHz:           3600.0000
CPU min MHz:           1400.0000
BogoMIPS:              7802.31
Virtualization:        AMD-V
L1d cache:             16K
L1i cache:             64K
L2 cache:              2048K
L3 cache:              8192K
NUMA node0 CPU(s):     0-3
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 nodeid_msr topoext perfctr_core perfctr_nb cpb hw_pstate vmmcall arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold


So it seems pretty certain this affects only AMD CPUs, probably all of them.
Comment 2 Kamil Páral 2017-07-04 06:59:37 EDT
Not only does gnome-boxes fail; virt-manager fails too. So this is most probably a libvirt issue. When using virt-manager, I see the error coming from libvirt:

Jul 04 12:56:00 localhost.localdomain libvirtd[862]: 2017-07-04 10:56:00.830+0000: 900: error : virCPUx86Compare:1707 : the CPU is incompatible with host CPU: Host CPU does not provide required features: fma, f16c, svm, tbm


In virt-manager, by default the CPU model is set to "Opteron_G4". If I enable "copy host CPU configuration" instead ("host-model" model), the VM starts fine. It seems that libvirt started misidentifying AMD CPUs.
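
For readers who want to apply the same workaround outside virt-manager, the change amounts to replacing the domain's named CPU model with `mode='host-model'`. The following is a minimal Python sketch, assuming the domain XML is available as a string (for example from `virsh dumpxml`). The helper name `use_host_model` and the sample XML are illustrative and not part of libvirt.

```python
import xml.etree.ElementTree as ET


def use_host_model(domain_xml: str) -> str:
    """Return domain XML with its <cpu> element switched to mode='host-model'.

    Illustrative helper (not a libvirt API): it drops any explicit <model>
    and <feature> children so libvirt derives the model from the host.
    """
    root = ET.fromstring(domain_xml)
    cpu = root.find("cpu")
    if cpu is None:
        cpu = ET.SubElement(root, "cpu")
    # Keep <topology>; remove everything tied to a named CPU model.
    for child in list(cpu):
        if child.tag != "topology":
            cpu.remove(child)
    cpu.attrib.clear()
    cpu.set("mode", "host-model")
    return ET.tostring(root, encoding="unicode")


# Hypothetical domain definition resembling the failing default.
SAMPLE = (
    "<domain type='kvm'><name>f25</name>"
    "<cpu mode='custom' match='exact'><model>Opteron_G4</model>"
    "<topology sockets='1' cores='2' threads='1'/></cpu></domain>"
)
```

Calling `use_host_model(SAMPLE)` yields a definition whose `<cpu>` element keeps only the topology and carries `mode="host-model"`; the result could then be loaded back with `virsh define`.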
Comment 3 Kamil Páral 2017-07-04 07:09:40 EDT
The AMD box from comment 0 is detected as Opteron_G5, the AMD box from comment 1 as Opteron_G4.

Proposing as an F26 blocker due to:
"All applications that can be launched using the standard graphical mechanism of a release-blocking desktop after a default installation of that desktop must start successfully and withstand a basic functionality test. "
https://fedoraproject.org/wiki/Fedora_26_Final_Release_Criteria#Default_application_functionality
and
"The release must be able host virtual guest instances of the same release. "
https://fedoraproject.org/wiki/Fedora_26_Beta_Release_Criteria#Virtualization_requirements
which is violated for AMD CPUs.

(I also verified the same problem occurs with F26 Workstation Live guest).
Comment 4 Matthew Miller 2017-07-04 10:52:06 EDT
Something weird is going on, because as you can see, svm *is* listed for your host CPU.
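
The host-side observation here can be checked mechanically. Below is a small Python sketch that parses a `Flags:` / `flags :` line in the format shown in the lscpu output above and reports whether `svm` (AMD-V) or `vmx` (VT-x) is listed. The function names are illustrative.

```python
def host_cpu_flags(cpuinfo_text: str) -> set:
    """Collect CPU flags from /proc/cpuinfo-style or lscpu-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "flags":
            flags.update(value.split())
    return flags


def hardware_virt_flag(flags: set):
    """Return 'svm' (AMD-V), 'vmx' (Intel VT-x), or None if neither is listed."""
    for name in ("svm", "vmx"):
        if name in flags:
            return name
    return None
```

Fed the lscpu output from comment 0, this would return `'svm'`, which is why the libvirt error looks contradictory at first sight: the flag the message calls missing is present on the host.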
Comment 5 Matthew Miller 2017-07-04 11:40:22 EDT
This seems bad, but it also seems super-weird to crop up at the last minute :(
Comment 6 Adam Williamson 2017-07-04 13:30:09 EDT
dgilbert references bug #1464832 , though that involves the 'full' check option which we don't think Boxes uses by default.
Comment 7 Adam Williamson 2017-07-04 14:47:01 EDT
I reproduced this (both Boxes and virt-manager variants) on my test box:

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 21
model		: 19
model name	: AMD A4-7300 APU with Radeon HD Graphics
stepping	: 1
microcode	: 0x6001119
cpu MHz		: 2200.000
cache size	: 1024 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 1
apicid		: 16
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate vmmcall bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bugs		: fxsave_leak sysret_ss_attrs null_seg
bogomips	: 7586.23
TLB size	: 1536 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro

With virt-manager it's quite easy to work around; you can just change the CPU model, as Kamil said. However, Boxes does not make this choice available, so there's no obvious workaround with Boxes.
Comment 8 Matthew Miller 2017-07-04 14:51:06 EDT
Presumably you can change the XML by hand. That's not exactly _obvious_, though.
Comment 9 Adam Williamson 2017-07-04 14:53:20 EDT
Well, you could use virsh. But that's quite deep-down stuff to be exposing to people who just wanted to use Boxes.
Comment 10 Eduardo Habkost 2017-07-04 14:58:06 EDT
(In reply to Matthew Miller from comment #4)
> Something weird is going on, because as you can see, svm *is* listed for
> your host CPU.

SVM might be available in the host, but that doesn't mean you can enable SVM in a virtual machine.  It would require nested SVM support, which might be unavailable for some reason.  If nested SVM is enabled, you should see "kvm: Nested Virtualization enabled" in dmesg.

In either case, if libvirt detects a CPU model as not usable, it shouldn't appear as the default in virt-manager (or be chosen as the default by Boxes).  Also, nested SVM is disabled by default in QEMU, so I don't know why libvirt is looking for it.
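
As an alternative to searching dmesg, the nested setting can usually be read from the kvm_amd module parameters in sysfs. This is a hedged Python sketch: the path is the conventional Linux location, the accepted values (`1`/`0` or `Y`/`N`) are an assumption covering different kernel versions, and the function names are illustrative.

```python
from pathlib import Path

# Conventional sysfs location for the kvm_amd "nested" module parameter.
NESTED_PARAM = Path("/sys/module/kvm_amd/parameters/nested")


def parse_nested(value: str) -> bool:
    """Interpret the parameter text; assumed to be either 1/0 or Y/N."""
    return value.strip().upper() in {"1", "Y"}


def nested_svm_enabled(param_path: Path = NESTED_PARAM) -> bool:
    """Return True if the kvm_amd module reports nested virtualization on.

    Returns False when the file is absent (module not loaded, or Intel host).
    """
    try:
        return parse_nested(param_path.read_text())
    except OSError:
        return False
```

Note that this only reports the host kernel setting. It does not explain by itself why a guest CPU definition would require `svm`.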
Comment 11 Matthew Miller 2017-07-04 15:02:56 EDT
So, I'm -1 to this as a blocker in weighing its impact vs. the impact of further slip. We're already at mid-July for what is nominally our early-May release. If this had happened to be discovered a few weeks ago with time for a fix without further delay, or if the problem were something even bigger, that math would come out differently, but here we are.

Let's document the workaround and attempt to get an update out as soon as possible.
Comment 12 Adam Williamson 2017-07-04 15:22:46 EDT
If I'm reading the virt-manager code properly - it's kinda complicated - I *think* the default cpu model is 'host-model-only'.

Indeed, I can reproduce the bug by simply running a virt-install command with '--cpu host-model-only'; it produces just the same error about svm.
Comment 13 Adam Williamson 2017-07-04 15:25:20 EDT
On the other hand, using '--cpu host-model' does *not* result in the same problem.

Also, it looks to me like Boxes is using a kind of home-grown imitation of host-model-only, because it has this snippet in vm-configurator.vala:

        var cpu = new DomainCpu ();
        // Ideally we should be using 'host-model' but there is currently issues with that:
        // https://bugzilla.redhat.com/show_bug.cgi?id=870071
        cpu.set_mode (DomainCpuMode.CUSTOM);
        cpu.set_topology (topology);

        var model_caps = cpu_caps.get_model ();
        var model = new DomainCpuModel ();
        model.set_name (model_caps.get_name ());
        cpu.set_model (model);

        domain.set_cpu (cpu);
Comment 14 Adam Williamson 2017-07-04 15:51:15 EDT
Note that "--cpu host-model-only" is implemented in virt-install (not libvirt), where it does this:

        elif val == self.SPECIAL_MODE_HOST_MODEL_ONLY:
            if self.conn.caps.host.cpu.model:
                self.clear()
                self.model = self.conn.caps.host.cpu.model
        else:
            raise RuntimeError("programming error: unknown "
                "special cpu mode '%s'" % val)

so, looks very similar to what Boxes does.
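
To make the shared pattern concrete, here is a small Python sketch of the "host-model-only" idea: take only the model name from the host capabilities and emit a custom-mode `<cpu>` element. It assumes the libvirt capabilities layout `capabilities/host/cpu/model`. The function name and the trimmed sample document are illustrative and are not code from Boxes or virt-install.

```python
import xml.etree.ElementTree as ET


def host_model_only_cpu(capabilities_xml: str):
    """Build a <cpu mode='custom'> element naming only the host's CPU model.

    Mirrors the approach quoted above: copy the model *name* from the host
    capabilities and nothing else. Returns None if no model is reported.
    """
    caps = ET.fromstring(capabilities_xml)
    model_name = caps.findtext("host/cpu/model")
    if not model_name:
        return None
    cpu = ET.Element("cpu", {"mode": "custom"})
    ET.SubElement(cpu, "model").text = model_name
    return ET.tostring(cpu, encoding="unicode")


# Hypothetical, heavily trimmed capabilities document.
SAMPLE_CAPS = (
    "<capabilities><host><cpu><arch>x86_64</arch>"
    "<model>Opteron_G5</model><vendor>AMD</vendor></cpu></host></capabilities>"
)
```

Because only the name is copied, the guest ends up requiring whatever feature list libvirt associates with that named model, rather than the features the host can actually offer to guests.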
Comment 15 Adam Williamson 2017-07-04 16:14:55 EDT
I just tested and *cannot* reproduce this on Fedora 25: the bug really is new in Fedora 26. Not sure what the difference is, yet. Investigating.
Comment 16 Matthew Miller 2017-07-04 18:07:20 EDT
FWIW this seems like a likely dupe of bug #1465747. That is apparently fixed in Future Libvirt by a "series of patches" including (but not only?) this:


commit 5b4a6adb5ca24a6cb91cdc55c31506fb278d3a91
Refs: v3.2.0-197-g5b4a6adb5
Author:     Jiri Denemark <jdenemar@redhat.com>
AuthorDate: Tue Apr 11 20:46:05 2017 +0200
Commit:     Jiri Denemark <jdenemar@redhat.com>
CommitDate: Wed Apr 19 16:36:38 2017 +0200

    qemu: Use more data for comparing CPUs

    With QEMU older than 2.9.0 libvirt uses CPUID instruction to determine
    what CPU features are supported on the host. This was later used when
    checking compatibility of guest CPUs. Since QEMU 2.9.0 we ask QEMU for
    the host CPU data. But the two methods we use usually provide disjoint
    sets of CPU features because QEMU/KVM does not support all features
    provided by the host CPU and on the other hand it can enable some
    feature even if the host CPU does not support them.

    So if there is a domain which requires a CPU features disabled by
    QEMU/KVM, libvirt will refuse to start it with QEMU > 2.9.0 as its guest
    CPU is incompatible with the host CPU data we got from QEMU. But such
    domain would happily start on older QEMU (of course, the features would
    be missing the guest CPU). To fix this regression, we need to combine
    both CPU feature sets when checking guest CPU compatibility.

-------

That patch doesn't apply cleanly to libvirt 3.2.1 in current Fedora (Adam tried it). Investigation is ongoing.

In any case, it appears the difference is "newer QEMU" -- F26 has 2.9.0, where F25 shipped with 2.7.0 (and has 2.7.1 as an update).
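
The compatibility check described in the commit message can be illustrated with plain set arithmetic. This Python sketch is not libvirt code. The feature sets are hypothetical and heavily abbreviated, chosen so that the missing features match the libvirtd error in comment 2.

```python
def missing_features(required, *host_feature_sets):
    """Return required guest features absent from every given host data set.

    Passing one set models checking against a single source of host data;
    passing several models the fix described in the commit message, where
    the sets are combined (unioned) before the comparison.
    """
    available = set().union(*host_feature_sets)
    return sorted(set(required) - available)


# Hypothetical, heavily abbreviated data for an Opteron_G5-class host.
cpuid_features = {"sse2", "avx", "fma", "f16c", "svm", "tbm"}  # from CPUID
qemu_features = {"sse2", "avx", "x2apic"}                      # reported by QEMU
guest_requires = {"sse2", "avx", "fma", "f16c", "svm", "tbm"}
```

With only the QEMU-reported set, `missing_features(guest_requires, qemu_features)` lists `f16c`, `fma`, `svm` and `tbm`; adding `cpuid_features` as a second set leaves nothing missing, which is the behaviour the patch series restores.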
Comment 17 Matthew Miller 2017-07-04 18:11:15 EDT
I'm still of the opinion that we should not block the release and should instead provide a zero-day update, but I'm very sympathetic to the argument that it's pretty terrible to break a currently-working configuration (one that is created by default) for every AMD system using virtualization.
Comment 18 Adam Williamson 2017-07-04 18:12:15 EDT
Yep, that's all in line with my assessment.

I believe the 'series of patches' starts with bffc3b9fe501ff122ad81ddf42ecdb69f70ff70a and ends with 5b4a6adb5ca24a6cb91cdc55c31506fb278d3a91 , but it's not a straightforward backport at all. I'm working on it, but I may get it wrong, since I'm not super familiar with all this code.

If we accept this as a blocker, other *possible* 'fixes' are "drop Boxes from the live media" (although that still leaves the "must be able to host virt guests" criterion...) and "revert qemu to 2.7 until we can fix this".
Comment 19 Adam Williamson 2017-07-04 18:43:03 EDT
Unfortunately I can't seem to get a clean backport; I managed to get it to *build*, but some of the tests fail in a way that suggests I messed something up.

I'm personally a bit closer to seeing this as a blocker than Matt; I just don't like the thought that forever, anyone who installs F26 Workstation on an affected system (which so far still seems to be 'lots of AMD boxes') and happens to try and use Boxes before doing a system update will get a cryptic error. It'd be good to get some other votes on this.
Comment 20 Eduardo Habkost 2017-07-04 19:29:44 EDT
(In reply to Adam Williamson from comment #18)
> If we accept this as a blocker, other *possible* 'fixes' are "drop Boxes
> from the live media" (although that still leaves the "must be able to host
> virt guests" criterion...) and "revert qemu to 2.7 until we can fix this".

A possible one-line workaround is to disable the query-cpu-model-* QMP commands in QEMU, so that libvirt falls back to the older methods of querying host CPU capabilities. I will run some tests to try to find out what's wrong.
Comment 21 Adam Williamson 2017-07-04 19:32:41 EDT
Thanks Eduardo! A few more notes:

* The earliest libvirt release containing the commits that should fix this is 3.3.0. That has never been built for F26, but Rawhide went to 3.3.0 on 2017-05-08 and 3.4.0 on 2017-06-05. I tested and confirmed that installing libvirt-3.3.0-1.fc27 on an affected F26 install *does* solve the problem, but of course we have no idea what other effects updating libvirt to a whole new version at this late stage might have.

* qemu 2.9.0 landed in F26 stable around end of April / start of May (Bodhi doesn't seem to give an exact date): https://bodhi.fedoraproject.org/updates/qemu-2.9.0-1.fc26

* F26 had qemu 2.8.0 on 2016-12-22. The current 2.8 series release is 2.8.1.1.

So our choices to fix this are:

* Manage a successful backport of the libvirt fix to 3.2
* Try what Eduardo suggested in #c20
* Update libvirt to 3.3.0 or 3.4.0
* Downgrade qemu to a 2.8 release (probably 2.8.1.1), with option to update back to 2.9 if we fix this by backporting or updating libvirt

As far as the blocker process goes, if we don't fix this, we can:

* Slip
* Reject as a blocker
* Remove Boxes from Workstation, respin, and reject as a blocker
Comment 22 Matthew Miller 2017-07-04 19:43:22 EDT
I guess removing Boxes would mean that, once we have an update available, we could make Boxes require at least that version, so that if people install it, it works. (Just removing it to skirt the blocker doesn't seem right; it's not like removing something that we've decided we don't recommend.)
Comment 23 Eduardo Habkost 2017-07-04 21:14:00 EDT
My findings:
* "virt-install --cpu Opteron_G3" fails too.
* libvirt's cpu_map.xml has "svm" in the AMD CPU models, so it disagrees with QEMU about the need for the "svm" flag.
* 'query-cpu-model-expansion type=full model={"name":"max"}' will return "svm" as unavailable in QEMU (meaning SVM nesting is not just disabled by default, but reported as unsupported)

The last two items would be minor bugs alone, but the two together cause this problem.  I'm going to test a patch that disables the query-cpu-model-* QMP commands as an emergency measure, and report the results soon.
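
For anyone wanting to repeat the third finding, the QMP request and the interesting part of its reply can be sketched in Python as follows. The reply layout (`return` / `model` / `props`) is my understanding of the QMP schema and should be treated as an assumption. The sample reply is hypothetical and trimmed, and the function names are illustrative.

```python
import json


def expansion_command(model_name: str = "max", expansion_type: str = "full") -> str:
    """Serialize a query-cpu-model-expansion QMP request as JSON."""
    return json.dumps({
        "execute": "query-cpu-model-expansion",
        "arguments": {"type": expansion_type, "model": {"name": model_name}},
    })


def unavailable_props(reply_json: str) -> list:
    """List boolean properties the reply reports as false (not available)."""
    props = json.loads(reply_json)["return"]["model"].get("props", {})
    return sorted(name for name, value in props.items() if value is False)


# Hypothetical, heavily trimmed reply; a real one lists many more properties.
SAMPLE_REPLY = (
    '{"return": {"model": {"name": "max", '
    '"props": {"svm": false, "sse2": true, "avx": true, "family": 21}}}}'
)
```

The request string could be sent over a QMP monitor socket after the usual capabilities handshake; `unavailable_props` then picks out flags such as `svm` that QEMU reports as unavailable.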
Comment 24 Eduardo Habkost 2017-07-04 22:49:37 EDT
The following QEMU patch can be used to disable the new CPU model probing QMP commands, and get back to the old libvirt code used for older QEMU versions:

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
---
Index: qemu-2.9.0/monitor.c
===================================================================
--- qemu-2.9.0.orig/monitor.c
+++ qemu-2.9.0/monitor.c
@@ -983,10 +983,12 @@ static void qmp_unregister_commands_hack
 #ifndef TARGET_ARM
     qmp_unregister_command(&qmp_commands, "query-gic-capabilities");
 #endif
-#if !defined(TARGET_S390X) && !defined(TARGET_I386)
+#if 1 || !defined(TARGET_S390X) && !defined(TARGET_I386)
+    /*XXX emergency workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1467599 */
     qmp_unregister_command(&qmp_commands, "query-cpu-model-expansion");
 #endif
-#if !defined(TARGET_S390X)
+#if 1 || !defined(TARGET_S390X)
+    /*XXX emergency workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1467599 */
     qmp_unregister_command(&qmp_commands, "query-cpu-model-baseline");
     qmp_unregister_command(&qmp_commands, "query-cpu-model-comparison");
 #endif
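
The effect of the patch, as seen by a management layer, is that the CPU-model commands no longer appear in the list QEMU advertises. The Python sketch below shows how a client might choose a probing strategy from a `query-commands` reply. It is only an illustration of the fallback idea and not libvirt's actual logic; the `"qmp"` and `"cpuid"` labels and the function names are made up for the example.

```python
import json

# Commands that are unregistered once the patch is applied
# (on x86 the last two were already absent before the patch).
CPU_MODEL_COMMANDS = (
    "query-cpu-model-expansion",
    "query-cpu-model-baseline",
    "query-cpu-model-comparison",
)


def supported_cpu_model_commands(query_commands_reply: str) -> set:
    """Return which CPU-model commands appear in a `query-commands` reply."""
    names = {entry["name"] for entry in json.loads(query_commands_reply)["return"]}
    return names & set(CPU_MODEL_COMMANDS)


def choose_probe_method(query_commands_reply: str) -> str:
    """Pick a host-CPU probing strategy (labels are illustrative only)."""
    if "query-cpu-model-expansion" in supported_cpu_model_commands(query_commands_reply):
        return "qmp"
    return "cpuid"
```

With a patched QEMU the expansion command is missing from the reply, so a client following this pattern would take the older CPUID-based path.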
Comment 25 Adam Williamson 2017-07-05 02:30:04 EDT
Tested and confirmed that patch does the trick, thanks Eduardo. I am running a 2.9.0-1.fc26.1 build (2.9.0-2 exists on Rawhide, for other reasons, and we don't want this change on Rawhide, so we need to distinguish between the streams) with this change now:

https://koji.fedoraproject.org/koji/taskinfo?taskID=20336656

will create an update once the build is done, and we can possibly run a compose with it.
Comment 26 Daniel Berrange 2017-07-05 04:51:18 EDT
(In reply to Adam Williamson from comment #21)
> So our choices to fix this are:
> 
> * Manage a successful backport of the libvirt fix to 3.2

This should be feasible, as RHEL-7.4 also ships 3.2 libvirt and has these fixes backported.

> * Try what Eduardo suggested in #c20
> * Update libvirt to 3.3.0 or 3.4.0
> * Downgrade qemu to a 2.8 release (probably 2.8.1.1), with option to update
> back to 2.9 if we fix this by backporting or updating libvirt
Comment 27 Matthew Miller 2017-07-05 08:39:18 EDT
So, I think at this point, if we want to block on this *and* get the release out on time, the plan would be:

1. Ship with Adam's qemu-kvm 2.9.0-1.fc26.1 with Eduardo's patch
2. Ship an update to libvirt with the fix backported to 3.2
3. Update qemu-kvm with the emergency patch removed, requiring qemu-kvm nevra >= the one in step 2.

... including calling for a new RC approximately right now, and then a bunch of overnight testing to validate the new RC.

Does that make sense?
Comment 28 Eduardo Habkost 2017-07-05 09:45:51 EDT
(In reply to Matthew Miller from comment #27)
> So, I think at this point, if we want to block on this *and* get the release
> out on time, the plan would be:
> 
> 1. Ship with Adam's qemu-kvm 2.9.0-1.fc26.1 with Eduardo's patch
> 2. Update ship an update to libvirt with the backport to 3.2
> 3. Update qemu-kvm with the emergency patch removed, [...]

Sounds good to me, but only if we can't backport the patches to libvirt like Dan suggested at comment 26 (which would be the best option IMO).

> [...] requiring qemu-kvm
> nevra >= the one in step 2.

Why is this part necessary?
Comment 29 Matthew Miller 2017-07-05 09:49:40 EDT
(In reply to Eduardo Habkost from comment #28)
> Sounds good to me, but only if we can't backport the patches to libvirt like
> Dan suggested at comment 26 (which would be the best option IMO).

The compose takes about 12 hours, so if we want to have *any* hope of validating it, we need an update in bodhi with the patches applied in an hour or so. If that's possible, awesome.

> > [...] requiring qemu-kvm
> > nevra >= the one in step 2.
> Why is this part necessary?

Because otherwise someone might just update qemu-kvm and not get the new libvirt, and get a broken system. They _shouldn't_, but people often do stuff like that. It's not any trouble to express the dependency, so we might as well.
Comment 30 Eduardo Habkost 2017-07-05 10:21:53 EDT
(In reply to Matthew Miller from comment #29)
> > > [...] requiring qemu-kvm
> > > nevra >= the one in step 2.
> > Why is this part necessary?
> 
> Because otherwise someone might just update qemu-kvm and not get the new
> libvirt, and get a broken system. They _shouldn't_, but people often do
> stuff like that. It's not any trouble to express the dependency, so we might
> as well.

I see.  Did you mean making qemu require libvirt >= the one in step 2?  qemu doesn't require libvirt today, so I'm not sure how to address that.
Comment 31 Matthew Miller 2017-07-05 10:51:17 EDT
(In reply to Eduardo Habkost from comment #30)
> I see.  Did you mean making qemu require libvirt >= the one in step 2?  qemu
> doesn't require libvirt today, so I'm not sure how to address that.

Ah. We can use the new "rich" dependencies:

Requires: (libvirt >= 3.2.1-4.fc26 if libvirt)

(I think. Or 3.2.2 or whatever.)
Comment 32 Dennis Gilmore 2017-07-05 10:58:07 EDT
Matthew Miller: the compose and update tools do not support rich dependencies; they are currently banned by FESCo.
Comment 33 Fedora Update System 2017-07-05 11:05:59 EDT
qemu-2.9.0-1.fc26.1 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-db0ad12bde
Comment 34 Adam Williamson 2017-07-05 14:50:58 EDT
*** Bug 1465747 has been marked as a duplicate of this bug. ***
Comment 35 Adam Williamson 2017-07-05 16:42:49 EDT
Re-assigning to qemu, as we are at least likely to take the qemu workaround for this initially. I will open a new bug to cover fixing it more 'correctly' in libvirt, and removing the qemu workaround.
Comment 36 Adam Williamson 2017-07-05 17:00:38 EDT
I have filed https://bugzilla.redhat.com/show_bug.cgi?id=1468043 to track an expected subsequent update to fix the problem 'properly' in libvirt, and remove the qemu workaround.
Comment 37 Adam Williamson 2017-07-06 13:58:51 EDT
Verified the fix in RC-1.5.
Comment 38 Fedora Update System 2017-07-06 14:22:14 EDT
qemu-2.9.0-1.fc26.1 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-db0ad12bde
Comment 39 Adam Williamson 2017-07-06 16:58:50 EDT
Discussed at 2017-07-06 Fedora 26 Final Go/No-Go meeting, acting as a blocker review meeting: https://meetbot-raw.fedoraproject.org/fedora-meeting-2/2017-07-06/f26_final_gono-go_meeting.2017-07-06-17.00.html . Accepted as a blocker as a conditional violation of the criteria cited in #c3, on affected systems (which so far seems to include all AMD systems tested).
Comment 40 Fedora Update System 2017-07-06 18:52:41 EDT
qemu-2.9.0-1.fc26.1 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.
Comment 41 Fedora Update System 2017-07-13 19:37:25 EDT
qemu-2.9.0-3.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-97c48d9a81
Comment 42 Fedora Update System 2017-07-14 18:55:28 EDT
qemu-2.9.0-3.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-97c48d9a81
Comment 43 Fedora Update System 2017-07-25 12:54:09 EDT
qemu-2.9.0-3.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.
Comment 44 Cole Robinson 2017-08-04 16:14:06 EDT
Reopening this because I'm going to drop the qemu workaround patch in the next update, now that the fix on the libvirt side is in the stable repos. People who were hitting this issue before: please verify that the upcoming qemu doesn't rebreak things.
Comment 45 Fedora Update System 2017-08-04 22:43:29 EDT
qemu-2.9.0-4.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-923c28037f
Comment 46 Fedora Update System 2017-08-07 02:24:07 EDT
qemu-2.9.0-4.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-923c28037f
Comment 47 Fedora Update System 2017-08-16 18:23:07 EDT
qemu-2.9.0-5.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-a314d15e62
Comment 48 Fedora Update System 2017-08-19 14:54:10 EDT
qemu-2.9.0-5.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-a314d15e62
Comment 49 Fedora Update System 2017-08-22 16:43:19 EDT
qemu-2.9.0-5.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.
