Bug 1199446
Summary: | HLE and RTM still listed in /usr/share/libvirt/cpu_map.xml for haswell and broadwell | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Paolo Bonzini <pbonzini> |
Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> |
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 7.1 | CC: | berrange, dyuan, ehabkost, honzhang, jdenemar, jmelvin, lhuang, mzhan, pbonzini, rbalakri, sbonazzo, stirabos, xwei |
Target Milestone: | rc | Keywords: | Upstream |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | libvirt-1.2.14-1.el7 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2015-11-19 06:19:30 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1035038, 1218673 |
Description
Paolo Bonzini
2015-03-06 10:27:06 UTC
This cannot be fixed until we have the ability to do per-machine-type CPU models. Removing features from an existing CPU model would change the guest ABI and break migration feature compatibility checking. So while it is unfortunate that the CPU is detected as Sandybridge, at the end of the day the model names are just shortcuts for a set of feature flags, so having the model name show up as Sandybridge instead of Haswell is less bad than changing the guest ABI.

In other words, we need bug 824989 to be resolved first (and for that we need QEMU support). And this is definitely not something we could backport to 7.1.z.

I partially agree with Daniel: model names are just shortcuts for a set of feature flags, but calling it 'sandybridge' could be a bit ambiguous in users' eyes, because that CPU is still something more than a Sandybridge, even though it is now something less than Haswell as previously defined. So adding an intermediate label (something like haswell-post) could be a good idea to identify it correctly while still avoiding any issues on migration between haswell and haswell-post.

In reading the Wikipedia article at http://en.wikipedia.org/wiki/Transactional_Synchronization_Extensions a thought occurs:

  "In August 2014 Intel announced a bug in the TSX implementation on current steppings of Haswell, Haswell-E, Haswell-EP and early Broadwell CPUs, which resulted in disabling the TSX feature on affected CPUs via a microcode update.[9][10]"

Note that this erratum is said to disable the feature flags in /current/ shipping CPUs. IOW there is a non-zero chance that Intel may later ship Broadwell CPUs which have the features enabled once again. This actually suggests to me that using machine-type-versioned CPU models for this problem would be the wrong solution. Instead we should probably consider treating these as two completely separate CPU models.
If we did that, I'd suggest including the errata date in the name to future-proof ourselves, in the unlikely event that later errata disable more features. So we could define a completely new model "SandyBridgeErrata201409" to indicate a Sandybridge model with the set of features implied by the 2014/09 errata. This approach would be something we could consider z-streaming. It would have to go into upstream libvirt first, though, so we don't end up with RHEL-only CPU models.

Moreover, we'd ideally need this new model to be added to QEMU, otherwise we'd end up translating it to the closest model, which may not work as expected. While libvirt treats models just as shortcuts for sets of features, they are actually more than that. QEMU sets various other things (model, stepping, ...) too, and they are significant.

In the meantime QEMU fixed this issue by removing the two features from the Haswell and Broadwell models for pc-*-2.3 machine types.

Couldn't libvirt simply use '-cpu Haswell,-hle,-rtm' when invoking QEMU, avoiding the need for new CPU models? Of course, given that QEMU deleted the features from those machine types, we'd also need to now use '-cpu Haswell,+hle,+rtm' to ensure QEMU actually gives us the features we want in the case where people are running without the microcode update applied. I wish QEMU hadn't changed their models in that way, as it isn't future-proof at all :-(

(In reply to Daniel Berrange from comment #6)
> Couldn't libvirt simply use '-cpu Haswell,-hle,-rtm' when invoking QEMU,
> avoiding the need for new CPU models.

That's what would likely happen because Haswell would be the closest model. However, it doesn't have to be always like that. In case a new model is created which libvirt would find to be closer (it needs a shorter list of additional features to be specified on the command line), it would use it. This could happen even without a new model in case some additional features were enabled/disabled in domain XML.
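A rough sketch of the "closest model" choice described in the reply above, under loud assumptions: each model is reduced to a flat set of feature names (a hypothetical, heavily trimmed stand-in for cpu_map.xml), vendor/family/model/stepping are ignored, and the score is simply the number of `+feature`/`-feature` flags the QEMU `-cpu` argument would need. The names `MODELS`, `closest_model` and `qemu_cpu_arg` are made up for illustration; this is not libvirt's actual code, and it does not model host CPU detection, which is what reported these hosts as Sandybridge.

```python
# Hypothetical, heavily trimmed feature sets; the real cpu_map.xml lists many more.
SANDYBRIDGE = frozenset({"sse4.2", "avx", "aes", "xsave"})
HASWELL_EXTRA = frozenset({"fma", "pcid", "movbe", "fsgsbase", "bmi1", "avx2",
                           "smep", "bmi2", "erms", "invpcid"})
TSX = frozenset({"hle", "rtm"})

MODELS = {
    "SandyBridge": SANDYBRIDGE,
    "Haswell": SANDYBRIDGE | HASWELL_EXTRA | TSX,
}


def closest_model(host, models):
    """Return (name, added, removed) for the model needing the fewest +/- flags."""
    best = None
    for name, feats in models.items():
        added = sorted(host - feats)    # features to enable with +flag
        removed = sorted(feats - host)  # features to disable with -flag
        cost = len(added) + len(removed)
        if best is None or cost < best[0]:  # first model wins on a tie
            best = (cost, name, added, removed)
    _, name, added, removed = best
    return name, added, removed


def qemu_cpu_arg(name, added, removed):
    """Build a '-cpu' argument value such as 'Haswell,-hle,-rtm'."""
    flags = [f"+{f}" for f in added] + [f"-{f}" for f in removed]
    return ",".join([name] + flags)


if __name__ == "__main__":
    # A Haswell host after the microcode update: everything except HLE/RTM.
    host = SANDYBRIDGE | HASWELL_EXTRA
    print(qemu_cpu_arg(*closest_model(host, MODELS)))
    # With a closer model available, it wins with no extra flags at all.
    with_notsx = {**MODELS, "Haswell-noTSX": SANDYBRIDGE | HASWELL_EXTRA}
    print(qemu_cpu_arg(*closest_model(host, with_notsx)))
```

On the trimmed data, the first call picks Haswell with two removals rather than SandyBridge with ten additions; adding the hypothetical `Haswell-noTSX` entry makes it the closest match, which is the effect described above.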
> Of course given that QEMU deleted the features from those machine types, we'd
> also need to now use '-cpu Haswell,+hle,+rtm' to ensure QEMU actually gives
> us the features we want in the case where people are running without the
> microcode update applied.

Yeah, this wouldn't work at all, because we'd think Haswell still has the features, so we'd use just plain Haswell.

(In reply to Jiri Denemark from comment #7)
> (In reply to Daniel Berrange from comment #6)
> > Couldn't libvirt simply use '-cpu Haswell,-hle,-rtm' when invoking QEMU,
> > avoiding the need for new CPU models.
>
> That's what would likely happen because Haswell would be the closest model.
> However, it doesn't have to be always like that. In case a new model is
> created which libvirt would find to be closer (it needs a shorter list of
> additional features to be specified on the command line), it would use it.
> This could happen even without a new model in case some additional features
> were enabled/disabled in domain XML.
>
> > Of course given that QEMU deleted the features from those machine types, we'd
> > also need to now use '-cpu Haswell,+hle,+rtm' to ensure QEMU actually gives
> > us the features we want in the case where people are running without the
> > microcode update applied.
>
> Yeah, this wouldn't work at all. Because we'd think Haswell still has the
> features so we'd use just plain Haswell.

Right, our logic is based on the assumption that QEMU's list of features is the same as libvirt's list of features. It seems we need to explicitly stop making that assumption and have separate XML CPU maps, one reflecting libvirt's canonical view and the other reflecting QEMU's view. Or, in the short term, perhaps just special-case the code for Haswell to add these flags to the command line when we see they are needed to work around QEMU?

Sure, the ultimate solution is to be able to probe QEMU for features assigned to individual models (i.e., bug 824989).
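The "separate XML CPU maps" idea above (one map for libvirt's canonical view, one for QEMU's view per machine type) can be sketched as a diff between the two views. Everything here is hypothetical: the dictionaries stand in for the two maps, the feature sets are trimmed, and the machine-type names only illustrate the pc-*-2.3 change mentioned earlier. The interesting case is `+hle,+rtm`: if QEMU's `Haswell` on a newer machine type lacks TSX, libvirt has to add the flags back explicitly, which it cannot do while it assumes both views are identical.

```python
# Hypothetical, trimmed feature sets for one model name as seen by each side.
LIBVIRT_VIEW = {"Haswell": frozenset({"avx2", "bmi2", "fma", "hle", "rtm"})}

# QEMU's view, keyed by machine type (illustrative names only).
QEMU_VIEW = {
    "pc-i440fx-2.2": {"Haswell": frozenset({"avx2", "bmi2", "fma", "hle", "rtm"})},
    "pc-i440fx-2.3": {"Haswell": frozenset({"avx2", "bmi2", "fma"})},
}


def cpu_arg(model, machine):
    """Flags needed so QEMU's `model` matches libvirt's definition of it."""
    want = LIBVIRT_VIEW[model]
    have = QEMU_VIEW[machine][model]
    flags = [f"+{f}" for f in sorted(want - have)]   # QEMU dropped these
    flags += [f"-{f}" for f in sorted(have - want)]  # QEMU added these
    return ",".join([model] + flags)


if __name__ == "__main__":
    for machine in QEMU_VIEW:
        print(machine, cpu_arg("Haswell", machine))
```

With a single shared map, the diff is always empty and plain `Haswell` is passed, which is exactly the failure described in the reply above.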
We could add various hacks and handle the two CPU models in a special way, but ideally both QEMU and libvirt would introduce new models. What do you think, Eduardo/Paolo? It should still be possible before 2.3 hard freezes, shouldn't it?

In theory Daniel is right. In practice, there is not much that you can do here if the processor pulls the features out from under your feet.

If the features were removed from cpu_map.xml, a libvirt update would remove HLE/RTM features from VMs that are started, just like upgrading the microcode would do. The difference is that the libvirt update would not break migration, while upgrading the microcode does.

So perhaps we were even too cautious when QEMU limited the removal of HLE/RTM to new machine types. Intel has no intention of restoring HLE and RTM in Haswell and Broadwell (probably they cannot even do that!), so the outcome is just that "-cpu Haswell,enforce -M some-old-machine-type" will never work.

> restoring HLE and RTM in Haswell and Broadwell
Correction: in the current steppings of HSW and BDW.
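The `enforce` question above can be modelled with a toy function. This is a sketch under stated assumptions, not QEMU's implementation: a CPU model is a set of feature names; with `enforce`, any requested feature missing on the host aborts startup; without it, the missing features are dropped from what the guest sees. `start_guest`, `CpuStartError` and the feature sets are hypothetical.

```python
class CpuStartError(Exception):
    """Raised when a requested CPU model cannot be provided as-is."""


def start_guest(requested, host, enforce):
    """Return the feature set the guest would see (toy model).

    enforce=True: any requested feature the host lacks aborts startup.
    enforce=False: missing features are dropped from the guest's view.
    """
    missing = requested - host
    if missing and enforce:
        raise CpuStartError("host lacks: " + ", ".join(sorted(missing)))
    return requested & host


HASWELL = frozenset({"avx2", "bmi1", "bmi2", "hle", "rtm"})  # trimmed
HOST_AFTER_MICROCODE = frozenset({"avx2", "bmi1", "bmi2"})   # TSX disabled

if __name__ == "__main__":
    seen = start_guest(HASWELL, HOST_AFTER_MICROCODE, enforce=False)
    print(sorted(seen))  # the guest loses hle and rtm
    try:
        start_guest(HASWELL, HOST_AFTER_MICROCODE, enforce=True)
    except CpuStartError as err:
        print("refused:", err)
```

Which branch is preferable is the disagreement in this thread: refusing to start protects the guest ABI, while dropping features mirrors what the microcode update already does on bare metal.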
(In reply to Paolo Bonzini from comment #10)
> In theory Daniel is right. In practice, here there is not much that you can
> do if the processor pulls the features from under your feet.
>
> If the features were removed from cpu_map.xml, a libvirt update would remove
> HLE/RTM features from VMs that are started, just like upgrading the
> microcode would do. The difference is that the libvirt update would not
> break migration, while upgrading the microcode does.
>
> So perhaps we were even too cautious when QEMU limited the removal of
> HLE/RTM to new machine types. Intel has no intention of restoring HLE and
> RTM in Haswell and Broadwell (probably they cannot even do that!), so the
> outcome is just that "-cpu Haswell,enforce -M some-old-machine-type" will
> never work.

While they cannot restore it in currently shipping silicon, it is entirely possible they could do a new stepping of the silicon, so future manufactured Broadwell chips may support it once more. There is also the issue that not everyone applies the microcode updates, so even with current silicon we must not assume the features are disabled. So if we change current CPU models, we're just moving the breakage from one group of people to another group of people. So I think we must treat these as separately named CPU models.

> even with current silicon we must not assume the features are disabled
> [...] So if we change current CPU models we're just moving the breakage from
> one group of people to another group of people.
What breakage are we introducing? That is, what breaks if you forcibly disable the features for everyone and for all machine types?
(In reply to Paolo Bonzini from comment #13)
> What breakage are we introducing? That is, what breaks if you forcibly
> disable the features for everyone and for all machine types?

If there are two hosts, one with the features removed and one without, and we do a compatibility check, we'd see them as compatible even though they're different.

(In reply to Paolo Bonzini from comment #13)
> > even with current silicon we must not assume the features are disabled
> > [...] So if we change current CPU models we're just moving the breakage from
> > one group of people to another group of people.
>
> What breakage are we introducing? That is, what breaks if you forcibly
> disable the features for everyone and for all machine types?

There are also people on 6.6 with the older microcode, where the hle and rtm features were still active, and maybe they are not going to update soon. If they start a Haswell VM on such a host, they can have issues migrating it to a newer 7.1 host if the feature-reduced CPU is identified as Haswell as well. So identifying the feature-reduced model as SandyBridge is not correct, but at least it prevents migration issues. Having a new label for the reduced Haswell CPU could be a solution as well. Dealing with each specific CPU capability individually is a mess, because in a higher-level product such as RHEV-M you need a tag to identify a group of hosts with common characteristics; otherwise it would be far too fine-grained.

(In reply to Daniel Berrange from comment #12)
> (In reply to Paolo Bonzini from comment #10)
> > In theory Daniel is right. In practice, here there is not much that you can
> > do if the processor pulls the features from under your feet.
> >
> > If the features were removed from cpu_map.xml, a libvirt update would remove
> > HLE/RTM features from VMs that are started, just like upgrading the
> > microcode would do. The difference is that the libvirt update would not
> > break migration, while upgrading the microcode does.
> >
> > So perhaps we were even too cautious when QEMU limited the removal of
> > HLE/RTM to new machine types. Intel has no intention of restoring HLE and
> > RTM in Haswell and Broadwell (probably they cannot even do that!), so the
> > outcome is just that "-cpu Haswell,enforce -M some-old-machine-type" will
> > never work.

This is on purpose. If somebody is asking for -cpu Haswell,enforce -M old_machine_type, we need to tell the user that they can't run that configuration anymore (instead of changing ABI under the guest's feet).

>
> While they cannot restore it in currently shipping silicon, it is entirely
> possible they could do a new stepping of the silicon so future manufactured
> Broadwell chips may support it once more. There is also the issue that not
> everyone applies the microcode updates, so even with current silicon we
> must not assume the features are disabled. So if we change current CPU
> models we're just moving the breakage from one group of people to another
> group of people. So I think we must treat these as separately named CPU
> models

Having different CPU model names was exactly the first proposal I made years ago for dealing with CPU model changes. It was rejected in favor of simply making machine-type code change CPU behavior. That's exactly what QEMU does for other devices that need to implement compatibility behavior. We don't have "virtio-blk-2.1" and "virtio-blk-2.0"; we just have "virtio-blk" with different defaults on pc-2.1 and pc-2.0.

I agree that separate CPU model names would be simpler for software higher in the stack, but they would make things more complex for anybody having to choose between lots of versions of CPU models. The main point of CPU model changes is to have reasonable defaults on the latest machine type. Then, instead of having a single CPU model namespace full of legacy models that are there just for compatibility, we just treat the machine types as separate CPU model namespaces.
The difference is that libvirt ignores the low-level details of most devices that implement compatibility mode in older machine types, but it still likes to be aware of the low-level details of CPU models so it can let the user enable/disable individual CPU features. That can be solved by a proper CPU model probing interface that would make cpu_map.xml obsolete.

Also note that libvirt can already solve this without a probing interface by having a machine-type-aware cpu_map.xml. libvirt already duplicates information from QEMU in cpu_map.xml (because of the lack of a CPU model probing interface); it could just start doing it in a more accurate way instead of assuming that CPU models never change (which was never true).

> If somebody is asking for -cpu Haswell,enforce -M old_machine_type, we need to
> tell the user that they can't run that configuration anymore (instead of
> changing ABI under the guest's feet).
Understood, but I wonder if we're being more royalist than the king. Intel decided that the microcode can change under a running (bare metal) machine's feet, which had some pretty bad effects...
(In reply to Paolo Bonzini from comment #17)
> > If somebody is asking for -cpu Haswell,enforce -M old_machine_type, we need to
> > tell the user that they can't run that configuration anymore (instead of
> > changing ABI under the guest's feet).
>
> Understood, but I wonder if we're being more royalist than the king. Intel
> decided that the microcode can change under a running (bare metal) machine's
> feet, which had some pretty bad effects...

Well, if we decide that this is acceptable in this specific case, then we are lucky: libvirt doesn't use the "enforce" flag yet, so that (changing CPUID under the guest's feet) is already exactly what will happen. We would just need to change libvirt to not complain if HLE/RTM are unavailable (libvirt could do that in a more explicit way by changing cpu_map.xml, or by adding special cases to the existing code that checks the host CPUID data against cpu_map.xml).

(In reply to Eduardo Habkost from comment #16)
> >
> > While they cannot restore it in currently shipping silicon, it is entirely
> > possible they could do a new stepping of the silicon so future manufactured
> > Broadwell chips may support it once more. There is also the issue that not
> > everyone applies the microcode updates, so even with current silicon we
> > must not assume the features are disabled. So if we change current CPU
> > models we're just moving the breakage from one group of people to another
> > group of people. So I think we must treat these as separately named CPU
> > models
>
> Having different CPU model names was exactly the first proposal I made years
> ago, for dealing with CPU model changes. It was rejected in favor of simply
> making machine-type code change CPU behavior. That's exactly what QEMU does
> for other devices that need to implement compatibility behavior. We don't
> have "virtio-blk-2.1" and "virtio-blk-2.0", we just have "virtio-blk" with
> different defaults on pc-2.1 and pc-2.0.
I think this is a different scenario from the other cases. Tying device defaults to machine types is primarily a mechanism by which we fix bugs in a safe manner, or introduce new features without breaking compatibility. That is all about guest ABI <-> QEMU emulator feature support capabilities.

What we have here is an issue where the physical hardware the guest is running on has changed. Like it or not, from now on we are going to have two different hardware CPU models: old CPUs without the microcode update on one side, and new CPUs with the fix or old CPUs with the microcode update on the other. I don't think it is correct to associate changes in the underlying physical hardware with the guest machine type, as that ties the use of that machine type to a specific hardware platform configuration, which is fundamentally wrong IMHO. That is why I think we must treat these as two separately named CPU models. I think we would have to do this in this kind of scenario regardless of whether we have the ability to version CPUs per machine type or not.

In fact I tend to think that the idea of associating CPU models with machine types at all is fundamentally broken. If you have a guest running with a particular CPU model and a particular machine type, you should be able to change the guest's machine type at will, in order to get access to QEMU guest hardware bug fixes or improvements. By allowing CPU model changes per machine type, you prevent this, because changing the machine type may in turn mean the guest can no longer boot on that host. I think that's a very bad thing, because it creates a dependency from machine types to host hardware.

To provide another view on this issue: I talked to the oVirt guys, and they are actually happy that libvirt did not remove the features from existing models, because that prevents domains from being migrated to a host with the updated CPU. They are afraid that changing the guest CPU during migration could make some applications unhappy.
And since nobody can predict what weird applications users run in their guests, it's impossible to say this is not an issue. However, they'd like the host CPUs to be treated as more than just SandyBridge, so the ideal solution for them is the new CPU models.

Patch sent upstream for review: https://www.redhat.com/archives/libvir-list/2015-March/msg01188.html

In addition to the upstream patch, we will need a RHEL-only hack to handle the Haswell/Broadwell per-machine-type changes backported to RHEL.

The new *-noTSX models were added upstream:

    commit c563b50605ae9895b981d198e11dbe9f6e18027b
    Author: Jiri Denemark <jdenemar>
    Date:   Mon Mar 23 17:19:28 2015 +0100

        cpu: Add {Haswell,Broadwell}-noTSX CPU models

        QEMU 2.3 adds these new models to cover Haswell and Broadwell CPUs with
        updated microcode. Luckily, they also reverted former the machine type
        specific changes to existing models. And since these changes were never
        released, we don't need to hack around them in libvirt.

        Signed-off-by: Jiri Denemark <jdenemar>

    commit 53c8062f7eb7028043a314f2f18bf34ff0e82f0d
    Author: Jiri Denemark <jdenemar>
    Date:   Tue Mar 24 14:12:07 2015 +0100

        qemu: Give hint about -noTSX CPU model

        Because of the microcode update to Haswell/Broadwell CPUs, existing
        domains using these CPUs may fail to start even though they used to run
        just fine. To help users solve this issue we try to suggest switching to
        -noTSX variant of the CPU model:

            virsh # start cd
            error: Failed to start domain cd
            error: unsupported configuration: guest and host CPU are not compatible: Host CPU does not provide required features: rtm, hle; try using 'Haswell-noTSX' CPU model

        Signed-off-by: Jiri Denemark <jdenemar>

I can reproduce this bug in libvirt-1.2.13-1.el7.x86_64:

1. prepare a Haswell machine with hle and rtm removed
2. check cpu_map.xml:

    ...
    <model name='Haswell'>
      <model name='SandyBridge'/>
      <feature name='fma'/>
      <feature name='pcid'/>
      <feature name='movbe'/>
      <feature name='fsgsbase'/>
      <feature name='bmi1'/>
      <feature name='hle'/>
      <feature name='avx2'/>
      <feature name='smep'/>
      <feature name='bmi2'/>
      <feature name='erms'/>
      <feature name='invpcid'/>
      <feature name='rtm'/>
    </model>
    <model name='Broadwell'>
      <model name='Haswell'/>
      <feature name='3dnowprefetch'/>
      <feature name='rdseed'/>
      <feature name='adx'/>
      <feature name='smap'/>
    </model>
    ...

3. check capabilities:

    virsh # capabilities
    <capabilities>
      <host>
        <uuid>001320fb-6b42-0013-20fb-6b42001320fb</uuid>
        <cpu>
          <arch>x86_64</arch>
          <model>SandyBridge</model>
          <vendor>Intel</vendor>

4. try to start a guest with the Haswell CPU model:

    # virsh dumpxml test4
    <cpu mode='custom' match='exact'>
      <model fallback='allow'>Haswell</model>
      <numa>
        <cell id='0' cpus='0' memory='1024000' unit='KiB'/>
      </numa>
    </cpu>

    # virsh start test4
    error: Failed to start domain test4
    error: unsupported configuration: guest and host CPU are not compatible: Host CPU does not provide required features: rtm, hle

And verified this bug with libvirt-1.2.17-6.el7.x86_64:

1. check the cpu_map.xml:

    <model name='Haswell-noTSX'>
      <model name='SandyBridge'/>
      <feature name='fma'/>
      <feature name='pcid'/>
      <feature name='movbe'/>
      <feature name='fsgsbase'/>
      <feature name='bmi1'/>
      <feature name='avx2'/>
      <feature name='smep'/>
      <feature name='bmi2'/>
      <feature name='erms'/>
      <feature name='invpcid'/>
    </model>
    <model name='Haswell'>
      <model name='Haswell-noTSX'/>
      <feature name='hle'/>
      <feature name='rtm'/>
    </model>
    <model name='Broadwell-noTSX'>
      <model name='Haswell-noTSX'/>
      <feature name='3dnowprefetch'/>
      <feature name='rdseed'/>
      <feature name='adx'/>
      <feature name='smap'/>
    </model>
    <model name='Broadwell'>
      <model name='Broadwell-noTSX'/>
      <feature name='hle'/>
      <feature name='rtm'/>
    </model>

2. check capabilities (libvirt shows the model is Haswell-noTSX):

    virsh # capabilities
    <capabilities>
      <host>
        <uuid>001320fb-6b42-0013-20fb-6b42001320fb</uuid>
        <cpu>
          <arch>x86_64</arch>
          <model>Haswell-noTSX</model>
          <vendor>Intel</vendor>
          <topology sockets='1' cores='4' threads='1'/>
    ...

3. try to start a guest with Haswell; this now gives a clear error:

    # virsh dumpxml test4
    <cpu mode='custom' match='exact'>
      <model fallback='allow'>Haswell</model>
      <numa>
        <cell id='0' cpus='0' memory='1024000' unit='KiB'/>
      </numa>
    </cpu>

    # virsh start test4
    error: Failed to start domain test4
    error: unsupported configuration: guest and host CPU are not compatible: Host CPU does not provide required features: rtm, hle; try using 'Haswell-noTSX' CPU model

4. change the cpu mode to host-model and let libvirt detect the CPU model:

    # virsh dumpxml test4 --inactive
    ...
    <cpu mode='host-model'>
      <model fallback='allow'/>
      <numa>
        <cell id='0' cpus='0' memory='1024000' unit='KiB'/>
      </numa>
    </cpu>
    ...

    # virsh dumpxml test4 --update-cpu
    <cpu mode='host-model' match='exact'>
      <model fallback='allow'>Haswell-noTSX</model>
      <vendor>Intel</vendor>
      <feature policy='require' name='abm'/>
      <feature policy='require' name='pdpe1gb'/>
      <feature policy='require' name='rdrand'/>
      <feature policy='require' name='f16c'/>
      <feature policy='require' name='osxsave'/>
      <feature policy='require' name='pdcm'/>
      <feature policy='require' name='xtpr'/>
      <feature policy='require' name='tm2'/>
      <feature policy='require' name='est'/>
      <feature policy='require' name='smx'/>
      <feature policy='require' name='vmx'/>
      <feature policy='require' name='ds_cpl'/>
      <feature policy='require' name='monitor'/>
      <feature policy='require' name='dtes64'/>
      <feature policy='require' name='pbe'/>
      <feature policy='require' name='tm'/>
      <feature policy='require' name='ht'/>
      <feature policy='require' name='ss'/>
      <feature policy='require' name='acpi'/>
      <feature policy='require' name='ds'/>
      <feature policy='require' name='vme'/>
      <numa>
        <cell id='0' cpus='0' memory='1024000' unit='KiB'/>
      </numa>
    </cpu>

    # ps aux|grep test4
    qemu 31142 6.7 0.4 1379912 35988 ? Sl 03:43 0:09 /usr/libexec/qemu-kvm -name test4 -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off -cpu Haswell-noTSX,+abm,+pdpe1gb,+rdrand,+f16c,+osxsave,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme...

5. change the cpu mode to custom and retest:

    # virsh dumpxml test4
    ...
    <cpu mode='custom' match='exact'>
      <model fallback='forbid'>Haswell-noTSX</model>
      <numa>
        <cell id='0' cpus='0' memory='1024000' unit='KiB'/>
      </numa>
    </cpu>
    ...

    # ps aux|grep test4
    qemu 31533 59.4 0.4 1379912 34920 ? Sl 05:18 0:02 /usr/libexec/qemu-kvm -name test4 -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off -cpu Haswell-noTSX ...

6. Also tested with a Broadwell machine with the hle and rtm features removed; it works as expected.

Verified this bug; the steps are in comment 27.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2202.html
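As a sketch of how the verified cpu_map.xml fragment composes models, the code below resolves the nested `<model name=.../>` references into flat feature sets and derives a hint like the one added by the "Give hint about -noTSX CPU model" commit quoted above. Assumptions, labelled as such: the fragment is wrapped in a hypothetical `<cpus>` root (the real file's outer structure is not shown in this bug), SandyBridge is treated as an empty base because its definition is not shown here, only the Haswell models are included, and the `<model>-noTSX` hint rule in `check_guest_cpu` is a guess modelled on the commit message rather than libvirt's actual logic. Features are sorted, so the message lists `hle, rtm` rather than libvirt's `rtm, hle`.

```python
import xml.etree.ElementTree as ET

# Haswell definitions copied from the verification step, wrapped in a
# hypothetical <cpus> root. SandyBridge's real definition is not shown in
# the bug, so it is an empty placeholder here (an assumption).
CPU_MAP_XML = """
<cpus>
  <model name='SandyBridge'/>
  <model name='Haswell-noTSX'>
    <model name='SandyBridge'/>
    <feature name='fma'/>
    <feature name='pcid'/>
    <feature name='movbe'/>
    <feature name='fsgsbase'/>
    <feature name='bmi1'/>
    <feature name='avx2'/>
    <feature name='smep'/>
    <feature name='bmi2'/>
    <feature name='erms'/>
    <feature name='invpcid'/>
  </model>
  <model name='Haswell'>
    <model name='Haswell-noTSX'/>
    <feature name='hle'/>
    <feature name='rtm'/>
  </model>
</cpus>
"""


def load_models(xml_text):
    """Map each top-level model name to its fully resolved feature set."""
    root = ET.fromstring(xml_text.strip())
    defs = {m.get("name"): m for m in root.findall("model")}  # direct children
    cache = {}

    def resolve(name):
        if name not in cache:
            elem = defs[name]
            feats = set()
            parent = elem.find("model")  # inherited base model, if any
            if parent is not None:
                feats |= resolve(parent.get("name"))
            feats |= {f.get("name") for f in elem.findall("feature")}
            cache[name] = frozenset(feats)
        return cache[name]

    return {name: resolve(name) for name in defs}


def check_guest_cpu(model, host, models):
    """Return None if the host provides `model`, else an error with a hint."""
    missing = models[model] - host
    if not missing:
        return None
    msg = "Host CPU does not provide required features: " + ", ".join(sorted(missing))
    alt = model + "-noTSX"  # hypothetical hint rule
    if alt in models and models[alt] <= host:
        msg += f"; try using '{alt}' CPU model"
    return msg


if __name__ == "__main__":
    models = load_models(CPU_MAP_XML)
    host = models["Haswell-noTSX"]  # a Haswell host with TSX disabled
    print(check_guest_cpu("Haswell", host, models))
    print(check_guest_cpu("Haswell-noTSX", host, models))
```

Defining `Haswell` as `Haswell-noTSX` plus `hle` and `rtm` keeps the existing model's feature set (and so the guest ABI) unchanged, while giving updated hosts a name that matches them exactly.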