Description of problem:
Enable intel-pt on the host and boot a guest with cpu mode=host-model. The intel-pt flag is not added to the qemu cli.

Version-Release number of selected component (if applicable):
qemu-kvm-4.2.0-28.module+el8.2.1+7211+16dfe810
libvirt-client-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64
kernel-4.18.0-193.11.1.el8_2.x86_64

How reproducible:
always

Steps to Reproduce:
1. Set pt_mode=1 on the Icelake host first:
# rmmod kvm_intel
# modprobe kvm_intel pt_mode=1
2. Boot the guest with libvirt cpu mode=host-model:
<cpu mode='host-model'/>
3. Check the qemu cli.

Actual results:
The intel-pt flag is not in the qemu cli:
-machine pc-q35-rhel8.2.0,accel=kvm,usb=off,dump-guest-core=off \
-cpu Icelake-Server,ss=on,vmx=on,hypervisor=on,tsc-adjust=on,avx512ifma=on,sha-ni=on,md-clear=on,stibp=on,arch-capabilities=on,xsaves=on,ibpb=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,mpx=off,hv-time,hv-vapic,hv-spinlocks=0x1000

Expected results:
Option "intel-pt=on" should be added to the qemu cli.

Additional info:
Reproduces with both pc and q35 machine types.
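The "check qemu cli" step can be scripted. The following is a minimal sketch, assuming the -cpu value has the comma-separated "model,flag=value,..." shape seen in this report. The helper names parse_cpu_arg and cpu_feature_state are hypothetical (not part of libvirt or QEMU), and treating a bare flag such as hv-time as "on" is an assumption of this sketch.

```python
def parse_cpu_arg(cpu_arg):
    """Split a QEMU -cpu value into (model, {feature: value})."""
    model, *props = cpu_arg.split(",")
    features = {}
    for prop in props:
        # Bare flags such as "hv-time" carry no "=value" part;
        # this sketch assumes they mean "on".
        name, sep, value = prop.partition("=")
        features[name] = value if sep else "on"
    return model, features


def cpu_feature_state(cpu_arg, feature):
    """Return the value given for a feature, or None if it is not listed."""
    _, features = parse_cpu_arg(cpu_arg)
    return features.get(feature)


# The -cpu value from the "Actual results" of this report.
reported = (
    "Icelake-Server,ss=on,vmx=on,hypervisor=on,tsc-adjust=on,avx512ifma=on,"
    "sha-ni=on,md-clear=on,stibp=on,arch-capabilities=on,xsaves=on,ibpb=on,"
    "amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,"
    "pschange-mc-no=on,mpx=off,hv-time,hv-vapic,hv-spinlocks=0x1000"
)

print(cpu_feature_state(reported, "intel-pt"))  # None: the flag is absent
print(cpu_feature_state(reported, "mpx"))       # off
```

A result of None means the flag is simply not on the command line, which is different from an explicit intel-pt=off.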
Adding the Regression keyword, since this test passed before in https://bugzilla.redhat.com/show_bug.cgi?id=1526581#c7.
We need more information on what information QEMU is returning through QMP, to know where exactly is the problem.

I suggest the following:

1) adding the following to the /etc/libvirt/libvirtd.conf:

log_filters="1:libvirt 1:util 1:qemu"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"

2) delete /var/cache/libvirt/qemu/capabilities/*

3) restart libvirtd

4) reproduce the bug

5) Run:

# grep QEMU_MONITOR /var/log/libvirt/libvirtd.log | grep -3 query-cpu-model-expansion

Please also attach /var/log/libvirt/libvirtd.log to the BZ, just in case there's other information useful for debugging the problem.
There's a missing step in the instructions below:

(In reply to Eduardo Habkost from comment #2)
[...]
> 2) delete /var/cache/libvirt/qemu/capabilities/*

2.1) reload kvm_intel with pt_mode=1

# rmmod kvm_intel
# modprobe kvm_intel pt_mode=1

>
> 3) restart libvirtd
[...]

Maybe we are dealing with two separate issues here: intel_pt not being enabled in some circumstances, and also lack of invalidation of QEMU capabilities cache when the kernel command line or kvm module arguments change.
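Before re-probing capabilities, it can help to confirm that the module reload actually took effect. This is a minimal sketch, assuming the parameter is exposed at /sys/module/kvm_intel/parameters/pt_mode (the usual sysfs location for module parameters, not confirmed in this report). The function names are hypothetical.

```python
from pathlib import Path

# Assumed sysfs location of the kvm_intel pt_mode parameter.
PT_MODE_PATH = Path("/sys/module/kvm_intel/parameters/pt_mode")


def parse_pt_mode(text):
    """Parse the content of the sysfs file (a number plus newline)."""
    return int(text.strip())


def read_pt_mode(path=PT_MODE_PATH):
    """Return the current pt_mode, or None if the file does not exist
    (for example because kvm_intel is not loaded)."""
    try:
        return parse_pt_mode(Path(path).read_text())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    print("pt_mode =", read_pt_mode())
```

If this reports 1 while libvirt still omits intel-pt, the stale capabilities cache described above is the more likely cause.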
Created attachment 1700100 [details] libvirtd.log
Created attachment 1700101 [details] output of step 5)
(In reply to Eduardo Habkost from comment #2)
> We need more information on what information QEMU is returning through QMP,
> to know where exactly is the problem.
>
> I suggest the following:

Thanks for the instructions.

>
> 1) adding the following to the /etc/libvirt/libvirtd.conf:
>
> log_filters="1:libvirt 1:util 1:qemu"
> log_outputs="1:file:/var/log/libvirt/libvirtd.log"
>
> 2) delete /var/cache/libvirt/qemu/capabilities/*
>
> 3) restart libvirtd
>
> 4) reproduce the bug
>
> 5) Run:
>
> # grep QEMU_MONITOR /var/log/libvirt/libvirtd.log | grep -3
> query-cpu-model-expansion
>

The output is large, so I attached it in comment 5.

>
> Please also attach /var/log/libvirt/libvirtd.log to the BZ, just in case
> there's other information useful for debugging the problem.

Sure, attached in comment 4. Thanks.
From attachment 1700101 [details]:

2020-07-07 03:38:38.414+0000: 47276: info : qemuMonitorIOWrite:453 : QEMU_MONITOR_IO_WRITE: mon=0x7efc10021850 buf={"execute":"query-cpu-model-expansion","arguments":{"type":"static","model":{"name":"max","props":{"migratable":false}}},"id":"libvirt-4"}
2020-07-07 03:38:38.417+0000: 47276: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7efc10021850 reply={"return": {"model": {"name": "base", "props": {[...] "intel-pt": false, [...] }}}, "id": "libvirt-4"}

It looks like QEMU is telling libvirt that intel-pt is not available.
(In reply to Eduardo Habkost from comment #7)
> From attachment 1700101 [details]:
>
> 2020-07-07 03:38:38.414+0000: 47276: info : qemuMonitorIOWrite:453 :
> QEMU_MONITOR_IO_WRITE: mon=0x7efc10021850
> buf={"execute":"query-cpu-model-expansion","arguments":{"type":"static",
> "model":{"name":"max","props":{"migratable":false}}},"id":"libvirt-4"}
> 2020-07-07 03:38:38.417+0000: 47276: info : qemuMonitorJSONIOProcessLine:240
> : QEMU_MONITOR_RECV_REPLY: mon=0x7efc10021850 reply={"return": {"model":
> {"name": "base", "props": {[...] "intel-pt": false, [...] }}}, "id":
> "libvirt-4"}
>
> It looks like QEMU is telling libvirt that intel-pt is not available.

False alarm. That was the probing for TCG capabilities. A few lines above, we can see:

2020-07-07 03:38:38.347+0000: 47279: debug : qemuProcessQMPLaunch:8436 : Try to probe capabilities of '/usr/libexec/qemu-kvm' via QMP, machine none,accel=tcg

If we look for probing of KVM capabilities, we can see that QEMU is correctly returning intel-pt=true:

2020-07-07 03:38:38.146+0000: 47279: debug : qemuProcessQMPLaunch:8436 : Try to probe capabilities of '/usr/libexec/qemu-kvm' via QMP, machine none,accel=kvm:tcg
2020-07-07 03:38:38.342+0000: 47276: debug : qemuMonitorJSONIOProcessLine:220 : Line [{"return": {"model": {"name": "base", "props": {[...] "intel-pt": true, [...]}}}, "id": "libvirt-46"}]
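The TCG/KVM mix-up above is easy to make when reading the log by eye. The following is a minimal sketch that pairs each "Try to probe capabilities" line with the intel-pt value in the replies that follow it. It assumes log lines shaped like the excerpts in this thread and uses a regular expression rather than a JSON parser because the excerpts are elided with [...]. The function name intel_pt_by_accel is hypothetical. When several replies follow one probe, the last one wins.

```python
import re

PROBE_RE = re.compile(
    r"Try to probe capabilities of '[^']+' via QMP, machine none,accel=(\S+)"
)
FLAG_RE = re.compile(r'"intel-pt":\s*(true|false)')


def intel_pt_by_accel(lines):
    """Map each probed accel string to the intel-pt value QEMU replied with."""
    result = {}
    accel = None
    for line in lines:
        probe = PROBE_RE.search(line)
        if probe:
            accel = probe.group(1)
            continue
        # Only look at replies; commands written by libvirt are skipped.
        if accel is None or '"return"' not in line:
            continue
        flag = FLAG_RE.search(line)
        if flag:
            result[accel] = flag.group(1) == "true"
    return result


# Abbreviated from the log excerpts quoted in this thread.
sample = [
    "debug : qemuProcessQMPLaunch:8436 : Try to probe capabilities of "
    "'/usr/libexec/qemu-kvm' via QMP, machine none,accel=kvm:tcg",
    'debug : qemuMonitorJSONIOProcessLine:220 : Line [{"return": {"model": '
    '{"name": "base", "props": {[...] "intel-pt": true, [...]}}}, '
    '"id": "libvirt-46"}]',
    "debug : qemuProcessQMPLaunch:8436 : Try to probe capabilities of "
    "'/usr/libexec/qemu-kvm' via QMP, machine none,accel=tcg",
    'info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: '
    'mon=0x7efc10021850 reply={"return": {"model": {"name": "base", '
    '"props": {[...] "intel-pt": false, [...] }}}, "id": "libvirt-4"}',
]

print(intel_pt_by_accel(sample))  # {'kvm:tcg': True, 'tcg': False}
```

The sketch relies on log order only, so it would mislabel replies if probes for different accelerators were interleaved.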
It looks like intel-pt is included as part of a few CPU models in cpu_map.xml:

src/cpu_map/x86_Icelake-Client-noTSX.xml: <feature name='intel-pt'/>
src/cpu_map/x86_Icelake-Client.xml: <feature name='intel-pt'/>
src/cpu_map/x86_Icelake-Server-noTSX.xml: <feature name='intel-pt'/>
src/cpu_map/x86_Icelake-Server.xml: <feature name='intel-pt'/>

However, intel-pt was never part of any builtin CPU model, according to:

commit 4c257911dcc7c4189768e9651755c849ce9db4e8
Author: Paolo Bonzini <pbonzini>
Date: Fri Dec 21 12:35:56 2018 +0100

    i386: remove the 'INTEL_PT' CPUID bit from named CPU models

    Processor tracing is not yet implemented for KVM and it will be an
    opt in feature requiring a special module parameter.
    Disable it, because it is wrong to enable it by default and
    it is impossible that no one has ever used it.

    Cc: qemu-stable
    Signed-off-by: Paolo Bonzini <pbonzini>

We probably need to remove the <feature name='intel-pt'/> entries from those CPU models in libvirt.
As already mentioned by Eduardo, there are two separate issues here:

(1) changing kvm* module parameters does not invalidate the QEMU capabilities cache in libvirt, and thus libvirt does not know intel-pt has become available

(2) intel-pt is not enabled by host-model on an Icelake-Server host

I created a separate bug 1879200 for the first issue. So let's focus on the second one here, as it also matches the summary.

Because the definition of Icelake-Server in libvirt contains intel-pt (copied from QEMU before intel-pt was removed from all models), we do not include the feature in host-model. We think QEMU will enable it automatically when asked for Icelake-Server.

When intel-pt is explicitly requested:

<cpu mode='host-model'>
  <feature name='intel-pt' policy='require'/>
</cpu>

or

<cpu mode='custom' match='exact'>
  <model fallback='forbid'>Icelake-Server</model>
  <feature name='intel-pt' policy='require'/>
</cpu>

libvirt will ask QEMU for intel-pt even though it thinks the feature would be enabled implicitly.

That said, intel-pt is still usable; it is just not enabled by default for host-model on an Icelake-Server host. To fix this, we would need to change the definition of the Icelake-Server model and remove intel-pt from it. We did remove some features in the past, but only when there was no way a domain using such a feature could ever be started. By removing a feature from a CPU model definition, we can end up with mismatching definitions during migration, when migrating from new to old libvirt or in the opposite direction.

Unfortunately, in this specific case changing the Icelake-Server CPU model would result in broken migration in several cases, mostly when intel-pt was not actually enabled.

Specifically, in the default case of pt_mode=0, migration from new libvirt (with the updated Icelake-Server CPU model) to older libvirt would fail for both custom and host-model mode unless intel-pt was explicitly mentioned in the domain XML. The older libvirt would think the original domain was started with intel-pt enabled while it would be disabled on the destination.

Migration from new to old libvirt would also fail (for the same reason) when pt_mode=1 on the source using custom CPU mode and the Icelake-Server model, unless intel-pt was explicitly mentioned in the domain XML. Migrating a domain with a host-model CPU would work fine in this case (i.e., it would correctly either work or fail depending on pt_mode settings on the destination host and machine type).

That said, I don't think we can fix this by modifying the Icelake-Server CPU model, unless we can be sure there are no Icelake-Server CPUs in production.

An ideal solution would be using CPU model definitions from QEMU when building the host-model CPU definition, to make sure the result matches exactly what QEMU could enable on the host. But AFAIK QEMU does not currently provide any interface we could use to probe all CPU model definitions with -machine none.
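The mismatch between libvirt's and QEMU's view of Icelake-Server can be illustrated with a toy model. This is only a sketch under strong simplifying assumptions: the feature sets are tiny illustrative stand-ins, the function names are hypothetical, and the real host-model computation in libvirt is considerably more involved.

```python
def host_model_features(host_features, libvirt_model_features):
    """Features listed explicitly on top of the chosen model.

    Toy rule: anything libvirt's model definition already contains
    is assumed to be implied by the model and is not listed.
    """
    return sorted(set(host_features) - set(libvirt_model_features))


def effective_features(explicit, qemu_model_features):
    """What the guest gets: QEMU's own model plus explicit requests."""
    return set(qemu_model_features) | set(explicit)


# Illustrative host feature set (pt_mode=1, so intel-pt is available).
host = {"avx512ifma", "sha-ni", "intel-pt"}

# libvirt's Icelake-Server still lists intel-pt; QEMU's no longer does.
libvirt_model = {"intel-pt"}
qemu_model = set()

explicit = host_model_features(host, libvirt_model)
print(explicit)  # ['avx512ifma', 'sha-ni']
print("intel-pt" in effective_features(explicit, qemu_model))  # False

# With intel-pt dropped from libvirt's definition, it is requested explicitly.
explicit_fixed = host_model_features(host, set())
print("intel-pt" in effective_features(explicit_fixed, qemu_model))  # True
```

The first case reproduces the reported behaviour: intel-pt falls between the two definitions and nobody asks QEMU for it.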
The fix is now pushed upstream as

commit 4901314d0dd9904225b7567fc282793c039f2efa
Refs: v7.0.0-133-g4901314d0d
Author: Jiri Denemark <jdenemar>
AuthorDate: Fri Jan 22 15:04:42 2021 +0100
Commit: Jiri Denemark <jdenemar>
CommitDate: Tue Jan 26 15:44:50 2021 +0100

    cpu_map: Remove intel-pt from x86 CPU models

    As explained in QEMU commit 4c257911dcc7c4189768e9651755c849ce9db4e8
    intel-pt features should never be included in the CPU models as it was
    not supported by KVM back then and even once it started to be supported,
    users have to enable it by passing pt_mode=1 parameter to kvm_intel
    module. The Icelake-* CPU models with intel-pt included were added to
    QEMU 3.1.0 and removed right in the following 4.0.0 release (and even in
    3.1.1 maintenance release).

    In libvirt 6.10.0 I introduced 'removed' attribute for features included
    in our CPU model definitions which we can use to drop intel-pt from
    Icelake-* CPU models. Back then I explained we can safely do so only for
    features which could never be enabled, which is not the case of
    intel-pt. Theoretically, it could be possible to create an environment
    in which QEMU would enable intel-pt without asking for it explicitly: it
    would need to use a new enough kernel (not available at the time of QEMU
    3.1.0) and pt_mode KVM parameter in combination with QEMU 3.1.0 running
    a domain with q35 machine type and all that on a CPU which didn't really
    exist at that time. Migrating such domain to a host with newer SW stack
    including libvirt with this patch applied would result in incompatible
    guest ABI (the virtual CPU would lose intel-pt). However, QEMU changed
    its CPU models unconditionally and thus migration would not work even
    without this patch.

    That said, it is safe to follow QEMU and remove the feature from
    Icelake-* CPU models in our cpu_map.

    https://bugzilla.redhat.com/show_bug.cgi?id=1853972

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Tim Wiederhake <twiederh>
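For readers unfamiliar with the 'removed' attribute mentioned in the commit message, here is a minimal sketch of how a consumer of a cpu_map-style file might skip such features. The XML fragment is illustrative rather than copied from libvirt, the spelling removed='yes' is an assumption of this sketch, and active_features is a hypothetical helper, not libvirt code.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment; real cpu_map files list many more features.
# The removed='yes' spelling is an assumption of this sketch.
MODEL_XML = """
<cpus>
  <model name='Icelake-Server'>
    <feature name='avx512f'/>
    <feature name='intel-pt' removed='yes'/>
  </model>
</cpus>
"""


def active_features(xml_text, model_name):
    """Return the model's feature names, skipping those marked removed."""
    root = ET.fromstring(xml_text)
    for model in root.iter("model"):
        if model.get("name") == model_name:
            return [
                feature.get("name")
                for feature in model.findall("feature")
                if feature.get("removed") != "yes"
            ]
    raise KeyError(model_name)


print(active_features(MODEL_XML, "Icelake-Server"))  # ['avx512f']
```

Keeping the feature in the file but marked as removed, rather than deleting the line, lets the definition record that the model once included it.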
Verified with upstream libvirt v7.0.0-157-g85be8e3d74 & qemu-kvm-5.2.0-4.module+el8.4.0+9676+589043b9.x86_64

S1. Start vm with pt_mode=1 enabled on Icelake server

Steps:
1. Set pt_mode = 1 on Icelake host first,
# rmmod kvm_intel
# modprobe kvm_intel pt_mode=1
2. Delete /var/cache/libvirt/qemu/capabilities/*
3. Restart libvirtd
4. Domain with below cpu configuration -
<cpu mode='host-model' check='partial'>
  <model fallback='allow'>qemu64</model>
  <feature policy='disable' name='svm'/>
</cpu>

Checking from qemu cmd line, there is intel-pt=on -
-machine pc-q35-rhel8.3.0,accel=kvm,usb=off,dump-guest-core=off,memory-backend=pc.ram -cpu Icelake-Server,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,avx512ifma=on,intel-pt=on,sha-ni=on,rdpid=on,fsrm=on,md-clear=on,stibp=on,arch-capabilities=on,xsaves=on,ibpb=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,mpx=off,svm=off
The <model fallback='allow'>qemu64</model> in a CPU definition with mode='host-model' is useless and can be dropped. Libvirt will actually ignore it so keeping it has no impact on the behavior, but it could confuse people.
Thanks Jiri.

S1: Use mode='host-model' to start domain
1.
<cpu mode='host-model' check='partial'>
</cpu>
qemu-cmd :
-cpu Icelake-Server,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,avx512ifma=on,intel-pt=on,sha-ni=on,rdpid=on,fsrm=on,md-clear=on,stibp=on,arch-capabilities=on,xsaves=on,ibpb=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,mpx=off

S2: Use mode='custom' to start domain
1.
<cpu mode='custom' match='exact'>
  <model fallback='forbid'>Icelake-Server</model>
  <feature name='intel-pt' policy='require'/>
  <feature name='mpx' policy='disable'/>
</cpu>
qemu-cmd :
-cpu Icelake-Server,intel-pt=on,mpx=off
Verified with
libvirt-daemon-7.0.0-3.module+el8.4.0+9709+a99efd61.x86_64
qemu-kvm-5.2.0-4.module+el8.4.0+9676+589043b9.x86_64

S1: Use mode='host-model' to start domain on Icelake server

Steps:
1. Set pt_mode = 1 on Icelake host
# rmmod kvm_intel
# modprobe kvm_intel pt_mode=1
2. Start libvirtd
# systemctl start libvirtd.service
3. Start domain
<cpu mode='host-model' check='partial'>
</cpu>

Result:
The intel-pt=on is in qemu-cmd line:
-cpu Icelake-Server,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,avx512ifma=on,intel-pt=on,sha-ni=on,rdpid=on,fsrm=on,md-clear=on,stibp=on,arch-capabilities=on,xsaves=on,ibpb=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,mpx=off
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2098