Bug 1439078 - After migration, VM crash in dst host with "qemu-kvm: error: failed to set MSR 0x38f to 0x7000000ff"
Summary: After migration, VM crash in dst host with "qemu-kvm: error: failed to set MSR...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Dr. David Alan Gilbert
QA Contact: xianwang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-04-05 07:49 UTC by xianwang
Modified: 2022-08-16 12:48 UTC
CC List: 15 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-04-20 11:23:11 UTC
Target Upstream Version:
Embargoed:


Attachments
bug1439078_gdb_info.txt (21.21 KB, text/plain), 2017-04-05 07:53 UTC, xianwang
dsthost_dmesg (1.79 KB, text/plain), 2017-04-20 05:52 UTC, xianwang
dsthost_x86info (14.00 KB, text/plain), 2017-04-20 05:53 UTC, xianwang
gdb_dst_boot_guest (20.29 KB, text/plain), 2017-04-20 05:53 UTC, xianwang
guest_dmesg (35.19 KB, text/plain), 2017-04-20 05:54 UTC, xianwang
guest_x86info (11.15 KB, text/plain), 2017-04-20 05:55 UTC, xianwang
srchost_dmesg (3.31 KB, text/plain), 2017-04-20 05:56 UTC, xianwang
srchost_x86info (14.00 KB, text/plain), 2017-04-20 05:57 UTC, xianwang

Description xianwang 2017-04-05 07:49:23 UTC
Description of problem:
Boot a VM on both the src host and the dst host with -cpu host; after migration, qemu prints an error message and the qemu process exits automatically.

Version-Release number of selected component (if applicable):
3.10.0-635.el7.x86_64
qemu-kvm-rhev-2.9.0-0.el7.patchwork201703291116.x86_64
seabios-bin-1.10.2-1.el7.noarch

How reproducible:
3/3

Steps to Reproduce:
1. Boot a guest on the src host with the qemu cli as below:
/usr/libexec/qemu-kvm \
    -name 'vm1'  \
    -sandbox off  \
    -machine pc-i440fx-rhel7.4.0 \
    -nodefaults  \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=04 \
    -chardev socket,path=/tmp/virtio_port-vs-20170207-030401-FfusAC1v,nowait,id=idQdLRHP,server \
    -device virtserialport,id=idBu8FQH,name=vs,bus=virtio_serial_pci0.0,chardev=idQdLRHP \
    -object rng-random,filename=/dev/random,id=passthrough-rOXjKxaC \
    -device virtio-rng-pci,id=virtio-rng-pci-GVn8yzUA,rng=passthrough-rOXjKxaC,bus=pci.0,addr=05 \
    -device usb-ehci,id=usb1,bus=pci.0,addr=06 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=09 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=unsafe,format=qcow2,file=/root/rhel74-64-virtio.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bus=pci.0,bootindex=0 \
    -object iothread,id=iothread0 \
    -drive file=/root/r1.qcow2,format=qcow2,if=none,id=drive_plane,werror=stop,rerror=stop \
    -device virtio-blk-pci,iothread=iothread0,bus=pci.0,drive=drive_plane,id=plane \
    -device virtio-net-pci,mac=9a:4f:50:51:52:53,id=id9HRc5V,vectors=4,netdev=idjlQN53,bus=pci.0  \
    -netdev tap,id=idjlQN53,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
    -m 4G \
    -smp 4 \
    -drive id=drive_cd1,if=none,snapshot=off,aio=native,cache=none,media=cdrom,file=/root/RHEL-7.3-20161019.0-Server-x86_64-dvd1.iso \
    -device ide-cd,id=cd1,drive=drive_cd1,bus=ide.0,unit=0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -device usb-mouse,id=input1,bus=usb1.0,port=2 \
    -device usb-kbd,id=input2,bus=usb1.0,port=3 \
    -vnc :1 \
    -qmp tcp:0:8881,server,nowait \
    -vga std \
    -cpu host \
    -monitor stdio \
    -rtc base=localtime  \
    -boot order=cdn,once=c,menu=on,strict=off  \
    -enable-kvm  \
    -watchdog i6300esb \
    -watchdog-action reset \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0 
2. Boot a guest on the dst host with the same qemu cli as above, appending "-incoming tcp:0:5801"
3. On the src host, start the migration:
(qemu) migrate -d tcp:10.16.184.234:5801

Actual results:
On the src host the migration status is "completed", but on the dst host qemu prints an error message and the qemu process exits automatically.
src host:
(qemu) info status 
VM status: paused (postmigrate)
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off 
Migration status: completed
total time: 30088 milliseconds
downtime: 70 milliseconds
setup: 15 milliseconds
transferred ram: 992777 kbytes
throughput: 270.43 mbps
remaining ram: 0 kbytes
total ram: 4211528 kbytes
duplicate: 815784 pages
skipped: 0 pages
normal: 245921 pages
normal bytes: 983684 kbytes
dirty sync count: 4

in dst host:
qemu-kvm: error: failed to set MSR 0x38f to 0x7000000ff
qemu-kvm: /builddir/build/BUILD/qemu-2.9.0/target/i386/kvm.c:1833:kvm_put_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.

Expected results:
Migration completes and the vm works well.

Additional info:

Comment 2 xianwang 2017-04-05 07:53:38 UTC
Created attachment 1268880 [details]
bug1439078_gdb_info.txt

Comment 3 xianwang 2017-04-05 08:00:59 UTC
1) I have added gdb information as an attachment.
2) src host:
[root@dell-per630-01 ~]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    1
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
Stepping:              2
CPU MHz:               1790.250
CPU max MHz:           3200.0000
CPU min MHz:           1200.0000
BogoMIPS:              4800.30
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc

dst host:
[root@dell-per630-02 ~]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
Stepping:              2
CPU MHz:               1200.000
CPU max MHz:           3200.0000
CPU min MHz:           1200.0000
BogoMIPS:              4799.74
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc

3) If the qemu cli on both src and dst drops "-cpu host", migration succeeds and the vm works well.

Comment 4 Dr. David Alan Gilbert 2017-04-07 19:05:39 UTC
Hi,

  38F is 'IA32_PERF_GLOBAL_CTRL'

  Can I check a few things please:
   a) Can you confirm this is native on the host - you're not trying to run nested?
   b) Can you please attach the output of 'dmesg' from the boot of the host prior to starting the guest.


an E5-2630 v3 is a Haswell.  If I read the docs right it hasn't got 38F - but I need to check.
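
For reference, a quick way to read this MSR directly on each host is with msr-tools (assuming the msr kernel module and the rdmsr utility are available; run as root, the CPU number is illustrative):

    modprobe msr                 # expose /dev/cpu/*/msr
    rdmsr -p 0 0x38f             # IA32_PERF_GLOBAL_CTRL on CPU 0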

Comment 5 Dr. David Alan Gilbert 2017-04-18 14:59:40 UTC
Please also provide the output of:
  x86info -a

on both hosts and the guest

Comment 6 xianwang 2017-04-19 06:20:19 UTC
(In reply to Dr. David Alan Gilbert from comment #5)
> Please also provide the output of:
>   x86info -a
> 
> on both hosts and the guest

I have re-tested this scenario on my local Intel hosts, but this bug can't be reproduced there, so I have submitted jobs in Beaker to reserve Haswell hosts; I will update the test information later.

Comment 7 Dr. David Alan Gilbert 2017-04-19 10:11:30 UTC
I tried reproducing it and I can't on the similar box I have:

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 63
model name	: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
stepping	: 2
microcode	: 0x38
cpu MHz		: 2999.812
cache size	: 20480 KB
physical id	: 0
siblings	: 16
core id		: 0
cpu cores	: 8
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 15
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc
bogomips	: 4794.20
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

using /usr/libexec/qemu-kvm -M pc,accel=kvm -m 8G rhel-guest-image-7.4-106.x86_64.qcow2 -vnc :0 -monitor stdio -cpu host -smp 4

with either HEAD qemu or 2.9.0-rc based packages.

Comment 8 xianwang 2017-04-20 05:51:00 UTC
(In reply to Dr. David Alan Gilbert from comment #7)
> I tried reproducing it and I can't on the similar box I have:
> 
> processor	: 0
> vendor_id	: GenuineIntel
> cpu family	: 6
> model		: 63
> model name	: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
> stepping	: 2
> microcode	: 0x38
> cpu MHz		: 2999.812
> cache size	: 20480 KB
> physical id	: 0
> siblings	: 16
> core id		: 0
> cpu cores	: 8
> apicid		: 0
> initial apicid	: 0
> fpu		: yes
> fpu_exception	: yes
> cpuid level	: 15
> wp		: yes
> flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
> pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
> rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
> nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx
> est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt
> tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln
> pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1
> avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc
> bogomips	: 4794.20
> clflush size	: 64
> cache_alignment	: 64
> address sizes	: 46 bits physical, 48 bits virtual
> power management:
> 
> using /usr/libexec/qemu-kvm -M pc,accel=kvm -m 8G
> rhel-guest-image-7.4-106.x86_64.qcow2 -vnc :0 -monitor stdio -cpu host -smp 4
> 
> with either HEAD qemu or 2.9.0-rc based packages.

Hi, Dave,
I have re-tested this issue and I can reproduce it; the qemu cli for booting the guest is as below:
/usr/libexec/qemu-kvm \
    -name 'vm1'  \
    -sandbox off  \
    -machine pc-i440fx-rhel7.4.0 \
    -nodefaults  \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=04 \
    -device usb-ehci,id=usb1,bus=pci.0,addr=06 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=09 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=unsafe,format=qcow2,file=/root/rhel74-64-virtio.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bus=pci.0,bootindex=0 \
    -device virtio-net-pci,mac=9a:4f:50:51:52:53,id=id9HRc5V,vectors=4,netdev=idjlQN53,bus=pci.0  \
    -netdev tap,id=idjlQN53,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
    -m 4G \
    -smp 4 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -device usb-mouse,id=input1,bus=usb1.0,port=2 \
    -device usb-kbd,id=input2,bus=usb1.0,port=3 \
    -vnc :1 \
    -qmp tcp:0:8881,server,nowait \
    -vga std \
    -cpu host \
    -monitor stdio \
    -rtc base=localtime  \
    -boot order=cdn,once=c,menu=on,strict=off  \
    -enable-kvm  \
    -watchdog i6300esb \
    -watchdog-action reset \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0 
I have uploaded the debug info as attachments.

Comment 9 xianwang 2017-04-20 05:52:27 UTC
Created attachment 1272855 [details]
dsthost_dmesg

Comment 10 xianwang 2017-04-20 05:53:07 UTC
Created attachment 1272856 [details]
dsthost_x86info

Comment 11 xianwang 2017-04-20 05:53:49 UTC
Created attachment 1272857 [details]
gdb_dst_boot_guest

Comment 12 xianwang 2017-04-20 05:54:35 UTC
Created attachment 1272858 [details]
guest_dmesg

Comment 13 xianwang 2017-04-20 05:55:39 UTC
Created attachment 1272859 [details]
guest_x86info

Comment 14 xianwang 2017-04-20 05:56:29 UTC
Created attachment 1272860 [details]
srchost_dmesg

Comment 15 xianwang 2017-04-20 05:57:11 UTC
Created attachment 1272861 [details]
srchost_x86info

Comment 16 Dr. David Alan Gilbert 2017-04-20 10:09:41 UTC
Interesting, I think this is a real bug.

dell-per630-01 has hyperthreading disabled - it shows 16 CPUs
dell-per630-02 has hyperthreading enabled - it shows 32 CPUs

One of the differences of the x86info is:
190c190
< eax in: 0x0000000a, eax = 07300803 ebx = 00000000 ecx = 00000000 edx = 00000603
---
> eax in: 0x0000000a, eax = 07300403 ebx = 00000000 ecx = 00000000 edx = 00000603

From Intel table 3-8 it says eax field 8-15:
Number of general-purpose performance monitoring counter per logical processor.

So the CPU with hyperthreading has half of the counters of the host without hyperthreading; but that makes sense, the counters have been split between the threads.

Each bit in MSR 38f is an 'enable' for one of those counters; the value we're trying to write (...ff) is trying to enable 8 counters, which our hyperthreaded destination doesn't have.
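
As a worked example (plain shell arithmetic on the two EAX values from the x86info diff above; bits 15:8 of CPUID.0AH:EAX are the counter count):

    eax=0x07300803; echo $(( (eax >> 8) & 0xff ))    # host without HT -> 8 counters
    eax=0x07300403; echo $(( (eax >> 8) & 0xff ))    # host with HT    -> 4 counters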

Comment 17 Dr. David Alan Gilbert 2017-04-20 11:23:11 UTC
Closing as not-a-bug because:
  a) The use of -cpu host requires identical cpus
  b) The source CPU in this system was configured without hyperthreading while the destination was configured with hyperthreading; this changes not only the number of CPUs but also some of the characteristics of the CPU (such as the number of performance counters)

Note that we'll hit similar problems if you use a non-host cpu but enable the PMU; but that's already a case where migration is known not to succeed with perf counters.

Comment 18 xianwang 2017-04-24 02:37:34 UTC
(In reply to Dr. David Alan Gilbert from comment #17)
> Closing as not-a-bug because:
>   a) The use of -cpu host requires identical cpus
>   b) The source CPU in this system was configured without hyperthreading
> while the destination was configured with hyperthreading; this changes not
> only the number of CPUs but also some of the characteristics of the CPU
> (some of the number of counters)
> 
> Note that we'll hit similar problems if you use a none-host cpu but enable
> the PMU; but that's already a problem where migration is known not to
> succeed with perf counters.

1) Before doing migration, do we need to check the following two requirements?
a) The use of -cpu host requires identical cpus
b) The CPU configuration for hyperthreading on the src host and dst host is the same, i.e., both src and dst hosts have hyperthreading enabled or both have it disabled

2) If these two requirements are not met, migration may go wrong, but that is not a bug, right?

3) Could you help me check whether the following methods for checking these two parameters are right?
a) The use of -cpu host requires identical cpus
# cat /proc/cpuinfo | grep processor
If this value is the same on the src host and dst host, does it indicate they have identical cpus?
b) The CPU configuration for hyperthreading
# dmidecode -t processor | grep -E '(Core Count|Thread Count)'
If the "Thread Count" is double the "Core Count", this indicates hyperthreading is enabled; if the "Thread Count" is the same as the "Core Count", this indicates hyperthreading is disabled, yes?

Comment 19 Dr. David Alan Gilbert 2017-04-24 08:18:20 UTC
(In reply to xianwang from comment #18)
> (In reply to Dr. David Alan Gilbert from comment #17)
> > Closing as not-a-bug because:
> >   a) The use of -cpu host requires identical cpus
> >   b) The source CPU in this system was configured without hyperthreading
> > while the destination was configured with hyperthreading; this changes not
> > only the number of CPUs but also some of the characteristics of the CPU
> > (some of the number of counters)
> > 
> > Note that we'll hit similar problems if you use a none-host cpu but enable
> > the PMU; but that's already a problem where migration is known not to
> > succeed with perf counters.
> 
> 1)Before doing migration, do we need to check the following two requirements?
> a) The use of -cpu host requires identical cpus

Correct.

> b) The CPU configuration for hyperthreading of src host and dst host are
> same, ie, both src and dst host enable hyperthreading or disable it

Yes, when using -cpu host

> 2)If these two requirements are not matched, maybe there is something wrong
> for migration but not a bug, right?

I don't understand this question.
 
> 3)could you help me to check if the following method to check these two
> parameter right?
> a) The use of -cpu host requires identical cpus
> #cat /proc/cpuinfo | grep processor
> If this value of src host and dst host are same, it indicate they have
> dentical cpus, yes?

It's probably best to use the 'model name' field

> b)The CPU configuration for hyperthreading
> #dmidecode -t processor | grep -E '(Core Count|Thread Count)'
> If the "Thread Count" is double "Core Count", this indicate the
> hyperthreading is enabled, or, if the "Thread Count" is same as "Core
> Count", this indicate the hyperthreading is disabled, yes?

Yes, I think that's right.
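
A minimal pre-migration sanity check along those lines, run on both hosts before using -cpu host (a sketch only, built from the commands already shown in this bug):

    grep -m1 'model name' /proc/cpuinfo                           # must match on src and dst
    dmidecode -t processor | grep -E '(Core Count|Thread Count)'  # Thread == 2*Core => HT on
    lscpu | grep -E '^(Thread\(s\) per core|Core\(s\) per socket)'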

Comment 20 Chris Friesen 2017-11-21 22:50:01 UTC
Will calling virConnectCompareCPU() detect that the destination host is incompatible with the source host?  If not, perhaps it would be reasonable to call this a bug?

Comment 21 Jiri Denemark 2017-11-22 09:49:13 UTC
It depends. virConnectCompareCPU just checks that the CPUs are compatible with respect to the provided CPU features. So as long as both CPUs report the same features, they will be reported as compatible and it won't be a bug. If they report different features, virConnectCompareCPU should report them as incompatible.
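
For anyone who wants to run that check by hand, a rough virsh sketch (the cpu.xml file here is hypothetical; it should contain the <cpu> element taken from 'virsh capabilities' on the source host, and the compare is run on the destination host):

    # on the source host
    virsh capabilities > src-caps.xml        # copy its <cpu>...</cpu> element into cpu.xml
    # on the destination host
    virsh cpu-compare cpu.xml                # reports identical / superset / incompatible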

Comment 22 Daniel Berrangé 2017-11-22 10:29:53 UTC
NB, virConnectCompareCPU only compares the features exposed by the physical CPUs. When running a guest, limitations of KVM and/or QEMU may prevent some features from being exposed to the guest, and this filtering may vary between KVM/QEMU versions. So even if virConnectCompareCPU says the hosts are identical, it has not verified whether KVM/QEMU expose the same features to the guests. If you have the same KVM/QEMU versions on each host this isn't a problem, but be aware of this edge case if you have differing versions of KVM/QEMU.
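
One way to see the guest-visible CPU model that a given KVM/QEMU combination would actually provide (as opposed to the bare physical CPU) is the domain capabilities output; comparing it between hosts is only a sketch, assuming a libvirt new enough to support it:

    virsh domcapabilities | sed -n '/<cpu>/,/<\/cpu>/p'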

Comment 23 Chris Friesen 2017-11-22 15:54:01 UTC
As I understand it, the issue in this case is that HT is available on both source and dest, but is only enabled on the dest, causing the number of performance monitoring counters per logical processor to be smaller on the destination than on the source.

If this is the case, then it is a host CPU mismatch and arguably virConnectCompareCPU() should catch it.

Alternately, qemu should make it clear in the docs that live migration with "-cpu host" not only requires the CPUs to be identical, but requires them to be configured identically by the BIOS/OS, specifically with respect to HT.

Comment 24 Dr. David Alan Gilbert 2017-11-22 20:06:43 UTC
I'm sympathetic that something should catch it somewhere; I suspect there are other situations as well (e.g. if you explicitly enable performance counters on a non-'host' cpu and your source and destination are different generations with different numbers of counters). We also thought it would be nice if we could pass the number of supported counters to qemu so that you could set up a minimum that would enable you to migrate between these types of hosts.

Comment 25 Barak 2022-08-16 12:27:00 UTC
> I'm sympathetic that something should catch it somewhere;  I suspect there are other situations as well (e.g. if you explicitly enable performance counters on a non 'host' cpu and your source and destination are different generations with different numbers of counters).  We also thought it would be nice if we could pass the number of supported counters to qemu so that you could setup a minimum that would enable you to migrate between these types of hosts.

Hey David,
I'm investigating a similar issue:
https://bugzilla.redhat.com/show_bug.cgi?id=2066222

I was wondering: was anything done to mitigate this problem since your last update here?

Thanks Barak.

Comment 26 Dr. David Alan Gilbert 2022-08-16 12:48:01 UTC
Hi Barak,
  No, I don't think we have anything more to help there.
I'd generally advise against using the host cpu type, because it's so sensitive to *anything* that's different - especially if you're enabling performance counters.
Without the performance counters you'd probably get away with the HT difference.

