Bug 1368907

Summary: Guest on the target host crashes when migrating from Intel i7 to Intel Xeon with PMU enabled
Product: Red Hat Enterprise Linux 7
Reporter: yafu <yafu>
Component: qemu-kvm-rhev
Assignee: Hai Huang <hhuang>
Status: CLOSED CURRENTRELEASE
QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.3
CC: chayang, dyuan, fjin, juzhang, knoel, mzhan, qizhu, virt-maint, yafu, yanqzhan, zpeng
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: qemu 2.8.0
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-02-07 11:49:08 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 1401400
Attachments:
libvirtd.log and qemu.log on target host (flags: none)

Description yafu 2016-08-22 05:04:28 UTC
Created attachment 1192769
libvirtd.log and qemu.log on target host

Description of problem:
The guest on the target host crashes when it is migrated from an Intel i7 host to an Intel Xeon host with PMU enabled.

Version:
libvirt-2.0.0-5.el7.x86_64
qemu-kvm-rhev-2.6.0-21.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare two hosts:
a) Source host (Intel i7):
# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 61
Model name:            Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
Stepping:              4
CPU MHz:               3149.859
BogoMIPS:              5187.75
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              4096K
NUMA node0 CPU(s):     0-3

b) Target host (Intel Xeon):
# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    1
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 23
Model name:            Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz
Stepping:              10
CPU MHz:               2992.229
BogoMIPS:              5984.45
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              6144K
NUMA node0 CPU(s):     0,1


2. Compute the CPU baseline:
On both hosts, run "virsh capabilities" and copy the capabilities->host->cpu element from the output into a single file, then compute the CPU baseline of the two hosts:
# cat /tmp/cpubase
   <cpu>
      <arch>x86_64</arch>
      <model>Broadwell</model>
      <vendor>Intel</vendor>
      <topology sockets='1' cores='2' threads='2'/>
      <feature name='vme'/>
      <feature name='ds'/>
      <feature name='acpi'/>
      <feature name='ss'/>
      <feature name='ht'/>
      <feature name='tm'/>
      <feature name='pbe'/>
      <feature name='dtes64'/>
      <feature name='monitor'/>
      <feature name='ds_cpl'/>
      <feature name='vmx'/>
      <feature name='smx'/>
      <feature name='est'/>
      <feature name='tm2'/>
      <feature name='xtpr'/>
      <feature name='pdcm'/>
      <feature name='osxsave'/>
      <feature name='f16c'/>
      <feature name='rdrand'/>
      <feature name='arat'/>
      <feature name='tsc_adjust'/>
      <feature name='xsaveopt'/>
      <feature name='pdpe1gb'/>
      <feature name='abm'/>
      <feature name='invtsc'/>
      <pages unit='KiB' size='4'/>
      <pages unit='KiB' size='2048'/>
      <pages unit='KiB' size='1048576'/>
    </cpu>
 <cpu>
      <arch>x86_64</arch>
      <model>Penryn</model>
      <vendor>Intel</vendor>
      <topology sockets='1' cores='2' threads='1'/>
      <feature name='vme'/>
      <feature name='ds'/>
      <feature name='acpi'/>
      <feature name='ss'/>
      <feature name='ht'/>
      <feature name='tm'/>
      <feature name='pbe'/>
      <feature name='dtes64'/>
      <feature name='monitor'/>
      <feature name='ds_cpl'/>
      <feature name='vmx'/>
      <feature name='smx'/>
      <feature name='est'/>
      <feature name='tm2'/>
      <feature name='xtpr'/>
      <feature name='pdcm'/>
      <feature name='xsave'/>
      <feature name='osxsave'/>
      <pages unit='KiB' size='4'/>
      <pages unit='KiB' size='2048'/>
    </cpu>


# virsh cpu-baseline /tmp/cpubase
<cpu mode='custom' match='exact'>
  <model fallback='allow'>Penryn</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='vme'/>
  <feature policy='require' name='ds'/>
  <feature policy='require' name='acpi'/>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='ht'/>
  <feature policy='require' name='tm'/>
  <feature policy='require' name='pbe'/>
  <feature policy='require' name='dtes64'/>
  <feature policy='require' name='monitor'/>
  <feature policy='require' name='ds_cpl'/>
  <feature policy='require' name='vmx'/>
  <feature policy='require' name='smx'/>
  <feature policy='require' name='est'/>
  <feature policy='require' name='tm2'/>
  <feature policy='require' name='xtpr'/>
  <feature policy='require' name='pdcm'/>
  <feature policy='require' name='xsave'/>
  <feature policy='require' name='osxsave'/>
</cpu>
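
The copy part of step 2 can be scripted. The following is a minimal sketch, assuming the output of "virsh capabilities" from a host is available as a string. The sample XML is a trimmed, illustrative stand-in rather than real output from these hosts, and `extract_host_cpu` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def extract_host_cpu(capabilities_xml: str) -> str:
    """Return the <capabilities>/<host>/<cpu> element serialized as XML text."""
    root = ET.fromstring(capabilities_xml)
    cpu = root.find("./host/cpu")
    if cpu is None:
        raise ValueError("no <host><cpu> element found in capabilities XML")
    return ET.tostring(cpu, encoding="unicode").strip()


# Trimmed, illustrative stand-in for the output of "virsh capabilities".
SAMPLE_CAPABILITIES = """
<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>Penryn</model>
      <vendor>Intel</vendor>
      <feature name='pdcm'/>
      <feature name='xsave'/>
    </cpu>
  </host>
</capabilities>
"""

cpu_xml = extract_host_cpu(SAMPLE_CAPABILITIES)
parsed = ET.fromstring(cpu_xml)
model = parsed.findtext("model")
features = [f.get("name") for f in parsed.findall("feature")]
print(model, features)
```

Appending the extracted element from each host to one file gives the kind of input that "virsh cpu-baseline" reads in the command above.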

3. On the source host, prepare a guest with PMU enabled:
   ...
   <features>
     <acpi/>
     <apic/>
     <pae/>
     <pmu state='on'/>
   </features>
   ...
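
Since the crash only shows up with PMU enabled, it can help to check a guest definition before migrating. This is a minimal sketch, assuming the domain XML (for example from "virsh dumpxml") is available as a string. `pmu_enabled` is a hypothetical helper, the sample XML is a trimmed illustration, and only an explicit state='on' is treated as enabled; what the hypervisor does when the element is absent is not covered here.

```python
import xml.etree.ElementTree as ET


def pmu_enabled(domain_xml: str) -> bool:
    """Return True only if the domain XML explicitly contains <pmu state='on'/>."""
    root = ET.fromstring(domain_xml)
    pmu = root.find("./features/pmu")
    return pmu is not None and pmu.get("state") == "on"


# Trimmed, illustrative domain definition matching the fragment above.
SAMPLE_DOMAIN = """
<domain type='kvm'>
  <name>rhel7.3</name>
  <features>
    <acpi/>
    <apic/>
    <pae/>
    <pmu state='on'/>
  </features>
</domain>
"""

print(pmu_enabled(SAMPLE_DOMAIN))
```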

4. Migrate the guest:
# virsh migrate rhel7.3 qemu+ssh://10.66.4.111/system --live --verbose
Migration: [100 %]error: operation failed: domain is not running

5. Check the qemu log on the target host:
 ...
 red_dispatcher_loadvm_commands:
qemu-kvm: /builddir/build/BUILD/qemu-2.6.0/target-i386/kvm.c:1715: kvm_put_msrs: Assertion `ret == n' failed.
2016-08-22 11:04:30.009+0000: shutting down
 ...
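
The assertion line in the log follows the layout glibc uses for a failed assert(): program, source file, line number, function, then the expression. A minimal sketch for pulling those fields out of a log line follows; `parse_assertion` is a hypothetical helper and the regular expression is an assumption that fits the line shown above, not a general qemu log parser.

```python
import re

# Layout: "<prog>: <file>:<line>: <function>: Assertion `<expr>' failed."
ASSERT_RE = re.compile(
    r"^(?P<prog>[^:]+): (?P<file>.+?):(?P<line>\d+): (?P<func>[^:]+): "
    r"Assertion `(?P<expr>.*)' failed\.$"
)


def parse_assertion(log_line: str):
    """Return a dict of the assertion fields, or None if the line does not match."""
    match = ASSERT_RE.match(log_line.strip())
    return match.groupdict() if match else None


# The assertion line from the qemu log quoted in step 5.
LOG_LINE = (
    "qemu-kvm: /builddir/build/BUILD/qemu-2.6.0/target-i386/kvm.c:1715: "
    "kvm_put_msrs: Assertion `ret == n' failed."
)

info = parse_assertion(LOG_LINE)
print(info)
```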
 
Actual results:
The guest on the target host crashes when it is migrated from an Intel i7 host to an Intel Xeon host with PMU enabled.

Expected results:
Migration can complete correctly.

Additional info:
1. Migration from Intel Xeon to Intel i7 with PMU enabled completes correctly.

2. Migration from Intel i7 to Intel Xeon completes correctly if PMU is disabled.

3. Backtrace of the qemu coredump:
Thread 11 (Thread 0x7f43644da700 (LWP 23488)):
#0  0x00007f436df7a96d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f436d410a98 in g_usleep (microseconds=microseconds@entry=10000)
    at gtimer.c:259
#2  0x00007f4374c5715c in call_rcu_thread (opaque=<optimized out>)
    at util/rcu.c:245
#3  0x00007f436df73dc5 in start_thread (arg=0x7f43644da700)
    at pthread_create.c:308
#4  0x00007f436c6ac1cd in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 9 (Thread 0x7f43626e8700 (LWP 23497)):
#0  0x00007f436c5eb5f7 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f436c5ecce8 in __GI_abort () at abort.c:90
#2  0x00007f436c5e4566 in __assert_fail_base (
    fmt=0x7f436c7341e8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=assertion@entry=0x7f4374c78eb2 "ret == n", 
    file=file@entry=0x7f4374c78878 "/builddir/build/BUILD/qemu-2.6.0/target-i386/kvm.c", line=line@entry=1723, 
    function=function@entry=0x7f4374c791c8 <__PRETTY_FUNCTION__.30631> "kvm_put_msrs") at assert.c:92
#3  0x00007f436c5e4612 in __GI___assert_fail (
    assertion=assertion@entry=0x7f4374c78eb2 "ret == n", 
    file=file@entry=0x7f4374c78878 "/builddir/build/BUILD/qemu-2.6.0/target-i386/kvm.c", line=line@entry=1723, 
    function=function@entry=0x7f4374c791c8 <__PRETTY_FUNCTION__.30631> "kvm_put_msrs") at assert.c:101
#4  0x00007f4374a73e97 in kvm_put_msrs (cpu=cpu@entry=0x7f4377594000, 
    level=level@entry=3) at /usr/src/debug/qemu-2.6.0/target-i386/kvm.c:1723
#5  0x00007f4374a77cf9 in kvm_arch_put_registers (
    cpu=cpu@entry=0x7f4377594000, level=level@entry=3)
    at /usr/src/debug/qemu-2.6.0/target-i386/kvm.c:2613
#6  0x00007f43749c21fe in do_kvm_cpu_synchronize_post_init (arg=0x7f4377594000)
    at /usr/src/debug/qemu-2.6.0/kvm-all.c:1869
#7  0x00007f43749b1b12 in qemu_wait_io_event_common (cpu=<optimized out>)
    at /usr/src/debug/qemu-2.6.0/cpus.c:1001
#8  0x00007f43749b3bff in qemu_kvm_wait_io_event (cpu=<optimized out>)
    at /usr/src/debug/qemu-2.6.0/cpus.c:1046
#9  qemu_kvm_cpu_thread_fn (arg=<optimized out>)
    at /usr/src/debug/qemu-2.6.0/cpus.c:1081
#10 0x00007f436df73dc5 in start_thread (arg=0x7f43626e8700)
    at pthread_create.c:308
#11 0x00007f436c6ac1cd in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 8 (Thread 0x7f4361ee7700 (LWP 23499)):
#0  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f4374c48869 in qemu_cond_wait (cond=<optimized out>, 
    mutex=mutex@entry=0x7f4375231700 <qemu_global_mutex>)
    at util/qemu-thread-posix.c:123
#2  0x00007f43749b3be3 in qemu_kvm_wait_io_event (cpu=<optimized out>)
    at /usr/src/debug/qemu-2.6.0/cpus.c:1042
#3  qemu_kvm_cpu_thread_fn (arg=<optimized out>)
    at /usr/src/debug/qemu-2.6.0/cpus.c:1081
#4  0x00007f436df73dc5 in start_thread (arg=0x7f4361ee7700)
    at pthread_create.c:308
#5  0x00007f436c6ac1cd in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 7 (Thread 0x7f42c29ff700 (LWP 23523)):
#0  0x00007f436c6a1b7d in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f436f3dd1e7 in poll (__timeout=<optimized out>, __nfds=20, 
    __fds=0x7f43794d8038) at /usr/include/bits/poll2.h:46
#2  red_worker_main (arg=<optimized out>) at red_worker.c:12235
#3  0x00007f436df73dc5 in start_thread (arg=0x7f42c29ff700)
    at pthread_create.c:308
#4  0x00007f436c6ac1cd in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 6 (Thread 0x7f42bc9ff700 (LWP 23524)):
#0  0x00007f436c6a1b7d in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f436f3dd1e7 in poll (__timeout=<optimized out>, __nfds=20, 
    __fds=0x7f4379818038) at /usr/include/bits/poll2.h:46
#2  red_worker_main (arg=<optimized out>) at red_worker.c:12235
#3  0x00007f436df73dc5 in start_thread (arg=0x7f42bc9ff700)
    at pthread_create.c:308
#4  0x00007f436c6ac1cd in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 5 (Thread 0x7f42b6bff700 (LWP 23527)):
#0  0x00007f436c6a1b7d in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f436f3dd1e7 in poll (__timeout=<optimized out>, __nfds=20, 
    __fds=0x7f4379b1e038) at /usr/include/bits/poll2.h:46
#2  red_worker_main (arg=<optimized out>) at red_worker.c:12235
#3  0x00007f436df73dc5 in start_thread (arg=0x7f42b6bff700)
    at pthread_create.c:308
#4  0x00007f436c6ac1cd in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 4 (Thread 0x7f42b0dff700 (LWP 23534)):
#0  0x00007f436c6a1b7d in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f436f3dd1e7 in poll (__timeout=<optimized out>, __nfds=20, 
    __fds=0x7f4379e24038) at /usr/include/bits/poll2.h:46
#2  red_worker_main (arg=<optimized out>) at red_worker.c:12235
#3  0x00007f436df73dc5 in start_thread (arg=0x7f42b0dff700)
    at pthread_create.c:308
#4  0x00007f436c6ac1cd in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 3 (Thread 0x7f42afdff700 (LWP 23562)):
#0  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f4374c48869 in qemu_cond_wait (cond=cond@entry=0x7f43768b28a0, 
    mutex=mutex@entry=0x7f43768b2878) at util/qemu-thread-posix.c:123
#2  0x00007f43749d5869 in do_data_decompress (opaque=0x7f43768b2870)
    at /usr/src/debug/qemu-2.6.0/migration/ram.c:2198
#3  0x00007f436df73dc5 in start_thread (arg=0x7f42afdff700)
    at pthread_create.c:308
#4  0x00007f436c6ac1cd in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 2 (Thread 0x7f42af5fe700 (LWP 23563)):
#0  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f4374c48869 in qemu_cond_wait (cond=cond@entry=0x7f43768b2918, 
    mutex=mutex@entry=0x7f43768b28f0) at util/qemu-thread-posix.c:123
#2  0x00007f43749d5869 in do_data_decompress (opaque=0x7f43768b28e8)
    at /usr/src/debug/qemu-2.6.0/migration/ram.c:2198
#3  0x00007f436df73dc5 in start_thread (arg=0x7f42af5fe700)
    at pthread_create.c:308
#4  0x00007f436c6ac1cd in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 1 (Thread 0x7f4374720c40 (LWP 23479)):
#0  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f4374c48869 in qemu_cond_wait (
    cond=cond@entry=0x7f4375231600 <qemu_work_cond>, 
    mutex=mutex@entry=0x7f4375231700 <qemu_global_mutex>)
    at util/qemu-thread-posix.c:123
#2  0x00007f43749b331e in run_on_cpu (cpu=<optimized out>, 
    func=<optimized out>, data=<optimized out>)
    at /usr/src/debug/qemu-2.6.0/cpus.c:940
#3  0x00007f43749c295f in kvm_cpu_synchronize_post_init (
    cpu=cpu@entry=0x7f4377594000) at /usr/src/debug/qemu-2.6.0/kvm-all.c:1875
#4  0x00007f43749b312a in cpu_synchronize_post_init (cpu=<optimized out>)
    at /usr/src/debug/qemu-2.6.0/include/sysemu/kvm.h:470
#5  cpu_synchronize_all_post_init () at /usr/src/debug/qemu-2.6.0/cpus.c:729
#6  0x00007f43749dbd6e in qemu_loadvm_state (f=f@entry=0x7f4378c2e000)
    at /usr/src/debug/qemu-2.6.0/migration/savevm.c:2029
#7  0x00007f4374b5f76b in process_incoming_migration_co (opaque=0x7f4378c2e000)
    at migration/migration.c:385
#8  0x00007f4374c57d9a in coroutine_trampoline (i0=<optimized out>, 
    i1=<optimized out>) at util/coroutine-ucontext.c:78
#9  0x00007f436c5fd110 in ?? () from /lib64/libc.so.6
#10 0x00007fffa33b55a0 in ?? ()
#11 0x0000000000000000 in ?? ()
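
In a dump like the one above, the thread of interest is the one whose stack passes through abort. A minimal sketch for finding it in saved gdb output follows; `split_threads` and `aborting_threads` are hypothetical helpers, the regular expressions are assumptions that fit the lines in this backtrace, and the excerpt keeps only a few frames of threads 9 and 8.

```python
import re

THREAD_RE = re.compile(r"^Thread (\d+) \(Thread 0x[0-9a-f]+ \(LWP (\d+)\)\):")
# A frame line starts with "#N", optionally followed by "0x... in ", then the function name.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\S+) \(")


def split_threads(dump: str) -> dict:
    """Map gdb thread number -> list of function names, innermost frame first."""
    threads = {}
    current = None
    for line in dump.splitlines():
        header = THREAD_RE.match(line)
        if header:
            current = int(header.group(1))
            threads[current] = []
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None:
            threads[current].append(frame.group(2))
    return threads


def aborting_threads(threads: dict) -> list:
    """Return the thread numbers whose stack contains glibc's abort."""
    return [tid for tid, frames in threads.items() if "__GI_abort" in frames]


# Shortened excerpt of the backtrace above (some frames left out).
EXCERPT = """
Thread 9 (Thread 0x7f43626e8700 (LWP 23497)):
#0  0x00007f436c5eb5f7 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f436c5ecce8 in __GI_abort () at abort.c:90
#4  0x00007f4374a73e97 in kvm_put_msrs (cpu=cpu@entry=0x7f4377594000,
    level=level@entry=3) at /usr/src/debug/qemu-2.6.0/target-i386/kvm.c:1723

Thread 8 (Thread 0x7f4361ee7700 (LWP 23499)):
#0  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#2  0x00007f43749b3be3 in qemu_kvm_wait_io_event (cpu=<optimized out>)
    at /usr/src/debug/qemu-2.6.0/cpus.c:1042
"""

threads = split_threads(EXCERPT)
print(aborting_threads(threads))
```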

Comment 2 Karen Noel 2016-08-22 22:19:45 UTC
Live migration with the PMU enabled is not supported. But we'll see if someone wants to look at the crash anyway.