Bug 1378006 - guest paused on target host sometimes when do migration during guest boot
Summary: guest paused on target host sometimes when do migration during guest boot
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: seabios
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Dr. David Alan Gilbert
QA Contact: FuXiangChun
URL:
Whiteboard:
Depends On:
Blocks: 1401400
 
Reported: 2016-09-21 09:44 UTC by yafu
Modified: 2017-08-01 17:44 UTC (History)

Fixed In Version: seabios-1.10.1-2.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-01 17:44:06 UTC
Target Upstream Version:


Attachments
domain xml (3.83 KB, text/html), 2016-09-21 09:44 UTC, yafu
qemu log and libvirtd log both on source and target host (499.05 KB, application/x-gzip), 2016-09-21 09:46 UTC, yafu
host crash screen (1.89 MB, image/jpeg), 2016-11-29 05:21 UTC, yafu


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2017:1855 | 0 | normal | SHIPPED_LIVE | seabios bug fix and enhancement update | 2017-08-01 18:03:30 UTC

Description yafu 2016-09-21 09:44:55 UTC
Created attachment 1203194 [details]
domain xml

Description of problem:
When a migration is performed while the guest is booting, the guest is sometimes left paused on the target host after the migration completes.

Version-Release number of selected component (if applicable):
libvirt-2.0.0-9.el7.x86_64
qemu-kvm-rhev-2.6.0-25.el7.x86_64

How reproducible:
20%

Steps to Reproduce:
 
1. Start a guest on the source host:
# virsh start mig1

2. While the guest is booting, start the migration:
# virsh migrate mig1 --live qemu+ssh://10.66.14.148/system --verbose
Migration: [100 %]

3. Check the guest status on the target host:
# virsh list
 Id    Name                           State
----------------------------------------------------
 3     mig1                       paused

4. Check the guest status through the QEMU monitor (HMP):
# virsh qemu-monitor-command mig1 --hmp info status
VM status: paused (internal-error)


5. The qemu log on the target host contains an error:
# cat /var/log/libvirt/qemu/mig1.log
KVM internal error. Suberror: 1
emulation failure
EAX=0000a0b5 EBX=ffffffff ECX=0002ffff EDX=000a0000
ESI=ffffffff EDI=ffffffff EBP=ffffffff ESP=000a8000
EIP=ffffffff EFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
FS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
GS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT=     000f79b0 00000037
IDT=     000f79ee 00000000
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=5b 66 5e 66 c3 ea 5b e0 00 f0 30 36 2f 32 33 2f 39 39 00 fc <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
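For readers of the dump above: the Code= bytes just before the marked byte include a run of printable ASCII. The sketch below decodes hex bytes to text; the helper name hex_to_ascii is hypothetical and not part of any tool mentioned in this report. Reading the decoded string as a BIOS date stamp near the top of the address space is an interpretation, not something the report states.

```bash
# Decode space-separated hex bytes (given as arguments) into ASCII text.
# Each byte is converted to an octal escape, which printf understands
# in its format string.
hex_to_ascii() {
    out=""
    for byte in "$@"; do
        out="$out$(printf "\\$(printf '%03o' "0x$byte")")"
    done
    printf '%s\n' "$out"
}

# Bytes copied from the Code= line in the log above.
hex_to_ascii 30 36 2f 32 33 2f 39 39
```

The call prints 06/23/99. Only printable bytes should be passed in; NUL bytes would be dropped by the shell.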


Actual results:
The guest on the target host is paused after the migration completes.

Expected results:
The guest on the target host is running after the migration completes.
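The actual and expected results can be told apart mechanically from the "info status" line shown in step 4. This is a minimal sketch; the function names status_reason and is_running are hypothetical, and it assumes the status line has exactly the form shown in this report ("VM status: <state> (<reason>)").

```bash
# Print the reason in parentheses from an HMP "info status" line,
# e.g. "VM status: paused (internal-error)" gives "internal-error".
# Prints nothing when the line carries no reason.
status_reason() {
    printf '%s\n' "$1" | sed -n 's/^VM status: [a-z-]* (\(.*\))$/\1/p'
}

# Succeed only when the status line reports a running guest.
is_running() {
    case "$1" in
        "VM status: running"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use on the target host (assumes the domain is named mig1):
#   line=$(virsh qemu-monitor-command mig1 --hmp info status)
#   is_running "$line" || echo "not running: $(status_reason "$line")"
```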

Additional info:
1. See the attached guest XML.

2. See the attached libvirt and qemu logs from both the source and target hosts.

3. Stack trace of the paused guest:
# gstack `pidof qemu-kvm`
#2  0x00007f28f2f48f7e in call_rcu_thread ()
#3  0x00007f28da226dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f28d9f5573d in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7f28cd662700 (LWP 1375)):
#0  0x00007f28da22a6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f28f2f3a669 in qemu_cond_wait ()
#2  0x00007f28f2ca45c3 in qemu_kvm_cpu_thread_fn ()
#3  0x00007f28da226dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f28d9f5573d in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7f28cce61700 (LWP 1376)):
#0  0x00007f28da22a6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f28f2f3a669 in qemu_cond_wait ()
#2  0x00007f28f2ca45c3 in qemu_kvm_cpu_thread_fn ()
#3  0x00007f28da226dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f28d9f5573d in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f28cc660700 (LWP 1377)):
#0  0x00007f28da22a6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f28f2f3a669 in qemu_cond_wait ()
#2  0x00007f28f2ca45c3 in qemu_kvm_cpu_thread_fn ()
#3  0x00007f28da226dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f28d9f5573d in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f28cbe5f700 (LWP 1378)):
#0  0x00007f28da22a6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f28f2f3a669 in qemu_cond_wait ()
#2  0x00007f28f2ca45c3 in qemu_kvm_cpu_thread_fn ()
#3  0x00007f28da226dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f28d9f5573d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f2866fff700 (LWP 1380)):
#0  0x00007f28d9f4adfd in poll () from /lib64/libc.so.6
#1  0x00007f28dbc81327 in red_worker_main () from /lib64/libspice-server.so.1
#2  0x00007f28da226dc5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f28d9f5573d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f28667fe700 (LWP 1381)):
#0  0x00007f28d9f4adfd in poll () from /lib64/libc.so.6
#1  0x00007f28dbc81327 in red_worker_main () from /lib64/libspice-server.so.1
#2  0x00007f28da226dc5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f28d9f5573d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f28f2a08d00 (LWP 1362)):
#0  0x00007f28d9f4aebf in ppoll () from /lib64/libc.so.6
#1  0x00007f28f2ea8009 in qemu_poll_ns ()
#2  0x00007f28f2ea799c in main_loop_wait ()
#3  0x00007f28f2c7570f in main ()


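4. The following script reproduces the error easily:
#!/bin/bash
# Arguments: $1 = domain name, $2 = HOSTA (the host this script runs on), $3 = HOSTB

DOMAIN="$1"
HOSTA="$2"
HOSTB="$3"
i=0
while [ "$i" -le 1024 ]
do
        # Migrate to HOSTB, then list the domains there
        virsh migrate "$DOMAIN" "qemu+ssh://$HOSTB/system" --live --verbose
        ssh "$HOSTB" virsh list
        sleep 5
        # Migrate back to HOSTA, then list the local domains
        ssh "$HOSTB" virsh migrate "$DOMAIN" "qemu+ssh://$HOSTA/system" --live --verbose
        virsh list
        sleep 5
        i=$((i + 1))
done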
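The ping-pong loop only prints the domain list, so a paused guest is easy to miss among 1024 iterations. A possible check to call after each migration is sketched below, using "virsh domstate". The function names domstate_on and guest_paused_on are hypothetical, and an empty host argument is taken to mean the local host.

```bash
# Print the state of a domain, either locally (empty host) or on a
# remote host over ssh.
domstate_on() {
    host=$1
    dom=$2
    if [ -n "$host" ]; then
        ssh "$host" virsh domstate "$dom"
    else
        virsh domstate "$dom"
    fi
}

# Return 0 if the domain is paused, 1 if it is in any other state,
# and 2 if the state could not be queried.
guest_paused_on() {
    state=$(domstate_on "$1" "$2") || return 2
    [ "$state" = "paused" ]
}

# Example use inside the loop, after migrating to HOSTB:
#   if guest_paused_on "$HOSTB" "$1"; then echo "paused at iteration $i"; break; fi
```

Stopping at the first paused guest leaves it in the failed state for inspection instead of attempting the next migration.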

Comment 1 yafu 2016-09-21 09:46:34 UTC
Created attachment 1203195 [details]
qemu log and libvirtd log both on source and target host

Comment 9 Amit Shah 2016-09-29 08:18:38 UTC
Can you try with seabios from 7.2?  Paolo's guess is that if that works, it would be a problem with SMM.

Comment 10 yafu 2016-09-29 09:17:42 UTC
(In reply to Amit Shah from comment #9)
> Can you try with seabios from 7.2?  Paolo's guess is that if that works, it
> would be a problem with SMM.

Sorry, I did not understand. Do you mean testing with a RHEL 7.2 host, or with the guest machine type set to pc-i440fx-rhel7.2.0?

Comment 11 Amit Shah 2016-09-29 09:34:05 UTC
(In reply to yafu from comment #10)
> Sorry, I did not understand. Do you mean test with rhel7.2 host or guest
> machine type using pc-i440fx-rhel7.2.0 ?

Just use the seabios package from RHEL7.2 release.  Keep everything else the same as in your original test.

Comment 13 yafu 2016-10-08 04:11:07 UTC
(In reply to Amit Shah from comment #11)
> Just use the seabios package from RHEL7.2 release.  Keep everything else the
> same as in your original test.

1. I tested multiple times with seabios-bin-1.7.5-11.el7.noarch. Migration during guest boot works well.

2. I can still reproduce the issue with seabios-bin-1.9.1-5.el7.noarch and qemu-kvm-rhev-2.6.0-28.el7.x86_64.

Comment 14 Amit Shah 2016-10-12 10:04:45 UTC
Thank you for testing!

Upstream commit 9c1f8f4493e8355d0e48f7d1eebdf86893ba082d would likely resolve this.

Can you check if this qemu build fixes the problem?  (Check with all components from 7.3 - including seabios).

Comment 22 yafu 2016-11-29 05:21:31 UTC
Created attachment 1225605 [details]
host crash screen

Comment 37 Miroslav Rezanina 2017-02-03 10:46:42 UTC
Fix included in seabios-1.10.1-2.el7
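To check whether a host already carries the fixed build, the installed version-release string can be compared against 1.10.1-2.el7. A rough sketch follows, assuming GNU "sort -V" is close enough to RPM's own version ordering for these strings. The function name version_ge is hypothetical. Note that comment 13 reports the older 1.7.5-11.el7 build as not reproducing the problem, so "older than the fix" does not by itself mean "affected".

```bash
# Succeed when version $1 is greater than or equal to version $2,
# using GNU sort's version ordering (not RPM's comparison algorithm).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

FIXED="1.10.1-2.el7"

# Example use (assumes the seabios-bin package name from comment 13):
#   installed=$(rpm -q --queryformat '%{VERSION}-%{RELEASE}' seabios-bin)
#   version_ge "$installed" "$FIXED" && echo "fix present"
```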

Comment 40 FuXiangChun 2017-06-15 10:44:59 UTC
Following comment 0, I tried to reproduce this with qemu-kvm-rhev-2.6.0-25.el7.x86_64 and seabios-1.9.1-5.el7.x86_64.

I ran 32 ping-pong migrations with your script but could not reproduce it. How many iterations did you run before hitting it?

Comment 41 yafu 2017-06-16 06:00:13 UTC
(In reply to FuXiangChun from comment #40)
> According to comment0, I try to reproduce it with
> qemu-kvm-rhev-2.6.0-25.el7.x86_64 & seabios-1.9.1-5.el7.x86_64
> 
> ping-pong 32 times via your script. But can not reproduce it yet. how many
> times did you test before?

The bug only happens while the guest OS is booting, so you need to migrate repeatedly during guest OS boot.
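One way to keep each migration inside the boot window is to start the guest and migrate it after a short delay. The sketch below is only an illustration: the function name boot_migrate_once and the default 2-second delay are assumptions, and the guest is assumed to be shut off before each call.

```bash
# Start the domain, wait briefly so the migration lands during boot,
# migrate it live to the target, and print its state on the target.
# virsh progress output is sent to stderr so that stdout carries only
# the final state.
boot_migrate_once() {
    dom=$1
    target=$2
    delay=${3:-2}
    virsh start "$dom" >&2 || return 1
    sleep "$delay"
    virsh migrate "$dom" "qemu+ssh://$target/system" --live --verbose >&2 || return 1
    ssh "$target" virsh domstate "$dom"
}

# Example use (host address taken from the reproduction steps):
#   state=$(boot_migrate_once mig1 10.66.14.148)
#   [ "$state" = "running" ] || echo "guest is $state"
```

The delay that hits the failing window probably depends on the guest and host, so it may need tuning.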

Comment 43 FuXiangChun 2017-06-16 07:44:28 UTC
Based on this result, I am setting this bug to verified.

Comment 47 errata-xmlrpc 2017-08-01 17:44:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1855

