Bug 1378006

Summary: guest sometimes pauses on target host when migration is performed during guest boot
Product: Red Hat Enterprise Linux 7
Reporter: yafu <yafu>
Component: seabios
Assignee: Dr. David Alan Gilbert <dgilbert>
Status: CLOSED ERRATA
QA Contact: FuXiangChun <xfu>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.3
CC: chayang, dgilbert, dyuan, fjin, hhuang, juzhang, lersek, mrezanin, qizhu, quintela, qzhang, rjones, virt-maint, yafu, zpeng
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: seabios-1.10.1-2.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-01 17:44:06 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1401400    
Attachments:
  domain xml (flags: none)
  qemu log and libvirtd log both on source and target host (flags: none)
  host crash screen (flags: none)

Description yafu 2016-09-21 09:44:55 UTC
Created attachment 1203194 [details]
domain xml

Description of problem:
When migration is performed during guest boot, the guest sometimes ends up paused on the target host after the migration completes.

Version-Release number of selected component (if applicable):
libvirt-2.0.0-9.el7.x86_64
qemu-kvm-rhev-2.6.0-25.el7.x86_64

How reproducible:
20%

Steps to Reproduce:

1.Start a guest on the source host:
 # virsh start mig1

2.While the guest is still booting, start the migration:
# virsh migrate mig1 --live qemu+ssh://10.66.14.148/system --verbose
Migration: [100 %]

3.Check the guest status on the target host:
# virsh list
 Id    Name                           State
----------------------------------------------------
 3     mig1                       paused

4.Check the guest status via the QEMU monitor:
# virsh qemu-monitor-command mig1 --hmp info status
VM status: paused (internal-error)
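
For reference, the same state can be queried in JSON over QMP; a minimal sketch, assuming the same domain name:
 # virsh qemu-monitor-command mig1 '{"execute":"query-status"}'
The returned "status" field carries the same "internal-error" value shown above.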


5.There is an error in the qemu log on the target host:
# cat /var/log/libvirt/qemu/mig1.log
KVM internal error. Suberror: 1
emulation failure
EAX=0000a0b5 EBX=ffffffff ECX=0002ffff EDX=000a0000
ESI=ffffffff EDI=ffffffff EBP=ffffffff ESP=000a8000
EIP=ffffffff EFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
FS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
GS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT=     000f79b0 00000037
IDT=     000f79ee 00000000
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=5b 66 5e 66 c3 ea 5b e0 00 f0 30 36 2f 32 33 2f 39 39 00 fc <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00


Actual results:
The guest on the target host is paused after the migration completes.

Expected results:
The guest on the target host is running after the migration completes.

Additional info:
1.Please see the guest xml in the attachments;

2.Please see the libvirt and qemu logs from both the source and target hosts in the attachments;

3.The stack trace of the paused guest:
  # gstack `pidof qemu-kvm`
  #2  0x00007f28f2f48f7e in call_rcu_thread ()
#3  0x00007f28da226dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f28d9f5573d in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7f28cd662700 (LWP 1375)):
#0  0x00007f28da22a6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f28f2f3a669 in qemu_cond_wait ()
#2  0x00007f28f2ca45c3 in qemu_kvm_cpu_thread_fn ()
#3  0x00007f28da226dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f28d9f5573d in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7f28cce61700 (LWP 1376)):
#0  0x00007f28da22a6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f28f2f3a669 in qemu_cond_wait ()
#2  0x00007f28f2ca45c3 in qemu_kvm_cpu_thread_fn ()
#3  0x00007f28da226dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f28d9f5573d in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f28cc660700 (LWP 1377)):
#0  0x00007f28da22a6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f28f2f3a669 in qemu_cond_wait ()
#2  0x00007f28f2ca45c3 in qemu_kvm_cpu_thread_fn ()
#3  0x00007f28da226dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f28d9f5573d in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f28cbe5f700 (LWP 1378)):
#0  0x00007f28da22a6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f28f2f3a669 in qemu_cond_wait ()
#2  0x00007f28f2ca45c3 in qemu_kvm_cpu_thread_fn ()
#3  0x00007f28da226dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f28d9f5573d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f2866fff700 (LWP 1380)):
#0  0x00007f28d9f4adfd in poll () from /lib64/libc.so.6
#1  0x00007f28dbc81327 in red_worker_main () from /lib64/libspice-server.so.1
#2  0x00007f28da226dc5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f28d9f5573d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f28667fe700 (LWP 1381)):
#0  0x00007f28d9f4adfd in poll () from /lib64/libc.so.6
#1  0x00007f28dbc81327 in red_worker_main () from /lib64/libspice-server.so.1
#2  0x00007f28da226dc5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f28d9f5573d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f28f2a08d00 (LWP 1362)):
#0  0x00007f28d9f4aebf in ppoll () from /lib64/libc.so.6
#1  0x00007f28f2ea8009 in qemu_poll_ns ()
#2  0x00007f28f2ea799c in main_loop_wait ()
#3  0x00007f28f2c7570f in main ()
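
If gstack is unavailable, the same per-thread backtraces can be captured with a gdb one-liner (a sketch, assuming a single qemu-kvm process):
 # gdb -p $(pidof qemu-kvm) -batch -ex 'thread apply all bt'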


4.The script below can be used to easily reproduce the error:
  #!/bin/bash
# Ping-pong migrate guest $1 between host A ($2) and host B ($3), up to 1024 round trips.

GUEST="$1"
HOSTA="$2"
HOSTB="$3"
i=0
while [ "$i" -le 1024 ]
do
        # Migrate to host B and confirm the domain state there.
        virsh migrate "$GUEST" qemu+ssh://$HOSTB/system --live --verbose
        ssh "$HOSTB" virsh list
        sleep 5
        # Migrate back to host A and confirm the state locally.
        ssh "$HOSTB" virsh migrate "$GUEST" qemu+ssh://$HOSTA/system --live --verbose
        virsh list
        sleep 5
        i=$((i + 1))
done
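
A hypothetical invocation (the script name and source-host address are placeholders; 10.66.14.148 is the target host from step 2):
 # ./migrate-loop.sh mig1 <source-host-ip> 10.66.14.148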

Comment 1 yafu 2016-09-21 09:46:34 UTC
Created attachment 1203195 [details]
qemu log and libvirtd log both on source and target host

Comment 9 Amit Shah 2016-09-29 08:18:38 UTC
Can you try with seabios from 7.2?  Paolo's guess is that if that works, it would be a problem with SMM.

Comment 10 yafu 2016-09-29 09:17:42 UTC
(In reply to Amit Shah from comment #9)
> Can you try with seabios from 7.2?  Paolo's guess is that if that works, it
> would be a problem with SMM.

Sorry, I did not understand. Do you mean testing with a RHEL 7.2 host, or a guest machine type of pc-i440fx-rhel7.2.0?

Comment 11 Amit Shah 2016-09-29 09:34:05 UTC
(In reply to yafu from comment #10)
> (In reply to Amit Shah from comment #9)
> > Can you try with seabios from 7.2?  Paolo's guess is that if that works, it
> > would be a problem with SMM.
> 
> Sorry, I did not understand. Do you mean testing with a RHEL 7.2 host, or a
> guest machine type of pc-i440fx-rhel7.2.0?

Just use the seabios package from RHEL7.2 release.  Keep everything else the same as in your original test.
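
A minimal sketch of how the older package could be swapped in on both hosts (the exact 7.2 NVR below matches the build later used in comment 13; the guest must be restarted afterwards to pick up the older BIOS image):
 # yum downgrade seabios-bin-1.7.5-11.el7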

Comment 13 yafu 2016-10-08 04:11:07 UTC
(In reply to Amit Shah from comment #11)
> (In reply to yafu from comment #10)
> > (In reply to Amit Shah from comment #9)
> > > Can you try with seabios from 7.2?  Paolo's guess is that if that works, it
> > > would be a problem with SMM.
> > 
> > Sorry, I did not understand. Do you mean testing with a RHEL 7.2 host, or a
> > guest machine type of pc-i440fx-rhel7.2.0?
> 
> Just use the seabios package from RHEL7.2 release.  Keep everything else the
> same as in your original test.

1.I tested multiple times with seabios-bin-1.7.5-11.el7.noarch. The migration works well during guest boot.

2.I can still reproduce the issue with seabios-bin-1.9.1-5.el7.noarch and qemu-kvm-rhev-2.6.0-28.el7.x86_64.

Comment 14 Amit Shah 2016-10-12 10:04:45 UTC
Thank you for testing!

Upstream commit 9c1f8f4493e8355d0e48f7d1eebdf86893ba082d would likely resolve this.

Can you check if this qemu build fixes the problem?  (Check with all components from 7.3 - including seabios).
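
Whether a given qemu source tree already carries that commit can be checked with a generic git sketch (the checkout path is an assumption):
 # git -C qemu merge-base --is-ancestor 9c1f8f4493e8355d0e48f7d1eebdf86893ba082d HEAD && echo "commit present"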

Comment 22 yafu 2016-11-29 05:21:31 UTC
Created attachment 1225605 [details]
host crash screen

Comment 37 Miroslav Rezanina 2017-02-03 10:46:42 UTC
Fix included in seabios-1.10.1-2.el7
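
Whether a host already carries the fixed build can be checked with rpm (a trivial sketch):
 # rpm -q seabios seabios-bin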

Comment 40 FuXiangChun 2017-06-15 10:44:59 UTC
According to comment 0, I tried to reproduce it with qemu-kvm-rhev-2.6.0-25.el7.x86_64 & seabios-1.9.1-5.el7.x86_64.

I ping-ponged 32 times via your script but cannot reproduce it yet. How many times did you test before?

Comment 41 yafu 2017-06-16 06:00:13 UTC
(In reply to FuXiangChun from comment #40)
> According to comment 0, I tried to reproduce it with
> qemu-kvm-rhev-2.6.0-25.el7.x86_64 & seabios-1.9.1-5.el7.x86_64.
> 
> I ping-ponged 32 times via your script but cannot reproduce it yet. How many
> times did you test before?

The bug only happens while the guest OS is booting, so you need to repeatedly perform the migration during guest OS boot.
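
For example, a sketch (not from the original report) that restarts the guest before each migration so every attempt lands in the boot window:
 # virsh destroy mig1; virsh start mig1
 # virsh migrate mig1 --live qemu+ssh://10.66.14.148/system --verbose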

Comment 43 FuXiangChun 2017-06-16 07:44:28 UTC
According to this result, setting this bug as verified.

Comment 47 errata-xmlrpc 2017-08-01 17:44:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1855