Bug 612074

Summary: core dumped while live migration with spice
Product: Red Hat Enterprise Linux 6 Reporter: Cao, Chen <kcao>
Component: qemu-kvm Assignee: Gerd Hoffmann <kraxel>
Status: CLOSED CURRENTRELEASE QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.0 CC: bcao, ddumas, llim, mkenneth, tburke, virt-maint
Target Milestone: rc Keywords: TestBlocker
Target Release: ---   
Hardware: All   
OS: Linux   
Fixed In Version: qemu-kvm-0.12.1.2-2.100.el6 Doc Type: Bug Fix
Last Closed: 2010-11-10 21:26:22 UTC Type: ---
Bug Depends On: 618168    

Description Cao, Chen 2010-07-07 09:19:20 UTC
Description of problem:
qemu-kvm dumped core during live migration with spice.
Guest OS: Windows 2008 64-bit; NIC: rtl8139; disk: IDE; smp=1



Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.90


How reproducible:
100% (2/2)


Steps to Reproduce:
1. start source vm with command:
qemu-kvm -name 'vm1' -monitor unix:'/tmp/monitor-humanmonitor1-20100706-191418-KJad',server,nowait -drive file='tests/kvm/images/win2008-64.qcow2',if=ide,cache=none,aio=native -net nic,vlan=0,netdev=5JFn,model=rtl8139,macaddr='02:30:20:EC:85:8e' -netdev tap,id=5JFn,ifname=rtl8139_0_8000,script=tests/kvm/scripts/qemu-ifup-switch,downscript=no -m 2048 -smp 1 -drive file='tests/kvm/isos/windows/winutils.iso',index=2,media=cdrom -vnc :0 -spice port=8000,disable-ticketing -usbdevice tablet -rtc-td-hack -cpu qemu64,+sse2 -no-kvm-pit-reinjection -rtc-td-hack -serial unix:/tmp/serial-20100706-191418-KJad,server,nowait

2. start dst vm with command:
qemu-kvm -name 'vm1' -monitor unix:'/tmp/monitor-humanmonitor1-20100706-191418-KJad',server,nowait -drive file='tests/kvm/images/win2008-64.qcow2',if=ide,cache=none,aio=native -net nic,vlan=0,netdev=XOAX,model=rtl8139,macaddr='02:30:20:EC:85:8e' -netdev tap,id=XOAX,ifname=rtl8139_0_8001,script=tests/kvm/scripts/qemu-ifup-switch,downscript=no -m 2048 -smp 1 -drive file='tests/kvm/isos/windows/winutils.iso',index=2,media=cdrom -vnc :1 -spice port=8001,disable-ticketing -usbdevice tablet -rtc-td-hack -cpu qemu64,+sse2 -no-kvm-pit-reinjection -rtc-td-hack -serial unix:/tmp/serial-20100706-191418-KJad,server,nowait -incoming tcp:0:5200

3. Migrate, from the source VM's monitor:
migrate tcp:localhost:5200

  
Actual results:
Migration failed with a core dump.


Expected results:
migration succeeds, and guest continues running.


Additional info:

(gdb) bt
#0  0x000000362f083e7b in memcpy () from /lib64/libc.so.6
#1  0x0000000000471a00 in qemu_spice_display_create_update (ds=0x1b36fb0, 
    dirty=<value optimized out>, unique=<value optimized out>)
    at /usr/include/bits/string3.h:52
#2  0x0000000000471aab in interface_get_command (qxl=<value optimized out>, 
    cmd=0x7f8829426180) at /usr/src/debug/qemu-kvm-0.12.1.2/spice-display.c:255
#3  0x000000363b42cbd2 in ?? () from /usr/lib64/libspice-server.so.0
#4  0x000000363b42eb86 in red_worker_main ()
   from /usr/lib64/libspice-server.so.0
#5  0x000000362f8077e1 in start_thread () from /lib64/libpthread.so.0
#6  0x000000362f0e151d in clone () from /lib64/libc.so.6


$ uname -r
2.6.32-37.el6.x86_64

$ cat /proc/cpuinfo
processor	: 2
vendor_id	: AuthenticAMD
cpu family	: 16
model		: 2
model name	: AMD Phenom(tm) 8750 Triple-Core Processor
stepping	: 3
cpu MHz		: 1200.000
cache size	: 512 KB
physical id	: 0
siblings	: 3
core id		: 2
cpu cores	: 3
apicid		: 2
initial apicid	: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs npt lbrv svm_lock
bogomips	: 4809.91
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

Comment 2 Gerd Hoffmann 2010-07-07 16:18:00 UTC
Hmm, this triggers very rarely for me (once out of ~20 times), and unfortunately that was with core dumps disabled. It happened on the destination host.

Which host crashes for you?  src?  dst?

Do you have a spice client connected?  To the src host?  To the dst host?  Both?

Comment 3 Cao, Chen 2010-07-08 01:55:09 UTC
(In reply to comment #2)
> Hmm, triggers very rarely for me, once out of ~20 times, unfortunaly that was
> with core dumps disabled.  Happened on the destination host.
> 
> Which host crashes for you?  src?  dst?
> 

dst crashes.

> Do you have a spice client connected?  To the src host?  To the dst host? 
> Both?    

I do not have a spice client connected while migrating.


Besides, I saw the "migration completed" message, but the guest died.
And I cannot trigger this every time either.

Comment 4 RHEL Program Management 2010-07-15 14:11:31 UTC
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release. It has
been denied for the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 5 Cao, Chen 2010-07-16 01:53:03 UTC
reproduced on qemu-kvm-0.12.1.2-2.93


(gdb) bt
#0  memcpy () at ../sysdeps/x86_64/memcpy.S:267
#1  0x0000000000471c20 in qemu_spice_display_create_update (ds=0x201afb0, 
    dirty=<value optimized out>, unique=<value optimized out>)
    at /usr/include/bits/string3.h:52
#2  0x0000000000471ccb in interface_get_command (qxl=<value optimized out>, 
    cmd=0x7fce38a23180) at /usr/src/debug/qemu-kvm-0.12.1.2/spice-display.c:255
#3  0x0000003f5d82cbd2 in ?? ()
#4  0x0000000000004a5b in ?? ()
#5  0x000000002892a86d in ?? ()



missing debuginfo package info:

Missing separate debuginfo for /lib64/libz.so.1
Try: yum --disablerepo='*' --enablerepo='*-debuginfo' install /usr/lib/debug/.build-id/4a/36f8b932130acbe6e6a7372342b97a9d9a8b6b
Missing separate debuginfo for 
Try: yum --disablerepo='*' --enablerepo='*-debuginfo' install /usr/lib/debug/.build-id/d2/1873acaffc5459742228a59b05878920a8ff4c

Comment 6 Gerd Hoffmann 2010-07-16 10:22:08 UTC
Finally found a way to trigger it reliably.
Investigating ...

Comment 7 Gerd Hoffmann 2010-07-16 11:33:16 UTC
patch posted to rhvirt-patches

Comment 11 Cao, Chen 2010-07-29 05:23:53 UTC
(In reply to comment #6)
> Finally found a way to trigger it reliably.
> Investigating ...    

Hi Gerd,

could you please share the way to trigger it reliably with us, to verify this bug?

Comment 12 Cao, Chen 2010-07-29 11:26:29 UTC
Encountered a core dump when trying to verify:

(gdb) bt
#0  0x0000003e138329b5 in raise (sig=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0000003e13834195 in abort () at abort.c:92
#2  0x0000003e1382b945 in __assert_fail (assertion=0x5bc90b "s->state != 2", 
    file=<value optimized out>, line=295, function=<value optimized out>)
    at assert.c:81
#3  0x00000000004b2eac in migrate_fd_cleanup (s=0x2a3cf10) at migration.c:295
#4  0x00000000004b2f85 in migrate_fd_put_ready (opaque=0x2a3cf10)
    at migration.c:396
#5  0x000000000040b588 in qemu_run_timers (timeout=1000)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:1166
#6  main_loop_wait (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4235
#7  0x000000000042898a in kvm_main_loop ()
    at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2133
#8  0x000000000040e47b in main_loop (argc=<value optimized out>, 
    argv=<value optimized out>, envp=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4408
#9  main (argc=<value optimized out>, argv=<value optimized out>, 
    envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6564
(gdb) 

This is bug https://bugzilla.redhat.com/show_bug.cgi?id=618158; setting the dependency.

Comment 13 Cao, Chen 2010-08-02 09:13:11 UTC
(In reply to comment #11)
> (In reply to comment #6)
> > Finally found a way to trigger it reliably.
> > Investigating ...    
> 
> Hi Gerd,
> 
> could you please share the way to trigger it reliably with us, to verify this
> bug?    

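Thanks Gerd. This problem most likely happens when the guest switches from text mode to gfx mode.

So the test case should be one of:
1. migrate while a RHEL guest (default runlevel 5) is still booting, or
2. run the following loop in the guest during live migration:
   next_init=3
   while true; do
       if [[ $next_init -eq 3 ]]; then init 3; next_init=5; else init 5; next_init=3; fi
       sleep 30   # give each text <-> gfx switch time to happen (30 s is an arbitrary choice)
   done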

Comment 14 Gerd Hoffmann 2010-08-02 09:28:57 UTC
[ adding info sent via mail here for reference ]

> Hello, Gerd,
>
> could you please share the way to trigger the bug with me? I'm very
> interested.
>
> And I have tried to print the registers/variables when it dumped core;
> some lvalue (some coordinate of the screen?) is NULL.
> So my guess is that the screen resolution is changing during live
> migration. Do I touch the edge of the truth? :-)

Pretty close ;)

The problem is that the memory backing the framebuffer can be released while it is still in use by the spice server. This can happen when the screen resolution changes, but it doesn't happen on every mode switch. It does happen when going from text mode to graphics mode.

The way I can easily trigger it is to start migration right after boot, while the guest is still in text mode, so there is one txt -> gfx mode switch during migration.

cheers,
  Gerd

Comment 15 Mike Cao 2010-08-05 07:46:29 UTC
Reproduced on qemu-kvm-0.12.1.2-2.90.el6
Verified on qemu-kvm-0.12.1.2-2.108.el6

Steps 
1. Start the VM on the src host.
2. Start the destination VM in listening (-incoming) mode on the dest host.
3. Do live migration while the guest is booting.

Actual Results:
On qemu-kvm-0.12.1.2-2.90.el6, qemu-kvm dumped core on the dest host.
On qemu-kvm-0.12.1.2-2.108.el6, no core dump occurs after migration and the guest can be used normally.

Based on the above, this issue has been fixed. Changing status to VERIFIED.


Additional note:
I cannot reproduce the issue with the steps in comment #13; those steps may trigger a different issue, and I will investigate further.

Comment 16 releng-rhel@redhat.com 2010-11-10 21:26:22 UTC
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.