Bug 1571230

Summary:	Off-line migration via "exec:gzip -c", vm hit call trace when boot it with "-incoming"
Product:	Red Hat Enterprise Linux 7	Reporter:	xianwang <xianwang>
Component:	qemu-kvm-rhev	Assignee:	Laurent Vivier <lvivier>
Status:	CLOSED NOTABUG	QA Contact:	xianwang <xianwang>
Severity:	medium	Docs Contact:
Priority:	unspecified
Version:	7.5	CC:	dgibson, knoel, lvivier, mdeng, micai, michen, pbonzini, qzhang, virt-maint, xianwang, xuma, yhong, yilzhang
Target Milestone:	rc
Target Release:	---
Hardware:	ppc64le
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-06-04 15:43:56 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description xianwang 2018-04-24 11:14:56 UTC

Description of problem:
Do off-line migration via "exec:gzip -c > //home/xianwang/STATEFILE.gz", migration status is "completed", then start listening mode with 
"-incoming "exec: gzip -c -d //home/xianwang/STATEFILE.gz"", guest hit call trace after boot up.

Version-Release number of selected component (if applicable):
Host:
3.10.0-862.el7.ppc64le
qemu-kvm-rhev-2.10.0-21.el7_5.1.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

Guest:
3.10.0-847.el7.ppc64le

How reproducible:
50%

Steps to Reproduce:
1.Boot a guest with qemu cli as "Additional info"
2.Stop vm and save VM state into a compressed file in host
(qemu) stop
or {"execute":"stop"}
(qemu) migrate_set_speed 1G
(qemu) migrate -d "exec:gzip -c> /home/xianwang/STATEFILE.gz"
after (qemu) info status
VM status: paused (postmigrate)
3.Load the file in local host.
"-incoming "exec: gzip -c -d /home/xianwang/STATEFILE.gz""
4.Check network, dd a file and dmesg inside guest

Actual results:
guest ping external host--successfully
dd a file inside guest--successfully
dmesg---hit call trace
[   92.520064] Task dump for CPU 0:
[   92.520078] swapper/0       R  running task        0     0      0 0x00000000
[   92.520098] Call Trace:
[   92.520149] [c000000001273b60] [c0000000001efdd8] rcu_idle_exit+0xb8/0x1d0 (unreliable)
[   92.520171] [c000000001273d30] [c00000000167fc98] cpuidle_curr_governor+0x0/0x8
[   92.520172] Task dump for CPU 1:
[   92.520180] swapper/1       R  running task        0     0      1 0x00000800
[   92.520182] Call Trace:
[   92.520192] [c00000017b82fba0] [c000000001274700] .TOC.+0x0/0x3900 (unreliable)
[   92.520194] [c00000017b82fd70] [c00000000167fc98] cpuidle_curr_governor+0x0/0x8
[   92.520195] Task dump for CPU 2:
[   92.520203] goa-daemon      R  running task        0  2344   2343 0x00040080
[   92.520212] Call Trace:
[   92.520460] [c0000000296bb780] [c0000000296bb810] 0xc0000000296bb810
[   92.520461] Task dump for CPU 3:
[   92.520468] swapper/3       R  running task        0     0      1 0x00000800
[   92.520470] Call Trace:
[   92.520473] [c00000017b837ba0] [c00000000014e970] post_schedule_idle+0x0/0x30 (unreliable)
[   92.520475] [c00000017b837d70] [c00000000167fc98] cpuidle_curr_governor+0x0/0x8
[   92.520476] Task dump for CPU 4:
[   92.520477] swapper/4       R  running task        0     0      1 0x00000804
[   92.520479] Call Trace:
[   92.520481] [c00000017b83b5a0] [c00000000001b5e0] show_stack+0x80/0x330 (unreliable)
[   92.520483] [c00000017b83b650] [c00000000014bcf0] dump_cpu_task+0x100/0x1d0
[   92.520485] [c00000017b83b6c0] [c0000000001f5a34] rcu_check_callbacks+0x724/0xba0
[   92.520487] [c00000017b83b800] [c0000000000ff68c] update_process_times+0x5c/0xb0
[   92.520490] [c00000017b83b840] [c0000000001863c4] tick_sched_timer+0x84/0x180
[   92.520492] [c00000017b83b880] [c00000000012b4d0] __hrtimer_run_queues+0xf0/0x3f0
[   92.520494] [c00000017b83b920] [c00000000012bdfc] hrtimer_interrupt+0xdc/0x310
[   92.520495] [c00000017b83b9e0] [c000000000023060] __timer_interrupt+0x90/0x240
[   92.520497] [c00000017b83ba30] [c0000000000232b0] timer_interrupt+0xa0/0xe0
[   92.520499] [c00000017b83ba60] [c000000000002a14] decrementer_common+0x114/0x180
[   92.520516] --- Exception: 901 at plpar_hcall_norets+0x8c/0xdc
    LR = shared_cede_loop+0xb8/0xd0
[   92.520520] [c00000017b83bd50] [c000000140000000] 0xc000000140000000 (unreliable)
[   92.520522] [c00000017b83bdc0] [c0000000007e489c] cpuidle_idle_call+0x11c/0x400
[   92.520523] [c00000017b83be30] [c0000000000a04a8] pseries_lpar_idle+0x18/0x60
[   92.520525] [c00000017b83be90] [c00000000001bfb8] arch_cpu_idle+0x68/0x160
[   92.520527] [c00000017b83bec0] [c0000000001723c0] cpu_startup_entry+0x170/0x1e0
[   92.520530] [c00000017b83bf20] [c000000000050980] start_secondary+0x310/0x340
[   92.520532] [c00000017b83bf90] [c000000000009a6c] start_secondary_prolog+0x10/0x14
[   92.520533] Task dump for CPU 5:
[   92.520540] swapper/5       R  running task        0     0      1 0x00000800
[   92.520542] Call Trace:
[   92.520544] [c00000017b83fd70] [c00000000167fc98] cpuidle_curr_governor+0x0/0x8
[   92.520545] Task dump for CPU 6:
[   92.520553] swapper/6       R  running task        0     0      1 0x00000800
[   92.520555] Call Trace:
[   92.520557] [c00000017b843ba0] [0000000000000008] 0x8 (unreliable)
[   92.520560] [c00000017b843d70] [c00000000167fc98] cpuidle_curr_governor+0x0/0x8
[   92.520561] Task dump for CPU 7:
[   92.520568] swapper/7       R  running task        0     0      1 0x00000800
[   92.520570] Call Trace:
[   92.520572] [c00000017b847ba0] [c000000001274700] .TOC.+0x0/0x3900 (unreliable)
[   92.520574] [c00000017b847d70] [c00000000167fc98] cpuidle_curr_governor+0x0/0x8

Expected results:
guest don't hit call trace

Additional info:
/usr/libexec/qemu-kvm  \
-nodefaults  \
-vnc :19 \
-monitor stdio \
-rtc base=utc,clock=host \
-netdev tap,id=tap0,vhost=on \
-boot menu=off,strict=off,order=cdn,once=c \
-m 4096 \
-drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/xianwang/rhel75-ppc64le-virtio-scsi.qcow2 \
-vga std \
-machine pseries \
-sandbox off \
-enable-kvm  \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-mon chardev=qmp_id_catch_monitor,mode=control \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1,server,nowait \
-chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor,server,nowait \
-qmp tcp:0:3339,server,nowait \
-device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
-device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 \
-device virtio-net-pci,mac=9a:8a:8b:8c:8d:8e,id=net0,vectors=4,netdev=tap0,bus=pci.0,addr=0x5 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-serial tcp:0:4449,server,nowait \
-smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
-name "vm" \

Comment 2 xianwang 2018-04-24 11:20:06 UTC

This issue is hit only on powerpc platform, works well on x86_64 platform

Comment 4 David Gibson 2018-04-26 02:26:07 UTC

My guess would be that this is due to the cpu clock stopping during the offline migration.

Comment 5 Laurent Vivier 2018-05-04 07:09:10 UTC

Perhaps this upstream patch could fix this issue:

target/ppc: only save guest timebase once after stopping
http://patchwork.ozlabs.org/patch/908487/

Comment 8 Paolo Bonzini 2018-05-18 15:54:55 UTC

The outcome of the discussion was:

> Yes, downtimes can sometimes be long.  I still think it's correct to
> keep the clock going in that case.  The guest may give warnings
> because it's seeing something funny with the clock.  Something *is*
> funny with the clock, and those warnings are correct.  Basically, when
> the downtime is that long we can't really maintain the illusion of a
> continuously running VM.  Pretending we can by fudging the clocks is
> not doing our users a service

Does this mean this bug is invalid?

Comment 9 David Gibson 2018-06-04 06:44:00 UTC

> Does this mean this bug is invalid?

That would be my opinion.

Comment 10 Laurent Vivier 2018-06-04 15:43:56 UTC

As stated in comment 8 and comment 7, it's not a bug.