Bug 1571230 - Off-line migration via "exec:gzip -c", vm hit call trace when boot it with "-incoming"
Summary: Off-line migration via "exec:gzip -c", vm hit call trace when boot it with "-incoming"
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.5
Hardware: ppc64le
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Assignee: Laurent Vivier
QA Contact: xianwang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-04-24 11:14 UTC by xianwang
Modified: 2018-06-04 15:43 UTC (History)
13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-04 15:43:56 UTC
Target Upstream Version:


Attachments (Terms of Use)

Description xianwang 2018-04-24 11:14:56 UTC
Description of problem:
Perform an off-line migration via "exec:gzip -c > /home/xianwang/STATEFILE.gz"; the migration status reaches "completed". Then start the guest in listening mode with
-incoming "exec: gzip -c -d /home/xianwang/STATEFILE.gz"
The guest hits a call trace after boot-up.

Version-Release number of selected component (if applicable):
Host:
3.10.0-862.el7.ppc64le
qemu-kvm-rhev-2.10.0-21.el7_5.1.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

Guest:
3.10.0-847.el7.ppc64le

How reproducible:
50%

Steps to Reproduce:
1. Boot a guest with the QEMU command line shown in "Additional info".
2. Stop the VM and save its state into a compressed file on the host:
(qemu) stop
or {"execute":"stop"}
(qemu) migrate_set_speed 1G
(qemu) migrate -d "exec:gzip -c > /home/xianwang/STATEFILE.gz"
Afterwards, (qemu) info status reports:
VM status: paused (postmigrate)
3. Start the guest again on the local host in listening mode, loading the file via:
-incoming "exec: gzip -c -d /home/xianwang/STATEFILE.gz"
4. Inside the guest, check the network, dd a file, and check dmesg.
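The compress/decompress round trip that the two exec: commands perform can be exercised outside QEMU. A minimal sketch, using a placeholder file in place of the migration stream (STATE and all paths are illustrative, not from the report):

```shell
# Placeholder for the migration stream QEMU would write.
STATE=$(mktemp)
echo "guest-state-placeholder" > "$STATE"

# Save side: what migrate -d "exec:gzip -c > STATEFILE.gz" does with the
# stream -- compress stdin into a file.
gzip -c < "$STATE" > "$STATE.gz"

# Sanity-check the compressed state before reusing it.
gzip -t "$STATE.gz"

# Restore side: what -incoming "exec: gzip -c -d STATEFILE.gz" does --
# decompress the file back onto stdout for QEMU to read.
gzip -c -d "$STATE.gz" > "$STATE.restored"

cmp "$STATE" "$STATE.restored" && echo "round trip OK"
```

If the round trip is intact (as it is here), any misbehavior after restore lies in how the guest handles the saved state, not in the exec: pipe itself.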

Actual results:
Pinging an external host from the guest: succeeds.
dd'ing a file inside the guest: succeeds.
dmesg: shows a call trace:
[   92.520064] Task dump for CPU 0:
[   92.520078] swapper/0       R  running task        0     0      0 0x00000000
[   92.520098] Call Trace:
[   92.520149] [c000000001273b60] [c0000000001efdd8] rcu_idle_exit+0xb8/0x1d0 (unreliable)
[   92.520171] [c000000001273d30] [c00000000167fc98] cpuidle_curr_governor+0x0/0x8
[   92.520172] Task dump for CPU 1:
[   92.520180] swapper/1       R  running task        0     0      1 0x00000800
[   92.520182] Call Trace:
[   92.520192] [c00000017b82fba0] [c000000001274700] .TOC.+0x0/0x3900 (unreliable)
[   92.520194] [c00000017b82fd70] [c00000000167fc98] cpuidle_curr_governor+0x0/0x8
[   92.520195] Task dump for CPU 2:
[   92.520203] goa-daemon      R  running task        0  2344   2343 0x00040080
[   92.520212] Call Trace:
[   92.520460] [c0000000296bb780] [c0000000296bb810] 0xc0000000296bb810
[   92.520461] Task dump for CPU 3:
[   92.520468] swapper/3       R  running task        0     0      1 0x00000800
[   92.520470] Call Trace:
[   92.520473] [c00000017b837ba0] [c00000000014e970] post_schedule_idle+0x0/0x30 (unreliable)
[   92.520475] [c00000017b837d70] [c00000000167fc98] cpuidle_curr_governor+0x0/0x8
[   92.520476] Task dump for CPU 4:
[   92.520477] swapper/4       R  running task        0     0      1 0x00000804
[   92.520479] Call Trace:
[   92.520481] [c00000017b83b5a0] [c00000000001b5e0] show_stack+0x80/0x330 (unreliable)
[   92.520483] [c00000017b83b650] [c00000000014bcf0] dump_cpu_task+0x100/0x1d0
[   92.520485] [c00000017b83b6c0] [c0000000001f5a34] rcu_check_callbacks+0x724/0xba0
[   92.520487] [c00000017b83b800] [c0000000000ff68c] update_process_times+0x5c/0xb0
[   92.520490] [c00000017b83b840] [c0000000001863c4] tick_sched_timer+0x84/0x180
[   92.520492] [c00000017b83b880] [c00000000012b4d0] __hrtimer_run_queues+0xf0/0x3f0
[   92.520494] [c00000017b83b920] [c00000000012bdfc] hrtimer_interrupt+0xdc/0x310
[   92.520495] [c00000017b83b9e0] [c000000000023060] __timer_interrupt+0x90/0x240
[   92.520497] [c00000017b83ba30] [c0000000000232b0] timer_interrupt+0xa0/0xe0
[   92.520499] [c00000017b83ba60] [c000000000002a14] decrementer_common+0x114/0x180
[   92.520516] --- Exception: 901 at plpar_hcall_norets+0x8c/0xdc
    LR = shared_cede_loop+0xb8/0xd0
[   92.520520] [c00000017b83bd50] [c000000140000000] 0xc000000140000000 (unreliable)
[   92.520522] [c00000017b83bdc0] [c0000000007e489c] cpuidle_idle_call+0x11c/0x400
[   92.520523] [c00000017b83be30] [c0000000000a04a8] pseries_lpar_idle+0x18/0x60
[   92.520525] [c00000017b83be90] [c00000000001bfb8] arch_cpu_idle+0x68/0x160
[   92.520527] [c00000017b83bec0] [c0000000001723c0] cpu_startup_entry+0x170/0x1e0
[   92.520530] [c00000017b83bf20] [c000000000050980] start_secondary+0x310/0x340
[   92.520532] [c00000017b83bf90] [c000000000009a6c] start_secondary_prolog+0x10/0x14
[   92.520533] Task dump for CPU 5:
[   92.520540] swapper/5       R  running task        0     0      1 0x00000800
[   92.520542] Call Trace:
[   92.520544] [c00000017b83fd70] [c00000000167fc98] cpuidle_curr_governor+0x0/0x8
[   92.520545] Task dump for CPU 6:
[   92.520553] swapper/6       R  running task        0     0      1 0x00000800
[   92.520555] Call Trace:
[   92.520557] [c00000017b843ba0] [0000000000000008] 0x8 (unreliable)
[   92.520560] [c00000017b843d70] [c00000000167fc98] cpuidle_curr_governor+0x0/0x8
[   92.520561] Task dump for CPU 7:
[   92.520568] swapper/7       R  running task        0     0      1 0x00000800
[   92.520570] Call Trace:
[   92.520572] [c00000017b847ba0] [c000000001274700] .TOC.+0x0/0x3900 (unreliable)
[   92.520574] [c00000017b847d70] [c00000000167fc98] cpuidle_curr_governor+0x0/0x8

Expected results:
The guest does not hit a call trace.

Additional info:
/usr/libexec/qemu-kvm  \
-nodefaults  \
-vnc :19 \
-monitor stdio \
-rtc base=utc,clock=host \
-netdev tap,id=tap0,vhost=on \
-boot menu=off,strict=off,order=cdn,once=c \
-m 4096 \
-drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/xianwang/rhel75-ppc64le-virtio-scsi.qcow2 \
-vga std \
-machine pseries \
-sandbox off \
-enable-kvm  \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-mon chardev=qmp_id_catch_monitor,mode=control \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1,server,nowait \
-chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor,server,nowait \
-qmp tcp:0:3339,server,nowait \
-device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
-device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 \
-device virtio-net-pci,mac=9a:8a:8b:8c:8d:8e,id=net0,vectors=4,netdev=tap0,bus=pci.0,addr=0x5 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-serial tcp:0:4449,server,nowait \
-smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
-name "vm"
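The CLI above also exposes a QMP server on tcp:0:3339, so the stop + migrate sequence from the reproduction steps could equivalently be issued over that socket. A hedged sketch; the nc invocation is illustrative only and left commented out, since it needs the guest running:

```shell
# QMP equivalents of the HMP stop/migrate sequence in the reproduction steps.
QMP_CAPS='{"execute":"qmp_capabilities"}'
QMP_STOP='{"execute":"stop"}'
QMP_MIGRATE='{"execute":"migrate","arguments":{"uri":"exec:gzip -c > /home/xianwang/STATEFILE.gz"}}'

# With the guest running, the sequence could be piped into the QMP socket:
#   printf '%s\n' "$QMP_CAPS" "$QMP_STOP" "$QMP_MIGRATE" | nc 127.0.0.1 3339
printf '%s\n' "$QMP_CAPS" "$QMP_STOP" "$QMP_MIGRATE"
```

QMP requires the qmp_capabilities handshake before any other command; the migrate command takes the same "exec:" URI string that HMP's migrate accepts.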

Comment 2 xianwang 2018-04-24 11:20:06 UTC
This issue is hit only on the powerpc platform; it works fine on x86_64.

Comment 4 David Gibson 2018-04-26 02:26:07 UTC
My guess would be that this is due to the cpu clock stopping during the offline migration.

Comment 5 Laurent Vivier 2018-05-04 07:09:10 UTC
Perhaps this upstream patch could fix this issue:

target/ppc: only save guest timebase once after stopping
http://patchwork.ozlabs.org/patch/908487/

Comment 8 Paolo Bonzini 2018-05-18 15:54:55 UTC
The outcome of the discussion was:

> Yes, downtimes can sometimes be long.  I still think it's correct to
> keep the clock going in that case.  The guest may give warnings
> because it's seeing something funny with the clock.  Something *is*
> funny with the clock, and those warnings are correct.  Basically, when
> the downtime is that long we can't really maintain the illusion of a
> continuously running VM.  Pretending we can by fudging the clocks is
> not doing our users a service

Does this mean this bug is invalid?

Comment 9 David Gibson 2018-06-04 06:44:00 UTC
> Does this mean this bug is invalid?

That would be my opinion.

Comment 10 Laurent Vivier 2018-06-04 15:43:56 UTC
As stated in comment 8 and comment 7, it's not a bug.

