Bug 1526266 - qemu-kvm process on destination side quit abnormally after live migration
Summary: qemu-kvm process on destination side quit abnormally after live migration
Keywords:
Status: CLOSED DUPLICATE of bug 1523414
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.5
Hardware: ppc64le
OS: Linux
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: 7.5
Assignee: Laurent Vivier
QA Contact: xianwang
URL:
Whiteboard:
Depends On: 1523414
Blocks: 1399177 1476742
 
Reported: 2017-12-15 05:20 UTC by yilzhang
Modified: 2018-02-12 13:34 UTC
CC List: 15 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-01-28 22:47:55 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
IBM Linux Technology Center 162995 0 None None None 2019-05-16 12:52:48 UTC

Description yilzhang 2017-12-15 05:20:24 UTC
Description of problem:
The qemu-kvm process on the destination side quit abnormally after migration, when using an NBD backend image as the system disk.

Version-Release number of selected component (if applicable):
Host kernel:  4.14.0-18.el7a.ppc64le
Guest kernel: 4.14.0-18.el7a.ppc64le
qemu-kvm-rhev: qemu-kvm-rhev-2.10.0-12.el7

How reproducible: 30%


Steps to Reproduce:
1. Start a guest on the src host; the guest uses an NBD backend image as its system disk
/usr/libexec/qemu-kvm \
 -smp 8,sockets=2,cores=4,threads=1 -m 8192 \
-serial unix:/tmp/1-nbd-serial.log,server,nowait \
-nodefaults \
 -rtc base=localtime,clock=host \
 -boot menu=on \
 -monitor stdio \
 -vnc :88 \
 -qmp tcp:0:9991,server,nowait \
\
-object iothread,id=iothread0 \
 -device virtio-scsi-pci,bus=pci.0,id=scsi0,iothread=iothread0 \
-drive file=nbd://10.0.3.25:9002,if=none,cache=none,id=drive_sysdisk,snapshot=off,aio=native,format=qcow2,werror=stop,rerror=stop \
-device scsi-hd,drive=drive_sysdisk,bus=scsi0.0,id=sysdisk,bootindex=0 \
\
 -device virtio-scsi-pci,bus=pci.0,id=scsi1 \
 -netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,vhost=on \
 -device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:c3:e7:81 \

2. Start qemu on des host with "incoming" option
 -incoming tcp:0:7777

3. Migrate guest from src to des
(qemu) migrate   tcp:10.0.1.8:7777

4. After a while, check migration status on source side
(qemu) info status
VM status: paused (postmigrate)
(qemu) info migrate
globals: store-global-state=1, only_migratable=0, send-configuration=1, send-section-footer=1
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off return-path: off pause-before-switchover: off 
Migration status: completed
total time: 15440 milliseconds
downtime: 87 milliseconds
setup: 13 milliseconds
transferred ram: 498703 kbytes
throughput: 266.39 mbps
remaining ram: 0 kbytes
total ram: 8388864 kbytes
duplicate: 1996121 pages
skipped: 0 pages
normal: 120055 pages
normal bytes: 480220 kbytes
dirty sync count: 5
page size: 4 kbytes

5. Check qemu-kvm process on destination side



Actual results:
The qemu-kvm process on the destination side quit abnormally:
[root@c155f2-u7 NBD]# sh des-1503437.sh 
QEMU 2.10.0 monitor - type 'help' for more information
(qemu) info status
VM status: paused (inmigrate)
(qemu) [root@c155f2-u7 NBD]# 


Expected results:
The qemu-kvm process on the destination side should keep working, and the VM should be in running status


Additional info:

Comment 2 David Gibson 2017-12-19 05:13:57 UTC
Yilin,

Does this happen only on Power, or on x86 as well?

Comment 3 yilzhang 2017-12-19 08:40:48 UTC
It seems this bug only happens when the guest's local filesystem is nearly full, so the whole "Reproduce Steps" may be:
1. Start a guest on the src host; the guest uses an NBD backend image as its system disk

2. Log in to the guest, and write a big file to fill up the guest's local filesystem
[root@virt8-Guest3 ~]# dd if=/dev/zero of=/root/test bs=1M count=16000 oflag=sync
dd: error writing ‘test’: No space left on device
13604+0 records in
13603+0 records out
14263844864 bytes (14 GB) copied, 403.345 s, 35.4 MB/s
[root@virt8-Guest3 ~]# df -h /root
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/rhelaa-root   17G   17G  840K 100% /

3. Start qemu on des host with "incoming" option
 -incoming tcp:0:7777

4. Migrate guest from src to des
(qemu) migrate   tcp:10.0.1.8:7777

5. After a while, check the migration status on the source side; the migration completed successfully.
(qemu) info status
VM status: paused (postmigrate)

6. Check qemu-kvm process on destination side


Actual results: Sometimes the qemu-kvm process on the destination side quit abnormally after live migration


The reproduction rate is low, about 30%.
The above test was executed on Power9 hosts; I will try it on x86.

Comment 4 David Gibson 2017-12-20 03:04:53 UTC
Thanks for the extra information.  I'll await information on whether it also occurs on x86.

Comment 5 yilzhang 2017-12-20 03:18:16 UTC
x86 doesn't have this issue. I ran the test more than 30 times and did not hit it.

The src and des hosts have the same kernel version and qemu-kvm version:
Host kernel: 3.10.0-823.el7.x86_64
qemu-kvm: qemu-kvm-rhev-2.10.0-12.el7

Guest kernel: 3.10.0-799.el7.x86_64

Comment 6 David Gibson 2017-12-21 04:35:29 UTC
This looks like it's ppc specific, so assigning to the POWER team.

Comment 7 xianwang 2017-12-27 08:52:20 UTC
I have hit this issue when doing migration from P8 to P9; after migration, qemu on P9 (dst) quit automatically without any related message.
version:
Host:
P8:
3.10.0-823.el7.ppc64le
qemu-kvm-rhev-2.10.0-13.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

P9:
4.14.0-18.el7a.XIVE_fixes.ppc64le
qemu-kvm-rhev-2.10.0-12.el7.BZ1525866.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch
# ppc64_cpu --smt=off
# echo N > /sys/module/kvm_hv/parameters/indep_threads_mode

Guest:
3.10.0-823.el7.ppc64le

P8->P9
p8 qemu cli:
/usr/libexec/qemu-kvm -monitor stdio -machine pseries-rhel7.5.0 -nodefaults  -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=03 -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/RHEL.7.5LE.qcow2 -device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0
p9 qemu cli:
/usr/libexec/qemu-kvm -monitor stdio -machine pseries-rhel7.5.0,max-cpu-compat=power8 -nodefaults -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=03 -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/xianwang/mount_point/RHEL.7.5LE.qcow2 -device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 -incoming tcp:0:5801

result:
src: migration completed and guest is "VM status: paused (postmigrate)"
dst: qemu quit automatically without any message.

Comment 8 xianwang 2017-12-27 09:11:47 UTC
(In reply to xianwang from comment #7)
> I have hit this issue when do migration from P8 to P9, after migration, qemu
> of P9(dst) quit automatically without any related message.
> version:
> Host:
> P8:
> 3.10.0-823.el7.ppc64le
> qemu-kvm-rhev-2.10.0-13.el7.ppc64le
> SLOF-20170724-2.git89f519f.el7.noarch
> 
> P9:
> 4.14.0-18.el7a.XIVE_fixes.ppc64le
> qemu-kvm-rhev-2.10.0-12.el7.BZ1525866.ppc64le
> SLOF-20170724-2.git89f519f.el7.noarch
> # ppc64_cpu --smt=off
> # echo N > /sys/module/kvm_hv/parameters/indep_threads_mode
> 
> Guest:
> 3.10.0-823.el7.ppc64le
> 
> P8->P9
> p8 qemu cli:
> /usr/libexec/qemu-kvm -monitor stdio -machine pseries-rhel7.5.0 -nodefaults 
> -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=03 -drive
> id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,
> file=/home/RHEL.7.5LE.qcow2 -device
> scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-
> id=0,lun=0,bootindex=0
> p9 qemu cli:
> /usr/libexec/qemu-kvm -monitor stdio -machine
> pseries-rhel7.5.0,max-cpu-compat=power8 -nodefaults -device
> virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=03 -drive
> id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,
> file=/home/xianwang/mount_point/RHEL.7.5LE.qcow2 -device
> scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-
> id=0,lun=0,bootindex=0 -incoming tcp:0:5801
> 
> result:
> src: migration complete and gust is "VM status: paused (postmigrate)"
> dst: qemu quit automatically without any message.

The image is shared between src and dst via NFS.

Comment 9 David Gibson 2018-01-09 05:10:00 UTC
Yilin or Xianxian,

I suspect the guest might be crashing, causing qemu to quit with a GUEST_PANICKED notification.

Can you check this by connecting a QMP socket to qemu on both source and dest, and issuing the capabilities command to initialize it, before attempting the migration?
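David's suggestion (attach a QMP socket, issue qmp_capabilities, then watch for events) can be sketched in a few lines of Python. This is an illustrative helper, not part of any qemu tooling; the function names and the file-like reader/writer wrappers (e.g. from socket.makefile()) are assumptions:

```python
import json

def qmp_handshake(reader, writer):
    """Read the QMP greeting, then issue qmp_capabilities to unlock the monitor.

    reader/writer are file-like objects wrapping the QMP socket
    (e.g. from socket.makefile("r") / makefile("w")). Returns the greeting.
    """
    greeting = json.loads(reader.readline())
    assert "QMP" in greeting, "not a QMP socket"
    writer.write(json.dumps({"execute": "qmp_capabilities"}) + "\n")
    writer.flush()
    reply = json.loads(reader.readline())
    assert reply.get("return") == {}, "handshake failed: %r" % reply
    return greeting

def wait_for_event(reader, names):
    """Block until one of the named asynchronous events arrives; return it."""
    for line in reader:
        msg = json.loads(line)
        if msg.get("event") in names:
            return msg
```

With the handshake done, waiting on {"GUEST_PANICKED", "SHUTDOWN"} on the destination side would show whether the guest panics before qemu exits.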

Comment 10 David Gibson 2018-01-09 05:14:09 UTC
Also, if I understand correctly, the situation in comment 7 is different from comment 0 - it is using NFS shared images instead of NBD. Is that right?

This suggests the problem is not actually related to NBD.

Comment 11 xianwang 2018-01-09 09:03:06 UTC
(In reply to David Gibson from comment #10)
> Also, if I understand correctly the situation in comment 7 is different from
> comment 0 - it us using NFS shared images instead of NBD.  Is that right?
> 
> This suggests the problem is not actually related to NBD.

Hi, David,
a) Yes, I think so; this problem is not related to NBD, because I hit it with NFS rather than NBD.

b) I have re-tested this scenario with the latest released version and also hit this bug; some events are reported in QMP.
version:
Host:
P8:
3.10.0-827.el7.ppc64le
qemu-kvm-rhev-2.10.0-16.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

P9:
4.14.0-20.el7a.ppc64le
qemu-kvm-rhev-2.10.0-16.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

qemu cli on p8:
# /usr/libexec/qemu-kvm -monitor stdio -machine pseries-rhel7.5.0 -nodefaults -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=03 -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/rhel75-ppc64le-virtio-scsi.qcow2 -device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 -qmp tcp:0:8881,server,nowait -chardev socket,id=serial_id_serial0,path=/tmp/console0,server,nowait -device spapr-vty,reg=0x30000000,chardev=serial_id_serial0

qemu cli on p9:
# /usr/libexec/qemu-kvm -monitor stdio -machine pseries-rhel7.5.0,max-cpu-compat=power8 -nodefaults -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=03 -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/mount_point/rhel75-ppc64le-virtio-scsi.qcow2 -device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 -incoming tcp:0:5801 -qmp tcp:0:8881,server,nowait -chardev socket,id=serial_id_serial0,path=/tmp/console0,server,nowait -device spapr-vty,reg=0x30000000,chardev=serial_id_serial0

Do migration from p8->p9
(qemu) migrate -d tcp:10.19.19.13:5801

result:
migration completed without error message
P8:
(qemu) info migrate
Migration status: completed
(qemu) info status 
VM status: paused (postmigrate)
QMP:
{"execute":"qmp_capabilities"}
{"timestamp": {"seconds": 1515487838, "microseconds": 547447}, "event": "STOP"}
{"timestamp": {"seconds": 1515488019, "microseconds": 818220}, "event": "SHUTDOWN", "data": {"guest": false}}


P9:
qemu quit automatically.
QMP:
{"execute":"qmp_capabilities"}
{"return": {}}
{"timestamp": {"seconds": 1515487838, "microseconds": 654350}, "event": "RESUME"}
{"timestamp": {"seconds": 1515487844, "microseconds": 727193}, "event": "GUEST_PANICKED", "data": {"action": "pause"}}
{"timestamp": {"seconds": 1515487844, "microseconds": 727278}, "event": "GUEST_PANICKED", "data": {"action": "poweroff"}}
{"timestamp": {"seconds": 1515487844, "microseconds": 727322}, "event": "SHUTDOWN", "data": {"guest": true}}
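QMP traces like the ones above can be triaged mechanically. The sketch below is a hypothetical helper (not part of any qemu tool) that classifies why a VM went down from its event stream; with the destination trace above it would report a guest panic rather than an external kill:

```python
import json

def classify_exit(event_lines):
    """Scan QMP event lines (JSON, one per line) and report why the VM went down.

    Returns "guest-panic" if a GUEST_PANICKED event precedes SHUTDOWN,
    "guest-initiated" for SHUTDOWN with "guest": true and no panic,
    "external" for SHUTDOWN with "guest": false, else "running".
    """
    panicked = False
    for line in event_lines:
        msg = json.loads(line)
        if msg.get("event") == "GUEST_PANICKED":
            panicked = True
        if msg.get("event") == "SHUTDOWN":
            if panicked:
                return "guest-panic"
            return "guest-initiated" if msg.get("data", {}).get("guest") else "external"
    return "running"
```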

Comment 12 xianwang 2018-01-09 09:23:50 UTC
I have re-tried several times with the latest released version and hit this problem 100% when doing migration from p8->p9. It blocks p8->p9 migration, so I think it is serious enough to raise its priority.
Host:
P8:
3.10.0-827.el7.ppc64le
qemu-kvm-rhev-2.10.0-16.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

P9:
4.14.0-20.el7a.ppc64le
qemu-kvm-rhev-2.10.0-16.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch
# ppc64_cpu --smt
SMT is off
# cat /sys/module/kvm_hv/parameters/indep_threads_mode
N

qemu cli on p8:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox off \
-machine pseries-rhel7.5.0 \
-nodefaults \
-vga std \
-chardev socket,id=serial_id_serial0,path=/tmp/console0,server,nowait \
-device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 \
-device nec-usb-xhci,id=usb1,bus=pci.0,addr=06 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=03 \
-drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/rhel75-ppc64le-virtio-scsi.qcow2 \
-device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 \
-device virtio-net-pci,mac=9a:4f:50:51:52:53,id=id9HRc5V,vectors=4,netdev=idjlQN53,bus=pci.0,addr=11 \
-netdev tap,id=idjlQN53,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-m 8G \
-smp 8 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device usb-mouse,id=input1,bus=usb1.0,port=2 \
-device usb-kbd,id=input2,bus=usb1.0,port=3 \
-vnc :1 \
-qmp tcp:0:8881,server,nowait \
-monitor stdio \
-rtc base=utc,clock=host \
-boot order=cdn,once=c,menu=on,strict=on \
-enable-kvm \

qemu cli on p9:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox off \
-machine pseries-rhel7.5.0,max-cpu-compat=power8 \
-nodefaults \
-vga std \
-chardev socket,id=serial_id_serial0,path=/tmp/console0,server,nowait \
-device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 \
-device nec-usb-xhci,id=usb1,bus=pci.0,addr=06 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=03 \
-drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/mount_point/rhel75-ppc64le-virtio-scsi.qcow2 \
-device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 \
-device virtio-net-pci,mac=9a:4f:50:51:52:53,id=id9HRc5V,vectors=4,netdev=idjlQN53,bus=pci.0,addr=11 \
-netdev tap,id=idjlQN53,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-m 8G \
-smp 8 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device usb-mouse,id=input1,bus=usb1.0,port=2 \
-device usb-kbd,id=input2,bus=usb1.0,port=3 \
-vnc :1 \
-incoming tcp:0:5801 \
-qmp tcp:0:8881,server,nowait \
-monitor stdio \
-rtc base=utc,clock=host \
-boot order=cdn,once=c,menu=on,strict=on \
-enable-kvm \

Do migration from p8->p9
(qemu) migrate -d tcp:10.19.19.13:5801

result is same with above(comment 11)

Comment 13 David Gibson 2018-01-10 06:07:35 UTC
Ok, so it's a guest crash triggering the problem.

Can we please get a log of the guest console - its vty device. A screenshot of the VGA after the crash might also be helpful.

If you're unable to get the log / screenshot because qemu is quitting, you should be able to prevent that by adding the -no-shutdown option.

Comment 14 xianwang 2018-01-10 08:46:26 UTC
(In reply to David Gibson from comment #13)
> Ok, so it's a guest crash triggering the problem.
> 
> Can we please get a log of the guest console - it's vty device.  A
> screenshot of the VGA after the crash might also be helpful.
> 
> If you're unable to get the log / screenshot because qemu is quitting, you
> should be able to prevent that by adding the -no-shutdown option.

Hi, David,
I re-tried the same scenario as in comment 11; the console output log is as follows:
console output of p9(des):

# nc -U /tmp/console0 

[  126.874418] Oops: Exception in kernel mode, sig: 4 [#1]
[  126.874622] SMP NR_CPUS=2048 NUMA pSeries
[  126.874814] Modules linked in: ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sg ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common scsi_transport_iscsi virtio_scsi virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod
[  126.876702] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.0-827.el7.ppc64le #1
[  126.876831] task: c0000000011fdb20 ti: c000000001270000 task.ti: c000000001270000
[  126.876960] NIP: c00000000005acec LR: c000000000019e20 CTR: 0000000000000000
[  126.877088] REGS: c000000001273ab0 TRAP: 0700   Not tainted  (3.10.0-827.el7.ppc64le)
[  126.877215] MSR: 8000000100081033 <SF,ME,IR,DR,RI,LE>  CR: 28002048  XER: 00000000
[  126.877535] CFAR: c000000000019e1c SOFTE: 0 
GPR00: 0000000100000000 c000000001273d30 c000000001273f00 c0000000011fe030 
GPR04: 8000000100001033 0000000000000008 000000000000003f 0000000000000000 
GPR08: 0000000000000001 c000000001273ea0 0000000000000000 c00000000fb80004 
GPR12: 00003fffa8130000 c00000000fb80000 0000000000000000 c000000000dce800 
GPR16: c000000000dce800 c000000000a0b4b4 c000000000dce800 c000000000dce800 
GPR20: 0000000000000001 c000000000dce800 0000000000000000 0000000000000004 
GPR24: c000000000dce800 c00000001ed64780 c000000001270000 c0000000012e8190 
GPR28: c0000000011fe030 c0000000037dd640 c0000000011fdb20 c0000000011fdb20 
[  126.879381] NIP [c00000000005acec] tm_save_sprs+0x0/0x1c
[  126.879476] LR [c000000000019e20] __switch_to+0x140/0x470
[  126.879569] Call Trace:
[  126.879671] [c000000001273d30] [c000000001273d90] init_thread_union+0x3d90/0x3f00 (unreliable)
[  126.879892] [c000000001273d90] [c000000000a098a8] __schedule+0x438/0xa30
[  126.880034] [c000000001273e60] [c000000000a0b4b4] schedule_preempt_disabled+0x34/0xb0
[  126.880214] [c000000001273e80] [c000000000171824] cpu_startup_entry+0x134/0x1e0
[  126.880376] [c000000001273ee0] [c00000000000caec] rest_init+0x9c/0xb0
[  126.880552] [c000000001273f00] [c000000000d23e74] start_kernel+0x4cc/0x4e8
[  126.880690] [c000000001273f90] [c000000000009b6c] start_here_common+0x20/0xa8
[  126.880826] Instruction dump:
[  126.880898] 60420000 39200005 7d234b78 4e800020 7c8000a6 38600001 786307c6 7c801839 
[  126.881135] 4082000c 7c841b78 7c800164 4e800020 <7c0022a6> f80304a8 7c0222a6 f80304b0 
[  126.881376] ---[ end trace c6d75c4b75b03c34 ]---
[  126.884522] 
[  128.884638] Kernel panic - not syncing: Fatal exception
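The decisive line in an oops like this is the NIP frame, which names the faulting symbol (here tm_save_sprs, the hint that led to the TM diagnosis below). A small, illustrative helper to pull that symbol out of a captured ppc64 console log; the function is hypothetical, not an existing tool:

```python
import re

def oops_symbol(console_log):
    """Pull the faulting symbol out of a ppc64 kernel oops.

    Looks for the 'NIP [addr] symbol+0xoff/0xlen' line and returns the
    bare symbol name, or None if the log contains no oops.
    """
    m = re.search(r"NIP \[[0-9a-f]+\] (\w+)\+0x", console_log)
    return m.group(1) if m else None
```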

Comment 15 Laurent Vivier 2018-01-10 14:19:40 UTC
(In reply to xianwang from comment #14)
...
> [  126.879381] NIP [c00000000005acec] tm_save_sprs+0x0/0x1c

The problem is related to the TM (Transactional Memory) hardware bug.

Transactional Memory must be disabled on the P8 side to allow migration to a P9 host.

Comment 16 Laurent Vivier 2018-01-10 14:42:54 UTC
The TM problem is tracked by BZ 1523414.

The upstream patch series to fix it is:

spapr: Capabilities infrastructure
       http://patchwork.ozlabs.org/patch/854868/
spapr: Treat Hardware Transactional Memory (HTM) as an optional capability
       http://patchwork.ozlabs.org/patch/854865/
spapr: Validate capabilities on migration
       http://patchwork.ozlabs.org/patch/854864/

Once this series is applied, a guest migrating from a P8 host to a P9 host must be started with:

    ... -machine cap-htm=false ...
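As a sketch of the workaround, a destination command line for a P8-compat guest with HTM capped off might be assembled like this. The helper, paths, and device IDs are placeholders for illustration, not the reporter's exact setup:

```python
def qemu_args(image, incoming_port=None, htm=False):
    """Build a minimal qemu-kvm argv for a P8-compat pseries guest.

    cap-htm=false keeps Transactional Memory state out of the migration
    stream, so a guest started on a P8 host can land on a P9 host.
    """
    machine = ("pseries-rhel7.5.0,max-cpu-compat=power8,cap-htm="
               + ("true" if htm else "false"))
    args = [
        "/usr/libexec/qemu-kvm",
        "-machine", machine,
        "-nodefaults",
        "-monitor", "stdio",
        "-device", "virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=03",
        "-drive", ("id=drive_image1,if=none,snapshot=off,cache=none,"
                   "format=qcow2,file=%s" % image),
        "-device", ("scsi-hd,id=image1,drive=drive_image1,"
                    "bus=virtio_scsi_pci0.0,bootindex=0"),
    ]
    if incoming_port is not None:
        # destination side listens for the incoming migration stream
        args += ["-incoming", "tcp:0:%d" % incoming_port]
    return args
```

Both sides must agree on the capability: the source is launched with the same cap-htm=false machine option, and the destination additionally gets -incoming.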

Comment 21 xianwang 2018-01-12 07:42:02 UTC
Hi, Laurent,
I have tried with "-smp 1"; other information is the same as in comment 19. After migration, the VM is running on the destination host, and after a while the VM hangs, with error messages on the console, HMP, and QMP:

# nc -U /tmp/console0 
Message from syslogd@localhost at Jan 12 01:41:38 ...
 kernel:kvmppc_emulate_mmio: emulation failed (7c6020ce)

(qemu) info status 
VM status: paused (internal-error)

Comment 22 Laurent Vivier 2018-01-12 10:37:33 UTC
Hi Xianxian,

(In reply to xianwang from comment #21)
> Hi, Laurent,
> I have tried with "-smp 1", other information is same with comment19, after
> migration, vm is running on destination host and after a while vm will hang
> with error message prompting on console,hmp and qmp:
> 
> # nc -U /tmp/console0 
> Message from syslogd@localhost at Jan 12 01:41:38 ...
>  kernel:kvmppc_emulate_mmio: emulation failed (7c6020ce)
> 
> (qemu) info status 
> VM status: paused (internal-error)

This is another bug, please open a new BZ to track it.

This one is related to HTM, and I think we can mark it as a duplicate of BZ 1523414.

Comment 23 Yongxue Hong 2018-01-15 02:26:42 UTC
It is reproduced with 4.14.0-23.el7a.ppc64le and qemu-kvm-rhev-2.10.0-16.el7.ppc64le on beaker host(ibm-p9z-09.pnr.lab.eng.bos.redhat.com; ibm-p9b-07.pnr.lab.eng.bos.redhat.com)

Comment 24 Laurent Vivier 2018-01-23 13:10:53 UTC
Could you retest with qemu-kvm-rhev-2.10.0-18.el7 on both sides?

This release should disable HTM by default.

Comment 25 xianwang 2018-01-24 10:42:04 UTC
(In reply to Laurent Vivier from comment #24)
> Could you retest with qemu-kvm-rhev-2.10.0-18.el7 on both sides
> 
> This release should disable HTM by default.

Hi, Laurent, 
I re-tested the scenario with the newest version as in comment 7, and I didn't hit this issue any more.
version:
Host p8:
3.10.0-837.el7.ppc64le
qemu-kvm-rhev-2.10.0-18.el7.ppc64le
SLOF-20170724-5.git89f519f.el8.ppc64le

Host p9:
4.14.0-29.el7a.ppc64le
qemu-kvm-rhev-2.10.0-18.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch
# ppc64_cpu --smt=off
# echo N > /sys/module/kvm_hv/parameters/indep_threads_mode

Guest:
3.10.0-837.el7.ppc64le

the qemu cli is as following:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox off \
-machine pseries-rhel7.5.0,max-cpu-compat=power8 \
-nodefaults \
-vga std \
-chardev socket,id=serial_id_serial0,path=/tmp/console0,server,nowait \
-device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 \
-device nec-usb-xhci,id=usb1,bus=pci.0,addr=06 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=03 \
-drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/xianwang/mount_point/rhel75-ppc64le-virtio-scsi.qcow2 \
-device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 \
-device virtio-net-pci,mac=9a:4f:50:51:52:53,id=id9HRc5V,vectors=4,netdev=idjlQN53,bus=pci.0,addr=11 \
-netdev tap,id=idjlQN53,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-m 8G \
-smp 8 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device usb-mouse,id=input1,bus=usb1.0,port=2 \
-device usb-kbd,id=input2,bus=usb1.0,port=3 \
-vnc :1 \
-incoming tcp:0:5801 \
-qmp tcp:0:8881,server,nowait \
-monitor stdio \
-rtc base=utc,clock=host \
-boot order=cdn,once=c,menu=on,strict=on \
-enable-kvm \

Do migration from p9->p8

result:
migration completed and vm works well.

Comment 26 Laurent Vivier 2018-01-24 12:35:01 UTC
(In reply to xianwang from comment #25)
> (In reply to Laurent Vivier from comment #24)
> > Could you retest with qemu-kvm-rhev-2.10.0-18.el7 on both sides
> > 
> > This release should disable HTM by default.
...
> Do migration from p9->p8

The HTM problem is with p8 -> p9.

Could you re-test?

Thanks.

Comment 27 xianwang 2018-01-25 03:33:28 UTC
(In reply to xianwang from comment #25)
> (In reply to Laurent Vivier from comment #24)
> > Could you retest with qemu-kvm-rhev-2.10.0-18.el7 on both sides
> > 
> > This release should disable HTM by default.
> 
> Hi, Laurent, 
> I re test the scenario with the newest version as #comment7 and I didn't hit
> this issue any more
> version:
> Host p8:
> 3.10.0-837.el7.ppc64le
> qemu-kvm-rhev-2.10.0-18.el7.ppc64le
> SLOF-20170724-5.git89f519f.el8.ppc64le
> 
> Host p9:
> 4.14.0-29.el7a.ppc64le
> qemu-kvm-rhev-2.10.0-18.el7.ppc64le
> SLOF-20170724-2.git89f519f.el7.noarch
> # ppc64_cpu --smt=off
> # echo N > /sys/module/kvm_hv/parameters/indep_threads_mode
> 
> Guest:
> 3.10.0-837.el7.ppc64le
> 
> the qemu cli is as following:
> /usr/libexec/qemu-kvm \
> -name 'avocado-vt-vm1' \
> -sandbox off \
> -machine pseries-rhel7.5.0,max-cpu-compat=power8 \
> -nodefaults \
> -vga std \
> -chardev socket,id=serial_id_serial0,path=/tmp/console0,server,nowait \
> -device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 \
> -device nec-usb-xhci,id=usb1,bus=pci.0,addr=06 \
> -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=03 \
> -drive
> id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,
> file=/home/xianwang/mount_point/rhel75-ppc64le-virtio-scsi.qcow2 \
> -device
> scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-
> id=0,lun=0,bootindex=0 \
> -device
> virtio-net-pci,mac=9a:4f:50:51:52:53,id=id9HRc5V,vectors=4,netdev=idjlQN53,
> bus=pci.0,addr=11 \
> -netdev
> tap,id=idjlQN53,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
> -m 8G \
> -smp 8 \
> -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
> -device usb-mouse,id=input1,bus=usb1.0,port=2 \
> -device usb-kbd,id=input2,bus=usb1.0,port=3 \
> -vnc :1 \
> -incoming tcp:0:5801 \
> -qmp tcp:0:8881,server,nowait \
> -monitor stdio \
> -rtc base=utc,clock=host \
> -boot order=cdn,once=c,menu=on,strict=on \
> -enable-kvm \
> 
> Do migration from p9->p8
> 
Sorry, I made a mistake here; it should be "Do migration from p8->p9"
> result:
> migration completed and vm works well.

Comment 28 xianwang 2018-01-25 03:36:12 UTC
(In reply to Laurent Vivier from comment #26)
> (In reply to xianwang from comment #25)
> > (In reply to Laurent Vivier from comment #24)
> > > Could you retest with qemu-kvm-rhev-2.10.0-18.el7 on both sides
> > > 
> > > This release should disable HTM by default.
> ...
> > Do migration from p9->p8
> 
> The HTM problem is with p8 -> p9.
> 
> Could you re-test?
> 
> Thanks.

Yes, I have tried p8->p9; the result is a pass, just as in comment 27.

Comment 29 David Gibson 2018-01-28 22:47:55 UTC
Excellent, looks like this is handled by the fix for bug 1523414, as expected.

*** This bug has been marked as a duplicate of bug 1523414 ***

Comment 30 yilzhang 2018-01-29 09:33:24 UTC
I'm unable to reproduce this bug using the test scenario in comment 0; I ran that case twenty times and did not hit this bug.

Host kernel: 4.14.0-29.el7a.ppc64le
qemu-kvm:    qemu-kvm-rhev-2.10.0-18.el7

Comment 31 Minjia Cai 2018-02-05 04:10:52 UTC
I hit this error with the following versions on a Power9 host, with the guest in Power8-compat mode. It happened during guest installation with ping-pong migration:

Host:
kernel-4.14.0-33.el7a.ppc64le
qemu-kvm-rhev-2.10.0-18.el7.ppc64le

Guest:
RHEL-7.5-20180125.0-Server-ppc64le-dvd1.iso 
kernel-3.10.0-837.el7.ppc64le


QEMU command line:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox off \
-machine pseries-rhel7.5.0,max-cpu-compat=power8 \
-nodefaults \
-vga std \
-device spapr-pci-host-bridge,index=1   \
-device virtio-scsi-pci,bus=pci.1,id=scsi0,addr=0x3 \
-device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x05 \
-device pci-ohci,id=usb3,bus=pci.0,addr=0x06 \
-device pci-bridge,id=pci_bridge_1,bus=pci.0,addr=0xc,chassis_nr=1 \
-device pci-bridge,id=pci_bridge_2,bus=pci.0,addr=0xd,chassis_nr=2 \
-object iothread,id=iothread0 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci_bridge_1,iothread=iothread0,addr=0x07 \
-device virtio-scsi-pci,id=virtio_scsi_pci1,bus=pci.0,addr=0x08 \
-drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/micai/p9.qcow2 \
-device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 \
-drive id=drive_cd1,if=none,snapshot=off,aio=threads,cache=none,media=cdrom,file=/home/micai/RHEL-7.5-20180125.0-Server-ppc64le-dvd1.iso  \
-device scsi-cd,id=cd1,drive=drive_cd1,bus=virtio_scsi_pci1.0,channel=0,scsi-id=0,lun=0,bootindex=1 \
-device virtio-net-pci,mac=9a:4f:50:51:52:53,id=id9HRc5V,netdev=idjlQN53,vectors=10,mq=on,status=on,bus=pci.0,addr=0xa \
-netdev tap,id=idjlQN53,vhost=on,queues=4,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-m 4G,slots=4,maxmem=32G \
-incoming tcp:0:5800 \
-smp 8,cores=4,threads=1,sockets=2 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device usb-mouse,id=input1,bus=usb1.0,port=2 \
-device usb-kbd,id=input2,bus=usb1.0,port=3 \
-vnc :2 \
-qmp tcp:0:8881,server,nowait \
-monitor stdio \
-rtc base=utc,clock=host \
-boot order=cdn,once=c,menu=on,strict=on \
-enable-kvm \
-watchdog i6300esb \
-watchdog-action reset \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xb \

steps:
 1. # ppc64_cpu --smt=off
    # echo N > /sys/module/kvm_hv/parameters/indep_threads_mode
 2. Start the guest at src and enter the installation interface.
 3. Start the guest at des.
 4. During the installation, perform the ping-pong migration.

the result:
[root@c155f3-u23 ~]# telnet 127.0.0.1 8881
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": "(qemu-kvm-rhev-2.10.0-18.el7)"}, "capabilities": []}}
{"execute":"qmp_capabilities"}
{"return": {}}
{"timestamp": {"seconds": 1517731476, "microseconds": 150918}, "event": "VNC_CONNECTED", "data": {"server": {"auth": "none", "family": "ipv4", "service": "5902", "host": "0.0.0.0", "websocket": false}, "client": {"family": "ipv4", "service": "35566", "host": "10.0.0.1", "websocket": false}}}
{"timestamp": {"seconds": 1517731477, "microseconds": 289988}, "event": "VNC_INITIALIZED", "data": {"server": {"auth": "none", "family": "ipv4", "service": "5902", "host": "0.0.0.0", "websocket": false}, "client": {"family": "ipv4", "service": "35566", "host": "10.0.0.1", "websocket": false}}}
{"timestamp": {"seconds": 1517731694, "microseconds": 757303}, "event": "RTC_CHANGE", "data": {"offset": -1}}
{"timestamp": {"seconds": 1517731701, "microseconds": 718556}, "event": "STOP"}
{"timestamp": {"seconds": 1517731733, "microseconds": 798718}, "event": "SHUTDOWN", "data": {"guest": false}}
Connection closed by foreign host.


[root@localhost micai]# sh xx.sh
QEMU 2.10.0 monitor - type 'help' for more information
[root@localhost micai]#

Comment 32 Laurent Vivier 2018-02-12 13:34:17 UTC
Ping-pong migration between P9 and P8 hosts with a P8 guest cannot be performed because of a bug in POWER9 hardware earlier than DD2.2.

See BZ 1536009 comment 20

