Bug 1529243

Summary:	Migration from P9 to P8, migration failed and qemu quit on dst end with "error while loading state for instance 0x0 of device 'ics'"
Product:	Red Hat Enterprise Linux 7	Reporter:	xianwang <xianwang>
Component:	qemu-kvm-rhev	Assignee:	David Gibson <dgibson>
Status:	CLOSED ERRATA	QA Contact:	xianwang <xianwang>
Severity:	high	Docs Contact:
Priority:	high
Version:	7.5	CC:	abologna, ailan, bugproxy, dgibson, dgilbert, dzheng, hannsj_uhl, joserz, lmiksik, lvivier, michen, mrezanin, qzhang, sbobroff, virt-maint, xianwang
Target Milestone:	rc	Keywords:	Patch
Target Release:	7.5
Hardware:	ppc64le
OS:	Linux
Whiteboard:
Fixed In Version:	qemu-kvm-rhev-2.10.0-18.el7	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-04-11 00:55:27 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1524884
Bug Blocks:	1399177, 1476742, 1478484, 1525303, 1527213

Description xianwang 2017-12-27 09:18:35 UTC

Description of problem:
Migration from P9 to P8 fails: qemu quits on the dst end with "error while loading state for instance 0x0 of device 'ics'", while the src host reports "Migration status: completed" and "VM status: paused (postmigrate)".

Version-Release number of selected component (if applicable):
Host:
P8:
3.10.0-823.el7.ppc64le
qemu-kvm-rhev-2.10.0-13.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

P9:
4.14.0-18.el7a.XIVE_fixes.ppc64le
qemu-kvm-rhev-2.10.0-12.el7.BZ1525866.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch
# ppc64_cpu --smt=off
# echo N > /sys/module/kvm_hv/parameters/indep_threads_mode

Guest:
3.10.0-823.el7.ppc64le

How reproducible:
3/3

Steps to Reproduce:
1. Boot a guest on P9:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox off \
-machine pseries-rhel7.5.0,max-cpu-compat=power8 \
-nodefaults \
-vga std \
-chardev socket,id=serial_id_serial0,path=/tmp/console0,server,nowait \
-device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 \
-device nec-usb-xhci,id=usb1,bus=pci.0,addr=06 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=03 \
-drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/xianwang/mount_point/RHEL.7.5LE.qcow2 \
-device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 \
-device virtio-net-pci,mac=9a:4f:50:51:52:53,id=id9HRc5V,vectors=4,netdev=idjlQN53,bus=pci.0,addr=11 \
-netdev tap,id=idjlQN53,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-m 8G \
-smp 8 \
-drive id=drive_cd1,if=none,snapshot=off,aio=threads,cache=none,media=cdrom,file=/home/xianwang/mount_point/RHEL-7.5-20171215.0-Server-ppc64le-dvd1.iso \
-device scsi-cd,id=cd1,drive=drive_cd1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=1,bootindex=1 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device usb-mouse,id=input1,bus=usb1.0,port=2 \
-device usb-kbd,id=input2,bus=usb1.0,port=3 \
-vnc :1 \
-qmp tcp:0:8881,server,nowait \
-monitor stdio \
-rtc base=utc,clock=host \
-boot order=cdn,once=c,menu=on,strict=on \
-enable-kvm

2. Launch qemu in listening (incoming) mode on P8:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox off \
-machine pseries-rhel7.5.0 \
-nodefaults \
-vga std \
-chardev socket,id=serial_id_serial0,path=/tmp/console0,server,nowait \
-device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 \
-device nec-usb-xhci,id=usb1,bus=pci.0,addr=06 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=03 \
-drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/RHEL.7.5LE.qcow2 \
-device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 \
-device virtio-net-pci,mac=9a:4f:50:51:52:53,id=id9HRc5V,vectors=4,netdev=idjlQN53,bus=pci.0,addr=11 \
-netdev tap,id=idjlQN53,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-m 8G \
-smp 8 \
-drive id=drive_cd1,if=none,snapshot=off,aio=threads,cache=none,media=cdrom,file=/home/RHEL-7.5-20171215.0-Server-ppc64le-dvd1.iso \
-device scsi-cd,id=cd1,drive=drive_cd1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=1,bootindex=1 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device usb-mouse,id=input1,bus=usb1.0,port=2 \
-device usb-kbd,id=input2,bus=usb1.0,port=3 \
-vnc :1 \
-incoming tcp:0:5801 \
-qmp tcp:0:8881,server,nowait \
-monitor stdio \
-rtc base=utc,clock=host \
-boot order=cdn,once=c,menu=on,strict=on \
-enable-kvm

3. On P9, start the migration:
(qemu) migrate -d tcp:10.16.69.75:5801

Actual results:
src(p9):
(qemu) info status 
VM status: paused (postmigrate)
(qemu) info migrate
globals: store-global-state=1, only_migratable=0, send-configuration=1, send-section-footer=1
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off return-path: off pause-before-switchover: off 
Migration status: completed

dst(p8):
(qemu) qemu-kvm: Unable to restore KVM interrupt controller state for IRQs 4097: Invalid argument
qemu-kvm: error while loading state for instance 0x0 of device 'ics'
qemu-kvm: load of migration failed: Operation not permitted

Expected results:
Migration succeeds and the VM works well on the dst host.

Additional info:

Comment 4 David Gibson 2018-01-02 23:26:25 UTC

Right, I think this is almost certainly a dupe of one of the other migration bugs.

I'll let Laurent sort it out since he's more familiar with them.

Comment 5 David Gibson 2018-01-04 01:03:51 UTC

Looks like a dupe of bug 1524884.

Xianxian, can you retry with the fix from that bug?

Comment 6 Qunfang Zhang 2018-01-04 03:06:27 UTC

(In reply to David Gibson from comment #5)
> Looks like a dupe of bug 1524884.
> 
> Xianxian, can you retry with the fix from that bug?

Hi, David

Xianxian is on PTO this week. We discussed this bug last week, and she should already have used the build with the fix for 1524884 (4.14.0-18.el7a.XIVE_fixes.ppc64le). Please check comment 0:

Version-Release number of selected component (if applicable):
....

P9:
4.14.0-18.el7a.XIVE_fixes.ppc64le
qemu-kvm-rhev-2.10.0-12.el7.BZ1525866.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

Comment 7 xianwang 2018-01-08 10:04:19 UTC

I have retested this scenario with the latest version and can still hit this issue, although bug 1525866 has already been verified.
version:
Host:
P9:
4.14.0-20.el7a.ppc64le
qemu-kvm-rhev-2.10.0-16.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch
# ppc64_cpu --smt
SMT is off
# cat /sys/module/kvm_hv/parameters/indep_threads_mode
N

P8:
3.10.0-827.el7.ppc64le
qemu-kvm-rhev-2.10.0-16.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

The steps are the same as in the bug report.
result:
Migration failed and the VM is still running on the src host.
p9(src)
(qemu) info migrate
globals: store-global-state=1, only_migratable=0, send-configuration=1, send-section-footer=1
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off return-path: off pause-before-switchover: off 
Migration status: failed
(qemu) info status 
VM status: running

p8:(dst)
(qemu) qemu-kvm: Unable to restore KVM interrupt controller state for IRQs 4097: Invalid argument
qemu-kvm: error while loading state for instance 0x0 of device 'ics'
qemu-kvm: load of migration failed: Operation not permitted

Comment 8 xianwang 2018-01-08 10:05:39 UTC

(In reply to xianwang from comment #7)

Additional:
guest:
3.10.0-827.el7.ppc64le

Comment 9 Laurent Vivier 2018-01-08 10:19:58 UTC

(In reply to David Gibson from comment #5)
> Looks like a dupe of bug 1524884.

BZ1524884 fixes bug with this error message:

  Unable to restore KVM interrupt controller state (0xff000000) for CPU 0

Here we have:

  Unable to restore KVM interrupt controller state for IRQs 4097

The first message is in icp_set_kvm_state() when a call to kvm_vcpu_ioctl(... KVM_SET_ONE_REG ...) fails, and this is the call that the kernel patch (now in kernel-alt-4.14.0-20.el7a) fixes.

The second message is in ics_set_kvm_state() when a call to ioctl(... KVM_SET_DEVICE_ATTR ...) fails. It seems we need another kernel fix.

Comment 10 Laurent Vivier 2018-01-08 15:22:06 UTC

I've no P9 host available for now, but I'm sure I have tested this in the past with no problem.

So the question is: what is the state of the source guest when you start the migration?

Is it:
- stopped before SLOF starts,
- executing SLOF or GRUB bootloader,
- booting kernel
- or running a fully booted OS?

Comment 11 xianwang 2018-01-09 03:14:18 UTC

(In reply to Laurent Vivier from comment #10)
> I've no P9 host available for now, but I'm sure I have tested this in the
> past with no problem.
> 
> So the question is: what is the state of the source guest when you start the
> migration?
> 
> Is it:
> - stopped before SLOF starts,
No, the guest is not stopped before SLOF starts; it boots up successfully and runs well on the src host before migration.
> - executing SLOF or GRUB bootloader,
SLOF and GRUB executed normally.
> - booting kernel
The kernel booted normally.
> - or running a fully booted OS?
Yes, it is running a fully booted OS. The following is the console output before migration; this time, the result is the same as in the bug report:

# nc -U /tmp/console0 


Red Hat Enterprise Linux Server 7.5 Beta (Maipo)
Kernel 3.10.0-827.el7.ppc64le on an ppc64le

dhcp19-207 login: 

SLOF **********************************************************************
QEMU Starting
 Build Date = Oct  9 2017 02:32:33
 FW Version = mockbuild@ release 20170724
 Press "s" to enter Open Firmware.

Press F12 for boot menu.

Populating /vdevice methods
Populating /vdevice/vty@30000000
Populating /vdevice/nvram@71000000
Populating /pci@800000020000000
                     00 0000 (D) : 1234 1111    qemu vga
                     00 1800 (D) : 1af4 1004    virtio [ scsi ]
Populating /pci@800000020000000/scsi@3
       SCSI: Looking for devices
          100000000000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
                     00 3000 (D) : 1033 0194    serial bus [ usb-xhci ]
                     00 8800 (D) : 1af4 1000    virtio [ net ]
Installing QEMU fb



Scanning USB 
  XHCI: Initializing
    USB mouse 
    USB Keyboard 
No console specified using screen & keyboard
     
  Welcome to Open Firmware

  Copyright (c) 2004, 2017 IBM Corporation All rights reserved.
  This program and the accompanying materials are made available
  under the terms of the BSD License available at
  http://www.opensource.org/licenses/bsd-license.php


Trying to load:  from: /pci@800000020000000/scsi@3/disk@100000000000000 ...   Successfully loaded
CF000012
CF000015ch
Linux ppc64le
#1 SMP Fri Jan 5
Red Hat Enterprise Linux Server 7.5 Beta (Maipo)
Kernel 3.10.0-827.el7.ppc64le on an ppc64le

dhcp19-207 login:

Comment 12 Laurent Vivier 2018-01-09 09:35:47 UTC

I've dumped the migration stream on a P9 host (I have no P8 host in the same network to really do the migration) with the analyze-migration.py script.

We have an array of 1024 IRQs and, according to the error message, we can guess that the ICS offset is 4096. So the failing IRQ (4097) should be the second entry in the array:

    "ics (4)": {
        "nr_irqs": "0x00000400",
        "irqs": [
            {
                "server": "0x00000000",
                "priority": "0x05",
                "saved_priority": "0x05",
                "status": "0x00",
                "flags": "0x02"
            },
            {
                "server": "0x00000002",
                "priority": "0x05",
                "saved_priority": "0x05",
                "status": "0x00",
                "flags": "0x02"
            },
...

ics_set_kvm_state() (comment 9) calls the kernel function kvm_device_ioctl() and then xics_set_attr() and xics_set_source(). The only possible exit reason in our case in this function is:
...
       if (prio != MASKED &&
            kvmppc_xics_find_server(xics->kvm, server) == NULL)
                return -EINVAL;
...

and kvmppc_xics_find_server() fails if it doesn't find any CPU with the given server number (in our case 2).
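
The reasoning in this comment can be replayed as a small sketch. This is illustrative Python, not QEMU or kernel code: the ICS offset of 4096 is the guess made above, the two array entries are copied from the dump, and the helper name `set_source`, the value `MASKED = 0xFF`, and the set of destination server numbers are assumptions made only for the example.

```python
import errno

ICS_OFFSET = 4096  # guessed from the error message: IRQ 4097 is array entry 1
MASKED = 0xFF      # assumed value of the kernel's MASKED priority

# First two entries of the dumped "ics (4)" section.
dumped_irqs = [
    {"server": 0x0, "priority": 0x05},
    {"server": 0x2, "priority": 0x05},
]


def set_source(irq, irqs, known_servers):
    """Mimic the quoted check: return 0 on success, -EINVAL otherwise."""
    entry = irqs[irq - ICS_OFFSET]
    if entry["priority"] != MASKED and entry["server"] not in known_servers:
        return -errno.EINVAL
    return 0


# Hypothetical destination that has no vCPU with server number 2.
dest_servers = {0x00, 0x08}

print(set_source(4096, dumped_irqs, dest_servers))  # server 0 exists
print(set_source(4097, dumped_irqs, dest_servers))  # server 2 is missing
```

With these inputs only the second entry (IRQ 4097, server 2) is rejected, which matches the "Invalid argument" reported on the destination.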

Comment 13 Laurent Vivier 2018-01-09 14:17:47 UTC

I think I didn't see the problem before because I only tested with "-smp 1".

I compared the server ids between P8 (real XICS) and P9 (XICS emulation)

P8: 0x00, 0x08, 0x10, 0x18, 0x20, 0x28, 0x30, 0x38
P9: 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07

Comment 14 Laurent Vivier 2018-01-09 15:08:05 UTC

The server numbers are set by QEMU, in the function icp_kvm_realize() using kvm_vcpu_enable_cap( KVM_CAP_IRQ_XICS ) and the vcpu_id as the server id.

On P9 vcpu ids are 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
on P8 vcpu ids are 0x00, 0x08, 0x10, 0x18, 0x20, 0x28, 0x30, 0x38.

The vcpu ids are computed by spapr_cpu_core_realize() (in origin/master):
...
    for (i = 0; i < cc->nr_threads; i++) {
...
        cpu->vcpu_id = (cc->core_id * spapr->vsmt / smp_threads) + i;
...

On P8, spapr->vsmt is 8 whereas on P9 it is 1.

So the default value for vsmt on P9 is not the same as for P8.
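
As a rough check of the formula above, here is a minimal Python sketch (not QEMU code). It assumes that `core_id` advances by `smp_threads` for each core and that "-smp 8" yields eight single-threaded cores; the helper name `vcpu_ids` is made up for the example.

```python
def vcpu_ids(cores, smp_threads, vsmt):
    """Apply vcpu_id = (core_id * vsmt / smp_threads) + i to every thread."""
    ids = []
    for core in range(cores):
        # Assumption: core_id is counted in threads (0, smp_threads, ...).
        core_id = core * smp_threads
        for i in range(smp_threads):
            ids.append(core_id * vsmt // smp_threads + i)
    return ids


p8 = vcpu_ids(cores=8, smp_threads=1, vsmt=8)  # P8 default: vsmt = 8
p9 = vcpu_ids(cores=8, smp_threads=1, vsmt=1)  # P9 default: vsmt = 1

print("P8:", [hex(v) for v in p8])
print("P9:", [hex(v) for v in p9])
```

Under these assumptions the P8 ids are spaced by 8 and the P9 ids are consecutive, as listed above.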

Comment 15 Laurent Vivier 2018-01-09 15:10:00 UTC

Xianxian,

could you try to start your guest on P9 host with the following parameter and retry a migration:

  ... -machine pseries-rhel7.5.0,max-cpu-compat=power8,vsmt=8 ...

Thanks

Comment 16 Laurent Vivier 2018-01-09 16:16:11 UTC

The default value for vsmt comes from kvm:

    kvm_vm_check_extension(s, KVM_CAP_PPC_SMT)

and differs because of:

arch/powerpc/kvm/powerpc.c

    491 int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
...
    567                         if (cpu_has_feature(CPU_FTR_ARCH_300))
    568                                 r = 1;
    569                         else
    570                                 r = threads_per_subcore;
...

This restriction comes from:

commit 45c940ba490df28cb87b993981a5f63df6bbb8db
Author: Paul Mackerras <paulus>
Date:   Fri Nov 18 17:43:30 2016 +1100

    KVM: PPC: Book3S HV: Treat POWER9 CPU threads as independent subcores
    
    With POWER9, each CPU thread has its own MMU context and can be
    in the host or a guest independently of the other threads; there is
    still however a restriction that all threads must use the same type
    of address translation, either radix tree or hashed page table (HPT).
    
    Since we only support HPT guests on a HPT host at this point, we
    can treat the threads as being independent, and avoid all of the
    work of coordinating the CPU threads.  To make this simpler, we
    introduce a new threads_per_vcore() function that returns 1 on
    POWER9 and threads_per_subcore on POWER7/8, and use that instead
    of threads_per_subcore or threads_per_core in various places.
    
    This also changes the value of the KVM_CAP_PPC_SMT capability on
    POWER9 systems from 4 to 1, so that userspace will not try to
    create VMs with multiple vcpus per vcore.  (If userspace did create
    a VM that thought it was in an SMT mode, the VM might try to use
    the msgsndp instruction, which will not work as expected.  In
    future it may be possible to trap and emulate msgsndp in order to
    allow VMs to think they are in an SMT mode, if only for the purpose
    of allowing migration from POWER8 systems.)
    
    With all this, we can now run guests on POWER9 as long as the host
    is running with HPT translation.  Since userspace currently has no
    way to request radix tree translation for the guest, the guest has
    no choice but to use HPT translation.

Comment 17 Laurent Vivier 2018-01-09 16:32:59 UTC

Since VSMT relies on the real hardware, I think the only way to be sure it is the same on both guests (src and dst) is to provide it on the command line when the hardware is not the same on both hosts (P8 <-> P9).

Andrea, does libvirt provide the vsmt parameter to QEMU (see comment 15)?

Comment 18 xianwang 2018-01-10 08:23:27 UTC

(In reply to Laurent Vivier from comment #15)
> Xianxian,
> 
> could you try to start your guest on P9 host with the following parameter
> and retry a migration:
> 
>   ... -machine pseries-rhel7.5.0,max-cpu-compat=power8,vsmt=8 ...
> 
> Thanks

I have tried with "-machine pseries-rhel7.5.0,max-cpu-compat=power8,vsmt=8". After the migration completed, the VM rebooted automatically and a vmcore file was generated.

The versions and steps are the same as in comment 8.

after migration from p9->p8:
p9:
(qemu) info status 
VM status: paused (postmigrate)
(qemu) info migrate
Migration status: completed

p8:
(qemu) info status 
VM status: running
The guest rebooted automatically and a core file was generated, as below:
[  202.901074] BUG: Bad rss-counter state mm:c0000001e6f5a500 idx:0 val:109
[  202.901086] BUG: Bad rss-counter state mm:c0000001e6f5a500 idx:1 val:74
[  202.905862] Oops: Exception in kernel mode, sig: 4 [#1]
[  202.905866] SMP NR_CPUS=2048 NUMA pSeries
[  202.905887] Modules linked in: fuse ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sg ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common scsi_transport_iscsi virtio_net virtio_scsi bochs_drm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm virtio_pci virtio_ring i2c_core virtio dm_mirror dm_region_hash dm_log dm_mod
[  202.906027] Unable to handle kernel paging request for data at address 0xeba1ffe8e8010010
[  202.906031] Faulting instruction address: 0xc000000000519d94
[  202.906034] CPU: 7 PID: 2819 Comm: gsd-sharing Not tainted 3.10.0-827.el7.ppc64le #1
[  202.906037] task: c0000001eb4ec060 ti: c0000000050e4000 task.ti: c0000000050e4000
[  202.906038] NIP: c0000000001251f8 LR: c0000000001251f0 CTR: c000000000359c40
[  202.906040] REGS: c0000000050e7690 TRAP: 0700   Not tainted  (3.10.0-827.el7.ppc64le)
[  202.906041] MSR: 8000000000089033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28002844  XER: 00000000
[  202.906058] CFAR: c000000000a0d25c SOFTE: 1
GPR00: c0000000001251f0 c0000000050e7910 c000000001273f00 0000000000000001
GPR04: 0000000000000001 c0000001eb199110 c0000000012ca5a8 0000000000000000
GPR08: c000000000359e90 0000000000000000 0000000000000000 c000000000a30e40
GPR12: 0000000000002200 c000000003383f00 c0000000050e7c2c 0000000000000001
GPR16: c0000001f094ad00 0000000000000000 0000000000000000 0000000225c17d03
GPR20: 0000000000000000 0000000000000000 0000000000000000 00000100308987e0
GPR24: 0000000000000020 0000000000000000 c0000000050e7c20 0000000000000000
GPR28: 0000000000000000 0000000000000001 c0000000050e79f0 c0000001eb199108
[  202.906107] NIP [c0000000001251f8] add_wait_queue+0x68/0x80
[  202.906109] LR [c0000000001251f0] add_wait_queue+0x60/0x80
[  202.906110] Call Trace:
[  202.906118] [c0000000050e7910] [c000000000359cec] __pollwait+0xac/0x150 (unreliable)
[  202.906121] [c0000000050e7960] [c0000000003bcb28] eventfd_poll+0x48/0xb0
[  202.906123] [c0000000050e7990] [c00000000035b7b0] do_sys_poll+0x330/0x720
[  202.906125] [c0000000050e7dd0] [c00000000035bcc0] SyS_poll+0xa0/0x1d0
[  202.906127] [c0000000050e7e30] [c00000000000a184] system_call+0x38/0xb4
[  202.906128] Instruction dump:
[  202.906129] 60000000 e8bf0008 389f0008 7c7d1b78 387e0018 4841ad4d 60000000 7fe3fb78
[  202.906131] 7fa4eb78 488e802d 60000000 38210040 <f278f8d0> c0000001 ebc1fff0 ebe1fff8
[  202.906136] ---[ end trace b129aad70814cd7a ]---

Comment 19 Andrea Bolognani 2018-01-10 10:13:32 UTC

(In reply to Laurent Vivier from comment #17)
> I think the only way to be sure VSMT is the same on both guest (src and dst)
> as it relies on the real hardware is to provide it on the command line if
> the hardware is not the same on both host (P8 <-> P9).
> 
> Andrea, does libvirt provide the vsmt paramater to QEMU (see comment 15)?

It doesn't. I was not even aware of its existence up until now.

What does the parameter do? What values should be used?

Comment 20 Laurent Vivier 2018-01-10 11:40:05 UTC

VSMT stands for Virtual Simultaneous Multi-Threading.

POWER9 starts by default with 1, POWER8 with the value of the host (2, 4 or 8). To be able to migrate between hosts we must have the same value on both.

A patch has been added to set it on POWER9 (since 2.11):

commit fa98fbfcdfcb980b4a690b8bc93ab597935087b1
Author: Sam Bobroff <sam.bobroff.com>
Date:   Fri Aug 18 15:50:22 2017 +1000

    PPC: KVM: Support machine option to set VSMT mode
    
    KVM now allows writing to KVM_CAP_PPC_SMT which has previously been
    read only. Doing so causes KVM to act, for that VM, as if the host's
    SMT mode was the given value. This is particularly important on Power
    9 systems because their default value is 1, but they are able to
    support values up to 8.
    
    This patch introduces a way to control this capability via a new
    machine property called VSMT ("Virtual SMT"). If the value is not set
    on the command line a default is chosen that is, when possible,
    compatible with legacy systems.
    
    Note that the intialization of KVM_CAP_PPC_SMT has changed slightly
    because it has changed (in KVM) from a global capability to a
    VM-specific one. This won't cause a problem on older KVMs because VM
    capabilities fall back to global ones.

Sam, could you answer Andrea's question: what values should be used?

Comment 21 Jose Ricardo Ziviani 2018-01-11 01:56:04 UTC

Hello Laurent!

My patch "Check SMT based on KVM_CAP_PPC_SMT_POSSIBLE" is under review on qemu-ppc and might be related to this issue.

Actually KVM supports KVM_CAP_PPC_SMT_POSSIBLE, which returns the SMT modes supported by the underlying hardware. So, I think that we should get the value from that capability. On Power9, due to VSMT, it returns 15 because it supports all existing modes (1 | 2 | 4 | 8).
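
To make the bitmask reading concrete, here is a small Python sketch. It only decodes a mask in the way this comment describes (one bit per supported SMT mode); the helper names `smt_modes` and `vsmt_supported` are hypothetical and are not part of QEMU or KVM.

```python
def smt_modes(mask):
    """Return the SMT modes encoded as set bits in the mask, smallest first."""
    return [1 << bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


def vsmt_supported(mask, vsmt):
    """True if vsmt is a single power-of-two mode present in the mask."""
    is_power_of_two = vsmt > 0 and (vsmt & (vsmt - 1)) == 0
    return is_power_of_two and bool(mask & vsmt)


print(smt_modes(15))          # [1, 2, 4, 8]
print(vsmt_supported(15, 8))  # True
print(vsmt_supported(15, 3))  # False: 3 is not a single mode
```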

Please take a look at that review; your input would be very helpful.

Thank you,

Jose Ricardo Ziviani

Comment 22 David Gibson 2018-01-11 04:34:46 UTC

Andrea,

To elaborate, vsmt is basically the spacing between the vcpu IDs of the first thread for each core.  Obviously it needs to be >= # of virtual threads per virtual core.  Because of some poor decisions I made ~6 years ago for POWER8 - at least with older host kernels - it has to be larger than the minimum if the guest has less threads per core than the host.

The vcpu ids are exposed to guests in several places (parameters to a number of hypercalls).  Well, technically it's the DT ids exposed to the guest, but those are currently equal to the vcpu ids and because of hypercalls handled in the host kernel without qemu intervention, they can't be made different without an awful lot of work.

That means we need to preserve the vcpu ids - and therefore their spacing.

The spacing used to depend on the host capabilities, so we were completely stuffed - vsmt was introduced as a parameter to address that.

The defaults for VSMT were designed to try to make as many existing cases keep working without intervention as we could manage.  I need to have a look and see if we can fix them somehow to mean libvirt won't need extra knowledge (at least in supported cases) or if we're going to have to teach libvirt about it.

Comment 23 Laurent Vivier 2018-01-11 09:52:31 UTC

(In reply to David Gibson from comment #22)
...
> The defaults for VSMT were designed to try to make as many existing cases
> keep working without intervention as we could manage.  I need to have a look
> and see if we can fix them somehow to mean libvirt won't need extra
> knowledge (at least in supported cases) or if we're going to have to teach
> libvirt about it.

For this BZ, what we need is to ensure that we have the same value on the P9 host and on the P8 host we want to migrate to. So I don't think libvirt (or qemu) can guess it (can it?).

So as we ask to use "max-cpu-compat=power8,caps-smt=off", we should also ask to use "vsmt" set to a value compatible with the destination POWER8 host.

For this I think we need two things:

- add information in the documentation
  (how to know the good value? on P8, "ppc64_cpu --threads-per-core"?)

- allow the user to provide this information with libvirt

Comment 24 Laurent Vivier 2018-01-11 11:17:20 UTC

(In reply to Jose Ricardo Ziviani from comment #21)
> Hello Laurent!
> 
> My patch "Check SMT based on KVM_CAP_PPC_SMT_POSSIBLE" is under review on
> qemu-ppc and might have some related with this issue.
> 
> Actually KVM supports KVM_CAP_PPC_SMT_POSSIBLE which returns the SMT mode
> supported by the underlying hardware. So, I think that we should get the
> value from that capability. In Power9, due to VMST, that returns 15 because
> it supports all existing modes (1 | 2 | 4 | 8).
> 
> Please, take a look on that review, your inputs will be very helpful for
> sure.

I've tested your patch in my case.

Used with "-smp threads=XX" on both sides, I think it can solve the problem.

In the test I have done (threads=8), it works: vsmt is set to 8, smp_threads is set to 8, so the "space" between vcpu_ids is 1 (because the formula in spapr_cpu_core_realize() is: cpu->vcpu_id = (cc->core_id * spapr->vsmt / smp_threads) + i).

But another example doesn't work,

a P9 with your patch and threads=4
a P8 with threads=4
with "-smp 8,sockets=1,cores=2,threads=4" I have the following vcpu_ids:

- on P9: 0x00, 0x01, 0x02, 0x03, ->  0x04, 0x05, 0x06, 0x07
- on P8: 0x00, 0x01, 0x02, 0x03, ->  0x08, 0x09, 0x0a, 0x0b

that is not good for the migration.

This is because VSMT is set by default to 8 on the P8, and to 4 (smp_threads) on the P9. It should in fact be set to the max (supposing the max is the same on the P8 host).
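
The mismatch in this second example can be reproduced with a short Python sketch of the same formula. This is not QEMU code: it assumes `core_id` advances by `smp_threads` per core, and the helper names `vcpu_ids` and `same_layout` are made up for illustration.

```python
def vcpu_ids(cores, smp_threads, vsmt):
    """Apply vcpu_id = (core_id * vsmt / smp_threads) + i to every thread."""
    return [
        (core * smp_threads) * vsmt // smp_threads + i  # core_id = core * smp_threads
        for core in range(cores)
        for i in range(smp_threads)
    ]


def same_layout(src_vsmt, dst_vsmt, cores, smp_threads):
    """True if source and destination end up with identical vcpu ids."""
    return vcpu_ids(cores, smp_threads, src_vsmt) == vcpu_ids(cores, smp_threads, dst_vsmt)


# "-smp 8,sockets=1,cores=2,threads=4"
p9 = vcpu_ids(cores=2, smp_threads=4, vsmt=4)  # P9 with the patch: vsmt = smp_threads
p8 = vcpu_ids(cores=2, smp_threads=4, vsmt=8)  # P8 default: vsmt = 8

print("P9:", [hex(v) for v in p9])
print("P8:", [hex(v) for v in p8])
print("defaults match:", same_layout(4, 8, cores=2, smp_threads=4))
print("vsmt=8 on both:", same_layout(8, 8, cores=2, smp_threads=4))
```

Under these assumptions the second core starts at 0x04 on the P9 and at 0x08 on the P8, while using the same vsmt on both sides gives identical ids.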

Comment 25 Andrea Bolognani 2018-01-11 13:11:39 UTC

(In reply to David Gibson from comment #22)
> Andrea,
> 
> To elaborate, vsmt is basically the spacing between the vcpu IDs of the
> first thread for each core.  Obviously it needs to be >= # of virtual
> threads per virtual core.  Because of some poor decisions I made ~6 years
> ago for POWER8 - at least with older host kernels - it has to be larger than
> the minimum if the guest has less threads per core than the host.
> 
> The vcpu ids are exposed to guests in several places (parameters to a number
> of hypercalls).  Well, technically it's the DT ids exposed to the guest, but
> those are currently equal to the vcpu ids and because of hypercalls handled
> in the host kernel without qemu intervention, they can't be made different
> without an awful lot of work.
> 
> That means we need to preserve the vcpu ids - and therefore their spacing.
> 
> The spacing used to depend on the host capabilities, so we were completely
> stuffed - vsmt was introduced as a parameter to address that.
> 
> The defaults for VSMT were designed to try to make as many existing cases
> keep working without intervention as we could manage.  I need to have a look
> and see if we can fix them somehow to mean libvirt won't need extra
> knowledge (at least in supported cases) or if we're going to have to teach
> libvirt about it.

Can't QEMU set vSMT to the number of threads per core in the guest,
regardless of whether the host is POWER8 or POWER9?

I guess it can't possibly be that simple, now can it? :)

Comment 26 Andrea Bolognani 2018-01-11 13:16:35 UTC

(In reply to Laurent Vivier from comment #23)
> So as we ask to use "max-cpu-compat=power8,caps-smt=off", we should also ask
> to use "vsmt" set to a value compatible with the destination POWER8 host.

caps-smt? Do you mean cap-htm?

Comment 27 Laurent Vivier 2018-01-11 13:45:26 UTC

(In reply to Andrea Bolognani from comment #26)
> (In reply to Laurent Vivier from comment #23)
> > So as we ask to use "max-cpu-compat=power8,caps-smt=off", we should also ask
> > to use "vsmt" set to a value compatible with the destination POWER8 host.
> 
> caps-smt? Do you mean cap-htm?

yes, to disable Hardware Transactional memory, and it must be on the P8 side, not the P9 side.

on P9, "-M max-cpu-compat=power8,vsmt=8"
on P8, "-M cap-htm=off"

Thank you for the correction

Comment 28 Laurent Vivier 2018-01-11 13:49:18 UTC

(In reply to Andrea Bolognani from comment #25)
...
> Can't QEMU set vSMT to the number of threads per core in the guest,
> regardless of whether the host is POWER8 or POWER9?
> 
> I guess it can't possibly be that simple, now can it? :)

I think with some modifications to José's patch (c#21) it should be possible to automatically set vsmt to that of the CPU given by max-cpu-compat (and then there would be no need to teach libvirt about it). But I'm not sure it is easy.

Comment 29 Andrea Bolognani 2018-01-11 14:16:31 UTC

(In reply to Laurent Vivier from comment #27)
> yes, to disable Hardware Transactional memory, and it must be on the P8
> side, not the P9 side.
> 
> on P9, "-M max-cpu-compat=power8,vsmt=8"
> on P8, "-M cap-htm=off"

Just to clarify: libvirt doesn't really make a distinction between
POWER8 and POWER9 at that level, meaning both sides of the migration
will end up having

  -machine pseries,max-cpu-compat=power8,cap-htm=off,vsmt=8

Well, at least the first bit. There's no support in libvirt for
optional pSeries capabilities yet, so we'll be relying on newer
machine types disabling HTM by default for migration to work, at
least for the time being, and vSMT is being discussed right here.

Comment 30 David Gibson 2018-01-12 02:39:37 UTC

> Can't QEMU set vSMT to the number of threads per core in the guest,
> regardless of whether the host is POWER8 or POWER9?

> I guess it can't possibly be that simple, now can it? :)

Alas, no.

The reason we can't do that is that qemu would then no longer work with kvm on older kernels that advertise a "vsmt" value (not actually called that but basically the same thing), but don't allow it to be changed.

Comment 31 David Gibson 2018-01-15 07:28:47 UTC

I've tackled this problem today, and posted a suggested upstream fix.

Comment 32 Andrea Bolognani 2018-01-15 13:23:00 UTC

Upstream patches, for reference:

  http://lists.nongnu.org/archive/html/qemu-devel/2018-01/msg02935.html

Comment 33 David Gibson 2018-01-18 00:22:52 UTC

Pull request sent upstream, awaiting merge.

Comment 37 Miroslav Rezanina 2018-01-23 13:00:38 UTC

Fix included in qemu-kvm-rhev-2.10.0-18.el7

Comment 39 xianwang 2018-01-24 08:59:06 UTC

I have re-tested three scenarios with the newest version. The result is the same as in comment 18: after migration, the VM either reboots automatically with a vmcore generated, or hangs. However, the call trace differs from the one in comment 18, and the "ics" problem no longer occurs.
version:
Host p9:
4.14.0-29.el7a.ppc64le
qemu-kvm-rhev-2.10.0-18.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch
# ppc64_cpu --smt=off
# echo N > /sys/module/kvm_hv/parameters/indep_threads_mode

Host p8:
3.10.0-837.el7.ppc64le
qemu-kvm-rhev-2.10.0-18.el7.ppc64le
SLOF-20170724-5.git89f519f.el8.ppc64le

scenario I:
The qemu cli and steps are the same as in the bug report.
result:
The VM reboots automatically with a vmcore generated, or hangs.

scenario II:
The qemu cli is as in comment 18 ("-machine pseries-rhel7.5.0,max-cpu-compat=power8,vsmt=8" on P9); the steps are the same as in the bug report.
result:
The VM reboots automatically with a vmcore generated, or hangs.

scenario III:
on P9, "-M max-cpu-compat=power8,vsmt=8"
on P8, "-M cap-htm=off"
result:
The VM reboots automatically with a vmcore generated, or hangs.

call trace:
[   99.184258] Call Trace:
[   99.184261] [c0000001fa03fcc0] [c0000001fa03fcf0] 0xc0000001fa03fcf0 (unreliable)
[   99.184264] [c0000001fa03fd00] [c00000000012a618] hrtimer_start_range_ns+0x4a8/0x630
[   99.184267] [c0000001fa03fd90] [c000000000185704] tick_nohz_stop_sched_tick+0x324/0x3f0
[   99.184269] [c0000001fa03fe40] [c0000000001866fc] tick_nohz_idle_enter+0xec/0x240
[   99.184271] [c0000001fa03fec0] [c000000000171c70] cpu_startup_entry+0x80/0x1e0
[   99.184274] [c0000001fa03ff20] [c0000000000504e0] start_secondary+0x310/0x340
[   99.184276] [c0000001fa03ff90] [c000000000009a6c] start_secondary_prolog+0x10/0x14
[   99.184277] Instruction dump:
[   99.184277] f92a0000 e9490000 794707e1 4182010c 79470765 418200b4 7d284b78 7ce93b78 
[   99.184280] e9490008 7faa4040 409eff60 e9490010 <e90a0000> 790607e1 40820040 e8ea0008 
[   99.184285] ---[ end trace eff9f753f14045c6 ]---
[   99.185076] virtio_net virtio1: input.0:id 161 is not a head!
[   99.185563] Unable to handle kernel paging request for data at address 0x00000000
[   99.185565] Faulting instruction address: 0xc000000000517018
[   99.186151] 
[   99.186155] Sending IPI to other CPUs
[   99.186158] Oops: Kernel access of bad area, sig: 11 [#2]
[   99.186159] SMP NR_CPUS=2048 NUMA pSeries
[   99.186164] Modules linked in: fuse ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sg ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common virtio_net virtio_scsi scsi_transport_iscsi bochs_drm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm virtio_pci virtio_ring i2c_core virtio dm_mirror dm_region_hash dm_log dm_mod
[   99.186191] CPU: 7 PID: 614 Comm: systemd-journal Tainted: G      D        ------------   3.10.0-837.el7.ppc64le #1
[   99.186193] task: c000000006a68040 ti: c0000001f10f8000 task.ti: c0000001f10f8000
[   99.186195] NIP: c000000000517018 LR: c00000000051a290 CTR: c0000000003be8a0
[   99.186196] REGS: c0000001f10fba00 TRAP: 0300   Tainted: G      D        ------------    (3.10.0-837.el7.ppc64le)
[   99.186197] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 88000222  XER: 00000000
[   99.186202] CFAR: 000000000000257c DAR: 0000000000000000 DSISR: 40000000 SOFTE: 0 
GPR00: c00000000012ae70 c0000001f10fbc80 c000000001274500 c0000001e6ae90c0 
GPR04: c00000002820a8b0 c0000001f10fbdf0 0000000000000001 c0000001e6ae9000 
GPR08: c0000001ea4b78d0 c0000001e6ae9000 0000000000000000 c000000000a30e40 
GPR12: 0000000000000000 c000000003383f00 0000000000000000 0000000000000000 
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
GPR20: 0000000000000000 0000000000000000 0000000000000000 c00000002820b048 
GPR24: 0000000027510000 c00000002820a860 0000000000000000 c000000000dba860 
GPR28: 0000000000000000 c000000000dba860 c00000002820a8a0 c0000001e6ae90c0 
[   99.186220] NIP [c000000000517018] rb_erase+0x1d8/0x430
[   99.186222] LR [c00000000051a290] timerqueue_del+0x40/0xd0
[   99.186223] Call Trace:
[   99.186226] [c0000001f10fbcc0] [c00000000012ae70] hrtimer_try_to_cancel+0x1f0/0x280
[   99.186230] [c0000001f10fbd30] [c0000000003bdf3c] do_timerfd_settime+0x20c/0x4b0
[   99.186232] [c0000001f10fbdb0] [c0000000003be8fc] SyS_timerfd_settime+0x5c/0xd0
[   99.186234] [c0000001f10fbe30] [c00000000000a184] system_call+0x38/0xe4
[   99.186235] Instruction dump:
[   99.186236] f92a0000 e9490000 794707e1 4182010c 79470765 418200b4 7d284b78 7ce93b78 
[   99.186238] e9490008 7faa4040 409eff60 e9490010 <e90a0000> 790607e1 40820040 e8ea0008 
[   99.186242] ---[ end trace eff9f753f14045c7 ]---

Comment 40 Laurent Vivier 2018-01-24 09:28:33 UTC

(In reply to xianwang from comment #39)
> I have re-test three scenarios with the newest version, the result are same
> with comment18 that after migration, the vm reboot automatically with vmcore
> generated or vm hang, but the content of call trace is different from
> comment18, while there is not problem about "ics".
> version:
> Host p9:
> 4.14.0-29.el7a.ppc64le
> qemu-kvm-rhev-2.10.0-18.el7.ppc64le
> SLOF-20170724-2.git89f519f.el7.noarch
> # ppc64_cpu --smt=off
> # echo N > /sys/module/kvm_hv/parameters/indep_threads_mode
> 
> Host p8:
> 3.10.0-837.el7.ppc64le
> qemu-kvm-rhev-2.10.0-18.el7.ppc64le
> SLOF-20170724-5.git89f519f.el8.ppc64le
> 
> scenario I:
> the qemu cli and steps are same with bug reports.
> results:
> the vm reboot automatically with vmcore generated or vm hang
> 
> scenario II:
> qemu cli as comment18("-machine
> pseries-rhel7.5.0,max-cpu-compat=power8,vsmt=8" on p9),steps are same with
> bug reports
> result:
> the vm reboot automatically with vmcore generated or vm hang
> 
> scenario III:
> on P9, "-M max-cpu-compat=power8,vsmt=8"
> on P8, "-M cap-htm=off"
> result:
> the vm reboot automatically with vmcore generated or vm hang
> 
> call trace:
> [   99.184258] Call Trace:
> [   99.184261] [c0000001fa03fcc0] [c0000001fa03fcf0] 0xc0000001fa03fcf0
> (unreliable)
> [   99.184264] [c0000001fa03fd00] [c00000000012a618]
> hrtimer_start_range_ns+0x4a8/0x630
> [   99.184267] [c0000001fa03fd90] [c000000000185704]
> tick_nohz_stop_sched_tick+0x324/0x3f0

This is another bug (potentially BZ 1533718).

Reading other BZs with the same symptoms, I suspect a problem with virtio. Could you reproduce the original bug without the virtio devices, and then check that the new release of QEMU fixes the original bug (again without the virtio devices)?

Please test without "vsmt=8" and "cap-htm=off", because the default values should now be the correct ones.

Thanks.

Comment 41 xianwang 2018-01-25 02:49:31 UTC

I am changing this bug to "verified" because the "ics" problem no longer occurs. As for the possible "virtio" device issue, I will keep testing and add a comment with the results.

Comment 42 xianwang 2018-01-25 06:37:24 UTC

(In reply to Laurent Vivier from comment #40)
> (In reply to xianwang from comment #39)
> > I have re-test three scenarios with the newest version, the result are same
> > with comment18 that after migration, the vm reboot automatically with vmcore
> > generated or vm hang, but the content of call trace is different from
> > comment18, while there is not problem about "ics".
> > version:
> > Host p9:
> > 4.14.0-29.el7a.ppc64le
> > qemu-kvm-rhev-2.10.0-18.el7.ppc64le
> > SLOF-20170724-2.git89f519f.el7.noarch
> > # ppc64_cpu --smt=off
> > # echo N > /sys/module/kvm_hv/parameters/indep_threads_mode
> > 
> > Host p8:
> > 3.10.0-837.el7.ppc64le
> > qemu-kvm-rhev-2.10.0-18.el7.ppc64le
> > SLOF-20170724-5.git89f519f.el8.ppc64le
> > 
> > scenario I:
> > the qemu cli and steps are same with bug reports.
> > results:
> > the vm reboot automatically with vmcore generated or vm hang
> > 
> > scenario II:
> > qemu cli as comment18("-machine
> > pseries-rhel7.5.0,max-cpu-compat=power8,vsmt=8" on p9),steps are same with
> > bug reports
> > result:
> > the vm reboot automatically with vmcore generated or vm hang
> > 
> > scenario III:
> > on P9, "-M max-cpu-compat=power8,vsmt=8"
> > on P8, "-M cap-htm=off"
> > result:
> > the vm reboot automatically with vmcore generated or vm hang
> > 
> > call trace:
> > [   99.184258] Call Trace:
> > [   99.184261] [c0000001fa03fcc0] [c0000001fa03fcf0] 0xc0000001fa03fcf0
> > (unreliable)
> > [   99.184264] [c0000001fa03fd00] [c00000000012a618]
> > hrtimer_start_range_ns+0x4a8/0x630
> > [   99.184267] [c0000001fa03fd90] [c000000000185704]
> > tick_nohz_stop_sched_tick+0x324/0x3f0
> 
> This is another bug (potentially BZ 1533718).
> 
> Reading another BZs with same symptoms, I suspect a problem with virtio.
> Could you reproduce the original bug without the virtio devices, and then
> check the new release of QEMU fix the original bug (always without the
> virtio devices).
> 
> please test without "vsmt=8" and "cap-htm=off" because the default values
> should be the good ones now.
> 
> Thanks.


Hi Laurent,

For released build qemu-kvm-rhev-2.10.0-13.el7 on P8 and P9:
a) "with virtio device": I can reproduce this bug;
b) "without virtio device": I can also reproduce this bug.

For released build qemu-kvm-rhev-2.10.0-18.el7 on P8 and P9:
a) "with virtio device": the result is the same as in comment 39;
b) "without virtio device": the result is the same as in comment 39.
So it seems this new bug is not related to virtio devices.

Additional information:
"with virtio device" means the qemu cli is the same as in the bug report;
"without virtio device" means the qemu cli is as follows:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox off \
-machine pseries-rhel7.5.0,max-cpu-compat=power8 \
-nodefaults \
-vga std \
-chardev socket,id=serial_id_serial0,path=/tmp/console0,server,nowait \
-device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 \
-device nec-usb-xhci,id=usb1,bus=pci.0,addr=06 \
-device spapr-vscsi,reg=0x1000,id=scsi0 \
-drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/xianwang/mount_point/rhel75-ppc64le-virtio-scsi.qcow2 \
-device scsi-hd,drive=drive_image1,id=system-disk,bus=scsi0.0,channel=0,scsi-id=0,lun=0 \
-m 8G \
-smp 8 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device usb-mouse,id=input1,bus=usb1.0,port=2 \
-device usb-kbd,id=input2,bus=usb1.0,port=3 \
-vnc :1 \
-qmp tcp:0:8881,server,nowait \
-monitor stdio \
-rtc base=utc,clock=host \
-boot order=cdn,once=c,menu=on,strict=on \
-enable-kvm
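
One way to double-check that a command line such as the one above really contains no virtio device is to inspect only the values passed to -device. The sketch below does this; has_virtio_device is a hypothetical helper written for this example, and the argument list is abbreviated from the command above. Note that the disk image file name contains "virtio-scsi", so a plain text search for "virtio" over the whole command line would give a false positive.

```shell
# Sketch only: has_virtio_device is a hypothetical helper for illustration.
# Returns 0 if any "-device" argument names a device model starting with "virtio".
has_virtio_device() {
    prev=""
    for arg in "$@"; do
        # Only the value that follows "-device" is inspected, so a file name
        # containing "virtio" in a -drive option does not match.
        if [ "$prev" = "-device" ]; then
            case "$arg" in
                virtio*) return 0 ;;
            esac
        fi
        prev="$arg"
    done
    return 1
}

# Abbreviated argument list taken from the command line above.
set -- \
    -machine pseries-rhel7.5.0,max-cpu-compat=power8 \
    -device spapr-vscsi,reg=0x1000,id=scsi0 \
    -drive id=drive_image1,if=none,file=/home/xianwang/mount_point/rhel75-ppc64le-virtio-scsi.qcow2 \
    -device scsi-hd,drive=drive_image1,id=system-disk,bus=scsi0.0

if has_virtio_device "$@"; then
    echo "virtio device present"
else
    echo "no virtio device"
fi
```

The check is deliberately narrow: it looks only at the device model name directly after -device, and does not cover devices that might be added by other means.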

Comment 43 Laurent Vivier 2018-01-25 07:52:46 UTC

(In reply to xianwang from comment #42)
> (In reply to Laurent Vivier from comment #40)
> > (In reply to xianwang from comment #39)
> > > I have re-test three scenarios with the newest version, the result are same
> > > with comment18 that after migration, the vm reboot automatically with vmcore
> > > generated or vm hang, but the content of call trace is different from
> > > comment18, while there is not problem about "ics".
> > > version:
> > > Host p9:
> > > 4.14.0-29.el7a.ppc64le
> > > qemu-kvm-rhev-2.10.0-18.el7.ppc64le
> > > SLOF-20170724-2.git89f519f.el7.noarch
> > > # ppc64_cpu --smt=off
> > > # echo N > /sys/module/kvm_hv/parameters/indep_threads_mode
> > > 
> > > Host p8:
> > > 3.10.0-837.el7.ppc64le
> > > qemu-kvm-rhev-2.10.0-18.el7.ppc64le
> > > SLOF-20170724-5.git89f519f.el8.ppc64le
> > > 
> > > scenario I:
> > > the qemu cli and steps are same with bug reports.
> > > results:
> > > the vm reboot automatically with vmcore generated or vm hang
> > > 
> > > scenario II:
> > > qemu cli as comment18("-machine
> > > pseries-rhel7.5.0,max-cpu-compat=power8,vsmt=8" on p9),steps are same with
> > > bug reports
> > > result:
> > > the vm reboot automatically with vmcore generated or vm hang
> > > 
> > > scenario III:
> > > on P9, "-M max-cpu-compat=power8,vsmt=8"
> > > on P8, "-M cap-htm=off"
> > > result:
> > > the vm reboot automatically with vmcore generated or vm hang
> > > 
> > > call trace:
> > > [   99.184258] Call Trace:
> > > [   99.184261] [c0000001fa03fcc0] [c0000001fa03fcf0] 0xc0000001fa03fcf0
> > > (unreliable)
> > > [   99.184264] [c0000001fa03fd00] [c00000000012a618]
> > > hrtimer_start_range_ns+0x4a8/0x630
> > > [   99.184267] [c0000001fa03fd90] [c000000000185704]
> > > tick_nohz_stop_sched_tick+0x324/0x3f0
> > 
> > This is another bug (potentially BZ 1533718).
> > 
> > Reading another BZs with same symptoms, I suspect a problem with virtio.
> > Could you reproduce the original bug without the virtio devices, and then
> > check the new release of QEMU fix the original bug (always without the
> > virtio devices).
> > 
> > please test without "vsmt=8" and "cap-htm=off" because the default values
> > should be the good ones now.
> > 
> > Thanks.
> 
> 
> Hi, laurent,
> 
> For released build qemu-kvm-rhev-2.10.0-13.el7 on p8 and p9:
> a)"with virtio device", I can reproduce this bug;
> b)"without virtio device", I can also reproduce this bug;
> 
> For released build qemu-kvm-rhev-2.10.0-18.el7 on p8 and p9: 
> a)"with virtio device", the result is same as comment39;
> b)"without virtio device", the result is same as comment39;
> so, it seems this new bug is not related to virtio device;

I agree, thank you.

Comment 45 errata-xmlrpc 2018-04-11 00:55:27 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1104