Bug 1526266
| Summary: | qemu-kvm process on destination side quitted itself abnormally after live migration | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | yilzhang |
| Component: | qemu-kvm-rhev | Assignee: | Laurent Vivier <lvivier> |
| Status: | CLOSED DUPLICATE | QA Contact: | xianwang <xianwang> |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 7.5 | CC: | bugproxy, dgibson, dzheng, fnovak, hannsj_uhl, junli, knoel, lvivier, micai, michen, qzhang, virt-maint, xianwang, yhong, yilzhang |
| Target Milestone: | rc | ||
| Target Release: | 7.5 | ||
| Hardware: | ppc64le | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-01-28 22:47:55 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1523414 | ||
| Bug Blocks: | 1399177, 1476742 | ||
Description
yilzhang
2017-12-15 05:20:24 UTC
Yilin,
Does this happen only on Power, or on x86 as well?

Seems this bug only happens when the local filesystem of the guest is nearly full, so the whole "Reproduce Steps" may be:
1. Start one guest on src host; the guest uses one NBD backend image as its system disk.
2. Log in to the guest, and write a big file to fill the guest's local filesystem:
[root@virt8-Guest3 ~]# dd if=/dev/zero of=/root/test bs=1M count=16000 oflag=sync
dd: error writing ‘test’: No space left on device
13604+0 records in
13603+0 records out
14263844864 bytes (14 GB) copied, 403.345 s, 35.4 MB/s
[root@virt8-Guest3 ~]# df -h /root
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/rhelaa-root 17G 17G 840K 100% /
3. Start qemu on des host with the "incoming" option:
-incoming tcp:0:7777
4. Migrate guest from src to des:
(qemu) migrate tcp:10.0.1.8:7777
5. After a while, check migration status on the source side; the migration completed successfully.
(qemu) info status
VM status: paused (postmigrate)
6. Check the qemu-kvm process on the destination side.

Actual results:
Sometimes the qemu-kvm process on the destination side quit abnormally after live migration.
The reproducible rate is low, about 30%.

And the above test was executed on Power9 hosts, and I will try it on x86.

Thanks for the extra information. I'll await information on whether it also occurs on x86.

x86 doesn't have this issue. I ran the test more than 30 times and did not hit it.
Src host and Des host have the same kernel version and qemu-kvm version:
Host kernel: 3.10.0-823.el7.x86_64
qemu-kvm: qemu-kvm-rhev-2.10.0-12.el7
Guest kernel: 3.10.0-799.el7.x86_64

This looks like it's ppc specific, so assigning to the POWER team.

I have hit this issue when doing migration from P8 to P9; after migration, qemu on P9 (dst) quit automatically without any related message.
version:
Host:
P8:
3.10.0-823.el7.ppc64le
qemu-kvm-rhev-2.10.0-13.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

P9:
4.14.0-18.el7a.XIVE_fixes.ppc64le
qemu-kvm-rhev-2.10.0-12.el7.BZ1525866.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch
# ppc64_cpu --smt=off
# echo N > /sys/module/kvm_hv/parameters/indep_threads_mode

Guest:
3.10.0-823.el7.ppc64le

P8->P9
p8 qemu cli:
/usr/libexec/qemu-kvm -monitor stdio -machine pseries-rhel7.5.0 -nodefaults -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=03 -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/RHEL.7.5LE.qcow2 -device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0
p9 qemu cli:
/usr/libexec/qemu-kvm -monitor stdio -machine pseries-rhel7.5.0,max-cpu-compat=power8 -nodefaults -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=03 -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/xianwang/mount_point/RHEL.7.5LE.qcow2 -device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 -incoming tcp:0:5801

result:
src: migration complete and guest is "VM status: paused (postmigrate)"
dst: qemu quit automatically without any message.

(In reply to xianwang from comment #7)
> I have hit this issue when do migration from P8 to P9, after migration, qemu
> of P9(dst) quit automatically without any related message.
...
> result:
> src: migration complete and guest is "VM status: paused (postmigrate)"
> dst: qemu quit automatically without any message.

The image is shared between src and dst over NFS.

Yilin or Xianxian,

I suspect the guest might be crashing, causing qemu to quit with a GUEST_PANICKED notification. Can you check this by connecting a QMP socket to qemu on both source and dest, and issuing the capabilities command to initialize it before attempting the migration?

Also, if I understand correctly the situation in comment 7 is different from comment 0 - it is using NFS shared images instead of NBD. Is that right?

This suggests the problem is not actually related to NBD.
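The QMP check requested above can be scripted. The following is only a minimal sketch, not something used in this bug: it assumes QEMU was started with a line-oriented QMP TCP socket (for example the -qmp tcp:0:8881,server,nowait option that appears in later comments), and the function name qmp_handshake_and_watch is hypothetical. It reads the greeting, sends qmp_capabilities, and then collects asynchronous events until a SHUTDOWN event or end of stream.

```python
import json


def qmp_handshake_and_watch(rfile, wfile, stop_events=("SHUTDOWN",)):
    """Negotiate QMP capabilities, then collect events until a stop event or EOF.

    rfile/wfile are text-mode file objects (hypothetical helper, not a QEMU API).
    """
    # QEMU sends a greeting object containing the key "QMP" right after connect.
    greeting = json.loads(rfile.readline())
    if "QMP" not in greeting:
        raise ValueError("peer did not send a QMP greeting")

    # Leave capabilities-negotiation mode; events are delivered afterwards.
    wfile.write(json.dumps({"execute": "qmp_capabilities"}) + "\n")
    wfile.flush()

    events = []
    for line in rfile:
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)
        if "event" in msg:  # skip command replies such as {"return": {}}
            events.append(msg)
            if msg["event"] in stop_events:
                break
    return events


# Assumed usage against a live QEMU (address taken from the -qmp option):
#   import socket
#   sock = socket.create_connection(("127.0.0.1", 8881))
#   f = sock.makefile("rw")
#   for ev in qmp_handshake_and_watch(f, f):
#       print(ev["event"], ev.get("data"))
```

Running one such watcher against the source and one against the destination before issuing the migrate command should show whether the destination emits GUEST_PANICKED before it exits.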
(In reply to David Gibson from comment #10)
> Also, if I understand correctly the situation in comment 7 is different from
> comment 0 - it is using NFS shared images instead of NBD. Is that right?
>
> This suggests the problem is not actually related to NBD.

Hi, David,
a) Yes, I think so. This problem is not related to NBD, because I hit it with NFS, not NBD.
b) I have re-tested this scenario with the latest released version and also hit this bug; some events are reported in QMP.

version:
Host:
P8:
3.10.0-827.el7.ppc64le
qemu-kvm-rhev-2.10.0-16.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

P9:
4.14.0-20.el7a.ppc64le
qemu-kvm-rhev-2.10.0-16.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

qemu cli on p8:
# /usr/libexec/qemu-kvm -monitor stdio -machine pseries-.0 -nodefaults -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=03 -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/rhel75-ppc64le-virtio-scsi.qcow2 -device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 -qmp tcp:0:8881,server,nowait -chardev socket,id=serial_id_serial0,path=/tmp/console0,server,nowait -device spapr-vty,reg=0x30000000,chardev=serial_id_serial0

qemu cli on p9:
# /usr/libexec/qemu-kvm -monitor stdio -machine pseries-rhel7.5.0,max-cpu-compat=power8 -nodefaults -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=03 -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/mount_point/rhel75-ppc64le-virtio-scsi.qcow2 -device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 -incoming tcp:0:5801 -qmp tcp:0:8881,server,nowait -chardev socket,id=serial_id_serial0,path=/tmp/console0,server,nowait -device spapr-vty,reg=0x30000000,chardev=serial_id_serial0

Do migration from p8->p9
(qemu) migrate -d tcp:10.19.19.13:5801

result:
migration completed without error message

P8:
(qemu) info migrate
Migration status: completed
(qemu) info status
VM status: paused (postmigrate)
QMP:
{"execute{"timestamp": {"seconds": 1515487838, "microseconds": 547447}, "event": "STOP"}
{"timestamp": {"seconds": 1515488019, "microseconds": 818220}, "event": "SHUTDOWN", "data": {"guest": false}}

P9: qemu quit automatically.
QMP:
{"execute":"qmp_capabilities"}
{"return": {}}
{"timestamp": {"seconds": 1515487838, "microseconds": 654350}, "event": "RESUME"}
{"timestamp": {"seconds": 1515487844, "microseconds": 727193}, "event": "GUEST_PANICKED", "data": {"action": "pause"}}
{"timestamp": {"seconds": 1515487844, "microseconds": 727278}, "event": "GUEST_PANICKED", "data": {"action": "poweroff"}}
{"timestamp": {"seconds": 1515487844, "microseconds": 727322}, "event": "SHUTDOWN", "data": {"guest": true}}

I have retried several times with the latest released version and hit this problem 100% of the time when migrating from p8->p9. It blocks migration from p8->p9, so I think it is serious enough to raise its priority.

Host:
P8:
3.10.0-827.el7.ppc64le
qemu-kvm-rhev-2.10.0-16.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

P9:
4.14.0-20.el7a.ppc64le
qemu-kvm-rhev-2.10.0-16.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch
# ppc64_cpu --smt
SMT is off
# cat /sys/module/kvm_hv/parameters/indep_threads_mode
N

qemu cli on p8:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox off \
-machine pseries-rhel7.5.0 \
-nodefaults \
-vga std \
-chardev socket,id=serial_id_serial0,path=/tmp/console0,server,nowait \
-device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 \
-device nec-usb-xhci,id=usb1,bus=pci.0,addr=06 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=03 \
-drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/rhel75-ppc64le-virtio-scsi.qcow2 \
-device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 \
-device virtio-net-pci,mac=9a:4f:50:51:52:53,id=id9HRc5V,vectors=4,netdev=idjlQN53,bus=pci.0,addr=11 \
-netdev tap,id=idjlQN53,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-m 8G \
-smp 8 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device usb-mouse,id=input1,bus=usb1.0,port=2 \
-device usb-kbd,id=input2,bus=usb1.0,port=3 \
-vnc :1 \
-qmp tcp:0:8881,server,nowait \
-monitor stdio \
-rtc base=utc,clock=host \
-boot order=cdn,once=c,menu=on,strict=on \
-enable-kvm \

qemu cli on p9:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox off \
-machine pseries-rhel7.5.0,max-cpu-compat=power8 \
-nodefaults \
-vga std \
-chardev socket,id=serial_id_serial0,path=/tmp/console0,server,nowait \
-device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 \
-device nec-usb-xhci,id=usb1,bus=pci.0,addr=06 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=03 \
-drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/mount_point/rhel75-ppc64le-virtio-scsi.qcow2 \
-device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 \
-device virtio-net-pci,mac=9a:4f:50:51:52:53,id=id9HRc5V,vectors=4,netdev=idjlQN53,bus=pci.0,addr=11 \
-netdev tap,id=idjlQN53,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-m 8G \
-smp 8 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device usb-mouse,id=input1,bus=usb1.0,port=2 \
-device usb-kbd,id=input2,bus=usb1.0,port=3 \
-vnc :1 \
-incoming tcp:0:5801 \
-qmp tcp:0:8881,server,nowait \
-monitor stdio \
-rtc base=utc,clock=host \
-boot order=cdn,once=c,menu=on,strict=on \
-enable-kvm \

Do migration from p8->p9
(qemu) migrate -d tcp:10.19.19.13:5801

The result is the same as above (comment 11).

Ok, so it's a guest crash triggering the problem.

Can we please get a log of the guest console - it's vty device. A screenshot of the VGA after the crash might also be helpful.

If you're unable to get the log / screenshot because qemu is quitting, you should be able to prevent that by adding the -no-shutdown option.
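The conclusion that a guest crash is behind the exit follows from the order of the QMP events in comment 11: GUEST_PANICKED on the destination, then SHUTDOWN with "guest": true, while the source only reports SHUTDOWN with "guest": false. A minimal sketch of that reading is below. The function name classify_exit and its three labels are hypothetical, and the interpretation of the "guest" field (true when the guest itself triggered the shutdown) is an assumption based on how the events appear in this thread.

```python
import json


def classify_exit(event_lines):
    """Classify why a QEMU process stopped, from captured QMP output lines.

    Returns "guest-panic", "guest-shutdown", "host-shutdown" or "unknown".
    (Hypothetical helper; the labels are illustrative, not QEMU terminology.)
    """
    events = []
    for line in event_lines:
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate garbled or interleaved lines in a pasted log
        if isinstance(msg, dict) and "event" in msg:
            events.append(msg)

    names = [e["event"] for e in events]
    if "GUEST_PANICKED" in names:
        return "guest-panic"

    shutdowns = [e for e in events if e["event"] == "SHUTDOWN"]
    if shutdowns:
        guest = shutdowns[-1].get("data", {}).get("guest")
        return "guest-shutdown" if guest else "host-shutdown"
    return "unknown"


# QMP lines copied from comment 11 (destination, then source).
p9_log = [
    '{"return": {}}',
    '{"timestamp": {"seconds": 1515487838, "microseconds": 654350}, "event": "RESUME"}',
    '{"timestamp": {"seconds": 1515487844, "microseconds": 727193}, "event": "GUEST_PANICKED", "data": {"action": "pause"}}',
    '{"timestamp": {"seconds": 1515487844, "microseconds": 727278}, "event": "GUEST_PANICKED", "data": {"action": "poweroff"}}',
    '{"timestamp": {"seconds": 1515487844, "microseconds": 727322}, "event": "SHUTDOWN", "data": {"guest": true}}',
]
p8_log = [
    '{"execute{"timestamp": {"seconds": 1515487838, "microseconds": 547447}, "event": "STOP"}',
    '{"timestamp": {"seconds": 1515488019, "microseconds": 818220}, "event": "SHUTDOWN", "data": {"guest": false}}',
]

print("P9:", classify_exit(p9_log))
print("P8:", classify_exit(p8_log))
```

The garbled first line of the P8 log is skipped rather than repaired, so the STOP event on it is not counted; that does not change the classification here.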
(In reply to David Gibson from comment #13)
> Ok, so it's a guest crash triggering the problem.
>
> Can we please get a log of the guest console - it's vty device. A
> screenshot of the VGA after the crash might also be helpful.
>
> If you're unable to get the log / screenshot because qemu is quitting, you
> should be able to prevent that by adding the -no-shutdown option.

Hi, David,
I retried the same scenario as in comment 11; the console output log is as follows:

console output of p9(des):
# nc -U /tmp/console0
[ 126.874418] Oops: Exception in kernel mode, sig: 4 [#1]
[ 126.874622] SMP NR_CPUS=2048 NUMA pSeries
[ 126.874814] Modules linked in: ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sg ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common scsi_transport_iscsi virtio_scsi virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod
[ 126.876702] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.0-827.el7.ppc64le #1
[ 126.876831] task: c0000000011fdb20 ti: c000000001270000 task.ti: c000000001270000
[ 126.876960] NIP: c00000000005acec LR: c000000000019e20 CTR: 0000000000000000
[ 126.877088] REGS: c000000001273ab0 TRAP: 0700 Not tainted (3.10.0-827.el7.ppc64le)
[ 126.877215] MSR: 8000000100081033 <SF,ME,IR,DR,RI,LE> CR: 28002048 XER: 00000000
[ 126.877535] CFAR: c000000000019e1c SOFTE: 0
GPR00: 0000000100000000 c000000001273d30 c000000001273f00 c0000000011fe030
GPR04: 8000000100001033 0000000000000008 000000000000003f 0000000000000000
GPR08: 0000000000000001 c000000001273ea0 0000000000000000 c00000000fb80004
GPR12: 00003fffa8130000 c00000000fb80000 0000000000000000 c000000000dce800
GPR16: c000000000dce800 c000000000a0b4b4 c000000000dce800 c000000000dce800
GPR20: 0000000000000001 c000000000dce800 0000000000000000 0000000000000004
GPR24: c000000000dce800 c00000001ed64780 c000000001270000 c0000000012e8190
GPR28: c0000000011fe030 c0000000037dd640 c0000000011fdb20 c0000000011fdb20
[ 126.879381] NIP [c00000000005acec] tm_save_sprs+0x0/0x1c
[ 126.879476] LR [c000000000019e20] __switch_to+0x140/0x470
[ 126.879569] Call Trace:
[ 126.879671] [c000000001273d30] [c000000001273d90] init_thread_union+0x3d90/0x3f00 (unreliable)
[ 126.879892] [c000000001273d90] [c000000000a098a8] __schedule+0x438/0xa30
[ 126.880034] [c000000001273e60] [c000000000a0b4b4] schedule_preempt_disabled+0x34/0xb0
[ 126.880214] [c000000001273e80] [c000000000171824] cpu_startup_entry+0x134/0x1e0
[ 126.880376] [c000000001273ee0] [c00000000000caec] rest_init+0x9c/0xb0
[ 126.880552] [c000000001273f00] [c000000000d23e74] start_kernel+0x4cc/0x4e8
[ 126.880690] [c000000001273f90] [c000000000009b6c] start_here_common+0x20/0xa8
[ 126.880826] Instruction dump:
[ 126.880898] 60420000 39200005 7d234b78 4e800020 7c8000a6 38600001 786307c6 7c801839
[ 126.881135] 4082000c 7c841b78 7c800164 4e800020 <7c0022a6> f80304a8 7c0222a6 f80304b0
[ 126.881376] ---[ end trace c6d75c4b75b03c34 ]---
[ 126.884522]
[ 128.884638] Kernel panic - not syncing: Fatal exception

(In reply to xianwang from comment #14)
...
> [ 126.879381] NIP [c00000000005acec] tm_save_sprs+0x0/0x1c

The problem is related to the TM hardware bug.
Transactional Memory must be disabled on the P8 side to allow migration to a P9 host.
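The link between the oops and Transactional Memory can be checked from the instruction dump: the word in angle brackets, <7c0022a6>, is the instruction at NIP (tm_save_sprs+0x0), and "sig: 4" with TRAP 0700 indicates it was rejected as an illegal instruction. The sketch below decodes that word by hand. It assumes the standard Power ISA mfspr encoding (primary opcode 31, extended opcode 339, SPR number stored with its two 5-bit halves swapped) and the SPR numbers 128, 129 and 130 for TFHAR, TFIAR and TEXASR; these come from general knowledge of the ISA, not from this bug report, and the helper name decode_mfspr is hypothetical.

```python
# Assumed SPR numbers of the Transactional Memory registers (Power ISA 2.07).
TM_SPRS = {128: "TFHAR", 129: "TFIAR", 130: "TEXASR"}


def decode_mfspr(insn):
    """Return (rt, spr) if the 32-bit word is an mfspr instruction, else None."""
    opcode = insn >> 26          # primary opcode, top 6 bits
    xo = (insn >> 1) & 0x3FF     # extended opcode, 10 bits
    if opcode != 31 or xo != 339:
        return None
    rt = (insn >> 21) & 0x1F     # destination general-purpose register
    field = (insn >> 11) & 0x3FF
    # The SPR number is encoded with its low and high 5-bit halves swapped.
    spr = ((field & 0x1F) << 5) | (field >> 5)
    return rt, spr


# The faulting word and its neighbours from the "Instruction dump" lines.
for word in (0x7C0022A6, 0xF80304A8, 0x7C0222A6):
    decoded = decode_mfspr(word)
    if decoded is None:
        print(f"{word:08x}: not mfspr")
    else:
        rt, spr = decoded
        print(f"{word:08x}: mfspr r{rt}, SPR {spr} ({TM_SPRS.get(spr, 'other')})")
```

Under those assumptions the faulting instruction reads a TM register, which is consistent with the guest kernel still believing TM is available after landing on a host where it is not usable.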
TM problem is tracked by BZ 1523414.

The upstream patch series to fix it is:

spapr: Capabilities infrastructure
http://patchwork.ozlabs.org/patch/854868/
spapr: Treat Hardware Transactional Memory (HTM) as an optional capability
http://patchwork.ozlabs.org/patch/854865/
spapr: Validate capabilities on migration
http://patchwork.ozlabs.org/patch/854864/

Once this series is applied, a guest that is to be migrated from a P8 host to a P9 host must be started with:

... -machine cap-htm=false ...

Hi, Laurent,
I have tried with "-smp 1"; other information is the same as in comment19. After migration, the vm is running on the destination host, and after a while the vm hangs with an error message shown on console, hmp and qmp:

# nc -U /tmp/console0
Message from syslogd@localhost at Jan 12 01:41:38 ...
kernel:kvmppc_emulate_mmio: emulation failed (7c6020ce)

(qemu) info status
VM status: paused (internal-error)

Hi Xianxian,

(In reply to xianwang from comment #21)
> Hi, Laurent,
> I have tried with "-smp 1", other information is same with comment19, after
> migration, vm is running on destination host and after a while vm will hang
> with error message prompting on console,hmp and qmp:
>
> # nc -U /tmp/console0
> Message from syslogd@localhost at Jan 12 01:41:38 ...
> kernel:kvmppc_emulate_mmio: emulation failed (7c6020ce)
>
> (qemu) info status
> VM status: paused (internal-error)

This is another bug, please open a new BZ to track it.

This one is related to HTM and I think we can set it as a duplicate of BZ 1523414.

It is reproduced with 4.14.0-23.el7a.ppc64le and qemu-kvm-rhev-2.10.0-16.el7.ppc64le on beaker hosts (ibm-p9z-09.pnr.lab.eng.bos.redhat.com; ibm-p9b-07.pnr.lab.eng.bos.redhat.com).

Could you retest with qemu-kvm-rhev-2.10.0-18.el7 on both sides?

This release should disable HTM by default.

(In reply to Laurent Vivier from comment #24)
> Could you retest with qemu-kvm-rhev-2.10.0-18.el7 on both sides
>
> This release should disable HTM by default.
Hi, Laurent,
I re-tested the scenario from #comment7 with the newest version and didn't hit this issue any more.
version:
Host p8:
3.10.0-837.el7.ppc64le
qemu-kvm-rhev-2.10.0-18.el7.ppc64le
SLOF-20170724-5.git89f519f.el8.ppc64le

Host p9:
4.14.0-29.el7a.ppc64le
qemu-kvm-rhev-2.10.0-18.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch
# ppc64_cpu --smt=off
# echo N > /sys/module/kvm_hv/parameters/indep_threads_mode

Guest:
3.10.0-837.el7.ppc64le

the qemu cli is as follows:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox off \
-machine pseries-rhel7.5.0,max-cpu-compat=power8 \
-nodefaults \
-vga std \
-chardev socket,id=serial_id_serial0,path=/tmp/console0,server,nowait \
-device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 \
-device nec-usb-xhci,id=usb1,bus=pci.0,addr=06 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=03 \
-drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/xianwang/mount_point/rhel75-ppc64le-virtio-scsi.qcow2 \
-device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 \
-device virtio-net-pci,mac=9a:4f:50:51:52:53,id=id9HRc5V,vectors=4,netdev=idjlQN53,bus=pci.0,addr=11 \
-netdev tap,id=idjlQN53,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-m 8G \
-smp 8 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device usb-mouse,id=input1,bus=usb1.0,port=2 \
-device usb-kbd,id=input2,bus=usb1.0,port=3 \
-vnc :1 \
-incoming tcp:0:5801 \
-qmp tcp:0:8881,server,nowait \
-monitor stdio \
-rtc base=utc,clock=host \
-boot order=cdn,once=c,menu=on,strict=on \
-enable-kvm \

Do migration from p9->p8
result:
migration completed and vm works well.

(In reply to xianwang from comment #25)
> (In reply to Laurent Vivier from comment #24)
> > Could you retest with qemu-kvm-rhev-2.10.0-18.el7 on both sides
> >
> > This release should disable HTM by default.
...
> Do migration from p9->p8

The HTM problem is with p8 -> p9.

Could you re-test?

Thanks.

(In reply to xianwang from comment #25)
> (In reply to Laurent Vivier from comment #24)
> > Could you retest with qemu-kvm-rhev-2.10.0-18.el7 on both sides
> >
> > This release should disable HTM by default.
>
> Hi, Laurent,
> I re test the scenario with the newest version as #comment7 and I didn't hit
> this issue any more
...
> Do migration from p9->p8
>
Sorry, I made a mistake here; it should be "Do migration from p8->p9".
> result:
> migration completed and vm works well.

(In reply to Laurent Vivier from comment #26)
> (In reply to xianwang from comment #25)
> > (In reply to Laurent Vivier from comment #24)
> > > Could you retest with qemu-kvm-rhev-2.10.0-18.el7 on both sides
> > >
> > > This release should disable HTM by default.
> ...
> > Do migration from p9->p8
>
> The HTM problem is with p8 -> p9.
>
> Could you re-test?
>
> Thanks.

Yes, I have tried p8->p9; the result is a pass, just as in #comment27.

Excellent, looks like this is handled by the fix for bug 1523414, as expected.

*** This bug has been marked as a duplicate of bug 1523414 ***

I'm unable to reproduce this bug using the test scenario in Comment #0.
I ran the case in Comment0 twenty times and did not hit this bug.
Host kernel: 4.14.0-29.el7a.ppc64le
qemu-kvm: qemu-kvm-rhev-2.10.0-18.el7

I hit this error with the following versions on a Power9 host, with the guest in Power8-compat mode. It happened during guest installation, while doing ping-pong migration:
Host:
kernel-4.14.0-33.el7a.ppc64le
qemu-kvm-rhev-2.10.0-18.el7.ppc64le
Guest:
RHEL-7.5-20180125.0-Server-ppc64le-dvd1.iso
kernel-3.10.0-837.el7.ppc64le
QEMU command line:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox off \
-machine pseries-rhel7.5.0,max-cpu-compat=power8 \
-nodefaults \
-vga std \
-device spapr-pci-host-bridge,index=1 \
-device virtio-scsi-pci,bus=pci.1,id=scsi0,addr=0x3 \
-device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x05 \
-device pci-ohci,id=usb3,bus=pci.0,addr=0x06 \
-device pci-bridge,id=pci_bridge_1,bus=pci.0,addr=0xc,chassis_nr=1 \
-device pci-bridge,id=pci_bridge_2,bus=pci.0,addr=0xd,chassis_nr=2 \
-object iothread,id=iothread0 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci_bridge_1,iothread=iothread0,addr=0x07 \
-device virtio-scsi-pci,id=virtio_scsi_pci1,bus=pci.0,addr=0x08 \
-drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/micai/p9.qcow2 \
-device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 \
-drive id=drive_cd1,if=none,snapshot=off,aio=threads,cache=none,media=cdrom,file=/home/micai/RHEL-7.5-20180125.0-Server-ppc64le-dvd1.iso \
-device scsi-cd,id=cd1,drive=drive_cd1,bus=virtio_scsi_pci1.0,channel=0,scsi-id=0,lun=0,bootindex=1 \
-device virtio-net-pci,mac=9a:4f:50:51:52:53,id=id9HRc5V,netdev=idjlQN53,vectors=10,mq=on,status=on,bus=pci.0,addr=0xa \
-netdev tap,id=idjlQN53,vhost=on,queues=4,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-m 4G,slots=4,maxmem=32G \
-incoming tcp:0:5800 \
-smp 8,cores=4,threads=1,sockets=2 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device usb-mouse,id=input1,bus=usb1.0,port=2 \
-device usb-kbd,id=input2,bus=usb1.0,port=3 \
-vnc :2 \
-qmp tcp:0:8881,server,nowait \
-monitor stdio \
-rtc base=utc,clock=host \
-boot order=cdn,once=c,menu=on,strict=on \
-enable-kvm \
-watchdog i6300esb \
-watchdog-action reset \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xb \
steps:
1. # ppc64_cpu --smt=off
   # echo N > /sys/module/kvm_hv/parameters/indep_threads_mode
2. Start guest at src and enter the installation interface.
3. Start guest at des.
4. During the installation, perform the ping-pong migration.
the result:
[root@c155f3-u23 ~]# telnet 127.0.0.1 8881
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": "(qemu-kvm-rhev-2.10.0-18.el7)"}, "capabilities": []}}
{"execute":"qmp_capabilities"}
{"return": {}}
{"timestamp": {"seconds": 1517731476, "microseconds": 150918}, "event": "VNC_CONNECTED", "data": {"server": {"auth": "none", "family": "ipv4", "service": "5902", "host": "0.0.0.0", "websocket": false}, "client": {"family": "ipv4", "service": "35566", "host": "10.0.0.1", "websocket": false}}}
{"timestamp": {"seconds": 1517731477, "microseconds": 289988}, "event": "VNC_INITIALIZED", "data": {"server": {"auth": "none", "family": "ipv4", "service": "5902", "host": "0.0.0.0", "websocket": false}, "client": {"family": "ipv4", "service": "35566", "host": "10.0.0.1", "websocket": false}}}
{"timestamp": {"seconds": 1517731694, "microseconds": 757303}, "event": "RTC_CHANGE", "data": {"offset": -1}}
{"timestamp": {"seconds": 1517731701, "microseconds": 718556}, "event": "STOP"}
{"timestamp": {"seconds": 1517731733, "microseconds": 798718}, "event": "SHUTDOWN", "data": {"guest": false}}
Connection closed by foreign host.
[root@localhost micai]# sh xx.sh
QEMU 2.10.0 monitor - type 'help' for more information
[root@localhost micai]#
Ping-pong migration between P9 and P8 hosts with a P8 guest cannot be performed because of a bug in POWER9 hardware < DD2.2.
See BZ 1536009 comment 20.