Description of problem:
VM hangs after migration with 200 vCPUs.

Version-Release number of selected component (if applicable):
Host: hp-bl920gen8-01.khw.lab.eng.bos.redhat.com
# uname -r
3.10.0-693.5.2.el7.x86_64
# rpm -q qemu-kvm-rhev
qemu-kvm-rhev-2.10.0-13.el7.x86_64
# rpm -q seabios
seabios-1.11.0-1.el7.x86_64
Guest:
# uname -r
3.10.0-823.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot the VM with 200 vCPUs:
/usr/libexec/qemu-kvm \
    -S \
    -name 'RHEL7.5-1' \
    -machine q35,kernel-irqchip=split \
    -device intel-iommu,intremap=on,eim=on \
    -m 8G \
    -smp 200,maxcpus=384,sockets=2,cores=96,threads=2 \
    -cpu SandyBridge,enforce \
    -rtc base=localtime,clock=host,driftfix=slew \
    -nodefaults \
    -device AC97 \
    -vga qxl \
    -chardev socket,id=seabioslog_log,path=/tmp/seabios-log1,server,nowait \
    -device isa-debugcon,chardev=seabioslog_log,iobase=0x402 \
    -device usb-ehci,id=usb1 \
    -device usb-tablet,id=usb-tablet1 \
    -boot menu=on \
    -enable-kvm \
    -monitor stdio \
    -device pcie-root-port,id=root1,chassis=1 \
    -netdev tap,id=netdev0,vhost=on \
    -device virtio-net-pci,mac=BA:BC:13:83:4F:1D,id=net0,netdev=netdev0,status=on \
    -spice port=5900,disable-ticketing \
    -qmp tcp:0:9999,server,nowait \
    -drive file=/mnt/rhel75-seabios-virtio.qcow2,format=qcow2,id=drive_sysdisk,if=none,cache=none,aio=native,werror=stop,rerror=stop \
    -device virtio-blk-pci,drive=drive_sysdisk,id=device_sysdisk,bus=root1,bootindex=1 \
    -serial unix:/tmp/console1,server,nowait
2. Migrate on the same host:
(qemu) migrate -d tcp:127.0.0.1:1234

Actual results:
Migration completes, but the destination VM hangs with the following console output:
[  164.909287] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [kworker/u768:3:1431]
[  192.908390] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [kworker/u768:3:1431]
[  200.610440] INFO: rcu_sched self-detected stall on CPU
[  200.612466] INFO: rcu_sched detected stalls on CPUs/tasks:
[  224.904577] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [kworker/u768:3:1431]
[  252.896461] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [kworker/u768:3:1431]
[  280.889156] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [kworker/u768:3:1431]
[  308.887126] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [kworker/u768:3:1431]
[  336.885147] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [kworker/u768:3:1431]
[  364.884723] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [kworker/u768:3:1431]
[  380.591713] INFO: rcu_sched self-detected stall on CPU { 4}  (t=240005 jiffies g=3663 c=3662 q=4324)

Expected results:
Destination VM works well after migration.

Additional info:
1. It cannot be reproduced with the pc machine type (pc and SeaBIOS support up to 240 vCPUs).
2. It cannot be reproduced with fewer vCPUs.
3. It can also be reproduced with q35 and OVMF:
# rpm -q OVMF
OVMF-20171011-4.git92d07e48907f.el7.noarch
4. Installing RHEL 7.5 on the host "hp-bl920gen8-01.khw.lab.eng.bos.redhat.com" failed.
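When comparing runs it helps to condense the repeating watchdog spam into a per-CPU/per-task count. A small triage helper (a hypothetical sketch, not part of the reported test procedure; the regex covers both the RHEL 7 "NMI watchdog:" and RHEL 8 "watchdog:" prefixes seen in this bug):

```python
import re

# Matches kernel soft-lockup reports such as:
# "[  164.909287] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [kworker/u768:3:1431]"
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?:NMI )?watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! \[(?P<task>[^\]]+)\]"
)

def summarize_lockups(console_log: str) -> dict:
    """Return {(cpu, task): report_count} for every soft-lockup line in the log."""
    counts = {}
    for m in LOCKUP_RE.finditer(console_log):
        key = (int(m.group("cpu")), m.group("task"))
        counts[key] = counts.get(key, 0) + 1
    return counts

if __name__ == "__main__":
    sample = (
        "[  164.909287] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [kworker/u768:3:1431]\n"
        "[  192.908390] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [kworker/u768:3:1431]\n"
    )
    print(summarize_lockups(sample))
```

For this bug the summary collapses to a single entry (CPU#4, kworker/u768:3:1431), which is what suggests one stuck workqueue worker rather than a system-wide stall.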
Additional info:
Host: hp-bl920gen8-01.khw.lab.eng.bos.redhat.com
# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                480
On-line CPU(s) list:   0-479
Thread(s) per core:    2
Core(s) per socket:    15
Socket(s):             16
NUMA node(s):          16
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Model name:            Intel(R) Xeon(R) CPU E7-2890 v2 @ 2.80GHz
Stepping:              7
CPU MHz:               3011.640
CPU max MHz:           3400.0000
CPU min MHz:           1200.0000
BogoMIPS:              5587.14
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              38400K
NUMA node0 CPU(s):     0-14,240-254
NUMA node1 CPU(s):     15-29,255-269
NUMA node2 CPU(s):     30-44,270-284
NUMA node3 CPU(s):     45-59,285-299
NUMA node4 CPU(s):     60-74,300-314
NUMA node5 CPU(s):     75-89,315-329
NUMA node6 CPU(s):     90-104,330-344
NUMA node7 CPU(s):     105-119,345-359
NUMA node8 CPU(s):     120-134,360-374
NUMA node9 CPU(s):     135-149,375-389
NUMA node10 CPU(s):    150-164,390-404
NUMA node11 CPU(s):    165-179,405-419
NUMA node12 CPU(s):    180-194,420-434
NUMA node13 CPU(s):    195-209,435-449
NUMA node14 CPU(s):    210-224,450-464
NUMA node15 CPU(s):    225-239,465-479
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts
# free -h
              total        used        free      shared  buff/cache   available
Mem:            11T         73G         11T         26M         75G         11T
Swap:          4.0G          0B        4.0G
Only reproduces with q35, which is a tech preview on RHEL 7; moving to RHEL 8.
Virt QE, does this reproduce on RHEL 8.0? Thanks.
This bug can also be reproduced on a RHEL 8 host.

host version:
qemu-kvm-core-3.1.0-2.module+el8+2606+2c716ad7.x86_64
kernel-4.18.0-57.el8.x86_64
seabios-1.11.1-3.module+el8+2603+0a5231c4.x86_64

test steps:
1. Boot the VM with 200 vCPUs.
2. Migrate on the same host:
(qemu) migrate -d tcp:10.16.184.212:1234

Actual results:
Migration completes, but the destination VM hangs with the following console output:
[  228.529493] watchdog: BUG: soft lockup - CPU#88 stuck for 22s! [kworker/u768:4:1825]
[  228.545497] Modules linked in: uinput fuse xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nft_counter tun bridge stp llc devlink nf_tables_set nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 nft_chain_route_ipv6 nft_chain_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nft_chain_route_ipv4 nf_conntrack ip6_tables ip_tables nft_compat ip_set nf_tables nfnetlink sunrpc iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul snd_intel8x0 ghash_clmulni_intel snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm i2c_i801 pcspkr snd_timer joydev lpc_ich snd soundcore xfs libcrc32c qxl drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops ttm libahci
[  228.556145]  crc32c_intel virtio_net drm libata net_failover virtio_blk serio_raw failover dm_mirror dm_region_hash dm_log dm_mod
[  228.557922] CPU: 88 PID: 1825 Comm: kworker/u768:4 Kdump: loaded Not tainted 4.18.0-57.el8.x86_64 #1
[  228.559298] Hardware name: Red Hat KVM, BIOS 1.11.1-3.module+el8+2603+0a5231c4 04/01/2014
[  228.560549] Workqueue: writeback wb_workfn (flush-253:0)
[  228.561366] RIP: 0010:smp_call_function_single+0xce/0xf0
[  228.562177] Code: 8b 4c 24 38 65 48 33 0c 25 28 00 00 00 75 34 c9 c3 48 89 d1 48 89 f2 48 89 e6 e8 7d fe ff ff 8b 54 24 18 83 e2 01 74 0b f3 90 <8b> 54 24 18 83 e2 01 75 f5 eb ca 8b 05 91 59 8c 01 85 c0 75 88 0f
[  228.564993] RSP: 0018:ffff9d2dc537f820 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[  228.566138] RAX: 0000000000000000 RBX: ffff91da5a94d4c8 RCX: 0000000000000830
[  228.567222] RDX: 0000000000000001 RSI: 00000000000008fb RDI: 0000000000000830
[  228.568304] RBP: ffff9d2dc537f870 R08: 00000000000000c6 R09: 0000000000000040
[  228.569387] R10: ffff91da37d01918 R11: 0000000000000000 R12: 0000000000000058
[  228.570472] R13: 00007f61e5b78000 R14: ffff91da817e9bb8 R15: 00007f61e5b77000
[  228.571551] FS:  0000000000000000(0000) GS:ffff91dab3200000(0000) knlGS:0000000000000000
[  228.572775] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  228.573650] CR2: 00000000ffffffff CR3: 00000001ed40c000 CR4: 00000000000406e0
[  228.574737] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  228.575823] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  228.576911] Call Trace:
[  228.577304]  ? flush_tlb_func_common.constprop.8+0x200/0x200
[  228.578173]  flush_tlb_mm_range+0xd0/0x120
[  228.578807]  ? vring_map_single.constprop.23+0x1b/0xc0
[  228.579600]  ptep_clear_flush+0x4e/0x60
[  228.580199]  page_mkclean_one+0xd7/0x190
[  228.580807]  rmap_walk_file+0xf7/0x260
[  228.581388]  page_mkclean+0xa4/0xc0
[  228.581934]  ? invalid_page_referenced_vma+0x80/0x80
[  228.582689]  ? pmdp_collapse_flush+0x10/0x10
[  228.583353]  clear_page_dirty_for_io+0xa0/0x290
[  228.584057]  write_cache_pages+0x192/0x450
[  228.584741]  ? xfs_vm_releasepage+0x80/0x80 [xfs]
[  228.585467]  ? kvm_sched_clock_read+0x1a/0x30
[  228.586145]  ? sched_clock+0x5/0x10
[  228.586684]  ? sched_clock_cpu+0xc/0xb0
[  228.587281]  ? update_rq_clock+0xef/0x120
[  228.587903]  ? update_blocked_averages+0x105/0x490
[  228.588635]  ? __update_load_avg_cfs_rq.isra.39+0x194/0x1a0
[  228.589513]  xfs_vm_writepages+0x64/0xa0 [xfs]
[  228.590200]  do_writepages+0x41/0xd0
[  228.590752]  __writeback_single_inode+0x3d/0x360
[  228.591464]  writeback_sb_inodes+0x1e3/0x450
[  228.592129]  __writeback_inodes_wb+0x5d/0xb0
[  228.592784]  wb_writeback+0x25f/0x2f0
[  228.593354]  ? cpumask_next+0x17/0x20
[  228.593925]  wb_workfn+0x342/0x400
[  228.594450]  ? __switch_to+0x8c/0x480
[  228.595023]  process_one_work+0x1a7/0x360
[  228.595639]  worker_thread+0x30/0x390
[  228.596208]  ? pwq_unbound_release_workfn+0xd0/0xd0
[  228.597637]  kthread+0x112/0x130
[  228.598494]  ? kthread_bind+0x30/0x30
[  228.599064]  ret_from_fork+0x35/0x40

Additional info:
1) Source command line:
/usr/libexec/qemu-kvm \
    -S \
    -name 'RHEL8' \
    -machine q35,kernel-irqchip=split \
    -device intel-iommu,intremap=on,eim=on \
    -m 8G \
    -smp 200,maxcpus=384,sockets=2,cores=96,threads=2 \
    -cpu SandyBridge,enforce \
    -rtc base=localtime,clock=host,driftfix=slew \
    -nodefaults \
    -device AC97 \
    -vga qxl \
    -chardev socket,id=seabioslog_log,path=/tmp/seabios-log1,server,nowait \
    -device isa-debugcon,chardev=seabioslog_log,iobase=0x402 \
    -device usb-ehci,id=usb1 \
    -device usb-tablet,id=usb-tablet1 \
    -boot menu=on \
    -enable-kvm \
    -monitor stdio \
    -device pcie-root-port,id=root1,chassis=1 \
    -netdev tap,id=netdev0,vhost=on \
    -device virtio-net-pci,mac=BA:BC:13:83:4F:1D,id=net0,netdev=netdev0,status=on \
    -spice port=5900,disable-ticketing \
    -qmp tcp:0:9999,server,nowait \
    -drive file=/mnt/rhel8-seabios.qcow2,format=qcow2,id=drive_sysdisk,if=none,cache=none,aio=native,werror=stop,rerror=stop \
    -device virtio-blk-pci,drive=drive_sysdisk,id=device_sysdisk,bus=root1,bootindex=1 \
    -serial unix:/tmp/console1,server,nowait
Destination command line:
/usr/libexec/qemu-kvm \
    -name 'RHEL8-dst' \
    -machine q35,kernel-irqchip=split \
    -device intel-iommu,intremap=on,eim=on \
    -m 8G \
    -smp 200,maxcpus=384,sockets=2,cores=96,threads=2 \
    -cpu SandyBridge,enforce \
    -rtc base=localtime,clock=host,driftfix=slew \
    -nodefaults \
    -device AC97 \
    -vga qxl \
    -chardev socket,id=seabioslog_log,path=/tmp/seabios-log1,server,nowait \
    -device isa-debugcon,chardev=seabioslog_log,iobase=0x402 \
    -device usb-ehci,id=usb1 \
    -device usb-tablet,id=usb-tablet1 \
    -boot menu=on \
    -enable-kvm \
    -monitor stdio \
    -device pcie-root-port,id=root1,chassis=1 \
    -netdev tap,id=netdev0,vhost=on \
    -device virtio-net-pci,mac=BA:BC:13:83:4F:1D,id=net0,netdev=netdev0,status=on \
    -spice port=5901,disable-ticketing \
    -qmp tcp:0:9998,server,nowait \
    -drive file=/mnt/rhel8-seabios.qcow2,format=qcow2,id=drive_sysdisk,if=none,cache=none,aio=native,werror=stop,rerror=stop \
    -device virtio-blk-pci,drive=drive_sysdisk,id=device_sysdisk,bus=root1,bootindex=1 \
    -serial unix:/tmp/console2,server,nowait \
    -incoming tcp:0:1234
2) host name: lenovo-electron-sr850-01.khw.lab.eng.bos.redhat.com
# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                224
On-line CPU(s) list:   0-223
Thread(s) per core:    2
Core(s) per socket:    28
Socket(s):             4
NUMA node(s):          4
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Platinum 8176 CPU @ 2.10GHz
Stepping:              4
CPU MHz:               1000.014
BogoMIPS:              4200.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              39424K
NUMA node0 CPU(s):     0-27,112-139
NUMA node1 CPU(s):     28-55,140-167
NUMA node2 CPU(s):     56-83,168-195
NUMA node3 CPU(s):     84-111,196-223
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm arat pln pts hwp_epp pku ospke flush_l1d
This also reproduces on RHEL 8; please check comment 7.
(In reply to jingzhao from comment #8)
> Also can reproduce it on rhel8, please check comment7

Looks like qemu-kvm-3.1.0 (fast train) was tested. Is this also a problem with Virt:rhel with qemu-kvm-2.12.0?
(In reply to Karen Noel from comment #9)
> (In reply to jingzhao from comment #8)
> > Also can reproduce it on rhel8, please check comment7
>
> Looks like qemu-kvm-3.1.0 (fast train) was tested. Is this also a problem
> with Virt:rhel with qemu-kvm-2.12.0?

I can reproduce this bug with qemu-kvm-2.12.0 as well.

reproduce version:
qemu-kvm-2.12.0-51.module+el8+2608+a17c4bfe.x86_64
kernel-4.18.0-57.el8.x86_64
seabios-1.11.1-3.module+el8+2529+a9686a4d.x86_64
Guest: rhel8 (kernel-4.18.0-57.el8.x86_64)
The issue is reproducible with RHEL 8.1 and QEMU 4.1, so most likely it is still present upstream.

Interestingly enough, it is reproducible with
  -smp 200,maxcpus=288,sockets=2,cores=72,threads=2
(full command line I was using:
'-name fedora -nodefaults -smp 200,maxcpus=288,sockets=2,cores=72,threads=2 -machine q35,accel=kvm -device intel-iommu,intremap=on,eim=on -cpu SandyBridge -drive file=/var/lib/libvirt/images/fedora26_vl1006.qcow2,if=none,id=drive-ide0-0-0,format=qcow2,cache=none -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -vnc :0 -vga std -m 8G -boot menu=on -net nic,model=e1000e -net bridge,br=br0 -monitor stdio')
but not reproducible with plain '-smp 200'.

My first guess is that it is related to interrupt remapping in QEMU, but honestly I don't know much about it.

Cc: David, the main migration expert. David, do you know how to check that the 'intel-iommu' device is migrating properly?
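One low-tech way to look at device state in flight is to save the migration stream to a file on the source ('migrate "exec:cat > /tmp/mig.bin"') and dump it with QEMU's scripts/analyze-migration.py (e.g. '-f /tmp/mig.bin -d state'), which prints the per-device sections as JSON. A sketch of filtering such a dump for the intel-iommu section; the sample dump below is hypothetical, the real section names depend on the QEMU version:

```python
def find_sections(dump: dict, needle: str) -> list:
    """Return the names of device sections in an analyze-migration.py-style
    JSON dump whose key contains `needle` (e.g. 'intel-iommu').
    An empty result would mean the device state is absent from the stream."""
    return [name for name in dump if needle in name]

if __name__ == "__main__":
    # Hypothetical excerpt of a parsed dump; keys are illustrative only.
    sample_dump = {
        "0000:00:05.0/intel-iommu": {"version": 1},
        "0000:00:03.0/virtio-net": {"version": 11},
    }
    print(find_sections(sample_dump, "intel-iommu"))
```

This only tells you whether the section is present; verifying the *contents* migrate correctly still means comparing the dumped fields against the source device state.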
*** Bug 1485208 has been marked as a duplicate of this bug. ***
Bleh; hangs like that are horrible, and I don't know the IOMMU. Adding Peter, because he knows intel_iommu. Given the backtrace, I'm assuming the problem is the loss of an IPI somewhere.
Sorry to respond late. Yes, this seems to be IR related, maybe something with the APIC ID? Though I can't tell without further investigation. Vitaly, please feel free to assign it to me if you like.
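For context on the APIC ID angle: the reproducer's topology pushes APIC IDs past 8 bits, which is exactly why the command line needs intremap=on,eim=on (x2APIC destinations). A back-of-the-envelope sketch of the usual x86 APIC ID packing (contiguous bit fields, threads in the low bits) for '-smp 200,maxcpus=384,sockets=2,cores=96,threads=2'; treat the exact values as illustrative, not as QEMU's authoritative numbering:

```python
import math

def field_bits(n: int) -> int:
    """Bits needed to encode n IDs (ceil(log2(n)); 0 for a single ID)."""
    return math.ceil(math.log2(n)) if n > 1 else 0

def apic_id(socket: int, core: int, thread: int, cores: int, threads: int) -> int:
    """Pack (socket, core, thread) into an APIC ID the usual x86 way:
    thread in the lowest bits, then core, then socket."""
    tbits = field_bits(threads)
    cbits = field_bits(cores)
    return (socket << (cbits + tbits)) | (core << tbits) | thread

if __name__ == "__main__":
    # -smp 200,maxcpus=384,sockets=2,cores=96,threads=2
    sockets, cores, threads = 2, 96, 2
    highest = apic_id(sockets - 1, cores - 1, threads - 1, cores, threads)
    print(highest)         # highest APIC ID in this topology
    print(highest > 255)   # True -> IDs don't fit in 8 bits, x2APIC/EIM required
```

With 96 cores needing 7 bits and 2 threads needing 1, the second socket starts at APIC ID 256, so any vCPU there is unreachable through a plain 8-bit xAPIC destination; an IR/APIC ID bug on that path would look exactly like lost IPIs on high CPUs.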
(In reply to Peter Xu from comment #16)
> Sorry to respond late. Yes this seems to be IR related, maybe something on
> APIC ID? Though I can't tell if without further investigation. Vitaly,
> please feel free to assign it to me if you like.

Thank you for your suggestion, Peter. I'm going to follow it and reassign to you in the hope that your backlog is shorter than mine :-) Feel free to reassign back, especially in case it turns out to be a KVM and not a QEMU issue.
It turns out to be a userspace APIC bug. Fix posted upstream for initial review:
https://patchwork.ozlabs.org/project/qemu-devel/list/?series=136156
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks
Reproduced with host versions:
qemu-kvm-4.2.0-8.module+el8.2.0+5607+dc756904.x86_64
kernel-4.18.0-176.el8.x86_64
seabios-1.13.0-1.module+el8.2.0+5520+4e5817f3.x86_64
guest: rhel8.2.0

test steps:
1. Boot the VM with 200 vCPUs.
2. Migrate on the same host:
(qemu) migrate -d tcp:10.73.2.18:1234

test results:
Migration completed, but the destination VM hangs with the following console output:
[  388.798188] watchdog: BUG: soft lockup - CPU#69 stuck for 23s! [kworker/69:0:426]
...
[  416.798169] watchdog: BUG: soft lockup - CPU#69 stuck for 23s! [kworker/69:0:426]

Verified the bug with qemu-kvm-4.2.0-9.module+el8.2.0+5699+b5331ee5.x86_64 using the same test steps.
test results:
Destination VM works well after migration.

Additional info:
1) host name: lenovo-sr950-01.lab.eng.pek2.redhat.com
2) Boot the VM with:
/usr/libexec/qemu-kvm \
    -name 'RHEL8' \
    -machine q35,kernel-irqchip=split \
    -device intel-iommu,intremap=on,eim=on \
    -m 8G \
    -smp 200,maxcpus=384,sockets=2,cores=96,threads=2 \
    -cpu SandyBridge,enforce \
    -rtc base=localtime,clock=host,driftfix=slew \
    -nodefaults \
    -device AC97 \
    -vga qxl \
    -chardev socket,id=seabioslog_log,path=/tmp/seabios-log1,server,nowait \
    -device isa-debugcon,chardev=seabioslog_log,iobase=0x402 \
    -device usb-ehci,id=usb1 \
    -device usb-tablet,id=usb-tablet1 \
    -boot menu=on \
    -enable-kvm \
    -monitor stdio \
    -device pcie-root-port,id=root1,slot=2,chassis=1,bus=pcie.0 \
    -netdev tap,id=netdev0,vhost=on \
    -device virtio-net-pci,mac=BA:BC:13:83:4F:1D,id=net0,netdev=netdev0,status=on,bus=root1 \
    -spice port=5900,disable-ticketing \
    -qmp tcp:0:9997,server,nowait \
    -device pcie-root-port,id=root2,slot=2,chassis=2,bus=pcie.0 \
    -blockdev node-name=file_image1,driver=file,aio=threads,filename=/mnt/rhel820-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device virtio-blk-pci,drive=drive_image1,id=device_sysdisk,bus=root2,bootindex=1 \
    -serial unix:/tmp/console1,server,nowait
Additional info:
Verified the bug with q35 + OVMF as well.
host version:
qemu-kvm-4.2.0-9.module+el8.2.0+5699+b5331ee5.x86_64
kernel-4.18.0-176.el8.x86_64
edk2-ovmf-20190829git37eef91017ad-6.el8.noarch
guest: rhel8.2.0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2017