Bug 2053584

Summary: watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [cat:2843]
Product: Red Hat Enterprise Linux 9
Reporter: Li Xiaohui <xiaohli>
Component: qemu-kvm
Assignee: Igor Mammedov <imammedo>
qemu-kvm sub component: Live Migration
QA Contact: Li Xiaohui <xiaohli>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: high
CC: ailan, chayang, dgilbert, fjin, imammedo, jinzhao, leobras, mdean, meili, nanliu, peterx, pvlasin, quintela, virt-maint, ymankad
Version: 9.0
Keywords: Regression, Triaged
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-6.2.0-11.el9_0.2
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2065398 (view as bug list)
Environment:
Last Closed: 2022-05-17 12:25:28 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version: qemu-7.0
Embargoed:
Bug Depends On:
Bug Blocks: 2065398

Description Li Xiaohui 2022-02-11 14:40:55 UTC
Description of problem:
Hot plug a virtio-serial bus and a virtio-serial port on the source host, then migrate the guest. After migration, transferring data from the destination host to the guest via the serial port triggers a call trace in the guest:
watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [cat:2843]


Version-Release number of selected component (if applicable):
hosts info: kernel-5.14.0-55.el9.x86_64 & qemu-kvm-6.2.0-7.el9.x86_64
guest info: kernel-4.18.0-364.el8.x86_64


How reproducible:
3/3


Steps to Reproduce:
1. Boot a guest on the source host with a '-chardev' option (full QEMU command line in the Additional info section):
-chardev socket,path=/tmp/foo,server=on,wait=off,id=foo \
2. Hot plug a virtio-serial bus:
{"execute": "device_add", "arguments": {"driver": "virtio-serial-pci", "bus": "extra_root1", "id": "virtio-serial0", "max_ports": "31"}, "id": "zlSwE71g"}
3. Hot plug a virtio-serial port:
{"execute": "device_add", "arguments": {"driver": "virtserialport", "bus": "virtio-serial0.0", "chardev": "foo", "name": "com.redhat.rhevm.vdsm", "id": "port0"}, "id": "idBjtFYX"}
4. Boot the guest on the destination host with the virtio-serial bus and port present:
-chardev socket,path=/tmp/foo-dst,server=on,wait=off,id=foo \
-device virtio-serial-pci,id=virtio-serial0,max_ports=31,bus=extra_root1 \
-device virtserialport,chardev=foo,id=port0,name=com.redhat.rhevm.vdsm,bus=virtio-serial0.0 \
5. Migrate the VM from the source to the destination host.
6. Transfer data from the destination host to the VM through the hotplugged port:
6.1 On the destination host:
# dd if=/dev/zero of=/home/host-file bs=1M count=1024
# cat /home/host-file | nc -U /tmp/foo-dst
6.2 In the VM:
[root@guest ~]# ls /dev/vport*
/dev/vport3p1
[root@guest ~]# cat /dev/vport3p1 > /home/guest-file
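The hotplug commands in steps 2 and 3 can also be scripted against the QMP socket; a minimal sketch of building the wire-format commands (the `qmp_cmd` helper is illustrative, not part of the original report — note QMP requires a `qmp_capabilities` negotiation before any other command is accepted):

```python
import json

def qmp_cmd(execute, arguments=None, cmd_id=None):
    """Build one newline-terminated QMP command, as the protocol expects."""
    cmd = {"execute": execute}
    if arguments is not None:
        cmd["arguments"] = arguments
    if cmd_id is not None:
        cmd["id"] = cmd_id
    return json.dumps(cmd) + "\n"

# Capability negotiation must be sent first on a fresh QMP connection.
negotiate = qmp_cmd("qmp_capabilities")

# The two device_add commands from steps 2 and 3.
plug_bus = qmp_cmd("device_add",
                   {"driver": "virtio-serial-pci", "bus": "extra_root1",
                    "id": "virtio-serial0", "max_ports": "31"},
                   cmd_id="zlSwE71g")
plug_port = qmp_cmd("device_add",
                    {"driver": "virtserialport", "bus": "virtio-serial0.0",
                     "chardev": "foo", "name": "com.redhat.rhevm.vdsm",
                     "id": "port0"},
                    cmd_id="idBjtFYX")
```

These strings can then be written to the QMP socket (for example the `-qmp tcp:0:3333` endpoint from the command line in the Additional info section).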


Actual results:
During step 6.2, the guest hits a call trace and watchdog warnings from the 'cat' command. The guest itself keeps running; only data transfer through the serial port fails:
[  132.046596] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [cat:2843]
[  132.047212] Modules linked in: virtio_console fuse xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nft_compat nf_nat_tftp nft_objref nf_conntrack_tftp nft_counter bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_tables_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink intel_rapl_msr intel_rapl_common nfit libnvdimm kvm_intel kvm iTCO_wdt iTCO_vendor_support irqbypass i2c_i801 pcspkr joydev virtio_balloon lpc_ich xfs libcrc32c nvme_tcp(X) nvme_fabrics nvme nvme_core sd_mod t10_pi sg bochs_drm drm_vram_helper drm_ttm_helper ttm crct10dif_pclmul drm_kms_helper crc32_pclmul crc32c_intel ahci libahci libata syscopyarea sysfillrect sysimgblt fb_sys_fops ghash_clmulni_intel drm virtio_net serio_raw net_failover virtio_scsi failover dm_multipath sunrpc dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls libcxgbi libcxgb qla4xxx
[  132.047274]  iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_devintf ipmi_msghandler
[  132.055122] CPU: 3 PID: 2843 Comm: cat Kdump: loaded Tainted: G               X --------- -  - 4.18.0-364.el8.x86_64 #1
[  132.056013] Hardware name: Red Hat KVM/RHEL, BIOS 1.15.0-1.el9 04/01/2014
[  132.056579] RIP: 0010:virtqueue_get_buf+0x0/0x10
[  132.056971] Code: b8 8e 31 ed e8 61 aa f2 ff e9 e7 fe ff ff 29 d0 80 73 71 01 66 89 43 48 e9 cd fe ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 <0f> 1f 44 00 00 31 d2 e9 04 fe ff ff 0f 1f 40 00 0f 1f 44 00 00 80
[  132.058511] RSP: 0018:ff84c6c98123fbc8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[  132.059136] RAX: 0000000000000000 RBX: ff4c0e7b09ce27c8 RCX: 0000000000000001
[  132.059727] RDX: 0000000000000000 RSI: ff84c6c98123fbd4 RDI: ff4c0e7b0d727b40
[  132.060324] RBP: ff4c0e7b0d727b40 R08: 0000000000000000 R09: ff4c0e7b09ce27c8
[  132.060908] R10: 0000000109ce2000 R11: 0000000000000000 R12: ff4c0e7b09ce27a8
[  132.061498] R13: 0000000000000001 R14: 0000000000000006 R15: 0000000000000001
[  132.062097] FS:  00007f81b63fa680(0000) GS:ff4c0e7b7bb80000(0000) knlGS:0000000000000000
[  132.062761] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  132.063243] CR2: 00007f81b5f1f710 CR3: 0000000102932005 CR4: 0000000000771ee0
[  132.063835] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  132.064437] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[  132.065028] PKRU: 55555554
[  132.065259] Call Trace:
[  132.065472]  __send_control_msg.part.23+0xd8/0xf0 [virtio_console]
[  132.065986]  send_control_msg.isra.24+0x26/0x30 [virtio_console]
[  132.066485]  port_fops_open+0x1c0/0x1f0 [virtio_console]
[  132.066931]  chrdev_open+0xcb/0x1e0
[  132.067237]  ? cdev_default_release+0x20/0x20
[  132.067607]  do_dentry_open+0x132/0x340
[  132.067927]  path_openat+0x53e/0x14f0
[  132.068244]  do_filp_open+0x93/0x100
[  132.068552]  ? getname_flags+0x4a/0x1e0
[  132.068877]  ? __check_object_size+0xa8/0x16b
[  132.069239]  do_sys_open+0x184/0x220
[  132.069546]  do_syscall_64+0x5b/0x1a0
[  132.069859]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[  132.070285] RIP: 0033:0x7f81b5f29272
[  132.070584] Code: 25 00 00 41 00 3d 00 00 41 00 74 4c 48 8d 05 35 43 2a 00 8b 00 85 c0 75 6d 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 0f 87 a2 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
[  132.072122] RSP: 002b:00007fff37b23970 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[  132.072740] RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f81b5f29272
[  132.073334] RDX: 0000000000000000 RSI: 00007fff37b24c89 RDI: 00000000ffffff9c
[  132.073920] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[  132.074508] R10: 0000000000000000 R11: 0000000000000246 R12: 000056381dc29790
[  132.075096] R13: 00007fff37b23c30 R14: 0000000000000000 R15: 0000000000020000


Message from syslogd@localhost at Feb  8 17:39:59 ...

 kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [cat:2843]

[  159.980980] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [cat:2843]
[  159.981501] Modules linked in: virtio_console fuse xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nft_compat nf_nat_tftp nft_objref nf_conntrack_tftp nft_counter bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_tables_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink intel_rapl_msr intel_rapl_common nfit libnvdimm kvm_intel kvm iTCO_wdt iTCO_vendor_support irqbypass i2c_i801 pcspkr joydev virtio_balloon lpc_ich xfs libcrc32c nvme_tcp(X) nvme_fabrics nvme nvme_core sd_mod t10_pi sg bochs_drm drm_vram_helper drm_ttm_helper ttm crct10dif_pclmul drm_kms_helper crc32_pclmul crc32c_intel ahci libahci libata syscopyarea sysfillrect sysimgblt fb_sys_fops ghash_clmulni_intel drm virtio_net serio_raw net_failover virtio_scsi failover dm_multipath sunrpc dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls libcxgbi libcxgb qla4xxx
[  159.981571]  iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_devintf ipmi_msghandler
[  159.989418] CPU: 3 PID: 2843 Comm: cat Kdump: loaded Tainted: G             L X --------- -  - 4.18.0-364.el8.x86_64 #1
[  159.990306] Hardware name: Red Hat KVM/RHEL, BIOS 1.15.0-1.el9 04/01/2014
[  159.990867] RIP: 0010:__send_control_msg.part.23+0xcb/0xf0 [virtio_console]
[  159.991450] Code: 00 75 3b 48 83 c4 30 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 89 ef e8 15 9b 8d cc eb 0e 48 89 ef e8 9b 93 8d cc 84 c0 75 c1 f3 90 <48> 8d 74 24 04 48 89 ef e8 08 a0 8d cc 48 85 c0 74 e0 eb ab e8 fc
[  159.992989] RSP: 0018:ff84c6c98123fbd0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[  159.993611] RAX: 0000000000000000 RBX: ff4c0e7b09ce27c8 RCX: 0000000000000001
[  159.994201] RDX: 0000000000000000 RSI: ff84c6c98123fbd4 RDI: ff4c0e7b0d727b40
[  159.994785] RBP: ff4c0e7b0d727b40 R08: 0000000000000000 R09: ff4c0e7b09ce27c8
[  159.995372] R10: 0000000109ce2000 R11: 0000000000000000 R12: ff4c0e7b09ce27a8
[  159.995957] R13: 0000000000000001 R14: 0000000000000006 R15: 0000000000000001
[  159.996542] FS:  00007f81b63fa680(0000) GS:ff4c0e7b7bb80000(0000) knlGS:0000000000000000
[  159.997215] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  159.997691] CR2: 00007f81b5f1f710 CR3: 0000000102932005 CR4: 0000000000771ee0
[  159.998283] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  159.998879] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[  159.999468] PKRU: 55555554
[  159.999695] Call Trace:
[  159.999911]  send_control_msg.isra.24+0x26/0x30 [virtio_console]
[  160.000411]  port_fops_open+0x1c0/0x1f0 [virtio_console]
[  160.000857]  chrdev_open+0xcb/0x1e0
[  160.001166]  ? cdev_default_release+0x20/0x20
[  160.001525]  do_dentry_open+0x132/0x340
[  160.001850]  path_openat+0x53e/0x14f0
[  160.002169]  do_filp_open+0x93/0x100
[  160.002472]  ? getname_flags+0x4a/0x1e0
[  160.002789]  ? __check_object_size+0xa8/0x16b
[  160.003157]  do_sys_open+0x184/0x220
[  160.003466]  do_syscall_64+0x5b/0x1a0
[  160.003773]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[  160.004206] RIP: 0033:0x7f81b5f29272
[  160.004514] Code: 25 00 00 41 00 3d 00 00 41 00 74 4c 48 8d 05 35 43 2a 00 8b 00 85 c0 75 6d 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 0f 87 a2 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
[  160.006054] RSP: 002b:00007fff37b23970 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[  160.006680] RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f81b5f29272
[  160.007272] RDX: 0000000000000000 RSI: 00007fff37b24c89 RDI: 00000000ffffff9c
[  160.007857] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[  160.008458] R10: 0000000000000000 R11: 0000000000000246 R12: 000056381dc29790
[  160.009052] R13: 00007fff37b23c30 R14: 0000000000000000 R15: 0000000000020000


Message from syslogd@localhost at Feb  8 17:40:27 ...

 kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [cat:2843]

[  167.632140] rcu: INFO: rcu_sched self-detected stall on CPU
[  167.632627] rcu: 	3-....: (59942 ticks this GP) idle=94a/1/0x4000000000000002 softirq=9923/9923 fqs=14983 
[  167.633426] 	(t=60000 jiffies g=17169 q=3483)
[  167.633802] NMI backtrace for cpu 3
[  167.634099] CPU: 3 PID: 2843 Comm: cat Kdump: loaded Tainted: G             L X --------- -  - 4.18.0-364.el8.x86_64 #1
[  167.634985] Hardware name: Red Hat KVM/RHEL, BIOS 1.15.0-1.el9 04/01/2014
[  167.635551] Call Trace:
[  167.635762]  <IRQ>
[  167.635945]  dump_stack+0x41/0x60
[  167.636225]  nmi_cpu_backtrace.cold.8+0x13/0x4f
[  167.636606]  ? lapic_can_unplug_cpu.cold.30+0x37/0x37
[  167.637039]  nmi_trigger_cpumask_backtrace+0xde/0xe0
[  167.637456]  rcu_dump_cpu_stacks+0xc8/0xf8
[  167.637806]  rcu_sched_clock_irq.cold.99+0x116/0x39d
[  167.638215]  ? timekeeping_advance+0x372/0x5a0
[  167.638592]  ? irqtime_account_process_tick.isra.15+0xd9/0x170
[  167.639080]  ? tick_sched_do_timer+0x50/0x50
[  167.639443]  update_process_times+0x55/0x80
[  167.639802]  tick_sched_handle+0x22/0x60
[  167.640135]  tick_sched_timer+0x37/0x70
[  167.640453]  __hrtimer_run_queues+0x100/0x280
[  167.640815]  hrtimer_interrupt+0x100/0x220
[  167.641165]  smp_apic_timer_interrupt+0x6a/0x130
[  167.641554]  apic_timer_interrupt+0xf/0x20
[  167.641901]  </IRQ>
[  167.642082] RIP: 0010:virtqueue_get_buf+0x0/0x10
[  167.642475] Code: b8 8e 31 ed e8 61 aa f2 ff e9 e7 fe ff ff 29 d0 80 73 71 01 66 89 43 48 e9 cd fe ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 <0f> 1f 44 00 00 31 d2 e9 04 fe ff ff 0f 1f 40 00 0f 1f 44 00 00 80
[  167.643998] RSP: 0018:ff84c6c98123fbc8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[  167.644631] RAX: 0000000000000000 RBX: ff4c0e7b09ce27c8 RCX: 0000000000000001
[  167.645220] RDX: 0000000000000000 RSI: ff84c6c98123fbd4 RDI: ff4c0e7b0d727b40
[  167.645806] RBP: ff4c0e7b0d727b40 R08: 0000000000000000 R09: ff4c0e7b09ce27c8
[  167.646404] R10: 0000000109ce2000 R11: 0000000000000000 R12: ff4c0e7b09ce27a8
[  167.646987] R13: 0000000000000001 R14: 0000000000000006 R15: 0000000000000001
[  167.647574]  __send_control_msg.part.23+0xd8/0xf0 [virtio_console]
[  167.648088]  send_control_msg.isra.24+0x26/0x30 [virtio_console]
[  167.648587]  port_fops_open+0x1c0/0x1f0 [virtio_console]
[  167.649032]  chrdev_open+0xcb/0x1e0
[  167.649323]  ? cdev_default_release+0x20/0x20
[  167.649692]  do_dentry_open+0x132/0x340
[  167.650019]  path_openat+0x53e/0x14f0
[  167.650324]  do_filp_open+0x93/0x100
[  167.650624]  ? getname_flags+0x4a/0x1e0
[  167.650954]  ? __check_object_size+0xa8/0x16b
[  167.651325]  do_sys_open+0x184/0x220
[  167.651623]  do_syscall_64+0x5b/0x1a0
[  167.651939]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[  167.652359] RIP: 0033:0x7f81b5f29272
[  167.652666] Code: 25 00 00 41 00 3d 00 00 41 00 74 4c 48 8d 05 35 43 2a 00 8b 00 85 c0 75 6d 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 0f 87 a2 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
[  167.654200] RSP: 002b:00007fff37b23970 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[  167.654824] RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f81b5f29272
[  167.655424] RDX: 0000000000000000 RSI: 00007fff37b24c89 RDI: 00000000ffffff9c
[  167.656009] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[  167.656597] R10: 0000000000000000 R11: 0000000000000246 R12: 000056381dc29790
[  167.657195] R13: 00007fff37b23c30 R14: 0000000000000000 R15: 0000000000020000


Expected results:
Transfer data successfully after migration.


Additional info:
1. Data transfer on the destination host succeeds after migration if the VM is booted with the virtio-serial bus and port present from the start (no hotplug) on the source host.
2. Before migration, data transfer from the source host to the VM succeeds with the hotplugged serial bus and port.

Qemu command line:
/usr/libexec/qemu-kvm  \
-name "mouse-vm" \
-sandbox on \
-machine q35,memory-backend=pc.ram \
-cpu Icelake-Server,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,avx512ifma=on,sha-ni=on,waitpkg=on,rdpid=on,cldemote=on,movdiri=on,movdir64b=on,fsrm=on,md-clear=on,stibp=on,arch-capabilities=on,avx-vnni=on,avx512-bf16=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on,hle=off,rtm=off,mpx=off,intel-pt=off,kvm_pv_unhalt=on \
-nodefaults  \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1,server=on,wait=off \
-chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor,server=on,wait=off \
-chardev socket,path=/tmp/foo,server=on,wait=off,id=foo \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device pcie-root-port,port=0x10,chassis=1,id=root0,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=0x11,chassis=2,id=root1,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=0x12,chassis=3,id=root2,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=0x13,chassis=4,id=root3,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=0x14,chassis=5,id=root4,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=0x15,chassis=6,id=root5,bus=pcie.0,addr=0x2.0x5 \
-device pcie-root-port,port=0x16,chassis=7,id=root6,bus=pcie.0,addr=0x2.0x6 \
-device pcie-root-port,port=0x17,chassis=8,id=root7,bus=pcie.0,addr=0x2.0x7 \
-device pcie-root-port,port=0x20,chassis=21,id=extra_root0,bus=pcie.0,multifunction=on,addr=0x3 \
-device pcie-root-port,port=0x21,chassis=22,id=extra_root1,bus=pcie.0,addr=0x3.0x1 \
-device pcie-root-port,port=0x22,chassis=23,id=extra_root2,bus=pcie.0,addr=0x3.0x2 \
-device nec-usb-xhci,id=usb1,bus=root0,addr=0x0 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=root1,addr=0x0 \
-device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0,write-cache=on \
-device virtio-net-pci,mac=9a:8a:8b:8c:8d:8e,id=net0,netdev=tap0,bus=root2,addr=0x0 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device virtio-balloon-pci,id=balloon0,bus=root3,addr=0x0 \
-device VGA,id=video0,vgamem_mb=16,bus=pcie.0,addr=0x1 \
-blockdev driver=file,auto-read-only=on,discard=unmap,aio=threads,cache.direct=on,cache.no-flush=off,filename=/mnt/nfs/rhel860-64-virtio-scsi.qcow2,node-name=drive_sys1 \
-blockdev driver=qcow2,node-name=drive_image1,read-only=off,cache.direct=on,cache.no-flush=off,file=drive_sys1 \
-netdev tap,id=tap0,vhost=on \
-m 4096 \
-object memory-backend-ram,id=pc.ram,size=4096M \
-smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
-vnc :10 \
-rtc base=utc,clock=host \
-boot menu=off,strict=off,order=cdn,once=c \
-enable-kvm  \
-qmp tcp:0:3333,server=on,wait=off \
-qmp tcp:0:9999,server=on,wait=off \
-qmp tcp:0:9888,server=on,wait=off \
-serial tcp:0:4444,server=on,wait=off \
-monitor stdio \
-msg timestamp=on \

Comment 1 Dr. David Alan Gilbert 2022-02-14 11:04:11 UTC
I wonder if this is the same problem as 2053526 - they're both failures with hotplug and migration, just different devices.

Again, can we have 'info pci' from both the source (after hotplug) and the destination.

Comment 2 Li Xiaohui 2022-02-16 04:49:09 UTC
(In reply to Dr. David Alan Gilbert from comment #1)
> I wonder if this is the same problem as 2053526 - they're both failures with
> hotplug and migration, just different devices.

I'm not sure.
Note: when I tried this bug again, the guest appears to have crashed, since it failed to reboot after hitting the error during data transfer.

> 
> Again, can we have 'info pci' from both the source (after hotplug) and the
> destination.

They are attached to this bug.

The diff of the PCI info between the source (after hotplugging) and the destination (after migration):
<       BAR1: 32 bit memory at 0xfd600000 [0xfd600fff].
<       BAR4: 64 bit prefetchable memory at 0xfb200000 [0xfb203fff].
---
>       BAR1: 32 bit memory at 0xffffffffffffffff [0x00000ffe].
>       BAR4: 64 bit prefetchable memory at 0xffffffffffffffff [0x00003ffe].
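The all-ones base addresses on the destination side are what unprogrammed BARs look like in 'info pci' output. A rough sketch of flagging them automatically (the parsing is illustrative and keyed to the output format shown above):

```python
import re

# Matches lines like: "BAR1: 32 bit memory at 0xfd600000 [0xfd600fff]."
BAR_RE = re.compile(r"BAR(\d+): .* at 0x([0-9a-f]+) \[", re.IGNORECASE)

def unprogrammed_bars(info_pci_text):
    """Return BAR numbers whose base address reads back as all ones
    (32-bit or 64-bit), i.e. the BAR was never programmed."""
    hits = []
    for line in info_pci_text.splitlines():
        m = BAR_RE.search(line)
        if m and int(m.group(2), 16) in (0xffffffff, 0xffffffffffffffff):
            hits.append(int(m.group(1)))
    return hits

# Destination-side lines from the diff above.
dst_dump = """\
      BAR1: 32 bit memory at 0xffffffffffffffff [0x00000ffe].
      BAR4: 64 bit prefetchable memory at 0xffffffffffffffff [0x00003ffe].
"""
print(unprogrammed_bars(dst_dump))  # -> [1, 4]
```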

Comment 5 Dr. David Alan Gilbert 2022-02-16 09:15:27 UTC
Yes, I think it is probably the same bug; the BARs on the destination are unprogrammed, but all the bus addressing matches correctly, so I don't think there's an obvious command-line error.

Comment 6 Igor Mammedov 2022-02-21 07:17:28 UTC
(In reply to Dr. David Alan Gilbert from comment #5)
> Yes I think it is probably the same bug; the BARs on the destination are
> unprogrammed; but all the bus addressing matches correctly,
> so I don't think there's an obvious commandline error.

It looks like the regression was introduced by a mix of fixes to PCIe and ACPI hotplug support.
Bisection points to
  d5daff7d3126 pcie: implement slot power control for pcie root ports

Comment 7 Igor Mammedov 2022-02-23 16:03:53 UTC
Simplified steps to reproduce:

1. Start a VM with the following QEMU CLI and let the guest OS boot completely (use a q35 6.2-based machine type, which has ACPI PCI hotplug enabled by default):

  -M q35 -device pcie-root-port,port=0x20,chassis=21,id=extra_root0,bus=pcie.0,addr=0x3 -monitor stdio ...

2. Hot plug a PCI device at the monitor prompt (other means should also work):

(qemu) device_add virtio-serial-pci,id=virtio-serial0,max_ports=31,bus=extra_root0

3. Check that the hotplugged device is initialized:

(qemu) info pci
...
  Bus  1, device   0, function 0:
    Class 1920: PCI device 1af4:1043
      PCI subsystem 1af4:1100
      IRQ 0, pin A
      BAR1: 32 bit memory at 0xfe800000 [0xfe800fff].
      BAR4: 64 bit prefetchable memory at 0xfe000000 [0xfe003fff].
      id "virtio-serial0"
...

4. Migrate the VM to a file (live migration should also work):

(qemu) migrate "exec:gzip -c > STATEFILE.gz"
(qemu) quit

5. Restore the VM on the target QEMU with the following CLI:
  -M q35 -device pcie-root-port,port=0x20,chassis=21,id=extra_root0,bus=pcie.0,addr=0x3 -monitor stdio \
  -device virtio-serial-pci,id=virtio-serial0,max_ports=31,bus=extra_root0  -incoming "exec: gzip -c -d STATEFILE.gz" \
  ...

6. Check that the hotplugged device's BARs are the same as in step 3:
(qemu) info pci
...
  Bus  1, device   0, function 0:
    Class 1920: PCI device 1af4:1043
      PCI subsystem 1af4:1100
      IRQ 0, pin A
      BAR1: 32 bit memory at 0xfe800000 [0xfe800fff].
      BAR4: 64 bit prefetchable memory at 0xfe000000 [0xfe003fff].
      id "virtio-serial0"
...
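Step 6 is the actual pass/fail check: the BAR addresses in the destination's 'info pci' dump must match the source's. A small sketch of that comparison (the helper and the sample dump strings, copied from the outputs above, are illustrative):

```python
import re

# Pulls (BAR number, base address) pairs out of 'info pci'-style output.
BAR_LINE = re.compile(r"BAR(\d+): .* at 0x([0-9a-f]+) \[", re.IGNORECASE)

def bar_map(dump):
    """Map BAR number -> programmed base address for one 'info pci' dump."""
    return {int(n): int(addr, 16) for n, addr in BAR_LINE.findall(dump)}

src = """\
      BAR1: 32 bit memory at 0xfe800000 [0xfe800fff].
      BAR4: 64 bit prefetchable memory at 0xfe000000 [0xfe003fff].
"""
dst_bad = """\
      BAR1: 32 bit memory at 0xffffffffffffffff [0x00000ffe].
      BAR4: 64 bit prefetchable memory at 0xffffffffffffffff [0x00003ffe].
"""

# BARs whose address changed across migration reproduce the bug.
mismatched = sorted(n for n in bar_map(src)
                    if bar_map(src)[n] != bar_map(dst_bad).get(n))
print(mismatched)  # -> [1, 4]
```

An empty mismatch list after step 6 means the BARs survived migration, i.e. the fixed behavior.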

Comment 8 Igor Mammedov 2022-02-23 16:40:07 UTC
*** Bug 2053526 has been marked as a duplicate of this bug. ***

Comment 9 Li Xiaohui 2022-02-24 10:43:56 UTC
According to Comment 6, I will add the 'Regression' keyword. Thanks.

Comment 10 Li Xiaohui 2022-02-28 06:37:04 UTC
Hi Igor,
Shall we request exception+ for this bug and fix it in RHEL 9.0.0, since it blocks hotplug + migration scenarios?

Comment 11 Igor Mammedov 2022-03-01 08:02:18 UTC
Thanks for the reminder,
I've just requested an exception for it.

Comment 14 Li Xiaohui 2022-03-09 01:38:37 UTC
(In reply to Igor Mammedov from comment #13)
> Li Xiaohui,
> 
> can you verify it, please?

Having downloaded the scratch build, I will verify it later. Thanks.

> 
> scratch build:
> 
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=43641393
> http://download.eng.bos.redhat.com/brewroot/work/tasks/1393/43641393/qemu-
> kvm-core-6.2.0-11.el9_0.imammedo202203080850.x86_64.rpm

Comment 15 Li Xiaohui 2022-03-11 07:50:11 UTC
Hi Igor, I have tried the scratch build on hosts (kernel-5.14.0-70.el9.x86_64 & qemu-img-6.2.0-11.el9_0.imammedo202203080850.x86_64); the tests pass, so the build should fix this bug.

Test following cases, all pass:
--> Running case(1/7): RHEL7-96931-[migration] Migration after hot-plug virtio-serial (3 min 48 sec)--- PASS.
--> Running case(2/7): RHEL7-10039-[migration] Do migration after hot plug vdisk (2 min 48 sec)--- PASS.
--> Running case(3/7): RHEL7-10040-[migration] Do migration after hot remove vdisk (4 min 36 sec)--- PASS.
--> Running case(4/7): RHEL7-10078-[migration] Migrate guest after hot plug/unplug memory balloon device (6 min 20 sec)--- PASS.
--> Running case(5/7): RHEL7-10079-[migration] Migrate guest after cpu hotplug/hotunplug in guest (RHEL only) (3 min 0 sec)--- PASS.
--> Running case(6/7): RHEL7-10047-[migration] Ping-pong live migration with large vcpu and memory values of guest (6 min 4 sec)--- PASS.
--> Running case(7/7): RHEL-178709-[migration] Basic migration test (4 min 24 sec)--- PASS.


BTW, I also repeated the above RHEL7-96931 & RHEL7-10039 cases 5 times, checking the PCI info on the source (after hotplugging) and the destination host (after migration); they all work well, with no difference in the PCI info.

Comment 20 Yanan Fu 2022-03-23 06:29:32 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 24 Li Xiaohui 2022-03-25 13:34:21 UTC
Verify this bug on kernel-5.14.0-70.3.1.el9_0.x86_64 & qemu-kvm-6.2.0-11.el9_0.2.x86_64.

Same test scenarios as in Comment 15; all cases pass.

Mark this bug as verified per the test results.

Comment 26 errata-xmlrpc 2022-05-17 12:25:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (new packages: qemu-kvm), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2307