Bug 2053584 - watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [cat:2843]
Summary: watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [cat:2843]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: qemu-kvm
Version: 9.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Igor Mammedov
QA Contact: Li Xiaohui
URL:
Whiteboard:
Duplicates: 2053526
Depends On:
Blocks: 2065398
 
Reported: 2022-02-11 14:40 UTC by Li Xiaohui
Modified: 2022-05-17 12:32 UTC (History)
CC: 15 users

Fixed In Version: qemu-kvm-6.2.0-11.el9_0.2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2065398
Environment:
Last Closed: 2022-05-17 12:25:28 UTC
Type: Bug
Target Upstream Version: qemu-7.0
Embargoed:


Attachments


Links
Gitlab redhat/rhel/src/qemu-kvm qemu-kvm merge_requests 124: 2022-03-10 13:55:59 UTC
Red Hat Issue Tracker RHELPLAN-112038: 2022-02-11 14:47:16 UTC
Red Hat Product Errata RHBA-2022:2307: 2022-05-17 12:26:07 UTC

Description Li Xiaohui 2022-02-11 14:40:55 UTC
Description of problem:
Hot plug a virtio-serial bus and a virtio-serial port on the source host, then migrate the guest. After migration, transferring data from the destination host to the guest through the serial port produces a call trace in the guest:
watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [cat:2843]


Version-Release number of selected component (if applicable):
hosts info: kernel-5.14.0-55.el9.x86_64 & qemu-kvm-6.2.0-7.el9.x86_64
guest info: kernel-4.18.0-364.el8.x86_64


How reproducible:
3/3


Steps to Reproduce:
1. Boot a guest on the source host with '-chardev' as part of the QEMU command line (full command line under "Additional info"):
-chardev socket,path=/tmp/foo,server=on,wait=off,id=foo \
2. Hot plug the virtio-serial bus:
{"execute": "device_add", "arguments": {"driver": "virtio-serial-pci", "bus": "extra_root1", "id": "virtio-serial0", "max_ports": "31"}, "id": "zlSwE71g"}
3. Hot plug the virtio-serial port:
{"execute": "device_add", "arguments": {"driver": "virtserialport", "bus": "virtio-serial0.0", "chardev": "foo", "name": "com.redhat.rhevm.vdsm", "id": "port0"}, "id": "idBjtFYX"}
4. Boot a guest on the destination host with the virtio-serial bus and port on the command line:
-chardev socket,path=/tmp/foo-dst,server=on,wait=off,id=foo \
-device virtio-serial-pci,id=virtio-serial0,max_ports=31,bus=extra_root1 \
-device virtserialport,chardev=foo,id=port0,name=com.redhat.rhevm.vdsm,bus=virtio-serial0.0 \
5. Migrate the VM from the source host to the destination host;
6. Transfer data from the destination host to the VM through the hot-plugged port:
6.1 On dst host:
# dd if=/dev/zero of=/home/host-file bs=1M count=1024
# cat /home/host-file | nc -U /tmp/foo-dst
6.2 In vm:
[root@guest ~]# ls /dev/vport*
/dev/vport3p1
[root@guest ~]# cat /dev/vport3p1 > /home/guest-file
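For reference, the host side of step 6 is nothing more than streaming bytes into a UNIX domain socket (the `cat ... | nc -U /tmp/foo-dst` pipeline); the guest side reads them from /dev/vport3p1. A minimal Python sketch of that host-side pattern, with no QEMU involved and hypothetical socket paths, just to illustrate the transfer mechanism:

```python
import os
import socket
import tempfile
import threading


def stream_over_unix_socket(payload: bytes, sock_path: str) -> bytes:
    """Send payload into a UNIX socket and return what the listener read,
    mimicking 'cat host-file | nc -U /tmp/foo-dst' on the destination host."""
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(sock_path)
    server.listen(1)
    received = bytearray()

    def serve() -> None:
        conn, _ = server.accept()
        # Read until the sender closes the connection, like the guest's 'cat'.
        while chunk := conn.recv(65536):
            received.extend(chunk)
        conn.close()

    t = threading.Thread(target=serve)
    t.start()

    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(sock_path)
    client.sendall(payload)
    client.close()
    t.join()
    server.close()
    return bytes(received)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "foo-dst")
    data = b"\x00" * (1 << 20)  # small stand-in for the 1 GiB dd file
    assert stream_over_unix_socket(data, path) == data
```

In the bug scenario this stream stalls because the hot-plugged device's BARs are never reprogrammed on the destination, so the guest-side virtqueue never makes progress.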


Actual results:
During step 6.2, the guest hits the call trace and watchdog messages from the 'cat' command; the guest keeps running, but data can no longer be transferred through the serial port:
[  132.046596] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [cat:2843]
[  132.047212] Modules linked in: virtio_console fuse xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nft_compat nf_nat_tftp nft_objref nf_conntrack_tftp nft_counter bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_tables_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink intel_rapl_msr intel_rapl_common nfit libnvdimm kvm_intel kvm iTCO_wdt iTCO_vendor_support irqbypass i2c_i801 pcspkr joydev virtio_balloon lpc_ich xfs libcrc32c nvme_tcp(X) nvme_fabrics nvme nvme_core sd_mod t10_pi sg bochs_drm drm_vram_helper drm_ttm_helper ttm crct10dif_pclmul drm_kms_helper crc32_pclmul crc32c_intel ahci libahci libata syscopyarea sysfillrect sysimgblt fb_sys_fops ghash_clmulni_intel drm virtio_net serio_raw net_failover virtio_scsi failover dm_multipath sunrpc dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls libcxgbi libcxgb qla4xxx
[  132.047274]  iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_devintf ipmi_msghandler
[  132.055122] CPU: 3 PID: 2843 Comm: cat Kdump: loaded Tainted: G               X --------- -  - 4.18.0-364.el8.x86_64 #1
[  132.056013] Hardware name: Red Hat KVM/RHEL, BIOS 1.15.0-1.el9 04/01/2014
[  132.056579] RIP: 0010:virtqueue_get_buf+0x0/0x10
[  132.056971] Code: b8 8e 31 ed e8 61 aa f2 ff e9 e7 fe ff ff 29 d0 80 73 71 01 66 89 43 48 e9 cd fe ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 <0f> 1f 44 00 00 31 d2 e9 04 fe ff ff 0f 1f 40 00 0f 1f 44 00 00 80
[  132.058511] RSP: 0018:ff84c6c98123fbc8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[  132.059136] RAX: 0000000000000000 RBX: ff4c0e7b09ce27c8 RCX: 0000000000000001
[  132.059727] RDX: 0000000000000000 RSI: ff84c6c98123fbd4 RDI: ff4c0e7b0d727b40
[  132.060324] RBP: ff4c0e7b0d727b40 R08: 0000000000000000 R09: ff4c0e7b09ce27c8
[  132.060908] R10: 0000000109ce2000 R11: 0000000000000000 R12: ff4c0e7b09ce27a8
[  132.061498] R13: 0000000000000001 R14: 0000000000000006 R15: 0000000000000001
[  132.062097] FS:  00007f81b63fa680(0000) GS:ff4c0e7b7bb80000(0000) knlGS:0000000000000000
[  132.062761] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  132.063243] CR2: 00007f81b5f1f710 CR3: 0000000102932005 CR4: 0000000000771ee0
[  132.063835] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  132.064437] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[  132.065028] PKRU: 55555554
[  132.065259] Call Trace:
[  132.065472]  __send_control_msg.part.23+0xd8/0xf0 [virtio_console]
[  132.065986]  send_control_msg.isra.24+0x26/0x30 [virtio_console]
[  132.066485]  port_fops_open+0x1c0/0x1f0 [virtio_console]
[  132.066931]  chrdev_open+0xcb/0x1e0
[  132.067237]  ? cdev_default_release+0x20/0x20
[  132.067607]  do_dentry_open+0x132/0x340
[  132.067927]  path_openat+0x53e/0x14f0
[  132.068244]  do_filp_open+0x93/0x100
[  132.068552]  ? getname_flags+0x4a/0x1e0
[  132.068877]  ? __check_object_size+0xa8/0x16b
[  132.069239]  do_sys_open+0x184/0x220
[  132.069546]  do_syscall_64+0x5b/0x1a0
[  132.069859]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[  132.070285] RIP: 0033:0x7f81b5f29272
[  132.070584] Code: 25 00 00 41 00 3d 00 00 41 00 74 4c 48 8d 05 35 43 2a 00 8b 00 85 c0 75 6d 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 0f 87 a2 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
[  132.072122] RSP: 002b:00007fff37b23970 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[  132.072740] RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f81b5f29272
[  132.073334] RDX: 0000000000000000 RSI: 00007fff37b24c89 RDI: 00000000ffffff9c
[  132.073920] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[  132.074508] R10: 0000000000000000 R11: 0000000000000246 R12: 000056381dc29790
[  132.075096] R13: 00007fff37b23c30 R14: 0000000000000000 R15: 0000000000020000


Message from syslogd@localhost at Feb  8 17:39:59 ...

 kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [cat:2843]

[  159.980980] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [cat:2843]
[  159.981501] Modules linked in: virtio_console fuse xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nft_compat nf_nat_tftp nft_objref nf_conntrack_tftp nft_counter bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_tables_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink intel_rapl_msr intel_rapl_common nfit libnvdimm kvm_intel kvm iTCO_wdt iTCO_vendor_support irqbypass i2c_i801 pcspkr joydev virtio_balloon lpc_ich xfs libcrc32c nvme_tcp(X) nvme_fabrics nvme nvme_core sd_mod t10_pi sg bochs_drm drm_vram_helper drm_ttm_helper ttm crct10dif_pclmul drm_kms_helper crc32_pclmul crc32c_intel ahci libahci libata syscopyarea sysfillrect sysimgblt fb_sys_fops ghash_clmulni_intel drm virtio_net serio_raw net_failover virtio_scsi failover dm_multipath sunrpc dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls libcxgbi libcxgb qla4xxx
[  159.981571]  iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_devintf ipmi_msghandler
[  159.989418] CPU: 3 PID: 2843 Comm: cat Kdump: loaded Tainted: G             L X --------- -  - 4.18.0-364.el8.x86_64 #1
[  159.990306] Hardware name: Red Hat KVM/RHEL, BIOS 1.15.0-1.el9 04/01/2014
[  159.990867] RIP: 0010:__send_control_msg.part.23+0xcb/0xf0 [virtio_console]
[  159.991450] Code: 00 75 3b 48 83 c4 30 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 89 ef e8 15 9b 8d cc eb 0e 48 89 ef e8 9b 93 8d cc 84 c0 75 c1 f3 90 <48> 8d 74 24 04 48 89 ef e8 08 a0 8d cc 48 85 c0 74 e0 eb ab e8 fc
[  159.992989] RSP: 0018:ff84c6c98123fbd0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[  159.993611] RAX: 0000000000000000 RBX: ff4c0e7b09ce27c8 RCX: 0000000000000001
[  159.994201] RDX: 0000000000000000 RSI: ff84c6c98123fbd4 RDI: ff4c0e7b0d727b40
[  159.994785] RBP: ff4c0e7b0d727b40 R08: 0000000000000000 R09: ff4c0e7b09ce27c8
[  159.995372] R10: 0000000109ce2000 R11: 0000000000000000 R12: ff4c0e7b09ce27a8
[  159.995957] R13: 0000000000000001 R14: 0000000000000006 R15: 0000000000000001
[  159.996542] FS:  00007f81b63fa680(0000) GS:ff4c0e7b7bb80000(0000) knlGS:0000000000000000
[  159.997215] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  159.997691] CR2: 00007f81b5f1f710 CR3: 0000000102932005 CR4: 0000000000771ee0
[  159.998283] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  159.998879] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[  159.999468] PKRU: 55555554
[  159.999695] Call Trace:
[  159.999911]  send_control_msg.isra.24+0x26/0x30 [virtio_console]
[  160.000411]  port_fops_open+0x1c0/0x1f0 [virtio_console]
[  160.000857]  chrdev_open+0xcb/0x1e0
[  160.001166]  ? cdev_default_release+0x20/0x20
[  160.001525]  do_dentry_open+0x132/0x340
[  160.001850]  path_openat+0x53e/0x14f0
[  160.002169]  do_filp_open+0x93/0x100
[  160.002472]  ? getname_flags+0x4a/0x1e0
[  160.002789]  ? __check_object_size+0xa8/0x16b
[  160.003157]  do_sys_open+0x184/0x220
[  160.003466]  do_syscall_64+0x5b/0x1a0
[  160.003773]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[  160.004206] RIP: 0033:0x7f81b5f29272
[  160.004514] Code: 25 00 00 41 00 3d 00 00 41 00 74 4c 48 8d 05 35 43 2a 00 8b 00 85 c0 75 6d 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 0f 87 a2 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
[  160.006054] RSP: 002b:00007fff37b23970 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[  160.006680] RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f81b5f29272
[  160.007272] RDX: 0000000000000000 RSI: 00007fff37b24c89 RDI: 00000000ffffff9c
[  160.007857] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[  160.008458] R10: 0000000000000000 R11: 0000000000000246 R12: 000056381dc29790
[  160.009052] R13: 00007fff37b23c30 R14: 0000000000000000 R15: 0000000000020000


Message from syslogd@localhost at Feb  8 17:40:27 ...

 kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [cat:2843]

[  167.632140] rcu: INFO: rcu_sched self-detected stall on CPU
[  167.632627] rcu: 	3-....: (59942 ticks this GP) idle=94a/1/0x4000000000000002 softirq=9923/9923 fqs=14983 
[  167.633426] 	(t=60000 jiffies g=17169 q=3483)
[  167.633802] NMI backtrace for cpu 3
[  167.634099] CPU: 3 PID: 2843 Comm: cat Kdump: loaded Tainted: G             L X --------- -  - 4.18.0-364.el8.x86_64 #1
[  167.634985] Hardware name: Red Hat KVM/RHEL, BIOS 1.15.0-1.el9 04/01/2014
[  167.635551] Call Trace:
[  167.635762]  <IRQ>
[  167.635945]  dump_stack+0x41/0x60
[  167.636225]  nmi_cpu_backtrace.cold.8+0x13/0x4f
[  167.636606]  ? lapic_can_unplug_cpu.cold.30+0x37/0x37
[  167.637039]  nmi_trigger_cpumask_backtrace+0xde/0xe0
[  167.637456]  rcu_dump_cpu_stacks+0xc8/0xf8
[  167.637806]  rcu_sched_clock_irq.cold.99+0x116/0x39d
[  167.638215]  ? timekeeping_advance+0x372/0x5a0
[  167.638592]  ? irqtime_account_process_tick.isra.15+0xd9/0x170
[  167.639080]  ? tick_sched_do_timer+0x50/0x50
[  167.639443]  update_process_times+0x55/0x80
[  167.639802]  tick_sched_handle+0x22/0x60
[  167.640135]  tick_sched_timer+0x37/0x70
[  167.640453]  __hrtimer_run_queues+0x100/0x280
[  167.640815]  hrtimer_interrupt+0x100/0x220
[  167.641165]  smp_apic_timer_interrupt+0x6a/0x130
[  167.641554]  apic_timer_interrupt+0xf/0x20
[  167.641901]  </IRQ>
[  167.642082] RIP: 0010:virtqueue_get_buf+0x0/0x10
[  167.642475] Code: b8 8e 31 ed e8 61 aa f2 ff e9 e7 fe ff ff 29 d0 80 73 71 01 66 89 43 48 e9 cd fe ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 <0f> 1f 44 00 00 31 d2 e9 04 fe ff ff 0f 1f 40 00 0f 1f 44 00 00 80
[  167.643998] RSP: 0018:ff84c6c98123fbc8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[  167.644631] RAX: 0000000000000000 RBX: ff4c0e7b09ce27c8 RCX: 0000000000000001
[  167.645220] RDX: 0000000000000000 RSI: ff84c6c98123fbd4 RDI: ff4c0e7b0d727b40
[  167.645806] RBP: ff4c0e7b0d727b40 R08: 0000000000000000 R09: ff4c0e7b09ce27c8
[  167.646404] R10: 0000000109ce2000 R11: 0000000000000000 R12: ff4c0e7b09ce27a8
[  167.646987] R13: 0000000000000001 R14: 0000000000000006 R15: 0000000000000001
[  167.647574]  __send_control_msg.part.23+0xd8/0xf0 [virtio_console]
[  167.648088]  send_control_msg.isra.24+0x26/0x30 [virtio_console]
[  167.648587]  port_fops_open+0x1c0/0x1f0 [virtio_console]
[  167.649032]  chrdev_open+0xcb/0x1e0
[  167.649323]  ? cdev_default_release+0x20/0x20
[  167.649692]  do_dentry_open+0x132/0x340
[  167.650019]  path_openat+0x53e/0x14f0
[  167.650324]  do_filp_open+0x93/0x100
[  167.650624]  ? getname_flags+0x4a/0x1e0
[  167.650954]  ? __check_object_size+0xa8/0x16b
[  167.651325]  do_sys_open+0x184/0x220
[  167.651623]  do_syscall_64+0x5b/0x1a0
[  167.651939]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[  167.652359] RIP: 0033:0x7f81b5f29272
[  167.652666] Code: 25 00 00 41 00 3d 00 00 41 00 74 4c 48 8d 05 35 43 2a 00 8b 00 85 c0 75 6d 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 0f 87 a2 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
[  167.654200] RSP: 002b:00007fff37b23970 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[  167.654824] RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f81b5f29272
[  167.655424] RDX: 0000000000000000 RSI: 00007fff37b24c89 RDI: 00000000ffffff9c
[  167.656009] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[  167.656597] R10: 0000000000000000 R11: 0000000000000246 R12: 000056381dc29790
[  167.657195] R13: 00007fff37b23c30 R14: 0000000000000000 R15: 0000000000020000


Expected results:
Data is transferred successfully after migration.


Additional info:
1. Data transfer on the destination host succeeds after migration if the VM is booted with the virtio-serial bus and port on the command line (no hotplug) on the source host;
2. Before migration, data transfers successfully from the source host to the VM with the hot-plugged serial bus and port.

Qemu command line:
/usr/libexec/qemu-kvm  \
-name "mouse-vm" \
-sandbox on \
-machine q35,memory-backend=pc.ram \
-cpu Icelake-Server,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,avx512ifma=on,sha-ni=on,waitpkg=on,rdpid=on,cldemote=on,movdiri=on,movdir64b=on,fsrm=on,md-clear=on,stibp=on,arch-capabilities=on,avx-vnni=on,avx512-bf16=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on,hle=off,rtm=off,mpx=off,intel-pt=off,kvm_pv_unhalt=on \
-nodefaults  \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1,server=on,wait=off \
-chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor,server=on,wait=off \
-chardev socket,path=/tmp/foo,server=on,wait=off,id=foo \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device pcie-root-port,port=0x10,chassis=1,id=root0,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=0x11,chassis=2,id=root1,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=0x12,chassis=3,id=root2,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=0x13,chassis=4,id=root3,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=0x14,chassis=5,id=root4,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=0x15,chassis=6,id=root5,bus=pcie.0,addr=0x2.0x5 \
-device pcie-root-port,port=0x16,chassis=7,id=root6,bus=pcie.0,addr=0x2.0x6 \
-device pcie-root-port,port=0x17,chassis=8,id=root7,bus=pcie.0,addr=0x2.0x7 \
-device pcie-root-port,port=0x20,chassis=21,id=extra_root0,bus=pcie.0,multifunction=on,addr=0x3 \
-device pcie-root-port,port=0x21,chassis=22,id=extra_root1,bus=pcie.0,addr=0x3.0x1 \
-device pcie-root-port,port=0x22,chassis=23,id=extra_root2,bus=pcie.0,addr=0x3.0x2 \
-device nec-usb-xhci,id=usb1,bus=root0,addr=0x0 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=root1,addr=0x0 \
-device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0,write-cache=on \
-device virtio-net-pci,mac=9a:8a:8b:8c:8d:8e,id=net0,netdev=tap0,bus=root2,addr=0x0 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device virtio-balloon-pci,id=balloon0,bus=root3,addr=0x0 \
-device VGA,id=video0,vgamem_mb=16,bus=pcie.0,addr=0x1 \
-blockdev driver=file,auto-read-only=on,discard=unmap,aio=threads,cache.direct=on,cache.no-flush=off,filename=/mnt/nfs/rhel860-64-virtio-scsi.qcow2,node-name=drive_sys1 \
-blockdev driver=qcow2,node-name=drive_image1,read-only=off,cache.direct=on,cache.no-flush=off,file=drive_sys1 \
-netdev tap,id=tap0,vhost=on \
-m 4096 \
-object memory-backend-ram,id=pc.ram,size=4096M \
-smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
-vnc :10 \
-rtc base=utc,clock=host \
-boot menu=off,strict=off,order=cdn,once=c \
-enable-kvm  \
-qmp tcp:0:3333,server=on,wait=off \
-qmp tcp:0:9999,server=on,wait=off \
-qmp tcp:0:9888,server=on,wait=off \
-serial tcp:0:4444,server=on,wait=off \
-monitor stdio \
-msg timestamp=on \

Comment 1 Dr. David Alan Gilbert 2022-02-14 11:04:11 UTC
I wonder if this is the same problem as 2053526 - they're both failures with hotplug and migration, just different devices.

Again, can we have 'info pci' from both the source (after hotplug) and the destination.

Comment 2 Li Xiaohui 2022-02-16 04:49:09 UTC
(In reply to Dr. David Alan Gilbert from comment #1)
> I wonder if this is the same problem as 2053526 - they're both failures with
> hotplug and migration, just different devices.

I'm not sure. 
Note: I tried this bug again; the guest appears to crash, because it fails to reboot after hitting the error during the data transfer.

> 
> Again, can we have 'info pci' from both the source (after hotplug) and the
> destination.

I've attached them.

The diff pci info on source (after hotplugging) and destination (after migration):
<       BAR1: 32 bit memory at 0xfd600000 [0xfd600fff].
<       BAR4: 64 bit prefetchable memory at 0xfb200000 [0xfb203fff].
---
>       BAR1: 32 bit memory at 0xffffffffffffffff [0x00000ffe].
>       BAR4: 64 bit prefetchable memory at 0xffffffffffffffff [0x00003ffe].

Comment 5 Dr. David Alan Gilbert 2022-02-16 09:15:27 UTC
Yes I think it is probably the same bug; the BARs on the destination are unprogrammed; but all the bus addressing matches correctly,
so I don't think there's an obvious commandline error.
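The "unprogrammed" state is visible mechanically in the `info pci` output: the BAR base reads as all-ones (0xffffffffffffffff) instead of a real guest-physical address. A small sketch of such a check (a hypothetical helper, not part of any QE tooling; the regex assumes the `info pci` BAR line format shown in this bug):

```python
import re

# A BAR base of 0xffffffffffffffff in 'info pci' means the BAR was never
# programmed, i.e. firmware/guest never wrote an address after migration.
BAR_RE = re.compile(r"BAR(\d+): .*memory at (0x[0-9a-f]+) \[(0x[0-9a-f]+)\]")


def unprogrammed_bars(info_pci_text: str) -> list[int]:
    """Return the BAR numbers whose base address is all-ones."""
    bad = []
    for m in BAR_RE.finditer(info_pci_text):
        bar_num = int(m.group(1))
        base = int(m.group(2), 16)
        if base == 0xFFFFFFFFFFFFFFFF:
            bad.append(bar_num)
    return bad
```

Run against the destination-side diff from comment 2, this flags BAR1 and BAR4; against the source-side output it flags nothing.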

Comment 6 Igor Mammedov 2022-02-21 07:17:28 UTC
(In reply to Dr. David Alan Gilbert from comment #5)
> Yes I think it is probably the same bug; the BARs on the destination are
> unprogrammed; but all the bus addressing matches correctly,
> so I don't think there's an obvious commandline error.

It looks like the regression was introduced by a mix of fixes to PCIe and ACPI hotplug support.
Bisection points to
  d5daff7d3126 pcie: implement slot power control for pcie root ports

Comment 7 Igor Mammedov 2022-02-23 16:03:53 UTC
Simplified steps to reproduce:

1. Start a VM with the following QEMU CLI and let the guest OS boot completely (use a q35 6.2-based machine type, which has ACPI PCI hotplug enabled by default):

  -M q35 -device pcie-root-port,port=0x20,chassis=21,id=extra_root0,bus=pcie.0,addr=0x3 -monitor stdio ...

2. Hot plug a PCI device at the monitor prompt (other means should also work):

(qemu) device_add virtio-serial-pci,id=virtio-serial0,max_ports=31,bus=extra_root0

3. Check that the hotplugged device is initialized:

(qemu) info pci
...
  Bus  1, device   0, function 0:
    Class 1920: PCI device 1af4:1043
      PCI subsystem 1af4:1100
      IRQ 0, pin A
      BAR1: 32 bit memory at 0xfe800000 [0xfe800fff].
      BAR4: 64 bit prefetchable memory at 0xfe000000 [0xfe003fff].
      id "virtio-serial0"
...

4. Migrate the VM to a file (live migration should also work):

(qemu) migrate "exec:gzip -c > STATEFILE.gz"
(qemu) quit

5. Restore the VM on the target QEMU with the following CLI:
  -M q35 -device pcie-root-port,port=0x20,chassis=21,id=extra_root0,bus=pcie.0,addr=0x3 -monitor stdio \
  -device virtio-serial-pci,id=virtio-serial0,max_ports=31,bus=extra_root0  -incoming "exec: gzip -c -d STATEFILE.gz" \
  ...

6. Check that the hotplugged device's BARs are the same as in step 3:
(qemu) info pci
...
  Bus  1, device   0, function 0:
    Class 1920: PCI device 1af4:1043
      PCI subsystem 1af4:1100
      IRQ 0, pin A
      BAR1: 32 bit memory at 0xfe800000 [0xfe800fff].
      BAR4: 64 bit prefetchable memory at 0xfe000000 [0xfe003fff].
      id "virtio-serial0"
...
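The hotplug in step 2 can also be driven over QMP, as the original reproducer did. A sketch that just builds the QMP `device_add` message corresponding to the monitor command above (hypothetical helper; a real QMP client must connect to the socket and negotiate `qmp_capabilities` before sending it):

```python
import json


def qmp_device_add(driver: str, dev_id: str, bus: str, **props) -> str:
    """Serialize a QMP 'device_add' command for the given device.

    Mirrors the monitor command in step 2, e.g.:
    device_add virtio-serial-pci,id=virtio-serial0,max_ports=31,bus=extra_root0
    """
    args = {"driver": driver, "id": dev_id, "bus": bus, **props}
    return json.dumps({"execute": "device_add", "arguments": args})


msg = qmp_device_add("virtio-serial-pci", "virtio-serial0",
                     "extra_root0", max_ports=31)
```

Note that over QMP, `max_ports` should be sent as an integer; the string form in the original report was accepted but is not canonical.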

Comment 8 Igor Mammedov 2022-02-23 16:40:07 UTC
*** Bug 2053526 has been marked as a duplicate of this bug. ***

Comment 9 Li Xiaohui 2022-02-24 10:43:56 UTC
Per comment 6, I'm adding the 'Regression' keyword. Thanks.

Comment 10 Li Xiaohui 2022-02-28 06:37:04 UTC
Hi Igor,
Shall we add exception+ for this bug and fix it in RHEL 9.0.0, since it blocks hotplug + migration scenarios?

Comment 11 Igor Mammedov 2022-03-01 08:02:18 UTC
Thanks for the reminder,
I've just requested an exception for it.

Comment 14 Li Xiaohui 2022-03-09 01:38:37 UTC
(In reply to Igor Mammedov from comment #13)
> Li Xiaohui,
> 
> can you verify it, please?

Having downloaded the scratch build, I will verify it later. Thanks.

> 
> scratch build:
> 
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=43641393
> http://download.eng.bos.redhat.com/brewroot/work/tasks/1393/43641393/qemu-
> kvm-core-6.2.0-11.el9_0.imammedo202203080850.x86_64.rpm

Comment 15 Li Xiaohui 2022-03-11 07:50:11 UTC
Hi Igor, I have tried the scratch build on hosts (kernel-5.14.0-70.el9.x86_64 & qemu-img-6.2.0-11.el9_0.imammedo202203080850.x86_64); the tests pass, so the build should fix this bug.

Tested the following cases; all pass:
--> Running case(1/7): RHEL7-96931-[migration] Migration after hot-plug virtio-serial (3 min 48 sec)--- PASS.
--> Running case(2/7): RHEL7-10039-[migration] Do migration after hot plug vdisk (2 min 48 sec)--- PASS.
--> Running case(3/7): RHEL7-10040-[migration] Do migration after hot remove vdisk (4 min 36 sec)--- PASS.
--> Running case(4/7): RHEL7-10078-[migration] Migrate guest after hot plug/unplug memory balloon device (6 min 20 sec)--- PASS.
--> Running case(5/7): RHEL7-10079-[migration] Migrate guest after cpu hotplug/hotunplug in guest (RHEL only) (3 min 0 sec)--- PASS.
--> Running case(6/7): RHEL7-10047-[migration] Ping-pong live migration with large vcpu and memory values of guest (6 min 4 sec)--- PASS.
--> Running case(7/7): RHEL-178709-[migration] Basic migration test (4 min 24 sec)--- PASS.


BTW, I also repeated RHEL7-96931 & RHEL7-10039 above 5 times, checking the PCI info on the source host (after hotplugging) and the destination host (after migration); all runs work well, with no difference in the PCI info.

Comment 20 Yanan Fu 2022-03-23 06:29:32 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 24 Li Xiaohui 2022-03-25 13:34:21 UTC
Verified this bug on kernel-5.14.0-70.3.1.el9_0.x86_64 & qemu-kvm-6.2.0-11.el9_0.2.x86_64.

Same test scenarios as comment 15; all cases pass.

Marking this bug as verified per the test results.

Comment 26 errata-xmlrpc 2022-05-17 12:25:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (new packages: qemu-kvm), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2307

