Bug 2054597 - Disk operations hang in the guest on the target host after hotplugging and migrating
Summary: Disk operations hang in the guest on the target host after hotplugging and migrating
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: qemu-kvm
Version: 8.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Igor Mammedov
QA Contact: Li Xiaohui
Docs Contact: Jiri Herrmann
URL:
Whiteboard:
Depends On:
Blocks: 2062610
 
Reported: 2022-02-15 10:09 UTC by Meina Li
Modified: 2022-05-10 13:42 UTC
CC List: 16 users

Fixed In Version: qemu-kvm-6.2.0-9.module+el8.6.0+14480+c0a3aa0f
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 2062610 (view as bug list)
Environment:
Last Closed: 2022-05-10 13:25:26 UTC
Type: Bug
Target Upstream Version: qemu-7.0
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-112314 0 None None None 2022-02-15 10:14:40 UTC
Red Hat Product Errata RHSA-2022:1759 0 None None None 2022-05-10 13:26:28 UTC

Description Meina Li 2022-02-15 10:09:32 UTC
Description of problem:
Disk operations hang in the guest on the target host after hotplugging a disk and migrating.

Version-Release number of selected component (if applicable):
On both the source and target hosts:
libvirt-8.0.0-4.module+el8.6.0+14186+211b270d.x86_64
qemu-kvm-6.2.0-6.module+el8.6.0+14165+5e5e76ac.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare the src host and target host with the same disk image.
src host:
# ll /var/lib/avocado/data/avocado-vt/images/
total 4100692
-rw-r--r--. 1 root root 1122041856 Feb 15 03:44 jeos-27-x86_64.qcow2
-rw-r--r--. 1 root root  965848576 Feb 13 23:53 jeos-27-x86_64.qcow2.backup
-rw-r--r--. 1 root root  107374592 Feb 15 03:43 vdb_avocado-vt-vm1_smtest
target host:
# ll /var/lib/avocado/data/avocado-vt/images/
total 2078692
-rw-r--r--. 1 qemu qemu 2126446592 Feb 15 03:49 jeos-27-x86_64.qcow2
-rw-r--r--. 1 qemu qemu  107374592 Feb 15 03:44 vdb_avocado-vt-vm1_smtest
2. Attach vdb_avocado-vt-vm1_smtest to the guest (a sample disk.xml sketch is shown after these steps).
# virsh attach-device avocado-vt-vm1 disk.xml
Device attached successfully
3. Migrate the guest to the target host.
# virsh migrate --live --copy-storage-all --domain avocado-vt-vm1 --desturi qemu+ssh://10.8.2.181/system --verbose
Migration: [100 %]
4. On the target host, perform operations on the disk inside the guest.
# virsh console avocado-vt-vm1
Connected to domain 'avocado-vt-vm1'
Escape character is ^] (Ctrl + ])
...
# parted -s "/dev/vdb" mklabel msdos
---hang
or
# mkfs.ext4 /dev/vdb
mke2fs 1.45.6 (20-Mar-2020)
---hang
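
For reference, a minimal disk.xml consistent with the reproducer could look like the sketch below; only the image path and the vdb target come from the steps above, while the qcow2 driver type and the virtio bus are assumptions:
# cat disk.xml
<disk type='file' device='disk'>
  <!-- driver type and bus are assumed here; adjust them to the actual image format and setup -->
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/avocado/data/avocado-vt/images/vdb_avocado-vt-vm1_smtest'/>
  <target dev='vdb' bus='virtio'/>
</disk>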

Actual results:
The disk operations in step 4 hang.

Expected results:
Disk operations in the guest on the target host complete successfully.

Additional info:
1. It also reproduces on RHEL 9: libvirt-8.0.0-4.el9.x86_64 and qemu-kvm-6.2.0-7.el9.x86_64.
2. Disk operations in the guest on the source host succeed.
3. The operations succeed when the disk is cold-plugged to the guest.
4. After a long time, the guest may log a hung-task error:
[  492.511195] INFO: task systemd-udevd:623 blocked for more than 120 seconds.
[  492.512269]       Not tainted 4.18.0-364.el8.x86_64 #1
[  492.512888] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  492.513825] task:systemd-udevd   state:D stack:    0 pid:  623 ppid:     1 flags:0x00000104
[  492.514821] Call Trace:
[  492.515148]  __schedule+0x2d1/0x830
[  492.515594]  schedule+0x35/0xa0
[  492.515978]  io_schedule+0x12/0x40
[  492.516406]  do_read_cache_page+0x4e7/0x740
[  492.516957]  ? blkdev_writepages+0x10/0x10
[  492.517507]  ? file_fdatawait_range+0x20/0x20
[  492.518080]  read_part_sector+0x38/0xda
[  492.518595]  read_lba+0x10f/0x220
[  492.519052]  efi_partition+0x1e4/0x6de
[  492.519545]  ? snprintf+0x49/0x60
[  492.519985]  ? is_gpt_valid.part.5+0x430/0x430
[  492.520566]  blk_add_partitions+0x164/0x3f0
[  492.521093]  bdev_disk_changed+0x6c/0xe0
[  492.521573]  __blkdev_get+0x321/0x340
[  492.522028]  blkdev_get+0x1a1/0x2c0
[  492.522451]  blkdev_get_by_dev+0x2f/0x40
[  492.522915]  blkdev_common_ioctl+0x80c/0x870
[  492.523437]  ? do_filp_open+0xa7/0x100
[  492.523896]  blkdev_ioctl+0x182/0x250
[  492.524353]  ? selinux_file_ioctl+0x17f/0x220
[  492.524908]  block_ioctl+0x39/0x40
[  492.525369]  do_vfs_ioctl+0xa4/0x680
[  492.525842]  ksys_ioctl+0x60/0x90
[  492.526298]  __x64_sys_ioctl+0x16/0x20
[  492.526789]  do_syscall_64+0x5b/0x1a0
[  492.527300]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[  492.527944] RIP: 0033:0x7fa04be906db
[  492.528442] Code: Unable to access opcode bytes at RIP 0x7fa04be906b1.
[  492.529262] RSP: 002b:00007ffe248efdb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  492.530618] RAX: ffffffffffffffda RBX: 00007ffe248efdf0 RCX: 00007fa04be906db
[  492.531801] RDX: 0000000000000000 RSI: 000000000000125f RDI: 0000000000000010
[  492.532979] RBP: 00007ffe248f0340 R08: 00007fa04ce7d3ec R09: 0000000000000000
[  492.534139] R10: 0000000000000000 R11: 0000000000000246 R12: 00005568c3e46520
[  492.535301] R13: 00007ffe248efe00 R14: 00007ffe248eff00 R15: 00005568c2540f2c
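
If more detail on the hang is needed, the full set of blocked tasks can be dumped from inside the guest; this is a generic debugging sketch, assuming the magic SysRq key is available in the guest kernel:
# echo 1 > /proc/sys/kernel/sysrq    # enable SysRq functions if not already enabled
# echo w > /proc/sysrq-trigger       # dump all tasks in uninterruptible (blocked) state
# dmesg | tail -n 200                # the blocked-task backtraces are appended to the kernel log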

Comment 1 Li Xiaohui 2022-02-16 01:45:43 UTC
Recently I also filed two bugs about hotplug + migration on qemu-kvm-6.2.0-7.el9.x86_64; they may have the same root cause, see below.
Bug 2053526 - Guest hit call trace during reboot after hotplug vdisks + migration
Bug 2053584 - watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [cat:2843]

Anyway, I will track these bugs later.

Comment 2 Li Xiaohui 2022-02-24 02:07:58 UTC
Hi Igor,
Could you check whether this RHEL 8 bug has the same root cause as the RHEL 9 bug below?
Bug 2053584 - watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [cat:2843]

Comment 3 Igor Mammedov 2022-02-24 07:48:42 UTC
(In reply to Li Xiaohui from comment #2)
> Hi Igor,
> Could you check whether this RHEL 8 bug has the same root cause as the RHEL 9 bug below?
> Bug 2053584 - watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [cat:2843]

Can you please provide the output of the HMP command 'info pci' from both the source (after hotplugging) and the destination?
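
One way to collect this, assuming the libvirt domain name used in the reproducer, is virsh's HMP passthrough, run on each host at the relevant point:
On the source host, after hotplug:
# virsh qemu-monitor-command --hmp avocado-vt-vm1 'info pci' > pci_src.txt
On the destination host, after migration:
# virsh qemu-monitor-command --hmp avocado-vt-vm1 'info pci' > pci_dst.txt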

Comment 4 Li Xiaohui 2022-02-28 10:13:36 UTC
Reproduced the bug on RHEL 8.6.0 (kernel-4.18.0-369.el8.x86_64 & qemu-kvm-6.2.0-8.module+el8.6.0+14324+050a5215.x86_64).

It should be the same issue as bug 2053584 because:
1. Diffing the PCI info on the src host (after hotplug) against the dst host (after migration) shows:
<       BAR1: 32 bit memory at 0xfd600000 [0xfd600fff].
<       BAR4: 64 bit prefetchable memory at 0xfb200000 [0xfb203fff].
---
>       BAR1: 32 bit memory at 0xffffffffffffffff [0x00000ffe].
>       BAR4: 64 bit prefetchable memory at 0xffffffffffffffff [0x00003ffe].

2. The guest hangs when operating on the hotplugged vdisk after migration on the dst host.
3. The guest hits a call trace during reboot, but eventually starts successfully:
call trace like:
*********************
entry_SYSCALL_64_after_hwframe+0x65/0xca
*********************

Comment 5 Li Xiaohui 2022-02-28 10:15:34 UTC
Hi Igor, could we get exception+ for this bug and fix it on rhel 8.6.0?

Comment 6 Igor Mammedov 2022-03-01 08:07:06 UTC
(In reply to Li Xiaohui from comment #5)
> Hi Igor, could we get exception+ for this bug and fix it on rhel 8.6.0?

done

Comment 8 Igor Mammedov 2022-03-01 08:10:57 UTC
Justification for exception:
 the bug is a regression and breaks migration with the latest machine type

Comment 13 Li Xiaohui 2022-03-11 07:55:40 UTC
Hi Igor, I tried the scratch build on the hosts (kernel-4.18.0-369.el8.x86_64 & qemu-img-6.2.0-7.el8.imammedo202203080816.x86_64); apart from some existing bugs, everything works well. The build should fix this bug.

I tested the following cases; the four ERROR cases are due to the existing bugs Bug 2043545 & Bug 2028337:
--> Running case(1/7): RHEL7-96931-[migration] Migration after hot-plug virtio-serial (3 min 20 sec)--- PASS.
--> Running case(2/7): RHEL7-10039-[migration] Do migration after hot plug vdisk (3 min 32 sec)--- PASS.
--> Running case(3/7): RHEL7-10040-[migration] Do migration after hot remove vdisk (5 min 24 sec)--- PASS.
--> Running case(4/7): RHEL7-10078-[migration] Migrate guest after hot plug/unplug memory balloon device (5 min 16 sec)--- ERROR.
--> Running case(5/7): RHEL7-10079-[migration] Migrate guest after cpu hotplug/hotunplug in guest (RHEL only) (7 min 0 sec)--- ERROR.
--> Running case(6/7): RHEL7-10047-[migration] Ping-pong live migration with large vcpu and memory values of guest (6 min 0 sec)--- ERROR.
--> Running case(7/7): RHEL-178709-[migration] Basic migration test (3 min 28 sec)--- ERROR.

BTW, I also repeated the above RHEL7-96931 & RHEL7-10039 cases 10 times, checking the PCI info on the source host (after hotplugging) and the destination host (after migration); they all work well, with no difference in the PCI info.
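
For the record, each per-run comparison can be done with a plain diff of the two 'info pci' captures; the file names below are illustrative, not taken from the test logs:
# diff pci_src.txt pci_dst.txt && echo "PCI info identical"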

Comment 14 Igor Mammedov 2022-03-11 10:24:22 UTC
(In reply to Li Xiaohui from comment #13)
> Hi Igor, I tried the scratch build on the hosts (kernel-4.18.0-369.el8.x86_64 &
> qemu-img-6.2.0-7.el8.imammedo202203080816.x86_64); apart from some existing
> bugs, everything works well. The build should fix this bug.
> 
> I tested the following cases; the four ERROR cases are due to the existing
> bugs Bug 2043545 & Bug 2028337:
> --> Running case(1/7): RHEL7-96931-[migration] Migration after hot-plug
> virtio-serial (3 min 20 sec)--- PASS.
> --> Running case(2/7): RHEL7-10039-[migration] Do migration after hot plug
> vdisk (3 min 32 sec)--- PASS.
> --> Running case(3/7): RHEL7-10040-[migration] Do migration after hot remove
> vdisk (5 min 24 sec)--- PASS.


> --> Running case(4/7): RHEL7-10078-[migration] Migrate guest after hot
> plug/unplug memory balloon device (5 min 16 sec)--- ERROR.
> --> Running case(5/7): RHEL7-10079-[migration] Migrate guest after cpu
> hotplug/hotunplug in guest (RHEL only) (7 min 0 sec)--- ERROR.
> --> Running case(6/7): RHEL7-10047-[migration] Ping-pong live migration with
> large vcpu and memory values of guest (6 min 0 sec)--- ERROR.
> --> Running case(7/7): RHEL-178709-[migration] Basic migration test (3 min
> 28 sec)--- ERROR.

These are not relevant to this BZ.

> BTW, I also repeated the above RHEL7-96931 & RHEL7-10039 cases 10 times,
> checking the PCI info on the source host (after hotplugging) and the
> destination host (after migration); they all work well, with no difference in the PCI info.

Please also test the RHEL 9.0 Bug 2053584 and report the results there
so PMs can decide on granting an exception.

Comment 15 Li Xiaohui 2022-03-11 12:28:46 UTC
(In reply to Igor Mammedov from comment #14)

> > BTW, I also repeated the above RHEL7-96931 & RHEL7-10039 cases 10 times,
> > checking the PCI info on the source host (after hotplugging) and the
> > destination host (after migration); they all work well, with no difference in the PCI info.
> 
> Please also test the RHEL 9.0 Bug 2053584 and report the results there
> so PMs can decide on granting an exception.

The test also passes on RHEL 9.0.0; I have added the test results in https://bugzilla.redhat.com/show_bug.cgi?id=2053584#c15, please check.

Comment 17 Yanan Fu 2022-03-17 03:15:46 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 18 Li Xiaohui 2022-03-21 11:59:44 UTC
Verified the bug on qemu-kvm-6.2.0-9.module+el8.6.0+14480+c0a3aa0f.x86_64 with the same test steps as Comment 13; the tests pass. So I am marking this bug as verified.
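
For anyone re-checking the verification, the installed build can be confirmed with the query below; the version shown is simply the Fixed In Version string from this bug:
# rpm -q qemu-kvm
qemu-kvm-6.2.0-9.module+el8.6.0+14480+c0a3aa0f.x86_64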

Comment 23 errata-xmlrpc 2022-05-10 13:25:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: virt:rhel and virt-devel:rhel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1759

