Bug 1817965 - Live post-copy migration of a VM with a failover VF device fails.
Summary: Live post-copy migration of a VM with a failover VF device fails.
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: qemu-kvm
Version: 9.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Laurent Vivier
QA Contact: Yanhui Ma
Docs Contact: Jiri Herrmann
URL:
Whiteboard:
Duplicates: 1817986
Depends On:
Blocks:
 
Reported: 2020-03-27 10:47 UTC by Yanghang Liu
Modified: 2023-09-22 16:14 UTC
CC List: 13 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
.Live post-copy migration of VMs with failover VFs fails
Currently, attempting to post-copy migrate a running virtual machine (VM) fails if the VM uses a device with the virtual function (VF) failover capability enabled. To work around the problem, use the standard migration type, rather than post-copy migration.
Clone Of:
Environment:
Last Closed: 2023-09-22 16:14:22 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links:
Red Hat Issue Tracker RHEL-7335: Migrated (last updated 2023-09-22 16:14:15 UTC)

Description Yanghang Liu 2020-03-27 10:47:27 UTC
Description of problem:
Live post-copy migration of a VM with a failover VF device fails.


Version-Release number of selected component (if applicable):
guest:
4.18.0-192.el8.x86_64
host:
4.18.0-192.el8.x86_64
qemu-kvm-4.2.0-16.module+el8.2.0+6092+4f2391c1.x86_64

How reproducible:
100%

Steps to Reproduce:
1. On the source host, create a NetXtreme BCM57810 VF and set its MAC address:
# ip link set enp131s0f0 vf 0 mac 22:2b:62:bb:a9:82
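
Note: step 1 shows only the MAC assignment. The VF itself is typically created beforehand through the PF's standard sriov_numvfs sysfs knob; a minimal sketch, assuming enp131s0f0 is the PF and one VF suffices for this test:
# echo 0 > /sys/class/net/enp131s0f0/device/sriov_numvfs    # clear any existing VFs first
# echo 1 > /sys/class/net/enp131s0f0/device/sriov_numvfs    # create VF 0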

2. Start a source guest with the NetXtreme BCM57810 VF, with failover enabled:
/usr/libexec/qemu-kvm -name rhel8-2 -M q35 -enable-kvm \
-monitor stdio \
-nodefaults \
-m 4G \
-boot menu=on \
-cpu Haswell-noTSX-IBRS \
-device pcie-root-port,id=root.1,chassis=1,addr=0x2.0,multifunction=on \
-device pcie-root-port,id=root.2,chassis=2,addr=0x2.1 \
-device pcie-root-port,id=root.3,chassis=3,addr=0x2.2 \
-device pcie-root-port,id=root.4,chassis=4,addr=0x2.3 \
-device pcie-root-port,id=root.5,chassis=5,addr=0x2.4 \
-device pcie-root-port,id=root.6,chassis=6,addr=0x2.5 \
-device pcie-root-port,id=root.7,chassis=7,addr=0x2.6 \
-device pcie-root-port,id=root.8,chassis=8,addr=0x2.7 \
-smp 2,sockets=1,cores=2,threads=2,maxcpus=4 \
-qmp tcp:0:5555,server,nowait \
-blockdev node-name=back_image,driver=file,cache.direct=on,cache.no-flush=off,filename=/nfsmount/migra_test/192.qcow2,aio=threads \
-blockdev node-name=drive-virtio-disk0,driver=qcow2,cache.direct=on,cache.no-flush=off,file=back_image \
-device virtio-blk-pci,drive=drive-virtio-disk0,id=disk0,bus=root.1 \
-device VGA,id=video1,bus=root.2  \
-vnc :0 \
-netdev tap,id=hostnet0,vhost=on \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=22:2b:62:bb:a9:82,bus=root.3,failover=on \
-device vfio-pci,host=0000:83:01.0,id=hostdev0,bus=root.4,failover_pair_id=net0 \
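
Note: the command line above assumes the VF 0000:83:01.0 is already bound to the vfio-pci driver; otherwise -device vfio-pci cannot open the device. A quick sanity check with plain lspci (nothing bug-specific):
# lspci -nnk -s 0000:83:01.0 | grep 'in use'
	Kernel driver in use: vfio-pci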


3. On the source host, check the network info in the guest (the failover master enp3s0, the virtio standby enp3s0nsby, and the primary VF enp4s0 all share the same MAC address):
# ifconfig
enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.73.33.214  netmask 255.255.254.0  broadcast 10.73.33.255
        inet6 2620:52:0:4920:202b:62ff:febb:a982  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::202b:62ff:febb:a982  prefixlen 64  scopeid 0x20<link>
        ether 22:2b:62:bb:a9:82  txqueuelen 1000  (Ethernet)
        RX packets 5087  bytes 377754 (368.9 KiB)
        RX errors 0  dropped 5  overruns 0  frame 0
        TX packets 101  bytes 11887 (11.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp3s0nsby: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 22:2b:62:bb:a9:82  txqueuelen 1000  (Ethernet)
        RX packets 4950  bytes 359401 (350.9 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2  bytes 180 (180.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp4s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 22:2b:62:bb:a9:82  txqueuelen 1000  (Ethernet)
        RX packets 137  bytes 18353 (17.9 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 99  bytes 11707 (11.4 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0xfc800000-fc807fff  

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0



4. On the target host, create an 82599ES VF and set its MAC address:
# ip link set enp6s0f0 vf 0 mac 22:2b:62:bb:a9:82
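
To confirm the MAC took effect on the target PF (hypothetical output; the exact format of the vf line varies by iproute2 version):
# ip link show enp6s0f0 | grep 'vf 0'
    vf 0     link/ether 22:2b:62:bb:a9:82 ...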

5. Start a target guest in listening mode, waiting for the incoming migration from the source guest:
...
-netdev tap,id=hostnet0,vhost=on \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=22:2b:62:bb:a9:82,bus=root.3,failover=on \
-device vfio-pci,host=0000:06:10.0,id=hostdev0,bus=root.4,failover_pair_id=net0 \
-incoming tcp:0:5800 \
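
Note: while waiting for the incoming stream, the target QEMU should report the inmigrate state; a quick sanity check in its monitor (standard HMP command):
(qemu) info status
VM status: paused (inmigrate)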

6. On both the source and target hosts, enable the post-copy capability:
(qemu) migrate_set_capability postcopy-ram on
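
The setting can be verified before migrating (standard HMP command; output trimmed to the relevant line):
(qemu) info migrate_capabilities
...
postcopy-ram: on
...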



7. Migrate the guest from the source host to the target host:
(qemu) migrate -d tcp:10.73.73.61:5800
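
Before switching to post-copy, the migration should be active on the source; a quick check (output abbreviated):
(qemu) info migrate
...
Migration status: active
...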


8. Before the migration completes, switch to post-copy mode:
(qemu) migrate_start_postcopy
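
The same switch can also be issued over the QMP socket opened with -qmp tcp:0:5555 in step 2; a sketch of the QMP exchange (after the usual capabilities handshake):
{"execute": "qmp_capabilities"}
{"execute": "migrate-start-postcopy"}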


9. The migration completes within several seconds of starting post-copy.

10. Check the migration info on the source host:
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Migration status: completed
total time: 13638 milliseconds
downtime: 6 milliseconds
setup: 7059 milliseconds
transferred ram: 643240 kbytes
throughput: 801.15 mbps
remaining ram: 0 kbytes
total ram: 4211528 kbytes
duplicate: 894347 pages
skipped: 0 pages
normal: 158535 pages
normal bytes: 634140 kbytes
dirty sync count: 2
page size: 4 kbytes
multifd bytes: 0 kbytes
pages-per-second: 29470
postcopy request count: 446



11. Check the migration info on the target host:
(qemu) 
qemu-kvm: VFIO_MAP_DMA: -14
qemu-kvm: VFIO_MAP_DMA: -14
qemu-kvm: VFIO_MAP_DMA: -14
qemu-kvm: VFIO_MAP_DMA: -14
qemu-kvm: VFIO_MAP_DMA: -14
qemu-kvm: warning: vfio 0000:06:10.0: failed to setup container for group 63: memory listener initialization failed: Region vga.vram: vfio_dma_map(0x55943303cee0, 0xa0000, 0x10000, 0x7f05f0a00000) = -14 (Bad address)
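
Note: the -14 above is a negative errno; errno 14 is EFAULT, which matches the "(Bad address)" text in the warning. A quick way to confirm the decoding on any host with Python 3 (nothing bug-specific):
# python3 -c 'import errno, os; print(errno.errorcode[14], "-", os.strerror(14))'
EFAULT - Bad address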


 
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Migration status: completed
total time: 0 milliseconds



12. On the target host, check the network info in the guest (note that the VF interface enp4s0 seen in step 3 is missing):
# ifconfig
enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.73.33.214  netmask 255.255.254.0  broadcast 10.73.33.255
        inet6 2620:52:0:4920:202b:62ff:febb:a982  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::202b:62ff:febb:a982  prefixlen 64  scopeid 0x20<link>
        ether 22:2b:62:bb:a9:82  txqueuelen 1000  (Ethernet)
        RX packets 5087  bytes 377754 (368.9 KiB)
        RX errors 0  dropped 5  overruns 0  frame 0
        TX packets 101  bytes 11887 (11.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp3s0nsby: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 22:2b:62:bb:a9:82  txqueuelen 1000  (Ethernet)
        RX packets 4950  bytes 359401 (350.9 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2  bytes 180 (180.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0


Actual results:
Live post-copy migration of the VM with a failover VF device fails.

Expected results:
Live post-copy migration of the VM with a failover VF device completes successfully.
The failover VF works well on both the source host and the target host.
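
As the Doc Text notes, the workaround is the standard (pre-copy) migration type: skip steps 6 and 8 and let the migration converge on its own. A minimal sketch reusing the addresses from the steps above:
(qemu) migrate -d tcp:10.73.73.61:5800
(qemu) info migrate
...
Migration status: completed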

Comment 1 Ariel Adam 2020-04-01 10:12:45 UTC
*** Bug 1817986 has been marked as a duplicate of this bug. ***

Comment 2 Juan Quintela 2020-04-07 09:08:42 UTC
Post-copy uses a different path for memory assignment. I am looking into why it differs from normal pre-copy (at that point they should behave the same), but obviously they do not.

Comment 3 Juan Quintela 2020-06-17 07:49:32 UTC
Moving to next version.
We are out of time for this version.

Comment 7 yalzhang@redhat.com 2021-03-05 12:46:35 UTC
Migrating a VM with a hostdev device + teaming setup hits the same issue; refer to bug 1927984#c13.

Comment 8 Yanghang Liu 2021-06-25 03:49:09 UTC
This bug can still be reproduced in the following test env:

host:
qemu-kvm-6.0.0-21.module+el8.5.0+11555+e0ab0d09.x86_64
4.18.0-316.el8.x86_64
libvirt-7.4.0-1.module+el8.5.0+11218+83343022.x86_64
guest:
4.18.0-314.el8.x86_64

Comment 10 John Ferlan 2021-09-08 21:27:55 UTC
Move RHEL-AV bugs to RHEL9. If necessary to resolve in RHEL8, then clone to the current RHEL8 release.

Comment 11 Juan Quintela 2021-09-09 11:27:09 UTC
@lvivier, could you handle this one?

Thanks, Juan.

Comment 12 RHEL Program Management 2021-09-27 07:26:59 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 15 Laurent Vivier 2022-02-02 11:18:42 UTC
Could you retest with qemu-kvm-6.2.0-6.el9?

Thanks

Comment 18 yalzhang@redhat.com 2022-02-09 07:48:54 UTC
Summarizing the current status of "migrate with post-copy" below. I'm not sure whether there is an existing bug for item 3.

Env:

SR-IOV card: 82599
Guest kernel: 5.14.0-55.el9.x86_64
Source and target host package:
# rpm -q libvirt qemu-kvm
libvirt-8.0.0-3.el9.x86_64
qemu-kvm-6.2.0-7.el9.x86_64

Migrate with post-copy:

1. Migration succeeds:
# virsh migrate rhel9 qemu+ssh://${target_host}/system --live --verbose --p2p  --postcopy --timeout 5 --timeout-postcopy --bandwidth 4 --postcopy-bandwidth 4
Migration: [100 %]
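
Progress can also be watched from libvirt while the job runs (standard virsh command; output trimmed):
# virsh domjobinfo rhel9
Job type:         Unbounded
Operation:        Outgoing migration
...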

2. During migration, after the hostdev is unregistered, ping does not work (tracked in bug 1789206);

3. After migration, the guest shows only 2 interfaces; the hostdev interface does not exist. The VM XML shows both the bridge and hostdev interfaces. The VM's network function is broken. See the quick check sketched after the dmesg output below.
# dmesg | grep register
...
[    3.052376] virtio_net virtio0 eth0: failover master:eth0 registered
[    3.055692] virtio_net virtio0 eth0: failover standby slave:eth1 registered
[    8.834592] virtio_net virtio0 enp1s0: failover primary slave:eth0 registered
[   32.035167] virtio_net virtio0 enp1s0: failover primary slave:enp3s0 unregistered
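
A quick way to see the symptom from item 3 inside the guest (hypothetical output; interface names follow the dmesg above). On the buggy post-copy path only the failover master and the virtio standby remain, and the re-registered VF never shows up:
# ip -br link show
lo               UNKNOWN        00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
enp1s0           UP             ...
eth1             UP             ...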

Comment 19 yalzhang@redhat.com 2022-02-10 02:30:37 UTC
Hi Laurent, please help check the result in item 3 of the comment above: after migration with post-copy, there are only 2 interfaces on the VM. If we migrate *without* post-copy, there is no such issue and there are 3 interfaces.

Comment 20 Laurent Vivier 2022-02-22 16:44:30 UTC
(In reply to yalzhang from comment #19)
> Hi Laurent, please help check the result in item 3 of the comment above:
> after migration with post-copy, there are only 2 interfaces on the VM. If we
> migrate *without* post-copy, there is no such issue and there are 3 interfaces.

I think there is a real issue. I'm sorry I haven't had the time to work on it yet, but it's always on my todo list.

Comment 22 RHEL Program Management 2022-03-27 07:27:17 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 29 RHEL Program Management 2023-03-27 07:28:03 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 32 RHEL Program Management 2023-09-22 16:11:36 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 33 RHEL Program Management 2023-09-22 16:14:22 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues@redhat.com. You can also visit https://access.redhat.com/articles/7032570 for general account information.

