Bug 1789206
Summary: ping can not always work during live migration of vm with VF
Product: Red Hat Enterprise Linux 9
Component: qemu-kvm
qemu-kvm sub component: Networking
Reporter: Yanghang Liu <yanghliu>
Assignee: Laurent Vivier <lvivier>
QA Contact: Yanhui Ma <yama>
Docs Contact: Jiri Herrmann <jherrman>
Status: CLOSED MIGRATED
Severity: medium
Priority: medium
CC: aadam, ailan, chayang, jfreiman, jherrman, jinzhao, juzhang, lvivier, pezhang, phou, quintela, virt-maint, yalzhang, yama, yanqzhan
Version: 9.0
Keywords: MigratedToJIRA, Reopened, Triaged
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
.Host network cannot ping VMs with VFs during live migration
When live migrating a virtual machine (VM) with a configured virtual function (VF), such as a VM that uses SR-IOV, the network of the VM is not visible to other devices and the VM cannot be reached by commands such as `ping`. After the migration is finished, however, the problem no longer occurs.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-09-22 16:14:28 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Description
Yanghang Liu
2020-01-09 03:31:05 UTC
QEMU has been recently split into sub-components and, as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks.

The problem could be that the driver of the NIC updates the MAC filter too soon. I have a tool to test this here: https://github.com/jensfr/netfailover_driver_detect with a description of the problem. I can help with the test or do it myself given access to your system. Re-assigning to Juan.

We are out of time. Moving to the next version.

This bug can be reproduced with 82599ES in:
(1) host env:
    qemu-kvm version: qemu-kvm-5.1.0-8.module+el8.3.0+8141+3cd9cd43.x86_64
    kernel version: 4.18.0-238.el8.x86_64
(2) vm env:
    kernel version: 4.18.0-238.el8.x86_64

Hi,
Can you test what is the result of this test: https://github.com/jensfr/netfailover_driver_detect
Just to be sure if the problem is that it enables the destination link too soon?
Thanks, Juan.

Hi Juan,
Very sorry for the late reply.

> Can you test what is the result of this test:
> https://github.com/jensfr/netfailover_driver_detect

I have tried to test this problem with the tool in the link you provided. My test steps are as follows:

(1) Start a host named HOST_A with an 82599ES PF
    The IP address of the 82599ES PF: 10.73.33.142
    The PCI address of the 82599ES PF: 0000:05:00.0
    The interface name of the 82599ES PF: enp5s0f0
    The MAC address of the 82599ES PF: 00:1b:21:c3:d0:3c

# ifconfig
...
enp5s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 10.73.33.142 netmask 255.255.254.0 broadcast 10.73.33.255 inet6 fe80::aa6d:d0a0:3531:a542 prefixlen 64 scopeid 0x20<link> inet6 2620:52:0:4920:f167:beaf:81b8:fd6 prefixlen 64 scopeid 0x0<global> ether 00:1b:21:c3:d0:3c txqueuelen 1000 (Ethernet) RX packets 1885137 bytes 1288877814 (1.2 GiB) RX errors 0 dropped 468 overruns 0 frame 0 TX packets 2445247 bytes 3114882646 (2.9 GiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 # lshw -c network -businfo Bus info Device Class Description ======================================================== pci@0000:05:00.0 enp5s0f0 network 82599ES 10-Gigabit SFI/SFP+ Network (2) Start another host named HOST_B with BCM57810 PF The IP address of BCM57810 PF: 10.73.33.244 The PCI address of BCM57810 PF:0000:82:00.0 The Interface name of BCM57810 PF: enp130s0f0 The mac address of BCM57810 PF: 00:0a:f7:05:82:c0 # ifconfig ... enp130s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 10.73.33.244 netmask 255.255.254.0 broadcast 10.73.33.255 inet6 fe80::20a:f7ff:fe05:82c0 prefixlen 64 scopeid 0x20<link> inet6 2620:52:0:4920:20a:f7ff:fe05:82c0 prefixlen 64 scopeid 0x0<global> ether 00:0a:f7:05:82:c0 txqueuelen 1000 (Ethernet) RX packets 5449 bytes 853348 (833.3 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 1537 bytes 172913 (168.8 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 device interrupt 58 memory 0xc9000000-c97fffff # lshw -c network -businfo Bus info Device Class Description ========================================================== pci@0000:82:00.0 enp130s0f0 network NetXtreme II BCM57810 10 Gigabit (3) Check if two hosts can communicate with each other HOST_A(10.73.33.142) can ping HOST_B(10.73.33.244) successfully HOST_B(10.73.33.244) can ping HOST_A(10.73.33.142) successfully (4) Create a VF based on 82599ES PF/BCM57810 PF BCM57810 PF: # echo 1 > /sys/bus/pci/devices/0000\:82\:00.0/sriov_numvfs # ip link set 
enp130s0f0 vf 0 mac 22:2b:62:bb:a9:82
# ip link show enp130s0f0
3: enp130s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:0a:f7:05:82:c0 brd ff:ff:ff:ff:ff:ff
    vf 0 link/ether 22:2b:62:bb:a9:82 brd ff:ff:ff:ff:ff:ff, tx rate 10000 (Mbps), max_tx_rate 10000Mbps, spoof checking on, link-state auto

82599ES PF:
# echo 1 > /sys/bus/pci/devices/0000\:05\:00.0/sriov_numvfs
# ip link show enp5s0f0
9: enp5s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:1b:21:c3:d0:3c brd ff:ff:ff:ff:ff:ff
    vf 0 link/ether ce:6e:d6:ae:c9:93 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off

(5) Run the tool provided in the link https://github.com/jensfr/netfailover_driver_detect
> The interface name of the 82599ES PF: enp5s0f0
> The MAC address of the 82599ES PF: 00:1b:21:c3:d0:3c
> The interface name of the BCM57810 PF: enp130s0f0
> The MAC address of the BCM57810 PF: 00:0a:f7:05:82:c0
On HOST_A: # ./is_legacy -d enp5s0f0 -t 20
On HOST_B: # ./send_packet -d enp130s0f0 -A 00:0a:f7:05:82:c0 -B 00:1b:21:c3:d0:3c

(6) Check the output after running the tool
On HOST_B:
# ./send_packet -d enp130s0f0 -A 00:0a:f7:05:82:c0 -B 00:1b:21:c3:d0:3c
..........
On HOST_A:
# ./is_legacy -d enp5s0f0 -t 20
timed out

As shown in step 6, I did not run this tool successfully. Juan, could you please help check my test steps provided above? Feel free to tell me if I have not used this tool correctly or have missed any operations. Thanks for your help in advance.

Hi Yanghang,
Just back from holidays, will try to "explain" your failure.
Later, Juan.

Hi,
Can I get access to the two machines that show this bug? Once there, some more questions:
- can you send me the whole network configuration of both source and destination hosts? Specifically, where is the virtio-net device communicating vs the SR-IOV one?
- What networking are you using really? Because as far as I can understand, you are mixing an Intel card on one side and a Broadcom on the other side, but comment 11 seems to indicate that you are trying to migrate from Intel to Broadcom.
- Only with source, without doing migration: if you unplug the SR-IOV network card from the source, ping continues working (it appears that it does not, so this should be a network configuration issue).

(In reply to Juan Quintela from comment #15)

In order to prevent the problems caused by different/incorrect network configurations used on the qemu side, I have tried to use libvirt to reproduce this problem:

Test env:
    source host: dell-per730-27.lab.eng.pek2.redhat.com
    target host: dell-per730-28.lab.eng.pek2.redhat.com

The test steps:

1. create a bridge based on a PF
# lspci -v -s 06:00.0
06:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
...
    Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
# nmcli connection add type bridge ifname br0 con-name br0 stp off autoconnect yes
# nmcli connection add type bridge-slave ifname enp6s0f0 con-name enp6s0f0 master br0 autoconnect yes
# systemctl restart NetworkManager
note: We need to create a bridge on both the source host and the target host. The PF on the source host and the target host can be different.

2. set up a bridge network for the vm based on the bridge
# vim failover_bridge_network.xml
<network>
  <name>failover-bridge</name>
  <forward mode='bridge'/>
  <bridge name='br0'/>
</network>
# virsh net-define failover_bridge_network.xml
Network failover-bridge defined from failover_bridge_network.xml
# virsh net-autostart failover-bridge
Network failover-bridge marked as autostarted
# virsh net-start failover-bridge
Network failover-bridge started
# virsh net-list
 Name              State    Autostart   Persistent
-------------------------------------------------------------
 ...
 failover-bridge   active   yes         yes
note: We need to create the bridge network on both the source host and the target host. Make sure the bridge network name on the source host and the target host is the same, and that the bridge network is active when doing the failover VF migration.

3. create a VF
# echo 1 > /sys/bus/pci/devices/0000\:06\:00.0/sriov_numvfs
note: We need to create the VF on both the source host and the target host. The PF on the source host and the target host can be different.

4. set up a hostdev network based on the PF
# vim failover_vf_network.xml
<network>
  <name>failover-vf</name>
  <forward mode='hostdev' managed='yes'>
    <pf dev='enp6s0f0'/>
  </forward>
</network>
# virsh net-define failover_vf_network.xml
Network failover-vf defined from failover_vf_network.xml
# virsh net-autostart failover-vf
Network failover-vf marked as autostarted
# virsh net-start failover-vf
Network failover-vf started
# virsh net-list
 Name          State    Autostart   Persistent
-------------------------------------------------------------
 ...
 failover-vf   active   yes         yes
note: We need to create the hostdev network on both the source host and the target host. Make sure the hostdev network name on the source host and the target host is the same, and that the hostdev network is active when doing the failover VF migration.

5. use virt-install or virt-manager to install a domain
The following is an example virt-install command line:
# virt-install --machine=q35 --noreboot --name=Bug1789206 --memory=4096 --vcpus=4 --disk path=/nfs/images/RHEL84.qcow2,bus=virtio,cache=none,format=qcow2,io=threads,size=20 --graphics type=vnc,port=5900,listen=0.0.0.0 --network none -l http://download.eng.pek2.redhat.com/nightly/rhel-8/RHEL-8/latest-RHEL-8.4.0/compose/BaseOS/x86_64/os/
note: We only need to create the domain on the source host. Make sure there is no domain with the same domain name on the target host, and that both the source host and the target host can access the same nfs shared dir.

6. add the failover vf and failover virtio-net device to the domain

failover virtio-net device xml:
<interface type='network'>
  <mac address='52:54:00:aa:1c:ef'/>
  <source network='failover-bridge'/>
  <model type='virtio'/>
  <teaming type='persistent'/>
  <alias name='ua-test'/>
</interface>

failover vf xml:
<interface type='network'>
  <mac address='52:54:00:aa:1c:ef'/>
  <source network='failover-vf'/>
  <teaming type='transient' persistent='ua-test'/>
</interface>

# virsh edit failover_vm
add the failover virtio-net device xml and failover vf xml into the domain xml
note: make sure the MAC address of the 2 interfaces is the same; make sure the model of the bridge type interface is virtio; make sure the 'persistent' of the hostdev interface points to the persistent bridge interface.

7. start the domain
# virsh start Bug1789206
# ps -ef | grep -i qemu-kvm
-netdev tap,fd=39,id=hostua-test,vhost=on,vhostfd=40 -device virtio-net-pci,failover=on,netdev=hostua-test,id=ua-test,mac=52:54:00:aa:1c:ef,bus=pci.1,addr=0x0 -device vfio-pci,host=0000:06:10.0,id=hostdev0,bus=pci.7,addr=0x0,failover_pair_id=ua-test

8.
check the failover device status in the vm # ifconfig enp1s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 10.73.33.134 netmask 255.255.254.0 broadcast 10.73.33.255 inet6 2620:52:0:4920:4f2a:c7ed:de11:b8db prefixlen 64 scopeid 0x0<global> inet6 fe80::85b0:7444:2539:538f prefixlen 64 scopeid 0x20<link> ether 52:54:00:aa:1c:ef txqueuelen 1000 (Ethernet) RX packets 267 bytes 37547 (36.6 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 117 bytes 18422 (17.9 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 enp1s0nsby: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet6 fe80::df09:886b:dce5:a748 prefixlen 64 scopeid 0x20<link> ether 52:54:00:aa:1c:ef txqueuelen 1000 (Ethernet) RX packets 179 bytes 15531 (15.1 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 30 bytes 5920 (5.7 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 enp7s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 10.73.33.134 netmask 255.255.254.0 broadcast 10.73.33.255 inet6 fe80::aca9:c6bc:3119:7818 prefixlen 64 scopeid 0x20<link> ether 52:54:00:aa:1c:ef txqueuelen 1000 (Ethernet) RX packets 88 bytes 22016 (21.5 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 87 bytes 12502 (12.2 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 # dmesg | grep -i failover [ 7.679714] virtio_net virtio0 eth0: failover master:eth0 registered [ 7.685373] virtio_net virtio0 eth0: failover standby slave:eth1 registered [ 13.482718] virtio_net virtio0 enp1s0: failover primary slave:eth0 registered 9. ping the virtual machine from the third host # ping 10.73.33.134 10. migrate the vm from the source host to the target host # virsh migrate Bug1789206 --live --verbose qemu+ssh://10.73.73.73/system 11. check the ping status ... 
64 bytes from 10.73.33.188: icmp_seq=63 ttl=61 time=52.1 ms
64 bytes from 10.73.33.188: icmp_seq=64 ttl=61 time=55.1 ms
64 bytes from 10.73.33.188: icmp_seq=75 ttl=61 time=52.8 ms  <--- when the migration is completed, ping works again
64 bytes from 10.73.33.188: icmp_seq=76 ttl=61 time=51.4 ms
64 bytes from 10.73.33.188: icmp_seq=77 ttl=61 time=49.8 ms
64 bytes from 10.73.33.188: icmp_seq=78 ttl=61 time=50.9 ms
64 bytes from 10.73.33.188: icmp_seq=79 ttl=61 time=50.0 ms
64 bytes from 10.73.33.188: icmp_seq=80 ttl=61 time=79.0 ms

Hi Juan,
Could we move the "Internal Target Release" of this bug to RHEL 8.5? It seems that we need more time to confirm this ping problem, so this BZ cannot be fixed in RHEL 8.4.

Hi Yanghang,
Your setup seems correct. Nothing looks obviously wrong to me. I am going to access your machines and see what is going on. Thanks for doing the setup.
Later, Juan.

Hi,
Just a heads-up. Inside the machine, how did you configure them? I see this:

[root@bootp-73-33-188 network-scripts]# ip route
default via 10.73.33.254 dev enp1s0 proto dhcp metric 100
default via 10.73.33.254 dev enp1s0nsby proto dhcp metric 101
10.73.32.0/23 dev enp1s0 proto kernel scope link src 10.73.33.188 metric 100
10.73.32.0/23 dev enp1s0nsby proto kernel scope link src 10.73.33.188 metric 101

And that configuration is wrong.
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:aa:1c:ef brd ff:ff:ff:ff:ff:ff
    inet 10.73.33.188/23 brd 10.73.33.255 scope global dynamic noprefixroute enp1s0
       valid_lft 41265sec preferred_lft 41265sec
    inet6 2620:52:0:4920:fe22:38bb:73a4:c9a5/64 scope global dynamic noprefixroute
       valid_lft 2591514sec preferred_lft 604314sec
    inet6 fe80::6b98:4a67:ddc:66d/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: enp1s0nsby: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master enp1s0 state UP group default qlen 1000
    link/ether 52:54:00:aa:1c:ef brd ff:ff:ff:ff:ff:ff
    inet 10.73.33.188/23 brd 10.73.33.255 scope global dynamic noprefixroute enp1s0nsby
       valid_lft 41265sec preferred_lft 41265sec
    inet6 fe80::f306:608:a04f:fd6f/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

Both the bonding (failover) device (enp1s0) and the virtio device (enp1s0nsby) have an assigned address (the same one). Only enp1s0 should have it assigned. I am trying to tame the NetworkManager configuration, but the problem is there. (Yes, in the current configuration they don't have the VF device assigned yet, but that is how I found the machine.)

I am planning to play with this machine until I can understand why it is behaving this way, as I don't know either where the misconfiguration is. My initial conclusion is that the problem is a configuration problem, not a failover problem, but I can't yet explain where the misconfiguration is.

Later, Juan.

Hi,
I think this should be already fixed with the backport of fixes that Laurent did. Could you re-try and close it if it works?
Later, Juan.

Could you re-test with RHEL-AV-8.5.0 to see if the problem has been fixed by the rebase?
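Juan's diagnosis above (the failover master and the virtio standby both holding the same address) can be checked mechanically from `ip` output instead of by eye. A minimal sketch, assuming a snapshot reconstructed from the guest output shown above; the file path and snapshot are hypothetical, and on a live guest you would pipe `ip -o -4 addr show` straight into the filter:

```shell
# Reconstructed one-line-per-address output of `ip -o -4 addr show` from the
# misconfigured guest: field 2 is the interface, the token after "inet" is
# the address.
cat > /tmp/guest_addrs.txt <<'EOF'
1: lo    inet 127.0.0.1/8 scope host lo
2: enp1s0    inet 10.73.33.188/23 brd 10.73.33.255 scope global dynamic noprefixroute enp1s0
3: enp1s0nsby    inet 10.73.33.188/23 brd 10.73.33.255 scope global dynamic noprefixroute enp1s0nsby
EOF

# Flag any IPv4 address configured on more than one interface: with
# net_failover, only the master device (enp1s0 here) should carry it.
awk '{
    for (i = 1; i <= NF; i++)
        if ($i == "inet") ifs[$(i + 1)] = ifs[$(i + 1)] " " $2
}
END {
    for (a in ifs)
        if (split(ifs[a], parts, " ") > 1)
            print "address " a " is on:" ifs[a]
}' /tmp/guest_addrs.txt
```

On the snapshot above this prints the duplicated 10.73.33.188/23 entry only; a correctly configured guest would produce no output at all.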
Thanks.

Hi,
It seems that this problem still exists in the following test environment:

Test version:
    host: 4.18.0-316.el8.x86_64, qemu-kvm-6.0.0-20.module+el8.5.0+11499+199527ef.x86_64
    guest: 4.18.0-314.el8.x86_64

Test device:
    on the source host: 82599ES
    on the target host: BCM57810

Related log:
# ping 10.73.33.153
PING 10.73.33.153 (10.73.33.153) 56(84) bytes of data.
64 bytes from 10.73.33.153: icmp_seq=1 ttl=58 time=0.776 ms
64 bytes from 10.73.33.153: icmp_seq=2 ttl=58 time=0.805 ms
64 bytes from 10.73.33.153: icmp_seq=3 ttl=58 time=0.810 ms
64 bytes from 10.73.33.153: icmp_seq=4 ttl=58 time=0.775 ms
64 bytes from 10.73.33.153: icmp_seq=5 ttl=58 time=0.808 ms
64 bytes from 10.73.33.153: icmp_seq=6 ttl=58 time=0.803 ms
64 bytes from 10.73.33.153: icmp_seq=7 ttl=58 time=0.803 ms
64 bytes from 10.73.33.153: icmp_seq=8 ttl=58 time=0.837 ms
64 bytes from 10.73.33.153: icmp_seq=9 ttl=58 time=0.799 ms
64 bytes from 10.73.33.153: icmp_seq=10 ttl=58 time=0.793 ms
64 bytes from 10.73.33.153: icmp_seq=11 ttl=58 time=0.789 ms
64 bytes from 10.73.33.153: icmp_seq=12 ttl=58 time=0.796 ms
64 bytes from 10.73.33.153: icmp_seq=13 ttl=58 time=1.02 ms
64 bytes from 10.73.33.153: icmp_seq=14 ttl=58 time=0.793 ms
64 bytes from 10.73.33.153: icmp_seq=15 ttl=58 time=0.822 ms
64 bytes from 10.73.33.153: icmp_seq=16 ttl=58 time=0.791 ms
64 bytes from 10.73.33.153: icmp_seq=17 ttl=58 time=0.832 ms  <--- when "virtio_net virtio2 enp4s0: failover primary slave:enp5s0 unregistered" is output in the source guest vm dmesg, ping will not work until the migration is completed.
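As an aside on logs like the one above, the length of the outage can be computed from the icmp_seq gaps rather than read off by eye. A minimal sketch, assuming the ping output was saved to a file; the excerpt below is a trimmed, hypothetical copy of the log above, and with ping's default 1-second interval each missing sequence number is roughly one second of downtime:

```shell
# Sample ping output around a live migration (excerpt mirroring the log above).
cat > /tmp/ping_migration.log <<'EOF'
64 bytes from 10.73.33.153: icmp_seq=16 ttl=58 time=0.791 ms
64 bytes from 10.73.33.153: icmp_seq=17 ttl=58 time=0.832 ms
64 bytes from 10.73.33.153: icmp_seq=29 ttl=58 time=0.889 ms
64 bytes from 10.73.33.153: icmp_seq=30 ttl=58 time=0.794 ms
EOF

# Report every gap in icmp_seq: with the default 1s interval, N missing
# sequence numbers correspond to roughly N seconds of network downtime.
awk -F'icmp_seq=' '/icmp_seq=/ {
    split($2, f, " "); seq = f[1] + 0
    if (prev && seq > prev + 1)
        printf "lost seq %d-%d (~%d s downtime)\n", prev + 1, seq - 1, seq - prev - 1
    prev = seq
}' /tmp/ping_migration.log
```

For the excerpt above this reports the seq 18-28 gap, i.e. roughly 11 seconds without connectivity, matching the window between the "failover primary slave unregistered" message and the end of the migration.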
64 bytes from 10.73.33.153: icmp_seq=29 ttl=58 time=0.889 ms  <--- when the migration is completed, ping works again
64 bytes from 10.73.33.153: icmp_seq=30 ttl=58 time=0.794 ms
64 bytes from 10.73.33.153: icmp_seq=31 ttl=58 time=0.796 ms
^C
--- 10.73.33.153 ping statistics ---
31 packets transmitted, 20 received, 35.4839% packet loss, time 747ms
rtt min/avg/max/mdev = 0.775/0.816/1.019/0.059 ms

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Hi Laurent,
Do you plan to fix this bug? If yes, could we re-open this bug?

(In reply to Yanghang Liu from comment #27)
> Hi Laurent,
> Do you plan to fix this bug?
> If yes, could we re-open this bug?
Yes, it needs at least further analysis.

Hi Laurent, do you think the scenario can be simplified to "hotunplug the hostdev interface, and check the network status on the vm with the backup bridge type interface"? I remember I encountered this issue when I tested unplug on our 82599ES system when the feature was introduced a long time ago, but it seems fixed now; the ping can keep working after the hostdev interface is unregistered. Check the steps below:

[on host]
# rpm -q libvirt qemu-kvm
libvirt-7.6.0-1.module+el8.5.0+12097+2c77910b.x86_64
qemu-kvm-6.0.0-26.module+el8.5.0+12044+525f0ebc.x86_64
# lspci | grep Eth
...
82:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) [on guest] [root@vm-179-142 ~]# ifconfig -a enp4s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 10.73.179.142 netmask 255.255.254.0 broadcast 10.73.179.255 inet6 fe80::8e2a:7cb5:c796:3c38 prefixlen 64 scopeid 0x20<link> inet6 2620:52:0:49b2:b1f:5bf7:68f9:4de0 prefixlen 64 scopeid 0x0<global> ether 52:54:00:aa:1c:ef txqueuelen 1000 (Ethernet) RX packets 8859 bytes 573031 (559.6 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 264 bytes 37312 (36.4 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 enp4s0nsby: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 ether 52:54:00:aa:1c:ef txqueuelen 1000 (Ethernet) RX packets 4523 bytes 285469 (278.7 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 47 bytes 10190 (9.9 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 enp5s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 10.73.179.142 netmask 255.255.254.0 broadcast 10.73.179.255 inet6 fe80::d9bc:7e3e:4179:6e03 prefixlen 64 scopeid 0x20<link> ether 52:54:00:aa:1c:ef txqueuelen 1000 (Ethernet) RX packets 4336 bytes 287562 (280.8 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 217 bytes 27122 (26.4 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 ... (keep ping on guest, then hotunplug the hostdev interface and check the ping status) [root@vm-179-142 ~]# ping www.baidu.com PING www.a.shifen.com (182.61.200.7) 56(84) bytes of data. 
64 bytes from 182.61.200.7 (182.61.200.7): icmp_seq=1 ttl=48 time=3.79 ms 64 bytes from 182.61.200.7 (182.61.200.7): icmp_seq=2 ttl=48 time=6.53 ms 64 bytes from 182.61.200.7 (182.61.200.7): icmp_seq=3 ttl=48 time=6.17 ms 64 bytes from 182.61.200.7 (182.61.200.7): icmp_seq=4 ttl=48 time=3.15 ms 64 bytes from 182.61.200.7 (182.61.200.7): icmp_seq=5 ttl=48 time=3.13 ms 64 bytes from 182.61.200.7 (182.61.200.7): icmp_seq=6 ttl=48 time=3.22 ms 64 bytes from 182.61.200.7 (182.61.200.7): icmp_seq=7 ttl=48 time=3.12 ms 64 bytes from 182.61.200.7 (182.61.200.7): icmp_seq=8 ttl=48 time=3.21 ms [ 369.112558] pcieport 0000:00:02.4: pciehp: Slot(0-4): Attention button pressed [ 369.113922] pcieport 0000:00:02.4: pciehp: Slot(0-4): Powering off due to button press 64 bytes from 182.61.200.7 (182.61.200.7): icmp_seq=9 ttl=48 time=5.32 ms 64 bytes from 182.61.200.7 (182.61.200.7): icmp_seq=10 ttl=48 time=3.09 ms 64 bytes from 182.61.200.7 (182.61.200.7): icmp_seq=11 ttl=48 time=3.13 ms 64 bytes from 182.61.200.7 (182.61.200.7): icmp_seq=12 ttl=48 time=13.7 ms 64 bytes from 182.61.200.7 (182.61.200.7): icmp_seq=13 ttl=48 time=4.61 ms 64 bytes from 182.61.200.7 (182.61.200.7): icmp_seq=14 ttl=48 time=3.39 ms [ 374.642245] virtio_net virtio2 enp4s0: failover primary slave:enp5s0 unregistered 64 bytes from 182.61.200.7 (182.61.200.7): icmp_seq=15 ttl=48 time=3.33 ms 64 bytes from 182.61.200.7 (182.61.200.7): icmp_seq=16 ttl=48 time=4.24 ms 64 bytes from 182.61.200.7 (182.61.200.7): icmp_seq=17 ttl=48 time=3.20 ms .... --- www.a.shifen.com ping statistics --- 123 packets transmitted, 123 received, 0% packet loss, time 128408ms rtt min/avg/max/mdev = 3.087/3.731/17.630/1.852 ms I will test migration once I get another system. (In reply to yalzhang from comment #30) > Hi Laurent, do you think the scenario can be simplified to "hotunplug the > hostdev interface, and check the network status on the vm with the backup > bridge type interface"? 
> I remembered I have encountered this issue when I tested
Yes, I think...
> unplug on our 82599ES system when the feature was introduced a long time ago,
> but it seems fixed now; the ping can keep working after the hostdev
> interface is unregistered, check the steps below:
I can see this problem when the host configuration doesn't disable "spoof checking" (Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+). In fact it depends on whether the card has an internal switch or relies on an external switch for the VF. Some configurations don't allow having several NICs with the same MAC (the virtio-net and the VFIO one).

Move RHEL-AV bugs to RHEL9. If necessary to resolve in RHEL8, then clone to the current RHEL8 release. Removed the ITR from all bugs as part of the change.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Can you show your network configuration: ifconfig -a inside the guest and ip link on the host. In comment 21 I noted that the configuration shown in comment 18 is wrong. It can't be that you have both the vfio and virtio devices with an IP address. Only the "failover (bonding)" device should have one IP configured.
Later, Juan.

I tried the bug on the following network cards and machines with RHEL 9, but I didn't reproduce the bug.
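Laurent's spoof-checking remark above points at one host-side knob worth trying. A hedged sketch of the relevant iproute2 commands, using the 82599ES PF name and VF index from the earlier comments as placeholders; whether disabling spoof checking is acceptable depends on the NIC and the site's security policy, so this is a diagnostic step, not a confirmed fix:

```shell
# Show the per-VF state on the PF; look for "spoof checking on".
ip link show enp5s0f0

# Disable spoof checking for VF 0 so the card does not drop frames while the
# virtio-net standby and the VF temporarily share the failover MAC.
ip link set enp5s0f0 vf 0 spoofchk off

# Optionally mark the VF as trusted so the guest may change its MAC and
# unicast filters.
ip link set enp5s0f0 vf 0 trust on
```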
[root@dell-per440-25 home]# lspci -s 0000:3b:00.1 3b:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01) [root@dell-per730-28 tests]# lspci -s 0000:06:00.1 06:00.1 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02) [root@dell-per440-22 ~]# lspci -s 0000:3b:00.1 3b:00.1 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02) [root@dell-per440-25 home]# rpm -q qemu-kvm qemu-kvm-6.2.0-8.el9.x86_64 [root@dell-per440-25 home]# uname -r 5.14.0-57.kpq0.el9.x86_64 [root@dell-per440-25 home]# rpm -q libvirt libvirt-8.0.0-3.el9.x86_64 hotunplug vf and keep ping vm from before hotunplug # virsh qemu-monitor-command rhel90 '{"execute":"device_del","arguments":{"id":"hostdev0"}}' [root@vm-74-194 ~]# ifconfig enp1s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 10.73.74.194 netmask 255.255.252.0 broadcast 10.73.75.255 inet6 2620:52:0:4948:f68e:38ff:fec3:9090 prefixlen 64 scopeid 0x0<global> inet6 fe80::f68e:38ff:fec3:9090 prefixlen 64 scopeid 0x20<link> ether f4:8e:38:c3:90:90 txqueuelen 1000 (Ethernet) RX packets 174477 bytes 15284807 (14.5 MiB) RX errors 0 dropped 4922 overruns 0 frame 0 TX packets 1299 bytes 109928 (107.3 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 enp4s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 192.168.200.132 netmask 255.255.255.0 broadcast 192.168.200.255 inet6 fe80::7629:599b:a503:e9df prefixlen 64 scopeid 0x20<link> inet6 2001::ca9a:3558:8328:e3e0 prefixlen 64 scopeid 0x0<global> ether 52:54:00:aa:1c:ef txqueuelen 1000 (Ethernet) RX packets 22650 bytes 2265604 (2.1 MiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 3374 bytes 328154 (320.4 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 enp4s0nsby: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 192.168.200.132 netmask 255.255.255.0 broadcast 192.168.200.255 inet6 fe80::17be:e17c:345e:a239 prefixlen 64 scopeid 0x20<link> ether 
52:54:00:aa:1c:ef txqueuelen 1000 (Ethernet) RX packets 22754 bytes 2276133 (2.1 MiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 3325 bytes 325312 (317.6 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 ping works well before hotunplug and after hotunplug. # ping 192.168.200.132 PING 192.168.200.132 (192.168.200.132) 56(84) bytes of data. 64 bytes from 192.168.200.132: icmp_seq=1 ttl=64 time=0.229 ms 64 bytes from 192.168.200.132: icmp_seq=2 ttl=64 time=0.215 ms 64 bytes from 192.168.200.132: icmp_seq=3 ttl=64 time=0.136 ms 64 bytes from 192.168.200.132: icmp_seq=4 ttl=64 time=0.205 ms 64 bytes from 192.168.200.132: icmp_seq=5 ttl=64 time=0.203 ms 64 bytes from 192.168.200.132: icmp_seq=6 ttl=64 time=0.203 ms 64 bytes from 192.168.200.132: icmp_seq=7 ttl=64 time=0.199 ms 64 bytes from 192.168.200.132: icmp_seq=8 ttl=64 time=0.202 ms 64 bytes from 192.168.200.132: icmp_seq=9 ttl=64 time=0.204 ms (In reply to Juan Quintela from comment #37) > Can you show your network configuration: > > ifconfig -a inside the guest Finally I can reproduce the issue with live migration. [root@localhost ~]# ping 192.168.200.79 PING 192.168.200.79 (192.168.200.79) 56(84) bytes of data. 
64 bytes from 192.168.200.79: icmp_seq=1 ttl=64 time=0.105 ms 64 bytes from 192.168.200.79: icmp_seq=2 ttl=64 time=0.112 ms 64 bytes from 192.168.200.79: icmp_seq=3 ttl=64 time=0.091 ms 64 bytes from 192.168.200.79: icmp_seq=4 ttl=64 time=0.092 ms [ 396.306453] virtio_net virtio2 enp4s0: failover primary slave:enp5s0 unregistered From 192.168.200.132 icmp_seq=150 Destination Host Unreachable From 192.168.200.132 icmp_seq=151 Destination Host Unreachable From 192.168.200.132 icmp_seq=152 Destination Host Unreachable Here are ifconfig -a in the guest: [root@localhost ~]# ifconfig -a enp4s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 192.168.200.132 netmask 255.255.255.0 broadcast 192.168.200.255 inet6 fe80::7629:599b:a503:e9df prefixlen 64 scopeid 0x20<link> inet6 2001::ca9a:3558:8328:e3e0 prefixlen 64 scopeid 0x0<global> ether 52:54:00:aa:1c:ef txqueuelen 1000 (Ethernet) RX packets 1162 bytes 115157 (112.4 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 535 bytes 45966 (44.8 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 enp4s0nsby: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 ether 52:54:00:aa:1c:ef txqueuelen 1000 (Ethernet) RX packets 1239 bytes 126915 (123.9 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 236 bytes 26624 (26.0 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 enp5s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 192.168.200.132 netmask 255.255.255.0 broadcast 192.168.200.255 inet6 fe80::6564:75b3:1b28:8516 prefixlen 64 scopeid 0x20<link> ether 52:54:00:aa:1c:ef txqueuelen 1000 (Ethernet) RX packets 210 bytes 23262 (22.7 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 195 bytes 14986 (14.6 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536 inet 127.0.0.1 netmask 255.0.0.0 inet6 ::1 prefixlen 128 scopeid 0x10<host> loop txqueuelen 1000 (Local Loopback) RX packets 186 bytes 20648 (20.1 KiB) RX errors 0 dropped 0 
overruns 0 frame 0 TX packets 186 bytes 20648 (20.1 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 > > and ip link > Here are ip link on host [root@dell-per440-25 ~]# ip link 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master switch state UP mode DEFAULT group default qlen 1000 link/ether f4:ee:08:0d:6e:bf brd ff:ff:ff:ff:ff:ff altname enp4s0f0 3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000 link/ether f4:ee:08:0d:6e:c0 brd ff:ff:ff:ff:ff:ff altname enp4s0f1 4: enp59s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000 link/ether 90:e2:ba:05:63:5e brd ff:ff:ff:ff:ff:ff 5: enp59s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master br0 state UP mode DEFAULT group default qlen 1000 link/ether 90:e2:ba:05:63:5f brd ff:ff:ff:ff:ff:ff vf 0 link/ether e2:94:f9:9e:1e:f8 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off 6: switch: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000 link/ether f4:ee:08:0d:6e:bf brd ff:ff:ff:ff:ff:ff 7: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000 link/ether 90:e2:ba:05:63:5f brd ff:ff:ff:ff:ff:ff 8: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000 link/ether 52:54:00:4f:51:aa brd ff:ff:ff:ff:ff:ff 49: enp59s0f1v0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000 link/ether e2:94:f9:9e:1e:f8 brd ff:ff:ff:ff:ff:ff > on the host. On #comment 21 I addressed that configuration shown on comment > 18 is wrong. It can't be that you have both vfio and virtio devices with ip > address. 
Only the "failover(bonding)" device should have configured one IP.
In our test, it seems we always have both the vfio and virtio devices with the same IP address.
> Later, Juan.

Laurent Vivier is the maintainer of VF, so I am leaving this to him. I still think that the network configuration is wrong, but I don't have hardware to check anymore. Could you post the configuration (xml and qemu command line when running) of the guest? Thanks, Juan.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there. Due to differences in account names between systems, some fields were not replicated. Be sure to add yourself to the Jira issue's "Watchers" field to continue receiving updates, and add others to the "Need Info From" field to continue requesting information. To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.
You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like: "Bugzilla Bug" = 1234567 In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information. |