Bug 2091528
| Summary: | the network in win2016/win2022 guest can't work after failover vf migration between MT2892 network cards | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Yanhui Ma <yama> |
| Component: | virtio-win | Assignee: | ybendito |
| virtio-win sub component: | virtio-win-prewhql | QA Contact: | Yanhui Ma <yama> |
| Status: | CLOSED MIGRATED | Docs Contact: | Jiri Herrmann <jherrman> |
| Severity: | medium | ||
| Priority: | medium | CC: | chayang, coli, gfialova, jinzhao, juzhang, lvivier, qizhu, virt-maint, yalzhang, ybendito, yvugenfi |
| Version: | 9.1 | Keywords: | MigratedToJIRA, Triaged |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Windows | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Known Issue | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-08-01 08:08:35 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Move to sst_virtualization_windows pool as the problem occurs only with Windows guests.

Perhaps the problem is related to the one seen with BZ 2090712?

(In reply to Laurent Vivier from comment #3)
> Perhaps the problem is related to the one seen with BZ 2090712?

Yes, the BZs are related.
The way failover works now, the protocol driver installation that is used to facilitate the binding in the Windows guest needs to know the exact PNP ID of the card to bind to, and this list is part of the installation.
As a first stage we need to add additional NICs to the list, and open a new BZ to work on a generic mechanism to identify the card that should be bound to the virtio-net device.

(In reply to Yvugenfi from comment #4)
Thanks for your explanation. Shall I assign the bug to you?

(In reply to Yanhui Ma from comment #5)
Assigning to Yuri, he is a feature owner.

Failover vf migration is only supported in RHV and it is technical preview. It is not supported in OSP and CNV. So set the priority to medium. If anything is wrong, please correct me.

Should be fixed in build 239
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=53700760

Hi Yuri,
It seems I can still reproduce the bug with the following package versions:
qemu-kvm-7.2.0-14.el9_2.x86_64
virtio-win driver:
100.93.104.23900
After migration, ping fails, the "network and sharing center" can't be opened, and the failover vf device is disabled and can't be enabled. See attachment please.
C:\Windows\system32>ipconfig /all
Windows IP Configuration
Host Name . . . . . . . . . . . . : WIN-5UFQ492T212
Primary Dns Suffix . . . . . . . :
Node Type . . . . . . . . . . . . : Hybrid
IP Routing Enabled. . . . . . . . : No
WINS Proxy Enabled. . . . . . . . : No
DNS Suffix Search List. . . . . . : lab.eng.pek2.redhat.com
Ethernet adapter Ethernet 30:
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : ConnectX Family mlx5Gen Virtual Function
Physical Address. . . . . . . . . : 52-54-00-AA-1C-EF
DHCP Enabled. . . . . . . . . . . : Yes
Autoconfiguration Enabled . . . . : Yes
Link-local IPv6 Address . . . . . : fe80::3893:79f2:7b37:e13%41(Preferred)
IPv4 Address. . . . . . . . . . . : 192.168.43.200(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Lease Obtained. . . . . . . . . . : Tuesday, July 18, 2023 5:57:22 AM
Lease Expires . . . . . . . . . . : Tuesday, July 18, 2023 6:07:21 AM
Default Gateway . . . . . . . . . : 192.168.43.2
DHCP Server . . . . . . . . . . . : 192.168.43.6
DHCPv6 IAID . . . . . . . . . . . : 693261312
DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-2C-45-FB-94-9A-E9-2D-4B-32-11
DNS Servers . . . . . . . . . . . : 192.168.43.2
NetBIOS over Tcpip. . . . . . . . : Enabled
Ethernet adapter Ethernet 28:
Connection-specific DNS Suffix . : lab.eng.pek2.redhat.com
Description . . . . . . . . . . . : Red Hat VirtIO Ethernet Adapter #28
Physical Address. . . . . . . . . : 52-54-00-01-22-22
DHCP Enabled. . . . . . . . . . . : Yes
Autoconfiguration Enabled . . . . : Yes
IPv6 Address. . . . . . . . . . . : 2620:52:0:49d2:f895:47e1:a96a:6b53(Preferred)
Link-local IPv6 Address . . . . . : fe80::f895:47e1:a96a:6b53%18(Preferred)
IPv4 Address. . . . . . . . . . . : 10.73.210.159(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.254.0
Lease Obtained. . . . . . . . . . : Tuesday, July 18, 2023 5:51:48 AM
Lease Expires . . . . . . . . . . : Tuesday, July 18, 2023 5:51:48 PM
Default Gateway . . . . . . . . . : fe80::52c7:903:533b:88e1%18
10.73.211.254
DHCP Server . . . . . . . . . . . : 10.73.2.108
DHCPv6 IAID . . . . . . . . . . . : 559043584
DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-2C-45-FB-94-9A-E9-2D-4B-32-11
DNS Servers . . . . . . . . . . . : 10.72.17.5
10.68.5.26
NetBIOS over Tcpip. . . . . . . . : Enabled
Ethernet adapter Ethernet 29:
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Red Hat VirtIO Ethernet Adapter #29
Physical Address. . . . . . . . . : 52-54-00-AA-1C-EF
DHCP Enabled. . . . . . . . . . . : Yes
Autoconfiguration Enabled . . . . : Yes
Link-local IPv6 Address . . . . . : fe80::d1da:916:56dd:5440%29(Preferred)
Autoconfiguration IPv4 Address. . : 169.254.84.64(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.0.0
Default Gateway . . . . . . . . . :
DHCPv6 IAID . . . . . . . . . . . : 491934720
DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-2C-45-FB-94-9A-E9-2D-4B-32-11
DNS Servers . . . . . . . . . . . : fec0:0:0:ffff::1%1
fec0:0:0:ffff::2%1
fec0:0:0:ffff::3%1
NetBIOS over Tcpip. . . . . . . . : Enabled
C:\Windows\system32>ping 192.168.43.6
Pinging 192.168.43.6 with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Ping statistics for 192.168.43.6:
Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),
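
The ipconfig output above shows two adapters sharing the MAC 52-54-00-AA-1C-EF: the VF holds the DHCP lease 192.168.43.200, while Red Hat VirtIO Ethernet Adapter #29 only has the autoconfiguration address 169.254.84.64. A minimal Python sketch for spotting that pattern in saved `ipconfig /all` text could look like the following. The function names are hypothetical, the parser handles only the fields used here, and it is a diagnostic aid rather than part of any virtio-win tooling.

```python
import re
from collections import defaultdict

# "Key . . . . : value" lines as printed by ipconfig /all.
FIELD = re.compile(r"^(.+?)(?:\s*\.)+\s*:\s*(.*)$")
HEADER = re.compile(r"^Ethernet adapter (.+):$")


def parse_adapters(text):
    """Return one dict per Ethernet adapter with name, description, mac, ipv4."""
    adapters, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = {"name": header.group(1), "description": "", "mac": "", "ipv4": ""}
            adapters.append(current)
            continue
        field = FIELD.match(line)
        if current is None or field is None:
            continue
        key, value = field.group(1), field.group(2)
        if key == "Description":
            current["description"] = value
        elif key == "Physical Address":
            current["mac"] = value.upper()
        elif key.endswith("IPv4 Address"):
            # Covers both "IPv4 Address" and "Autoconfiguration IPv4 Address".
            current["ipv4"] = value.split("(")[0]
    return adapters


def shared_mac_groups(adapters):
    """Group adapters by MAC and keep only MACs used by more than one adapter."""
    groups = defaultdict(list)
    for adapter in adapters:
        if adapter["mac"]:
            groups[adapter["mac"]].append(adapter)
    return {mac: members for mac, members in groups.items() if len(members) > 1}


def is_link_local(ipv4):
    """True for 169.254.x.x addresses, which Windows assigns when DHCP fails."""
    return ipv4.startswith("169.254.")


# Abbreviated copy of the two adapters from the output above.
SAMPLE = """\
Ethernet adapter Ethernet 30:
Description . . . . . . . . . . . : ConnectX Family mlx5Gen Virtual Function
Physical Address. . . . . . . . . : 52-54-00-AA-1C-EF
IPv4 Address. . . . . . . . . . . : 192.168.43.200(Preferred)
Ethernet adapter Ethernet 29:
Description . . . . . . . . . . . : Red Hat VirtIO Ethernet Adapter #29
Physical Address. . . . . . . . . : 52-54-00-AA-1C-EF
Autoconfiguration IPv4 Address. . : 169.254.84.64(Preferred)
"""

for mac, members in shared_mac_groups(parse_adapters(SAMPLE)).items():
    for adapter in members:
        note = " (link-local only)" if is_link_local(adapter["ipv4"]) else ""
        print(f"{mac} {adapter['description']}: {adapter['ipv4']}{note}")
```

Two adapters that share a MAC while both carry their own IPv4 configuration suggest that the VF was not hidden behind the virtio-net device, which matches the symptom reported here.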
Here is a bug that may be related to 'ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off', and the bug was fixed in qemu-kvm-8.0.0-7.el9. But for failover vf migration, there is one qemu-kvm crash bug in qemu-kvm-8.0.0-7.el9.
https://issues.redhat.com/browse/RHEL-832

Bug 2128929 - [rhel9.2] hotplug/hotunplug mlx vdpa device to the occupied addr port, then qemu core dump occurs after shutdown guest

(In reply to Yanhui Ma from comment #19)
I think the BZ https://bugzilla.redhat.com/show_bug.cgi?id=2128929 is not related to the failover problem.
The mentioned BZ is about a _plug_ problem into a wrong/occupied address.
_Our_ problem is about _unplug_ of the VF during migration with failover.
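
Comment #4 earlier in this thread explains that the protocol driver installation carries a fixed list of PNP IDs and binds only to cards on that list. The idea can be sketched in Python as a simple allowlist lookup. Everything here is an assumption for illustration: the function names, the `SUPPORTED_VF_IDS` set and the vendor/device pairs are hypothetical and are not taken from the virtio-win installation files.

```python
import re

# Hypothetical allowlist of (vendor, device) pairs that the installation
# would know about. The entry below is an illustrative value only.
SUPPORTED_VF_IDS = {
    ("8086", "10ED"),
}

# Windows PCI hardware IDs start with PCI\VEN_xxxx&DEV_yyyy.
HWID = re.compile(r"^PCI\\VEN_([0-9A-F]{4})&DEV_([0-9A-F]{4})", re.IGNORECASE)


def vendor_device(hardware_id):
    """Extract (vendor, device) from a PCI hardware ID, or None if it does not parse."""
    match = HWID.match(hardware_id)
    if match is None:
        return None
    return (match.group(1).upper(), match.group(2).upper())


def can_bind(hardware_id, allowlist=SUPPORTED_VF_IDS):
    """True only when the card's vendor/device pair is on the allowlist."""
    key = vendor_device(hardware_id)
    return key is not None and key in allowlist


# A card that is missing from the list is never bound, whatever its MAC is.
# The hardware ID string below is an assumed example, not captured from the guest.
new_card = r"PCI\VEN_15B3&DEV_101E&REV_00"
print(can_bind(new_card))
print(can_bind(new_card, SUPPORTED_VF_IDS | {("15B3", "101E")}))
```

With a static list like this, every new NIC model needs an installation update, which is why the comment proposes a generic mechanism as a follow-up.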
Description of problem:
After live migration of a win2016/win2022 guest with a failover vf between MT2892 network cards on both src and dst hosts, ping fails in the Windows guest, the "network and sharing center" can't be opened, and the IP may be lost after a while. The guest also can't be rebooted; a reboot ends in a black screen.

Version-Release number of selected component (if applicable):
# rpm -q qemu-kvm
qemu-kvm-7.0.0-3.el9.x86_64
# uname -r
5.14.0-92.el9.x86_64
host nic info:
Mellanox Technologies MT2892 Family [ConnectX-6 Dx]

How reproducible:
100%

Steps to Reproduce:
1. create vf on both src host and dst host
echo 1 > /sys/bus/pci/devices/0000\:1a\:00.1/sriov_numvfs
2. create failover-vf and failover-bridge network on both src and dst host
# virsh net-dumpxml failover-bridge
<network connections='1'>
  <name>failover-bridge</name>
  <uuid>1943a508-b0b7-4274-be5a-6f0143d10f40</uuid>
  <forward mode='bridge'/>
  <bridge name='br0'/>
</network>
# virsh net-dumpxml failover-vf
<network connections='1'>
  <name>failover-vf</name>
  <uuid>4319b666-8f4b-410a-886f-17b6df772224</uuid>
  <forward mode='hostdev' managed='yes'>
    <address type='pci' domain='0x0000' bus='0x1a' slot='0x08' function='0x2'/>
  </forward>
</network>
3. boot win2016/win2022 guest with failover vf on src host
<interface type='network'>
  <mac address='52:54:00:aa:1c:ef'/>
  <source network='failover-bridge'/>
  <model type='virtio'/>
  <teaming type='persistent'/>
  <alias name='ua-test'/>
  <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</interface>
<interface type='network'>
  <mac address='52:54:00:aa:1c:ef'/>
  <source network='failover-vf'/>
  <teaming type='transient' persistent='ua-test'/>
  <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
</interface>
4. live migrate the guest
5. after migration, check the network in guest
6. reboot the guest or scan for hardware changes via Device Manager in guest

Actual results:
After step 5, ping fails and the "network and sharing center" can't be opened, see the attachment.
# ipconfig /all
Windows IP Configuration
Host Name . . . . . . . . . . . . : WIN-A1AR6C3G7HJ
Primary Dns Suffix . . . . . . . :
Node Type . . . . . . . . . . . . : Hybrid
IP Routing Enabled. . . . . . . . : No
WINS Proxy Enabled. . . . . . . . : No
DNS Suffix Search List. . . . . . : lab.eng.pek2.redhat.com
Ethernet adapter Ethernet Instance 0 9:
Media State . . . . . . . . . . . : Media disconnected
Connection-specific DNS Suffix . : lab.eng.pek2.redhat.com
Description . . . . . . . . . . . : Red Hat VirtIO Ethernet Adapter #3
Physical Address. . . . . . . . . : 52-54-00-AA-1C-EF
DHCP Enabled. . . . . . . . . . . : Yes
Autoconfiguration Enabled . . . . : Yes
Ethernet adapter Ethernet Instance 0 13:
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : ConnectX Family mlx5Gen Virtual Function #3
Physical Address. . . . . . . . . : 52-54-00-AA-1C-EF
DHCP Enabled. . . . . . . . . . . : Yes
Autoconfiguration Enabled . . . . : Yes
Link-local IPv6 Address . . . . . : fe80::a5f1:2227:8917:709f%17(Preferred)
IPv4 Address. . . . . . . . . . . : 192.168.43.200(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Lease Obtained. . . . . . . . . . : Monday, May 30, 2022 3:58:52 AM
Lease Expires . . . . . . . . . . : Monday, May 30, 2022 4:08:52 AM
Default Gateway . . . . . . . . . : 192.168.43.2
DHCP Server . . . . . . . . . . . : 192.168.43.6
DHCPv6 IAID . . . . . . . . . . . : 853941549
DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-29-3A-24-B9-9A-9B-AB-AE-4E-E3
DNS Servers . . . . . . . . . . . : 192.168.43.2
NetBIOS over Tcpip. . . . . . . . : Enabled
Ethernet adapter Ethernet:
Connection-specific DNS Suffix . : lab.eng.pek2.redhat.com
Description . . . . . . . . . . . : Red Hat VirtIO Ethernet Adapter #2
Physical Address. . . . . . . . . : 52-54-00-01-22-22
DHCP Enabled. . . . . . . . . . . : Yes
Autoconfiguration Enabled . . . . : Yes
IPv6 Address. . . . . . . . . . . : 2620:52:0:49d2:78cd:cab3:51da:bf42(Preferred)
Link-local IPv6 Address . . . . . : fe80::78cd:cab3:51da:bf42%23(Preferred)
IPv4 Address. . . . . . . . . . . : 10.73.211.223(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.254.0
Lease Obtained. . . . . . . . . . : Monday, May 30, 2022 3:58:36 AM
Lease Expires . . . . . . . . . . : Tuesday, May 31, 2022 3:58:35 AM
Default Gateway . . . . . . . . . : fe80::52c7:903:533b:88e1%23
10.73.211.254
DHCP Server . . . . . . . . . . . : 10.73.2.108
DHCPv6 IAID . . . . . . . . . . . : 122835968
DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-29-3A-24-B9-9A-9B-AB-AE-4E-E3
DNS Servers . . . . . . . . . . . : 10.73.2.107
10.73.2.108
10.66.127.10
NetBIOS over Tcpip. . . . . . . . : Enabled
# ping 192.168.43.6
Pinging 192.168.43.6 with 32 bytes of data:
Reply from 192.168.43.200: Destination host unreachable.
Request timed out.
Request timed out.
Request timed out.
Ping statistics for 192.168.43.6:
Packets: Sent = 4, Received = 1, Lost = 3 (75% loss)
# ping 192.168.43.101
Pinging 192.168.43.101 with 32 bytes of data:
Request timed out.
Request timed out.
Reply from 192.168.43.200: Destination host unreachable.
Reply from 192.168.43.200: Destination host unreachable.
Ping statistics for 192.168.43.101:
Packets: Sent = 4, Received = 2, Lost = 2 (50% loss)

After step 6, the guest can't be rebooted; it ends in a black screen.

Expected results:
After migration, pinging the src host IP and the dst host IP works.

Additional info:
RHEL9.1 guest doesn't have the issue.
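
The reproduction steps rely on a libvirt teaming pair: a persistent virtio interface with an alias, and a transient hostdev interface that names that alias and uses the same MAC. Before suspecting the guest driver, it can help to confirm that the domain XML is consistent. The following Python sketch does that with the standard library only; `check_teaming` is a hypothetical helper, and the `<devices>` wrapper around the two interfaces from step 3 is added here just to make the fragment parseable.

```python
import xml.etree.ElementTree as ET


def check_teaming(xml_text):
    """Return a list of problems found in the teaming setup (empty if consistent)."""
    root = ET.fromstring(xml_text)
    persistent = {}  # alias name -> MAC of the persistent (virtio) interface
    transient = []   # (referenced alias, MAC) of each transient (VF) interface
    for iface in root.iter("interface"):
        teaming = iface.find("teaming")
        mac = iface.find("mac")
        if teaming is None or mac is None:
            continue
        address = mac.get("address", "").lower()
        if teaming.get("type") == "persistent":
            alias = iface.find("alias")
            if alias is not None:
                persistent[alias.get("name")] = address
        elif teaming.get("type") == "transient":
            transient.append((teaming.get("persistent"), address))

    problems = []
    for alias, address in transient:
        if alias not in persistent:
            problems.append(f"transient interface references unknown alias {alias!r}")
        elif persistent[alias] != address:
            problems.append(
                f"MAC mismatch for alias {alias!r}: {persistent[alias]} vs {address}"
            )
    return problems


# The two interfaces from step 3, wrapped in a root element for parsing.
SAMPLE = """\
<devices>
  <interface type='network'>
    <mac address='52:54:00:aa:1c:ef'/>
    <source network='failover-bridge'/>
    <model type='virtio'/>
    <teaming type='persistent'/>
    <alias name='ua-test'/>
  </interface>
  <interface type='network'>
    <mac address='52:54:00:aa:1c:ef'/>
    <source network='failover-vf'/>
    <teaming type='transient' persistent='ua-test'/>
  </interface>
</devices>
"""

print(check_teaming(SAMPLE) or "teaming configuration is consistent")
```

For the XML in this report the check finds nothing wrong, which is consistent with the thread's conclusion that the failure lies in the Windows guest binding rather than in the host configuration.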