Bug 2091528

Summary: the network in win2016/win2022 guest can't work after failover vf migration between MT2892 network cards
Product: Red Hat Enterprise Linux 9 Reporter: Yanhui Ma <yama>
Component: virtio-win Assignee: ybendito
virtio-win sub component: virtio-win-prewhql QA Contact: Yanhui Ma <yama>
Status: CLOSED MIGRATED Docs Contact: Jiri Herrmann <jherrman>
Severity: medium    
Priority: medium CC: chayang, coli, gfialova, jinzhao, juzhang, lvivier, qizhu, virt-maint, yalzhang, ybendito, yvugenfi
Version: 9.1 Keywords: MigratedToJIRA, Triaged
Target Milestone: rc Flags: pm-rhel: mirror+
Target Release: ---   
Hardware: x86_64   
OS: Windows   
Whiteboard:
Fixed In Version: Doc Type: Known Issue
Last Closed: 2023-08-01 08:08:35 UTC Type: ---

Description Yanhui Ma 2022-05-30 08:36:03 UTC
Description of problem:

After live migrating a win2016/win2022 guest with a failover VF between src and dst hosts that both use MT2892 network cards, ping fails in the Windows guest and the "Network and Sharing Center" can't be opened. The IP address may also be lost after a while. In addition, the guest can't be rebooted: a reboot attempt results in a black screen.

Version-Release number of selected component (if applicable):

# rpm -q qemu-kvm
qemu-kvm-7.0.0-3.el9.x86_64
# uname -r
5.14.0-92.el9.x86_64
host nic info:
Mellanox Technologies MT2892 Family [ConnectX-6 Dx]

How reproducible:
100%

Steps to Reproduce:
1. Create a VF on both the src host and the dst host

echo 1 > /sys/bus/pci/devices/0000\:1a\:00.1/sriov_numvfs

2. Create the failover-vf and failover-bridge networks on both the src and dst hosts

# virsh net-dumpxml failover-bridge 
<network connections='1'>
  <name>failover-bridge</name>
  <uuid>1943a508-b0b7-4274-be5a-6f0143d10f40</uuid>
  <forward mode='bridge'/>
  <bridge name='br0'/>
</network>

# virsh net-dumpxml failover-vf
<network connections='1'>
  <name>failover-vf</name>
  <uuid>4319b666-8f4b-410a-886f-17b6df772224</uuid>
  <forward mode='hostdev' managed='yes'>
    <address type='pci' domain='0x0000' bus='0x1a' slot='0x08' function='0x2'/>
  </forward>
</network>

3. Boot the win2016/win2022 guest with a failover VF on the src host
    <interface type='network'>
      <mac address='52:54:00:aa:1c:ef'/>
      <source network='failover-bridge'/>
      <model type='virtio'/>
      <teaming type='persistent'/>
      <alias name='ua-test'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:aa:1c:ef'/>
      <source network='failover-vf'/>
      <teaming type='transient' persistent='ua-test'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </interface>

4. Live migrate the guest
5. After migration, check the network in the guest
6. Reboot the guest or scan for hardware changes via Device Manager in the guest
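
Before migrating, the interface pair from step 3 can be sanity-checked. In that XML both interfaces carry the same MAC address and the transient (VF) interface points at the alias of the persistent (virtio) one. The following is a minimal Python sketch of such a check, not part of the original reproducer: the function name check_teaming and the <devices> wrapper around the fragment are assumptions made so the snippet parses on its own.

```python
import xml.etree.ElementTree as ET

# Interface pair from step 3, wrapped in <devices> only so it parses standalone.
DOMAIN_FRAGMENT = """
<devices>
  <interface type='network'>
    <mac address='52:54:00:aa:1c:ef'/>
    <source network='failover-bridge'/>
    <model type='virtio'/>
    <teaming type='persistent'/>
    <alias name='ua-test'/>
  </interface>
  <interface type='network'>
    <mac address='52:54:00:aa:1c:ef'/>
    <source network='failover-vf'/>
    <teaming type='transient' persistent='ua-test'/>
  </interface>
</devices>
"""


def check_teaming(xml_text):
    """Return a list of problems found in the teaming pair (empty if none)."""
    root = ET.fromstring(xml_text)
    persistent = {}  # alias name -> MAC of the persistent (virtio) interface
    transient = []   # (referenced alias, MAC) for each transient (VF) interface
    for iface in root.iter("interface"):
        teaming = iface.find("teaming")
        if teaming is None:
            continue
        mac_el = iface.find("mac")
        mac = mac_el.get("address", "").lower() if mac_el is not None else ""
        if teaming.get("type") == "persistent":
            alias = iface.find("alias")
            name = alias.get("name") if alias is not None else None
            persistent[name] = mac
        elif teaming.get("type") == "transient":
            transient.append((teaming.get("persistent"), mac))

    problems = []
    for ref, mac in transient:
        if ref not in persistent:
            problems.append(f"transient interface references unknown alias {ref!r}")
        elif persistent[ref] != mac:
            problems.append(f"MAC mismatch for alias {ref!r}: {persistent[ref]} vs {mac}")
    return problems


if __name__ == "__main__":
    print(check_teaming(DOMAIN_FRAGMENT) or "teaming pair looks consistent")
```

The sketch only inspects the XML; it does not tell whether the guest driver will actually bind the VF to the virtio-net device.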


Actual results:

After step 5, ping fails and the "Network and Sharing Center" can't be opened; see the attachment.
# ipconfig /all
Windows IP Configuration

   Host Name . . . . . . . . . . . . : WIN-A1AR6C3G7HJ
   Primary Dns Suffix  . . . . . . . : 
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : No
   WINS Proxy Enabled. . . . . . . . : No
   DNS Suffix Search List. . . . . . : lab.eng.pek2.redhat.com

Ethernet adapter Ethernet Instance 0 9:

   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . : lab.eng.pek2.redhat.com
   Description . . . . . . . . . . . : Red Hat VirtIO Ethernet Adapter #3
   Physical Address. . . . . . . . . : 52-54-00-AA-1C-EF
   DHCP Enabled. . . . . . . . . . . : Yes
   Autoconfiguration Enabled . . . . : Yes

Ethernet adapter Ethernet Instance 0 13:

   Connection-specific DNS Suffix  . : 
   Description . . . . . . . . . . . : ConnectX Family mlx5Gen Virtual Function #3
   Physical Address. . . . . . . . . : 52-54-00-AA-1C-EF
   DHCP Enabled. . . . . . . . . . . : Yes
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::a5f1:2227:8917:709f%17(Preferred) 
   IPv4 Address. . . . . . . . . . . : 192.168.43.200(Preferred) 
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Lease Obtained. . . . . . . . . . : Monday, May 30, 2022 3:58:52 AM
   Lease Expires . . . . . . . . . . : Monday, May 30, 2022 4:08:52 AM
   Default Gateway . . . . . . . . . : 192.168.43.2
   DHCP Server . . . . . . . . . . . : 192.168.43.6
   DHCPv6 IAID . . . . . . . . . . . : 853941549
   DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-29-3A-24-B9-9A-9B-AB-AE-4E-E3
   DNS Servers . . . . . . . . . . . : 192.168.43.2
   NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter Ethernet:

   Connection-specific DNS Suffix  . : lab.eng.pek2.redhat.com
   Description . . . . . . . . . . . : Red Hat VirtIO Ethernet Adapter #2
   Physical Address. . . . . . . . . : 52-54-00-01-22-22
   DHCP Enabled. . . . . . . . . . . : Yes
   Autoconfiguration Enabled . . . . : Yes
   IPv6 Address. . . . . . . . . . . : 2620:52:0:49d2:78cd:cab3:51da:bf42(Preferred) 
   Link-local IPv6 Address . . . . . : fe80::78cd:cab3:51da:bf42%23(Preferred) 
   IPv4 Address. . . . . . . . . . . : 10.73.211.223(Preferred) 
   Subnet Mask . . . . . . . . . . . : 255.255.254.0
   Lease Obtained. . . . . . . . . . : Monday, May 30, 2022 3:58:36 AM
   Lease Expires . . . . . . . . . . : Tuesday, May 31, 2022 3:58:35 AM
   Default Gateway . . . . . . . . . : fe80::52c7:903:533b:88e1%23
                                       10.73.211.254
   DHCP Server . . . . . . . . . . . : 10.73.2.108
   DHCPv6 IAID . . . . . . . . . . . : 122835968
   DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-29-3A-24-B9-9A-9B-AB-AE-4E-E3
   DNS Servers . . . . . . . . . . . : 10.73.2.107
                                       10.73.2.108
                                       10.66.127.10
   NetBIOS over Tcpip. . . . . . . . : Enabled


# ping 192.168.43.6
Pinging 192.168.43.6 with 32 bytes of data:
Reply from 192.168.43.200: Destination host unreachable.
Request timed out.
Request timed out.
Request timed out.

Ping statistics for 192.168.43.6:
    Packets: Sent = 4, Received = 1, Lost = 3 (75% loss)

# ping 192.168.43.101
Pinging 192.168.43.101 with 32 bytes of data:
Request timed out.
Request timed out.
Reply from 192.168.43.200: Destination host unreachable.
Reply from 192.168.43.200: Destination host unreachable.

Ping statistics for 192.168.43.101:
    Packets: Sent = 4, Received = 2, Lost = 2 (50% loss)
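
A note on reading the statistics above: the "Reply from 192.168.43.200: Destination host unreachable." lines come from the guest's own address, yet ping counts them as received, so "Received = 1" and "Received = 2" do not mean the target answered. A minimal Python sketch that counts only genuine echo replies follows. The function name count_echo_replies is made up for illustration, and it assumes the usual "Reply from <ip>: bytes=..." line format of a successful reply.

```python
import re

# First ping run from the output above.
SAMPLE = """\
Pinging 192.168.43.6 with 32 bytes of data:
Reply from 192.168.43.200: Destination host unreachable.
Request timed out.
Request timed out.
Request timed out.
"""


def count_echo_replies(output, target):
    """Count lines that are real echo replies sent by `target`.

    Locally generated 'Destination host unreachable' lines are ignored,
    because they do not start with 'Reply from <target>: bytes='.
    """
    pattern = re.compile(
        r"^Reply from " + re.escape(target) + r": bytes=\d+",
        re.MULTILINE,
    )
    return len(pattern.findall(output))


if __name__ == "__main__":
    print(count_echo_replies(SAMPLE, "192.168.43.6"))
```

With this reading, both ping runs above are complete failures, which matches the 100% loss seen later with the newer driver build.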

After step 6, the guest can't be rebooted; the reboot attempt results in a black screen.

Expected results:

After migration, pinging the src host IP and the dst host IP from the guest works.

Additional info:

A RHEL 9.1 guest doesn't have the issue.

Comment 2 Laurent Vivier 2022-05-31 07:26:46 UTC
Moving to the sst_virtualization_windows pool, as the problem occurs only with Windows guests.

Comment 3 Laurent Vivier 2022-05-31 15:13:50 UTC
Perhaps the problem is related to the one seen with BZ 2090712?

Comment 4 Yvugenfi@redhat.com 2022-06-23 09:36:14 UTC
(In reply to Laurent Vivier from comment #3)
> Perhaps the problem is related to the one seen with BZ 2090712?

Yes, the BZs are related.
The way failover works now, the protocol driver installation that facilitates the binding in the Windows guest needs to know the exact PNP ID of the card to bind to, and this list is part of the installation.
As a first stage, we need to add additional NICs to the list, and open a new BZ to work on a generic mechanism to identify the card that should be bound to the virtio-net device.
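
To illustrate why a NIC that is missing from the list cannot be bound, here is a minimal Python sketch of an exact-match lookup. Everything in it is hypothetical: the IDs are placeholders, and SUPPORTED_VF_IDS, normalize and can_bind are names invented for this example. The real list and matching logic belong to the driver installation and are not shown in this bug.

```python
# Hypothetical allow-list of PNP IDs; placeholder values only.
SUPPORTED_VF_IDS = {
    r"PCI\VEN_1234&DEV_0001",
}


def normalize(hardware_id):
    """Keep only the vendor/device part of an ID and upper-case it."""
    parts = hardware_id.upper().split("&")
    return "&".join(parts[:2])


def can_bind(hardware_id, supported=SUPPORTED_VF_IDS):
    """Return True only when the card's ID is an exact member of the list."""
    return normalize(hardware_id) in supported


if __name__ == "__main__":
    new_card = r"PCI\VEN_1234&DEV_0002&SUBSYS_00000000&REV_00"  # placeholder ID
    print(can_bind(new_card))                       # card not in the list
    extended = SUPPORTED_VF_IDS | {normalize(new_card)}
    print(can_bind(new_card, extended))             # after adding it to the list
```

An exact-match list like this has to be extended for every new VF model, which is why a generic identification mechanism is proposed as the second stage.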

Comment 5 Yanhui Ma 2022-06-23 09:51:30 UTC
(In reply to Yvugenfi from comment #4)
> (In reply to Laurent Vivier from comment #3)
> > Perhaps the problem is related to the one seen with BZ 2090712?
> 
> Yes, the BZs are related.
> The way failover works now, the protocol driver installation that
> facilitates the binding in the Windows guest needs to know the exact PNP ID
> of the card to bind to, and this list is part of the installation.
> As a first stage, we need to add additional NICs to the list, and open a
> new BZ to work on a generic mechanism to identify the card that should be
> bound to the virtio-net device.

Thanks for your explanation. Shall I assign the bug to you?

Comment 6 Yvugenfi@redhat.com 2022-06-25 07:55:59 UTC
(In reply to Yanhui Ma from comment #5)
> (In reply to Yvugenfi from comment #4)
> > (In reply to Laurent Vivier from comment #3)
> > > Perhaps the problem is related to the one seen with BZ 2090712?
> > 
> > Yes, the BZs are related.
> > The way failover works now, the protocol driver installation that
> > facilitates the binding in the Windows guest needs to know the exact PNP ID
> > of the card to bind to, and this list is part of the installation.
> > As a first stage, we need to add additional NICs to the list, and open a
> > new BZ to work on a generic mechanism to identify the card that should be
> > bound to the virtio-net device.
> 
> Thanks for your explanation. Shall I assign the bug to you?

Assigning to Yuri; he is the feature owner.

Comment 11 Yanhui Ma 2023-03-24 04:37:27 UTC
Failover VF migration is only supported in RHV, where it is a Technology Preview. It is not supported in OSP or CNV, so I am setting the priority to medium.
Please correct me if anything is wrong.

Comment 12 ybendito 2023-07-13 09:56:12 UTC
Should be fixed in build 239 https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=53700760

Comment 13 Yanhui Ma 2023-07-18 07:28:35 UTC
Hi Yuri,

It seems I can still reproduce the bug with the following package versions:
qemu-kvm-7.2.0-14.el9_2.x86_64
virtio-win driver:
100.93.104.23900

After migration, ping fails, the "Network and Sharing Center" can't be opened, and the failover VF device is disabled and can't be enabled. Please see the attachment.


C:\Windows\system32>ipconfig /all

Windows IP Configuration

   Host Name . . . . . . . . . . . . : WIN-5UFQ492T212
   Primary Dns Suffix  . . . . . . . : 
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : No
   WINS Proxy Enabled. . . . . . . . : No
   DNS Suffix Search List. . . . . . : lab.eng.pek2.redhat.com

Ethernet adapter Ethernet 30:

   Connection-specific DNS Suffix  . : 
   Description . . . . . . . . . . . : ConnectX Family mlx5Gen Virtual Function
   Physical Address. . . . . . . . . : 52-54-00-AA-1C-EF
   DHCP Enabled. . . . . . . . . . . : Yes
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::3893:79f2:7b37:e13%41(Preferred) 
   IPv4 Address. . . . . . . . . . . : 192.168.43.200(Preferred) 
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Lease Obtained. . . . . . . . . . : Tuesday, July 18, 2023 5:57:22 AM
   Lease Expires . . . . . . . . . . : Tuesday, July 18, 2023 6:07:21 AM
   Default Gateway . . . . . . . . . : 192.168.43.2
   DHCP Server . . . . . . . . . . . : 192.168.43.6
   DHCPv6 IAID . . . . . . . . . . . : 693261312
   DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-2C-45-FB-94-9A-E9-2D-4B-32-11
   DNS Servers . . . . . . . . . . . : 192.168.43.2
   NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter Ethernet 28:

   Connection-specific DNS Suffix  . : lab.eng.pek2.redhat.com
   Description . . . . . . . . . . . : Red Hat VirtIO Ethernet Adapter #28
   Physical Address. . . . . . . . . : 52-54-00-01-22-22
   DHCP Enabled. . . . . . . . . . . : Yes
   Autoconfiguration Enabled . . . . : Yes
   IPv6 Address. . . . . . . . . . . : 2620:52:0:49d2:f895:47e1:a96a:6b53(Preferred) 
   Link-local IPv6 Address . . . . . : fe80::f895:47e1:a96a:6b53%18(Preferred) 
   IPv4 Address. . . . . . . . . . . : 10.73.210.159(Preferred) 
   Subnet Mask . . . . . . . . . . . : 255.255.254.0
   Lease Obtained. . . . . . . . . . : Tuesday, July 18, 2023 5:51:48 AM
   Lease Expires . . . . . . . . . . : Tuesday, July 18, 2023 5:51:48 PM
   Default Gateway . . . . . . . . . : fe80::52c7:903:533b:88e1%18
                                       10.73.211.254
   DHCP Server . . . . . . . . . . . : 10.73.2.108
   DHCPv6 IAID . . . . . . . . . . . : 559043584
   DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-2C-45-FB-94-9A-E9-2D-4B-32-11
   DNS Servers . . . . . . . . . . . : 10.72.17.5
                                       10.68.5.26
   NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter Ethernet 29:

   Connection-specific DNS Suffix  . : 
   Description . . . . . . . . . . . : Red Hat VirtIO Ethernet Adapter #29
   Physical Address. . . . . . . . . : 52-54-00-AA-1C-EF
   DHCP Enabled. . . . . . . . . . . : Yes
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::d1da:916:56dd:5440%29(Preferred) 
   Autoconfiguration IPv4 Address. . : 169.254.84.64(Preferred) 
   Subnet Mask . . . . . . . . . . . : 255.255.0.0
   Default Gateway . . . . . . . . . : 
   DHCPv6 IAID . . . . . . . . . . . : 491934720
   DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-2C-45-FB-94-9A-E9-2D-4B-32-11
   DNS Servers . . . . . . . . . . . : fec0:0:0:ffff::1%1
                                       fec0:0:0:ffff::2%1
                                       fec0:0:0:ffff::3%1
   NetBIOS over Tcpip. . . . . . . . : Enabled

C:\Windows\system32>ping 192.168.43.6

Pinging 192.168.43.6 with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.

Ping statistics for 192.168.43.6:
    Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),

Comment 19 Yanhui Ma 2023-07-21 09:11:08 UTC
There is a bug that may be related to 'ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off', and it was fixed in qemu-kvm-8.0.0-7.el9. But for failover VF migration, there is a qemu-kvm crash bug in qemu-kvm-8.0.0-7.el9.
https://issues.redhat.com/browse/RHEL-832


Bug 2128929 - [rhel9.2] hotplug/hotunplug mlx vdpa device to the occupied addr port, then qemu core dump occurs after shutdown guest

Comment 20 ybendito 2023-07-21 11:13:33 UTC
(In reply to Yanhui Ma from comment #19)
> There is a bug that may be related to
> 'ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off', and it was fixed
> in qemu-kvm-8.0.0-7.el9. But for failover VF migration, there is a
> qemu-kvm crash bug in qemu-kvm-8.0.0-7.el9.
> https://issues.redhat.com/browse/RHEL-832
> 
> 
> Bug 2128929 - [rhel9.2] hotplug/hotunplug mlx vdpa device to the occupied
> addr port, then qemu core dump occurs after shutdown guest

I think BZ https://bugzilla.redhat.com/show_bug.cgi?id=2128929 is not related to the failover problem.
The mentioned BZ is about a _plug_ problem into a wrong/occupied address.
_Our_ problem is about the _unplug_ of the VF during migration with failover.