Bug 2126106 - Failed to attach SR-IOV network interfaces when live migrating a VM
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Networking
Version: 4.10.3
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.10.6
Assignee: Petr Horáček
QA Contact: Yossi Segev
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-09-12 12:45 UTC by Juan Orti
Modified: 2023-09-19 04:26 UTC (History)

Fixed In Version: v4.10.6-30
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-30 14:01:29 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | kubevirt kubevirt pull 6581 | 0 | None | Merged | SR-IOV Migration: Move attach SRIOV devices to virt-handler | 2022-10-03 15:32:36 UTC
Github | kubevirt kubevirt pull 8560 | 0 | None | open | [release-0.49] SR-IOV Migration: Move attach SRIOV devices to virt-handler | 2022-10-03 15:32:36 UTC
Red Hat Issue Tracker | CNV-21197 | 0 | None | None | None | 2022-12-28 10:25:13 UTC
Red Hat Knowledge Base (Solution) | 6975398 | 0 | None | None | None | 2022-09-12 13:08:05 UTC

Description Juan Orti 2022-09-12 12:45:59 UTC
Description of problem:
When live migrating a VM with several SR-IOV network interfaces, some of the NICs fail to attach on the target host with this error:

    Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainMigratePrepare3Params)

It looks like virt-launcher tries to hot-plug the host devices too early in the migration, while the state change lock is still held by remoteDispatchDomainMigratePrepare3Params and domain modification is not allowed.

Version-Release number of selected component (if applicable):
OpenShift 4.10.23
kubevirt-hyperconverged-operator.v4.10.3
sriov-network-operator.4.10.0-202207192148

How reproducible:
Always in customer environment.

Steps to Reproduce:
1. Have a VMI with several SR-IOV NICs:

~~~
          interfaces:
          - bridge: {}
            macAddress: aa:bb:cc:dd:ee:00
            model: virtio
            name: nic-1
          - macAddress: aa:bb:cc:dd:ee:01
            model: virtio
            name: nic-2
            pciAddress: "0000:20:00.0"
            sriov: {}
          - macAddress: aa:bb:cc:dd:ee:02
            model: virtio
            name: nic-3
            pciAddress: "0000:21:00.0"
            sriov: {}
          - macAddress: aa:bb:cc:dd:ee:03
            model: virtio
            name: nic-4
            pciAddress: "0000:22:00.0"
            sriov: {}
          - macAddress: aa:bb:cc:dd:ee:04
            model: virtio
            name: nic-5
            pciAddress: "0000:23:00.0"
            sriov: {}
          - macAddress: aa:bb:cc:dd:ee:05
            model: virtio
            name: nic-6
            pciAddress: "0000:24:00.0"
            sriov: {}
          - macAddress: aa:bb:cc:dd:ee:06
            model: virtio
            name: nic-7
            pciAddress: "0000:25:00.0"
            sriov: {}
~~~

2. Live migrate the VM
3. After the migration, verify whether the VM has all the NICs connected and check the virt-launcher pod log

Actual results:
Some hot-plug operations fail:

~~~
{"component":"virt-launcher","level":"info","msg":"Successfully hot-plug host-device: sriov-nic-3 (\u0026{pci 0x0000 0x60 0x12 0x1    })","pos":"hotplug.go:166","timestamp":"2022-09-06T13:12:08.284556Z"}
{"component":"virt-launcher","level":"info","msg":"Successfully hot-plug host-device: sriov-nic-4 (\u0026{pci 0x0000 0x60 0x09 0x1    })","pos":"hotplug.go:166","timestamp":"2022-09-06T13:12:08.581104Z"}
{"component":"virt-launcher","level":"info","msg":"Successfully hot-plug host-device: sriov-nic-5 (\u0026{pci 0x0000 0x60 0x19 0x5    })","pos":"hotplug.go:166","timestamp":"2022-09-06T13:12:08.827619Z"}
{"component":"virt-launcher","level":"info","msg":"Successfully hot-plug host-device: sriov-nic-6 (\u0026{pci 0x0000 0x60 0x0e 0x2    })","pos":"hotplug.go:166","timestamp":"2022-09-06T13:12:09.102456Z"}
{"component":"virt-launcher","level":"info","msg":"Successfully hot-plug host-device: sriov-nic-7 (\u0026{pci 0x0000 0x60 0x11 0x2    })","pos":"hotplug.go:166","timestamp":"2022-09-06T13:12:09.404308Z"}
{"component":"virt-launcher","kind":"","level":"error","msg":"failed to hot-plug host-devices","name":"vm-01","namespace":"test-ns","pos":"live-migration-target.go:42","reason":"failed to attach host-device \u003chostdev type=\"pci\" managed=\"no\"\u003e\u003csource\u003e\u003caddress type=\"pci\" domain=\"0x0000\" bus=\"0x60\" slot=\"0x0e\" function=\"0x4\"\u003e\u003c/address\u003e\u003c/source\u003e\u003caddress type=\"pci\" domain=\"0x0000\" bus=\"0x20\" slot=\"0x00\" function=\"0x0\"\u003e\u003c/address\u003e\u003calias name=\"ua-sriov-nic-2\"\u003e\u003c/alias\u003e\u003c/hostdev\u003e, err: virError(Code=68, Domain=10, Message='Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainMigratePrepare3Params)')\n","timestamp":"2022-09-06T13:12:09.404356Z","uid":"74afee88-afa9-494f-8a9a-fe004033bfd0"}
~~~
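
To see quickly which SR-IOV NICs made it across, the log above can be summarized with a short script. This is only a sketch: the helper name `summarize_hotplug` is hypothetical, and the regular expressions assume the exact message wording shown in this report ("Successfully hot-plug host-device: sriov-<name>" and the `ua-sriov-<name>` alias in the failure reason), which may differ in other KubeVirt versions.

```python
import json
import re

# Patterns copied from the virt-launcher messages in this report (assumption:
# other versions may word these messages differently).
SUCCESS_RE = re.compile(r"Successfully hot-plug host-device: sriov-(\S+)")
FAILED_ALIAS_RE = re.compile(r'<alias name="ua-sriov-([^"]+)">')
LOCK_HOLDER_RE = re.compile(r"held by monitor=(\w+)")


def summarize_hotplug(log_lines, expected_nics):
    """Return (attached, failed, missing, lock_holders) as sets of strings.

    expected_nics: names of the SR-IOV interfaces in the VMI spec,
    e.g. ["nic-2", ..., "nic-7"] for the VMI in the reproduction steps.
    """
    attached, failed, holders = set(), set(), set()
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # ignore anything that is not a JSON log entry
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        msg = entry.get("msg", "")
        match = SUCCESS_RE.search(msg)
        if match:
            attached.add(match.group(1))
            continue
        if entry.get("level") == "error" and "hot-plug" in msg:
            # json.loads has already decoded \u003c / \u003e into < and >.
            reason = entry.get("reason", "")
            failed.update(FAILED_ALIAS_RE.findall(reason))
            holders.update(LOCK_HOLDER_RE.findall(reason))
    missing = set(expected_nics) - attached
    return attached, failed, missing, holders


if __name__ == "__main__":
    import sys

    # Usage: python summarize_hotplug.py virt-launcher.log nic-2 nic-3 ...
    with open(sys.argv[1]) as log_file:
        attached, failed, missing, holders = summarize_hotplug(log_file, sys.argv[2:])
    print("attached:", sorted(attached))
    print("failed:", sorted(failed))
    print("missing:", sorted(missing))
    print("lock held by:", sorted(holders))
```

Run against a saved copy of the virt-launcher pod log, with the SR-IOV interface names from the VMI spec as the expected list; any name reported as missing was never successfully hot-plugged on the target.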

Expected results:
NICs attached successfully

Additional info:
There are some recent changes in how the SR-IOV devices are attached:

https://github.com/kubevirt/kubevirt/pull/6581

Can they be backported to 4.10?

Comment 1 Kedar Bidarkar 2022-09-14 12:14:12 UTC
We think this is a Networking component bug. Please reassign the component if you feel otherwise.

Comment 12 Petr Horáček 2023-01-30 14:01:29 UTC
4.10.6 shipped a while back. Cleaning up.

Comment 13 Red Hat Bugzilla 2023-09-19 04:26:14 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

