Created attachment 1447811
logs

Description of problem:
[SR-IOV] - VF leakage when shutting down a VM from the 'Powering Up' state

Shutting down or powering off a VM is allowed before it reaches the Up state, so a VM can be shut down while it is still in 'Powering Up'. If this is done with an SR-IOV vNIC attached, the VF leaks and Engine considers it in use until we reboot the host.

'Cannot edit host NIC VFs configuration. The selected network interface enp8s0f0 has VFs that are in use.'

Engine thinks that the VF is taken, although the VM is down and the VF seems to be free (it appears on the host and in the UI). But we can't use this VF or change the number of VFs on the associated PF.

Version-Release number of selected component (if applicable):
vdsm-4.20.29-1.el7ev.x86_64

How reproducible:
Seems to be 100%

Steps to Reproduce:
1. Enable 1 VF on an SR-IOV capable host
2. Start a VM with an SR-IOV vNIC
3. Shut down the VM while it is in the 'Powering Up' state (before it is Up)
4. Try to change the number of VFs back to zero

Actual results:
Cannot edit host NIC VFs configuration. The selected network interface enp8s0f0 has VFs that are in use.
The VF has leaked; it was not released properly.

Expected results:
The change should succeed. If we allow shutting down the VM from the 'Powering Up' state, then we should handle the release of the VF(s) in that case.

Additional info:
Discovered during an automation run, at the teardown stage (we didn't wait for the VM to be fully Up before shutting it down).
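For anyone reproducing this, it may help to confirm the host's own view of the VFs separately from what Engine reports. The following is a minimal sketch, assuming a Linux host that exposes the standard SR-IOV sysfs attributes (sriov_numvfs, sriov_totalvfs and the virtfnN links) under the PF's device directory. The function name read_vf_state and the sysfs_root parameter are made up for illustration, and the PF name is the one from the error message above. It only shows the kernel's state and says nothing about Engine's bookkeeping.

```python
from pathlib import Path


def read_vf_state(pf_name, sysfs_root="/sys/class/net"):
    """Return the kernel's view of the VFs on a PF (hypothetical helper).

    Reads the SR-IOV sysfs attributes of the PF's PCI device:
      sriov_numvfs   - number of VFs currently enabled
      sriov_totalvfs - maximum number of VFs the device supports
      virtfnN        - one entry per enabled VF
    """
    device_dir = Path(sysfs_root) / pf_name / "device"
    numvfs = int((device_dir / "sriov_numvfs").read_text().strip())
    totalvfs = int((device_dir / "sriov_totalvfs").read_text().strip())
    virtfns = sorted(p.name for p in device_dir.glob("virtfn*"))
    return {"numvfs": numvfs, "totalvfs": totalvfs, "virtfns": virtfns}


if __name__ == "__main__":
    # PF name taken from the error message in this report; adjust per host.
    print(read_vf_state("enp8s0f0"))
```

If the host still shows the VF enabled and present after the VM is down while Engine refuses to edit the VF configuration, that is consistent with the stale state being on the Engine side.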
2018-06-04 23:27:43,464+03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-12) [] VM '6c4bc71c-565d-442d-8ae6-99c563840109'(golden_env_mixed_virtio_1_0) moved from 'PoweringUp' --> 'Down'

2018-06-04 23:27:43,550+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-12) [] EVENT_ID: VM_DOWN(61), VM golden_env_mixed_virtio_1_0 is down.

2018-06-04 23:27:54,227+03 WARN [org.ovirt.engine.core.bll.network.host.UpdateHostNicVfsConfigCommand] (default task-18) [host_nics_syncAction_d7212f9e-a9b3-4] Validation of action 'UpdateHostNicVfsConfig' failed for user admin@internal-authz. Reasons: VAR__ACTION__UPDATE,VAR__TYPE__HOST_NIC_VFS_CONFIG,ACTION_TYPE_FAILED_NUM_OF_VFS_CANNOT_BE_CHANGED,$nicName enp3s0f1

2018-06-04 23:27:54,228+03 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-18) [] Operation Failed: [Cannot edit host NIC VFs configuration. The selected network interface enp3s0f1 has VFs that are in use.]
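To look for other VMs that went straight from 'PoweringUp' to 'Down' in an engine log, something like the sketch below could be used. It is only a rough helper: the regular expression is derived from the single VmAnalyzer line quoted above, so other Engine versions or message variants may not match, and the names Transition and parse_transition are invented for this example.

```python
import re
from collections import namedtuple

# Hypothetical record type for one VM state transition.
Transition = namedtuple("Transition", "vm_id vm_name old_state new_state")

# Pattern based only on the VmAnalyzer message format seen in this report.
_TRANSITION_RE = re.compile(
    r"VM '(?P<vm_id>[0-9a-fA-F-]+)'\((?P<vm_name>[^)]*)\) "
    r"moved from '(?P<old>\w+)' --> '(?P<new>\w+)'"
)


def parse_transition(line):
    """Return a Transition for a VmAnalyzer 'moved from' line, else None."""
    match = _TRANSITION_RE.search(line)
    if match is None:
        return None
    return Transition(
        match.group("vm_id"),
        match.group("vm_name"),
        match.group("old"),
        match.group("new"),
    )


def is_shutdown_before_up(transition):
    """True when the VM went Down without ever reaching Up."""
    return (
        transition is not None
        and transition.old_state == "PoweringUp"
        and transition.new_state == "Down"
    )
```

Feeding each log line through parse_transition and keeping those where is_shutdown_before_up returns True gives the candidate VMs whose VFs may have leaked.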
I suspect that this might be an Engine bug, not a Vdsm one. Is this new in 4.2? (VM startup code has changed considerably in Engine) Does refresh capabilities (or Engine restart) reset the leak?
(In reply to Dan Kenigsberg from comment #2)
> I suspect that this might be an Engine bug, not a Vdsm one.
>
> Is this new in 4.2? (VM startup code has changed considerably in Engine)
> Does refresh capabilities (or Engine restart) reset the leak?

I don't know if it's new. It's an edge case, I guess, but we do allow it, so it's a problem, and it's 100% reproducible.
Refresh capabilities doesn't reset the leak; an Engine restart does.
Verified on - 4.2.5-0.1.el7ev with vdsm-4.20.33-1.el7ev.x86_64
This bugzilla is included in oVirt 4.2.5 release, published on July 30th 2018. Since the problem described in this bug report should be resolved in oVirt 4.2.5 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.