Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1703792

Summary: [SR-IOV] - Engine doesn't send refresh caps on VM shutdown with VF vNIC
Product: [oVirt] ovirt-engine
Reporter: Michael Burman <mburman>
Component: BLL.Network
Assignee: Dan Kenigsberg <danken>
Status: CLOSED CURRENTRELEASE
QA Contact: Michael Burman <mburman>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 4.3.3.5
CC: bugs, dagur, eraviv, michal.skrivanek
Target Milestone: ovirt-4.3.3-1
Keywords: Automation, AutomationBlocker, Regression
Target Release: 4.3.3.6
Flags: pm-rhel: ovirt-4.3+, pm-rhel: blocker?
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: ovirt-engine-4.3.3.6
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-05-17 08:33:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Network
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1701898
Bug Blocks:

Description Michael Burman 2019-04-28 11:20:30 UTC
Description of problem:
[SR-IOV] - Engine doesn't send refresh caps on VM shutdown with VF vNIC

Our SR-IOV automation tests have found a new regression and automation blocker in the SR-IOV feature.
This was first seen on rhvm-4.3.3.5-0.1.el7.noarch

On VM shutdown with a VF vNIC, vdsm doesn't get any refresh caps request from the engine, so the engine isn't aware that the VF is now free and released on the host (re-attached). Because of that, the VF is still considered in use and all other SR-IOV tests fail.

Version-Release number of selected component (if applicable):
rhvm-4.3.3.5-0.1.el7.noarch
rhvm-4.3.3.6-0.1.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Start a VM with an SR-IOV vNIC (VF), i.e. a pci-passthrough vNIC - refresh caps is sent and the VF is gone from the host, as it should be.
2. Shut down the VM - no refresh caps event is sent and the engine still considers this VF in use, although it is free on the host. vdsm didn't get any refresh caps request from the engine on VM shutdown.

Actual results:
vdsm didn't get a refresh caps request on VM shutdown

Expected results:
Refresh caps must be sent and the VF should be available on the engine

Additional info:
This is a new regression (from 4.3.3.5), a blocker and an urgent bug. The SR-IOV feature is blocked and can't be tested. Please resolve ASAP.

Comment 1 Michael Burman 2019-04-28 11:28:58 UTC
For logs info:

refresh caps on Start VM at Apr 28, 2019, 2:22:38 PM
Successfully refreshed the capabilities of host host_mixed_1.

2019-04-28 14:22:38,531+0300 INFO  (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC call Host.getCapabilities succeeded in 1.82 seconds (__init__:312)

2019-04-28 14:22:38,730+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-3099) [7f127ad9] EVENT_ID: HOST_REFRESHED_CAPABILITIES(606), Successfully refreshed the capabilities of host host_mixed_1.


shutdown VM initiated at Apr 28, 2019, 2:25:33 PM
VM shutdown initiated by admin@internal-authz on VM golden_env_mixed_virtio_1 (Host: host_mixed_1).

Apr 28, 2019, 2:25:38 PM
VM golden_env_mixed_virtio_1 is down. Exit message: Admin shut down from the engine

2019-04-28 14:25:37,952+0300 INFO  (jsonrpc/1) [api.virt] START destroy(gracefulAttempts=1) from=::ffff:xx.00.00.00,44050, vmId=4ab0c11c-fa49-4a53-9cb3-11b8a1da439c (api:48)
2019-04-28 14:25:37,954+0300 INFO  (jsonrpc/1) [api.virt] FINISH destroy return={'status': {'message': 'Machine destroyed', 'code': 0}} from=::ffff:xx.00.00.00,44050, vmId=4ab0c11c-fa49-4a53-9cb3-11b8a1da439c (api:54)
2019-04-28 14:25:37,954+0300 INFO  (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call VM.destroy succeeded in 0.01 seconds (__init__:312)

No refresh caps shows up in the engine or vdsm logs on VM shutdown, although there should be one.
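
For anyone who wants to check their own vdsm log for the same symptom, here is a minimal sketch. It is not part of vdsm or the engine; it only relies on the "RPC call <method> succeeded" line format quoted above. The function name destroys_without_refresh and the 60 second window are assumptions made up for this illustration, not a documented engine timeout.

```python
import re
from datetime import datetime, timedelta

# Matches vdsm lines such as:
# 2019-04-28 14:25:37,954+0300 INFO  (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call VM.destroy succeeded in 0.01 seconds (__init__:312)
RPC_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\S*\s+INFO\s+"
    r".*RPC call (?P<method>[\w.]+) succeeded"
)


def destroys_without_refresh(lines, window_seconds=60):
    """Return timestamps of VM.destroy calls that are not followed by a
    Host.getCapabilities call within window_seconds (an assumed window)."""
    events = []
    for line in lines:
        match = RPC_LINE.match(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
            events.append((ts, match.group("method")))

    window = timedelta(seconds=window_seconds)
    missing = []
    for index, (ts, method) in enumerate(events):
        if method != "VM.destroy":
            continue
        # Look only at RPC calls logged after this destroy.
        refreshed = any(
            later_method == "Host.getCapabilities"
            and timedelta(0) <= later_ts - ts <= window
            for later_ts, later_method in events[index + 1:]
        )
        if not refreshed:
            missing.append(ts)
    return missing
```

To use it, pass the lines of the host's vdsm log (typically /var/log/vdsm/vdsm.log) to destroys_without_refresh(); every timestamp it returns is a VM.destroy that was not followed by a refresh caps request. Note that it flags every VM shutdown, not only shutdowns of VMs with a VF vNIC, so the result still has to be matched against the VMs that actually use SR-IOV.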

Comment 3 Michael Burman 2019-04-28 11:46:47 UTC
The workaround is to run refresh caps manually on the host.

Comment 4 Michal Skrivanek 2019-04-29 08:50:16 UTC
I doubt it's a regression, there was no intentional change in this area I know of. Do you have results of 4.3.3.4 passing and 4.3.3.5 failing? Can you please compare that with EAP 7.2.0 vs 7.2.1 - i.e. bug 1701898

Comment 5 Michael Burman 2019-04-29 08:55:22 UTC
(In reply to Michal Skrivanek from comment #4)
> I doubt it's a regression, there was no intentional change in this area I
> know of. Do you have results of 4.3.3.4 passing and 4.3.3.5 failing? Can you
> please compare that with EAP 7.2.0 vs 7.2.1 - i.e. bug 1701898

4.3.3.4 and 4.3.3.3 are PASS
We suspect that it is indeed caused by the move to the new jboss, i.e. by BZ 1701898, but no one has confirmed that yet. I reported BZ 1701898 a week ago and no one has taken action on it yet.
We first started seeing this when we moved to 4.3.3.5 + new jboss EAP 7.2.1

Comment 8 Michael Burman 2019-05-07 11:10:42 UTC
Moving to ON_QA, this should be fixed with the new jboss - eap7-wildfly-7.2.1-6.GA_redhat_00004.1.el7eap.noarch

Comment 9 Michael Burman 2019-05-07 11:28:23 UTC
Verified on - 4.3.3.6-0.1.el7 and vdsm-4.30.13-1.el7ev.x86_64 with 
eap7-wildfly-7.2.1-6.GA_redhat_00004.1.el7eap.noarch
eap7-wildfly-java-jdk8-7.2.1-6.GA_redhat_00004.1.el7eap.noarch

Comment 10 RHEL Program Management 2019-05-07 11:28:28 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.