Bug 1454624

Summary: SR-IOV minor update from OSP10 to OSP10z3 fails when a PF is assigned to an instance
Product: Red Hat OpenStack
Reporter: Eran Kuris <ekuris>
Component: puppet-tripleo
Assignee: Brent Eagles <beagles>
Status: CLOSED ERRATA
QA Contact: Eran Kuris <ekuris>
Severity: urgent
Priority: high
Version: 10.0 (Newton)
CC: amuller, aschultz, atelang, beagles, ekuris, fbaudin, jjoyce, jschluet, mburns, mcornea, oblaut, rhel-osp-director-maint, samccann, skramaja, slinaber, supadhya, tvignaud, yrachman
Target Milestone: z4
Keywords: Triaged, ZStream
Target Release: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: puppet-tripleo-5.6.0-6.el7ost
Doc Type: Release Note
Doc Text:
Workaround: Before you upgrade or update OpenStack, delete the guest that is attached to the PF. You can then proceed with the update or upgrade, and it will pass.
Clones: 1454634
Last Closed: 2017-09-06 17:09:30 UTC
Type: Bug
Bug Depends On: 1482390, 1485452    
Bug Blocks: 1454634, 1479029    
Attachments:
  Description: openstack stack failures list --long overcloud
  Flags: none

Description Eran Kuris 2017-05-23 08:13:40 UTC
Created attachment 1281362: openstack stack failures list --long overcloud

Description of problem:
Deployed OSP10 with OVS 2.5 (1 controller, 2 computes) and created three types of instances: one with a normal port, one with a direct port (VF), and one with a direct-physical port (PF).
When I ran an update to OSP10z3 with OVS 2.6, the process failed because the system could not find the PF NIC.

    Warning: Scope(Class[Nova]): Could not look up qualified variable '::nova::scheduler::filter::cpu_allocation_ratio'; class ::nova::scheduler::filter has not been evaluated
    Warning: Scope(Class[Nova]): Could not look up qualified variable '::nova::scheduler::filter::ram_allocation_ratio'; class ::nova::scheduler::filter has not been evaluated
    Warning: Scope(Class[Nova]): Could not look up qualified variable '::nova::scheduler::filter::disk_allocation_ratio'; class ::nova::scheduler::filter has not been evaluated
    Warning: Scope(Class[Nova::Compute]): compute_manager is marked as deprecated in Nova but still needed when Ironic is used. It will be removed once Nova removes it.
    Warning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::vncproxy::host'; class ::nova::vncproxy has not been evaluated
    Warning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::vncproxy::vncproxy_protocol'; class ::nova::vncproxy has not been evaluated
    Warning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::vncproxy::port'; class ::nova::vncproxy has not been evaluated
    Warning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::vncproxy::vncproxy_path'; class ::nova::vncproxy has not been evaluated
    Warning: Scope(Class[Ceilometer]): Both $metering_secret and $telemetry_secret defined, using $telemetry_secret
    Warning: Scope(Class[Ceilometer::Agent::Compute]): This class is deprecated. Please use ceilometer::agent::polling with compute namespace instead.
    Error: /sys/class/net/p1p1/device/sriov_numvfs doesn't exist. Check if p1p1 is a valid network interface supporting SR-IOV
    Error: /Stage[main]/Tripleo::Host::Sriov/Sriov_vf_config[p1p1:5]/ensure: change from absent to present failed: /sys/class/net/p1p1/device/sriov_numvfs doesn't exist. Check if p1p1 is a valid network interface supporting SR-IOV                                                                                                                                                      
    Warning: /Firewall[998 log all]: Skipping because of failed dependencies
    Warning: /Firewall[999 drop all]: Skipping because of failed dependencies
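
The failing resource in the log above looks for /sys/class/net/p1p1/device/sriov_numvfs. A minimal sketch of a pre-update check for that file follows. It is not part of puppet-tripleo: the function name `check_sriov_pf` and the `SYSFS_NET` override are hypothetical, introduced only for illustration, and the hint about the PF being assigned to an instance reflects the scenario in this report rather than a general diagnosis.

```shell
#!/usr/bin/env bash
# Hypothetical pre-update check (not part of puppet-tripleo): report whether
# the sysfs file named in the Puppet error exists for a given interface.
# SYSFS_NET is an assumed override so the check can be exercised off-host.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

check_sriov_pf() {
    local iface="$1"
    local numvfs_file="${SYSFS_NET}/${iface}/device/sriov_numvfs"

    if [ -e "$numvfs_file" ]; then
        # The file holds the number of VFs currently enabled on the PF.
        echo "${iface}: sriov_numvfs present (current VFs: $(cat "$numvfs_file"))"
        return 0
    fi

    # Same condition that the Sriov_vf_config resource reported as an error.
    echo "${iface}: ${numvfs_file} missing; the PF may be assigned to an instance" >&2
    return 1
}
```

Running `check_sriov_pf p1p1` on each compute node before starting the update would be one way to spot the condition early; a non-zero exit status means the update is likely to hit the same error.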


I think the issue also affects the upgrade process from OSP10 to OSP11.

Version-Release number of selected component (if applicable):

python-neutron-lib-0.4.0-1.el7ost.noarch
openstack-neutron-common-9.2.0-2.el7ost.noarch
puppet-neutron-9.5.0-1.el7ost.noarch
openstack-neutron-9.2.0-2.el7ost.noarch
python-neutronclient-6.0.0-2.el7ost.noarch
openstack-neutron-ml2-9.2.0-2.el7ost.noarch
openstack-neutron-openvswitch-9.2.0-2.el7ost.noarch
python-neutron-9.2.0-2.el7ost.noarch
openstack-tripleo-heat-templates-5.2.0-15.el7ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy an SR-IOV setup with the latest OSP10 (at least 2 computes).
2. Create three types of instances on the overcloud: one with a normal port, one with a direct port (VF), and one with a direct-physical port (PF).
3. Run the update process to OSP10z3 with OVS 2.6:

openstack overcloud deploy --update-plan-only \
--templates \
--environment-file "$HOME/extra_env.yaml" \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
-e /home/stack/ospd-10-multiple-nic-vlans-ovs-dpdk-single-port/network-environment.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/neutron-ovs-dpdk.yaml \
--log-file overcloud_install.log &> overcloud_install.log


openstack overcloud update stack -i overcloud

Actual results:
The update fails.

Expected results:
The update succeeds.

Additional info:

Comment 1 Eran Kuris 2017-05-23 08:36:47 UTC
Workaround: Delete the PF instance and run the update/upgrade process again; it will then pass.

Comment 13 Brent Eagles 2017-07-14 12:51:11 UTC
Hi, the backport has not merged upstream yet. We'll push and try and get it in today.

Comment 15 Brent Eagles 2017-07-14 17:12:04 UTC
The patches merged upstream and should be in the next respin.

Comment 16 Brent Eagles 2017-07-14 17:27:01 UTC
Posted downstream patch in case we are not planning on a rebase before next release.

https://code.engineering.redhat.com/gerrit/#/c/112349/

Comment 19 Eran Kuris 2017-08-22 05:41:39 UTC
Blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1482390

Comment 20 Eran Kuris 2017-09-04 05:21:54 UTC
Fix verified on a minor update from OSP10z3 to the latest OSP10z4 with:
puppet-tripleo-5.6.1-2.el7ost.noarch
$ nova list
+--------------------------------------+------+--------+------------+-------------+--------------------+
| ID                                   | Name | Status | Task State | Power State | Networks           |
+--------------------------------------+------+--------+------------+-------------+--------------------+
| 915b91e7-7592-40ff-bf73-d0b371b78455 | PF   | ACTIVE | -          | Running     | net-64-2=10.0.2.5  |
| 5ea2f497-ad6d-4fb8-ae73-cc8ec4c9232f | VF   | ACTIVE | -          | Running     | net-64-2=10.0.2.10 |
| 49e22ce5-95ba-44ff-bf98-96ae6312a3c8 | VM   | ACTIVE | -          | Running     | net-64-2=10.0.2.6  |
+--------------------------------------+------+--------+------------+-------------+--------------------+

Checked full connectivity before and after the update process.

Comment 22 errata-xmlrpc 2017-09-06 17:09:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2654