Bug 1583999 - SRIOV agent docker is missing after upgrade process from OSP12 to OSP13
Summary: SRIOV agent docker is missing after upgrade process from OSP12 to OSP13
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Saravanan KR
QA Contact: Toni Freger
URL:
Whiteboard:
Depends On:
Blocks: 1430896
 
Reported: 2018-05-30 07:03 UTC by Eran Kuris
Modified: 2018-06-06 13:45 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-06 13:45:33 UTC
Target Upstream Version:



Description Eran Kuris 2018-05-30 07:03:13 UTC
Description of problem:
After upgrading an OSP12 deployment with SR-IOV to OSP13, the SR-IOV agent docker container does not exist on the compute nodes.

The agent is running, but it is not containerized.

[root@compute-0 ~]# systemctl status neutron-sriov-nic-agent.service 
● neutron-sriov-nic-agent.service - OpenStack Neutron SR-IOV NIC Agent
   Loaded: loaded (/usr/lib/systemd/system/neutron-sriov-nic-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2018-05-29 14:05:24 UTC; 16h ago
 Main PID: 369369 (neutron-sriov-n)
    Tasks: 4
   Memory: 107.3M
   CGroup: /system.slice/neutron-sriov-nic-agent.service
           ├─369369 /usr/bin/python2 /usr/bin/neutron-sriov-nic-agent --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/...
           ├─369471 sudo neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
           └─369472 /usr/bin/python2 /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf

May 29 14:05:24 compute-0 systemd[1]: Started OpenStack Neutron SR-IOV NIC Agent.
May 29 14:05:24 compute-0 systemd[1]: Starting OpenStack Neutron SR-IOV NIC Agent...
May 29 14:05:28 compute-0 sudo[369471]:  neutron : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
May 29 19:31:06 compute-0 neutron-sriov-nic-agent[369369]: /usr/lib/python2.7/site-packages/amqp/connection.py:304: AMQPDeprecationWarning: The .transport attribute on the conne...ssed before
May 29 19:31:06 compute-0 neutron-sriov-nic-agent[369369]: the connection was established.  This is supported for now, but will
May 29 19:31:06 compute-0 neutron-sriov-nic-agent[369369]: be deprecated in amqp 2.2.0.
May 29 19:31:06 compute-0 neutron-sriov-nic-agent[369369]: Since amqp 2.0 you have to explicitly call Connection.connect()
May 29 19:31:06 compute-0 neutron-sriov-nic-agent[369369]: before using the connection.
May 29 19:31:06 compute-0 neutron-sriov-nic-agent[369369]: W_FORCE_CONNECT.format(attr=attr)))
Hint: Some lines were ellipsized, use -l to show in full.


Version-Release number of selected component (if applicable):
OSP12 latest to OSP13 latest.


[root@compute-0 ~]# rpm -qa | grep sriov
openstack-neutron-sriov-nic-agent-12.0.2-0.20180421011359.0ec54fd.el7ost.noarch
[root@compute-0 ~]# rpm -qa | grep neutron 
openstack-neutron-metering-agent-12.0.2-0.20180421011359.0ec54fd.el7ost.noarch
openstack-neutron-common-12.0.2-0.20180421011359.0ec54fd.el7ost.noarch
puppet-neutron-12.4.1-0.20180412211913.el7ost.noarch
openstack-neutron-lbaas-12.0.1-0.20180424200349.cdbf25c.el7ost.noarch
openstack-neutron-openvswitch-12.0.2-0.20180421011359.0ec54fd.el7ost.noarch
openstack-neutron-ml2-12.0.2-0.20180421011359.0ec54fd.el7ost.noarch
openstack-neutron-lbaas-ui-4.0.1-0.20180326210834.a2c502e.el7ost.noarch
python-neutron-lbaas-12.0.1-0.20180424200349.cdbf25c.el7ost.noarch
openstack-neutron-linuxbridge-12.0.2-0.20180421011359.0ec54fd.el7ost.noarch
openstack-neutron-12.0.2-0.20180421011359.0ec54fd.el7ost.noarch
python2-neutron-lib-1.13.0-1.el7ost.noarch
openstack-neutron-sriov-nic-agent-12.0.2-0.20180421011359.0ec54fd.el7ost.noarch
python2-neutronclient-6.7.0-1.el7ost.noarch
python-neutron-12.0.2-0.20180421011359.0ec54fd.el7ost.noarch
(overcloud) [stack@undercloud-0 ~]$ rpm -qa | grep triple
python-tripleoclient-9.2.1-11.el7ost.noarch
openstack-tripleo-ui-8.3.1-2.el7ost.noarch
openstack-tripleo-common-8.6.1-17.el7ost.noarch
openstack-tripleo-puppet-elements-8.0.0-2.el7ost.noarch
puppet-tripleo-8.3.2-6.el7ost.noarch
openstack-tripleo-common-containers-8.6.1-17.el7ost.noarch
ansible-tripleo-ipsec-8.1.1-0.20180308133440.8f5369a.el7ost.noarch
openstack-tripleo-heat-templates-8.0.2-27.el7ost.noarch
openstack-tripleo-validations-8.4.1-5.el7ost.noarch
openstack-tripleo-image-elements-8.0.1-1.el7ost.noarch
(overcloud) [stack@undercloud-0 ~]$ 


How reproducible:
100% 

Steps to Reproduce:
1. Deploy OSP12 with SR-IOV
2. Run the upgrade to OSP13 with SR-IOV
3. Check whether the SR-IOV docker container exists on the compute nodes
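
Step 3 can be sketched as a small shell helper. This is only an illustration: the function name has_sriov_container is made up, and it assumes the SR-IOV agent container has "sriov" somewhere in its name, which should be verified on the actual deployment.

```shell
# Reads container names (one per line) on stdin and reports whether an
# SR-IOV agent container is among them.
# ASSUMPTION: the container name contains "sriov" (case-insensitive).
has_sriov_container() {
    if grep -qi 'sriov'; then
        echo "containerized"
    else
        echo "missing"
    fi
}

# Usage on a compute node (needs access to the docker CLI):
#   docker ps --format '{{.Names}}' | has_sriov_container
```

If this prints "missing" while systemctl still shows neutron-sriov-nic-agent.service as active, the node is in the state described in this report.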

Comment 3 Brent Eagles 2018-05-30 13:21:47 UTC
The SR-IOV agent was removed from the compute role in https://review.openstack.org/#/c/501999/ which I think would produce this kind of behavior. I believe the NFV team has upgrade guidance on this. Adding needinfo on Saravanan.

Comment 4 Saravanan KR 2018-05-30 13:26:31 UTC
The guidance for upgrading a deployment that runs SR-IOV on the Compute role from OSP12 to OSP13 is in the release notes: https://review.openstack.org/#/c/501999/8/releasenotes/notes/sriov-role-1ef30615048239c7.yaml

The roles_data.yaml file has to be customized to support such deployments, by adding the SR-IOV agent service to the Compute role.
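
A quick way to confirm the customization took effect is to list which roles in roles_data.yaml carry the agent service. The sketch below is not an official tool: roles_with_service is a hypothetical helper, it assumes the usual layout (roles start with "- name:" at column 0, services are "- OS::TripleO::Services::..." list items), and the service name OS::TripleO::Services::NeutronSriovAgent is my understanding of what the release note refers to, so please check it against the note.

```shell
# Prints the name of every role in a roles_data.yaml that lists the given service.
# ASSUMPTION: roles begin with "- name: <Role>" at column 0 and services are
# plain "- <Service>" list entries, as in the stock roles_data.yaml.
roles_with_service() {
    service="$1"
    file="$2"
    awk -v svc="$service" '
        /^- name:/ { role = $3 }                 # remember the current role
        $1 == "-" && $2 == svc { print role }    # service entry found in that role
    ' "$file"
}

# Usage:
#   roles_with_service OS::TripleO::Services::NeutronSriovAgent roles_data.yaml
```

For a deployment that keeps SR-IOV on the Compute role, the output should include "Compute".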

Comment 5 Brent Eagles 2018-05-31 12:39:15 UTC
I was mistaken about the root cause here. From what I can tell, the deployment is actually using the ComputeSriov role. The fix appears to have been to add support for deprecated params to the ComputeSriov role that was being used during the upgrade:

  uses_deprecated_params: True
  deprecated_param_image: 'NovaImage'
  deprecated_param_extraconfig: 'NovaComputeExtraConfig'
  deprecated_param_metadata: 'NovaComputeServerMetadata'
  deprecated_param_scheduler_hints: 'NovaComputeSchedulerHints'
  deprecated_param_ips: 'NovaComputeIPs'
  deprecated_server_resource_name: 'NovaCompute'
  deprecated_nic_config_name: 'compute.yaml'

Comment 6 Saravanan KR 2018-05-31 12:47:00 UTC
(In reply to Brent Eagles from comment #5)
> I was mistaken about the root cause here. From what I can tell, the
> deployment is actually using the ComputeSriov role. The fix appear to have
> been to add support for deprecated params to the ComputeSriov role that was
> being used during upgrading:
> 
>   uses_deprecated_params: True
>   deprecated_param_image: 'NovaImage'
>   deprecated_param_extraconfig: 'NovaComputeExtraConfig'
>   deprecated_param_metadata: 'NovaComputeServerMetadata'
>   deprecated_param_scheduler_hints: 'NovaComputeSchedulerHints'
>   deprecated_param_ips: 'NovaComputeIPs'
>   deprecated_server_resource_name: 'NovaCompute'
>   deprecated_nic_config_name: 'compute.yaml'

These deprecated parameters are specific to the "Compute" role only. New roles should not have these deprecated parameters.
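
To spot roles that wrongly carry these keys, something like the following could be run against the roles file. It is a rough sketch: roles_with_deprecated_params is a hypothetical helper and it assumes the same roles_data.yaml layout as the stock file (roles start with "- name:" at column 0, keys are indented under the role).

```shell
# Prints each role other than "Compute" that sets uses_deprecated_params or
# any deprecated_* key in the given roles_data.yaml.
# ASSUMPTION: roles begin with "- name: <Role>" at column 0.
roles_with_deprecated_params() {
    awk '
        /^- name:/ { role = $3 }
        $1 ~ /^(uses_deprecated_params:|deprecated_[a-z_]+:)$/ && role != "Compute" {
            if (!(role in seen)) { seen[role] = 1; print role }   # report once per role
        }
    ' "$1"
}

# Usage:
#   roles_with_deprecated_params roles_data.yaml
```

An empty result means only the Compute role, if any, uses the deprecated parameters.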

Comment 7 Bernard Cafarelli 2018-05-31 12:52:12 UTC
So, should this go in upgrade documentation then? Following https://review.openstack.org/#/c/501999/8/releasenotes/notes/sriov-role-1ef30615048239c7.yaml and detailing the parameters?

Comment 9 Saravanan KR 2018-06-01 14:00:26 UTC
Infrared (In reply to Bernard Cafarelli from comment #8)
> Marking as documentation from comment 4 release notes
> 
> skramaja, can you confirm the correct fix here?
If the deployment is using SR-IOV in the Compute role, then comment #4 has the notes for the steps.

If the deployment is using the ComputeSriov role, then there should be no deprecated parameters in the deployment. The QE samples have those, and they should be removed.

And finally, as per the description, for the issue of the SR-IOV agent not being containerized, the environment file environments/services-docker/neutron-sriov.yaml should be used.
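
Environment files are normally passed to "openstack overcloud deploy" with -e, so one simple check is whether the script used for the deploy/upgrade mentions this file at all. A minimal sketch, assuming the command lives in a script file; uses_sriov_docker_env is a hypothetical helper and overcloud_deploy.sh is only an example name:

```shell
# Succeeds (exit 0) if the given deploy script references the containerized
# SR-IOV environment file named in this comment.
# ASSUMPTION: the deploy/upgrade command is kept in a plain-text script.
uses_sriov_docker_env() {
    grep -q -- 'environments/services-docker/neutron-sriov\.yaml' "$1"
}

# Usage (overcloud_deploy.sh is an example file name):
#   uses_sriov_docker_env overcloud_deploy.sh || echo "neutron-sriov.yaml is not passed"
```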

