Bug 1445030 - RH OSP10: With Cisco ML2 plugin for ucsm, Overcloud install failing due to config not being generated in time. [NEEDINFO]
Summary: RH OSP10: With Cisco ML2 plugin for ucsm, Overcloud install failing due to co...
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: z4
Target Release: 10.0 (Newton)
Assignee: Steven Hardy
QA Contact: Gurenko Alex
URL:
Whiteboard: hot
Keywords: OtherQA, Triaged, ZStream
Depends On:
Blocks: 1321623 1335596
 
Reported: 2017-04-24 18:42 UTC by Sandhya Dasu
Modified: 2018-03-07 11:19 UTC (History)
Clone Of:
Last Closed: 2017-09-06 17:09:30 UTC
pmorey: needinfo? (shardy)
akaris: needinfo? (shardy)
jjoyce: needinfo? (akaris)


Attachments
Contents of /etc/puppet/hieradata/neutron_cisco_data.yaml (3.66 KB, text/plain)
2017-04-28 16:24 UTC, Sandhya Dasu
/etc/puppet/hieradata/neutron_cisco_data.yaml from Controller-0 (3.66 KB, text/plain)
2017-05-01 15:17 UTC, Sandhya Dasu
/etc/puppet/hieradata/neutron_cisco_data.yaml from Controller-1 (3.66 KB, text/plain)
2017-05-01 15:18 UTC, Sandhya Dasu
/etc/puppet/hieradata/neutron_cisco_data.yaml from Controller-2 (3.66 KB, text/plain)
2017-05-01 15:18 UTC, Sandhya Dasu


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:2654 normal SHIPPED_LIVE Red Hat OpenStack Platform 10 director Bug Fix Advisory 2017-09-06 20:55:36 UTC
OpenStack gerrit 461734 None None None 2017-05-02 11:12 UTC
OpenStack gerrit 461821 None None None 2017-06-21 13:30 UTC
OpenStack gerrit 462035 None None None 2017-05-08 20:01 UTC
OpenStack gerrit 464700 None None None 2017-06-21 13:30 UTC
Launchpad 1687597 None None None 2017-05-02 11:11 UTC

Description Sandhya Dasu 2017-04-24 18:42:58 UTC
Description of problem:
Config files for the Cisco ML2 plugins are not generated in time. During overcloud bring-up, the config files required to initialize the UCSM and Nexus ML2 plugins are generated only after that initialization has been attempted.

Version-Release number of selected component (if applicable):
RH OSP10 overcloud install


How reproducible:
Consistently reproduced.


Steps to Reproduce:
1. Use the DCI agent to install the undercloud and overcloud with Cisco HW.
2. Config files provided to overcloud install enable the cisco_ucsm and the cisco_nexus plugins.
3. Overcloud install fails to initialize the above mentioned plugins.
4. Overcloud comes up successfully without the plugins being enabled.

Actual results:

Attaching logs from a controller node showing the failures seen while initializing the cisco_ucsm plugin:

Apr 18 10:10:58 host-X crmd[7834]:  notice: Result of notify operation for redis on overcloud-controller-0: 0 (ok)
Apr 18 10:11:00 host-X redis(redis)[45935]: INFO: monitor: Slave mode link has not yet been established (link=down)
Apr 18 10:11:00 host-X redis(redis)[45935]: INFO: demote: Setting master to 'overcloud-controller-1'
Apr 18 10:11:01 host-X os-collect-config: [2017-04-18 14:11:01,501] (heat-config) [INFO] {"deploy_stdout": "Matching apachectl 'Server version: Apache/2.4.6 (Red Hat Enterprise Linux)\nServer built:   Aug  3 2016 08:33:27'\n", "deploy_stderr": "exception: connect failed\n\u001b[1;31mWarning: Scope(Class[Cinder::Api]): keystone_enabled is deprecated, use auth_strategy instead.\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Keystone]): Fernet token is recommended in Mitaka release. The default for token_provider will be changed to 'fernet' in O release.\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Heat]): keystone_user_domain_id is deprecated, use the name option instead.\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Heat]): keystone_project_domain_id is deprecated, use the name option instead.\u001b[0m\n\u001b[1;31mError: Must pass ucsm_ip to Class[Neutron::Plugins::Ml2::Cisco::Ucsm] at /etc/puppet/modules/tripleo/manifests/profile/base/neutron/plugins/ml2.pp:55 on node overcloud-controller-0.localdomain\u001b[0m\n\u001b[1;31mError: Must pass ucsm_ip to Class[Neutron::Plugins::Ml2::Cisco::Ucsm] at /etc/puppet/modules/tripleo/manifests/profile/base/neutron/plugins/ml2.pp:55 on node overcloud-controller-0.localdomain\u001b[0m\n", "deploy_status_code": 1}
Apr 18 10:11:01 host-X os-collect-config: [2017-04-18 14:11:01,502] (heat-config) [DEBUG] [2017-04-18 14:10:53,245] (heat-config) [DEBUG] Running FACTER_heat_outputs_path="/var/run/heat-config/heat-config-puppet/93999189-5017-414b-b51c-7b0bebc583bc"  FACTER_fqdn="overcloud-controller-0.localdomain"  FACTER_deploy_config_name="ControllerDeployment_Step3"  puppet apply --detailed-exitcodes --logdest console --modulepath /etc/puppet/modules:/opt/stack/puppet-modules:/usr/share/openstack-puppet/modules /var/lib/heat-config/heat-config-puppet/93999189-5017-414b-b51c-7b0bebc583bc.pp
Apr 18 10:11:01 host-X os-collect-config: [2017-04-18 14:11:01,497] (heat-config) [INFO] Return code 1
Apr 18 10:11:01 host-X os-collect-config: [2017-04-18 14:11:01,497] (heat-config) [INFO] Matching apachectl 'Server version: Apache/2.4.6 (Red Hat Enterprise Linux)
Apr 18 10:11:01 host-X os-collect-config: Server built:   Aug  3 2016 08:33:27'
Apr 18 10:11:01 host-X os-collect-config: [2017-04-18 14:11:01,498] (heat-config) [INFO] exception: connect failed
Apr 18 10:11:01 host-X os-collect-config: #033[1;31mWarning: Scope(Class[Cinder::Api]): keystone_enabled is deprecated, use auth_strategy instead.#033[0m
Apr 18 10:11:01 host-X os-collect-config: #033[1;31mWarning: Scope(Class[Keystone]): Fernet token is recommended in Mitaka release. The default for token_provider will be changed to 'fernet' in O release.#033[0m
Apr 18 10:11:01 host-X os-collect-config: #033[1;31mWarning: Scope(Class[Heat]): keystone_user_domain_id is deprecated, use the name option instead.#033[0m
Apr 18 10:11:01 host-X os-collect-config: #033[1;31mWarning: Scope(Class[Heat]): keystone_project_domain_id is deprecated, use the name option instead.#033[0m
Apr 18 10:11:01 host-X os-collect-config: #033[1;31mError: Must pass ucsm_ip to Class[Neutron::Plugins::Ml2::Cisco::Ucsm] at /etc/puppet/modules/tripleo/manifests/profile/base/neutron/plugins/ml2.pp:55 on node overcloud-controller-0.localdomain#033[0m
Apr 18 10:11:01 host-X os-collect-config: #033[1;31mError: Must pass ucsm_ip to Class[Neutron::Plugins::Ml2::Cisco::Ucsm] at /etc/puppet/modules/tripleo/manifests/profile/base/neutron/plugins/ml2.pp:55 on node overcloud-controller-0.localdomain#033[0m


The config file does get generated, but its timestamp is later than the plugin initialization. The config file that it is looking for is generated by TripleO and has the following timestamp:
[root@overcloud-controller-0 hieradata]# ls -al neutron_cisco_data.yaml
-rw-r--r--. 1 root root 3385 Apr 18 14:11 neutron_cisco_data.yaml
[root@overcloud-controller-0 hieradata]# 

Expected results:

neutron_cisco_data.yaml file is generated before cisco_ucsm is initialized.


Additional info:
The Heat templates and TripleO code that handle the config for cisco_ucsm can be found upstream at:
https://github.com/openstack/puppet-tripleo/blob/master/manifests/profile/base/neutron/plugins/ml2.pp

https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/puppet/extraconfig/all_nodes/neutron-ml2-cisco-nexus-ucsm.yaml

The code for the plugin can be found at:
https://github.com/openstack/networking-cisco/tree/master/networking_cisco/plugins/ml2/drivers/cisco/ucsm

Comment 1 Emilien Macchi 2017-04-24 19:52:41 UTC
Sandhya, thanks for the bug report. Would you be able to work on this bug, or do you expect us to take a look? My team doesn't have the hardware to test a TripleO deployment with Cisco hardware, so I'm not sure we can be really useful.

But let us know if we can help here.

Comment 2 Sandhya Dasu 2017-04-24 20:18:03 UTC
Agreed. We need to work together on this. We need help with the TripleO side of things, and I will be able to provide you with the necessary information from the logs. For starters, can someone look at our Heat templates and TripleO code that are upstream and let us know if we are missing changes for OSP10/Newton? Those were added during the Liberty timeframe and haven't changed since. (The config items themselves haven't changed.)

(In reply to Emilien Macchi from comment #1)
> Sandhya, thanks for the bug report. Would you be able to work on this bug or
> do you expect us to take a look? My team doesn't have the hardware to test
> TripleO deployment with Cisco hardware so i'm not sure we can be really
> useful.
> 
> But let us know if it we can help here.

Comment 3 Sandhya Dasu 2017-04-28 16:24 UTC
Created attachment 1274976 [details]
Contents of /etc/puppet/hieradata/neutron_cisco_data.yaml

This file seems accurately generated.

Comment 4 Emilien Macchi 2017-05-01 13:35:19 UTC
Please provide a full sosreport from all nodes so we can debug.
Thanks

Comment 5 Sandhya Dasu 2017-05-01 15:17 UTC
Created attachment 1275426 [details]
/etc/puppet/hieradata/neutron_cisco_data.yaml from Controller-0

Comment 6 Sandhya Dasu 2017-05-01 15:18 UTC
Created attachment 1275427 [details]
/etc/puppet/hieradata/neutron_cisco_data.yaml from Controller-1

Comment 7 Sandhya Dasu 2017-05-01 15:18 UTC
Created attachment 1275428 [details]
/etc/puppet/hieradata/neutron_cisco_data.yaml from Controller-2

Comment 8 Sandhya Dasu 2017-05-01 15:19:37 UTC
/etc/puppet/hieradata/neutron_cisco_data.yaml from all 3 controllers has been attached. I am not sure what sosreport means.
(In reply to Emilien Macchi from comment #4)
> please provide full sosreport of all nodes, so we can debug.
> Thanks

Comment 9 Emilien Macchi 2017-05-01 15:20:50 UTC
I asked for sosreport, not /etc/puppet/hieradata/neutron_cisco_data.yaml file.
Please provide sosreports from all nodes that you're deploying.

Thanks

Comment 10 Dan Prince 2017-05-01 20:35:03 UTC
I think part of the issue might be that the hiera for this plugin is laid down as part of the 'all_nodes' extra config data (for which I don't think there is a guarantee that it would exist early during the deployment). I think we probably need to refactor puppet/all_nodes/neutron-ml2-cisco-nexus-ucsm.yaml so that it uses puppet/pre_deploy for the relevant hiera deployments so that these exist early on...

Comment 11 Steven Hardy 2017-05-02 07:18:42 UTC
I think Dan is right, but I'm a little confused as we tested this previously and it worked:

https://github.com/openstack/tripleo-heat-templates/blob/master/overcloud.j2.yaml#L653

It seems we could add a depends_on so AllNodesDeploySteps depends_on AllNodesExtraConfig - I'll look back over the history to see if we changed the ordering here when doing the OSP10 refactoring for custom roles (if so we can probably just add the missing depends_on which will solve this I think).

Comment 12 Steven Hardy 2017-05-02 07:24:22 UTC
So, yes looking at OSP8 openstack-tripleo-heat-templates it looks like this:

  # Nested stack deployment runs after all other controller deployments
  ControllerNodesPostDeployment:
    type: OS::TripleO::ControllerPostDeployment
    depends_on: [ControllerBootstrapNodeDeployment, ControllerAllNodesDeployment, ControllerSwiftDeployment, ControllerCephDeployment]
    properties:
      servers: {get_attr: [Controller, attributes, nova_server_resource]}
      NodeConfigIdentifiers:
        allnodes_extra: {get_attr: [AllNodesExtraConfig, config_identifier]}

Note the implicit dependency on AllNodesExtraConfig - we lost that with the refactor to AllNodesDeploySteps, so I think the simplest solution is to reinstate that ordering via an explicit depends_on - I'll post a patch later today after doing some local testing.
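
A minimal sketch of what reinstating the ordering with an explicit depends_on could look like in overcloud.j2.yaml. The resource names and the OS::TripleO::PostDeploySteps type are taken from this report; the properties are left out, and the exact form of the eventual patch may differ:

```yaml
resources:
  AllNodesDeploySteps:
    type: OS::TripleO::PostDeploySteps
    # Assumption: make the deploy steps wait for the all-nodes extra config,
    # so hieradata such as neutron_cisco_data.yaml exists before puppet runs.
    # If the resource already lists other dependencies, this entry would be
    # appended to that list rather than replacing it.
    depends_on:
      - AllNodesExtraConfig
    # properties: unchanged from the existing template, omitted in this sketch
```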

Comment 13 Steven Hardy 2017-05-02 11:12:29 UTC
https://review.openstack.org/461734 proposed which should ensure the original ordering is restored

Comment 14 Steven Hardy 2017-05-02 11:14:39 UTC
Clearing needinfo as I think the above patch should resolve this, feedback welcome though (it's a one line patch, so should be easy to manually apply and re-test)

Comment 15 Emilien Macchi 2017-05-02 12:18:01 UTC
Thanks Steve, I'll take care of backporting it down to OSP9.

Comment 16 Steven Hardy 2017-05-02 15:50:14 UTC
Note I think this issue is only found on OSP10 branches and newer, I checked rhos-9.0-patches and it is fine, because it's prior to the refactor for custom roles.

Comment 17 Sid Ahmed Sadouni 2017-05-02 16:00:45 UTC
We faced the same bug today. I applied the patch from comment #13 and will post the results.
Thanks

Comment 18 Emilien Macchi 2017-05-02 16:52:26 UTC
Correction on my last comment. OSP9 should work fine. OSP10, OSP11 and OSP12 (on dev right now) are broken though.

Comment 19 Sandhya Dasu 2017-05-02 19:58:38 UTC
I manually applied the patch in https://review.openstack.org/461734 and redeployed the undercloud, and I still see the failure. Here are some commands I ran on the director node to debug further:

[stack@B6-DIRECTOR share]$  heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+--------------------------------------+------------+---------------+----------------------+--------------+
| id                                   | stack_name | stack_status  | creation_time        | updated_time |
+--------------------------------------+------------+---------------+----------------------+--------------+
| cda9a907-69c6-4368-8b48-0d7315956148 | overcloud  | CREATE_FAILED | 2017-05-02T19:18:13Z | None         |
+--------------------------------------+------------+---------------+----------------------+--------------+
[stack@B6-DIRECTOR share]$ 
[stack@B6-DIRECTOR share]$ 
[stack@B6-DIRECTOR share]$ 
[stack@B6-DIRECTOR share]$ heat resource-list cda9a907-69c6-4368-8b48-0d7315956148 | grep FAILED
WARNING (shell) "heat resource-list" is deprecated, please use "openstack stack resource list" instead
| AllNodesDeploySteps                       | 301b9e2d-a19e-4dce-8fa8-8f83c9510466         | OS::TripleO::PostDeploySteps                    | CREATE_FAILED   | 2017-05-02T19:18:13Z |
[stack@B6-DIRECTOR share]$ 
[stack@B6-DIRECTOR share]$ 
[stack@B6-DIRECTOR share]$ heat resource-list 301b9e2d-a19e-4dce-8fa8-8f83c9510466  | grep _FAILED
WARNING (shell) "heat resource-list" is deprecated, please use "openstack stack resource list" instead
| ComputeDeployment_Step3       | d4ff5f43-acdf-4936-996f-a2b0ac53d04b | OS::Heat::StructuredDeploymentGroup                                                                   | CREATE_FAILED   | 2017-05-02T19:37:11Z |
| ControllerDeployment_Step3    | 141c5f72-7c85-480a-b726-98321096f983 | OS::Heat::StructuredDeploymentGroup                                                                   | CREATE_FAILED   | 2017-05-02T19:37:11Z |

***Sandhya: Note that the output shows CREATE_FAILED for a compute and controller node.********

[stack@B6-DIRECTOR share]$ heat resource-list 141c5f72-7c85-480a-b726-98321096f983
WARNING (shell) "heat resource-list" is deprecated, please use "openstack stack resource list" instead
+---------------+--------------------------------------+--------------------------------+-----------------+----------------------+
| resource_name | physical_resource_id                 | resource_type                  | resource_status | updated_time         |
+---------------+--------------------------------------+--------------------------------+-----------------+----------------------+
| 0             | 38ab0f7a-1993-4a20-ba7b-869fd43d4f0f | OS::Heat::StructuredDeployment | CREATE_COMPLETE | 2017-05-02T19:42:28Z |
| 1             | 766cba2a-401a-4bc9-b750-1a4080a1a5f5 | OS::Heat::StructuredDeployment | CREATE_COMPLETE | 2017-05-02T19:42:28Z |
| 2             | 34dbac25-9426-46bc-aea9-eb74436bff24 | OS::Heat::StructuredDeployment | CREATE_COMPLETE | 2017-05-02T19:42:28Z |
+---------------+--------------------------------------+--------------------------------+-----------------+----------------------+

***Sandhya: The above is the output for the controller. It appears that the 3 controllers were "created" correctly. Not sure why the output above showed CREATE_FAILED for controller.********

[stack@B6-DIRECTOR share]$ heat resource-list d4ff5f43-acdf-4936-996f-a2b0ac53d04b
WARNING (shell) "heat resource-list" is deprecated, please use "openstack stack resource list" instead
+---------------+--------------------------------------+--------------------------------+-----------------+----------------------+
| resource_name | physical_resource_id                 | resource_type                  | resource_status | updated_time         |
+---------------+--------------------------------------+--------------------------------+-----------------+----------------------+
| 0             | bb7cddf6-11b0-4f3e-958f-9e21ea4e45d4 | OS::Heat::StructuredDeployment | CREATE_FAILED   | 2017-05-02T19:42:28Z |
| 1             | 6393a693-9d84-4b88-aae5-4d99ee7feead | OS::Heat::StructuredDeployment | CREATE_COMPLETE | 2017-05-02T19:42:28Z |
| 2             | 679cbcde-e512-4748-b596-09d7b63b1f96 | OS::Heat::StructuredDeployment | CREATE_COMPLETE | 2017-05-02T19:42:28Z |
+---------------+--------------------------------------+--------------------------------+-----------------+----------------------+

***Sandhya: The above is the output for the "failed" compute node. Here we see CREATE_FAILED for 1 compute node, in line with the earlier output.******

[stack@B6-DIRECTOR share]$ heat deployment-show bb7cddf6-11b0-4f3e-958f-9e21ea4e45d4
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{
  "status": "FAILED", 
  "server_id": "057e992f-548e-4744-903a-a1bd9d5ffba5", 
  "config_id": "e9f1b20c-121f-4d5e-a706-bba5c85afdfd", 
  "output_values": {
    "deploy_stdout": "Matching apachectl 'Server version: Apache/2.4.6 (Red Hat Enterprise Linux)\nServer built:   Aug  3 2016 08:33:27'\n", 
    "deploy_stderr": "exception: connect failed\n\u001b[1;31mError: Must pass ucsm_ip to Class[Neutron::Plugins::Ml2::Cisco::Ucsm] at /etc/puppet/modules/tripleo/manifests/profile/base/neutron/plugins/ml2.pp:55 on node overcloud-compute-0.localdomain\u001b[0m\n\u001b[1;31mError: Must pass ucsm_ip to Class[Neutron::Plugins::Ml2::Cisco::Ucsm] at /etc/puppet/modules/tripleo/manifests/profile/base/neutron/plugins/ml2.pp:55 on node overcloud-compute-0.localdomain\u001b[0m\n", 
    "deploy_status_code": 1
  }, 
  "creation_time": "2017-05-02T19:42:29Z", 
  "updated_time": "2017-05-02T19:43:02Z", 
  "input_values": {
    "step": 3, 
    "update_identifier": "1493752681"
  }, 
  "action": "CREATE", 
  "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 1", 
  "id": "bb7cddf6-11b0-4f3e-958f-9e21ea4e45d4"
}

***Sandhya: More interesting stuff here because there is an attempt to initialize the ML2 UCSM plugin on the compute when it should be done only on the controller.

On the 3 Controllers, I found that /etc/neutron/plugins/ml2/ml2_conf.ini contains the correct UCSM plugin config items. So, this definitely is an improvement after the patch.

Comment 20 Steven Hardy 2017-05-03 08:10:41 UTC
Ok let me try to summarize the current status:

1. https://review.openstack.org/461734 worked, i.e. it fixed the originally reported issue, which was that /etc/puppet/hieradata/neutron_cisco_data.yaml was not created before we ran puppet

2. On the 3 Controllers, /etc/neutron/plugins/ml2/ml2_conf.ini is correctly configured, which proves the hieradata is correctly written and puppet applied cleanly

3. There's now a different problem to the one originally reported, which is that puppet tries to configure the ML2 plugin on the compute nodes, but it should only be on the controllers?

I think to debug (3) we need the full openstack overcloud deploy command used (i.e. all the environment files which were passed, and any roles_data file if specified). Sandhya Dasu, can you please provide that?

Comment 21 Steven Hardy 2017-05-03 08:46:43 UTC
Ok, thanks to some information from Sid Ahmed Sadouni on IRC (thanks!), I think I found the issue:

https://review.openstack.org/#/c/338315/ adjusted the way we apply neutron puppet profiles on compute nodes to align with the composable services interface, but it didn't update the environments/neutron-ml2-cisco-nexus-ucsm.yaml file to disable the OS::TripleO::Services::ComputeNeutronCorePlugin as was done for various other vendor plugins.

So I think perhaps adding the following (either to environments/neutron-ml2-cisco-nexus-ucsm.yaml or some other file) may solve the problem:

resource_registry:
  OS::TripleO::Services::ComputeNeutronOvsAgent: OS::Heat::None
  OS::TripleO::Services::ComputeNeutronCorePlugin: OS::Heat::None

That will disable the ml2 and ovs puppet configuration completely on the compute nodes, which it seems is what is desired here?

https://review.openstack.org/462035 posted which does this, but I've got no way to really test it, so any feedback welcome before we go ahead and merge it, thanks!

Comment 22 Sid Ahmed Sadouni 2017-05-03 11:37:09 UTC
Quick update: we are no longer facing this issue after applying the change from comment #21.

Comment 23 Sandhya Dasu 2017-05-03 14:44:15 UTC
Thanks Steven and Sid Ahmed Sadouni for the potential fix.
Steven, your summary in #c20 is accurate.
I have made changes corresponding to https://review.openstack.org/#/c/338315/ locally and fired off another run. I'll update with results soon

Comment 24 Sandhya Dasu 2017-05-03 14:47:07 UTC
Correcting link to review from my earlier comment.
I am using the changes in https://review.openstack.org/#/c/462035/.

Comment 25 Sandhya Dasu 2017-05-03 16:08:36 UTC
Fix provided in https://review.openstack.org/#/c/462035/ still did not work for us. One compute host still tries to configure the plugins and overcloud install fails. Provided comments to the review. The config files that the plugins need are not being generated on the compute hosts which is the correct behavior. But, the plugin initialization is still being attempted.

Comment 27 Steven Hardy 2017-05-08 19:59:04 UTC
https://review.openstack.org/#/c/462035/ has some conflicting feedback compared to comment #25. Sandhya, can you please confirm whether we can proceed with landing that, and what issues remain to enable fully functioning UCSM integration? Thanks!

Comment 28 Sandhya Dasu 2017-05-08 20:32:58 UTC
The fix in https://review.openstack.org/#/c/462035/ is good to be merged. I am able to make progress and will update if I hit any other issues.

Comment 29 Sid Ahmed Sadouni 2017-05-09 08:57:33 UTC
Are we sure that the fix proposed in https://review.openstack.org/#/c/462035/ is what we want/need?
The stack completes successfully, but afterwards we are unable to boot an instance on this compute node; we get a neutron error. (I'm not talking about SR-IOV at this point, just the Cisco UCSM ML2 mechanism driver being enabled.)

What are your thoughts?
Thanks

Comment 30 Steven Hardy 2017-05-09 16:04:15 UTC
> we are unable to boot an instance on this compute node, we got neutron error

Please provide the error, it's very difficult to offer any opinion on whether the fix I proposed is related unless we see what actually went wrong, thanks!

Comment 31 Sid Ahmed Sadouni 2017-05-10 08:50:01 UTC
Hi, below is a step-by-step account of what the customer does after the overcloud is deployed with the neutron ML2 Cisco UCSM plugin and the patch proposed here.


openstack commands : http://pastebin.test.redhat.com/482779

Logs : http://pastebin.test.redhat.com/482775

Error :
2017-05-10 08:29:44.127 160788 ERROR neutron.plugins.ml2.managers [req-125c295d-f0c2-4956-b0ee-bb672e805f3b b871afb5882549dca9d05a7437884ee8 cbfeecb828084f0484764bf4cea16682 - - -] Failed to bind port e2f59255-f498-47cc-bc19-68e1389021d5 on host vel1-nfv01-kvm0201.nfv.private.customer.com for vnic_type normal using segments [{'segmentation_id': 92, 'physical_network': None, 'id': u'afc5ce57-5ff6-4ad5-a346-a3a57351c51c', 'network_type': u'vxlan'}]

Thanks

Comment 32 Sandhya Dasu 2017-05-10 15:35:56 UTC
The plugin does not support VXLAN yet, so this is not a valid issue for this thread.

(In reply to Sid Ahmed Sadouni from comment #31)
> Hi, you will find below a step by stepb of what the customer does after the
> overcloud is deployed, with neutron ml2 cisco ucsm and the patch proposed
> here.
> 
> 
> openstack commands : http://pastebin.test.redhat.com/482779
> 
> Logs : http://pastebin.test.redhat.com/482775
> 
> Error :
> 2017-05-10 08:29:44.127 160788 ERROR neutron.plugins.ml2.managers
> [req-125c295d-f0c2-4956-b0ee-bb672e805f3b b871afb5882549dca9d05a7437884ee8
> cbfeecb828084f0484764bf4cea16682 - - -] Failed to bind port
> e2f59255-f498-47cc-bc19-68e1389021d5 on host
> vel1-nfv01-kvm0201.nfv.private.customer.com for vnic_type normal using
> segments [{'segmentation_id': 92, 'physical_network': None, 'id':
> u'afc5ce57-5ff6-4ad5-a346-a3a57351c51c', 'network_type': u'vxlan'}]
> 
> Thanks

Comment 33 Sid Ahmed Sadouni 2017-05-15 17:36:10 UTC
The 2 patches proposed here by Steven Hardy, [1] and [2], work and allow the overcloud to be deployed successfully.

[1] https://review.openstack.org/#/c/461734/
[2] https://review.openstack.org/#/c/462035/

After that, we will need to figure out how to manage the creation of 'regular' VMs on such compute nodes, as we have deactivated
OS::TripleO::Services::ComputeNeutronCorePlugin: OS::Heat::None
and 
OS::TripleO::Services::ComputeNeutronOvsAgent: OS::Heat::None

I may open a new bz for this.

Comment 34 Sandhya Dasu 2017-05-15 17:40:23 UTC
Can the patches mentioned below be backported to OSP10?

(In reply to Sid Ahmed Sadouni from comment #33)
> The 2 patches proposed here by Steven Hardy [1] and [2] are working and
> permit to successfully deploy the overcloud 
> 
> [1] https://review.openstack.org/#/c/461734/
> [2] https://review.openstack.org/#/c/462035/
> 
> After that, we will need to figure out how can we manage creation of
> 'regular' VMs on such compute node as we have deactivated 
> OS::TripleO::Services::ComputeNeutronCorePlugin: OS::Heat::None
> and 
> OS::TripleO::Services::ComputeNeutronOvsAgent: OS::Heat::None
> 
> I may open a new bz for this.

Comment 35 Red Hat Bugzilla Rules Engine 2017-05-16 12:19:31 UTC
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.

Comment 36 Sid Ahmed Sadouni 2017-05-17 10:01:41 UTC
Update :

It seems that in patch https://review.openstack.org/#/c/462035/ only
OS::TripleO::Services::ComputeNeutronCorePlugin: OS::Heat::None is required.

In fact, if we want to keep the ability to have instances using the OVS ML2 agent, we need to keep the Open vSwitch agent on the compute nodes as well, so
OS::TripleO::Services::ComputeNeutronOvsAgent: OS::Heat::None is not required.

Commenting out this line and running a fresh deployment works: it configures the cisco_ucsm mechanism driver and keeps the OVS ML2 capabilities on the compute nodes.

Test: booting an instance on a VXLAN network is OK. Previously this was not possible, as described in comment #31.
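
As a sketch, the resource_registry section described in this comment would then look roughly like the following. The two service names come from this thread; the file this fragment lives in, and whether it matches the change that was finally merged, are assumptions:

```yaml
resource_registry:
  # Skip the ML2 core plugin puppet profile on the compute nodes, so puppet
  # does not try to configure the cisco_ucsm plugin there.
  OS::TripleO::Services::ComputeNeutronCorePlugin: OS::Heat::None
  # Left commented out on purpose: the Open vSwitch agent must keep running
  # on the compute nodes so that regular (e.g. VXLAN) instances can boot.
  # OS::TripleO::Services::ComputeNeutronOvsAgent: OS::Heat::None
```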

Comment 37 Sandhya Dasu 2017-05-17 13:50:28 UTC
Actually, this was a comment/question I had added to the review https://review.openstack.org/#/c/462035/. We do need the OVS agent to be running on the compute node.

Comment 38 Sandhya Dasu 2017-05-24 15:38:34 UTC
After adding fixes in https://review.openstack.org/#/c/462035 and https://review.openstack.org/#/c/461734 to my OSP10 setup,

I see the following traceback:

2017-05-23 19:57:15.910 27015 INFO heat.engine.stack [req-20b1040b-6552-4c35-89aa-50d1ade8dd40 769faf3de093422aa78e499843beeff9 a34f658edfe744fc98171231ef8d65cd - - -] Exception in stack validation
2017-05-23 19:57:15.910 27015 ERROR heat.engine.stack Traceback (most recent call last):
2017-05-23 19:57:15.910 27015 ERROR heat.engine.stack   File "/usr/lib/python2.7/site-packages/heat/engine/stack.py", line 824, in validate
2017-05-23 19:57:15.910 27015 ERROR heat.engine.stack     result = res.validate()
2017-05-23 19:57:15.910 27015 ERROR heat.engine.stack   File "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py", line 64, in validate
2017-05-23 19:57:15.910 27015 ERROR heat.engine.stack     self.validate_nested_stack()
2017-05-23 19:57:15.910 27015 ERROR heat.engine.stack   File "/usr/lib/python2.7/site-packages/heat/engine/resources/openstack/heat/resource_group.py", line 278, in validate_nested_stack
2017-05-23 19:57:15.910 27015 ERROR heat.engine.stack     if not self.get_size():
2017-05-23 19:57:15.910 27015 ERROR heat.engine.stack   File "/usr/lib/python2.7/site-packages/heat/engine/resources/openstack/heat/software_deployment.py", line 663, in get_size
2017-05-23 19:57:15.910 27015 ERROR heat.engine.stack     return len(self.properties[self.SERVERS])
2017-05-23 19:57:15.910 27015 ERROR heat.engine.stack   File "/usr/lib/python2.7/site-packages/heat/engine/properties.py", line 471, in __getitem__
2017-05-23 19:57:15.910 27015 ERROR heat.engine.stack     return self._get_property_value(key)
2017-05-23 19:57:15.910 27015 ERROR heat.engine.stack   File "/usr/lib/python2.7/site-packages/heat/engine/properties.py", line 464, in _get_property_value
2017-05-23 19:57:15.910 27015 ERROR heat.engine.stack     return self.get_user_value(key, validate, template=template)
2017-05-23 19:57:15.910 27015 ERROR heat.engine.stack   File "/usr/lib/python2.7/site-packages/heat/engine/properties.py", line 456, in get_user_value
2017-05-23 19:57:15.910 27015 ERROR heat.engine.stack     raise ValueError(six.text_type(e))
2017-05-23 19:57:15.910 27015 ERROR heat.engine.stack ValueError: "" is not a map
2017-05-23 19:57:15.910 27015 ERROR heat.engine.stack
2017-05-23 19:57:15.911 27015 DEBUG heat.engine.stack [req-20b1040b-6552-4c35-89aa-50d1ade8dd40 769faf3de093422aa78e499843beeff9 a34f658edfe744fc98171231ef8d65cd - - -] Failed to validate: resources.AllNodesExtraConfig: "" is not a map validate /usr/lib/python2.7/site-packages/heat/engine/stack.py:828
2017-05-23 19:57:15.912 27015 DEBUG oslo_messaging.rpc.server [req-20b1040b-6552-4c35-89aa-50d1ade8dd40 769faf3de093422aa78e499843beeff9 a34f658edfe744fc98171231ef8d65cd - - -] Expected exception during message handling () _process_incoming /usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py:136

Comment 39 Sandhya Dasu 2017-06-13 15:38:04 UTC
After these patches were backported to OSP10, I don't see the above traceback anymore.

Comment 42 Pierre-Andre MOREY 2017-07-25 13:04:55 UTC
Hi Alex,

I just read the file neutron-ml2-cisco-nexus-ucsm.yaml in 5.2.0-21.el7ost, and I don't see the lines that should be included by the patchset. The current content is:

# A Heat environment file which can be used to enable a
# a Cisco Neutron plugin.
resource_registry:
  OS::TripleO::AllNodesExtraConfig: ../puppet/extraconfig/all_nodes/neutron-ml2-cisco-nexus-ucsm.yaml

parameter_defaults:
  NetworkUCSMIp: '127.0.0.1'
  NetworkUCSMUsername: 'admin'
  NetworkUCSMPassword: 'password'
  NetworkUCSMHostList: '12:34:56:78:9a:bc:profile1, 12:34:56:78:9a:de:profile2'
  NetworkUCSMSupportedPciDevs: ''
  NetworkNexusConfig: {}
  NetworkNexusManagedPhysicalNetwork: ''
  NetworkNexusVlanNamePrefix: 'q-'
  NetworkNexusSviRoundRobin: 'false'
  NetworkNexusProviderVlanNamePrefix: 'p-'
  NetworkNexusPersistentSwitchConfig: 'false'
  NetworkNexusSwitchHeartbeatTime: 0
  NetworkNexusSwitchReplayCount: 3
  NetworkNexusProviderVlanAutoCreate: 'true'
  NetworkNexusProviderVlanAutoTrunk: 'true'
  NetworkNexusVxlanGlobalConfig: 'false'
  NetworkNexusHostKeyChecks: 'false'
  NetworkNexusVxlanVniRanges: '0:0'
  NetworkNexusVxlanMcastRanges: '0.0.0.0:0.0.0.0'

Comment 43 Pierre-Andre MOREY 2017-07-25 13:20:05 UTC
Hi Alex,

To be more clear, you need to add:
OS::TripleO::Services::ComputeNeutronCorePlugin: OS::Heat::None is required

Regards,
Pierre-André

Comment 44 Pierre-Andre MOREY 2017-07-25 13:23:47 UTC
Hi Steven,

Is there any reason for 
OS::TripleO::Services::ComputeNeutronCorePlugin: OS::Heat::None is required
to have been removed?

Regards,
Pierre-André

Comment 45 Gonéri Le Bouder 2017-07-25 13:32:12 UTC
Hi Pierre-Andre,

This is the review https://review.openstack.org/#/c/480714/ . The compute nodes still use OVS internally.

Comment 46 Gonéri Le Bouder 2017-07-26 13:09:33 UTC
Just to clarify, you will also need the following extra configuration in your template:

  ControllerExtraConfig:
    neutron::plugins::ml2::mechanism_drivers: ['openvswitch', 'cisco_ucsm', 'cisco_nexus']

  NovaComputeExtraConfig:
    neutron::plugins::ml2::mechanism_drivers: ['openvswitch']
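
For context, a sketch of how these overrides might sit in a custom environment file next to the UCSM parameters. Placing the two ExtraConfig keys under parameter_defaults is an assumption based on the usual TripleO convention, and the NetworkUCSM* values are the placeholders from the shipped environment file quoted in comment 42, not real settings:

```yaml
parameter_defaults:
  # Placeholder values copied from neutron-ml2-cisco-nexus-ucsm.yaml;
  # replace them with the real UCSM details for the deployment.
  NetworkUCSMIp: '127.0.0.1'
  NetworkUCSMUsername: 'admin'
  NetworkUCSMPassword: 'password'
  NetworkUCSMHostList: '12:34:56:78:9a:bc:profile1, 12:34:56:78:9a:de:profile2'

  # Controllers load the Cisco mechanism drivers alongside Open vSwitch.
  ControllerExtraConfig:
    neutron::plugins::ml2::mechanism_drivers: ['openvswitch', 'cisco_ucsm', 'cisco_nexus']

  # Compute nodes only use Open vSwitch, so the Cisco classes (and their
  # required ucsm_ip parameter) are not pulled in there.
  NovaComputeExtraConfig:
    neutron::plugins::ml2::mechanism_drivers: ['openvswitch']
```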

Comment 47 Scott Lewis 2017-08-11 15:16:57 UTC
Planned for inclusion in the next maintenance release.

Comment 48 Andreas Karis 2017-08-11 15:27:41 UTC
Hi, 

I removed the "Fixed in Version" because this was clearly not fixed in that version. The issue still exists in openstack-tripleo-heat-templates-5.2.0-25.el7ost

Can you please provide the correct "Fixed in Version"? 

Thanks!

Andreas

Comment 49 Andreas Karis 2017-08-11 15:31:02 UTC
Made a bunch of private comments public.

Comment 53 errata-xmlrpc 2017-09-06 17:09:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2654

