Bug 1764470 - Update unable to proceed due to validation of networks failure
Summary: Update unable to proceed due to validation of networks failure
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-tripleoclient
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: RHOS Maint
QA Contact: Sasha Smolyak
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-10-23 06:30 UTC by Brendan Shephard
Modified: 2019-10-24 01:00 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-24 01:00:16 UTC
Target Upstream Version:
Embargoed:


Description Brendan Shephard 2019-10-23 06:30:14 UTC
Description of problem:
Customer is trying to perform a minor update of OSP13 and receiving the following error:

InvalidConfiguration: Missing networks from environment configuration. Ensure the following networks are properly configured in the provided environment files [set([u'OS::TripleO::Network::StorageNFS', u'OS::TripleO::Network::StorageMgmt', u'OS::TripleO::Network::BaseManagement'])]

It looks like there are networks defined in the stack environment that are not in the network_data.yaml file. These networks don't exist in Neutron or the Heat DB.

It looks like an update was performed at some stage using the wrong network_data.yaml file and now we are unable to proceed due to the validation here failing:
https://github.com/openstack/python-tripleoclient/blob/stable/queens/tripleoclient/utils.py#L603-L616

Version-Release number of selected component (if applicable):
OSP13


How reproducible:
100%

Steps to Reproduce:
1. openstack overcloud update prepare
2. observe failure
3.

Actual results:

InvalidConfiguration: Missing networks from environment configuration. Ensure the following networks are properly configured in the provided environment files [set([u'OS::TripleO::Network::StorageNFS', u'OS::TripleO::Network::StorageMgmt', u'OS::TripleO::Network::BaseManagement'])]

Expected results:
I should be able to at least define these networks as:

resource_registry:
  OS::TripleO::Network::BaseManagement: OS::Heat::None
  OS::TripleO::Network::StorageMgmt: OS::Heat::None
  OS::TripleO::Network::StorageNFS: OS::Heat::None

To get past this issue.

Additional info:
Since the validation runs before the templates are uploaded, I can't fix this issue by updating the templates. The only solution I can think of is to add these unwanted networks to network_data.yaml so that the environment files match what is expected in the environment. But that will leave them with 3 extra networks that don't currently exist in the overcloud.

My request here is to know if I can possibly comment out the validation check from tripleoclient to get the deployment moving? I would like to avoid having to define the unwanted networks if at all possible. Since we know they don't exist in Neutron or Heat currently, I don't want to risk having the deployment trying to add these networks to the production overcloud.

I can see the environment information that forms stack_nets comes from basically this:
https://github.com/openstack/python-tripleoclient/blob/stable/queens/tripleoclient/utils.py#L606:
openstack stack environment show overcloud | grep OS::TripleO::Network:: | grep -v OS::TripleO::Network::Ports | grep -v OS::Heat::None

And it's defined in mysql under heat.raw_template. This is a massive field and I couldn't possibly update it without causing more issues. So I'm looking for a way to progress without creating a mess of unused networks that could cause other issues if we just try defining them in network_data.yaml.

Comment 1 Brendan Shephard 2019-10-23 23:33:42 UTC
Reproducer steps are:

1. Deploy the overcloud with the management network enabled.
2. Delete the management ports, subnets and network.
3. For full consistency with the environment in question, delete the ManagementNetwork and ManagementSubnet from heat.resources in mysql.
4. Change the name of the management network in network_data.yaml to something like Manage. For consistency, I also changed the lower-case version to manage.
5. Re-run the overcloud deploy.

Now we observe the error:

(undercloud) [stack@undercloud-0 ~]$ ./overcloud_deploy.sh 
Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: 3b2125fb-e077-479c-87ae-6784f295e672
Waiting for messages on queue 'tripleo' with no timeout.
Removing the current plan files
Uploading new plan files
Started Mistral Workflow tripleo.plan_management.v1.update_deployment_plan. Execution ID: c6879bff-6d22-4c3d-9136-9e220c7c35ed
Plan updated.
Processing templates in the directory /tmp/tripleoclient-EcaykL/tripleo-heat-templates
Missing networks from environment configuration. Ensure the following networks are properly configured in the provided environment files [set([u'OS::TripleO::Network::Management'])]


I am testing two solutions. The first is to simply change the name back to Management and management and then re-run the deployment. This works, but it creates the network again, which is not desirable in the specific use case on this BZ.

The second solution I'm testing is commenting out the validation in tripleoclient/utils.py:
    def _get_networks(registry):
        nets = set()
        for k, v in six.iteritems(registry):
            if (k.startswith('OS::TripleO::Network::')
                and not k.startswith('OS::TripleO::Network::Port')
                    and v != 'OS::Heat::None'):
                nets.add(k)
        return nets

    stack_registry = stack.environment().get('resource_registry', {})
    env_registry = environment.get('resource_registry', {})

    stack_nets = _get_networks(stack_registry)
    env_nets = _get_networks(env_registry)

    env_diff = set(stack_nets) - set(env_nets)
    #if env_diff:
    #    raise exceptions.InvalidConfiguration('Missing networks from '
    #                                          'environment configuration. '
    #                                          'Ensure the following networks '
    #                                          'are properly configured in '
    #                                          'the provided environment files '
    #                                          '[{}]'.format(env_diff))


So far, this seems to be progressing although I haven't dug too much deeper into the potential consequences of doing this. I think it should be fine provided that the network really isn't there, but I will soon find out if something interesting happens as a result.
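
To illustrate why mapping a network to OS::Heat::None in the new environment does not satisfy this check, here is a minimal standalone sketch of the same set logic. It is not the tripleoclient code itself: it is rewritten for Python 3 without six, the function names get_networks and missing_networks are made up for the example, and both registries are hypothetical sample data.

```python
PREFIX = 'OS::TripleO::Network::'


def get_networks(registry):
    """Return network resource types that map to a real template."""
    nets = set()
    for key, value in registry.items():
        if (key.startswith(PREFIX)
                and not key.startswith(PREFIX + 'Port')
                and value != 'OS::Heat::None'):
            nets.add(key)
    return nets


def missing_networks(stack_registry, env_registry):
    """Networks known to the deployed stack but absent from the new env."""
    return get_networks(stack_registry) - get_networks(env_registry)


# Hypothetical sample data, for illustration only.
stack_registry = {
    'OS::TripleO::Network::Storage': 'network/storage.yaml',
    'OS::TripleO::Network::Management': 'network/management.yaml',
    'OS::TripleO::Network::Ports::StorageVipPort': 'network/ports/storage.yaml',
}
env_registry = {
    'OS::TripleO::Network::Storage': 'network/storage.yaml',
    # Overriding to OS::Heat::None removes the entry from env_nets,
    # so the network still shows up in the difference.
    'OS::TripleO::Network::Management': 'OS::Heat::None',
}

print(sorted(missing_networks(stack_registry, env_registry)))
```

Because the OS::Heat::None filter is applied to both sides, a network that is still mapped to a template in the stack environment will be reported as missing whenever the new environment sets it to OS::Heat::None.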

Comment 2 Brendan Shephard 2019-10-24 01:00:16 UTC
Commenting out the validation works and it just created the network with the wrong name in addition to the "Manage" one. I assume it will also work if I simply remove the network as well.

I think the best solution here is just to include the network_data.yaml file with all of the networks that are expected according to:
openstack stack environment show overcloud | grep OS::TripleO::Network:: | grep -v OS::TripleO::Network::Ports | grep -v OS::Heat::None

Commenting out the validation doesn't remove the bad networks anyway, it just creates the new one as well. So the issue is still going to exist even if we tried commenting it out:
(undercloud) [stack@undercloud-0 ~]$ openstack stack environment show overcloud | grep OS::TripleO::Network:: | grep -v OS::TripleO::Network::Ports | grep -v OS::Heat::None
  OS::TripleO::Network::External: http://192.168.24.1:8080/v1/AUTH_e71af90cb44b416a813969996e17ecb4/overcloud/network/external.yaml
  OS::TripleO::Network::InternalApi: http://192.168.24.1:8080/v1/AUTH_e71af90cb44b416a813969996e17ecb4/overcloud/network/internal_api.yaml
  OS::TripleO::Network::Manage: http://192.168.24.1:8080/v1/AUTH_e71af90cb44b416a813969996e17ecb4/overcloud/network/manage.yaml          <<<---- New one
  OS::TripleO::Network::Management: http://192.168.24.1:8080/v1/AUTH_e71af90cb44b416a813969996e17ecb4/overcloud/network/management.yaml  <<<---- Original
  OS::TripleO::Network::Storage: http://192.168.24.1:8080/v1/AUTH_e71af90cb44b416a813969996e17ecb4/overcloud/network/storage.yaml
  OS::TripleO::Network::StorageMgmt: http://192.168.24.1:8080/v1/AUTH_e71af90cb44b416a813969996e17ecb4/overcloud/network/storage_mgmt.yaml
  OS::TripleO::Network::Tenant: http://192.168.24.1:8080/v1/AUTH_e71af90cb44b416a813969996e17ecb4/overcloud/network/tenant.yaml
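
For the approach of adding the expected networks to network_data.yaml, a rough sketch like the following could list which names are missing. It assumes the registry key is OS::TripleO::Network:: followed by the network's name field from network_data.yaml, and it takes both inputs as already-parsed Python data. The helper name networks_to_add and the sample values are hypothetical.

```python
PREFIX = 'OS::TripleO::Network::'


def networks_to_add(stack_registry, network_data):
    """Names the stack environment expects that network_data lacks."""
    expected = {
        key[len(PREFIX):]
        for key, value in stack_registry.items()
        if key.startswith(PREFIX)
        and not key.startswith(PREFIX + 'Port')
        and value != 'OS::Heat::None'
    }
    # Assumption: each network_data entry is a dict with a 'name' field.
    defined = {net['name'] for net in network_data}
    return sorted(expected - defined)


# Hypothetical sample data, for illustration only.
stack_registry = {
    'OS::TripleO::Network::External': 'network/external.yaml',
    'OS::TripleO::Network::Manage': 'network/manage.yaml',
    'OS::TripleO::Network::Management': 'network/management.yaml',
    'OS::TripleO::Network::StorageNFS': 'OS::Heat::None',
    'OS::TripleO::Network::Ports::ExternalVipPort': 'network/ports/external.yaml',
}
network_data = [
    {'name': 'External', 'name_lower': 'external'},
    {'name': 'Manage', 'name_lower': 'manage'},
]

print(networks_to_add(stack_registry, network_data))
```

Any name it prints is a candidate entry to add to network_data.yaml before re-running the deployment.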


Let's go with adding the networks. I created this solution article for it:
https://access.redhat.com/solutions/4526651


We might want to consider a code solution for how to work around this. But I think at this stage, it's a fairly specific issue and we can address it as an RFE rather than this Urgent BZ. I'll close it off accordingly.

