Bug 1734155

Summary: roles_data.yaml must include "subnet:" under each network for a role. This can affect upgrades from previous releases.
Product: Red Hat OpenStack
Reporter: Alexander Chuzhoy <sasha>
Component: openstack-tripleo-heat-templates
Assignee: Dan Sneddon <dsneddon>
Status: CLOSED NOTABUG
QA Contact: Alexander Chuzhoy <sasha>
Severity: medium
Priority: medium
Version: 15.0 (Stein)
CC: bfournie, dsneddon, hjensas, mburns
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Doc Type: Known Issue
Last Closed: 2019-08-12 19:35:02 UTC
Type: Bug

Description Alexander Chuzhoy 2019-07-29 19:18:12 UTC
roles_data.yaml must include "subnet:" under each network for a role. This can affect upgrades from previous releases.

Environment:
openstack-tripleo-heat-templates-10.6.1-0.20190722170519.014b20c.el8ost.noarch

Here's an example of networks for the default controller role:
- name: Controller
  description: |
    Controller role that has all the controller services loaded and handles
    Database, Messaging and Network functions.
  CountDefault: 1
  tags:
    - primary
    - controller
  networks:
    External:
      subnet: external_subnet
    InternalApi:
      subnet: internal_api_subnet
    Storage:
      subnet: storage_subnet
    StorageMgmt:
      subnet: storage_mgmt_subnet
    Tenant:
      subnet: tenant_subnet
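For orientation: a YAML loader turns the networks block above into a mapping from network name to a small dictionary with a subnet key. The sketch below writes that parsed structure out by hand as a Python literal and looks up each network's subnet. The names controller_networks and subnet_for are hypothetical and used only for illustration.

```python
# Parsed form of the Controller role's "networks:" block above, transcribed
# by hand into the equivalent Python structure (hypothetical variable name).
controller_networks = {
    "External": {"subnet": "external_subnet"},
    "InternalApi": {"subnet": "internal_api_subnet"},
    "Storage": {"subnet": "storage_subnet"},
    "StorageMgmt": {"subnet": "storage_mgmt_subnet"},
    "Tenant": {"subnet": "tenant_subnet"},
}


def subnet_for(networks: dict, name: str) -> str:
    """Return the subnet configured for network `name` (hypothetical helper)."""
    return networks[name]["subnet"]


# Print each network of the role together with its subnet.
for name in controller_networks:
    print(f"{name} -> {subnet_for(controller_networks, name)}")
```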


A deployment will fail during validation if the subnets are commented out. This can affect upgrades from previous releases where subnets weren't set.

Comment 5 Harald Jensås 2019-08-12 13:30:27 UTC
(In reply to Alexander Chuzhoy from comment #0)
> roles_data.yaml must include "subnet:"  under each network for a role.  This
> can affect upgrade from previous release.

This should not be a problem. Before subnets were added, the 'networks' entry of a role contained a list of network names, for example:

  networks:
    - External
    - InternalApi
    - Storage
    - StorageMgmt
    - Tenant

We already have conditions that check whether role.networks is a mapping. If role data from a previous release is used, role.networks is a list, and we fall back to using {{network.name_lower}}_subnet.
A few examples where we support both the new and the old format:
 https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/role.role.j2.yaml#L425-L434
 https://github.com/openstack/tripleo-heat-templates/blob/9119734f0adec3bfd41f5f4a50f08a8cb8610968/environments/network-isolation.j2.yaml#L23-L32
 https://github.com/openstack/tripleo-heat-templates/search?q=%22if+role.networks+is+mapping%22&unscoped_q=%22if+role.networks+is+mapping%22
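A minimal Python sketch of the fallback described above, assuming the role's networks value has already been loaded from YAML and that each network_data entry carries name and name_lower keys. The function name role_subnet is hypothetical, and isinstance(..., Mapping) stands in for Jinja2's "is mapping" test; it illustrates the logic only and is not the template code at the links above.

```python
from collections.abc import Mapping


def role_subnet(role_networks, network):
    """Pick the subnet name for `network` in a role (hypothetical sketch).

    role_networks: the role's "networks" value, either the new mapping form
        {"InternalApi": {"subnet": "internal_api_subnet"}, ...}
        or the legacy list form ["InternalApi", ...].
    network: a network_data entry such as
        {"name": "InternalApi", "name_lower": "internal_api"}.
    Assumes the network is actually listed in the role.
    """
    default = f"{network['name_lower']}_subnet"
    if isinstance(role_networks, Mapping):
        # New format: use the role's explicit subnet, else the default name.
        return role_networks[network["name"]].get("subnet", default)
    # Legacy list format: no per-role subnet, fall back to <name_lower>_subnet.
    return default


internal_api = {"name": "InternalApi", "name_lower": "internal_api"}
print(role_subnet({"InternalApi": {"subnet": "internal_api_subnet"}}, internal_api))
print(role_subnet(["InternalApi"], internal_api))
```

Both calls resolve to internal_api_subnet, which is why role data in the old list format keeps working without any subnet: lines.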


> A deployment will fail during validation if the subnets are commented out.
> This can affect upgrades from previous releases where subnets weren't set.

I wasn't aware of this validation. Can you share more details, such as the exact validation error, so that I can find that piece of validation code?
IMO it's not a bad idea to do such validation and halt the deployment, so that we can remove all that compatibility code in a future release. But the error should be clear about it, and we need to document it.
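To illustrate what a clear validation error could look like, here is a hypothetical Python sketch; validate_role_networks is not an existing TripleO function, and it assumes the role's networks value has already been loaded from YAML. It accepts the legacy list of names and the mapping form, and reports entries whose settings are not a mapping (for example a bare "External:" that loads as null) or list items that are not plain names.

```python
from collections.abc import Mapping


def validate_role_networks(role_name, networks):
    """Return human-readable problems in a role's "networks" value (hypothetical)."""
    errors = []
    if isinstance(networks, Mapping):
        for net, settings in networks.items():
            # A bare "External:" line loads as None, which later breaks .get().
            if not isinstance(settings, Mapping):
                errors.append(
                    f"role {role_name}: network {net} has no settings (value is "
                    f"{settings!r}); add 'subnet: <name>' under it or use the "
                    f"list form '- {net}'"
                )
    elif isinstance(networks, list):
        for item in networks:
            # A "- External:" line loads as {'External': None}, not a name.
            if not isinstance(item, str):
                errors.append(
                    f"role {role_name}: list entry {item!r} is not a plain "
                    f"network name; remove the trailing colon"
                )
    else:
        errors.append(f"role {role_name}: 'networks' must be a list or a mapping")
    return errors


for problem in validate_role_networks("Controller", {"External": None}):
    print(problem)
```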

Comment 6 Dan Sneddon 2019-08-12 17:13:30 UTC
Sasha, do you have the output from when the validation failed? The actual error message would be helpful here.

Comment 7 Alexander Chuzhoy 2019-08-12 18:10:36 UTC
I just removed the subnets from the controller role:

(undercloud) [stack@undercloud-0 ~]$ diff templates/roles_data.yaml{,_orig}
16a17
>       subnet: external_subnet
17a19
>       subnet: internal_api_subnet
18a21
>       subnet: storage_subnet
19a23
>       subnet: storage_mgmt_subnet
20a25
>       subnet: tenant_subnet




(undercloud) [stack@undercloud-0 ~]$ bash overcloud_deploy.sh
Removing the current plan files
Uploading new plan files
Error rendering template puppet/controller-role.yaml : 'None' has no attribute 'get'
{'result': "Failure caused by error in tasks: notify_zaqar\n\n  notify_zaqar [task_ex_id=2e3cd0ab-06f8-4b17-8607-587bb1ac4d50] -> Workflow failed due to message status. Status:FAILED Message:Error rendering template puppet/controller-role.yaml : 'None' has no attribute 'get'\n    [wf_ex_id=a0118a77-38e0-4d9f-b801-b6837a2c0ef5, idx=0]: Workflow failed due to message status. Status:FAILED Message:Error rendering template puppet/controller-role.yaml : 'None' has no attribute 'get'\n", 'swift_container': 'overcloud-swift-rings', 'status': 'FAILED', 'message': "Error rendering template puppet/controller-role.yaml : 'None' has no attribute 'get'"}
Exception updating plan: {'result': "Failure caused by error in tasks: notify_zaqar\n\n  notify_zaqar [task_ex_id=2e3cd0ab-06f8-4b17-8607-587bb1ac4d50] -> Workflow failed due to message status. Status:FAILED Message:Error rendering template puppet/controller-role.yaml : 'None' has no attribute 'get'\n    [wf_ex_id=a0118a77-38e0-4d9f-b801-b6837a2c0ef5, idx=0]: Workflow failed due to message status. Status:FAILED Message:Error rendering template puppet/controller-role.yaml : 'None' has no attribute 'get'\n", 'swift_container': 'overcloud-swift-rings', 'status': 'FAILED', 'message': "Error rendering template puppet/controller-role.yaml : 'None' has no attribute 'get'"}

Comment 8 Dan Sneddon 2019-08-12 19:20:20 UTC
(In reply to Alexander Chuzhoy from comment #7)
> Just removed the subnets from controller role:


Sasha, what did your roles_data.yaml networks section look like? I would expect this:

  networks:
    - External
    - InternalApi
    - Storage
    - StorageMgmt
    - Tenant

But if the only difference was the missing subnet: lines, does that mean that it actually looks like this?

  networks:
    - External:
    - InternalApi:
    - Storage:
    - StorageMgmt:
    - Tenant:

That seems to be the only logical conclusion if the only difference was the missing "subnet: <network>_subnet" lines.
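In YAML, a key followed by a colon and no value loads as null (None in Python). The diff in comment 7 removed only the subnet: lines, which leaves the role's networks as a mapping whose values are all null, while the example above with leading dashes would load as a list of single-key mappings. The sketch below writes both shapes, plus the plain legacy list, out by hand as Python literals and runs them through a hypothetical lookup_subnet() helper that mimics a mapping check followed by .get('subnet', default). This is an illustration under that assumption, not the template code. In the mapping case the .get call lands on None, which is consistent with the Jinja2 error "'None' has no attribute 'get'" in the log.

```python
from collections.abc import Mapping

# Mapping form with only the "subnet:" lines deleted, as in the diff:
#   networks:
#     External:
#     InternalApi:
mapping_with_nulls = {"External": None, "InternalApi": None}

# List form that kept the colons, as in the example above:
#   networks:
#     - External:
#     - InternalApi:
list_of_mappings = [{"External": None}, {"InternalApi": None}]

# Legacy list form after the colons are removed:
#   networks:
#     - External
#     - InternalApi
plain_list = ["External", "InternalApi"]


def lookup_subnet(role_networks, name, default):
    """Mimic a mapping check followed by .get('subnet', default) (hypothetical)."""
    if isinstance(role_networks, Mapping):
        return role_networks[name].get("subnet", default)
    return default


try:
    lookup_subnet(mapping_with_nulls, "External", "external_subnet")
except AttributeError as exc:
    print("mapping_with_nulls:", exc)  # 'NoneType' object has no attribute 'get'

# A list of single-key mappings is not a mapping, and a plain membership
# test does not find the network name in it.
print(isinstance(list_of_mappings, Mapping), "External" in list_of_mappings)

# The plain legacy list takes the fallback path.
print(lookup_subnet(plain_list, "External", "external_subnet"))
```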

Comment 9 Alexander Chuzhoy 2019-08-12 19:35:02 UTC
Thanks for looking into it, Dan.
I did indeed forget to remove the colons. Once I removed them, the deployment proceeded.
Closing as not a bug.