Bug 1239130 - [RFE] Heat environment sanity check
Summary: [RFE] Heat environment sanity check
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 10.0 (Newton)
Assignee: Hugh Brock
QA Contact: Shai Revivo
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-07-03 16:47 UTC by Marius Cornea
Modified: 2016-09-30 07:58 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
The director does not provide network validation before or during a deployment. This means a deployment with a bad network configuration can run for two hours with no output and can result in failure. A network validation script is currently in development and will be released in the future.
Clone Of:
Environment:
Last Closed: 2016-09-30 07:58:45 UTC
Target Upstream Version:
Embargoed:



Description Marius Cornea 2015-07-03 16:47:37 UTC
Description of problem:

Today we hit an issue where the overcloud deployment timed out after 2 hours, caused by a broken network template for the compute role (containing a port for the storage management network). It would be nice to have a sanity check that validates the heat environment before deployment, so you don't have to wait 2 hours to fix a problem that shows up in the early stages of the deployment.
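
For illustration, a rough sketch of the kind of check being asked for (a hypothetical script, not an existing director feature) could simply parse each environment file passed with -e and verify that every resource_registry entry points at a template that actually exists:

#!/usr/bin/env python
# Hypothetical sanity check (illustration only): parse Heat environment files
# and verify that every resource_registry entry points at an existing template.
import os
import sys
import yaml

def check_environment(env_path):
    errors = []
    with open(env_path) as f:
        env = yaml.safe_load(f) or {}
    base_dir = os.path.dirname(os.path.abspath(env_path))
    for resource, template in env.get('resource_registry', {}).items():
        if not isinstance(template, str) or not template.endswith('.yaml'):
            continue  # skip nested property overrides and non-file mappings
        path = template if os.path.isabs(template) else os.path.join(base_dir, template)
        if not os.path.exists(path):
            errors.append('%s: %s -> missing template %s' % (env_path, resource, path))
    return errors

if __name__ == '__main__':
    problems = []
    for env_file in sys.argv[1:]:
        problems.extend(check_environment(env_file))
    for problem in problems:
        print(problem)
    sys.exit(1 if problems else 0)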

Comment 3 chris alfonso 2015-07-06 16:09:23 UTC
Please provide the exact steps and the failure you hit so we can see if we can add checks.

Comment 5 Marius Cornea 2015-07-07 09:44:03 UTC
Deploy overcloud by passing -e ~/network-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml.

/usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml is the default file shipped with the templates.

network-environment.yaml includes this compute.yaml nic template [1]. The template contains a vlan interface with an IP address from StorageMgmtIpSubnet, but in network-isolation.yaml the compute role doesn't have a port for StorageMgmt.

As a result, this error [2] shows up on the deployed compute nodes.

[1] http://pastebin.test.redhat.com/295175
[2] http://pastebin.test.redhat.com/295181
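
For illustration, a rough sketch of a cross-check that would catch this kind of mismatch (a hypothetical script; it assumes the OS::TripleO::<Role>::Ports::<Network>Port registry key format and takes the role name on the command line) could collect the *IpSubnet parameters referenced in the nic template and compare them with the port resources registered for the role:

#!/usr/bin/env python
# Hypothetical cross-check: every <Net>IpSubnet parameter referenced in a role's
# nic template should have a matching OS::TripleO::<Role>::Ports::<Net>Port entry
# in the network isolation environment (registry key format assumed here).
import re
import sys
import yaml

def subnets_used(nic_template_path):
    # Find every get_param of a *IpSubnet parameter anywhere in the template.
    text = open(nic_template_path).read()
    return set(re.findall(r'get_param:\s*(\w+)IpSubnet', text))

def ports_registered(env_path, role):
    env = yaml.safe_load(open(env_path)) or {}
    nets = set()
    for key in env.get('resource_registry', {}):
        m = re.match(r'OS::TripleO::%s::Ports::(\w+)Port$' % re.escape(role), key)
        if m:
            nets.add(m.group(1))
    return nets

if __name__ == '__main__':
    # e.g.: compute.yaml network-isolation.yaml Compute
    nic_template, isolation_env, role = sys.argv[1:4]
    missing = subnets_used(nic_template) - ports_registered(isolation_env, role)
    for net in sorted(missing):
        print('%s uses %sIpSubnet but no %sPort is registered for role %s'
              % (nic_template, net, net, role))
    sys.exit(1 if missing else 0)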

Comment 6 Marius Cornea 2015-07-13 18:38:38 UTC
Another check we should cover:

When creating OVS bridges, only one interface should be part of the bridge unless bonds are used.

Here's an example of a bad template which might lead to loops:

resources:
  OsNetConfigImpl:
    type: OS::Heat::StructuredConfig
    properties:
      group: os-apply-config
      config:
        os_net_config:
          network_config:
            -
              type: ovs_bridge
              name: br-storage
              use_dhcp: true
              members:
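                # three interfaces attached directly to the bridge with no bond -- this is the risky part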
                -
                  type: interface
                  name: eth0
                  use_dhcp: false
                -
                  type: interface
                  name: eth1
                  use_dhcp: false
                -
                  type: interface
                  name: eth2
                  # force the MAC address of the bridge to this interface
                  primary: true
                  addresses:
                  -
                    ip_netmask: {get_param: StorageIpSubnet}
                  -
                    ip_netmask: {get_param: StorageMgmtIpSubnet}
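
For illustration, a rough sketch of a check for this case (a hypothetical script that assumes the inline os_net_config layout shown above) could flag any ovs_bridge that attaches more than one plain interface without a bond:

#!/usr/bin/env python
# Hypothetical loop-prevention check for nic templates like the one above:
# flag any ovs_bridge that attaches more than one plain interface without a bond.
import sys
import yaml

def bridge_problems(nic_template_path):
    tpl = yaml.safe_load(open(nic_template_path))
    net_config = (tpl['resources']['OsNetConfigImpl']['properties']
                  ['config']['os_net_config']['network_config'])
    problems = []
    for item in net_config:
        if item.get('type') != 'ovs_bridge':
            continue
        members = item.get('members', [])
        interfaces = [m for m in members if m.get('type') == 'interface']
        bonds = [m for m in members if m.get('type') in ('ovs_bond', 'linux_bond')]
        if len(interfaces) > 1 and not bonds:
            problems.append('%s: bridge %s has %d interfaces and no bond'
                            % (nic_template_path, item.get('name'), len(interfaces)))
    return problems

if __name__ == '__main__':
    issues = []
    for path in sys.argv[1:]:
        issues.extend(bridge_problems(path))
    for issue in issues:
        print(issue)
    sys.exit(1 if issues else 0)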

Comment 10 Jaromir Coufal 2016-09-30 07:58:45 UTC
This should be fixed with validations.

