Bug 1637150 - Bad network cidr for the net_ip_map network subnets when not using network-isolation
Summary: Bad network cidr for the net_ip_map network subnets when not using network-is...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-heat
Version: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: beta
Target Release: 14.0 (Rocky)
Assignee: Emilien Macchi
QA Contact: Alexander Chuzhoy
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-10-08 18:55 UTC by Alex Schultz
Modified: 2019-01-11 11:53 UTC (History)

Fixed In Version: openstack-heat-11.0.1-0.20180921133343.9c20465.el7ost
Doc Type: Bug Fix
Doc Text:
Cause: OSP 14 puddles with version number 2018-10-02.2 have a regression that causes failures when not using network isolation due to an incorrect version of openstack-heat-engine.
Consequence: Deployments of puddle 2018-10-02.2 fail with an error when not using network isolation.
Fix: The version of openstack-heat-engine has been corrected in puddle 2018-10-02.3 and later.
Result: Puddles with version 2018-10-02.3 or higher should work as expected.
Clone Of:
Environment:
Last Closed: 2019-01-11 11:53:41 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2019:0045 0 None None None 2019-01-11 11:53:48 UTC

Description Alex Schultz 2018-10-08 18:55:27 UTC
Description of problem:
If you do not pass network-isolation.yaml to the deployment, the defaults are no longer correct.  This results in a bad CIDR being used for the <network>_subnet data.  The net_ip_map from the deployment data looks like this:

        "net_ip_map": {
          "tenant": "192.168.24.12",
          "management": "192.168.24.12",
          "tenant_uri": "192.168.24.12",
          "ctlplane_uri": "192.168.24.12",
          "management_uri": "192.168.24.12",
          "management_subnet": "192.168.24.12/None",
          "storage": "192.168.24.12",
          "internal_api_subnet": "192.168.24.12/None",
          "storage_subnet": "192.168.24.12/None",
          "external_subnet": "192.168.24.12/None",
          "ctlplane": "192.168.24.12",
          "storage_mgmt_subnet": "192.168.24.12/None",
          "external": "192.168.24.12",
          "ctlplane_subnet": "192.168.24.12/None",
          "storage_mgmt": "192.168.24.12",
          "internal_api_uri": "192.168.24.12",
          "external_uri": "192.168.24.12",
          "storage_uri": "192.168.24.12",
          "internal_api": "192.168.24.12",
          "storage_mgmt_uri": "192.168.24.12",
          "tenant_subnet": "192.168.24.12/None"
        },
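
A minimal sketch of how the bad entries above could be spotted automatically. It is not part of TripleO or Heat; the function name find_bad_subnets and the sample dictionary are hypothetical, and the check uses Python 3's standard ipaddress module as a stand-in for whatever validation the deployment tooling performs.

```python
import ipaddress


def find_bad_subnets(net_ip_map):
    """Return the *_subnet entries whose value is not a valid IP/prefix string.

    Hypothetical helper for illustration only.
    """
    bad = {}
    for key, value in net_ip_map.items():
        if not key.endswith("_subnet"):
            continue
        try:
            # ip_interface accepts a host address plus prefix, e.g. 192.168.24.12/24
            ipaddress.ip_interface(value)
        except ValueError:
            # "192.168.24.12/None" lands here because "None" is not a prefix length
            bad[key] = value
    return bad


# Trimmed sample in the shape of the net_ip_map shown above
sample = {
    "ctlplane": "192.168.24.12",
    "ctlplane_subnet": "192.168.24.12/None",
    "storage_subnet": "192.168.24.12/None",
}
bad_entries = find_bad_subnets(sample)
print(bad_entries)
```

With a correct prefix such as "192.168.24.12/24" the same function returns an empty dictionary.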


Version-Release number of selected component (if applicable):
openstack-tripleo-common-9.3.1-0.20180923215325.d22cb3e.el7ost.noarch
openstack-tripleo-image-elements-9.0.0-0.20180831210308.2dc678a.el7ost.noarch
openstack-tripleo-validations-9.3.1-0.20180831205306.el7ost.noarch
ansible-role-tripleo-modify-image-1.0.1-0.20180915144057.cb535e9.el7ost.noarch
openstack-tripleo-common-containers-9.3.1-0.20180923215325.d22cb3e.el7ost.noarch
python-tripleoclient-10.5.1-0.20180906012842.el7ost.noarch
puppet-tripleo-9.3.1-0.20180831202649.8ec6c86.el7ost.noarch
ansible-tripleo-ipsec-9.0.1-0.20180827143021.d2b9234.el7ost.noarch
python2-tripleo-common-9.3.1-0.20180923215325.d22cb3e.el7ost.noarch
python-tripleoclient-heat-installer-10.5.1-0.20180906012842.el7ost.noarch
openstack-tripleo-puppet-elements-9.0.0-0.20180906013709.daf9069.el7ost.noarch
openstack-tripleo-heat-templates-9.0.0-0.20180919080941.0rc1.0rc1.el7ost.noarch

How reproducible:

Every time.

Steps to Reproduce:
1. Deploy the OSP14 undercloud and set up the hosts
2. Run openstack overcloud deploy --templates -e /usr/share/openstack-tripleo-heat-templates/environments/enable-swap.yaml -e /home/cloud-user/container-image-prepare.yaml -e /home/cloud-user/overcloud_parameters.yaml


Actual results:

Deployment fails with...
TASK [Debug output for task: Run puppet host configuration for step 1] *********
Monday 08 October 2018  14:35:08 -0400 (0:00:16.455)       0:04:13.134 ******** 
fatal: [overcloud-controller-0]: FAILED! => {
    "failed_when_result": true, 
    "outputs.stdout_lines | default([]) | union(outputs.stderr_lines | default([]))": [
        "Notice: hiera(): Cannot load backend module_data: cannot load such file -- hiera/backend/module_data_backend", 
        "Notice: Scope(Class[Tripleo::Firewall::Post]): At this stage, all network traffic is blocked.", 
        "Notice: Compiled catalog for overcloud-controller-0.localdomain in environment production in 2.74 seconds", 
        "Warning: Undefined variable '::deploy_config_name'; ", 
        "   (file & line not available)", 
        "Warning: Undefined variable 'deploy_config_name'; ", 
        "Warning: This method is deprecated, please use the stdlib validate_legacy function,", 
        "                    with Stdlib::Compat::Bool. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 54]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]", 
        "   (at /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:28:in `deprecation')", 
        "                    with Stdlib::Compat::Absolute_Path. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 55]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]", 
        "                    with Stdlib::Compat::String. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 56]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]", 
        "                    with Stdlib::Compat::Array. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 66]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]", 
        "                    with Pattern[]. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 68]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]", 
        "                    with Stdlib::Compat::Numeric. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 76]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]", 
        "Warning: tag is a metaparam; this value will inherit to all contained resources in the tripleo::firewall::rule definition", 
        "                    with Stdlib::Compat::Hash. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/tripleo/manifests/firewall/rule.pp\", 148]:", 
        "Error: Parameter source failed on Firewall[121 memcached ipv4]: Munging failed for value \"192.168.24.12/None\" in class source: host_to_ip failed for 192.168.24.12/None, exception Failed to resolve hostname 192.168.24.12/None at /etc/puppet/modules/tripleo/manifests/firewall/rule.pp:162"


Expected results:
<network>_subnet should not be <ip>/None

Comment 1 Alex Schultz 2018-10-08 18:57:45 UTC
Created attachment 1491743 [details]
overcloud_parameters.yaml

Comment 2 Alex Schultz 2018-10-08 18:58:01 UTC
Created attachment 1491744 [details]
container-image-prepare.yaml

Comment 3 Alex Schultz 2018-10-08 18:58:15 UTC
Created attachment 1491745 [details]
config-download.tgz

Comment 4 Alex Schultz 2018-10-08 20:07:00 UTC
Workaround is to manually provide ControlPlaneSubnetCidr: 24

Comment 5 Dan Macpherson 2018-10-10 03:39:40 UTC
I've found a similar issue even with network-isolation.yaml included. The Control Plane CIDR is None. Adding ControlPlaneSubnetCidr: 24 doesn't seem to work.

My custom network environment file:

[stack@ccsosp-undercloud ~]$ cat ccsosp-templates/network_customizations.yaml 
resource_registry:
  OS::TripleO::Controller::Net::SoftwareConfig: /home/stack/ccsosp-templates/custom-nics/controller.yaml
  OS::TripleO::Compute::Net::SoftwareConfig: /home/stack/ccsosp-templates/custom-nics/compute.yaml
  OS::TripleO::CephStorage::Net::SoftwareConfig: /home/stack/ccsosp-templates/custom-nics/ceph-storage.yaml

parameter_defaults:
  NeutronExternalNetworkBridge: "''"
  ControlPlaneDefaultRoute: 192.0.2.5
  ControlPlaneSubnetCidr: '24'
  ... etc...

And on the failed Controller node, the ip_netmask is "192.0.2.103/None" in the os-net-config file:

[heat-admin@overcloud-controller-0 ~]$ cat /etc/os-net-config/config.json 
{"network_config": [{"addresses": [{"ip_netmask": "10.65.176.102/23"}], "dns_servers": ["10.64.63.6", "10.64.5.26"], "members": [{"name": "nic1", "type": "interface"}], "name": "br-ex", "routes": [{"default": true, "next_hop": "10.65.177.254"}], "type": "ovs_bridge", "use_dhcp": false}, {"addresses": [{"ip_netmask": "192.0.2.103/None"}], "dns_servers": ["10.64.63.6", "10.64.5.26"], "name": "nic2", "primary": false, "routes": [{"ip_netmask": "169.254.169.254/32", "next_hop": "192.0.2.5"}], "type": "interface", "use_dhcp": false}, {"members": [{"name": "nic3", "primary": false, "type": "interface"}, {"addresses": [{"ip_netmask": "172.16.2.15/24"}], "routes": [], "type": "vlan", "vlan_id": 201}, {"addresses": [{"ip_netmask": "172.16.1.4/24"}], "routes": [], "type": "vlan", "vlan_id": 202}, {"addresses": [{"ip_netmask": "172.16.3.17/24"}], "routes": [], "type": "vlan", "vlan_id": 203}, {"addresses": [{"ip_netmask": "172.16.0.28/24"}], "routes": [], "type": "vlan", "vlan_id": 204}], "name": "br-vlans", "type": "ovs_bridge", "use_dhcp": false}]}

I know that for Rocky the ControlPlaneSubnetCidr parameter gets automatically set before the deployment kicks in (which is why it overrides my custom ControlPlaneSubnetCidr), but something doesn't seem to be passing that CIDR to the deployment when you run "openstack overcloud deploy".

I'm using the latest puddle, which I think is linked to 2018-10-08.4.
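
A rough sketch of a pre-flight check for a config.json like the one above, assuming Python 3 on a machine where the file can be inspected. The function name invalid_ip_netmasks and the path notation are hypothetical, the sample is a trimmed copy of the file above, and the standard ipaddress module is used here rather than the netaddr library that os-net-config itself relies on.

```python
import ipaddress


def invalid_ip_netmasks(node, path="$"):
    """Yield (path, value) for every ip_netmask entry that does not parse.

    Hypothetical helper: walks nested dicts and lists recursively.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            child = "{}.{}".format(path, key)
            if key == "ip_netmask" and isinstance(value, str):
                try:
                    ipaddress.ip_interface(value)
                except ValueError:
                    yield child, value
            else:
                yield from invalid_ip_netmasks(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from invalid_ip_netmasks(item, "{}[{}]".format(path, index))


# Trimmed sample based on the os-net-config file above
config = {
    "network_config": [
        {"name": "br-ex", "addresses": [{"ip_netmask": "10.65.176.102/23"}]},
        {
            "name": "nic2",
            "addresses": [{"ip_netmask": "192.0.2.103/None"}],
            "routes": [{"ip_netmask": "169.254.169.254/32", "next_hop": "192.0.2.5"}],
        },
    ]
}
problems = list(invalid_ip_netmasks(config))
print(problems)
```

For the real file, the dictionary could be loaded with json.load() from /etc/os-net-config/config.json instead of being written inline.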

Comment 6 Dan Macpherson 2018-10-10 13:22:30 UTC
Tried with the newest puddle (2018-10-10.1) and the control plane cidr still isn't being passed with network-isolation. Here's the traceback that appeared during the Ansible config execution:


"Traceback (most recent call last):", 
"  File \"/bin/os-net-config\", line 10, in <module>", 
"    sys.exit(main())", 
"  File \"/usr/lib/python2.7/site-packages/os_net_config/cli.py\", line 259, in main", 
"    obj = objects.object_from_json(iface_json)", 
"  File \"/usr/lib/python2.7/site-packages/os_net_config/objects.py\", line 43, in object_from_json", 
"    return Interface.from_json(json)", 
"  File \"/usr/lib/python2.7/site-packages/os_net_config/objects.py\", line 415, in from_json", 
"    opts = _BaseOpts.base_opts_from_json(json)", 
"  File \"/usr/lib/python2.7/site-packages/os_net_config/objects.py\", line 364, in base_opts_from_json", 
"    addresses.append(Address.from_json(address))", 
"  File \"/usr/lib/python2.7/site-packages/os_net_config/objects.py\", line 269, in from_json", 
"    return Address(ip_netmask)", 
"  File \"/usr/lib/python2.7/site-packages/os_net_config/objects.py\", line 260, in __init__", 
"    ip_nw = netaddr.IPNetwork(self.ip_netmask)", 
"  File \"/usr/lib/python2.7/site-packages/netaddr/ip/__init__.py\", line 938, in __init__", 
"    raise AddrFormatError('invalid IPNetwork %s' % addr)", 
"netaddr.core.AddrFormatError: invalid IPNetwork 192.0.2.114/None",

Comment 7 Harald Jensås 2018-10-15 20:09:46 UTC
This looks like something that could be related to my patch: https://review.openstack.org/579579


Does this test have the correct version of Heat?
 Please log into your containers and look for this code:
   https://review.openstack.org/#/c/568960/


If that code is not there, get_attr would return "None", which would explain why we get "None" in the net_ip_map.

Comment 8 Harald Jensås 2018-10-15 20:12:51 UTC
The reason I'm asking to check that the heat code is on the system is that we hit a similar issue here: https://bugzilla.redhat.com/show_bug.cgi?id=1621333

Comment 9 Bob Fournier 2018-10-15 20:23:22 UTC
Please check that openstack-heat-11.0.1-0.20180901130821.680a515.el7ost or later is being used per Bug 1621333.

Comment 10 Bob Fournier 2018-10-15 20:55:48 UTC
It appears that puddle 2018-10-02.2 is using the wrong version of heat. It's using openstack-heat-12.0.0-0.20180604085325.7d878a8.el7ost when it should be using the 11.0.1 version, i.e. openstack-heat-engine-11.0.1-0.20180921133343.9c20465.el7ost.noarch.rpm, which puddle 2018-10-10.3 is using, see [1].

This same versioning problem happened before [2].

Please retest with puddle 2018-10-10.3 or newer.

[1] http://download-node-02.eng.bos.redhat.com/rcm-guest/puddles/OpenStack/14.0-RHEL-7/2018-10-10.3/RH7-RHOS-14.0/x86_64/os/Packages/.
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1621333#c14
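
The version mix-up above is easy to miss because 12.0.0 looks newer than 11.0.1. A minimal sketch of a check that catches it, under the assumption that the release field always has the layout 0.<YYYYMMDDhhmmss>.<git hash>.el7ost as in the package names quoted in this bug. The names parse_build and is_at_least are hypothetical, and the rule "same major version and snapshot not older than the minimum" is an illustrative policy, not how rpm or the puddle tooling compares builds.

```python
import re
from datetime import datetime

# Assumed layout: <name>-<version>-0.<YYYYMMDDhhmmss>.<git hash>.el7ost
NVR_RE = re.compile(
    r"^(?P<name>.+)-(?P<version>\d+(?:\.\d+)*)"
    r"-0\.(?P<stamp>\d{14})\.(?P<commit>[0-9a-f]+)\.el7ost"
)


def parse_build(nvr):
    """Split a build string into (name, version tuple, snapshot datetime)."""
    match = NVR_RE.match(nvr)
    if match is None:
        raise ValueError("unrecognised build string: %r" % nvr)
    version = tuple(int(part) for part in match.group("version").split("."))
    stamp = datetime.strptime(match.group("stamp"), "%Y%m%d%H%M%S")
    return match.group("name"), version, stamp


def is_at_least(nvr, minimum_nvr):
    """True if nvr shares the major version of minimum_nvr and is not older."""
    _, version, stamp = parse_build(nvr)
    _, min_version, min_stamp = parse_build(minimum_nvr)
    same_series = version[0] == min_version[0]
    return same_series and (version, stamp) >= (min_version, min_stamp)


minimum = "openstack-heat-11.0.1-0.20180901130821.680a515.el7ost"
wrong = "openstack-heat-12.0.0-0.20180604085325.7d878a8.el7ost"
fixed = "openstack-heat-engine-11.0.1-0.20180921133343.9c20465.el7ost.noarch.rpm"

# A plain version comparison ranks the wrong build as newer
naive_newer = parse_build(wrong)[1] > parse_build(minimum)[1]
print(naive_newer, is_at_least(wrong, minimum), is_at_least(fixed, minimum))
```

The snapshot timestamp of the 12.0.0 build (2018-06-04) predates the required 11.0.1 build (2018-09-01), which is why the sketch looks at the series and the timestamp rather than the version number alone.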

Comment 11 Bob Fournier 2018-10-15 21:18:01 UTC
Setting FixedInVersion to the proper version with the fix, which should be in the 10-10 puddle.

Comment 13 Dan Macpherson 2018-10-18 08:08:13 UTC
Checked the latest puddle (2018-10-17.2) and it seems to be working now with network isolation.

Comment 18 Alexander Chuzhoy 2018-11-13 17:02:25 UTC
Verified:

Environment:
openstack-heat-common-11.0.1-0.20181010161427.46aacab.el7ost.noarch
openstack-heat-agents-1.7.1-0.20180907213355.476aae2.el7ost.noarch
openstack-heat-engine-11.0.1-0.20181010161427.46aacab.el7ost.noarch
openstack-heat-monolith-11.0.1-0.20181010161427.46aacab.el7ost.noarch
openstack-heat-api-11.0.1-0.20181010161427.46aacab.el7ost.noarch



Successfully deployed OC without network isolation using:
openstack overcloud deploy --templates \
-e /home/stack/containers-prepare-parameter.yaml \
-e /home/stack/overcloud_parameters.yaml

(overcloud) [stack@undercloud ~]$ cat /home/stack/overcloud_parameters.yaml
parameter_defaults:
    NtpServer: ['clock.redhat.com']
    ControllerCount: 3
    OvercloudControllerFlavor: control
    ComputeCount: 1
    OvercloudComputeFlavor: compute

Comment 20 errata-xmlrpc 2019-01-11 11:53:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045

