Description of problem:
If you do not pass network-isolation.yaml into the deployment, the defaults are no longer correct. This results in a bad CIDR being used for the <network>_subnet data. The net_ip_map from the deployment data looks like this:

"net_ip_map": {
    "tenant": "192.168.24.12",
    "management": "192.168.24.12",
    "tenant_uri": "192.168.24.12",
    "ctlplane_uri": "192.168.24.12",
    "management_uri": "192.168.24.12",
    "management_subnet": "192.168.24.12/None",
    "storage": "192.168.24.12",
    "internal_api_subnet": "192.168.24.12/None",
    "storage_subnet": "192.168.24.12/None",
    "external_subnet": "192.168.24.12/None",
    "ctlplane": "192.168.24.12",
    "storage_mgmt_subnet": "192.168.24.12/None",
    "external": "192.168.24.12",
    "ctlplane_subnet": "192.168.24.12/None",
    "storage_mgmt": "192.168.24.12",
    "internal_api_uri": "192.168.24.12",
    "external_uri": "192.168.24.12",
    "storage_uri": "192.168.24.12",
    "internal_api": "192.168.24.12",
    "storage_mgmt_uri": "192.168.24.12",
    "tenant_subnet": "192.168.24.12/None"
},

Version-Release number of selected component (if applicable):
openstack-tripleo-common-9.3.1-0.20180923215325.d22cb3e.el7ost.noarch
openstack-tripleo-image-elements-9.0.0-0.20180831210308.2dc678a.el7ost.noarch
openstack-tripleo-validations-9.3.1-0.20180831205306.el7ost.noarch
ansible-role-tripleo-modify-image-1.0.1-0.20180915144057.cb535e9.el7ost.noarch
openstack-tripleo-common-containers-9.3.1-0.20180923215325.d22cb3e.el7ost.noarch
python-tripleoclient-10.5.1-0.20180906012842.el7ost.noarch
puppet-tripleo-9.3.1-0.20180831202649.8ec6c86.el7ost.noarch
ansible-tripleo-ipsec-9.0.1-0.20180827143021.d2b9234.el7ost.noarch
python2-tripleo-common-9.3.1-0.20180923215325.d22cb3e.el7ost.noarch
python-tripleoclient-heat-installer-10.5.1-0.20180906012842.el7ost.noarch
openstack-tripleo-puppet-elements-9.0.0-0.20180906013709.daf9069.el7ost.noarch
openstack-tripleo-heat-templates-9.0.0-0.20180919080941.0rc1.0rc1.el7ost.noarch

How reproducible:
Every time.

Steps to Reproduce:
1. Deploy OSP14 undercloud and set up the hosts
2. Run openstack overcloud deploy --templates -e /usr/share/openstack-tripleo-heat-templates/environments/enable-swap.yaml -e /home/cloud-user/container-image-prepare.yaml -e /home/cloud-user/overcloud_parameters.yaml

Actual results:
Deployment fails with...

TASK [Debug output for task: Run puppet host configuration for step 1] *********
Monday 08 October 2018 14:35:08 -0400 (0:00:16.455) 0:04:13.134 ********
fatal: [overcloud-controller-0]: FAILED! => {
    "failed_when_result": true,
    "outputs.stdout_lines | default([]) | union(outputs.stderr_lines | default([]))": [
        "Notice: hiera(): Cannot load backend module_data: cannot load such file -- hiera/backend/module_data_backend",
        "Notice: Scope(Class[Tripleo::Firewall::Post]): At this stage, all network traffic is blocked.",
        "Notice: Compiled catalog for overcloud-controller-0.localdomain in environment production in 2.74 seconds",
        "Warning: Undefined variable '::deploy_config_name'; ",
        " (file & line not available)",
        "Warning: Undefined variable 'deploy_config_name'; ",
        "Warning: This method is deprecated, please use the stdlib validate_legacy function,",
        " with Stdlib::Compat::Bool. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 54]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]",
        " (at /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:28:in `deprecation')",
        " with Stdlib::Compat::Absolute_Path. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 55]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]",
        " with Stdlib::Compat::String. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 56]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]",
        " with Stdlib::Compat::Array. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 66]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]",
        " with Pattern[]. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 68]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]",
        " with Stdlib::Compat::Numeric. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 76]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]",
        "Warning: tag is a metaparam; this value will inherit to all contained resources in the tripleo::firewall::rule definition",
        " with Stdlib::Compat::Hash. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/tripleo/manifests/firewall/rule.pp\", 148]:",
        "Error: Parameter source failed on Firewall[121 memcached ipv4]: Munging failed for value \"192.168.24.12/None\" in class source: host_to_ip failed for 192.168.24.12/None, exception Failed to resolve hostname 192.168.24.12/None at /etc/puppet/modules/tripleo/manifests/firewall/rule.pp:162"

Expected results:
<network>_subnet should not be <ip>/None
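For anyone triaging a similar failure, the sketch below illustrates how a missing CIDR can end up as the literal text "None" in a <network>_subnet value, and how to list the affected keys in a net_ip_map. It is only an illustration: build_subnet_entry and bad_subnet_keys are hypothetical helpers written for this comment, not code from tripleo-heat-templates, and the validation uses the Python standard library ipaddress module rather than anything the deployment itself runs.

```python
import ipaddress


def build_subnet_entry(ip, cidr):
    # Hypothetical stand-in for the logic that joins an address and a prefix
    # length. A missing CIDR (None) is formatted as the text "None".
    return "{}/{}".format(ip, cidr)


def bad_subnet_keys(net_ip_map):
    """Return the *_subnet keys whose value is not a valid <ip>/<prefix>."""
    bad = []
    for key, value in sorted(net_ip_map.items()):
        if not key.endswith("_subnet"):
            continue
        try:
            ipaddress.ip_interface(value)
        except ValueError:
            bad.append(key)
    return bad


# Sample values modelled on the net_ip_map in the description.
sample = {
    "ctlplane": "192.168.24.12",
    "ctlplane_subnet": build_subnet_entry("192.168.24.12", None),
    "storage_subnet": build_subnet_entry("192.168.24.12", 24),
}

print(sample["ctlplane_subnet"])  # 192.168.24.12/None
print(bad_subnet_keys(sample))    # ['ctlplane_subnet']
```

Running the same check against the full net_ip_map from the deployment data would flag every <network>_subnet key, which matches the Puppet firewall error on "192.168.24.12/None".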
Created attachment 1491743 [details] overcloud_parameters.yaml
Created attachment 1491744 [details] container-image-prepare.yaml
Created attachment 1491745 [details] config-download.tgz
Workaround is to manually provide ControlPlaneSubnetCidr: 24
I've found a similar issue even with network-isolation.yaml included. The Control Plane CIDR is None. Adding ControlPlaneSubnetCidr: 24 doesn't seem to work.

My custom network environment file:

[stack@ccsosp-undercloud ~]$ cat ccsosp-templates/network_customizations.yaml
resource_registry:
  OS::TripleO::Controller::Net::SoftwareConfig: /home/stack/ccsosp-templates/custom-nics/controller.yaml
  OS::TripleO::Compute::Net::SoftwareConfig: /home/stack/ccsosp-templates/custom-nics/compute.yaml
  OS::TripleO::CephStorage::Net::SoftwareConfig: /home/stack/ccsosp-templates/custom-nics/ceph-storage.yaml
parameter_defaults:
  NeutronExternalNetworkBridge: "''"
  ControlPlaneDefaultRoute: 192.0.2.5
  ControlPlaneSubnetCidr: '24'
  ... etc...

And on the failed Controller node, the ip_netmask is "192.0.2.103/None" in the os-net-config file:

[heat-admin@overcloud-controller-0 ~]$ cat /etc/os-net-config/config.json
{"network_config": [{"addresses": [{"ip_netmask": "10.65.176.102/23"}], "dns_servers": ["10.64.63.6", "10.64.5.26"], "members": [{"name": "nic1", "type": "interface"}], "name": "br-ex", "routes": [{"default": true, "next_hop": "10.65.177.254"}], "type": "ovs_bridge", "use_dhcp": false}, {"addresses": [{"ip_netmask": "192.0.2.103/None"}], "dns_servers": ["10.64.63.6", "10.64.5.26"], "name": "nic2", "primary": false, "routes": [{"ip_netmask": "169.254.169.254/32", "next_hop": "192.0.2.5"}], "type": "interface", "use_dhcp": false}, {"members": [{"name": "nic3", "primary": false, "type": "interface"}, {"addresses": [{"ip_netmask": "172.16.2.15/24"}], "routes": [], "type": "vlan", "vlan_id": 201}, {"addresses": [{"ip_netmask": "172.16.1.4/24"}], "routes": [], "type": "vlan", "vlan_id": 202}, {"addresses": [{"ip_netmask": "172.16.3.17/24"}], "routes": [], "type": "vlan", "vlan_id": 203}, {"addresses": [{"ip_netmask": "172.16.0.28/24"}], "routes": [], "type": "vlan", "vlan_id": 204}], "name": "br-vlans", "type": "ovs_bridge", "use_dhcp": false}]}

I know that for Rocky the ControlPlaneSubnetCidr parameter gets set automatically before the deployment kicks in (which is why it overrides my custom ControlPlaneSubnetCidr), but something doesn't seem to be passing that CIDR to the deployment when you run "openstack overcloud deploy". I'm using the latest puddle, which I think is linked to 2018-10-08.4.
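To spot this on a node without reading the whole file by eye, a small script can walk the os-net-config JSON and report every ip_netmask that does not parse. This is a rough sketch under a few assumptions: find_bad_netmasks is a hypothetical helper written for this comment, the sample data is a trimmed copy of the config.json above, and validation uses the standard library ipaddress module instead of the netaddr library that os-net-config itself relies on.

```python
import ipaddress
import json


def find_bad_netmasks(node, path="network_config"):
    """Recursively collect (path, value) for every invalid ip_netmask."""
    bad = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = "{}.{}".format(path, key)
            if key == "ip_netmask" and isinstance(value, str):
                try:
                    ipaddress.ip_interface(value)
                except ValueError:
                    bad.append((child, value))
            else:
                bad.extend(find_bad_netmasks(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            bad.extend(find_bad_netmasks(item, "{}[{}]".format(path, index)))
    return bad


# Trimmed sample based on the config.json shown in this comment.
config = json.loads("""
{"network_config": [
  {"name": "br-ex", "type": "ovs_bridge",
   "addresses": [{"ip_netmask": "10.65.176.102/23"}]},
  {"name": "nic2", "type": "interface",
   "addresses": [{"ip_netmask": "192.0.2.103/None"}],
   "routes": [{"ip_netmask": "169.254.169.254/32", "next_hop": "192.0.2.5"}]}
]}
""")

for path, value in find_bad_netmasks(config["network_config"]):
    print(path, value)
```

On an affected node the same function could be pointed at the parsed contents of /etc/os-net-config/config.json; the recursion is there because addresses also appear inside nested "members" entries such as the VLANs on br-vlans.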
Tried with the newest puddle (2018-10-10.1) and the control plane cidr still isn't being passed with network-isolation. Here's the traceback that appeared during the Ansible config execution:

"Traceback (most recent call last):",
" File \"/bin/os-net-config\", line 10, in <module>",
" sys.exit(main())",
" File \"/usr/lib/python2.7/site-packages/os_net_config/cli.py\", line 259, in main",
" obj = objects.object_from_json(iface_json)",
" File \"/usr/lib/python2.7/site-packages/os_net_config/objects.py\", line 43, in object_from_json",
" return Interface.from_json(json)",
" File \"/usr/lib/python2.7/site-packages/os_net_config/objects.py\", line 415, in from_json",
" opts = _BaseOpts.base_opts_from_json(json)",
" File \"/usr/lib/python2.7/site-packages/os_net_config/objects.py\", line 364, in base_opts_from_json",
" addresses.append(Address.from_json(address))",
" File \"/usr/lib/python2.7/site-packages/os_net_config/objects.py\", line 269, in from_json",
" return Address(ip_netmask)",
" File \"/usr/lib/python2.7/site-packages/os_net_config/objects.py\", line 260, in __init__",
" ip_nw = netaddr.IPNetwork(self.ip_netmask)",
" File \"/usr/lib/python2.7/site-packages/netaddr/ip/__init__.py\", line 938, in __init__",
" raise AddrFormatError('invalid IPNetwork %s' % addr)",
"netaddr.core.AddrFormatError: invalid IPNetwork 192.0.2.114/None",
This looks like something that could be related to my patch: https://review.openstack.org/579579

Does this test have the correct version of Heat? Please log into your containers and look for this code: https://review.openstack.org/#/c/568960/

If that is not there, the get_attr would return "None", which would explain why we get "None" in the net_ip_map.
The reason I'm asking to check that the heat code is on the system is that we hit a similar issue here: https://bugzilla.redhat.com/show_bug.cgi?id=1621333
Please check that openstack-heat-11.0.1-0.20180901130821.680a515.el7ost or later is being used per Bug 1621333.
It appears that puddle 2018-10-02.2 is using the wrong version of heat. It's using openstack-heat-12.0.0-0.20180604085325.7d878a8.el7ost when it should be using the 11.0.1 version, i.e. openstack-heat-engine-11.0.1-0.20180921133343.9c20465.el7ost.noarch.rpm, which puddle 2018-10-10.3 is using, see [1]. This same versioning problem happened before [2].

Please retest with puddle 2018-10-10.3 or newer.

[1] http://download-node-02.eng.bos.redhat.com/rcm-guest/puddles/OpenStack/14.0-RHEL-7/2018-10-10.3/RH7-RHOS-14.0/x86_64/os/Packages/.
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1621333#c14
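Because the package with the higher version number (12.0.0) is the one missing the fix, comparing version numbers alone is misleading here. The sketch below compares the 14-digit field in the release string instead. It rests on an assumption this report does not state: that the field is a build snapshot timestamp (YYYYMMDDHHMMSS). snapshot_stamp and has_fix are hypothetical helpers for illustration, not part of any packaging tool, and a proper check would use the real RPM comparison rules.

```python
import re

# Assumption: the release looks like "0.<14-digit snapshot>.<git hash>.el7ost".
SNAPSHOT = re.compile(r"-\d+\.(\d{14})\.")

# Minimum build named in this bug (per Bug 1621333).
MINIMUM = "openstack-heat-11.0.1-0.20180901130821.680a515.el7ost"


def snapshot_stamp(nvr):
    """Extract the 14-digit snapshot field from a package name-version-release."""
    match = SNAPSHOT.search(nvr)
    if match is None:
        raise ValueError("no snapshot timestamp found in %r" % nvr)
    return match.group(1)


def has_fix(nvr):
    # Equal-length digit strings compare correctly as plain strings.
    return snapshot_stamp(nvr) >= snapshot_stamp(MINIMUM)


for package in (
    "openstack-heat-12.0.0-0.20180604085325.7d878a8.el7ost",
    "openstack-heat-engine-11.0.1-0.20180921133343.9c20465.el7ost.noarch.rpm",
):
    print(package, has_fix(package))
```

Under that assumption the 12.0.0 build from puddle 2018-10-02.2 reports False and the 11.0.1 build from puddle 2018-10-10.3 reports True, which is consistent with the retest request above.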
Setting FixedInVersion to the proper version with the fix, which should be in the 10-10 puddle.
Checked the latest puddle (2018-10-17.2) and it seems to be working now with network isolation.
Verified:

Environment:
openstack-heat-common-11.0.1-0.20181010161427.46aacab.el7ost.noarch
openstack-heat-agents-1.7.1-0.20180907213355.476aae2.el7ost.noarch
openstack-heat-engine-11.0.1-0.20181010161427.46aacab.el7ost.noarch
openstack-heat-monolith-11.0.1-0.20181010161427.46aacab.el7ost.noarch
openstack-heat-api-11.0.1-0.20181010161427.46aacab.el7ost.noarch

Successfully deployed OC without network isolation using:

openstack overcloud deploy --templates \
-e /home/stack/containers-prepare-parameter.yaml \
-e /home/stack/overcloud_parameters.yaml

(overcloud) [stack@undercloud ~]$ cat /home/stack/overcloud_parameters.yaml
parameter_defaults:
  NtpServer: ['clock.redhat.com']
  ControllerCount: 3
  OvercloudControllerFlavor: control
  ComputeCount: 1
  OvercloudComputeFlavor: compute
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:0045