Bug 1637150
| Summary: | Bad network cidr for the net_ip_map network subnets when not using network-isolation | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Alex Schultz <aschultz> |
| Component: | openstack-heat | Assignee: | Emilien Macchi <emacchi> |
| Status: | CLOSED ERRATA | QA Contact: | Alexander Chuzhoy <sasha> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 14.0 (Rocky) | CC: | aschultz, bfournie, dmacpher, dsneddon, emacchi, fcharlie, hjensas, lhh, mburns, sasha, sbaker, shardy |
| Target Milestone: | beta | Keywords: | Triaged |
| Target Release: | 14.0 (Rocky) | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | openstack-heat-11.0.1-0.20180921133343.9c20465.el7ost | Doc Type: | Bug Fix |
| Doc Text: |
Cause: OSP 14 puddles with version number 2018-10-02.2 have a regression that causes failures when not using network isolation due to an incorrect version of openstack-heat-engine.
Consequence: Deployments of puddle 2018-10-02.2 fail with an error when not using network isolation.
Fix: The version of openstack-heat-engine has been corrected in puddle 2018-10-02.3 and later.
Result: Puddles with version 2018-10-02.3 or higher should work as expected.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-01-11 11:53:41 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Created attachment 1491743: overcloud_parameters.yaml
Created attachment 1491744: container-image-prepare.yaml
Created attachment 1491745: config-download.tgz
Workaround is to manually provide ControlPlaneSubnetCidr: 24
I've found a similar issue even with network-isolation.yaml included. The Control Plane CIDR is None. Adding ControlPlaneSubnetCidr: 24 doesn't seem to work.
My custom network environment file:
[stack@ccsosp-undercloud ~]$ cat ccsosp-templates/network_customizations.yaml
resource_registry:
  OS::TripleO::Controller::Net::SoftwareConfig: /home/stack/ccsosp-templates/custom-nics/controller.yaml
  OS::TripleO::Compute::Net::SoftwareConfig: /home/stack/ccsosp-templates/custom-nics/compute.yaml
  OS::TripleO::CephStorage::Net::SoftwareConfig: /home/stack/ccsosp-templates/custom-nics/ceph-storage.yaml
parameter_defaults:
  NeutronExternalNetworkBridge: "''"
  ControlPlaneDefaultRoute: 192.0.2.5
  ControlPlaneSubnetCidr: '24'
... etc...
And on the failed Controller node, the ip_netmask is "192.0.2.103/None" in the os-net-config file:
[heat-admin@overcloud-controller-0 ~]$ cat /etc/os-net-config/config.json
{"network_config": [{"addresses": [{"ip_netmask": "10.65.176.102/23"}], "dns_servers": ["10.64.63.6", "10.64.5.26"], "members": [{"name": "nic1", "type": "interface"}], "name": "br-ex", "routes": [{"default": true, "next_hop": "10.65.177.254"}], "type": "ovs_bridge", "use_dhcp": false}, {"addresses": [{"ip_netmask": "192.0.2.103/None"}], "dns_servers": ["10.64.63.6", "10.64.5.26"], "name": "nic2", "primary": false, "routes": [{"ip_netmask": "169.254.169.254/32", "next_hop": "192.0.2.5"}], "type": "interface", "use_dhcp": false}, {"members": [{"name": "nic3", "primary": false, "type": "interface"}, {"addresses": [{"ip_netmask": "172.16.2.15/24"}], "routes": [], "type": "vlan", "vlan_id": 201}, {"addresses": [{"ip_netmask": "172.16.1.4/24"}], "routes": [], "type": "vlan", "vlan_id": 202}, {"addresses": [{"ip_netmask": "172.16.3.17/24"}], "routes": [], "type": "vlan", "vlan_id": 203}, {"addresses": [{"ip_netmask": "172.16.0.28/24"}], "routes": [], "type": "vlan", "vlan_id": 204}], "name": "br-vlans", "type": "ovs_bridge", "use_dhcp": false}]}
I know that for Rocky the ControlPlaneSubnetCidr parameter gets set automatically before the deployment kicks in (which is why it overrides my custom ControlPlaneSubnetCidr), but something doesn't seem to be passing that CIDR to the deployment when you run "openstack overcloud deploy".
I'm using the latest puddle, which I think is linked to 2018-10-08.4.
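One way to spot this condition before os-net-config runs is to walk the generated config.json and flag any ip_netmask value that does not parse. The sketch below is illustrative only: find_bad_netmasks is a hypothetical helper, and it uses Python's standard-library ipaddress module rather than the netaddr library that os-net-config itself relies on. The sample data is trimmed from the file shown above.

```python
import ipaddress
import json


def find_bad_netmasks(node, path="config"):
    """Yield (path, value) for every ip_netmask that is not a valid CIDR.

    Hypothetical helper for illustration; not part of os-net-config.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            child = "%s.%s" % (path, key)
            if key == "ip_netmask":
                try:
                    # Accepts a host address with a prefix, e.g. 192.0.2.103/24
                    ipaddress.ip_interface(value)
                except ValueError:
                    yield child, value
            else:
                yield from find_bad_netmasks(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_bad_netmasks(item, "%s[%d]" % (path, index))


# Trimmed copy of the config.json content from this comment.
sample = json.loads("""
{"network_config": [
  {"addresses": [{"ip_netmask": "10.65.176.102/23"}],
   "name": "br-ex", "type": "ovs_bridge"},
  {"addresses": [{"ip_netmask": "192.0.2.103/None"}],
   "name": "nic2", "type": "interface",
   "routes": [{"ip_netmask": "169.254.169.254/32", "next_hop": "192.0.2.5"}]}
]}
""")

bad = list(find_bad_netmasks(sample))
for where, value in bad:
    print("invalid ip_netmask at %s: %s" % (where, value))
```

On a node, the same function could be pointed at the parsed contents of /etc/os-net-config/config.json instead of the inline sample.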
Tried with the newest puddle (2018-10-10.1) and the control plane cidr still isn't being passed with network-isolation. Here's the traceback that appeared during the Ansible config execution:
"Traceback (most recent call last):",
" File \"/bin/os-net-config\", line 10, in <module>",
" sys.exit(main())",
" File \"/usr/lib/python2.7/site-packages/os_net_config/cli.py\", line 259, in main",
" obj = objects.object_from_json(iface_json)",
" File \"/usr/lib/python2.7/site-packages/os_net_config/objects.py\", line 43, in object_from_json",
" return Interface.from_json(json)",
" File \"/usr/lib/python2.7/site-packages/os_net_config/objects.py\", line 415, in from_json",
" opts = _BaseOpts.base_opts_from_json(json)",
" File \"/usr/lib/python2.7/site-packages/os_net_config/objects.py\", line 364, in base_opts_from_json",
" addresses.append(Address.from_json(address))",
" File \"/usr/lib/python2.7/site-packages/os_net_config/objects.py\", line 269, in from_json",
" return Address(ip_netmask)",
" File \"/usr/lib/python2.7/site-packages/os_net_config/objects.py\", line 260, in __init__",
" ip_nw = netaddr.IPNetwork(self.ip_netmask)",
" File \"/usr/lib/python2.7/site-packages/netaddr/ip/__init__.py\", line 938, in __init__",
" raise AddrFormatError('invalid IPNetwork %s' % addr)",
"netaddr.core.AddrFormatError: invalid IPNetwork 192.0.2.114/None",
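The string in the last line of the traceback is what you get when a missing prefix length (None) is formatted straight into an address. The following is a minimal sketch of that failure mode and of a guard against it, assuming Python 3 and the standard-library ipaddress module as a stand-in for netaddr; build_cidr is a hypothetical helper, not code from os-net-config or the TripleO templates.

```python
import ipaddress


def build_cidr(ip, prefix_len):
    """Join an address and a prefix length, refusing a missing prefix.

    Hypothetical helper for illustration only.
    """
    if prefix_len is None:
        raise ValueError("no prefix length available for %s" % ip)
    cidr = "%s/%s" % (ip, prefix_len)
    # Raises ValueError if the result is not a valid address/prefix pair.
    ipaddress.ip_interface(cidr)
    return cidr


# Naive formatting silently produces the value seen in the traceback.
naive = "%s/%s" % ("192.0.2.114", None)
print(naive)

try:
    ipaddress.ip_interface(naive)
except ValueError as exc:
    print("rejected:", exc)

print(build_cidr("192.0.2.114", 24))
```

Failing at the point where the prefix is joined reports the missing value directly, instead of surfacing later as an address-format error on the node.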
This looks like something that could be related to my patch: https://review.openstack.org/579579
Does this test have the correct version of Heat? Please log into your containers and look for this code: https://review.openstack.org/#/c/568960/ If that is not there, the get_attr would return "None", explaining why we get "None" in the net_ip_map. The reason I'm asking to check that the heat code is on the system is that we hit a similar issue here: https://bugzilla.redhat.com/show_bug.cgi?id=1621333
Please check that openstack-heat-11.0.1-0.20180901130821.680a515.el7ost or later is being used per Bug 1621333.
It appears that puddle 2018-10-02.2 is using the wrong version of heat. It's using openstack-heat-12.0.0-0.20180604085325.7d878a8.el7ost when it should be using the 11.0.1 version, i.e. openstack-heat-engine-11.0.1-0.20180921133343.9c20465.el7ost.noarch.rpm, which the 2018-10-10.3 puddle is using, see [1]. This same versioning problem happened before [2]. Please retest with puddle 2018-10-10.3 or newer.
[1] http://download-node-02.eng.bos.redhat.com/rcm-guest/puddles/OpenStack/14.0-RHEL-7/2018-10-10.3/RH7-RHOS-14.0/x86_64/os/Packages/
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1621333#c14
Setting FixedInVersion to proper version with fix which should be in 10-10 puddle.
Checked the latest puddle (2018-10-17.2) and seems to be working now with network isolation.
Verified:
Environment:
openstack-heat-common-11.0.1-0.20181010161427.46aacab.el7ost.noarch
openstack-heat-agents-1.7.1-0.20180907213355.476aae2.el7ost.noarch
openstack-heat-engine-11.0.1-0.20181010161427.46aacab.el7ost.noarch
openstack-heat-monolith-11.0.1-0.20181010161427.46aacab.el7ost.noarch
openstack-heat-api-11.0.1-0.20181010161427.46aacab.el7ost.noarch
Successfully deployed OC without network isolation using:
openstack overcloud deploy --templates \
-e /home/stack/containers-prepare-parameter.yaml \
-e /home/stack/overcloud_parameters.yaml
(overcloud) [stack@undercloud ~]$ cat /home/stack/overcloud_parameters.yaml
parameter_defaults:
  NtpServer: ['clock.redhat.com']
  ControllerCount: 3
  OvercloudControllerFlavor: control
  ComputeCount: 1
  OvercloudComputeFlavor: compute
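The packaging mix-up discussed in this bug is easy to miss because the wrong build carried a higher version number (12.0.0) but an older snapshot date than the required 11.0.1 build. The sketch below compares the snapshot timestamp embedded in the release field against the minimum build named in the thread. It is only a rough check under stated assumptions: the function names and the regular expression are hypothetical, and it assumes the release field always has the 0.<14-digit timestamp>.<commit> shape seen in the package names quoted here.

```python
import re

# Snapshot timestamp of the minimum build named in the thread:
# openstack-heat-11.0.1-0.20180901130821.680a515.el7ost
MIN_SNAPSHOT = "20180901130821"

# Assumed layout: <name>-<version>-0.<YYYYMMDDhhmmss>.<commit>.<dist>...
BUILD_RE = re.compile(r"-(?P<version>\d+(?:\.\d+)*)-0\.(?P<snapshot>\d{14})\.")


def parse_build(nvr):
    """Return (version, snapshot) extracted from a package name string."""
    match = BUILD_RE.search(nvr)
    if match is None:
        raise ValueError("unrecognised package name: %s" % nvr)
    return match.group("version"), match.group("snapshot")


def has_required_snapshot(nvr, minimum=MIN_SNAPSHOT):
    """True if the build snapshot is at least as new as the minimum."""
    _version, snapshot = parse_build(nvr)
    # Equal-length digit strings compare correctly as plain strings.
    return snapshot >= minimum


for pkg in (
    "openstack-heat-12.0.0-0.20180604085325.7d878a8.el7ost",
    "openstack-heat-engine-11.0.1-0.20181010161427.46aacab.el7ost.noarch",
):
    print(pkg, has_required_snapshot(pkg))
```

The package names could be taken from `rpm -qa` output inside the heat containers. Whether the snapshot date alone is a sufficient indicator of the fix is an assumption; checking for the code from the linked review remains the more direct test.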
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2019:0045
Description of problem:
If you do not pass network-isolation.yaml into the deployment, the defaults are no longer correct. This results in a bad CIDR being used for the <network>_subnet data. The net_ip_map from the deployment data looks like...
"net_ip_map": {
    "tenant": "192.168.24.12",
    "management": "192.168.24.12",
    "tenant_uri": "192.168.24.12",
    "ctlplane_uri": "192.168.24.12",
    "management_uri": "192.168.24.12",
    "management_subnet": "192.168.24.12/None",
    "storage": "192.168.24.12",
    "internal_api_subnet": "192.168.24.12/None",
    "storage_subnet": "192.168.24.12/None",
    "external_subnet": "192.168.24.12/None",
    "ctlplane": "192.168.24.12",
    "storage_mgmt_subnet": "192.168.24.12/None",
    "external": "192.168.24.12",
    "ctlplane_subnet": "192.168.24.12/None",
    "storage_mgmt": "192.168.24.12",
    "internal_api_uri": "192.168.24.12",
    "external_uri": "192.168.24.12",
    "storage_uri": "192.168.24.12",
    "internal_api": "192.168.24.12",
    "storage_mgmt_uri": "192.168.24.12",
    "tenant_subnet": "192.168.24.12/None"
},
Version-Release number of selected component (if applicable):
openstack-tripleo-common-9.3.1-0.20180923215325.d22cb3e.el7ost.noarch
openstack-tripleo-image-elements-9.0.0-0.20180831210308.2dc678a.el7ost.noarch
openstack-tripleo-validations-9.3.1-0.20180831205306.el7ost.noarch
ansible-role-tripleo-modify-image-1.0.1-0.20180915144057.cb535e9.el7ost.noarch
openstack-tripleo-common-containers-9.3.1-0.20180923215325.d22cb3e.el7ost.noarch
python-tripleoclient-10.5.1-0.20180906012842.el7ost.noarch
puppet-tripleo-9.3.1-0.20180831202649.8ec6c86.el7ost.noarch
ansible-tripleo-ipsec-9.0.1-0.20180827143021.d2b9234.el7ost.noarch
python2-tripleo-common-9.3.1-0.20180923215325.d22cb3e.el7ost.noarch
python-tripleoclient-heat-installer-10.5.1-0.20180906012842.el7ost.noarch
openstack-tripleo-puppet-elements-9.0.0-0.20180906013709.daf9069.el7ost.noarch
openstack-tripleo-heat-templates-9.0.0-0.20180919080941.0rc1.0rc1.el7ost.noarch
How reproducible:
Every time.
Steps to Reproduce:
1. Deploy OSP14 undercloud and set up the hosts
2. Run openstack overcloud deploy --templates -e /usr/share/openstack-tripleo-heat-templates/environments/enable-swap.yaml -e /home/cloud-user/container-image-prepare.yaml -e /home/cloud-user/overcloud_parameters.yaml
Actual results:
Deployment fails with...
TASK [Debug output for task: Run puppet host configuration for step 1] *********
Monday 08 October 2018 14:35:08 -0400 (0:00:16.455) 0:04:13.134 ********
fatal: [overcloud-controller-0]: FAILED! => {
    "failed_when_result": true,
    "outputs.stdout_lines | default([]) | union(outputs.stderr_lines | default([]))": [
        "Notice: hiera(): Cannot load backend module_data: cannot load such file -- hiera/backend/module_data_backend",
        "Notice: Scope(Class[Tripleo::Firewall::Post]): At this stage, all network traffic is blocked.",
        "Notice: Compiled catalog for overcloud-controller-0.localdomain in environment production in 2.74 seconds",
        "Warning: Undefined variable '::deploy_config_name'; ",
        " (file & line not available)",
        "Warning: Undefined variable 'deploy_config_name'; ",
        "Warning: This method is deprecated, please use the stdlib validate_legacy function,",
        " with Stdlib::Compat::Bool. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 54]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]",
        " (at /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:28:in `deprecation')",
        " with Stdlib::Compat::Absolute_Path. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 55]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]",
        " with Stdlib::Compat::String. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 56]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]",
        " with Stdlib::Compat::Array. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 66]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]",
        " with Pattern[]. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 68]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]",
        " with Stdlib::Compat::Numeric. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 76]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 29]",
        "Warning: tag is a metaparam; this value will inherit to all contained resources in the tripleo::firewall::rule definition",
        " with Stdlib::Compat::Hash. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/tripleo/manifests/firewall/rule.pp\", 148]:",
        "Error: Parameter source failed on Firewall[121 memcached ipv4]: Munging failed for value \"192.168.24.12/None\" in class source: host_to_ip failed for 192.168.24.12/None, exception Failed to resolve hostname 192.168.24.12/None at /etc/puppet/modules/tripleo/manifests/firewall/rule.pp:162"
Expected results:
<network>_subnet should not be <ip>/None