Description of problem:

In my network-environment.yaml template I have

ControlPlaneDefaultRoute: 172.16.4.254

and

# Use either this parameter or ControlPlaneDefaultRoute in the NIC templates
# ManagementInterfaceDefaultRoute: 10.126.185.65

When I deploy I get this result:

Removing the current plan files
Uploading new plan files
Started Mistral Workflow. Execution ID: 69f424a4-b14f-4505-9dd3-ba1ed4722c02
Plan updated
Deploying templates in the directory /tmp/tripleoclient-eOSV2S/tripleo-heat-templates
Started Mistral Workflow. Execution ID: baedf51b-f2a5-4d0a-87af-09ceaba5c943
{u'execution': {u'id': u'baedf51b-f2a5-4d0a-87af-09ceaba5c943', u'input': {u'container': u'overcloud', u'queue_name': u'7426b861-2139-48a6-947b-0c5f2af722a7', u'timeout': 90}, u'name': u'tripleo.deployment.v1.deploy_plan', u'params': {}, u'spec': {u'input': [u'container', {u'timeout': 240}, {u'queue_name': u'tripleo'}], u'name': u'deploy_plan', u'tasks': {u'copy_validation_ssh_keys': {u'name': u'copy_validation_ssh_keys', u'on-complete': u'send_message', u'type': u'direct', u'version': u'2.0', u'workflow': u'tripleo.validations.v1.copy_ssh_key'}, u'deploy': {u'action': u'tripleo.deployment.deploy timeout=<% $.timeout %> container=<% $.container %>', u'name': u'deploy', u'on-error': u'set_deployment_failed', u'on-success': u'test_validations_enabled', u'type': u'direct', u'version': u'2.0'}, u'send_message': {u'action': u'zaqar.queue_post', u'input': {u'messages': {u'body': {u'payload': {u'execution': u'<% execution() %>', u'message': u"<% $.get('message', '') %>", u'status': u"<% $.get('status', 'SUCCESS') %>"}, u'type': u'tripleo.deployment.v1.deploy_plan'}}, u'queue_name': u'<% $.queue_name %>'}, u'name': u'send_message', u'retry': u'count=5 delay=1', u'type': u'direct', u'version': u'2.0'}, u'set_deployment_failed': {u'name': u'set_deployment_failed', u'on-success': u'send_message', u'publish': {u'message': u'<% task(deploy).result %>', u'status': u'FAILED'}, u'type': u'direct', u'version': u'2.0'}, u'test_validations_enabled': {u'action': u'tripleo.validations.enabled', u'name': u'test_validations_enabled', u'on-error': u'send_message', u'on-success': u'copy_validation_ssh_keys', u'type': u'direct', u'version': u'2.0'}}, u'version': u'2.0'}}, u'message': u"Failed to run action [action_ex_id=4de81b4b-b417-4075-9f80-b92a392cef2f, action_cls='<class 'mistral.actions.action_factory.DeployStackAction'>', attributes='{}', params='{u'container': u'overcloud', u'timeout': 90}']\n ERROR: Failed to validate: : resources.Networks: : Failed to validate: resources.ManagementNetwork: The Parameter (ManagementInterfaceDefaultRoute) was not provided.", u'status': u'FAILED'}

Version-Release number of selected component (if applicable):
python-tripleoclient-5.3.0-4.el7ost.noarch
openstack-tripleo-heat-templates-5.0.0-1.2.el7ost.noarch

How reproducible:
every time

Steps to Reproduce:
1. set ControlPlaneDefaultRoute
2. don't set ManagementInterfaceDefaultRoute
3.

Actual results:
as above

Expected results:
deployment.

Additional info:
There is a comment in the file stating "# Use either this parameter or ControlPlaneDefaultRoute in the NIC templates" above the ManagementInterfaceDefaultRoute setting.
ManagementInterfaceDefaultRoute needs to be set if you're using the management network, which you must be including explicitly somehow, since it's disabled by default.
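For context, a rough sketch of how the management network typically ends up included explicitly: a custom environment file maps the resource type to the stock template. The registry key and the template file name both appear later in this bug. The absolute path assumes the default install location, 10.0.1.1 is a placeholder gateway, and the per-role port mappings that a real deployment also needs are left out.

```yaml
# Hypothetical custom environment file (illustration only, not taken from
# the reporter's setup).
resource_registry:
  # The stock network-isolation.yaml ships this mapping commented out.
  OS::TripleO::Network::Management: /usr/share/openstack-tripleo-heat-templates/network/management.yaml

parameter_defaults:
  # Once the mapping above is active, this parameter is validated.
  # 10.0.1.1 is a placeholder value.
  ManagementInterfaceDefaultRoute: 10.0.1.1
```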
Yeah I'm using the management network so I assume it needs setting. Perhaps, then, it's just the comment that is confusing. Above the option it says this:

# Use either this parameter or ControlPlaneDefaultRoute in the NIC templates
ManagementInterfaceDefaultRoute: 10.126.185.126

So I was assuming it was this OR that.
(In reply to August Simonelli from comment #2)
> Yeah I'm using the management network so I assume it needs setting. Perhaps,
> then, it's just the comment that is confusing. Above the option it says this:
>
> # Use either this parameter or ControlPlaneDefaultRoute in the NIC templates
> ManagementInterfaceDefaultRoute: 10.126.185.126
>
> So I was assuming it was this OR that.

There can only be one active default route on a system at any one time, by definition. If you have more than one configured, the results are unpredictable (the default route chosen may change between reboots or in response to network events).

However, we do support setting specific routes on interfaces. So, for instance, if you have a remote backup network and a network that contains remote monitoring systems, you can place a route to those networks on the Management interface pointing to the router gateway on that subnet. This will make the Management IPs reachable from those networks (but the Control Plane IPs will not be reachable from those specific networks).

Another option is to relax the rp_filter sysctl settings. This will allow the system to respond to packets from a different interface than they were received on, but may result in increasing the security attack surface without proper firewalling in place.
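As an illustration of the "specific routes on interfaces" option, a NIC template entry might look roughly like the sketch below. The interface name nic3, the remote subnet 10.200.0.0/24 and the next hop 10.0.1.1 are made-up placeholders, and the use of the ManagementIpSubnet parameter is an assumption about the surrounding NIC template.

```yaml
# Hypothetical fragment for nic-configs/controller.yaml (illustration only).
# nic3, 10.200.0.0/24 and 10.0.1.1 are placeholder values.
- type: interface
  name: nic3
  use_dhcp: false
  addresses:
    - ip_netmask: {get_param: ManagementIpSubnet}
  routes:
    # Specific route only: traffic to the remote monitoring subnet leaves
    # through the router on the management subnet.
    - ip_netmask: 10.200.0.0/24
      next_hop: 10.0.1.1
    # There is deliberately no "default: true" entry here; the single
    # default route stays on the Control Plane interface.
```

With an entry like this, only the listed subnet is reached through the Management interface, so the system still has exactly one default route.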
Hi, I hit the same symptom in my lab with OSP10. In my case, I set a default value for ManagementInterfaceDefaultRoute in nic-configs/{controller,compute}.yaml and ${THT_TOP}/network/management.yaml, and then the deployment finished successfully. I believe there should be some situations where the management network is necessary and the default gateway is on the control plane.
(In reply to Manabu Ori from comment #5)
> Hi,
>
> I hit the same symptom in my lab with OSP10.
> In my case, I set default value to ManagementInterfaceDefaultRoute in
> nic-configs/{controller,compute}.yaml and
> ${THT_TOP}/network/management.yaml, then the deployment successfully
> finished.
>
> I believe there should be some situations where managment network is
> necessary and default gateway is in control plane.

Sorry, that won't actually be possible until we have source routing implemented:

https://blueprints.launchpad.net/tripleo/+spec/tripleo-os-net-config-source-routing

However, I'm curious, why do you want the default gateway on the Control Plane network instead of the Management network? Is there a reason you want your outbound traffic to use the Control Plane instead of Management? I would think that if I were using the Management network, I would probably make that the default gateway. There is probably a use case I'm not seeing clearly.
One scenario I can think of is that the undercloud acts as a masquerading gateway for overcloud nodes, mainly in a test environment.
Yes I think this is an assumption that I'm coming up against too. E.g. I want to run config management (Ansible) and monitoring over the cloud and don't want to connect the undercloud to the storage or api network. But I don't necessarily need a route out. It would be good if this could be changed from a requirement to optional or use ctlplane.
https://bugzilla.redhat.com/show_bug.cgi?id=1393641

Hello,

There clearly is a misunderstanding in this bug report here, and it's astonishing that this has been open for nearly a year now without anybody objecting or pointing out the issue.

Indeed, it should be possible to have:
* a control plane interface
* a management interface
* a default route out of the control plane interface
* no route at all out of the management interface

A use case could be a jump server on the management interface, or some other device sitting on the same subnet as the management interface.

The above use case makes total sense and is totally legit.

Now to the analysis of the actual issue. It turns out that the templates *were* created with this in mind, and that this should work, but due to a problem about how "null" seems to be handled in heat, this simply does not work.

~~~
Removing the current plan files
Uploading new plan files
Started Mistral Workflow. Execution ID: ecc2df38-f10a-440e-982a-469f00f53abb
Plan updated
Deploying templates in the directory /tmp/tripleoclient-NdrmZc/tripleo-heat-templates
Started Mistral Workflow. Execution ID: b2ec5d91-840b-49bf-b0ec-782ffd5b03c3
{u'execution': {u'id': u'b2ec5d91-840b-49bf-b0ec-782ffd5b03c3',
u'input': {u'container': u'overcloud',
u'queue_name': u'4f031ca3-b69b-4920-ad5e-3fee3e65add8',
u'skip_deploy_identifier': False,
u'timeout': 240},
u'name': u'tripleo.deployment.v1.deploy_plan',
u'params': {},
u'spec': {u'input': [u'container',
{u'timeout': 240},
{u'skip_deploy_identifier': False},
{u'queue_name': u'tripleo'}],
u'name': u'deploy_plan',
u'tasks': {u'add_validation_ssh_key': {u'name': u'add_validation_ssh_key',
u'on-complete': u'create_swift_rings_backup_plan',
u'type': u'direct',
u'version': u'2.0',
u'workflow': u'tripleo.validations.v1.add_validation_ssh_key_parameter container=<% $.container %>'},
u'create_swift_rings_backup_plan': {u'input': {u'container': u'<% $.container %>',
u'queue_name': u'<% $.queue_name %>',
u'use_default_templates': True},
u'name': u'create_swift_rings_backup_plan',
u'on-error': u'create_swift_rings_backup_plan_set_status_failed',
u'on-success': u'deploy',
u'type': u'direct',
u'version': u'2.0',
u'workflow': u'tripleo.swift_rings_backup.v1.create_swift_rings_backup_container_plan'},
u'create_swift_rings_backup_plan_set_status_failed': {u'name': u'create_swift_rings_backup_plan_set_status_failed',
u'on-success': u'send_message',
u'publish': {u'message': u'<% task(create_swift_rings_backup_plan).result %>',
u'status': u'FAILED'},
u'type': u'direct',
u'version': u'2.0'},
u'deploy': {u'action': u'tripleo.deployment.deploy timeout=<% $.timeout %> container=<% $.container %>',
u'input': {u'container': u'<% $.container %>',
u'skip_deploy_identifier': u'<% $.skip_deploy_identifier %>',
u'timeout': u'<% $.timeout %>'},
u'name': u'deploy',
u'on-error': u'set_deployment_failed',
u'on-success': u'send_message',
u'type': u'direct',
u'version': u'2.0'},
u'send_message': {u'action': u'zaqar.queue_post',
u'input': {u'messages': {u'body': {u'payload': {u'execution': u'<% execution() %>',
u'message': u"<% $.get('message', '') %>",
u'status': u"<% $.get('status', 'SUCCESS') %>"},
u'type': u'tripleo.deployment.v1.deploy_plan'}},
u'queue_name': u'<% $.queue_name %>'},
u'name': u'send_message',
u'retry': u'count=5 delay=1',
u'type': u'direct',
u'version': u'2.0'},
u'set_deployment_failed': {u'name': u'set_deployment_failed',
u'on-success': u'send_message',
u'publish': {u'message': u'<% task(deploy).result %>',
u'status': u'FAILED'},
u'type': u'direct',
u'version': u'2.0'}},
u'version': u'2.0'}},
u'message': u"Failed to run action [action_ex_id=c2766a23-5020-4d34-b2e4-b4987267c481, action_cls='<class 'mistral.actions.action_factory.DeployStackAction'>', attributes='{}', params='{u'skip_deploy_identifier': False, u'container': u'overcloud', u'timeout': 240}']\n ERROR: Failed to validate: : resources.Networks: : Failed to validate: resources.ManagementNetwork: The Parameter (ManagementInterfaceDefaultRoute) was not provided.",
u'status': u'FAILED'}
~~~

So let's drill down in the templates:

/usr/share/openstack-tripleo-heat-templates/overcloud.j2.yaml
~~~
  # creates the network architecture
  Networks:
    type: OS::TripleO::Network
~~~

/usr/share/openstack-tripleo-heat-templates/network/networks.yaml
~~~
  ManagementNetwork:
    type: OS::TripleO::Network::Management
~~~

ManagementNetwork is what fails. Let's look at it:

/usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml:
#OS::TripleO::Network::Management: ../network/management.yaml

The management network fails because there is no decent dummy default route - we are setting `null` (note, not the "null" string, but null)

/usr/share/openstack-tripleo-heat-templates/network/management.yaml
~~~
  ManagementInterfaceDefaultRoute:
    default: null
    description: The default route of the management network.
    type: string

resources:
  ManagementNetwork:
    type: OS::Neutron::Net
    properties:
      admin_state_up: {get_param: ManagementNetAdminStateUp}
      name: {get_param: ManagementNetName}
      shared: {get_param: ManagementNetShared}
      value_specs: {get_param: ManagementNetValueSpecs}

  ManagementSubnet:
    type: OS::Neutron::Subnet
    properties:
      cidr: {get_param: ManagementNetCidr}
      enable_dhcp: {get_param: ManagementNetEnableDHCP}
      name: {get_param: ManagementSubnetName}
      network: {get_resource: ManagementNetwork}
      allocation_pools: {get_param: ManagementAllocationPools}
      gateway_ip: {get_param: ManagementInterfaceDefaultRoute}

outputs:
  OS::stack_id:
    description: Neutron management network
    value: {get_resource: ManagementNetwork}
~~~

The External network does *not* need a default route because a correct dummy default value is set. Hence users can define External Networks without default routes. They simply change the compute.yaml and controller.yaml and can have an ExternalNetwork which only works directly connected.

/usr/share/openstack-tripleo-heat-templates/network/external.yaml
~~~
  ExternalInterfaceDefaultRoute:
    default: '10.0.0.1'
    description: default route for the external network
    type: string

resources:
  ExternalNetwork:
    type: OS::Neutron::Net
    properties:
      admin_state_up: {get_param: ExternalNetAdminStateUp}
      name: {get_param: ExternalNetName}
      shared: {get_param: ExternalNetShared}
      value_specs: {get_param: ExternalNetValueSpecs}

  ExternalSubnet:
    type: OS::Neutron::Subnet
    properties:
      cidr: {get_param: ExternalNetCidr}
      enable_dhcp: {get_param: ExternalNetEnableDHCP}
      name: {get_param: ExternalSubnetName}
      network: {get_resource: ExternalNetwork}
      allocation_pools: {get_param: ExternalAllocationPools}
      gateway_ip: {get_param: ExternalInterfaceDefaultRoute}

outputs:
  OS::stack_id:
    description: Neutron external network
    value: {get_resource: ExternalNetwork}
~~~

However, the default in /usr/share/openstack-tripleo-heat-templates/network/internal_api.yaml is null:
~~~
resources:
  InternalApiNetwork:
    type: OS::Neutron::Net
    properties:
      admin_state_up: {get_param: InternalApiNetAdminStateUp}
      name: {get_param: InternalApiNetName}
      shared: {get_param: InternalApiNetShared}
      value_specs: {get_param: InternalApiNetValueSpecs}

  InternalApiSubnet:
    type: OS::Neutron::Subnet
    properties:
      cidr: {get_param: InternalApiNetCidr}
      enable_dhcp: {get_param: InternalApiNetEnableDHCP}
      name: {get_param: InternalApiSubnetName}
      network: {get_resource: InternalApiNetwork}
      allocation_pools: {get_param: InternalApiAllocationPools}
      gateway_ip: null

outputs:
  OS::stack_id:
    description: Neutron internal network
    value: {get_resource: InternalApiNetwork}
~~~

When we have a look at internal_api, we can see that `gateway_ip: null`. We *do* want to achieve this with:
~~~
  ManagementInterfaceDefaultRoute:
    default: null
    description: The default route of the management network.
    type: string
~~~

The problem is that "null" as a default is being interpreted by heat as a "you must set this default parameter, or I am assuming that this parameter was not provided", and the null is not being passed down to gateway_ip --- this is clearly what the creator of the templates intended, however this does not work. Instead, ManagementNetwork complains about a missing parameter:
~~~
  ManagementNetwork:
    type: OS::TripleO::Network::Management
~~~

~~~
u'message': u"Failed to run action [action_ex_id=c2766a23-5020-4d34-b2e4-b4987267c481, action_cls='<class 'mistral.actions.action_factory.DeployStackAction'>', attributes='{}', params='{u'skip_deploy_identifier': False, u'container': u'overcloud', u'timeout': 240}']\n ERROR: Failed to validate: : resources.Networks: : Failed to validate: resources.ManagementNetwork: The Parameter (ManagementInterfaceDefaultRoute) was not provided.",
~~~

This may be related to the discussion here: https://openstack.nimeyo.com/102603/openstack-dev-heat-on-allowing-null-as-a-parameter-default

We need to set some dummy default value in the templates.
Given that we do it for network and subnet mask, and we also do it for ExternalNetwork, I don't see why we wouldn't do it for the ManagementNetwork as well. I can't check this in a lab right now, but this seems like a bug in the templates. - Andresa
It is absolutely possible to have the default route on the control plane, and no route on the management network. In that case, the management network will only be reachable from other hosts on the management network, but not remotely. As long as that caveat is acceptable, there is no issue here. I think we can close this as NOTABUG.
Hi Dan,

Did you read https://bugzilla.redhat.com/show_bug.cgi?id=1393641#c9
~~~
Now to the analysis of the actual issue. It turns out that the templates *were* created with this in mind, and that this should work, but due to a problem about how "null" seems to be handled in heat, this simply does not work.
~~~

There is a bug in the templates that prohibits https://bugzilla.redhat.com/show_bug.cgi?id=1393641#c10
~~~
* a control plane interface
* a management interface
* a default route out of the control plane interface
* no route at all out of the management interface
~~~

This is currently not possible, see comment 9 for the error message that it fails with.

Users at the current point are obliged to configure a workaround as in https://bugzilla.redhat.com/show_bug.cgi?id=1393641#c5
~~~
In my case, I set default value to ManagementInterfaceDefaultRoute in nic-configs/{controller,compute}.yaml and ${THT_TOP}/network/management.yaml, then the deployment successfully finished.
~~~

To be more explicit (I am reconstructing from memory here, so if there are typos below, please disregard. I actually tested comment 9 in a lab): in controller.yaml and compute.yaml, the default route is configured via the ControlPlaneInterface, and in network-environment.yaml, ControlPlaneDefaultRoute is configured to the correct value. Note: I think that this has the same behavior with the External default route interface as well.

E.g. nic-configs/controller.yaml
~~~
- type: interface # physical eth0, provisioning network
  name: em4
  use_dhcp: false
  dns_servers: {get_param: DnsServers}
  addresses:
    - ip_netmask:
        list_join:
          - '/'
          - - {get_param: ControlPlaneIp}
            - {get_param: ControlPlaneSubnetCidr}
  routes:
    - ip_netmask: 169.254.169.254/32
      next_hop: {get_param: EC2MetadataIp}
    - default: true
      next_hop: {get_param: ControlPlaneDefaultRoute}
~~~

network-environment.yaml
~~~
parameter_defaults:
  ControlPlaneDefaultRoute: 192.168.1.1
~~~

+ management network configuration. Actually, follow https://access.redhat.com/solutions/3063131 and keep this commented:
~~~
# routes:
#   -
#     default: true
#     next_hop: {get_param: ManagementInterfaceDefaultRoute}
~~~

And do *not* add `ManagementInterfaceDefaultRoute: 10.0.1.1` to parameter defaults. This will get you the outcome from comment 9.

The above will *not* work. The workaround is to *add* `ManagementInterfaceDefaultRoute: 10.0.1.1`, *even though* in controller.yaml and compute.yaml, the routes are still commented and the control plane or external interface default route is used.

This is unexpected behavior, and thus this bug report.
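Put together, the workaround described in this comment would amount to something like the following in network-environment.yaml. This is a sketch using the example addresses from this thread, not a tested configuration.

```yaml
# network-environment.yaml (sketch; addresses are example values from this
# thread, not recommendations)
parameter_defaults:
  # The real default gateway, referenced by the NIC templates.
  ControlPlaneDefaultRoute: 192.168.1.1

  # Present only so that validation of network/management.yaml passes.
  # No NIC template references it while the management "routes:" block
  # stays commented out, so no second default route is configured on the
  # nodes.
  ManagementInterfaceDefaultRoute: 10.0.1.1
```

Per the management.yaml excerpt quoted in comment 9, the value is still passed as gateway_ip to the Neutron management subnet, so it should at least be an address inside the management CIDR.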
The fix will be to use 10.0.1.1 instead of null as an example default value - https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/network/management.yaml#L42
Deployed OSP10 latest.

cat /etc/yum.repos.d/latest-installed
10 -p 2018-08-04.1

Made the change to network-environment.yaml and deployed the overcloud with no issues.

cat /home/stack/virt/network/network-environment.yaml | grep ControlPlaneDefaultRoute
ControlPlaneDefaultRoute: 192.168.24.1

Status:
Stack overcloud CREATE_COMPLETE

Please reopen if this issue still persists.
Hi there, If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text. If this bug does not require doc text, please set the 'requires_doc_text' flag to -. Thanks, Alex
Note: the fix in https://review.openstack.org/#/c/579961/ is to set:

ManagementInterfaceDefaultRoute:
  default: ""

instead of null.
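For reference, a sketch of how the parameter definition in network/management.yaml would read with that change. It is based on the excerpt quoted in comment 9 with only the default line altered; it is not copied from the merged review, and the rest of the file is assumed unchanged.

```yaml
# network/management.yaml, parameters section (sketch based on the excerpt
# quoted earlier in this bug; only the default differs)
parameters:
  ManagementInterfaceDefaultRoute:
    default: ""   # was: null, which heat reported as "not provided"
    description: The default route of the management network.
    type: string
```

With a non-null default, a deployment that enables the management network no longer has to pass ManagementInterfaceDefaultRoute just to get through validation.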
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2670