Bug 1272591

Summary: director node should use admin api to deploy service catalogs
Product: Red Hat OpenStack Reporter: Dan Yocum <dyocum>
Component: rhosp-director Assignee: Dan Sneddon <dsneddon>
Status: CLOSED NOTABUG QA Contact: Gurenko Alex <agurenko>
Severity: high Docs Contact:
Priority: high    
Version: 7.0 (Kilo) CC: achernet, athomas, augol, bnemec, dmacpher, dsneddon, felipe.alfaro, ggillies, jcoufal, jslagle, mburns, mhalas, ohochman, rhel-osp-director-maint, sclewis
Target Milestone: --- Keywords: TestOnly, Triaged
Target Release: 8.0 (Liberty)   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: Known Issue
Doc Text:
The Undercloud used the Public API to configure service endpoints during the post-deployment stage. This meant the Undercloud needed to reach the Public API in order to complete the deployment. If the External uplink on the Undercloud is not the same subnet as the Public API, the Undercloud requires a route to the Public API and any firewall ACLs must allow this traffic. With this route, the Undercloud connects to the Public API and completes post-deployment tasks.
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-05-18 08:50:25 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Set of configuration files to replicate the issue none

Description Dan Yocum 2015-10-16 19:59:55 UTC
Description of problem:
During the last stage of the overcloud deployment, the director may not have access to the public IP to perform the final steps (i.e., $service-manage db_sync).  The director should use the ctlplane (InternalApiNet) to perform this step.

Case in point: the director node is on a network segment that the firewall blocks from connecting anywhere outside the firewall, for security reasons.


Version-Release number of selected component (if applicable):
7.1

How reproducible:
Always

Steps to Reproduce:
1. Deploy overcloud
2. Block the director from having access to the public facing IP of the overcloud
3.

Actual results:

ERROR: openstack Unable to establish connection to http://209.132.179.140:35357/v2.0/tenants


Expected results:

Deployment completes.

Additional info:

Comment 2 Dan Yocum 2015-10-16 20:02:23 UTC
Scratch the bit about "(InternalApiNetwork)" as dsneddon indicates that the director may not have access to that network.

Also, this summary:

"Post-Deployment Configuring Using Public Keystone Endpoint Instead of Admin Endpoint"

Comment 3 Dan Sneddon 2015-10-16 20:06:19 UTC
(In reply to Dan Yocum from comment #2)
> Scratch the bit about "(InternalApiNetwork)" as dsneddon indicates...

Right, here is the output from a fresh deployment "keystone catalog":

Service: identity
+-------------+----------------------------------+
|   Property  |              Value               |
+-------------+----------------------------------+
|   adminURL  |   http://192.0.2.14:35357/v2.0   |
|      id     | 22eb7c2635db4d7fb88c60521ce519b0 |
| internalURL |   http://192.0.2.14:5000/v2.0    |
|  publicURL  |    http://10.0.0.4:5000/v2.0     |
|    region   |            regionOne             |
+-------------+----------------------------------+

You can see that both the Internal and Admin Keystone endpoints are on the ctlplane network. Only the Public URL is on the External network. When I did this deployment, the Undercloud needed to connect to the Public API in order to complete the post-deployment steps to enable the creation of the admin user and security groups.

Since we now have the Internal and Admin endpoints on the ctlplane, this route shouldn't have been necessary.

Comment 4 Dan Sneddon 2015-10-20 19:10:53 UTC
(In reply to Dan Sneddon from comment #3)

To clarify, what needs to change is that the Undercloud needs to use the Admin API to set up the service catalog instead of the Public API. I'm not sure where the code is that sets up the service catalog, but I can't influence that call with any networking changes.

Comment 5 Dan Yocum 2015-10-22 19:34:28 UTC
I think I've found the problem.

In /usr/bin/instack-deploy-overcloud, I see this:

# work for creating the overcloudrc file.
unset TRIPLEO_ROOT
instack-create-overcloudrc
source ~/overcloudrc

init-keystone -o $OVERCLOUD_IP -t $OVERCLOUD_ADMIN_TOKEN \
-e admin.example.com -p $OVERCLOUD_ADMIN_PASSWORD -u heat-admin \
${SSLBASE:+-s $PUBLIC_API_URL}


That should NOT be PUBLIC_API_URL.  It should be something else.  INTERNAL_API_URL?  

Adding slagle as he's the author of this script.

Comment 6 Dan Sneddon 2015-10-23 00:10:44 UTC
(In reply to Dan Yocum from comment #5)
> I think I've found the problem.
> 
> In /usr/bin/instack-deploy-overcloud, I see this:
> 
> # work for creating the overcloudrc file.
> unset TRIPLEO_ROOT
> instack-create-overcloudrc
> source ~/overcloudrc
> 
> init-keystone -o $OVERCLOUD_IP -t $OVERCLOUD_ADMIN_TOKEN \
> -e admin.example.com -p $OVERCLOUD_ADMIN_PASSWORD -u heat-admin \
> ${SSLBASE:+-s $PUBLIC_API_URL}
> 
> 
> That should NOT be PUBLIC_API_URL.  It should be something else. 
> INTERNAL_API_URL?  
> 
> Adding slagle as he's the author of this script.

Thanks for finding the part of the code that sets this. I was looking for something in the Heat templates and was coming up dry; this explains how the Public API URL gets used.

I had a discussion about this with slagle. I'll try to summarize, he can correct me if I'm wrong:

It turns out that this is by design. Here were the design considerations:

* Many operators will not want to use the provisioning network for critical OpenStack services (because it's a single link that can't be bonded).
* Given that, it makes sense to put the Internal OpenStack APIs on the Internal API network.
* The Undercloud is not attached to the Internal API network.
* The Undercloud does usually have an upstream connection through its external uplink, so generally the Public API is accessible to the Undercloud.

Some possible solutions, off the top of my head:

1) Make the endpoint that instack-deploy-overcloud uses configurable. Then you could add an interface to the Internal API network and contact the Keystone Internal API even if it's on the Internal API network.

2) Modify HAProxy to host Keystone services for the Overcloud on the provisioning network in addition to whichever network is selected in the ServiceNetMap. Then have instack-deploy-overcloud use the control plane vip, even if it's not an advertised endpoint.

3) Modify the required architecture to require a connection from the Undercloud to the Internal API.

Solution 1 seems acceptable to me.

Solution 2 has the downside of increasing the attack surface and decreasing security, which I think is probably unacceptable, unless we turn this off by default.

Solution 3 has the downside of increased complexity for all users, even if they don't have an issue reaching the public VIP.
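
As a rough illustration of what Solution 1 could look like, here is a minimal sketch of endpoint selection with an optional override. The function name and its `override` and `interface` parameters are hypothetical and are not existing instack or tripleoclient options; the URLs are the ones from the "keystone catalog" output in comment 3.

```python
def select_keystone_endpoint(endpoints, override=None, interface="public"):
    """Pick the URL used for post-deployment Keystone setup.

    `override` and `interface` are hypothetical knobs for illustration only.
    """
    # An explicit override wins, e.g. a VIP reachable from the Undercloud.
    if override:
        return override
    key = interface + "URL"
    if key not in endpoints:
        raise KeyError("no %s endpoint in catalog" % interface)
    return endpoints[key]


# Endpoints as reported in comment 3.
endpoints = {
    "adminURL": "http://192.0.2.14:35357/v2.0",
    "internalURL": "http://192.0.2.14:5000/v2.0",
    "publicURL": "http://10.0.0.4:5000/v2.0",
}

# Current behaviour: the public endpoint is used.
print(select_keystone_endpoint(endpoints))
# With the knob set to "admin", the ctlplane address is used instead.
print(select_keystone_endpoint(endpoints, interface="admin"))
```

Defaulting to the public endpoint keeps today's behaviour, so only operators whose Undercloud cannot reach the Public API would need to change anything.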

One possible workaround is to modify the reverse path filter settings on the controllers and add a route to the Public VIP via the provisioning net. This will cause the Undercloud to send the packet with a destination IP address of the Public VIP, but to the active controller via the provisioning net. Ordinarily, the controller wouldn't respond to this, but by setting the rp_filter to 2 ('loose'), the controller will accept the packet and respond to the Undercloud via the provisioning net (note, this will only work if the same controller is holding the active control plane vip and the public vip).

So, after the controllers are deployed, but before the deployment times out:

Log in to each controller.

Add the following to /etc/sysctl.conf:
net.ipv4.conf.default.rp_filter = 2
net.ipv4.conf.all.rp_filter = 2

Then run "sysctl -p" to apply the change immediately; the entries in /etc/sysctl.conf make it persist across reboots.

On the Undercloud, add a route to the Public VIP via the Control Plane.

I'm going to give this a try and see if it works for me.
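
For anyone checking the rp_filter part of this workaround, the following is a small sketch that parses sysctl.conf-style text and computes the effective mode. It relies on the kernel's documented behaviour that the maximum of conf/all and conf/<interface> is used for source validation on that interface. The helper names are made up for illustration.

```python
def parse_rp_filter(sysctl_text):
    """Collect net.ipv4.conf.<name>.rp_filter values from sysctl.conf-style text."""
    prefix, suffix = "net.ipv4.conf.", ".rp_filter"
    values = {}
    for line in sysctl_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments ('#' or ';').
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key.startswith(prefix) and key.endswith(suffix):
            name = key[len(prefix):-len(suffix)]
            values[name] = int(value.strip())
    return values


def effective_rp_filter(all_value, iface_value):
    """The kernel uses max(conf/all, conf/<iface>) when validating sources."""
    return max(all_value, iface_value)


conf = """
net.ipv4.conf.default.rp_filter = 2
net.ipv4.conf.all.rp_filter = 2
"""
values = parse_rp_filter(conf)
# An interface still set to 1 (strict) ends up loose once 'all' is 2.
print(effective_rp_filter(values["all"], 1))
```

Because of the max rule, setting "all" to 2 is what switches already-existing interfaces to loose mode; "default" only affects interfaces created afterwards.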

Comment 7 Graeme Gillies 2015-10-23 02:18:29 UTC
Just to be clear, long term the best solution would be to have the Keystone setup done on one of the controller nodes, correct? That way we don't have to worry about any network connectivity issues, and if we want to customise it (say, SSL etc.) it will be in the tripleo-heat-templates, where we can access the appropriate variables.

Regards,

Graeme

Comment 10 Mike Burns 2015-11-06 16:05:55 UTC
removing blocker per comment 9

Comment 12 Angus Thomas 2016-02-05 14:18:48 UTC
Moving to MODIFIED as this has doctext added for 7.3

Comment 16 Marius Cornea 2016-03-02 10:06:52 UTC
There hasn't been any patch posted, so I'm not sure if this requires any testing. Are we planning to make any changes in OSP8 to remove the network connectivity constraints as per comment #7?

Comment 20 Marius Cornea 2016-04-19 11:06:27 UTC
There hasn't been any patch posted so I don't think this BZ requires testing. 

The initial report is valid imo so I think we should keep the BZ open. Are we planning to do any changes in the future to remove the network connectivity constraints as per comment#7?

Comment 21 Miro Halas 2016-07-11 19:20:01 UTC
I can confirm that this is still an issue with RHOSP8 and a very simple setup (see the attached set of configs).

Basically, this is a VLAN-based OVS setup on a single NIC:

  ExternalNetCidr: 172.21.1.0/24
  InternalApiNetCidr: 172.22.1.0/24

  KeystoneAdminApiNetwork: ctlplane
  KeystonePublicApiNetwork: internal_api

This is how Keystone gets deployed


[heat-admin@overcloud-controller-0 ~]$ openstack catalog list
+----------+----------+---------------------------------------------+
| Name     | Type     | Endpoints                                   |
+----------+----------+---------------------------------------------+
| keystone | identity | regionOne                                   |
|          |          |   publicURL: http://172.21.1.10:5000/v2.0   |
|          |          |   internalURL: http://172.22.1.11:5000/v2.0 |
|          |          |   adminURL: http://192.0.2.12:35357/v2.0    |
|          |          |                                             |
+----------+----------+---------------------------------------------+

This is interesting in itself, since the Keystone Public API is on the External network even though the network environment template specified the internal_api network (maybe the variables are confusingly named).
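
To make the mapping explicit, here is a minimal sketch that assigns each catalog endpoint to a network by CIDR. The External and Internal API CIDRs are the ones from the config above; the ctlplane range 192.0.2.0/24 is an assumption based on the 192.0.2.x addresses in the catalog.

```python
import ipaddress
from urllib.parse import urlparse

# ExternalNetCidr and InternalApiNetCidr are from the config above.
# The ctlplane CIDR is an assumption inferred from the adminURL address.
NETWORKS = {
    "external": ipaddress.ip_network("172.21.1.0/24"),
    "internal_api": ipaddress.ip_network("172.22.1.0/24"),
    "ctlplane": ipaddress.ip_network("192.0.2.0/24"),
}


def network_of(url):
    """Return the name of the network hosting the endpoint, or None."""
    host = ipaddress.ip_address(urlparse(url).hostname)
    for name, cidr in NETWORKS.items():
        if host in cidr:
            return name
    return None


catalog = {
    "publicURL": "http://172.21.1.10:5000/v2.0",
    "internalURL": "http://172.22.1.11:5000/v2.0",
    "adminURL": "http://192.0.2.12:35357/v2.0",
}

for kind, url in catalog.items():
    print(kind, "->", network_of(url))
```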

That said, with this setup the deployment fails with the following error

Creating service for identity.
REQ: curl -g -i -X POST http://192.0.2.12:35357/v2.0/OS-KSADM/services -H "User-Agent: python-keystoneclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: {SHA1}d21ecfb3a342d5e18bd5ce4e52dfebd6a4a4fa8f" -d '{"OS-KSADM:service": {"type": "identity", "name": "keystone", "description": "Keystone Identity Service"}}'
RESP: [200] date: Fri, 08 Jul 2016 22:07:50 GMT vary: X-Auth-Token content-length: 165 content-type: application/json x-openstack-request-id: req-8f9adc06-1b2c-48eb-8988-21c23bd78de0 
RESP BODY: {"OS-KSADM:service": {"id": "7d03a988d26c4fa7b0b349a10aeffbd7", "enabled": true, "type": "identity", "name": "keystone", "description": "Keystone Identity Service"}}

REQ: curl -g -i -X GET http://192.0.2.12:35357/v2.0/endpoints -H "User-Agent: python-keystoneclient" -H "Accept: application/json" -H "X-Auth-Token: {SHA1}d21ecfb3a342d5e18bd5ce4e52dfebd6a4a4fa8f"
RESP: [200] date: Fri, 08 Jul 2016 22:07:50 GMT vary: X-Auth-Token content-length: 17 content-type: application/json x-openstack-request-id: req-e7ba86fb-7cd5-483e-ac6b-e13edaa651ed 
RESP BODY: {"endpoints": []}

Creating endpoint for service 7d03a988d26c4fa7b0b349a10aeffbd7.
REQ: curl -g -i -X POST http://192.0.2.12:35357/v2.0/endpoints -H "User-Agent: python-keystoneclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: {SHA1}d21ecfb3a342d5e18bd5ce4e52dfebd6a4a4fa8f" -d '{"endpoint": {"adminurl": "http://192.0.2.12:35357/v2.0", "service_id": "7d03a988d26c4fa7b0b349a10aeffbd7", "region": "regionOne", "internalurl": "http://172.22.1.11:5000/v2.0", "publicurl": "http://172.21.1.10:5000/v2.0"}}'
RESP: [200] date: Fri, 08 Jul 2016 22:07:50 GMT vary: X-Auth-Token content-length: 265 content-type: application/json x-openstack-request-id: req-3cab08a8-4420-4cc7-996a-697457ba8d57 
RESP BODY: {"endpoint": {"adminurl": "http://192.0.2.12:35357/v2.0", "region": "regionOne", "internalurl": "http://172.22.1.11:5000/v2.0", "service_id": "7d03a988d26c4fa7b0b349a10aeffbd7", "id": "886fc1c910914896bb1c26fe7a935b48", "publicurl": "http://172.21.1.10:5000/v2.0"}}

Warning: Permanently added '192.0.2.12' (ECDSA) to the list of known hosts.
No handlers could be found for logger "oslo_config.cfg"
2016-07-08 22:07:51.233 28995 WARNING keystone.cmd.cli [-] keystone-manage pki_setup is not recommended for production use.
The following cert files already exist, use --rebuild to remove the existing files before regenerating:
/etc/keystone/ssl/certs/ca.pem already exists
/etc/keystone/ssl/private/signing_key.pem already exists
/etc/keystone/ssl/certs/signing_cert.pem already exists
Connection to 192.0.2.12 closed.
Creating keystone client.
Making authentication request to http://172.21.1.10:5000/v2.0/tokens
Authorization Failed: Unable to establish connection to http://172.21.1.10:5000/v2.0/tokens
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/cliff/app.py", line 374, in run_subcommand
    result = cmd.run(parsed_args)
  File "/usr/lib/python2.7/site-packages/cliff/command.py", line 54, in run
    self.take_action(parsed_args)
  File "/usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_deploy.py", line 961, in take_action
    self._deploy_postconfig(stack, parsed_args)
  File "/usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_deploy.py", line 559, in _deploy_postconfig
    parsed_args, stack)
  File "/usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_deploy.py", line 464, in _keystone_init
    overcloud_endpoint)
  File "/usr/lib/python2.7/site-packages/os_cloud_config/utils/clients.py", line 67, in get_keystone_client
    return ksclient.Client(**kwargs)
  File "/usr/lib/python2.7/site-packages/keystoneclient/v2_0/client.py", line 166, in __init__
    self.authenticate()
  File "/usr/lib/python2.7/site-packages/keystoneclient/utils.py", line 337, in inner
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/keystoneclient/httpclient.py", line 589, in authenticate
    resp = self.get_raw_token_from_identity_service(**kwargs)
  File "/usr/lib/python2.7/site-packages/keystoneclient/v2_0/client.py", line 210, in get_raw_token_from_identity_service
    _("Authorization Failed: %s") % e)
AuthorizationFailure: Authorization Failed: Unable to establish connection to http://172.21.1.10:5000/v2.0/tokens
clean_up DeployOvercloud: Authorization Failed: Unable to establish connection to http://172.21.1.10:5000/v2.0/tokens
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/openstackclient/shell.py", line 112, in run
    ret_val = super(OpenStackShell, self).run(argv)
  File "/usr/lib/python2.7/site-packages/cliff/app.py", line 255, in run
    result = self.run_subcommand(remainder)
  File "/usr/lib/python2.7/site-packages/cliff/app.py", line 374, in run_subcommand
    result = cmd.run(parsed_args)
  File "/usr/lib/python2.7/site-packages/cliff/command.py", line 54, in run
    self.take_action(parsed_args)
  File "/usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_deploy.py", line 961, in take_action
    self._deploy_postconfig(stack, parsed_args)
  File "/usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_deploy.py", line 559, in _deploy_postconfig
    parsed_args, stack)
  File "/usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_deploy.py", line 464, in _keystone_init
    overcloud_endpoint)
  File "/usr/lib/python2.7/site-packages/os_cloud_config/utils/clients.py", line 67, in get_keystone_client
    return ksclient.Client(**kwargs)
  File "/usr/lib/python2.7/site-packages/keystoneclient/v2_0/client.py", line 166, in __init__
    self.authenticate()
  File "/usr/lib/python2.7/site-packages/keystoneclient/utils.py", line 337, in inner
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/keystoneclient/httpclient.py", line 589, in authenticate
    resp = self.get_raw_token_from_identity_service(**kwargs)
  File "/usr/lib/python2.7/site-packages/keystoneclient/v2_0/client.py", line 210, in get_raw_token_from_identity_service
    _("Authorization Failed: %s") % e)
AuthorizationFailure: Authorization Failed: Unable to establish connection to http://172.21.1.10:5000/v2.0/tokens


and the deployment fails: the overcloud stack is created correctly, but no endpoints other than identity are created.
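
A quick way to see this kind of failure coming is to probe the endpoints from the undercloud before deploying. The following is a minimal sketch using only the Python standard library; the function name is made up for illustration, and it checks TCP connectivity only, not whether Keystone answers correctly.

```python
import socket
from urllib.parse import urlparse


def endpoint_reachable(url, timeout=3.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    # Fall back to the scheme's default port when the URL has none.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and missing routes.
        return False


# Example use on the undercloud, with the URLs from the catalog above:
#   endpoint_reachable("http://172.21.1.10:5000/v2.0")   # public endpoint
#   endpoint_reachable("http://192.0.2.12:35357/v2.0")   # admin endpoint
```

If the admin endpoint is reachable but the public one is not, the undercloud is missing a route (or a firewall rule) to the External network.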

The diagram in the following link
https://access.redhat.com/documentation/en/red-hat-openstack-platform/version-8/director-installation-and-usage/#sect-Planning_Networks
indicates that there should be routing between the director's external interface and the external APIs, but as far as I know this requirement is not discussed or verified anywhere else.

Comment 22 Miro Halas 2016-07-11 19:21:59 UTC
Created attachment 1178509 [details]
Set of configuration files to replicate the issue

Comment 23 Miro Halas 2016-07-11 21:29:24 UTC
The workaround is to add the following route to the undercloud node

ip route add 172.21.1.0/24 via 192.0.2.1

[root@pamlico network-scripts]# cat /etc/sysconfig/network-scripts/route-br-ctlplane
172.21.1.0/24 via 192.0.2.1 dev br-ctlplane

where 172.21.1.0/24 is the external network and 192.0.2.0/24 is the control plane network.
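
For other environments, the route entry can be derived from the Public VIP rather than typed by hand. This is only a sketch: the helper name is hypothetical, and the prefix length, gateway and device must match your own networks.

```python
import ipaddress


def static_route(public_vip, prefix_len, gateway, device):
    """Build a route-<device> line for the network that contains public_vip."""
    # strict=False lets us pass a host address and get its network back.
    network = ipaddress.ip_network("%s/%d" % (public_vip, prefix_len), strict=False)
    ipaddress.ip_address(gateway)  # raises ValueError on a malformed gateway
    return "%s via %s dev %s" % (network, gateway, device)


# Values from this comment: Public VIP 172.21.1.10, gateway on the ctlplane.
print(static_route("172.21.1.10", 24, "192.0.2.1", "br-ctlplane"))
# 172.21.1.0/24 via 192.0.2.1 dev br-ctlplane
```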

Comment 24 Ben Nemec 2017-03-09 23:17:49 UTC
We no longer ssh to the overcloud nodes to do post-deployment, so I believe this bug should be fixed.

Comment 25 Gurenko Alex 2017-05-18 08:50:25 UTC
After talking with Ben: this was a design decision/limitation at the time and is no longer relevant in the latest versions due to a change in the process.