Created attachment 1043383 [details]
heat-api.log and "grep -i error /var/log/heat/heat-engine.log"

Description of problem:
I did a bare metal install with network isolation enabled. Deployment got most of the way through, but while it was processing the network isolation templates I got an unexpected Heat error. This happened when I ran "openstack overcloud deploy" with --use-tripleo-heat-templates and with --plan-uuid.

Version-Release number of selected component (if applicable):
2015-06-25.2 poodle

How reproducible:
100%

Steps to Reproduce:
1. Deploy overcloud with network isolation.

Actual results:
openstack overcloud deploy --plan-uuid=<UUID> returned:
ERROR: openstack ERROR: Authentication failed. Please try again with option --include-password or export HEAT_INCLUDE_PASSWORD=1

Expected results:
The overcloud should finish deploying.

Additional info:
I attached heat-api.log and every line in heat-engine.log that contained "error".
Quick way to enable network isolation: openstack overcloud deploy -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml --plan-uuid <UUID>
Seems the initial error is "Unknown Property ControlPlaneNetwork" which indicates that the templates are getting parameters they aren't set up for.
(In reply to Ryan Brown from comment #4)
> Seems the initial error is "Unknown Property ControlPlaneNetwork" which
> indicates that the templates are getting parameters they aren't set up for.

No, that is actually a different bug entirely: https://bugzilla.redhat.com/show_bug.cgi?id=1235848

Simply enabling network isolation using the two -e parameters as I outlined above does not result in "Unknown Property ControlPlaneNetwork"; it results in the Heat errors I encountered.

Further testing after I filed this bug revealed that I was hitting this Heat error whether or not I was using network isolation. It also happened whether I was using Heat or Tuskar. The poodle I was testing on yesterday was actually 2015-06-25.8, but I understand that today's poodles are deploying successfully, so I am going to test again today.
reproduce - that was the deployment command : openstack overcloud deploy --plan-uuid db6ec6dc-762a-43ac-a36c-7631baa37996 --control-scale 3 --compute-scale 1 --ceph-storage-scale 1 --block-storage-scale 0 --swift-storage-scale 0 -e network_environment.yaml --debug " console-output : ---------------- VR": "False", "Compute-1::NeutronL3HA": "True", "Cinder-Storage-1::Image": "overcloud-full", "Controller-1::KeystoneCACertificate": "-----BEGIN CERTIFICATE-----\nMIIDNzCCAh+gAwIBAgIBATANBgkqhkiG9w0BAQUFADBTMQswCQYDVQQGEwJYWDEO\nMAwGA1UECBMFVW5zZXQxDjAMBgNVBAcTBVVuc2V0MQ4wDAYDVQQKEwVVbnNldDEU\nMBIGA1UEAxMLS2V5c3RvbmUgQ0EwHhcNMTUwNjI2MTQ0MDA5WhcNMjUwNjIzMTQ0\nMDA5WjBTMQswCQYDVQQGEwJYWDEOMAwGA1UECBMFVW5zZXQxDjAMBgNVBAcTBVVu\nc2V0MQ4wDAYDVQQKEwVVbnNldDEUMBIGA1UEAxMLS2V5c3RvbmUgQ0EwggEiMA0G\nCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQDQiy2dMzVInMWuY3hm34HAkvbt0ruG\nzvSG1IpF2TRGXZrq7mNYbVCXvmuV1DKSEEEjJN4yn0nw8bED80KLRqZJEWTm7aXF\n/CeRSf90SJJFtkaiayRWZU00VAdNIiNfrEYNslwOScux+UKJglWlDEpalCYdZQAm\nJWcEqB40MnyeZkAuSq76XqOxa3qBCRLvd0t8/y6y2A7tsctK8NSYLfdoeK5lFyLk\nEZeNMmGr3SOxiWNTc8d8Ij4XPmpXfiTB6Nl+5uMa7mMultHNBKmEQ1dit/ua2uot\nQlwAv2cEqdg04ZZzV+MNbPtZZRnICGQezdZUOAaN6biSIAoBYnElCispAgMBAAGj\nFjAUMBIGA1UdEwEB/wQIMAYBAf8CAQAwDQYJKoZIhvcNAQEFBQADggEBAC58igSu\naeAkWbGX8QjY5dOoMVvXV2pEO4BVWimvuqjJABinDY/SJRZ/mNE6DsFqLVvGz39t\nDZpnFu/XgK0hTOuXc3M7cxv7KJ6KM2/IGLwoayRsMS6wRCIGIwHPd+jFAMCNvtNX\nEC9HQOpRQnIWZZZp9rV6jgvkiueLex56LNOTeKdnfswIDI7EANqJr0E30mBXwjUN\nUTvTg0MyvIwQ+kw92R9tu6HeDLXxyUHJJXombNGj9V2igd6M/RfmS+Ukqu4m8o99\nBFMoAWoVutpeFDKUlpgwjKrLbwx9cvdAW9p+UWaF4peOsciwGZIjy1uCmp4KCMxI\nEqUR6UN/o6SiPwc=\n-----END CERTIFICATE-----\n", "Compute-1::NeutronPassword": "******", "Cinder-Storage-1::ServiceNetMap": "{\"NovaVncProxyNetwork\": \"internal_api\", \"NeutronTenantNetwork\": \"tenant\", \"NovaApiNetwork\": \"internal_api\", \"CeilometerApiNetwork\": \"internal_api\", \"SwiftMgmtNetwork\": \"storage_mgmt\", \"MemcachedNetwork\": \"internal_api\", \"RabbitMqNetwork\": \"internal_api\", 
\"KeystoneAdminApiNetwork\": \"internal_api\", \"SwiftProxyNetwork\": \"storage\", \"CinderApiNetwork\": \"internal_api\", \"CephClusterNetwork\": \"storage_mgmt\", \"NovaMetadataNetwork\": \"internal_api\", \"RedisNetwork\": \"internal_api\", \"NeutronApiNetwork\": \"internal_api\", \"GlanceApiNetwork\": \"storage\", \"KeystonePublicApiNetwork\": \"internal_api\", \"HeatApiNetwork\": \"internal_api\", \"GlanceRegistryNetwork\": \"internal_api\", \"MysqlNetwork\": \"internal_api\", \"CephPublicNetwork\": \"storage\", \"MongoDbNetwork\": \"internal_api\", \"HorizonNetwork\": \"internal_api\", \"CinderIscsiNetwork\": \"storage\"}", "Controller-1::NeutronPublicInterfaceRawDevice": "", "Controller-1::AdminPassword": "******", "Controller-1::CeilometerMeteringSecret": "******", "Cinder-Storage-1::RabbitPassword": "guest", "Controller-1::EnableCephStorage": "False", "Cinder-Storage-1::CinderPassword": "******", "Controller-1::CinderBackendConfig": "{}", "Compute-1::SnmpdReadonlyUserName": "ro_snmp_user", "OS::stack_name": "overcloud", "Cinder-Storage-1::GlancePort": "9292", "Controller-1::NeutronPublicInterface": "nic1", "Compute-1::Debug": "", "Cinder-Storage-1::NtpServer": "", "Swift-Storage-1::MountCheck": "False", "Ceph-Storage-1::UpdateIdentifier": "", "Compute-1::CeilometerComputeAgent": "", "Compute-1::NeutronBridgeMappings": "datacentre:br-ex", "Cinder-Storage-1::GlanceProtocol": "http", "Controller-1::EnableGalera": "True", "ObjectStorageHostnameFormat": "%stackname%-objectstorage-%index%", "Controller-1::AdminToken": "******", "Controller-1::SwiftMountCheck": "False", "Controller-1::UpdateIdentifier": "", "Controller-1::NeutronNetworkVLANRanges": "datacentre:1:1000", "Controller-1::GlanceBackend": "rbd", "Compute-1::RabbitClientPort": "5672", "Controller-1::ExtraConfig": "{}", "Controller-1::CinderEnableRbdBackend": "True", "Cinder-Storage-1::RabbitClientUseSSL": "False", "Compute-1::NeutronPhysicalBridge": "br-ex", "Compute-1::NeutronEnableTunnelling": "True", 
"Compute-1::NovaComputeExtraConfig": "{}", "Compute-1::RabbitPassword": "******"}, "id": "2878a87c-f316-4544-ba15-a739ad10e1dc", "template_description": "No description"}} DEBUG: heatclient.common.http curl -g -i -X GET -H 'X-Auth-Token: {SHA1}f610a16b7b98a767bb3fa7600adf2c0bd1850d1c' -H 'Content-Type: application/json' -H 'X-Auth-Url: http://192.0.2.1:5000/v2.0' -H 'Accept: application/json' -H 'User-Agent: python-heatclient' http://192.0.2.1:8004/v1/517f9f1ef9a34580b74400643938f4f3/stacks/overcloud DEBUG: heatclient.common.http HTTP/1.1 401 Unauthorized content-length: 23 www-authenticate: Keystone uri='http://192.0.2.1:5000/v2.0' connection: keep-alive date: Fri, 26 Jun 2015 18:12:44 GMT content-type: text/plain x-openstack-request-id: req-7477d674-f7d6-4645-9d54-645a0f95af5c Authentication required ERROR: openstack ERROR: Authentication failed. Please try again with option --include-password or export HEAT_INCLUDE_PASSWORD=1 Authentication required Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/cliff/app.py", line 295, in run_subcommand result = cmd.run(parsed_args) File "/usr/lib/python2.7/site-packages/cliff/command.py", line 53, in run self.take_action(parsed_args) File "/usr/lib/python2.7/site-packages/rdomanager_oscplugin/v1/overcloud_deploy.py", line 667, in take_action self._deploy_tuskar(stack, parsed_args) File "/usr/lib/python2.7/site-packages/rdomanager_oscplugin/v1/overcloud_deploy.py", line 456, in _deploy_tuskar self._heat_deploy(stack, overcloud_yaml, parameters, environments) File "/usr/lib/python2.7/site-packages/rdomanager_oscplugin/v1/overcloud_deploy.py", line 354, in _heat_deploy orchestration_client, "overcloud") File "/usr/lib/python2.7/site-packages/rdomanager_oscplugin/utils.py", line 147, in wait_for_stack_ready stack = orchestration_client.stacks.get(stack_name) File "/usr/lib/python2.7/site-packages/heatclient/v1/stacks.py", line 202, in get resp, body = self.client.json_request('GET', '/stacks/%s' % 
stack_id)
  File "/usr/lib/python2.7/site-packages/heatclient/common/http.py", line 265, in json_request
    resp = self._http_request(url, method, **kwargs)
  File "/usr/lib/python2.7/site-packages/heatclient/common/http.py", line 217, in _http_request
    'content': resp.content
HTTPUnauthorized: ERROR: Authentication failed. Please try again with option --include-password or export HEAT_INCLUDE_PASSWORD=1
Authentication required
DEBUG: openstackclient.shell clean_up DeployOvercloud
DEBUG: openstackclient.shell got an error: ERROR: Authentication failed. Please try again with option --include-password or export HEAT_INCLUDE_PASSWORD=1
Authentication required
ERROR: openstackclient.shell Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/openstackclient/shell.py", line 176, in run
    return super(OpenStackShell, self).run(argv)
  File "/usr/lib/python2.7/site-packages/cliff/app.py", line 230, in run
    result = self.run_subcommand(remainder)
  File "/usr/lib/python2.7/site-packages/cliff/app.py", line 295, in run_subcommand
    result = cmd.run(parsed_args)
  File "/usr/lib/python2.7/site-packages/cliff/command.py", line 53, in run
    self.take_action(parsed_args)
  File "/usr/lib/python2.7/site-packages/rdomanager_oscplugin/v1/overcloud_deploy.py", line 667, in take_action
    self._deploy_tuskar(stack, parsed_args)
  File "/usr/lib/python2.7/site-packages/rdomanager_oscplugin/v1/overcloud_deploy.py", line 456, in _deploy_tuskar
    self._heat_deploy(stack, overcloud_yaml, parameters, environments)
  File "/usr/lib/python2.7/site-packages/rdomanager_oscplugin/v1/overcloud_deploy.py", line 354, in _heat_deploy
    orchestration_client, "overcloud")
  File "/usr/lib/python2.7/site-packages/rdomanager_oscplugin/utils.py", line 147, in wait_for_stack_ready
    stack = orchestration_client.stacks.get(stack_name)
  File "/usr/lib/python2.7/site-packages/heatclient/v1/stacks.py", line 202, in get
    resp, body = self.client.json_request('GET', '/stacks/%s' % stack_id)
  File "/usr/lib/python2.7/site-packages/heatclient/common/http.py", line 265, in json_request
    resp = self._http_request(url, method, **kwargs)
  File "/usr/lib/python2.7/site-packages/heatclient/common/http.py", line 217, in _http_request
    'content': resp.content
HTTPUnauthorized: ERROR: Authentication failed. Please try again with option --include-password or export HEAT_INCLUDE_PASSWORD=1
Authentication required
[stack@instack ~]$
Created attachment 1043634 [details]
keystone.log

Adding the full keystone.log from the instack machine.
The only denials I see are prefixed with this message:

2015-06-26 10:11:11.397 28749 WARNING keystone.common.wsgi [-] Authorization failed. Could not find user: ironic (Disable debug mode to suppress these details .) (Disable debug mode to suppress these details.) from 127.0.0.1

This makes me think heat somehow has credentials for the ironic user.
(In reply to Ryan Brown from comment #8)
> The only denials I see are prefixed with this message:
>
> 2015-06-26 10:11:11.397 28749 WARNING keystone.common.wsgi [-] Authorization
> failed. Could not find user: ironic (Disable debug mode to suppress these
> details
> .) (Disable debug mode to suppress these details.) from 127.0.0.1
>
>
> Which makes me think heat somehow has credentials for the ironic user.

Those "Authorization failed" messages were logged several hours before the failed deployment, though. I have a sneaking suspicion that this is not really an authorization problem but is being erroneously reported as one.
Since a 401 is being raised late in the deployment process, it is possible that the token is expiring. Try raising the default token expiry limit even more. Also I assume "openstack overcloud deploy" is creating a token and using it for multiple heat requests as it polls for progress. It should be made aware of the token timeout and request new tokens as necessary (maybe switching to SessionClient would make this token renewal transparent).
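To make the token-renewal suggestion concrete, here is a minimal sketch of a polling loop that requests a new token when a poll comes back 401. It is not the rdomanager_oscplugin or heatclient code: the Unauthorized exception, the get_token and get_status callables, and wait_for_stack itself are hypothetical names used only for illustration.

```python
import time


class Unauthorized(Exception):
    """Hypothetical stand-in for an HTTP 401 raised by an API client."""


def wait_for_stack(get_token, get_status, poll_interval=5.0, max_polls=1000,
                   sleep=time.sleep):
    """Poll until the stack leaves an IN_PROGRESS state, renewing the token on a 401.

    get_token() returns a fresh token; get_status(token) returns a status
    string such as "CREATE_IN_PROGRESS" or raises Unauthorized. Both are
    hypothetical callables supplied by the caller.
    """
    token = get_token()
    for _ in range(max_polls):
        try:
            status = get_status(token)
        except Unauthorized:
            # The token may have expired mid-deployment: fetch a new one and
            # retry once. A second 401 propagates as a real auth failure.
            token = get_token()
            status = get_status(token)
        if not status.endswith("IN_PROGRESS"):
            return status
        sleep(poll_interval)
    raise TimeoutError("stack did not settle after %d polls" % max_polls)
```

Retrying only once per poll keeps a genuine credential problem from looping forever while still surviving an expiry in the middle of a long deployment.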
If this is happening one hour into deployment, then the problem is token expiry. On the undercloud, [token] expiration in /etc/keystone/keystone.conf is set to 3600 (1 hour). It should be set to 14400 (4 hours), since overcloud stacks often take more than 1 hour to deploy. It looks like this was originally set in tripleo-image-elements/elements/keystone/os-apply-config/etc/keystone/keystone.conf, but the setting regressed when the undercloud switched to puppet deploy. I will submit a change to instack-undercloud/elements/puppet-stack-config/puppet-stack-config.yaml.template to set keystone::token_expiration: 14400
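A quick way to check whether a given keystone.conf would outlive a deployment is sketched below. It assumes the file parses as plain INI text; the function names and the 3600-second fallback (the value reported in this bug) are illustrative choices, not part of any OpenStack tool.

```python
import configparser

# Seconds. Assumption: fall back to the value this bug reports on the undercloud.
REPORTED_DEFAULT = 3600


def token_expiration(conf_text):
    """Return the [token] expiration value, in seconds, from keystone.conf text."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return parser.getint("token", "expiration", fallback=REPORTED_DEFAULT)


def outlives_deploy(conf_text, expected_deploy_seconds):
    """True if a token issued at the start is still valid when the deploy ends."""
    return token_expiration(conf_text) > expected_deploy_seconds
```

For example, passing the contents of /etc/keystone/keystone.conf and 90 * 60 to outlives_deploy reports whether a single token would survive a 90-minute deployment.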
The latest run where I tested network isolation on bare metal used the 2015-06-26.3 puddle, and the deployment failed because of https://bugzilla.redhat.com/show_bug.cgi?id=1236167. However, I didn't get any Heat errors, and the controller and compute nodes did get their network configuration set up correctly. I got to ControllerNodesPostDeployment CREATE_COMPLETE, which is a really good sign that I didn't run into this bug the last time. That said, increasing the timeout is a good idea, because it isn't hard for a complex bare metal deployment to take over an hour.
I posted a patch to extend the timeout here https://code.engineering.redhat.com/gerrit/#/c/51898/
The gerrithub patch is at https://review.gerrithub.io/#/c/237935/
*** Bug 1237221 has been marked as a duplicate of this bug. ***
merged https://code.engineering.redhat.com/gerrit/#/c/51898/
*** Bug 1237318 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2015:1549
*** Bug 1238133 has been marked as a duplicate of this bug. ***