Bug 1258614

Summary: Puppet fails to find heat domain ID
Product: Red Hat OpenStack
Reporter: Mike Burns <mburns>
Component: openstack-puppet-modules
Assignee: Martin Magr <mmagr>
Status: CLOSED ERRATA
QA Contact: Marius Cornea <mcornea>
Severity: high
Priority: medium
Version: 7.0 (Kilo)
CC: calfonso, christopher_dearborn, derekh, dsavinea, gtrellu, ichavero, jslagle, jstransk, jtaleric, mburns, mcornea, mmagr, ohochman, rhel-osp-director-maint, rybrown, sclewis, shivrao, sjeuk, whayutin, yeylon
Target Milestone: z3
Keywords: Triaged, ZStream
Target Release: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-puppet-modules-2015.1.8-28.el7ost
Doc Type: Bug Fix
Clone Of: 1252585
: 1280379
Last Closed: 2015-12-21 17:09:42 UTC
Type: Bug
Bug Blocks: 1252585, 1261979, 1280379    

Description Mike Burns 2015-08-31 18:42:43 UTC
The original bug affects both openstack-puppet-modules and openstack-tripleo-heat-templates. Cloning this so that the openstack-puppet-modules part gets fixed.

+++ This bug was initially created as a clone of Bug #1252585 +++

Description of problem:

Deployment fails on one of the controller nodes. The deployment currently consists of 3 controllers, 5 computes, and 2 Ceph nodes.

The error that causes the stack to fail comes from Puppet on Controller-0:

Aug 11 11:24:29 localhost os-collect-config: ::Server[6000]): The default incoming_chmod set to 0644 may yield in error prone directories and will be changed in a later release.
Warning: Scope(Swift::Storage::Server[6000]): The default outgoing_chmod set to 0644 may yield in error prone directories and will be changed in a later release.
Warning: The package type's allow_virtual parameter will be changing its default value from false to true in a future release. If you do not want to allow virtual packages, please explicitly set allow_virtual to false.
   (at /usr/share/ruby/vendor_ruby/puppet/type.rb:816:in `set_default')
Error: Received error response from Keystone server at http://172.17.0.10:35357/v3/domains: Unauthorized
Error: /Stage[main]/Heat::Keystone::Domain/Heat_domain_id_setter[heat_domain_id]/ensure: change from absent to present failed: Received error response from Keystone server at http://172.17.0.10:35357/v3/domains: Unauthorized
", "deploy_status_code": 6}

The interesting part of that error (formatted nicely):

Error: Received error response from Keystone server at http://172.17.0.10:35357/v3/domains: Unauthorized

Error: /Stage[main]/Heat::Keystone::Domain/Heat_domain_id_setter[heat_domain_id]/ensure: change from absent to present failed: Received error response from Keystone server at http://172.17.0.10:35357/v3/domains: Unauthorized
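The failing resource, Heat_domain_id_setter, has to ask Keystone for the domain's ID before it can write it into the Heat configuration. A minimal sketch of that lookup logic, assuming a standard Keystone v3 `GET /v3/domains` response body (a JSON object with a `domains` list whose entries carry `id` and `name`). The function `find_domain_id` and the `KeystoneUnauthorized` class are hypothetical, and the HTTP request itself is left out so only the decision logic is shown:

```python
import json


class KeystoneUnauthorized(Exception):
    """Hypothetical error raised when Keystone answers with HTTP 401."""


def find_domain_id(status: int, body: str, domain_name: str) -> str:
    """Return the ID of `domain_name` from a GET /v3/domains response.

    `status` and `body` are the HTTP status code and raw JSON body of the
    response; performing the request is out of scope for this sketch.
    """
    if status == 401:
        # The branch hit in this bug: the admin credentials are rejected,
        # so no domain ID can be determined at all.
        raise KeystoneUnauthorized(
            "Received error response from Keystone server: Unauthorized")
    if status != 200:
        raise RuntimeError(f"unexpected status {status}")
    # Walk the listed domains and pick the one whose name matches.
    for domain in json.loads(body).get("domains", []):
        if domain.get("name") == domain_name:
            return domain["id"]
    raise LookupError(f"domain {domain_name!r} not found")
```

Given a 200 response whose body is `{"domains": [{"id": "abc123", "name": "heat_stack"}]}`, the sketch returns `"abc123"`; given a 401 it fails before any parsing, which is the situation in which the resource cannot move from absent to present.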


The connection to the Keystone service works, as does everything else in the network that I was able to test.

In the overcloud Keystone logs, I see a login request for Ceilometer that returns 401 Unauthorized. It seems to be unrelated, and there don't seem to be any other failing requests.


--- Additional comment from Joe Talerico on 2015-08-11 18:00:03 EDT ---

Hitting this in the BAGL lab.

Aug 11 16:59:55 localhost os-collect-config: [2015-08-11 16:59:55,117] (heat-config) [ERROR] Error running /var/lib/heat-config/heat-config-puppet/43fe7cb8-0d1f-473a-b056-4364ee7f4e8d.pp. [6]
Aug 11 16:59:55 localhost os-collect-config: Error: Received error response from Keystone server at http://172.21.33.11:35357/v3/domains: Unauthorized

--- Additional comment from Gaëtan Trellu on 2015-08-12 17:32:50 EDT ---

Hi,

We had the same issue.
Your undercloud should be able to communicate with your overcloud external network.

--- Additional comment from Emilien Macchi on 2015-08-18 15:01:39 EDT ---

I don't think this is a networking or communication issue, because the error is "Unauthorized". I guess this is a domain configuration issue...

--- Additional comment from James Slagle on 2015-08-18 15:22:38 EDT ---

FWIW, I saw this error when I had not specified a value for --ntp-server during a deployment.

--- Additional comment from Martin Magr on 2015-08-19 03:39:59 EDT ---

I believe this is caused by the Heat domain creation script, due to asynchronous corner cases during multi-controller deployment. Once [1] and [2] are in use, this issue should go away. I will rebase [2] and run some tests.


[1] https://review.openstack.org/#/c/204541/
[2] https://review.openstack.org/#/c/180566/17



--- Additional comment from Shiva Prasad Rao on 2015-08-28 02:31:29 EDT ---

Is there a workaround for this issue? We have been hitting it on most of our deployments.

--- Additional comment from Martin Magr on 2015-08-28 11:24:43 EDT ---

Sadly, I'm not aware of any workaround. We might remove the #180566 patch from RHOSd until the latest patchset has been fully tested, but that would mean the customer has to create the domain manually, and we could instead just keep the configuration part of the heat::keystone::domain class [1]. Derek, does that make sense?

[1] https://github.com/openstack/puppet-heat/blob/master/manifests/keystone/domain.pp#L78

--- Additional comment from Derek Higgins on 2015-08-28 11:34:08 EDT ---

Any time I've seen this, it has been an underlying issue with the Keystone setup on the controllers; the domain creation script just happens to be the first thing to use Keystone.

Can you confirm that the time on each host is in sync? Clocks that are out of sync can cause tokens to be seen as invalid.
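To illustrate why unsynchronized clocks can surface as "Unauthorized", here is a deliberately simplified model, not Keystone's actual validation code. It assumes a token carries `issued_at` and `expires_at` timestamps set by the issuing host, and that the validating host compares them against its own clock with a small tolerance; the function name `token_looks_valid` and the 30-second tolerance are hypothetical:

```python
from datetime import datetime, timedelta, timezone


def token_looks_valid(issued_at: datetime, expires_at: datetime,
                      local_now: datetime,
                      tolerance: timedelta = timedelta(seconds=30)) -> bool:
    """Simplified check run on the host that validates the token.

    `issued_at` and `expires_at` come from the issuing host's clock;
    `local_now` is the validating host's clock.
    """
    # A validator whose clock runs behind sees a fresh token as issued in the future.
    if issued_at - tolerance > local_now:
        return False
    # A validator whose clock runs ahead sees the token as already expired.
    return local_now < expires_at
```

In this model, a token that is valid for one hour is rejected by a validating host whose clock is two hours ahead of, or two hours behind, the issuing host, even though the credentials themselves are fine.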

Comment 8 wes hayutin 2015-10-21 12:23:49 UTC
hrm.. this looks like it's failing.
https://bugzilla.redhat.com/show_bug.cgi?id=1273857

Comment 9 Martin Magr 2015-10-21 13:44:07 UTC
*** Bug 1273857 has been marked as a duplicate of this bug. ***

Comment 10 Jiri Stransky 2015-11-04 09:29:14 UTC
The backport had to be reverted from OPM because it brought the CI down, and while adapting instack-undercloud and t-h-t we found that the backport introduced a different bug.

The fix for the bug is submitted here:

https://review.openstack.org/#/c/239680/

And adaptation of instack-undercloud here:

https://review.openstack.org/#/c/239707/

And the upstream re-submission of the overcloud Keystone Heat domain creation here (this part had to be reverted upstream too because of intermittent issues, which the first linked patch should hopefully solve):

https://review.openstack.org/#/c/180566/


Given that we've historically hit intermittent issues in this area, it would be good to get all these patches merged in upstream first to make sure they're stable enough for backporting into product.

Comment 11 Jiri Stransky 2015-11-04 10:37:04 UTC
I think I found an alternative solution which might be less obtrusive (backporting just one patch instead of three).

The problem is with the heat_domain_id_setter resource:

Error: /Stage[main]/Heat::Keystone::Domain/Heat_domain_id_setter[heat_domain_id]/ensure: change from absent to present failed: Received error response from Keystone server at http://172.17.0.10:35357/v3/domains: Unauthorized

However, newer versions of Heat, including stable/kilo it seems [1], can use the Heat domain name instead of the ID, so the heat_domain_id_setter might not be necessary at all. The ID setting was removed from the puppet-heat module some time ago in favor of the name setting [2].

So perhaps backporting this single patch [2] could solve the issue for us. Martin, does that sound correct?

Deployments made with older OSP-d, which previously set the ID, should simply ignore the new name setting.

"(StrOpt) Keystone domain ID which contains heat template-defined users. If this option is set, stack_user_domain_name option will be ignored." [3]


[1] https://github.com/openstack/heat/blob/534e3e9d076f763f836510856cb890571bfb79c0/heat/common/heat_keystoneclient.py#L100-L102
[2] https://github.com/openstack/puppet-heat/commit/b7d19f43bd729e505d12979350082bf0c26b5b40
[3] http://docs.openstack.org/kilo/config-reference/content/orchestration-configuring-api.html
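The precedence rule quoted in this comment (a set domain ID wins, and the name is then ignored) can be sketched as follows. It assumes both options live in the `[DEFAULT]` section of heat.conf; the helper `resolve_stack_user_domain` is hypothetical and only mirrors the documented rule, it is not Heat's own code:

```python
import configparser


def resolve_stack_user_domain(heat_conf_text: str):
    """Return ("id", value), ("name", value), or (None, None).

    Mirrors the documented rule: if stack_user_domain_id is set,
    stack_user_domain_name is ignored.
    """
    # interpolation=None so '%' characters elsewhere in heat.conf are read literally.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(heat_conf_text)
    defaults = cfg["DEFAULT"]
    domain_id = defaults.get("stack_user_domain_id")
    if domain_id:
        # Deployments made with older OSP-d still carry the ID, which takes precedence.
        return ("id", domain_id)
    domain_name = defaults.get("stack_user_domain_name")
    if domain_name:
        # With only the name set, nothing had to be looked up in Keystone at deploy time.
        return ("name", domain_name)
    return (None, None)
```

For a heat.conf containing only `stack_user_domain_name = heat_stack`, the sketch returns `("name", "heat_stack")`; if an older deployment also has `stack_user_domain_id` set, the ID is returned and the name is ignored.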

Comment 12 Jiri Stransky 2015-11-04 10:40:37 UTC
Just to clarify why I'm thinking this could be a solution: the ID has to be queried from Keystone, which requires authentication credentials (and can fail with the Unauthorized error). The name, on the other hand, is fed directly into the Puppet class and passed into the Heat config file, without any Keystone lookups.

Comment 13 Martin Magr 2015-11-04 10:49:58 UTC
You're probably right. The heat_domain_id_setter resource was not able to get authorized because, in certain corner cases during HA deployment, it fetched the admin user too soon after the user was created. So avoiding that behaviour should get rid of the problem.

Nevertheless, in the future we should get rid of domain creation via the Python script and instead backport all the patches.

Comment 14 Jiri Stransky 2015-11-05 09:34:11 UTC
I applied the patch locally and overcloud deployed fine, with "stack_user_domain_name = heat_stack" present in heat.conf. I think we can backport the patch.

Comment 16 Mike Burns 2015-11-11 14:47:01 UTC
This was reverted due to issues caused in director.

Comment 17 Mike Burns 2015-11-11 15:16:04 UTC
The newer patch from comment 11 is in the latest builds, so moving back to ON_QA.

Comment 18 Mike Burns 2015-11-23 15:27:50 UTC
*** Bug 1278925 has been marked as a duplicate of this bug. ***

Comment 19 Marius Cornea 2015-11-26 12:51:02 UTC
Verified the patch in c#11 in openstack-puppet-modules-2015.1.8-30.el7ost.noarch

Comment 21 errata-xmlrpc 2015-12-21 17:09:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:2677