Bug 1325475
Summary: rhel-osp-director: upgrade 7.3->8.0, that follows update 7.2->7.3, times out due to os-collect-config auth failure
Product: Red Hat OpenStack
Component: rhosp-director
Status: CLOSED WORKSFORME
Severity: high
Priority: high
Version: 8.0 (Liberty)
Target Milestone: async
Target Release: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Reporter: Alexander Chuzhoy <sasha>
Assignee: Steven Hardy <shardy>
QA Contact: Alexander Chuzhoy <sasha>
CC: augol, dbecker, ggillies, jcoufal, jschluet, mburns, mcornea, morazi, ohochman, ramishra, rhel-osp-director-maint, sasha, sathlang, sbaker, shardy, therve, zbitter
Keywords: Reopened
Fixed In Version: ---
Doc Type: Known Issue
Doc Text:
Cause: In the course of upgrading the undercloud from OSPd 7 to OSPd 8, the _member_ role is removed from the admin user because Keystone no longer uses that idiom. Trusts stored in the Heat database rely on the trustor user retaining all of their delegated roles, which includes the _member_ role.
Consequence: Heat stack updates after the undercloud upgrade fail with authentication errors.
Fix: Run the command:

    openstack role add _member_ --user admin --project admin

to re-add the _member_ role to the admin user.
Result: The trusts work as expected to authenticate.
Story Points: ---
Clone Of: ---
Clones: 1523192 (view as bug list)
Last Closed: 2017-01-25 01:48:46 UTC
Type: Bug
Regression: ---
Bug Blocks: 1523192
Description (Alexander Chuzhoy, 2016-04-09 04:13:52 UTC)
This appears to be due to a different bug where the nova hostname got changed.

*** This bug has been marked as a duplicate of bug 1324739 ***

Sorry, closed the wrong bz.

Can you attach the keystone log from the undercloud? (Heat log wouldn't hurt either, although you already pasted the most interesting part.)

Created attachment 1146031 [details]
keystone logs from the undercloud
Is it possible that roles changed during the upgrade? This error will happen if a trust was created for a user, and then a role or roles were removed from that user. As Heat delegates all roles by default, if one role is missing, Keystone then fails to authenticate the trust.

From the log:

Using the keystone_authtoken user as the heat trustee user directly is deprecated. Please add the trustee credentials you need to the trustee section of your heat.conf file.
Making authentication request to http://192.0.2.1:5000/v3/auth/tokens get_auth_ref /usr/lib/python2.7/site-packages/keystoneclient/auth/identity/v3/base.py:188
Request returned failure status: 403 request /usr/lib/python2.7/site-packages/keystoneclient/session.py:400
Exception during message handling: Trustee has no delegated roles. (Disable debug mode to suppress these details.) (HTTP 403) (Request-ID: req-0be8b5b3-8bdc-4ae1-9dff-b27ac9d2ac61) (HTTP 403)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
    executor_callback))
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
    executor_callback)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 129, in _do_dispatch
    result = func(ctxt, **new_args)
  File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 105, in wrapper
    return f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/heat/common/context.py", line 308, in wrapped
    return func(self, ctx, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/heat/engine/service.py", line 1404, in resource_signal
    stack = parser.Stack.load(cnxt, stack=s, use_stored_context=True)
  File "/usr/lib/python2.7/site-packages/heat/engine/stack.py", line 408, in load
    cache_data=cache_data)
  File "/usr/lib/python2.7/site-packages/heat/engine/stack.py", line 459, in _from_db
    current_deps=stack.current_deps, cache_data=cache_data)
  File "/usr/lib/python2.7/site-packages/heat/engine/stack.py", line 192, in __init__
    'keystone').auth_ref.role_names
  File "/usr/lib/python2.7/site-packages/heat/engine/clients/__init__.py", line 70, in client
    return client_plugin.client()
  File "/usr/lib/python2.7/site-packages/heat/engine/clients/client_plugin.py", line 78, in client
    self._client = self._create()
  File "/usr/lib/python2.7/site-packages/heat/engine/clients/os/keystone.py", line 29, in _create
    return hkc.KeystoneClient(self.context)
  File "/usr/lib/python2.7/site-packages/heat/common/heat_keystoneclient.py", line 573, in __new__
    return KeystoneClientV3(context)
  File "/usr/lib/python2.7/site-packages/heat/common/heat_keystoneclient.py", line 84, in __init__
    self._client = self._v3_client_init()
  File "/usr/lib/python2.7/site-packages/heat/common/heat_keystoneclient.py", line 155, in _v3_client_init
    auth_ref = self.context.auth_plugin.get_access(self.session)
  File "/usr/lib/python2.7/site-packages/keystoneclient/auth/identity/base.py", line 240, in get_access
    self.auth_ref = self.get_auth_ref(session)
  File "/usr/lib/python2.7/site-packages/keystoneclient/auth/identity/v3/base.py", line 190, in get_auth_ref
    authenticated=False, log=False, **rkwargs)
  File "/usr/lib/python2.7/site-packages/keystoneclient/session.py", line 501, in post
    return self.request(url, 'POST', **kwargs)
  File "/usr/lib/python2.7/site-packages/keystoneclient/utils.py", line 337, in inner
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/keystoneclient/session.py", line 401, in request
    raise exceptions.from_response(resp, method, url)
Forbidden: Trustee has no delegated roles. (Disable debug mode to suppress these details.) (HTTP 403) (Request-ID: req-0be8b5b3-8bdc-4ae1-9dff-b27ac9d2ac61)

So it seems that the trustee (i.e. the heat user) is lacking some role it requires. It also appears that we are using the keystone_authtoken user for this purpose, and that this is deprecated.
I wonder if there is some change in the configuration that we were supposed to make when upgrading to Liberty, but didn't. Steve, any ideas?

While creating a trust-scoped token for the trustee (during stack initialization with stored context), the roles in the trust are matched against the trustor's roles by keystone [1]. It seems to be failing there. I suspect this could be due to either of the following:

a. It seems keystone v3 does not create the '_member_' role by default. If the upgrade to 8.0 involves migration to v3, the trust probably has this role whereas the trustor doesn't.
b. The trustor's roles have been changed and no longer match the trust roles (for the trust_id in the stored context).

[1] https://github.com/openstack/keystone/blob/master/keystone/token/providers/common.py#L374-L383

> So it seems that the trustee (i.e. the heat user) is lacking some role it requires.

This isn't my interpretation of the error, because the scope of the delegation (e.g. what roles are delegated) is defined by the trust (which we only hold the ID of; all trust-to-role mapping is done inside keystone). Thus if trustor "steve" has role "a", and it's delegated via trust 123 to trustee "heat", it shouldn't matter what roles "heat" has, only that "steve" has role "a", and that it exists at the time of delegation to "heat".

I suspect Rabi is on the right track here, in that we created the trust with a v2-issued token, which probably means we delegated all roles including _member_, then we upgraded, so we need to confirm all roles delegated via the trust are still available when requesting a trust-scoped token via the v3 API. I'll try to make a small reproducer that proves/disproves the theory described by Rabi (unless he's already done so).

> It also appears that we are using the keystone_authtoken user for this purpose, and that this is deprecated. I wonder if there is some change in the configuration that we were supposed to make when upgrading to Liberty, but didn't. Steve, any ideas?
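The matching step described above can be illustrated with a small self-contained sketch. This is hypothetical code with illustrative names, not Keystone's actual implementation: the trust records which roles were delegated at creation time, and every one of them must still be held by the trustor when a trust-scoped token is requested.

```python
# Illustrative sketch of the role check keystone performs when issuing a
# trust-scoped token (see the common.py link above). Names are hypothetical.

class Forbidden(Exception):
    """Stands in for the HTTP 403 'Trustee has no delegated roles.' error."""

def validate_trust_roles(trust_delegated_roles, trustor_current_roles):
    """Return the roles the trust-scoped token will carry.

    Every role recorded in the trust must still be held by the trustor;
    if any delegated role (e.g. _member_) was removed since the trust was
    created, authentication fails for the whole trust.
    """
    held = set(trustor_current_roles)
    matched = [r for r in trust_delegated_roles if r in held]
    if len(matched) != len(trust_delegated_roles):
        raise Forbidden("Trustee has no delegated roles. (HTTP 403)")
    return matched

# The failure mode in this bug: the trust delegated 'admin' and '_member_',
# but the undercloud upgrade removed '_member_' from the admin user.
try:
    validate_trust_roles(['admin', '_member_'], ['admin'])
except Forbidden as exc:
    print(exc)
```

This is why re-adding the missing role to the trustor repairs existing trusts without recreating them: the check only looks at the trustor's current roles.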
Yes, I fixed it in devstack and it's now also fixed in puppet-heat:

https://review.openstack.org/#/c/254755
https://review.openstack.org/#/c/261398/
https://review.openstack.org/#/c/261326/

I don't think that is related to this problem though; it's only a warning, and the resulting trustee user will be the same.

So, here's a small script which will list all the trusts and the roles they delegate. If we can get access to a platform with this problem, we can get the trust ID heat is trying to use, filter these results, and compare the roles in the trust with those in the ROLES print below.

# Python 2 script; adjust credentials/IP before running on the undercloud.
from keystoneclient.v3 import client

OS_AUTH_URL_V3 = 'http://192.168.1.13:5000/v3'
OS_ENDPOINT_V3 = 'http://192.168.1.13:5000/v3'
USERNAME = 'admin'
PASSWORD = 'foobar'
TENANT_NAME = 'admin'
DEBUG = True

# Create a client with username/password
c = client.Client(debug=DEBUG,
                  username=USERNAME,
                  password=PASSWORD,
                  tenant_name=TENANT_NAME,
                  auth_url=OS_AUTH_URL_V3,
                  endpoint=OS_ENDPOINT_V3)
ret = c.authenticate()

for t in c.trusts.list():
    the_t = c.trusts.get(t)
    print "SHDEBUG trust get=%s" % the_t
    print "SHDEBUG trust roles=%s" % the_t.roles
    print "---"
print "SHDEBUG ROLES=%s" % c.roles.list()

Note we need to run that script, with credentials/IP adjusted appropriately, on the undercloud after this problem has occurred, ideally correlating with the actual trust_id, which we can obtain either by querying the user_creds table of the heat DB, by adding a line of debug to the heat code which shows us the trust_id, or by turning on debug logging and seeing if the keystone logs contain it.

If anyone attempts to reproduce this, please:
1. Collect all the roles including their IDs from keystone before/after the upgrade, either using CLI tools or the script above.
2. Collect before/after heat.conf files from the undercloud. Note we need these before/after the point of re-running "openstack undercloud install" during the undercloud upgrade, not the overcloud stack upgrade.
Created attachment 1147843 [details]
The requested files/info.
I don't see anything in it. Can we get access to a running environment?

The debug data does show we're trying to delegate _member_, so we need to prove that getting a trust-scoped token from v3 keystone where _member_ is delegated actually works. The before/after heat.conf files should help with attempting an accurate reproduction too.

We don't change the undercloud token provider to fernet as part of the upgrade, do we? https://review.openstack.org/#/c/278693/

So, therve got access to an environment suffering from this issue, and noticed that we delete the _member_ role assignment for all users. This is probably happening due to puppet-keystone getting reapplied during the upgrade process on the undercloud. I reproduced this locally and raised an upstream bug to further investigate: https://bugs.launchpad.net/tripleo/+bug/1571708

I got access to the platform with the problem, and the admin user doesn't have the _member_ role anymore. It's removed at some place in the process. I've found this in the keystone logs:

DELETE http://192.0.2.1:35357/v3/projects/5a2e8afcd79b440c9ae13d43d6488f71/users/618f9860e14c49ec93add932332847a9/roles/9fe2ff9ee4384b1894a90878d3e92bab

5a2e8afcd79b440c9ae13d43d6488f71 is the admin tenant, 618f9860e14c49ec93add932332847a9 is the admin user, and 9fe2ff9ee4384b1894a90878d3e92bab is the _member_ role. Around that request we can see that the role seems to be removed from all the other service users as well. We should identify which part of the upgrade does that, as I think it's a wrong thing to do.

I can think of one possible workaround beforehand, which is to set trusts_delegated_roles to admin in the undercloud before the upgrade. Otherwise, it may be nice to have the ability to regenerate the trust of the stack with a Heat API call. It seems bogus that a stack is broken once a role is removed from a user.

> I can think of one possible workaround beforehand, which is to set trusts_delegated_roles to admin in the undercloud beforehand.
This won't work because the trusts referenced from the heat DB (e.g. for an existing overcloud deployment) will still reference the old setting (which is to delegate all roles, including _member_). I agree having a way to update the trust stored by heat would be good, but we'd have to be careful to update just the current user_creds record, because nested stacks all reference the same record.

Anyway, I discussed this with EmilienM and chem in #tripleo; it is puppet that deletes the role assignments, because puppet has no knowledge of the special _member_ role that used to be created by keystone directly in the DB. We agreed the least-bad solution was to detect when the _member_ role is present and pass a boolean in so puppet can append the _member_ role and maintain the assignment to the admin user (investigation shows the role itself is left intact, so we just need puppet to leave the assignment to the user alone).

Emilien and I worked on this patch (only partially tested so far, feedback welcome): https://review.openstack.org/#/c/307352/

To clarify the workaround for anyone hitting this: if you re-add the _member_ role to the admin user in the admin project, all should be OK (you'll have to do this every time you re-run "openstack undercloud install" until we land the patch above).

Assigned to shardy to document the workaround in the Doc Text.

I was expecting us to land https://review.openstack.org/#/c/307352/ instead of just documenting the manual workaround, but we can do both I guess if anyone reviews that patch. To clarify, the proposed manual workaround is:

openstack role add _member_ --user admin --project admin

This re-adds the _member_ role that heat needs and that is erroneously removed by puppet. It will be necessary to re-add it every time "openstack undercloud install" is re-run, until/unless we land the workaround patch referenced above.

[stack@instack ~]$ . stackrc
[stack@instack ~]$ openstack role list
+----------------------------------+-----------------+
| ID                               | Name            |
+----------------------------------+-----------------+
| 3d4119f0c547490390e7176168d3b9f9 | admin           |
| 495e20aa15ef4abdbaff7082bf75e6fb | heat_stack_user |
| 949f82aa1237461fab542ca916e7f7bf | ResellerAdmin   |
| 9fe2ff9ee4384b1894a90878d3e92bab | _member_        |
| a0b57f52c98844d2a3a977bac8e2ce03 | swiftoperator   |
+----------------------------------+-----------------+
[stack@instack ~]$ openstack role list --user admin --project admin
+----------------------------------+-------+---------+-------+
| ID                               | Name  | Project | User  |
+----------------------------------+-------+---------+-------+
| 3d4119f0c547490390e7176168d3b9f9 | admin | admin   | admin |
+----------------------------------+-------+---------+-------+
[stack@instack ~]$ openstack role add _member_ --user admin --project admin
+-------+----------------------------------+
| Field | Value                            |
+-------+----------------------------------+
| id    | 9fe2ff9ee4384b1894a90878d3e92bab |
| name  | _member_                         |
+-------+----------------------------------+
[stack@instack ~]$ openstack role list --user admin --project admin
+----------------------------------+----------+---------+-------+
| ID                               | Name     | Project | User  |
+----------------------------------+----------+---------+-------+
| 3d4119f0c547490390e7176168d3b9f9 | admin    | admin   | admin |
| 9fe2ff9ee4384b1894a90878d3e92bab | _member_ | admin   | admin |
+----------------------------------+----------+---------+-------+

If this workaround can be confirmed working, we can update the doc-text.
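The shell session above re-adds the assignment by hand; the pending patch automates the same detection. A minimal sketch of that logic, with hypothetical function names (this is illustrative, not the actual instack-undercloud or puppet code):

```python
# Illustrative sketch of the least-bad fix discussed above: detect whether
# the legacy v2-era _member_ role exists before reapplying puppet, and
# compute which role assignments puppet stripped so they can be re-added.
# Function names are hypothetical.

def member_role_exists(role_names):
    """True when the keystone deployment still has the v2-era _member_ role."""
    return '_member_' in role_names

def roles_to_restore(roles_before, roles_after):
    """Role assignments removed by the puppet run that heat trusts still need."""
    return sorted(set(roles_before) - set(roles_after))

# In this bug: before the undercloud upgrade the admin user held both roles;
# afterwards puppet left only 'admin', so '_member_' must be restored.
before = ['admin', '_member_']
after = ['admin']
if member_role_exists(before):
    print(roles_to_restore(before, after))
```

The key design point the comments above settle on is that only the role assignment is deleted, not the role itself, so the fix (manual or automated) is a plain re-add of the assignment rather than recreating the role.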
Steven,

So I see that after implementing the step in comment #25, the upgrade process advanced to the next step, the one including /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-pacemaker.yaml, where it failed after 4 hours with:

ERROR: Authentication failed: Authentication required

[stack@instack ~]$ heat resource-list -n5 overcloud | grep -v COMPLE
| resource_name | physical_resource_id | resource_type | resource_status | updated_time | stack_name |
| UpdateWorkflow | 78b53603-4d9d-4780-b67b-cce2051a4f5e | OS::TripleO::Tasks::UpdateWorkflow | UPDATE_FAILED | 2016-05-12T22:48:18 | overcloud |
| ControllerPacemakerUpgradeDeployment_Step1 | 6e1c1888-e21d-40ac-9169-3314636f64ac | OS::Heat::SoftwareDeploymentGroup | CREATE_FAILED | 2016-05-12T22:48:39 | overcloud-UpdateWorkflow-msi5ojo7gp2z |
| 0 | 1635f2c5-d3b0-4653-be66-175cd7263c9d | OS::Heat::SoftwareDeployment | CREATE_IN_PROGRESS | 2016-05-12T22:48:47 | overcloud-UpdateWorkflow-msi5ojo7gp2z-ControllerPacemakerUpgradeDeployment_Step1-4xqf2ykvzqht |
| 0 | 3f673a67-3f82-4694-b9e3-fdb0b1e77330 | OS::Heat::StructuredDeployment | UPDATE_FAILED | 2016-05-12T22:48:55 | overcloud-ControllerAllNodesDeployment-jknfu2pzvm33 |
| ControllerAllNodesDeployment | e053de98-a315-4d36-8eb2-d5bb4941a2a3 | OS::Heat::StructuredDeployments | UPDATE_FAILED | 2016-05-12T22:48:55 | overcloud |

The pcs cluster doesn't run on the controllers.

Is there anything in the log to suggest what is causing the timeout now?

Zane, I believe applying the fix from comment #25 should do the trick. Just need to have it.

Sorry, I misunderstood. Since the workaround is working for you, I've added the doc_text as a Known Issue. The upstream patch hasn't merged yet; I'll see what I can do about pushing that forward.

The doc text looks good. I'll see if I can push the upstream patch forward, as it automates that workaround; basically it stalled due to lack of review feedback.

The upstream patch has been merged in 5.0.0.0rc2 upstream. Is this open bug still relevant? Moving ON_QA.

(In reply to Sofer Athlan-Guyot from comment #33)
> The upstream patch has been merged in 5.0.0.0rc2 upstream. Is this open bug still relevant? Moving ON_QA.

(In reply to Steven Hardy from comment #24)
> I was expecting us to land https://review.openstack.org/#/c/307352/ instead of just documenting the manual workaround, but we can do both I guess if anyone reviews that patch.

I checked the file /usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py. The patch from comment #24 doesn't seem to be included in the osp8 deployed environment.
Environment:
------------
instack-undercloud-2.2.7-8.el7ost.noarch
openstack-puppet-modules-7.1.5-1.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.5-1.el7ost.noarch
puppet-3.6.2-4.el7sat.noarch
openstack-heat-engine-5.0.1-9.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.14-23.el7ost.noarch
openstack-heat-templates-0-0.1.20151019.el7ost.noarch
heat-cfntools-1.2.8-2.el7.noarch
openstack-heat-api-cfn-5.0.1-9.el7ost.noarch
openstack-heat-common-5.0.1-9.el7ost.noarch
openstack-tripleo-heat-templates-0.8.14-23.el7ost.noarch
python-heatclient-1.0.0-1.el7ost.noarch
openstack-heat-api-cloudwatch-5.0.1-9.el7ost.noarch
openstack-heat-api-5.0.1-9.el7ost.noarch

(In reply to Sofer Athlan-Guyot from comment #33)
> The upstream patch has been merged in 5.0.0.0rc2 upstream. Is this open bug still relevant? Moving ON_QA.

I'm not sure this bug is even still relevant. I don't think it's likely that there are still customers that will do 7.2 -> 7.3 and then upgrade to osp8, since these are very old environments.

Hi, closing this issue as it seems not relevant any more. It can always be re-opened if needed. Regards,

*** Bug 1361746 has been marked as a duplicate of this bug. ***