| Summary: | When multi-domains are enabled, expanding/upgrading the cloud fails | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Udi Kalifon <ukalifon> | ||||
| Component: | openstack-puppet-modules | Assignee: | Jason Guiditta <jguiditt> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Udi Kalifon <ukalifon> | ||||
| Severity: | urgent | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 8.0 (Liberty) | CC: | asimonel, ayoung, dgurtner, dshevrin, felipe.alfaro, ggillies, jcall, jguiditt, jmelvin, jschluet, kbasil, markmc, mburns, mcornea, nkinder, nlevinki, ohochman, pharriso, racedoro, rhel-osp-director-maint, sathlang, sputhenp, srevivo | ||||
| Target Milestone: | async | Keywords: | ZStream | ||||
| Target Release: | 8.0 (Liberty) | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | openstack-puppet-modules-7.1.3-1.el7ost | Doc Type: | Bug Fix | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2016-09-01 16:03:13 UTC | Type: | Bug | ||||
| Attachments: |
|
||||||
As discussed with Udi, the multi-domain option is activated manually after the 7.3 deployment. This breaks the prefetch step of the puppet keystone_user provider, because without multi-domain one can run:
openstack user list
and get all the users. With multi-domain this is no longer possible; you have to specify the domain:
openstack user list --domain default
There is a patch coming on the puppet side: https://review.openstack.org/#/c/299301/
In the meantime, to have a successful update, and as the LDAP database is read-only, one can simply deactivate the multi-domain option during the update, so that the MySQL database is updated with the correct user (i.e., the prefetch will work), and then reactivate it manually after the update.
This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.
I was able to work around the upgrade failure by disabling multi-domains in keystone before the upgrade. I added 2 new lines at the end of rhos-release-8.yaml (which is used in the first update step of the overcloud, as per the upgrade guide) to disable multi-domains and restart the httpd and openstack-keystone services:
[stack@instack ~]$ cat rhos-release-8.yaml
parameter_defaults:
  UpgradeInitCommand: |
    set -e
    rpm -ivh http://rhos-release.virt.bos.redhat.com/repos/rhos-release/rhos-release-latest.noarch.rpm || true # rpm -i will return 1 if already installed
    rhos-release 8-director
    crudini --set /etc/keystone/keystone.conf identity domain_specific_drivers_enabled false
    systemctl restart openstack-keystone httpd
Note that, in a production overcloud where LDAP is used with the keystone multi-domains support, this workaround will effectively take the cloud offline for the duration of the upgrade.
Note that the same bug happens not only in upgrades, but also when adding nodes to expand the cloud. Enabling multi-domains is done manually, after deployment, by logging in to each controller and running:
sudo crudini --set /etc/keystone/keystone.conf identity domain_specific_drivers_enabled true
and then restarting the httpd and openstack-keystone services.
The same bug also happens when simply redeploying the overcloud (for example, to change some parameters).
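Since the option is toggled by hand on each controller, it can help to confirm its current state before starting an upgrade or scale-out. The following is a minimal sketch, not part of any shipped tooling: the function name keystone_multidomain_state is hypothetical, and it assumes the option lives in the [identity] section of a plain INI file, as in the crudini commands quoted in this report.

```shell
#!/bin/sh
# Hypothetical helper: print the value of domain_specific_drivers_enabled
# from the [identity] section of a keystone.conf-style INI file.
# Prints "unset" when the option is absent or only present as a comment.
keystone_multidomain_state() {
    conf="${1:-/etc/keystone/keystone.conf}"
    awk -F '=' '
        # Track which section we are in; only [identity] is of interest.
        /^[ \t]*\[/ {
            in_identity = ($0 ~ /^[ \t]*\[identity\][ \t]*$/)
            next
        }
        in_identity && $1 ~ /^[ \t]*domain_specific_drivers_enabled[ \t]*$/ {
            value = $2
            gsub(/^[ \t]+|[ \t]+$/, "", value)   # trim surrounding blanks
            found = 1
        }
        END { if (found) print tolower(value); else print "unset" }
    ' "$conf"
}
```

Running keystone_multidomain_state with no argument reads /etc/keystone/keystone.conf; passing a path makes it usable against a copy of the file.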
If the problem is simply the post-install activation of the
domain_specific_drivers_enabled config option, that can be done at deploy time:
for example, add a -e deploy_env.yml file with
parameter_defaults:
  controllerExtraConfig:
    keystone::using_domain_config: true
    keystone::config::keystone_config:
      identity/domain_specific_drivers_enabled:
        value: true
That will allow you to store the LDAP-specific information in the
database as opposed to having it in files under /etc/keystone/domains.
However, the bug report shows that what is failing is:
'/usr/bin/openstack user create --format shell admin --enable --password
a2zhD64nW7W2avNzHcKHNYbNu --email root@localhost --domain Default'
I am guessing that Puppet is attempting to re-execute this user create
because the user list failed to show the user exists.
This call is creating the admin user, and we still need to confirm that user exists.
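To illustrate why an unscoped user list can miss users once multi-domain is enabled, here is a rough sketch of a per-domain listing. It is not the puppet provider's code; the function name list_users_all_domains is hypothetical, and it assumes the openstack client is already authenticated with credentials that are allowed to list domains and users.

```shell
#!/bin/sh
# Hypothetical helper: list users domain by domain instead of relying on
# an unscoped "openstack user list". Output is one "domain/user" per line.
list_users_all_domains() {
    openstack domain list -f value -c Name | while IFS= read -r domain; do
        [ -n "$domain" ] || continue
        # </dev/null keeps the inner command from consuming the domain list.
        openstack user list --domain "$domain" -f value -c Name </dev/null |
            while IFS= read -r user; do
                if [ -n "$user" ]; then
                    printf '%s/%s\n' "$domain" "$user"
                fi
            done
    done
}
```

If the admin user appears in this output but a resource-level check still reports it as absent, the lookup rather than the data is the likely culprit.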
/Stage[main]/Keystone::Roles::Admin/Keystone_user[admin] seems to be
the puppet resource that creates the admin user,
and the lookup for it is done here:
request('user', 'show', [name, '--domain', domain])
http://git.openstack.org/cgit/openstack/puppet-keystone/tree/lib/puppet/provider/keystone.rb?h=stable%2Fliberty#n167
in the Liberty code base (OSP 8):
http://git.openstack.org/cgit/openstack/puppet-keystone/commit/?h=stable/liberty&id=d2c44f73
The last build of puppet keystone 8.0 series was
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=499341
Unpacking the SRPM, I see:
request('user', 'show', [name, '--domain', domain])
So, this package may solve the problem. How can we confirm? What version of the RPM is deployed?
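The domain-scoped lookup quoted above amounts to "show the user in its domain, and create it only if that fails". A shell approximation of that flow, offered only as a sketch, is below. The function name ensure_user is hypothetical. Note that this sketch treats any non-zero exit from user show as "absent", including an authentication failure, which is the same confusion that leads to the HTTP 409 seen in this report.

```shell
#!/bin/sh
# Hypothetical helper: create a user only when a domain-scoped lookup
# does not find it. Prints "present" or "created".
# Caveat: any failure of "user show" (not only "not found") is treated
# as absence here, so an auth error would still trigger a create attempt.
ensure_user() {
    name="$1"
    domain="$2"
    password="$3"
    if openstack user show "$name" --domain "$domain" >/dev/null 2>&1; then
        echo "present"
    else
        openstack user create "$name" --domain "$domain" \
            --password "$password" --enable >/dev/null &&
            echo "created"
    fi
}
```

A more careful version would inspect the error text before deciding to create, so that "could not authenticate" is reported rather than acted on.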
Added the two related upstream Launchpad bugs, plus the related upstream fix and backports:
stable/mitaka MERGED 306075
stable/liberty MERGED 308365
master MERGED 299301
I'm still getting the same error. It tries to create the 'admin' user and fails because this user already exists (probably it failed to list the users). This is the relevant part from heat deployment-show:
Error: Could not prefetch keystone_user provider 'openstack': Could not authenticate.
Error: Execution of '/usr/bin/openstack user create --format shell admin --enable --password QYWqcfeCgeucb69X3F8Xq4BW6 --email admin --domain Default' returned 1: Conflict occurred attempting to store user - Duplicate Entry (HTTP 409) (Request-ID: req-46c1f6a2-924d-436d-9926-7a25916449a5)
Error: /Stage[main]/Keystone::Roles::Admin/Keystone_user[admin]/ensure: change from absent to present failed: Execution of '/usr/bin/openstack user create --format shell admin --enable --password QYWqcfeCgeucb69X3F8Xq4BW6 --email admin --domain Default' returned 1: Conflict occurred attempting to store user - Duplicate Entry (HTTP 409)
This is on the latest 8.0 puddle from 2016-08-19. I enabled multi-domains (there is no need to really configure LDAP, just enable the feature) and tried to expand the cloud from 1 compute to 2.
Verified. Please ignore the previous comment; it happened because there were no repos configured on the nodes (this won't happen to users with registrations). |
Created attachment 1144197 [details]
stderr from deployment-show
Description of problem:
I had a 7.3 overcloud working with a keystone domain that was configured to use LDAP. On the last deploy step in the upgrade procedure (the converge step) the deployment fails:
2016-04-05 14:44:04 [NetworkDeployment]: SIGNAL_COMPLETE Unknown
2016-04-05 14:46:20 [0]: SIGNAL_IN_PROGRESS Signal: deployment failed (6)
2016-04-05 14:46:20 [0]: CREATE_FAILED Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
2016-04-05 14:46:21 [overcloud-ControllerNodesPostDeployment-rkzxhz73echi-ControllerOvercloudServicesDeployment_Step6-vtulcl75sssj]: UPDATE_FAILED Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
2016-04-05 14:46:22 [ControllerOvercloudServicesDeployment_Step6]: CREATE_FAILED resources.ControllerOvercloudServicesDeployment_Step6: Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
2016-04-05 14:46:22 [0]: SIGNAL_COMPLETE Unknown
2016-04-05 14:46:23 [0]: SIGNAL_COMPLETE Unknown
2016-04-05 14:46:24 [overcloud-ControllerNodesPostDeployment-rkzxhz73echi]: CREATE_FAILED Resource CREATE failed: resources.ControllerOvercloudServicesDeployment_Step6: Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
2016-04-05 14:46:24 [0]: SIGNAL_COMPLETE Unknown
2016-04-05 14:46:25 [0]: SIGNAL_COMPLETE Unknown
2016-04-05 14:46:25 [ControllerDeployment]: SIGNAL_COMPLETE Unknown
2016-04-05 14:46:26 [NetworkDeployment]: SIGNAL_COMPLETE Unknown
2016-04-05 14:46:26 [0]: SIGNAL_COMPLETE Unknown
2016-04-05 14:46:26 [NetworkDeployment]: SIGNAL_COMPLETE Unknown
Stack overcloud UPDATE_FAILED
Heat Stack update failed.
In the output from deployment-show I see this:
Error: Execution of '/usr/bin/openstack user create --format shell admin --enable --password a2zhD64nW7W2avNzHcKHNYbNu --email root@localhost --domain Default' returned 1: Conflict occurred attempting to store user - Duplicate Entry (HTTP 409) (Request-ID: req-ae947dd9-ffc4-43f0-8847-770fed8095be)
Error: /Stage[main]/Keystone::Roles::Admin/Keystone_user[admin]/ensure: change from absent to present failed: Execution of '/usr/bin/openstack user create --format shell admin --enable --password a2zhD64nW7W2avNzHcKHNYbNu --email root@localhost --domain Default' returned 1: Conflict occurred attempting to store user - Duplicate Entry (HTTP 409) (Request-ID: req-ae947dd9-ffc4-43f0-8847-770fed8095be)
Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-0.8.14-5.el7ost.noarch
python-tripleoclient-0.3.4-2.el7ost.noarch
How reproducible:
100%
Steps to Reproduce:
1. Deploy a 7.3 overcloud with ospd
2. Create a keystone domain which is working against Active Directory
3. Upgrade to 8.0 according to the guide
Additional info:
The complete stderr from deployment-show is attached.
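For triage, the two messages quoted in this report can be searched for in a saved copy of the deployment-show stderr. This is only a sketch: the function name classify_deploy_error and its three labels are hypothetical, and it matches nothing beyond the literal strings that appear in this bug.

```shell
#!/bin/sh
# Hypothetical helper: classify a saved deployment-show stderr file by
# the error strings quoted in this report.
classify_deploy_error() {
    file="$1"
    if grep -qF "Could not prefetch keystone_user provider" "$file"; then
        # The provider could not list existing users at all.
        echo "prefetch-failed"
    elif grep -qF "Duplicate Entry (HTTP 409)" "$file"; then
        # A create was attempted for a user that already exists.
        echo "duplicate-user"
    else
        echo "unknown"
    fi
}
```

The prefetch check comes first because, per the comments above, the duplicate-entry error appears to be a consequence of the failed listing rather than an independent fault.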