Bug 1324471

Summary: When multi-domains are enabled, expanding/upgrading the cloud fails
Product: Red Hat OpenStack
Reporter: Udi Kalifon <ukalifon>
Component: openstack-puppet-modules
Assignee: Jason Guiditta <jguiditt>
Status: CLOSED CURRENTRELEASE
QA Contact: Udi Kalifon <ukalifon>
Severity: urgent
Priority: high
Version: 8.0 (Liberty)
CC: asimonel, ayoung, dgurtner, dshevrin, felipe.alfaro, ggillies, jcall, jguiditt, jmelvin, jschluet, kbasil, markmc, mburns, mcornea, nkinder, nlevinki, ohochman, pharriso, racedoro, rhel-osp-director-maint, sathlang, sputhenp, srevivo
Target Milestone: async
Keywords: ZStream
Target Release: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-puppet-modules-7.1.3-1.el7ost
Doc Type: Bug Fix
Last Closed: 2016-09-01 16:03:13 UTC
Type: Bug
Attachments:
Description Flags
stderr from deployment-show none

Description Udi Kalifon 2016-04-06 12:09:56 UTC
Created attachment 1144197 [details]
stderr from deployment-show

Description of problem:
I had a 7.3 overcloud working with a keystone domain that was configured to use LDAP. The last deploy step of the upgrade procedure (the converge step) fails:

2016-04-05 14:44:04 [NetworkDeployment]: SIGNAL_COMPLETE  Unknown
2016-04-05 14:46:20 [0]: SIGNAL_IN_PROGRESS  Signal: deployment failed (6)
2016-04-05 14:46:20 [0]: CREATE_FAILED  Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
2016-04-05 14:46:21 [overcloud-ControllerNodesPostDeployment-rkzxhz73echi-ControllerOvercloudServicesDeployment_Step6-vtulcl75sssj]: UPDATE_FAILED  Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
2016-04-05 14:46:22 [ControllerOvercloudServicesDeployment_Step6]: CREATE_FAILED  resources.ControllerOvercloudServicesDeployment_Step6: Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
2016-04-05 14:46:22 [0]: SIGNAL_COMPLETE  Unknown
2016-04-05 14:46:23 [0]: SIGNAL_COMPLETE  Unknown
2016-04-05 14:46:24 [overcloud-ControllerNodesPostDeployment-rkzxhz73echi]: CREATE_FAILED  Resource CREATE failed: resources.ControllerOvercloudServicesDeployment_Step6: Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
2016-04-05 14:46:24 [0]: SIGNAL_COMPLETE  Unknown
2016-04-05 14:46:25 [0]: SIGNAL_COMPLETE  Unknown
2016-04-05 14:46:25 [ControllerDeployment]: SIGNAL_COMPLETE  Unknown
2016-04-05 14:46:26 [NetworkDeployment]: SIGNAL_COMPLETE  Unknown
2016-04-05 14:46:26 [0]: SIGNAL_COMPLETE  Unknown
2016-04-05 14:46:26 [NetworkDeployment]: SIGNAL_COMPLETE  Unknown
Stack overcloud UPDATE_FAILED
Heat Stack update failed.


In the output from deployment-show I see this:

Error: Execution of '/usr/bin/openstack user create --format shell admin --enable --password a2zhD64nW7W2avNzHcKHNYbNu --email root@localhost --domain Default' returned 1: Conflict occurred attempting to store user - Duplicate Entry (HTTP 409) (Request-ID: req-ae947dd9-ffc4-43f0-8847-770fed8095be)
Error: /Stage[main]/Keystone::Roles::Admin/Keystone_user[admin]/ensure: change from absent to present failed: Execution of '/usr/bin/openstack user create --format shell admin --enable --password a2zhD64nW7W2avNzHcKHNYbNu --email root@localhost --domain Default' returned 1: Conflict occurred attempting to store user - Duplicate Entry (HTTP 409) (Request-ID: req-ae947dd9-ffc4-43f0-8847-770fed8095be)


Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-0.8.14-5.el7ost.noarch
python-tripleoclient-0.3.4-2.el7ost.noarch


How reproducible:
100%


Steps to Reproduce:
1. Deploy a 7.3 overcloud with ospd
2. Create a keystone domain which is working against Active Directory
3. Upgrade to 8.0 according to the guide


Additional info:
The complete stderr from deployment-show is attached.

Comment 2 Sofer Athlan-Guyot 2016-04-06 14:01:59 UTC
As discussed with Udi, the multi-domain option was activated manually after the 7.3 deployment. This breaks the prefetch step of the puppet keystone_user provider, because without multi-domain one can run:

   openstack user list

and get all the users.

With multi-domain this is no longer possible; you have to specify the domain:

   openstack user list --domain default

There is a patch coming on the puppet side: https://review.openstack.org/#/c/299301/

In the meantime, to get a successful update, and since the LDAP database is read-only, one can deactivate the multi-domain option during the update, so that the MySQL database is updated with the correct user (i.e. the prefetch will work), and then re-activate it manually after the update.
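
A minimal sketch of the per-domain listing described above, assuming the standard "openstack domain list" and "openstack user list --domain" commands and admin credentials already loaded in the environment. The function name list_users_all_domains is hypothetical, and this is not how the puppet provider implements its prefetch; it only illustrates that with multi-domain enabled each domain has to be queried in turn.

```shell
# Hypothetical helper: print "<domain><TAB><user>" for every user in every domain.
# Assumes admin credentials (OS_* variables) are already exported.
list_users_all_domains() {
    openstack domain list -f value -c Name | while IFS= read -r domain; do
        # </dev/null keeps the inner command from consuming the domain list on stdin
        openstack user list --domain "$domain" -f value -c Name </dev/null |
            while IFS= read -r user; do
                printf '%s\t%s\n' "$domain" "$user"
            done
    done
}
```

Calling list_users_all_domains on a node with the admin credentials sourced should show whether the admin user is present in the Default domain even when the unscoped listing no longer returns it.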

Comment 4 Mike Burns 2016-04-07 21:36:02 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 5 Udi Kalifon 2016-04-10 10:12:36 UTC
I was able to work around the upgrade failure by disabling multi-domains in keystone before the upgrade. I added two lines at the end of rhos-release-8.yaml (which is used in the first update step of the overcloud, as per the upgrade guide) to disable multi-domains and restart the httpd and openstack-keystone services:

[stack@instack ~]$ cat rhos-release-8.yaml 
parameter_defaults:
  UpgradeInitCommand: |
    set -e
    rpm -ivh http://rhos-release.virt.bos.redhat.com/repos/rhos-release/rhos-release-latest.noarch.rpm || true  # rpm -i will return 1 if already installed
    rhos-release 8-director 
    crudini --set /etc/keystone/keystone.conf identity domain_specific_drivers_enabled false
    systemctl restart openstack-keystone httpd

Note that, in a production overcloud where ldap is used with the keystone multi-domains support, this workaround will effectively take the cloud offline for the duration of the upgrade.

Comment 6 Udi Kalifon 2016-04-19 14:39:53 UTC
Note that the same bug happens not only in upgrades, but also when adding nodes to expand the cloud.

Enabling multi-domains is done manually, after deployment, by logging in to each controller and running:

sudo crudini --set /etc/keystone/keystone.conf identity domain_specific_drivers_enabled true

... and restarting the httpd and openstack-keystone services.
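
The manual toggle can be wrapped in a small function so that the same steps are used for disabling the option before an upgrade or scale-out and re-enabling it afterwards. This is only a sketch: the name set_multi_domain and the KEYSTONE_CONF variable are hypothetical, and it assumes it is run as root on each controller with crudini installed.

```shell
# Path to keystone.conf; overridable for testing (hypothetical variable name).
KEYSTONE_CONF="${KEYSTONE_CONF:-/etc/keystone/keystone.conf}"

# Hypothetical helper: set domain_specific_drivers_enabled and restart keystone.
# Usage: set_multi_domain true|false   (run as root on each controller)
set_multi_domain() {
    case "$1" in
        true|false) ;;
        *) echo "usage: set_multi_domain true|false" >&2; return 2 ;;
    esac
    # Only restart the services if the config change succeeded.
    crudini --set "$KEYSTONE_CONF" identity domain_specific_drivers_enabled "$1" &&
        systemctl restart openstack-keystone httpd
}
```

For example, "set_multi_domain false" before the update and "set_multi_domain true" once the stack update has completed. As noted earlier in this report, LDAP-backed domains are unavailable while the option is disabled.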

Comment 7 David Gurtner 2016-06-10 10:07:50 UTC
The same bug also happens when simply redeploying the overcloud (for example, to change some parameters).

Comment 8 Adam Young 2016-08-12 19:31:14 UTC
If the problem is simply the post-install activation of the
domain_specific_drivers_enabled config option, that can be done at deploy time.
For example, add a -e deploy_env.yml file with:


parameter_defaults:
  controllerExtraConfig:
    keystone::using_domain_config: true
    keystone::config::keystone_config:
      identity/domain_specific_drivers_enabled:
        value: true


That will allow you to store the LDAP specific information in the
database as opposed to having it in files under /etc/keystone/domains


However, the bug report shows that what is failing is:

/usr/bin/openstack user create --format shell admin --enable --password a2zhD64nW7W2avNzHcKHNYbNu --email root@localhost --domain Default

I am guessing that Puppet is attempting to re-execute this user create
because the user list failed to show the user exists.


This call is creating the admin user, and we still need to confirm that user exists.


/Stage[main]/Keystone::Roles::Admin/Keystone_user[admin]  seems to be
the puppet module code for creating the admin user,
and that is done here:

 request('user', 'show', [name, '--domain', domain])


http://git.openstack.org/cgit/openstack/puppet-keystone/tree/lib/puppet/provider/keystone.rb?h=stable%2Fliberty#n167

in the Liberty code base (OSP 8).

http://git.openstack.org/cgit/openstack/puppet-keystone/commit/?h=stable/liberty&id=d2c44f73
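
To illustrate why a domain-scoped lookup avoids the duplicate-entry failure, here is a shell sketch of the same show-before-create idea. It is not the puppet-keystone code; the function name ensure_user is hypothetical, and the password is passed as a plain argument only for the sake of the example.

```shell
# Hypothetical helper: create a user only if a domain-scoped lookup does not find it.
# Usage: ensure_user <name> <domain> <password>
ensure_user() {
    name="$1"; domain="$2"; password="$3"
    # "user show --domain" works with multi-domain enabled, unlike an unscoped listing.
    if openstack user show "$name" --domain "$domain" >/dev/null 2>&1; then
        echo "user $name already exists in domain $domain"
    else
        openstack user create "$name" --enable --password "$password" --domain "$domain"
    fi
}
```

With a lookup like this, an existing admin user in the Default domain is detected and no second create is attempted, so the HTTP 409 conflict does not occur.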

The last build of puppet keystone 8.0 series was

https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=499341

Unpacking the SRPM, I see:

   request('user', 'show', [name, '--domain', domain])

So, this package may solve the problem.  How can we confirm?  What version of the RPM is deployed?
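
One way to answer the version question on a node is to compare the installed package against the Fixed In Version from this bug (openstack-puppet-modules-7.1.3-1.el7ost). The sketch below assumes GNU sort with -V support; its ordering only approximates RPM's own version comparison, and the function names version_ge and check_opm_fix are hypothetical.

```shell
# Version-release taken from the "Fixed In Version" field of this bug.
FIXED_IN="7.1.3-1.el7ost"

# Succeeds when version $1 sorts at or after version $2 (GNU sort -V ordering).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical check: report whether the installed openstack-puppet-modules has the fix.
check_opm_fix() {
    installed="$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' openstack-puppet-modules)" || return 2
    if version_ge "$installed" "$FIXED_IN"; then
        echo "openstack-puppet-modules $installed includes the fix"
    else
        echo "openstack-puppet-modules $installed predates $FIXED_IN" >&2
        return 1
    fi
}
```

Running check_opm_fix on each controller gives a quick indication of whether the fixed provider code is actually deployed there.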

Comment 10 Sofer Athlan-Guyot 2016-08-16 09:55:25 UTC
Added the two launchpad upstream related bugs and the related upstream fix and backport:

stable/mitaka        MERGED     306075
stable/liberty       MERGED     308365
master               MERGED     299301

Comment 15 Udi Kalifon 2016-09-01 14:12:41 UTC
I'm still getting the same error. It tries to create the 'admin' user and fails because this user exists already (probably it failed to list the users). This is the relevant part from heat deployment-show:

Error: Could not prefetch keystone_user provider 'openstack': Could not authenticate.
Error: Execution of '/usr/bin/openstack user create --format shell admin --enable --password QYWqcfeCgeucb69X3F8Xq4BW6 --email admin --domain Default' returned 1: Conflict occurred attempting to store user - Duplicate Entry (HTTP 409) (Request-ID: req-46c1f6a2-924d-436d-9926-7a25916449a5)
Error: /Stage[main]/Keystone::Roles::Admin/Keystone_user[admin]/ensure: change from absent to present failed: Execution of '/usr/bin/openstack user create --format shell admin --enable --password QYWqcfeCgeucb69X3F8Xq4BW6 --email admin --domain Default' returned 1: Conflict occurred attempting to store user - Duplicate Entry (HTTP 409)

This is on the latest 8.0 puddle from 2016-08-19. I enabled multi-domains (there is no need to really configure LDAP, just enable the feature) and tried to expand the cloud from 1 compute to 2.

Comment 16 Udi Kalifon 2016-09-01 15:33:00 UTC
Verified.

Please ignore the previous comment. The error happened because there were no repos configured on the nodes (this won't happen to users with registered systems).