Bug 1626140

Summary: Custom roles deployment with controller pcmk + controller systemd roles fails because Keystone projects/users are not created
Product: Red Hat OpenStack
Reporter: Marius Cornea <mcornea>
Component: openstack-tripleo-heat-templates
Assignee: Steven Hardy <shardy>
Status: CLOSED ERRATA
QA Contact: Sasha Smolyak <ssmolyak>
Severity: urgent
Priority: medium
Version: 13.0 (Queens)
CC: agurenko, aschultz, asimonel, dbecker, dpeacock, emacchi, mburns, mcornea, michele, morazi, sbaker, shardy, tonyb, yprokule
Target Milestone: z3
Keywords: Triaged, ZStream
Target Release: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-tripleo-heat-templates-9.2.1-0.20190119154856.fe11ade
Last Closed: 2019-07-02 20:08:26 UTC
Type: Bug
Attachments:
Description                    Flags
roles and environment files    none
roles_data.yaml                none

Description Marius Cornea 2018-09-06 16:11:13 UTC
Description of problem:

A custom roles deployment that splits the controller into a Pacemaker (pcmk) controller role and a systemd controller role fails because the Keystone projects and users are not created.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-8.0.4-20.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
openstack overcloud deploy \
--timeout 100 \
--templates /usr/share/openstack-tripleo-heat-templates \
--stack overcloud \
--libvirt-type kvm \
--ntp-server 10.35.255.6 \
-r /home/stack/roles_data.yaml \
-e /home/stack/virt/config_lvm.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/enable-tls.yaml \
-e /home/stack/virt/inject-trust-anchor.yaml \
-e /home/stack/virt/public_vip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
-e /home/stack/virt/nodes_data.yaml \
-e /home/stack/virt/docker-images.yaml

Actual results:
Deployment fails at step 5, when the ceilometer_gnocchi_upgrade container exits with an error:

            "Error running ['docker', 'run', '--name', 'ceilometer_gnocchi_upgrade', '--label', 'config_id=tripleo_step5', '--label', 'container_name=ceilometer_gnocchi_upgrade', '--label', 'managed_by=paunch', '--label', 'config_data={\"start_order\": 99, \"healthcheck\": {\"test\": \"/openstack/healthcheck\"}, \"image\": \"192.168.24.1:8787/rhosp13/openstack-ceilometer-central:2018-08-30.1\", \"command\": [\"/usr/bin/bootstrap_host_exec\", \"ceilometer_agent_central\", \"su ceilometer -s /bin/bash -c \\'for n in {1..10}; do /usr/bin/ceilometer-upgrade --skip-metering-database && exit 0 || sleep 30; done; exit 1\\'\"], \"user\": \"root\", \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\", \"/etc/puppet:/etc/puppet:ro\", \"/var/lib/config-data/ceilometer/etc/ceilometer/:/etc/ceilometer/:ro\", \"/var/log/containers/ceilometer:/var/log/ceilometer\"], \"net\": \"host\", \"detach\": false, \"privileged\": false}', '--net=host', '--health-cmd=/openstack/healthcheck', '--privileged=false', '--user=root', '--volume=/etc/hosts:/etc/hosts:ro', '--volume=/etc/localtime:/etc/localtime:ro', '--volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', '--volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro', '--volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', '--volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', '--volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', '--volume=/dev/log:/dev/log', 
'--volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro', '--volume=/etc/puppet:/etc/puppet:ro', '--volume=/var/lib/config-data/ceilometer/etc/ceilometer/:/etc/ceilometer/:ro', '--volume=/var/log/containers/ceilometer:/var/log/ceilometer', '192.168.24.1:8787/rhosp13/openstack-ceilometer-central:2018-08-30.1', '/usr/bin/bootstrap_host_exec', 'ceilometer_agent_central', \"su ceilometer -s /bin/bash -c 'for n in {1..10}; do /usr/bin/ceilometer-upgrade --skip-metering-database && exit 0 || sleep 30; done; exit 1'\"]. [1]", 
            "stdout: "
        ]
/var/log/containers/ceilometer/ceilometer-upgrade.log shows:

2018-09-06 03:42:58.316 210 ERROR ceilometer Unauthorized: The request you have made requires authentication. (HTTP 401) (Request-ID: req-6dcf6459-0e7f-4788-a7f6-5f03a12df827)


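Checking the Keystone database shows that only the bootstrap admin project and a single user exist; no other projects or users were created: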
()[root@overcloud-controller-0 /]# mysql -e 'select * from keystone.project'
+----------------------------------+--------------------------+-------+-----------------------------------------------+---------+--------------------------+-----------+-----------+
| id                               | name                     | extra | description                                   | enabled | domain_id                | parent_id | is_domain |
+----------------------------------+--------------------------+-------+-----------------------------------------------+---------+--------------------------+-----------+-----------+
| 09322b6a09844fe580f44c921626c7c0 | admin                    | {}    | Bootstrap project for initializing the cloud. |       1 | default                  | default   |         0 |
| <<keystone.domain.root>>         | <<keystone.domain.root>> | {}    |                                               |       0 | <<keystone.domain.root>> | NULL      |         1 |
| default                          | Default                  | {}    | The default domain                            |       1 | <<keystone.domain.root>> | NULL      |         1 |
+----------------------------------+--------------------------+-------+-----------------------------------------------+---------+--------------------------+-----------+-----------+
()[root@overcloud-controller-0 /]# mysql -e 'select * from keystone.user'
+----------------------------------+-------+---------+--------------------+---------------------+----------------+-----------+
| id                               | extra | enabled | default_project_id | created_at          | last_active_at | domain_id |
+----------------------------------+-------+---------+--------------------+---------------------+----------------+-----------+
| 4da527c70d9d4edbbbae983bb0f8e9a8 | {}    |       1 | NULL               | 2018-09-06 03:25:23 | NULL           | default   |
+----------------------------------+-------+---------+--------------------+---------------------+----------------+-----------+


Expected results:
Deployment succeeds.

Additional info:

Attaching the environment files and sosreport.

Comment 1 Marius Cornea 2018-09-06 16:14:25 UTC
Created attachment 1481356 [details]
roles and environment files

Comment 3 Marius Cornea 2018-09-06 18:42:17 UTC
Created attachment 1481397 [details]
roles_data.yaml

Comment 9 Steven Hardy 2018-12-13 16:07:18 UTC
OK, an update on this: there are two related bootstrapping problems we need to solve:

1. Bootstrapping performed via Puppet uses a bootstrap ID that's only unique per role, e.g. if you deploy a service on more than one role, its bootstrapping tasks run more than once.

This is resolved by the series I posted here:

https://review.openstack.org/#/q/status:merged+topic:bug/1792613

2. Bootstrapping that happens via docker_puppet_tasks only ever runs on the primary controller, due to the Ansible conditionals that check against bootstrap_nodeid.

To solve this, we instead need to calculate the bootstrap node per *service*, either in the deploy_steps_tasks Ansible or inside docker-puppet.py.

The second part is still TODO, but if backported it should resolve the specific case in this bug. (I'm not sure whether we will also need (1) to resolve the configuration Marius tested, but we'll definitely need (2).)
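To make the two problems concrete, here is a minimal bash sketch (bash 4+, for associative arrays). It is not the logic of the actual patches; the role layout, node names and helper functions are hypothetical. It assumes Pacemaker-managed services (mysql, rabbitmq) on a ControllerPcmk role, keystone and ceilometer_agent_central on a ControllerSystemd role, haproxy on both, a primary controller that is the first node of ControllerPcmk, and a per-service bootstrap node chosen as the first node, in name order, across all roles running the service. The "global" column models problem (2), "per-role" models problem (1), and "per-service" models the proposed fix.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (bash 4+): compare bootstrap-node selection strategies.
# Role, node and service names are illustrative assumptions, not the
# contents of the attached roles_data.yaml or the actual TripleO logic.
set -eu

# Nodes deployed for each role (assumed layout).
declare -A role_nodes=(
  [ControllerPcmk]="controller-pcmk-0 controller-pcmk-1"
  [ControllerSystemd]="controller-systemd-0 controller-systemd-1"
)
# Services enabled on each role (assumed); haproxy spans both roles.
declare -A role_services=(
  [ControllerPcmk]="mysql rabbitmq haproxy"
  [ControllerSystemd]="keystone ceilometer_agent_central haproxy"
)
primary_role=ControllerPcmk

# Global bootstrap node (problem 2): first node of the primary role.
read -r global_bootstrap _ <<< "${role_nodes[$primary_role]}"

# Print, one per line, every role that enables the given service.
roles_running() {
  local service=$1 role
  for role in "${!role_services[@]}"; do
    if [[ " ${role_services[$role]} " == *" $service "* ]]; then
      echo "$role"
    fi
  done
}

# Succeed if the node belongs to a role that enables the service.
node_runs_service() {
  local node=$1 service=$2 role
  for role in $(roles_running "$service"); do
    if [[ " ${role_nodes[$role]} " == *" $node "* ]]; then
      return 0
    fi
  done
  return 1
}

# Per-role bootstrap (problem 1): the first node of each role that enables
# the service, so a service spanning two roles is bootstrapped twice.
per_role_bootstrap_nodes() {
  local role first
  for role in $(roles_running "$1"); do
    read -r first _ <<< "${role_nodes[$role]}"
    echo "$first"
  done | sort
}

# Per-service bootstrap: the first node, in name order, across all roles
# that enable the service.
service_bootstrap_node() {
  local role
  for role in $(roles_running "$1"); do
    printf '%s\n' ${role_nodes[$role]}  # unquoted on purpose: one node per line
  done | sort | sed -n 1p
}

for svc in keystone mysql haproxy; do
  if node_runs_service "$global_bootstrap" "$svc"; then state=runs; else state=SKIPPED; fi
  echo "$svc: global=$global_bootstrap ($state)" \
       "per-role=[$(per_role_bootstrap_nodes "$svc" | xargs)]" \
       "per-service=$(service_bootstrap_node "$svc")"
done
```

Under these assumptions the keystone line reports the global bootstrap node controller-pcmk-0 as SKIPPED while the per-service choice is controller-systemd-0, and haproxy gets two per-role bootstrap nodes but a single per-service one. If Keystone's project and user creation is gated on the global bootstrap node, as comment 9 describes for docker_puppet_tasks, it would never run in this layout, which would match the missing projects and users reported above.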

Comment 10 Steven Hardy 2018-12-21 10:09:31 UTC
OK, I pushed https://review.openstack.org/#/c/625078/, which aims to resolve problem (2) from comment #9. I think that is the root cause of this bug, and (1) is a related issue that may not be a blocker for this particular configuration.

This still needs further testing/review, but if https://review.openstack.org/#/c/625078/ works as expected, we may be able to backport it together with the tripleo-common patch https://review.openstack.org/#/c/605046/, independently of the (fairly long) series of other patches that address (1), ref https://review.openstack.org/#/q/topic:bug/1792613+(status:open+OR+status:merged)

Marius, I'll needinfo you: perhaps you can help me with this testing when you have some time, and we can figure out whether the partial-backport approach is viable?

We could backport the entire series, but I'm a little wary because it involves a lot of changes, and the case where a service spans multiple roles is probably less common than the one reported in this bug?

Comment 11 Marius Cornea 2019-01-28 21:18:40 UTC
(In reply to Steven Hardy from comment #10)

Hey Steve! I tested the Rocky backport (https://review.openstack.org/#/c/632714/) and it allowed the deployment to complete successfully.

Comment 12 Lukas Bezdicka 2019-02-04 13:41:57 UTC
*** Bug 1668641 has been marked as a duplicate of this bug. ***

Comment 27 errata-xmlrpc 2019-07-02 20:08:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1672