Bug 1626140 - Custom roles deployment with controller pcmk + controller systemd roles deployment fails caused by Keystone project/users not being created
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: urgent
Target Milestone: z3
Target Release: 14.0 (Rocky)
Assignee: Steven Hardy
QA Contact: Sasha Smolyak
URL:
Whiteboard:
Duplicates: 1668641
Depends On:
Blocks:
Reported: 2018-09-06 16:11 UTC by Marius Cornea
Modified: 2019-07-02 20:08 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-9.2.1-0.20190119154856.fe11ade
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-07-02 20:08:26 UTC
Target Upstream Version:
Embargoed:


Attachments
roles and environment files (11.15 KB, application/x-gzip)
2018-09-06 16:14 UTC, Marius Cornea
roles_data.yaml (16.34 KB, text/plain)
2018-09-06 18:42 UTC, Marius Cornea


Links
System                  ID              Private  Priority  Status  Summary                              Last Updated
Launchpad               1792613         0        None      None    None                                 2018-09-14 17:31:45 UTC
OpenStack gerrit        632714          0        None      MERGED  run docker_puppet_tasks on any role  2020-03-30 11:38:05 UTC
Red Hat Product Errata  RHBA-2019:1672  0        None      None    None                                 2019-07-02 20:08:44 UTC

Description Marius Cornea 2018-09-06 16:11:13 UTC
Description of problem:

A custom roles deployment that combines a controller pcmk role with a controller systemd role fails because the Keystone projects/users are not created.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-8.0.4-20.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
openstack overcloud deploy \
--timeout 100 \
--templates /usr/share/openstack-tripleo-heat-templates \
--stack overcloud \
--libvirt-type kvm \
--ntp-server 10.35.255.6 \
-r /home/stack/roles_data.yaml \
-e /home/stack/virt/config_lvm.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/enable-tls.yaml \
-e /home/stack/virt/inject-trust-anchor.yaml \
-e /home/stack/virt/public_vip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
-e /home/stack/virt/nodes_data.yaml \
-e /home/stack/virt/docker-images.yaml

Actual results:
Deployment fails:

            "Error running ['docker', 'run', '--name', 'ceilometer_gnocchi_upgrade', '--label', 'config_id=tripleo_step5', '--label', 'container_name=ceilometer_gnocchi_upgrade', '--label', 'managed_by=paunch', '--label', 'config_data={\"start_order\": 99, \"healthcheck\": {\"test\": \"/openstack/healthcheck\"}, \"image\": \"192.168.24.1:8787/rhosp13/openstack-ceilometer-central:2018-08-30.1\", \"command\": [\"/usr/bin/bootstrap_host_exec\", \"ceilometer_agent_central\", \"su ceilometer -s /bin/bash -c \\'for n in {1..10}; do /usr/bin/ceilometer-upgrade --skip-metering-database && exit 0 || sleep 30; done; exit 1\\'\"], \"user\": \"root\", \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\", \"/etc/puppet:/etc/puppet:ro\", \"/var/lib/config-data/ceilometer/etc/ceilometer/:/etc/ceilometer/:ro\", \"/var/log/containers/ceilometer:/var/log/ceilometer\"], \"net\": \"host\", \"detach\": false, \"privileged\": false}', '--net=host', '--health-cmd=/openstack/healthcheck', '--privileged=false', '--user=root', '--volume=/etc/hosts:/etc/hosts:ro', '--volume=/etc/localtime:/etc/localtime:ro', '--volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', '--volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro', '--volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', '--volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', '--volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', '--volume=/dev/log:/dev/log', 
'--volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro', '--volume=/etc/puppet:/etc/puppet:ro', '--volume=/var/lib/config-data/ceilometer/etc/ceilometer/:/etc/ceilometer/:ro', '--volume=/var/log/containers/ceilometer:/var/log/ceilometer', '192.168.24.1:8787/rhosp13/openstack-ceilometer-central:2018-08-30.1', '/usr/bin/bootstrap_host_exec', 'ceilometer_agent_central', \"su ceilometer -s /bin/bash -c 'for n in {1..10}; do /usr/bin/ceilometer-upgrade --skip-metering-database && exit 0 || sleep 30; done; exit 1'\"]. [1]", 
            "stdout: "
        ]
/var/log/containers/ceilometer/ceilometer-upgrade.log shows:

2018-09-06 03:42:58.316 210 ERROR ceilometer Unauthorized: The request you have made requires authentication. (HTTP 401) (Request-ID: req-6dcf6459-0e7f-4788-a7f6-5f03a12df827)


Checking the Keystone db:
()[root@overcloud-controller-0 /]# mysql -e 'select * from keystone.project'
+----------------------------------+--------------------------+-------+-----------------------------------------------+---------+--------------------------+-----------+-----------+
| id                               | name                     | extra | description                                   | enabled | domain_id                | parent_id | is_domain |
+----------------------------------+--------------------------+-------+-----------------------------------------------+---------+--------------------------+-----------+-----------+
| 09322b6a09844fe580f44c921626c7c0 | admin                    | {}    | Bootstrap project for initializing the cloud. |       1 | default                  | default   |         0 |
| <<keystone.domain.root>>         | <<keystone.domain.root>> | {}    |                                               |       0 | <<keystone.domain.root>> | NULL      |         1 |
| default                          | Default                  | {}    | The default domain                            |       1 | <<keystone.domain.root>> | NULL      |         1 |
+----------------------------------+--------------------------+-------+-----------------------------------------------+---------+--------------------------+-----------+-----------+
()[root@overcloud-controller-0 /]# mysql -e 'select * from keystone.user'
+----------------------------------+-------+---------+--------------------+---------------------+----------------+-----------+
| id                               | extra | enabled | default_project_id | created_at          | last_active_at | domain_id |
+----------------------------------+-------+---------+--------------------+---------------------+----------------+-----------+
| 4da527c70d9d4edbbbae983bb0f8e9a8 | {}    |       1 | NULL               | 2018-09-06 03:25:23 | NULL           | default   |
+----------------------------------+-------+---------+--------------------+---------------------+----------------+-----------+
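
The check above can also be scripted. The following is a minimal sketch, not part of TripleO or Keystone: it parses the ASCII table that `mysql -e` prints and reports which expected projects are absent. The helper names are hypothetical, the sample is the table above trimmed to three columns, and the assumption that a `service` project should exist alongside `admin` after a successful deployment is mine, not something stated in this report.

```python
# Hypothetical helpers for illustration only.
# SAMPLE is the keystone.project output from this report, trimmed to three columns.
SAMPLE = """\
+----------------------------------+--------------------------+-----------+
| id                               | name                     | is_domain |
+----------------------------------+--------------------------+-----------+
| 09322b6a09844fe580f44c921626c7c0 | admin                    |         0 |
| <<keystone.domain.root>>         | <<keystone.domain.root>> |         1 |
| default                          | Default                  |         1 |
+----------------------------------+--------------------------+-----------+
"""


def parse_mysql_table(text):
    """Turn the ASCII table printed by `mysql -e` into a list of dicts."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Data and header rows are delimited by '|'; border rows start with '+'.
        if line.startswith("|") and line.endswith("|"):
            rows.append([cell.strip() for cell in line[1:-1].split("|")])
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def missing_projects(text, expected):
    """Return the expected project names that are not present as real projects."""
    present = {
        row["name"]
        for row in parse_mysql_table(text)
        if row["is_domain"] == "0"  # skip domain entries
    }
    return set(expected) - present


# Assumption: a healthy overcloud would have at least these two projects.
print(sorted(missing_projects(SAMPLE, {"admin", "service"})))
```

With the table from this report, only the bootstrap `admin` project is found, which matches the HTTP 401 seen by ceilometer-upgrade.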


Expected results:
Deployment succeeds.

Additional info:

Attaching the environment files and sosreport.

Comment 1 Marius Cornea 2018-09-06 16:14:25 UTC
Created attachment 1481356
roles and environment files

Comment 3 Marius Cornea 2018-09-06 18:42:17 UTC
Created attachment 1481397
roles_data.yaml

Comment 9 Steven Hardy 2018-12-13 16:07:18 UTC
Ok so update on this, there are two related bootstrapping problems we need to solve:

1. Bootstrapping performed via puppet uses a bootstrap ID that's only unique per role, e.g. if you deploy a service on more than one role, it will run the bootstrapping tasks more than once.

This is resolved by the series I posted here:

https://review.openstack.org/#/q/status:merged+topic:bug/1792613

2. Bootstrapping that happens via docker_puppet_tasks only ever happens on the primary controller, due to the Ansible conditionals that check against bootstrap_nodeid.

To solve this we instead need to calculate the bootstrap node per *service*, either in the deploy_steps_tasks Ansible or inside docker-puppet.py.

The second part is still TODO, but if backported should resolve the specific case in the bug (I'm not sure if we will also need (1) to resolve the configuration Marius tested, but we'll definitely need (2))
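
To illustrate problem (2), here is a minimal sketch of the difference between running bootstrap tasks only on the primary controller and choosing a bootstrap node per service. It is not the tripleo-heat-templates or docker-puppet.py code: the role names, node names, service placement and both functions are hypothetical, and picking the first node in each list is an assumption made for the example.

```python
# Hypothetical deployment layout, loosely modelled on the pcmk + systemd split.
ROLE_NODES = {
    "ControllerPcmk": ["pcmk-0", "pcmk-1"],
    "ControllerSystemd": ["systemd-0", "systemd-1"],
}
ROLE_SERVICES = {
    "ControllerPcmk": ["mysql", "haproxy"],
    "ControllerSystemd": ["keystone", "gnocchi"],
}


def services_bootstrapped_on_primary_only(primary_role, role_services):
    """Behaviour described in (2): only services hosted by the primary
    controller's role ever get their bootstrap tasks run."""
    return set(role_services.get(primary_role, []))


def bootstrap_node_per_service(role_nodes, role_services):
    """Proposed direction: pick one bootstrap node per service, taken from
    all nodes of every role that runs that service."""
    service_nodes = {}
    for role, services in role_services.items():
        for service in services:
            service_nodes.setdefault(service, []).extend(role_nodes.get(role, []))
    # Assumption: the first node in the combined list acts as bootstrap node.
    return {svc: nodes[0] for svc, nodes in service_nodes.items() if nodes}


old = services_bootstrapped_on_primary_only("ControllerPcmk", ROLE_SERVICES)
print("keystone bootstrapped (primary only):", "keystone" in old)
print("per-service bootstrap nodes:", bootstrap_node_per_service(ROLE_NODES, ROLE_SERVICES))
```

In this layout the primary-only approach never runs the Keystone initialization, while the per-service mapping assigns it to a node of the role that actually hosts Keystone.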

Comment 10 Steven Hardy 2018-12-21 10:09:31 UTC
Ok I pushed https://review.openstack.org/#/c/625078/ which aims to resolve the (2) problem from comment #9 - I think that is the root-cause of this bug and (1) is a related issue which may not be a blocker for this particular configuration.

Currently need further testing/review of https://review.openstack.org/#/c/625078/ but if that works as expected we may be able to backport that and the tripleo-common patch https://review.openstack.org/#/c/605046/ independently of the (fairly long) series of other patches which address (1), ref https://review.openstack.org/#/q/topic:bug/1792613+(status:open+OR+status:merged)

Marius - I'll needinfo you, perhaps you can help me with this testing when you have some time, and we can figure out if the partial-backport approach is viable?

We could backport the entire series but I'm a little wary as it involves a lot of changes and the problem where a service spans multiple roles is probably a less common case than the one reported in this bug?

Comment 11 Marius Cornea 2019-01-28 21:18:40 UTC
(In reply to Steven Hardy from comment #10)
> Ok I pushed https://review.openstack.org/#/c/625078/ which aims to resolve
> the (2) problem from comment #9 - I think that is the root-cause of this bug
> and (1) is a related issue which may not be a blocker for this particular
> configuration.
> 
> Currently need further testing/review of
> https://review.openstack.org/#/c/625078/ but if that works as expected we
> may be able to backport that and the tripleo-common patch
> https://review.openstack.org/#/c/605046/ idependent of the (fairly long)
> series of other patches which address (1) ref
> https://review.openstack.org/#/q/topic:bug/1792613+(status:open+OR+status:
> merged)
> 
> Marius - I'll needinfo you, perhaps you can help me with this testing when
> you have some time, and we can figure out if the partial-backport approach
> is viable?
> 

Hey Steve! I tested the Rocky backport (https://review.openstack.org/#/c/632714/) and it allowed the deployment to complete successfully.

Comment 12 Lukas Bezdicka 2019-02-04 13:41:57 UTC
*** Bug 1668641 has been marked as a duplicate of this bug. ***

Comment 27 errata-xmlrpc 2019-07-02 20:08:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1672

