Bug 1804079

Summary: TLS Everywhere fails with DCN: cinder active/active with etcd fails during certificate creation
Product: Red Hat OpenStack
Reporter: Sadique Puthen <sputhenp>
Component: puppet-tripleo
Assignee: Alan Bishop <abishop>
Status: CLOSED ERRATA
QA Contact: Tzach Shefi <tshefi>
Severity: high
Priority: medium
Version: 16.0 (Train)
CC: abishop, acanan, alee, amcleod, gcharot, gfidente, ggrasza, jjoyce, johfulto, jschluet, ltoscano, mburns, pgrist, slinaber, spower, tshefi, tvignaud
Target Milestone: ga
Keywords: Triaged
Target Release: 16.1 (Train on RHEL 8.2)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: puppet-tripleo-11.5.0-0.20200611115535.e86dd81.el8ost, openstack-tripleo-heat-templates-11.3.2-0.20200616081529.396affd.el8ost
Doc Type: Bug Fix
Doc Text:
Before this update, the etcd service was not configured properly to run in a container. As a result, an error occurred when the service tried to create the TLS certificate. With this update, the etcd service runs in a container and can create the TLS certificate.
Last Closed: 2020-07-29 07:50:34 UTC
Type: Bug
Attachments:
ansible.log (flags: none)

Description Sadique Puthen 2020-02-18 07:57:14 UTC
Description of problem:

Deploying an HCI edge location as a separate stack, with cinder active/active, etcd, and TLS everywhere, fails with the error message below.

07:28:21 puppet-user: Error: Could not find user etcd
<13>Feb 18 07:28:21 puppet-user: Error: /Stage[main]/Tripleo::Certmonger::Etcd/File[/etc/pki/tls/certs/etcd.crt]/owner: change from 'root' to 'etcd' failed: Could not find user etcd
<13>Feb 18 07:28:21 puppet-user: Error: Could not find group etcd
<13>Feb 18 07:28:21 puppet-user: Error: /Stage[main]/Tripleo::Certmonger::Etcd/File[/etc/pki/tls/certs/etcd.crt]/group: change from 'root' to 'etcd' failed: Could not find group etcd
<13>Feb 18 07:28:21 puppet-user: Error: Could not find user etcd
<13>Feb 18 07:28:21 puppet-user: Error: /Stage[main]/Tripleo::Certmonger::Etcd/File[/etc/pki/tls/private/etcd.key]/owner: change from 'root' to 'etcd' failed: Could not find user etcd
<13>Feb 18 07:28:21 puppet-user: Error: Could not find group etcd
<13>Feb 18 07:28:21 puppet-user: Error: /Stage[main]/Tripleo::Certmonger::Etcd/File[/etc/pki/tls/private/etcd.key]/group: change from 'root' to 'etcd' failed: Could not find group etcd
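
The errors above come from Puppet File resources in Tripleo::Certmonger::Etcd trying to change the owner and group of etcd.crt and etcd.key to an etcd user and group that do not exist on the host. As a minimal sketch, assuming shell access to the affected node, the same lookup can be repeated by hand with getent; the function name check_owner_exists is hypothetical and not part of puppet-tripleo.

```shell
#!/bin/bash
# Hypothetical helper: report whether a user and a group exist in the host's
# passwd/group databases (the same lookups Puppet needs before it can chown).
# Returns 0 only if both are found.
check_owner_exists() {
  local user="$1" group="$2" rc=0
  if getent passwd "$user" >/dev/null; then
    echo "user $user: found"
  else
    echo "user $user: missing"
    rc=1
  fi
  if getent group "$group" >/dev/null; then
    echo "group $group: found"
  else
    echo "group $group: missing"
    rc=1
  fi
  return "$rc"
}
# Example: check_owner_exists etcd etcd
```

On a node showing the errors above, check_owner_exists etcd etcd would be expected to report both the user and the group as missing.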

The templates used for the edge stack/location can be found here: https://gitlab.cee.redhat.com/sputhenp/openstack/blob/master/basic/templates/osp-16/edge-1/overcloud-deploy-edge-1-tls-everywhere.sh

The templates used for the central stack can be found here: https://gitlab.cee.redhat.com/sputhenp/openstack/blob/master/basic/templates/osp-16/overcloud-deploy-tls-everywhere.sh

Attaching ansible.log

Comment 1 Sadique Puthen 2020-02-18 07:59:19 UTC
Created attachment 1663665
ansible.log

Comment 2 Alan Bishop 2020-02-18 16:59:39 UTC
This is one of several issues identified with cinder using etcd for its DLM when running active/active. See bug #1792477 (this BZ is item 1 in that BZ's description).

While I know how to fix the ownership problem, there are several more layers to the overall problem, and it's not clear whether cinder will be able to use etcd for its DLM. I'll take this BZ for now.

Comment 9 Ade Lee 2020-04-14 19:58:16 UTC
The security DFG side of this (that is, adding the ability to add DNS entries to the IPA server) is being tracked here:

https://bugzilla.redhat.com/show_bug.cgi?id=1823932

Comment 10 Alan Bishop 2020-06-17 20:33:12 UTC
The original problem was an issue with the puppet-tripleo code responsible for creating the etcd cert. That code was fixed a while ago, but several fixes and enhancements in other areas were required for full tls-e support. To keep this BZ focused, I'm stating only that the puppet-tripleo code now works correctly.

Bear in mind that testing the fix requires the following:
- Deploy cinder in A/A mode
- Deploy tls-e using tripleo-ipa (see bug #1823932), or wait until bug #1843701 is fixed and use novajoin
- Deploy with EnableEtcdInternalTLS set to True
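
The EnableEtcdInternalTLS requirement can be supplied as a Heat parameter in an extra environment file passed to openstack overcloud deploy with -e. A minimal sketch, assuming a hypothetical file name etcd-internal-tls.yaml and a hypothetical helper write_etcd_tls_env; only the parameter name comes from this BZ, and the Cinder A/A and tripleo-ipa settings are not covered here.

```shell
#!/bin/bash
# Hypothetical helper: write a Heat environment file that sets
# EnableEtcdInternalTLS to true and print its path. The default file name is
# an assumption; add "-e <file>" to the deploy command to use it.
write_etcd_tls_env() {
  local out="${1:-etcd-internal-tls.yaml}"
  cat > "$out" <<'EOF'
parameter_defaults:
  EnableEtcdInternalTLS: true
EOF
  echo "$out"
}
# Example: write_etcd_tls_env /home/stack/dcn1/etcd-internal-tls.yaml
```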

Comment 18 Paul Grist 2020-07-08 22:38:26 UTC
*** Bug 1792477 has been marked as a duplicate of this bug. ***

Comment 20 Tzach Shefi 2020-07-13 07:21:34 UTC
Verified on:
puppet-tripleo-11.5.0-0.20200616033427.8ff1c6a.el8ost.noarch

After a TLS-everywhere DCN deployment with Cinder A/A, everything uses TLS, including Cinder A/A.

Deployment details showing that TLS is enabled:


overcloud_deploy.sh (only the TLS-related bits are shown; other lines were removed for simplicity):
[stack@site-undercloud-0 ~]$ cat overcloud_deploy.sh
#!/bin/bash
openstack overcloud deploy \
-e /home/stack/central/enable-tls.yaml \
-e /home/stack/central/inject-trust-anchor.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-everywhere-endpoints-dns.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/services/haproxy-public-tls-certmonger.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/enable-internal-tls.yaml \
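
Because the DCN script is copied from overcloud_deploy.sh, a quick text check can confirm that a deploy script still passes the TLS-related environment files listed above. This is only a sketch: check_tls_env_files is a hypothetical helper that greps the script for the file paths and does not validate their contents.

```shell
#!/bin/bash
# Hypothetical helper: check that a deploy script passes each TLS-everywhere
# environment file from overcloud_deploy.sh. It only greps the script text;
# it does not check that the files exist or what they contain.
check_tls_env_files() {
  local script="$1" missing=0 f
  for f in \
    environments/ssl/tls-endpoints-public-ip.yaml \
    environments/ssl/tls-everywhere-endpoints-dns.yaml \
    environments/services/haproxy-public-tls-certmonger.yaml \
    environments/ssl/enable-internal-tls.yaml
  do
    if ! grep -qF -- "$f" "$script"; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}
# Example: check_tls_env_files overcloud_deploy_dcn1.sh
```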

Same for the DCN site:
[stack@site-undercloud-0 ~]$ cat overcloud_dcn1.sh
source /home/stack/stackrc
sudo cp overcloud_deploy.sh overcloud_deploy_dcn1.sh
sudo cp /home/stack/central/enable-tls.yaml /home/stack/dcn1/enable-tls.yaml   # TLS enabled on DCN
..


Confirm Cinder A/A

(dcn1) [stack@site-undercloud-0 ~]$ cinder service-list
+------------------+------------------------------------+---------+---------+-------+
| Binary           | Host                               | Zone    | Status  | State |
+------------------+------------------------------------+---------+---------+-------+
| cinder-scheduler | central-controller0-0.redhat.local | nova    | enabled | up    |
| cinder-scheduler | central-controller0-1.redhat.local | nova    | enabled | up    |
| cinder-scheduler | central-controller0-2.redhat.local | nova    | enabled | up    |
| cinder-volume    | dcn1-computehci1-0@tripleo_ceph    | az-dcn1 | enabled | up    |
| cinder-volume    | dcn1-computehci1-1@tripleo_ceph    | az-dcn1 | enabled | up    |
| cinder-volume    | dcn1-computehci1-2@tripleo_ceph    | az-dcn1 | enabled | up    |
| cinder-volume    | dcn2-computehci2-0@tripleo_ceph    | az-dcn2 | enabled | up    |
| cinder-volume    | dcn2-computehci2-1@tripleo_ceph    | az-dcn2 | enabled | up    |
| cinder-volume    | dcn2-computehci2-2@tripleo_ceph    | az-dcn2 | enabled | up    |
| cinder-volume    | hostgroup@tripleo_iscsi            | nova    | enabled | up    |
+------------------+------------------------------------+---------+---------+-------+
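
The A/A layout shows up in this table as three cinder-volume services per DCN zone, all up. A sketch of a helper (the name count_up_volume_services is hypothetical) that tallies this from saved cinder service-list output, assuming the trimmed column order shown here (Binary, Host, Zone, Status, State); if the command prints its columns in a different order, the field numbers would need adjusting.

```shell
#!/bin/bash
# Hypothetical helper: read "cinder service-list" table output on stdin and
# print, per availability zone, how many cinder-volume services are "up".
# Assumes columns Binary | Host | Zone | Status | State as shown above.
count_up_volume_services() {
  awk -F'|' '
    NF >= 6 {
      # Trim cell padding; field 1 is the empty text before the first "|".
      for (i = 2; i <= 6; i++) gsub(/^ +| +$/, "", $i)
      if ($2 == "cinder-volume" && $6 == "up") up[$4]++
    }
    END { for (z in up) print z, up[z] }
  ' | sort
}
# Example: cinder service-list | count_up_volume_services
```

Against the table above this should list az-dcn1 and az-dcn2 with three services each and nova with one; border and header rows are skipped because they never match cinder-volume.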


The Cinder endpoints of both the central and DCN sites are HTTPS:
(central) [stack@site-undercloud-0 ~]$ openstack endpoint list | grep cinder
| 6bb513a8310d4b32aebe51a75a421d00 | regionOne | cinderv3     | volumev3       | True    | public    | https://overcloud.redhat.local:13776/v3/%(tenant_id)s             |
| dd6137b413b74a4dbc25c5bec4ef3c8f | regionOne | cinderv3     | volumev3       | True    | admin     | https://overcloud.internalapi.redhat.local:8776/v3/%(tenant_id)s  |
| e343a6b884d447698159a346a0ddd1d4 | regionOne | cinderv2     | volumev2       | True    | admin     | https://overcloud.internalapi.redhat.local:8776/v2/%(tenant_id)s  |
| e62845df63844a998edfd83b292f5c98 | regionOne | cinderv3     | volumev3       | True    | internal  | https://overcloud.internalapi.redhat.local:8776/v3/%(tenant_id)s  |
| e6a592920ecd4f9393d90849b5b65095 | regionOne | cinderv2     | volumev2       | True    | public    | https://overcloud.redhat.local:13776/v2/%(tenant_id)s             |
| ed44a6da498c42d4b1fd2c7b513a788b | regionOne | cinderv2     | volumev2       | True    | internal  | https://overcloud.internalapi.redhat.local:8776/v2/%(tenant_id)s  |


(dcn1) [stack@site-undercloud-0 ~]$ openstack endpoint list | grep cinder
| 6bb513a8310d4b32aebe51a75a421d00 | regionOne | cinderv3     | volumev3       | True    | public    | https://overcloud.redhat.local:13776/v3/%(tenant_id)s             |
| dd6137b413b74a4dbc25c5bec4ef3c8f | regionOne | cinderv3     | volumev3       | True    | admin     | https://overcloud.internalapi.redhat.local:8776/v3/%(tenant_id)s  |
| e343a6b884d447698159a346a0ddd1d4 | regionOne | cinderv2     | volumev2       | True    | admin     | https://overcloud.internalapi.redhat.local:8776/v2/%(tenant_id)s  |
| e62845df63844a998edfd83b292f5c98 | regionOne | cinderv3     | volumev3       | True    | internal  | https://overcloud.internalapi.redhat.local:8776/v3/%(tenant_id)s  |
| e6a592920ecd4f9393d90849b5b65095 | regionOne | cinderv2     | volumev2       | True    | public    | https://overcloud.redhat.local:13776/v2/%(tenant_id)s             |
| ed44a6da498c42d4b1fd2c7b513a788b | regionOne | cinderv2     | volumev2       | True    | internal  | https://overcloud.internalapi.redhat.local:8776/v2/%(tenant_id)s  |

Same for dcn2 site:
(dcn2) [stack@site-undercloud-0 ~]$ openstack endpoint list | grep cinder
| 6bb513a8310d4b32aebe51a75a421d00 | regionOne | cinderv3     | volumev3       | True    | public    | https://overcloud.redhat.local:13776/v3/%(tenant_id)s             |
| dd6137b413b74a4dbc25c5bec4ef3c8f | regionOne | cinderv3     | volumev3       | True    | admin     | https://overcloud.internalapi.redhat.local:8776/v3/%(tenant_id)s  |
| e343a6b884d447698159a346a0ddd1d4 | regionOne | cinderv2     | volumev2       | True    | admin     | https://overcloud.internalapi.redhat.local:8776/v2/%(tenant_id)s  |
| e62845df63844a998edfd83b292f5c98 | regionOne | cinderv3     | volumev3       | True    | internal  | https://overcloud.internalapi.redhat.local:8776/v3/%(tenant_id)s  |
| e6a592920ecd4f9393d90849b5b65095 | regionOne | cinderv2     | volumev2       | True    | public    | https://overcloud.redhat.local:13776/v2/%(tenant_id)s             |
| ed44a6da498c42d4b1fd2c7b513a788b | regionOne | cinderv2     | volumev2       | True    | internal  | https://overcloud.internalapi.redhat.local:8776/v2/%(tenant_id)s  |
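
To check the HTTPS claim mechanically rather than by eye, a small filter can read endpoint URLs and reject any that do not start with https://. A sketch under the assumption that the URLs arrive one per line, for example from the openstack client's value formatter; all_https is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: read endpoint URLs (one per line) on stdin, print any
# that are not HTTPS, and return non-zero if at least one was found.
# Example input source (one possible invocation):
#   openstack endpoint list --service cinderv3 -f value -c URL
all_https() {
  local url rc=0
  while IFS= read -r url; do
    [ -z "$url" ] && continue
    case "$url" in
      https://*) ;;
      *) echo "not https: $url"; rc=1 ;;
    esac
  done
  return "$rc"
}
```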


A basic cinder create works on DCN2:
(dcn2) [stack@site-undercloud-0 ~]$ cinder list
+--------------------------------------+-----------+-------+------+-------------+----------+-------------+
| ID                                   | Status    | Name  | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+-------+------+-------------+----------+-------------+
| 25f0d715-0c49-4f64-b13a-3482ca4fa104 | available | test1 | 1    | tripleo     | false    |             |
+--------------------------------------+-----------+-------+------+-------------+----------+-------------+
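
The volume test above can be scripted by creating a volume in the DCN2 availability zone and polling its status. A minimal sketch, assuming an authenticated openstack CLI for the dcn2 site; wait_for_volume and its retry defaults are hypothetical, while openstack volume show with -f value -c status is a standard client call.

```shell
#!/bin/bash
# Hypothetical helper: poll a volume's status until it is "available",
# reaches an error state, or the attempt limit runs out.
# Arguments: volume name or ID, attempts (default 30), delay seconds (default 2).
wait_for_volume() {
  local vol="$1" tries="${2:-30}" delay="${3:-2}" status=""
  while [ "$tries" -gt 0 ]; do
    status="$(openstack volume show "$vol" -f value -c status)"
    case "$status" in
      available) echo "$vol: available"; return 0 ;;
      error*) echo "$vol: $status"; return 1 ;;
    esac
    tries=$((tries - 1))
    sleep "$delay"
  done
  echo "$vol: timed out (last status: $status)"
  return 1
}
# Example, assuming the az-dcn2 zone name from the service list above:
#   openstack volume create --size 1 --availability-zone az-dcn2 test1
#   wait_for_volume test1
```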

We have an automation job for TLS-everywhere DCN with Cinder A/A; in fact, I used that very job to deploy the system above. Tempest volume tests are passing.

Confirmed: we can deploy TLS-everywhere DCN with Cinder A/A.

Comment 23 errata-xmlrpc 2020-07-29 07:50:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3148