Bug 1843701
Summary: | novajoin based tls-e not creating DNS entries for overcloud nodes | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Alan Bishop <abishop> |
Component: | openstack-tripleo-heat-templates | Assignee: | Ade Lee <alee> |
Status: | CLOSED WONTFIX | QA Contact: | David Rosenfeld <drosenfe> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 16.1 (Train) | CC: | gcharot, hrybacki, mburns, spower |
Target Milestone: | --- | Keywords: | Triaged |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-07-16 16:14:17 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
Description
Alan Bishop
2020-06-03 21:21:14 UTC
Alan, can you provide us with access to a reproducer environment or sosreports and details on how to reproduce this? Thanks!

Hi Harry,

You should be able to reproduce this in an overcloud deployment that includes two customizations:

1. Deploy the cinder-volume service in active/active mode
2. Set EnableEtcdInternalTLS: True (it defaults to False)

For #1 you can include $THT/environments/cinder-volume-active-active.yaml. Here's the full contents of that file:

    resource_registry:
      # For A/A mode, do not run the cinder-volume service under pacemaker.
      OS::TripleO::Services::CinderVolume: ../deployment/cinder/cinder-volume-container-puppet.yaml
      # Cinder requires etcd for use as its Distributed Lock Manager (DLM).
      OS::TripleO::Services::Etcd: ../deployment/etcd/etcd-container-puppet.yaml

    parameter_defaults:
      CinderVolumeCluster: tripleo

Just add #2 to any other env file. That should be sufficient to trigger the problem, which results in a failure like this in the ansible.log:

    <13>Mar 10 13:15:31 puppet-user: Notice: /Stage[main]/Tripleo::Certmonger::Etcd/Certmonger_certificate[etcd]/dnsname: dnsname changed ['controller-no-ceph-0.internalapi.redhat.local'] to ['controller-no-ceph-0.internalapi.redhat.local', '172.17.1.15']",
    "<13>Mar 10 13:15:31 puppet-user: Debug: Executing: '/usr/bin/getcert resubmit -i etcd -f /etc/pki/tls/certs/etcd.crt -c IPA -N CN=controller-no-ceph-0.internalapi.redhat.local -K etcd/controller-no-ceph-0.internalapi.redhat.local -D controller-no-ceph-0.internalapi.redhat.local -A 172.17.1.15 -w'",
    "<13>Mar 10 13:15:31 puppet-user: Debug: Executing: '/usr/bin/getcert list -i etcd'",
    "<13>Mar 10 13:15:31 puppet-user: Error: /Stage[main]/Tripleo::Certmonger::Etcd/Certmonger_certificate[etcd]: Could not evaluate: Could not get certificate: Server at https://freeipa-0.redhat.local/ipa/xml denied our request, giving up: 3009 (RPC failed at server. invalid 'csr': IP address in subjectAltName (172.17.1.15) unreachable from DNS names).",
    "<13>Mar 10 13:15:31 puppet-user: Notice: /Stage[main]/Tripleo::Certmonger::Etcd/File[/etc/pki/tls/certs/etcd.crt]: Dependency Certmonger_certificate[etcd] has failures: true",
    "<13>Mar 10 13:15:31 puppet-user: Warning: /Stage[main]/Tripleo::Certmonger::Etcd/File[/etc/pki/tls/certs/etcd.crt]: Skipping because of failed dependencies",
    "<13>Mar 10 13:15:31 puppet-user: Debug: /Stage[main]/Tripleo::Certmonger::Etcd/File[/etc/pki/tls/certs/etcd.crt]: Resource is being skipped, unscheduling all events",
    "<13>Mar 10 13:15:31 puppet-user: Warning: /Stage[main]/Tripleo::Certmonger::Etcd/File[/etc/pki/tls/private/etcd.key]: Skipping because of failed dependencies",
    "<13>Mar 10 13:15:31 puppet-user: Debug: /Stage[main]/Tripleo::Certmonger::Etcd/File[/etc/pki/tls/private/etcd.key]: Resource is being skipped, unscheduling all events",
    "<13>Mar 10 13:15:31 puppet-user: Debug: Class[Tripleo::Certmonger::Etcd]: Resource is being skipped, unscheduling all events

The key part is the error below; the IP address in it belongs to the controller-no-ceph-0.internalapi.redhat.local node:

    "<13>Mar 10 13:15:31 puppet-user: Error: /Stage[main]/Tripleo::Certmonger::Etcd/Certmonger_certificate[etcd]: Could not evaluate: Could not get certificate: Server at https://freeipa-0.redhat.local/ipa/xml denied our request, giving up: 3009 (RPC failed at server. invalid 'csr': IP address in subjectAltName (172.17.1.15) unreachable from DNS names)."

The motivation behind this BZ is to allow us to set EnableEtcdInternalTLS: True by default, so that we always use TLS with cinder A/A, even if we use the old (novajoin) way of setting up TLS.
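For reference, the second customization from the reproduction steps earlier in this report could be captured in a small environment file like the sketch below. The file name is hypothetical. The parameter name and value are taken from the steps, and placing it under parameter_defaults is an assumption based on how cinder-volume-active-active.yaml sets CinderVolumeCluster.

```yaml
# etcd-internal-tls.yaml (hypothetical file name, for illustration only)
parameter_defaults:
  # Defaults to False. Enabling it makes the deployment request a
  # certificate for etcd, which is the step that fails in this report.
  EnableEtcdInternalTLS: True
```

Including this file together with $THT/environments/cinder-volume-active-active.yaml in a novajoin-based TLS-E deployment should be enough to hit the certificate request failure shown in the log.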
The reasoning is that if we just add the call to the ansible code that generates the DNS entries (https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/ipa/ipaservices-baremetal-ansible.yaml#L109-L122) to the old composable service that does tls-e (https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/ipa/ipaclient-baremetal-ansible.yaml), then the DNS entries will just be created.

The problem is that in order for the DNS code to work, some new permissions need to be added to the IPA role for the novajoin user, as well as a basic permission -- (which is on by default in newer versions of IPA server). Adding these requires IPA admin privileges, so this is a separate migration step that would need to be performed before the deployment starts. We would therefore need some kind of pre-upgrade check to confirm that those steps have been performed.

Given that some manual upgrade step may be required in any case, and that we want to encourage people to move to the new tls-e mechanism, does it make sense to add this functionality to the old novajoin way of doing things?

Based on discussions of priority, we have decided to only support the addition of DNS entries with the new way of deploying TLS-E (tripleo-ipa). Please reopen if priorities change.