Bug 1843701 - novajoin based tls-e not creating DNS entries for overcloud nodes
Summary: novajoin based tls-e not creating DNS entries for overcloud nodes
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Ade Lee
QA Contact: David Rosenfeld
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-06-03 21:21 UTC by Alan Bishop
Modified: 2020-07-16 19:07 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-16 16:14:17 UTC
Target Upstream Version:
Embargoed:


Description Alan Bishop 2020-06-03 21:21:14 UTC
When registering overcloud nodes with the IPA server for TLS-everywhere, novajoin does not create DNS entries for them. As a result, certs that contain a SAN IP cannot be created: the cert request fails because there is no DNS entry for the IP.

This issue affects the ability to configure tripleo's etcd service with tls-e, and cinder requires etcd in order to run in cinder-volume's active/active mode. The current workaround uses [1] to have etcd and its clients (e.g. cinder) not use TLS.

[1] https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/deployment/etcd/etcd-container-puppet.yaml#L49

Comment 1 Harry Rybacki 2020-06-04 12:47:46 UTC
Alan, can you provide us with access to a reproducer environment or sosreports and details on how to reproduce this? Thanks!

Comment 2 Alan Bishop 2020-06-04 17:00:58 UTC
Hi Harry,

You should be able to reproduce this in an overcloud deployment that includes two customizations:

1. Deploy the cinder-volume service in active/active mode
2. Set EnableEtcdInternalTLS: True (it defaults to False)

For #1 you can include $THT/environments/cinder-volume-active-active.yaml. Here's the full contents of that file:

resource_registry:
  # For A/A mode, do not run the cinder-volume service under pacemaker.
  OS::TripleO::Services::CinderVolume: ../deployment/cinder/cinder-volume-container-puppet.yaml
  # Cinder requires etcd for use as its Distributed Lock Manager (DLM).
  OS::TripleO::Services::Etcd: ../deployment/etcd/etcd-container-puppet.yaml

parameter_defaults:
  CinderVolumeCluster: tripleo
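
For #2, a minimal sketch of a separate environment file could look like the following. The file name (etcd-internal-tls.yaml) is hypothetical; only the parameter name and its default come from this report.

```yaml
# etcd-internal-tls.yaml (hypothetical file name, chosen for illustration)
parameter_defaults:
  # Defaults to False. Setting it to True enables TLS for etcd's internal traffic.
  EnableEtcdInternalTLS: True
```

Include it in the deployment the same way as the other environment files.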

Just add #2 to any other env file. That should be sufficient to trigger the problem, which results in a failure like this in the ansible.log:

<13>Mar 10 13:15:31 puppet-user: Notice: /Stage[main]/Tripleo::Certmonger::Etcd/Certmonger_certificate[etcd]/dnsname: dnsname changed ['controller-no-ceph-0.internalapi.redhat.local'] to ['controller-no-ceph-0.internalapi.redhat.local', '172.17.1.15']",
"<13>Mar 10 13:15:31 puppet-user: Debug: Executing: '/usr/bin/getcert resubmit -i etcd -f /etc/pki/tls/certs/etcd.crt -c IPA -N CN=controller-no-ceph-0.internalapi.redhat.local -K etcd/controller-no-ceph-0.internalapi.redhat.local -D controller-no-ceph-0.internalapi.redhat.local -A 172.17.1.15 -w'",
"<13>Mar 10 13:15:31 puppet-user: Debug: Executing: '/usr/bin/getcert list -i etcd'",
"<13>Mar 10 13:15:31 puppet-user: Error: /Stage[main]/Tripleo::Certmonger::Etcd/Certmonger_certificate[etcd]: Could not evaluate: Could not get certificate: Server at https://freeipa-0.redhat.local/ipa/xml denied our request, giving up: 3009 (RPC failed at server.  invalid 'csr': IP address in subjectAltName (172.17.1.15) unreachable from DNS names).",
"<13>Mar 10 13:15:31 puppet-user: Notice: /Stage[main]/Tripleo::Certmonger::Etcd/File[/etc/pki/tls/certs/etcd.crt]: Dependency Certmonger_certificate[etcd] has failures: true",
"<13>Mar 10 13:15:31 puppet-user: Warning: /Stage[main]/Tripleo::Certmonger::Etcd/File[/etc/pki/tls/certs/etcd.crt]: Skipping because of failed dependencies",
"<13>Mar 10 13:15:31 puppet-user: Debug: /Stage[main]/Tripleo::Certmonger::Etcd/File[/etc/pki/tls/certs/etcd.crt]: Resource is being skipped, unscheduling all events",
"<13>Mar 10 13:15:31 puppet-user: Warning: /Stage[main]/Tripleo::Certmonger::Etcd/File[/etc/pki/tls/private/etcd.key]: Skipping because of failed dependencies",
"<13>Mar 10 13:15:31 puppet-user: Debug: /Stage[main]/Tripleo::Certmonger::Etcd/File[/etc/pki/tls/private/etcd.key]: Resource is being skipped, unscheduling all events",
"<13>Mar 10 13:15:31 puppet-user: Debug: Class[Tripleo::Certmonger::Etcd]: Resource is being skipped, unscheduling all events

The key part is the error below. The IP address it mentions (172.17.1.15) belongs to the controller-no-ceph-0.internalapi.redhat.local node.

"<13>Mar 10 13:15:31 puppet-user: Error: /Stage[main]/Tripleo::Certmonger::Etcd/Certmonger_certificate[etcd]: Could not evaluate: Could not get certificate: Server at https://freeipa-0.redhat.local/ipa/xml denied our request, giving up: 3009 (RPC failed at server.  invalid 'csr': IP address in subjectAltName (172.17.1.15) unreachable from DNS names)."

Comment 5 Ade Lee 2020-06-30 18:03:37 UTC
The motivation behind this BZ is to allow us to set EnableEtcdInternalTLS: True by default, so that cinder A/A always uses TLS, even if we use the old (novajoin) way of setting up TLS.

The reasoning is that if we just add the call to the ansible code that generates the DNS entries (https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/ipa/ipaservices-baremetal-ansible.yaml#L109-L122)
to the old composable service that does tls-e (https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/ipa/ipaclient-baremetal-ansible.yaml), then the DNS entries will just be created.

The problem is that for the DNS code to work, some new permissions need to be added to the IPA role for the novajoin user, as well as a basic permission (which is on by default in newer versions of the IPA server).
Adding these requires IPA admin privileges, so it is a separate migration step that would need to be performed before the deployment starts.

So we need some kind of pre-upgrade step to confirm that those steps have been performed. Given that some manual upgrade step may be required in any case, and that we want to encourage people to move to the new tls-e
mechanism, does it make sense to add this functionality to the old novajoin way of doing things?

Comment 7 Ade Lee 2020-07-16 16:14:17 UTC
Based on discussions of priority, we have decided to only support the addition of DNS entries with the new way of deploying TLS-E (tripleo-ipa).
Please reopen if priorities change.

