Updated description:

We should mention in the docs that when a multistack edge (DCN) deployment uses TLS-Everywhere, the DCN stacks must not be deployed with the following templates in the deploy command line:

-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/services/haproxy-public-tls-certmonger.yaml \

The public TLS endpoints do not need to be created on the DCN stacks, and including these templates causes the deployment to fail.

Description of problem:

Deployment of DCN with distributed multibackend storage and TLS-E (tripleo-ipa) fails on the DCN site with:

<LOG>
fatal: [dcn1-computehciscaleout1-0]: FAILED! => {"ansible_job_id": "552398882008.22056", "attempts": 11, "changed": true, "cmd": "set -o pipefail; puppet apply --debug --verbose --modulepath=/etc/puppet/modules:/opt/stack/puppet-modules:/usr/share/openstack-puppet/modules --detailed-exitcodes --summarize --color=false /var/lib/tripleo-config/puppet_step_config.pp 2>&1 | logger -s -t puppet-user", "delta": "0:00:35.082779", "end": "2020-08-14 08:25:17.356108", "failed_when_result": true, "finished": 1, "msg": "non-zero return code", "rc": 6, "start": "2020-08-14 08:24:42.273329", "stderr": "<13>Aug 14 08:24:42 puppet-user: Debug: Runtime environment: puppet_version=5.5.10, ruby_version=2.5.5, run_mode=user, default_encoding=UTF-8
...skipped log ...
<13>Aug 13 21:14:32 puppet-user: Debug: Issuing getcert command with args: [\"request\", \"-I\", \"haproxy-external-cert\", \"-f\", \"/etc/pki/tls/certs/haproxy/overcloud-haproxy-external.crt\", \"-c\", \"IPA\", \"-N\", \"CN=overcloud.redhat.local\", \"-K\", \"haproxy/overcloud.redhat.local\", \"-D\", \"overcloud.redhat.local\", \"-U\", \"id-kp-clientAuth\", \"-U\", \"id-kp-serverAuth\", \"-C\", \"/usr/bin/certmonger-haproxy-refresh.sh reload external\", \"-w\", \"-k\", \"/etc/pki/tls/private/haproxy/overcloud-haproxy-external.key\"]
<13>Aug 13 21:14:32 puppet-user: Debug: Executing: '/usr/bin/getcert request -I haproxy-external-cert -f /etc/pki/tls/certs/haproxy/overcloud-haproxy-external.crt -c IPA -N CN=overcloud.redhat.local -K haproxy/overcloud.redhat.local -D overcloud.redhat.local -U id-kp-clientAuth -U id-kp-serverAuth -C /usr/bin/certmonger-haproxy-refresh.sh reload external -w -k /etc/pki/tls/private/haproxy/overcloud-haproxy-external.key
<13>Aug 13 21:14:33 puppet-user: Warning: Could not get certificate: Execution of '/usr/bin/getcert request -I haproxy-external-cert -f /etc/pki/tls/certs/haproxy/overcloud-haproxy-external.crt -c IPA -N CN=overcloud.redhat.local -K haproxy/overcloud.redhat.local -D overcloud.redhat.local -U id-kp-clientAuth -U id-kp-serverAuth -C /usr/bin/certmonger-haproxy-refresh.sh reload external -w -k /etc/pki/tls/private/haproxy/overcloud-haproxy-external.key' returned 2: New signing request \"haproxy-external-cert\" added.
<13>Aug 13 21:14:33 puppet-user: Debug: Executing: '/usr/bin/getcert list -i haproxy-external-cert'
<13>Aug 13 21:14:34 puppet-user: Error: /Stage[main]/Tripleo::Profile::Base::Certmonger_user/Tripleo::Certmonger::Haproxy[haproxy-external]/Certmonger_certificate[haproxy-external-cert]: Could not evaluate: Could not get certificate: Server at https://site-freeipa-0.redhat.local/ipa/xml denied our request, giving up: 2100 (RPC failed at server. Insufficient access: Insufficient 'write' privilege to the 'userCertificate' attribute of entry 'krbprincipalname=haproxy/overcloud.redhat.local,cn=services,cn=accounts,dc=redhat,dc=local'.).
</LOG>

The failing command on dcn1-computehciscaleout1-0 is:

/usr/bin/getcert request -I haproxy-external-cert -f /etc/pki/tls/certs/haproxy/overcloud-haproxy-external.crt -c IPA -N CN=overcloud.redhat.local -K haproxy/overcloud.redhat.local -D overcloud.redhat.local -U id-kp-clientAuth -U id-kp-serverAuth -C /usr/bin/certmonger-haproxy-refresh.sh reload external -w -k /etc/pki/tls/private/haproxy/overcloud-haproxy-external.key

with error:

Could not get certificate: Server at https://site-freeipa-0.redhat.local/ipa/xml denied our request, giving up: 2100 (RPC failed at server. Insufficient access: Insufficient 'write' privilege to the 'userCertificate' attribute of entry 'krbprincipalname=haproxy/overcloud.redhat.local,cn=services,cn=accounts,dc=redhat,dc=local'.).

I searched for a solution and found that the certificate request succeeds if I run the following commands:

[heat-admin@dcn1-computehciscaleout1-0 ~]$ ipa service-add-host --hosts=dcn1-computehciscaleout1-0.redhat.local haproxy/overcloud.redhat.local
[heat-admin@dcn1-computehciscaleout1-0 ~]$ sudo ipa-getcert resubmit -i haproxy-external-cert

After that, the certificate is issued successfully:

[heat-admin@dcn1-computehciscaleout1-0 ~]$ sudo /usr/bin/getcert list -i haproxy-external-cert
Number of certificates and requests being tracked: 11.
Request ID 'haproxy-external-cert':
        status: MONITORING
        stuck: no
        key pair storage: type=FILE,location='/etc/pki/tls/private/haproxy/overcloud-haproxy-external.key'
        certificate: type=FILE,location='/etc/pki/tls/certs/haproxy/overcloud-haproxy-external.crt'
        CA: IPA
        issuer: CN=Certificate Authority,O=REDHAT.LOCAL
        subject: CN=overcloud.redhat.local,O=REDHAT.LOCAL
        expires: 2022-08-14 22:47:09 UTC
        dns: overcloud.redhat.local
        principal name: haproxy/overcloud.redhat.local
        key usage: digitalSignature,nonRepudiation,keyEncipherment,dataEncipherment
        eku: id-kp-serverAuth,id-kp-clientAuth
        pre-save command:
        post-save command: /usr/bin/certmonger-haproxy-refresh.sh reload external
        track: yes
        auto-renew: yes

I do not have a clear idea of how this type of topology is deployed or what the scale-out node is for, but this is the deploy command line:

openstack overcloud deploy \
--timeout 240 \
--templates /usr/share/openstack-tripleo-heat-templates \
--stack dcn1 \
--libvirt-type kvm \
--ntp-server clock1.rdu2.redhat.com \
-e /usr/share/openstack-tripleo-heat-templates/environments/dcn-hci.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/podman.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/disable-telemetry.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/net-multiple-nics.yaml \
-e /home/stack/dcn1/internal.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-n /home/stack/dcn1/network/network_data.yaml \
-r /home/stack/dcn1/roles/roles_data.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-ovs.yaml \
-e /home/stack/dcn1/network/network-environment.yaml \
-e /home/stack/dcn1/enable-tls.yaml \
-e /home/stack/dcn1/inject-trust-anchor.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
-e /home/stack/dcn1/hostnames.yml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
-e /home/stack/dcn1/dcn1_ceph_keys.yaml \
-e /home/stack/dcn1/nodes_data.yaml \
-e /home/stack/dcn1/debug.yaml \
-e /home/stack/dcn1/docker-images.yaml \
-e /home/stack/dcn1/glance.yaml \
-e /home/stack/central_ceph_external.yaml \
-e /home/stack/central-export.yaml \
-e /home/stack/dcn1/config_heat.yaml \
-e ~/containers-prepare-parameter.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-everywhere-endpoints-dns.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/enable-internal-tls.yaml \
-e /home/stack/dcn1/cloud-names.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/services/haproxy-public-tls-certmonger.yaml \
-e /home/stack/dcn1/ipaservices-baremetal-ansible.yaml \
--log-file dcn1_overcloud_deployment_22.log

More info about how to deploy such an environment:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html/distributed_compute_node_and_storage_deployment/assembly_deploying-storage-at-the-edge

How reproducible:
Always

Steps to Reproduce:
1. Deploy a DCN topology with distributed multibackend storage on the DCN site and TLS-Everywhere deployed in tripleo-ipa mode.

Actual results:
The DCN site fails to deploy.

Expected results:
Successful deployment.

Additional info:
I am submitting this under ansible-tripleo-ipa only as a placeholder for triage; I do not know which component to choose. I will provide more information about the logs and the environment in the comments.
The workaround you found points to the fact that the relevant haproxy service was not added to IPA for the host. This is supposed to be done when tripleo_ipa is invoked, based on the service_metadata for the role.

Looking at the group_vars for the node, I see the following:

group_vars/ComputeHCIScaleOut1:

service_metadata_settings:
  compact_service_etcd:
  - internalapi
  compact_service_haproxy:
  - ctlplane
  - storage
  - storagemgmt
  - internalapi
  compact_service_libvirt:
  - internalapi
  compact_service_libvirt-vnc:
  - internalapi
  compact_service_qemu:
  - internalapi
  managed_service_haproxyctlplane: haproxy/overcloud.ctlplane.redhat.local
  managed_service_haproxyinternal_api: haproxy/overcloud.internalapi.redhat.local
  managed_service_haproxystorage: haproxy/overcloud.storage.redhat.local
  managed_service_haproxystorage_mgmt: haproxy/overcloud.storagemgmt.redhat.local

Compare this to the one for the controller at the central site:

central/Controller0:

service_metadata_settings:
  compact_service_HTTP:
  - ctlplane
  - external
  - storage
  - storagemgmt
  - internalapi
  compact_service_haproxy:
  - ctlplane
  - storage
  - storagemgmt
  - internalapi
  compact_service_libvirt-vnc:
  - internalapi
  compact_service_mysql:
  - internalapi
  compact_service_neutron:
  - internalapi
  compact_service_novnc-proxy:
  - internalapi
  compact_service_rabbitmq:
  - internalapi
  managed_service_haproxyctlplane: haproxy/overcloud.ctlplane.redhat.local
  managed_service_haproxyexternal: haproxy/overcloud.redhat.local
  managed_service_haproxyinternal_api: haproxy/overcloud.internalapi.redhat.local
  managed_service_haproxystorage: haproxy/overcloud.storage.redhat.local
  managed_service_haproxystorage_mgmt: haproxy/overcloud.storagemgmt.redhat.local
  managed_service_mysqlinternal_api: mysql/overcloud.internalapi.redhat.local

The important part that is missing in the service metadata for ComputeHCIScaleOut1 is:

  managed_service_haproxyexternal: haproxy/overcloud.redhat.local

The addition of that metadata would result in the service being added. We'd need to look to see why that is not being added.
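The comparison above can also be done mechanically. The following is only a rough sketch: the helper names (managed_keys, missing_managed_services) and the temporary file names are made up for illustration and are not part of TripleO or tripleo-ipa. The sample data is the managed_service_* portion of the two listings above. It prints the keys that the controller has and the DCN role lacks.

```shell
#!/bin/sh
# Hypothetical helpers, not part of TripleO: compare the managed_service_*
# keys of two service_metadata_settings listings.

# Print the sorted, unique managed_service_* key names found in file "$1".
managed_keys() {
    sed -n 's/^[[:space:]]*\(managed_service_[A-Za-z0-9_]*\):.*/\1/p' "$1" |
        LC_ALL=C sort -u
}

# Print keys present in the reference file "$1" but absent from file "$2".
missing_managed_services() {
    tmpdir=$(mktemp -d) || return 1
    managed_keys "$1" > "$tmpdir/ref"
    managed_keys "$2" > "$tmpdir/cand"
    LC_ALL=C comm -23 "$tmpdir/ref" "$tmpdir/cand"
    rm -rf "$tmpdir"
}

# Sample data copied from the two listings in this comment
# (file names are assumptions made for this sketch).
workdir=$(mktemp -d) || exit 1
cat > "$workdir/central.yaml" <<'EOF'
service_metadata_settings:
  managed_service_haproxyctlplane: haproxy/overcloud.ctlplane.redhat.local
  managed_service_haproxyexternal: haproxy/overcloud.redhat.local
  managed_service_haproxyinternal_api: haproxy/overcloud.internalapi.redhat.local
  managed_service_haproxystorage: haproxy/overcloud.storage.redhat.local
  managed_service_haproxystorage_mgmt: haproxy/overcloud.storagemgmt.redhat.local
  managed_service_mysqlinternal_api: mysql/overcloud.internalapi.redhat.local
EOF
cat > "$workdir/dcn.yaml" <<'EOF'
service_metadata_settings:
  managed_service_haproxyctlplane: haproxy/overcloud.ctlplane.redhat.local
  managed_service_haproxyinternal_api: haproxy/overcloud.internalapi.redhat.local
  managed_service_haproxystorage: haproxy/overcloud.storage.redhat.local
  managed_service_haproxystorage_mgmt: haproxy/overcloud.storagemgmt.redhat.local
EOF

# Keys the controller has that the DCN role does not.
missing_managed_services "$workdir/central.yaml" "$workdir/dcn.yaml"
rm -rf "$workdir"
```

With this sample data the script should list managed_service_haproxyexternal together with managed_service_mysqlinternal_api. The MySQL key presumably differs only because the controller listing includes a mysql service and the DCN role does not, so the haproxyexternal key is the one of interest here.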
That metadata seems to be defined in ./deployment/haproxy/haproxy-public-tls-certmonger.yaml, which is referenced in /usr/share/openstack-tripleo-heat-templates/environments/services/haproxy-public-tls-certmonger.yaml, which is in the deploy script for DCN.

So I'm not sure why it's not being included in the metadata. Maybe because the external network is not defined there?

bandini --- any thoughts?
If I recall correctly while I was playing with IPv6+TLS-E, I had noticed that by default freeipa accepts certificate requests only from hosts that have an IP address within any subnets where FreeIPA has an IP configured and will actively deny other requests (also DNS requests). I wonder if that could be related as well. Marian do you have an env with this issue somewhere Ade and I can poke at?
(In reply to Michele Baldessari from comment #4)
> If I recall correctly while I was playing with IPv6+TLS-E, I had noticed
> that by default freeipa accepts certificate requests only from hosts that
> have an IP address within any subnets where FreeIPA has an IP configured and
> will actively deny other requests (also DNS requests). I wonder if that
> could be related as well.
>
> Marian do you have an env with this issue somewhere Ade and I can poke at?

I do have a setup, feel free to ping me once you have time.
(In reply to Ade Lee from comment #3)
> That metadata seems to be defined in
> ./deployment/haproxy/haproxy-public-tls-certmonger.yaml,
> which is referenced in
> /usr/share/openstack-tripleo-heat-templates/environments/services/haproxy-public-tls-certmonger.yaml,
> which is in the deploy script for DCN.
>
> So not sure why its not being included in the metadata.
> Maybe coz external network not defined there?
>
> bandini --- any thoughts?

If I specify the external network for the ComputeHCIScaleOut1 role, the DCN stack deploys properly.

It seems that the service metadata for the external haproxy service is (as you said) created here:
https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/deployment/haproxy/haproxy-public-tls-certmonger.yaml#L81-L84

and PublicNetwork is:
https://opendev.org/openstack/tripleo-heat-templates/src/commit/d58efb58e0c39b2ca1585d87fe6d542484b33ad0/network/service_net_map.j2.yaml#L80

So it is only created if the external network exists.

The question now is whether the external network should be added to the https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/roles/DistributedComputeHCI.yaml role, or whether the external haproxy service should not be created at all. I have no idea, maybe Alan could know?
At DCN (edge) sites, haproxy is only used on the internal_api network by the DistributedComputeScaleOut and DistributedComputeHCIScaleOut roles. That's so internal glance_api requests can be forwarded to the (internal) endpoints on the DistributedCompute (or DistributedComputeHCI) nodes.

I think the issue is that the metadata_settings [1] specify "service: haproxy," but at the DCN site the service is named "haproxy_edge" [2]. The service must be named differently at the DCN site to avoid mixing it up with the "haproxy" service running in the control plane.

[1] https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/deployment/haproxy/haproxy-public-tls-certmonger.yaml#L81-L84
[2] https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/deployment/haproxy/haproxy-edge-container-puppet.yaml#L80

But I'm not sure the answer is figuring out a way to create metadata_settings for the haproxy_edge service. Given what I stated above, I'm not sure why DCN sites need anything related to public TLS. I'd be curious to know if things work if you dropped these two env files from the DCN site's deployment command:

-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/services/haproxy-public-tls-certmonger.yaml \

But now I fear we'll end up with a similar problem with the internal TLS stuff at [3].

[3] https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/deployment/haproxy/haproxy-internal-tls-certmonger.j2.yaml#L100-L105

I don't know how this stuff works, so I'll let you folks digest this info and see where it leads next. If something is truly necessary for haproxy, then I think the key is understanding that the service is actually named haproxy_edge at DCN sites.
> But I'm not sure the answer is figuring out a way to create
> metadata_settings for the haproxy_edge service. Given what I stated above,
> I'm not sure why DCN sites need anything related to public TLS. I'd be
> curious to know of things work if you dropped these two env files from the
> DCN site's deployment command:
>
> -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
> -e /usr/share/openstack-tripleo-heat-templates/environments/services/haproxy-public-tls-certmonger.yaml \

I was able to deploy successfully once I removed these two templates from the DCN deploy command line, which is a little surprising to me.

However, I am now hitting another problem (I am not sure whether it is related to the way the deployment is done, especially the things discussed here). Creating any instance on the DCN site fails with the following error:

/var/log/containers/nova/nova-conductor.log:2020-08-27 02:30:01.722 21 WARNING nova.scheduler.utils [req-95f02425-49a0-4a92-8037-d9f4acf27b5f d0ed4b0b6cab45d98e76e4b3b061040d c4d8a45d49904b1c8c0f4115c3812e13 - default default] [instance: 00628f62-6048-4763-b35f-247bcea57804] Setting instance to ERROR state.: nova.exception.MaxRetriesExceeded: Exceeded maximum number of retries. Exceeded max scheduling attempts 3 for instance 00628f62-6048-4763-b35f-247bcea57804. Last exception: SSL exception connecting to https://172.25.3.55:9292/v2/images/cd91190f-92cf-40b5-bf78-30a30cd9ee71: HTTPSConnectionPool(host='172.25.3.55', port=9292): Max retries exceeded with url: /v2/images/cd91190f-92cf-40b5-bf78-30a30cd9ee71 (Caused by SSLError(CertificateError("hostname '172.25.3.55' doesn't match 'dcn2-computehci2-1.internalapi.redhat.local'",),))

dcn2-computehci2-1.internalapi.redhat.local resolves to 172.25.3.55 and vice versa. The SSL certificate in use has dcn2-computehci2-1.internalapi.redhat.local as its CN, but the connection is made to the IP address 172.25.3.55.

Should it instead connect to https://dcn2-computehci2-1.internalapi.redhat.local:9292/v2/images/cd91190f-92cf-40b5-bf78-30a30cd9ee71?
AFAIK (Ade should confirm) it should be using FQDN and not the IP address. But what's "it" in this instance? Is it a scale-out node, which is accessing glance via haproxy?
Just for the record: the problem from comment #8 is a bug. The edge compute node uses the IP address instead of the FQDN of the glance endpoint on the edge site. I created a clone of this bug for that problem.

The problem with the deployment of DCN stacks is solved when the templates for public TLS endpoints are not used. If they are used, the deployment fails. I am going to change this bug to a docs bug so that this is noted in our documentation.
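Until the docs cover this, a pre-flight check along the following lines could catch the problem before a DCN stack is deployed. This is a minimal sketch, not an existing TripleO tool: the function name check_dcn_env_files is hypothetical, and it only matches the two environment file paths named in this bug.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not part of TripleO: report any public TLS
# environment file (the two named in this bug) among the given arguments.
# Returns 0 when none is present, 1 otherwise.
check_dcn_env_files() {
    status=0
    for arg in "$@"; do
        case "$arg" in
            */environments/ssl/tls-endpoints-public-ip.yaml|\
            */environments/services/haproxy-public-tls-certmonger.yaml)
                echo "remove from DCN stack deploy command: $arg" >&2
                status=1
                ;;
        esac
    done
    return "$status"
}
```

It would be called with the same arguments that are about to be passed to the deploy command for the DCN stack, for example check_dcn_env_files -e /home/stack/dcn1/glance.yaml -e /home/stack/dcn1/enable-tls.yaml, and the deploy would proceed only if it returns 0. The check is not meant for the central stack, where the public endpoints are expected.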
This information is published as of the release of 16.1.2:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/distributed_compute_node_and_storage_deployment/index#deploying_distributed_compute_node_architecture_with_tls_e