Description of problem:

The following patch:
~~~~
Per-Role krb-service-principal for CompactServices

Filter krb-service-principals for the CompactServices based on the
networks associated with the role. Filtering for the IndividualServices
was added in the previous fix https://review.openstack.org/646005,
which didn't fully fix the bug.

Closes-Bug: #1821377
Change-Id: Id54477ca5581e1f5fe8a09c3bc60a238d114dbb2
(cherry picked from commit 578bcb2ffa)
tags/10.6.1
~~~~
LINK: https://opendev.org/openstack/tripleo-heat-templates/commit/223ddba9137a3c9129fc33593db086518bf75a78?lang=en-US

contains additional filtering (as a means to work around the nova metadata field size limit: each field cannot exceed 256 bytes). However, the filtering might be a little too aggressive:

~~~~
l63: {%- for network in networks if network.vip|default(false) and network.name in role.networks %}
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
~~~~

~~~~
$ cat network_data.yaml
#CTL-MANAGEMENT
- name: MgmtCtl
  name_lower: mgmtctl
  vip: false        <<<<<<<<<<<<<<<<<
  ip_subnet: xxx.xxx.xxx.0/xx
  allocation_pools: [{'start': 'xxx.xxx.xxx.xxx', 'end': 'xxx.xxx.xxx.xxx'}]
  vlan: xxxxxxxx
< snip >
~~~~

os-collect-config logs:
~~~~
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: "Warning: Could not get certificate: Execution of '/usr/bin/getcert request -I contrail -f /etc/contrail/ssl/certs/server.pem -c IPA -N CN=ctl1.xxxxxxxxxxxxxxxx -K contrail/ctl1.xxxxxxxxxxxxxxxx -D ctl1.ctlplane.xxxxxxxxxxxxxxxx -D ctl1.mgmtctl.xxxxxxxxxxxxxxxx -D < snip > -D -D -D -D -D -C sudo docker ps -q --filter=name=\"contrail*\" | xargs -i sudo docker restart {} -w -k /etc/contrail/ssl/private/server-privkey.pem' returned 3: New signing request \"contrail\" added.",
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: "Error: /Stage[main]/Tripleo::Certmonger::Contrail/Certmonger_certificate[contrail]: Could not evaluate: Could not get certificate: Server at https://idm.xxxxxxxxxxxxxxxx/ipa/xml failed request, will retry: 4001 (RPC failed at server. The service principal for subject alt name ctl1.mgmtctl.xxxxxxxxxxxxxxxx in certificate request does not exist).",
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: "Warning: Could not get certificate: Execution of '/usr/bin/getcert request -I httpd-mgmtctl -f /etc/pki/tls/certs/httpd/httpd-mgmtctl.crt -c IPA -N CN=ctl1.mgmtctl.xxxxxxxxxxxxxxxx -K HTTP/ctl1.mgmtctl.xxxxxxxxxxxxxxxx -D ctl1.mgmtctlxxxxxxxxxxxxxxxxx -C pkill -USR1 httpd -w -k /etc/pki/tls/private/httpd/httpd-mgmtctl.key' returned 3: New signing request \"httpd-mgmtctl\" added.",
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: "Error: /Stage[main]/Tripleo::Profile::Base::Certmonger_user/Tripleo::Certmonger::Httpd[httpd-mgmtctl]/Certmonger_certificate[httpd-mgmtctl]: Could not evaluate: Could not get certificate: Server at https://idm.xxxxxxxxxxxxxxxx/ipa/xml failed request, will retry: 4001 (RPC failed at server. The host 'ctl1.mgmtctl.xxxxxxxxxxxxxxxx' does not exist to add a service to.).",
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: "Warning: /Stage[main]/Tripleo::Certmonger::Ca::Crl/Exec[tripleo-ca-crl]: Skipping because of failed dependencies",
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: "Warning: /Stage[main]/Tripleo::Certmonger::Ca::Crl/File[tripleo-ca-crl-file]: Skipping because of failed dependencies",
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: "Warning: /Stage[main]/Tripleo::Certmonger::Ca::Crl/Exec[tripleo-ca-crl-process-command]: Skipping because of failed dependencies",
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: "Warning: /Stage[main]/Tripleo::Certmonger::Ca::Crl/Cron[tripleo-refresh-crl-file]: Skipping because of failed dependencies"
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: ]
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: }
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx_playbook.retry
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: PLAY RECAP *********************************************************************
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: localhost : ok=26 changed=13 unreachable=0 failed=1
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: [2019-11-22 12:03:22,217] (heat-config) [ERROR] Error running /var/lib/heat-config/heat-config-ansible/_xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxplaybook.yaml. [2]
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: [2019-11-22 12:03:22,223] (heat-config) [INFO] Completed /usr/libexec/heat-config/hooks/ansible
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: [2019-11-22 12:03:22,224] (heat-config) [DEBUG] Running heat-config-notify /var/lib/heat-config/deployed/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.json < /var/lib/heat-config/deployed/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.notify.json
~~~~

Version-Release number of selected component (if applicable): RHOSP-13z9

How reproducible: Every time.
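For illustration, the effect of the quoted Jinja condition can be sketched in plain Python (the network and role names here are hypothetical; the real logic is the template line above): a network that appears in the role's networks but carries `vip: false` is still dropped, so no krb-service-principal is generated for it.

```python
# Sketch of the filter: a network is kept only if it has a VIP AND is in
# the role's network list. MgmtCtl mirrors the vip: false network above.
networks = [
    {"name": "InternalApi", "vip": True},
    {"name": "MgmtCtl", "vip": False},
]
role_networks = ["InternalApi", "MgmtCtl"]

selected = [
    n["name"]
    for n in networks
    if n.get("vip", False) and n["name"] in role_networks
]
print(selected)  # -> ['InternalApi']: MgmtCtl is filtered out despite being a role network
```

This is exactly why the service principal for `ctl1.mgmtctl.…` is never created in IdM, and the later `getcert` requests fail.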
I have reproduced this issue with just the default management network enabled:

~~~~
Dec 05 13:04:57 overcloud-controller-0.lab.example.com os-collect-config[9657]: "Error: /Stage[main]/Tripleo::Profile::Base::Certmonger_user/Tripleo::Certmonger::Httpd[httpd-management]/Certmonger_certificate[httpd-management]: Could not evaluate: Could not get certificate: Server at https://idm.lab.example.com/ipa/xml failed request, will retry: 4001 (RPC failed at server. The host 'overcloud-controller-0.management.lab.example.com' does not exist to add a service to.).",
~~~~

I will test the fix mentioned in #c6 [1] now.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1778719#c6
This fix was also included in the THT hotfix (https://bugzilla.redhat.com/show_bug.cgi?id=1752900), which also includes a fix for tripleo-common that adds the tripleo.nova.v1.cellv2_discovery workflow; see https://code.engineering.redhat.com/gerrit/#/c/183249/. It's likely they also need openstack-tripleo-common-8.7.1-5.el7ost when using the latest THT.

Irina - can we get the version of tripleo-common they have installed? I believe that fix went out in the latest zstream release, so it's possible they already have it, but most likely they don't.
Hi Bob, I think you have a point:

~~~~
[ipetrova@supportshell sosreport-director-ctl-02525557-2019-12-10-pjtltfq]$ grep tripleo installed-rpms
ansible-tripleo-ipsec-8.1.1-0.20190513184007.7eb892c.el7ost.noarch   Tue Dec 10 11:02:14 2019
openstack-tripleo-common-8.7.1-2.el7ost.noarch                       Tue Dec 10 11:02:30 2019  <<<<<<<<<<<<<<<<<<<<<<<<<<
openstack-tripleo-common-containers-8.7.1-2.el7ost.noarch            Tue Dec 10 11:02:14 2019
openstack-tripleo-heat-templates-8.4.1-23.el7ost.noarch              Tue Dec 10 11:02:30 2019
openstack-tripleo-image-elements-8.0.3-1.el7ost.noarch               Tue Dec 10 11:02:17 2019
openstack-tripleo-puppet-elements-8.1.1-1.el7ost.noarch              Tue Dec 10 11:01:49 2019
openstack-tripleo-ui-8.3.2-3.el7ost.noarch                           Tue Dec 10 11:06:09 2019
openstack-tripleo-validations-8.5.0-2.el7ost.noarch                  Tue Dec 10 11:01:48 2019
puppet-tripleo-8.5.1-3.el7ost.noarch                                 Tue Dec 10 11:02:24 2019
python-tripleoclient-9.3.1-4.el7ost.noarch                           Tue Dec 10 11:02:31 2019
~~~~

I just checked in the portal (access.redhat.com) and v8.7.1-2 seems to be the latest. Bob, can we get them that rpm as well as part of the HF?
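As a quick sanity check on the versions above, a naive numeric comparison is enough for these simple version-release strings (this is only a sketch; authoritative RPM ordering should come from rpm itself, e.g. rpmdev-vercmp):

```python
# Rough sketch: is the installed openstack-tripleo-common older than the
# build said to carry the cellv2_discovery fix? Naive tuple comparison;
# fine for x.y.z-n strings, not a general RPM version comparator.
def nvr_key(vr):
    """'8.7.1-2.el7ost' -> (8, 7, 1, 2)"""
    vr = vr.split(".el")[0]                # drop the dist tag
    version, release = vr.rsplit("-", 1)
    return tuple(int(x) for x in version.split(".")) + (int(release),)

installed = nvr_key("8.7.1-2.el7ost")      # from the sosreport above
required = nvr_key("8.7.1-5.el7ost")       # hotfix build mentioned earlier
print(installed < required)  # -> True: the customer still needs the hotfix rpm
```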
From https://access.redhat.com/support/cases/#/case/02525557

pbabbar (Bug 1798669) Mon, Feb 24, 2020, 09:36:50 AM Eastern Standard Time
==================================================
Moving it to verified, as the overcloud was deployed successfully with 1 controller and 4 compute nodes, and the nova_cellv2_host_discover.yaml script ran only once on one of the compute hosts, as per the logs below:

~~~~
(undercloud) [stack@undercloud-0 ~]$ rpm -qa | grep openstack-tripleo-common
openstack-tripleo-common-8.7.1-12.el7ost.noarch

()[root@controller-0 /]# rpm -qa | grep openstack-tripleo
openstack-tripleo-common-container-base-8.7.1-12.el7ost.noarch

(undercloud) [stack@undercloud-0 ~]$ tail -20 overcloud_install.log
2020-02-21 21:30:51Z [overcloud.AllNodesDeploySteps.BlockStoragePostConfig]: CREATE_IN_PROGRESS state changed
2020-02-21 21:30:51Z [overcloud.AllNodesDeploySteps.BlockStoragePostConfig]: CREATE_COMPLETE state changed
2020-02-21 21:30:51Z [overcloud.AllNodesDeploySteps.ObjectStoragePostConfig]: CREATE_COMPLETE state changed
2020-02-21 21:30:51Z [overcloud.AllNodesDeploySteps.CephStoragePostConfig]: CREATE_IN_PROGRESS state changed
2020-02-21 21:30:51Z [overcloud.AllNodesDeploySteps.CephStoragePostConfig]: CREATE_COMPLETE state changed
2020-02-21 21:30:52Z [overcloud.AllNodesDeploySteps.ComputePostConfig]: CREATE_IN_PROGRESS state changed
2020-02-21 21:30:52Z [overcloud.AllNodesDeploySteps.ComputePostConfig]: CREATE_COMPLETE state changed
2020-02-21 21:30:52Z [overcloud.AllNodesDeploySteps.ControllerPostConfig]: CREATE_IN_PROGRESS state changed
2020-02-21 21:30:52Z [overcloud.AllNodesDeploySteps.ControllerPostConfig]: CREATE_COMPLETE state changed
2020-02-21 21:30:52Z [overcloud.AllNodesDeploySteps]: CREATE_COMPLETE Stack CREATE completed successfully
2020-02-21 21:30:52Z [overcloud.AllNodesDeploySteps]: CREATE_COMPLETE state changed
2020-02-21 21:30:52Z [overcloud]: CREATE_COMPLETE Stack CREATE completed successfully

Stack overcloud CREATE_COMPLETE

Started Mistral Workflow tripleo.deployment.v1.get_horizon_url. Execution ID: ea20e76f-fc70-4c97-af44-119c9026acbe
Overcloud Endpoint: http://10.0.0.109:5000/
Overcloud Horizon Dashboard URL: http://10.0.0.109:80/dashboard
Overcloud rc file: /home/stack/overcloudrc
Overcloud Deployed
~~~~
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0760