Bug 1778719 - Filtering by VIP in /extraconfig/nova_metadata/krb-service-principals/role.role.j2.yaml breaks deployment
Summary: Filtering by VIP in /extraconfig/nova_metadata/krb-service-principals/role.role.j2.yaml breaks deployment
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Hardware: All
OS: Linux
Priority: high
Severity: urgent
Target Milestone: z11
Target Release: 13.0 (Queens)
Assignee: Harald Jensås
QA Contact: Sasha Smolyak
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-12-02 11:38 UTC by Irina Petrova
Modified: 2023-12-15 17:00 UTC
CC List: 9 users

Fixed In Version: openstack-tripleo-heat-templates-8.4.1-23.el7ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-10 11:23:19 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Launchpad 1821377 0 None None None 2019-12-02 11:38:16 UTC
Launchpad 1854846 0 None None None 2019-12-02 17:57:29 UTC
OpenStack gerrit 697498 0 'None' MERGED Relax filtering in krb-service-principals jinja 2020-11-04 20:36:14 UTC
Red Hat Issue Tracker OSP-29385 0 None None None 2023-10-06 18:52:34 UTC
Red Hat Product Errata RHBA-2020:0760 0 None None None 2020-03-10 11:23:50 UTC

Description Irina Petrova 2019-12-02 11:38:16 UTC
Description of problem:

The following patch:

~~~~
Per-Role krb-service-principal for CompactServices

Filter krb-service-principals for the CompactServices
based on the networks associated with the role.

Filtering for the IndividualServices was added in previous
fix https://review.openstack.org/646005, which did'nt
fully fix the bug.

Closes-Bug: #1821377
Change-Id: Id54477ca5581e1f5fe8a09c3bc60a238d114dbb2
(cherry picked from commit 578bcb2ffa)

tags/10.6.1
~~~~

LINK: https://opendev.org/openstack/tripleo-heat-templates/commit/223ddba9137a3c9129fc33593db086518bf75a78?lang=en-US


The patch contains additional filtering (as a means to work around the Nova metadata field size limit: each field cannot exceed 256 bytes).

However, the filtering might be a little too aggressive: 

l63: {%- for network in networks if network.vip|default(false) and network.name in role.networks %} 
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^

$ cat network_data.yaml
 
#CTL-MANAGEMENT
- name: MgmtCtl
  name_lower: mgmtctl
  vip: false                 <<<<<<<<<<<<<<<<<
  ip_subnet: xxx.xxx.xxx.0/xx
  allocation_pools: [{'start': 'xxx.xxx.xxx.xxx', 'end': 'xxx.xxx.xxx.xxx'}]
  vlan: xxxxxxxx
 
< snip > 
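
To make the effect of that condition concrete, here is a small, self-contained Python/Jinja2 sketch (toy data only; this is neither the real template nor the exact merged patch) showing that the strict filter drops a role network whose vip is false, such as MgmtCtl above, while a condition that only checks role membership, along the lines of the later "Relax filtering in krb-service-principals jinja" change, keeps it:

~~~~
# Illustrative sketch only: mimics the two filter conditions on toy data
# modelled after the network_data.yaml excerpt above (network names are made up).
from jinja2 import Template

networks = [
    {"name": "InternalApi", "name_lower": "internal_api", "vip": True},
    {"name": "MgmtCtl", "name_lower": "mgmtctl", "vip": False},
]
role = {"name": "Controller", "networks": ["InternalApi", "MgmtCtl"]}

# Strict condition as in the template: the network must also carry a VIP.
strict = Template(
    "{% for network in networks if network.vip|default(false) "
    "and network.name in role.networks %}{{ network.name_lower }} {% endfor %}"
)
# Relaxed condition: keep every network attached to the role, VIP or not.
relaxed = Template(
    "{% for network in networks if network.name in role.networks %}"
    "{{ network.name_lower }} {% endfor %}"
)

print("strict :", strict.render(networks=networks, role=role))   # mgmtctl is dropped
print("relaxed:", relaxed.render(networks=networks, role=role))  # mgmtctl is kept
~~~~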


os-collect-config logs:

Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: "Warning: Could not get certificate: Execution of '/usr/bin/getcert request -I contrail -f /etc/contrail/ssl/certs/server.pem -c IPA -N CN=ctl1.xxxxxxxxxxxxxxxx -K contrail/ctl1.xxxxxxxxxxxxxxxx -D ctl1.ctlplane.xxxxxxxxxxxxxxxx -D ctl1.mgmtctl.xxxxxxxxxxxxxxxx -D < snip > -D  -D  -D  -D  -D  -C sudo docker ps -q --filter=name=\"contrail*\" | xargs -i sudo docker restart {} -w -k /etc/contrail/ssl/private/server-privkey.pem' returned 3: New signing request \"contrail\" added.",
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: "Error: /Stage[main]/Tripleo::Certmonger::Contrail/Certmonger_certificate[contrail]: Could not evaluate: Could not get certificate: Server at https://idm.xxxxxxxxxxxxxxxx/ipa/xml failed request, will retry: 4001 (RPC failed at server.  The service principal for subject alt name ctl1.mgmtctl.xxxxxxxxxxxxxxxx in certificate request does not exist).",
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: "Warning: Could not get certificate: Execution of '/usr/bin/getcert request -I httpd-mgmtctl -f /etc/pki/tls/certs/httpd/httpd-mgmtctl.crt -c IPA -N CN=ctl1.mgmtctl.xxxxxxxxxxxxxxxx -K HTTP/ctl1.mgmtctl.xxxxxxxxxxxxxxxx -D ctl1.mgmtctlxxxxxxxxxxxxxxxxx -C pkill -USR1 httpd -w -k /etc/pki/tls/private/httpd/httpd-mgmtctl.key' returned 3: New signing request \"httpd-mgmtctl\" added.",
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: "Error: /Stage[main]/Tripleo::Profile::Base::Certmonger_user/Tripleo::Certmonger::Httpd[httpd-mgmtctl]/Certmonger_certificate[httpd-mgmtctl]: Could not evaluate: Could not get certificate: Server at https://idm.xxxxxxxxxxxxxxxx/ipa/xml failed request, will retry: 4001 (RPC failed at server.  The host 'ctl1.mgmtctl.xxxxxxxxxxxxxxxx' does not exist to add a service to.).",
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: "Warning: /Stage[main]/Tripleo::Certmonger::Ca::Crl/Exec[tripleo-ca-crl]: Skipping because of failed dependencies",
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: "Warning: /Stage[main]/Tripleo::Certmonger::Ca::Crl/File[tripleo-ca-crl-file]: Skipping because of failed dependencies",
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: "Warning: /Stage[main]/Tripleo::Certmonger::Ca::Crl/Exec[tripleo-ca-crl-process-command]: Skipping because of failed dependencies",
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: "Warning: /Stage[main]/Tripleo::Certmonger::Ca::Crl/Cron[tripleo-refresh-crl-file]: Skipping because of failed dependencies"
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: ]
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: }
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx_playbook.retry
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: PLAY RECAP *********************************************************************
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: localhost                  : ok=26   changed=13   unreachable=0    failed=1
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: [2019-11-22 12:03:22,217] (heat-config) [ERROR] Error running /var/lib/heat-config/heat-config-ansible/_xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxplaybook.yaml. [2]
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: [2019-11-22 12:03:22,223] (heat-config) [INFO] Completed /usr/libexec/heat-config/hooks/ansible
Nov 22 12:03:22 ctl1.xxxxxxxxxxxxxxxx os-collect-config[11623]: [2019-11-22 12:03:22,224] (heat-config) [DEBUG] Running heat-config-notify /var/lib/heat-config/deployed/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.json < /var/lib/heat-config/deployed/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.notify.json


Version-Release number of selected component (if applicable):
RHOSP-13z9

How reproducible:
Every time.

Comment 8 Harald Jensås 2019-12-05 13:16:14 UTC
I have reproduced this issue with just the default management network enabled:

Dec 05 13:04:57 overcloud-controller-0.lab.example.com os-collect-config[9657]: "Error: /Stage[main]/Tripleo::Profile::Base::Certmonger_user/Tripleo::Certmonger::Httpd[httpd-management]/Certmonger_certificate[httpd-management]: Could not evaluate: Could not get certificate: Server at https://idm.lab.example.com/ipa/xml failed request, will retry: 4001 (RPC failed at server.  The host 'overcloud-controller-0.management.lab.example.com' does not exist to add a service to.).",


I will test the fix mentioned in #c6[1] now.



[1] https://bugzilla.redhat.com/show_bug.cgi?id=1778719#c6

Comment 15 Bob Fournier 2019-12-10 13:57:05 UTC
This fix was also included in the THT hotfix - https://bugzilla.redhat.com/show_bug.cgi?id=1752900 - which also includes a fix for tripleo-common that adds the tripleo.nova.v1.cellv2_discovery workflow, see https://code.engineering.redhat.com/gerrit/#/c/183249/. It's likely they also need to include openstack-tripleo-common-8.7.1-5.el7ost when using the latest THT.

Irina - can we get the version of tripleo-common they have installed? I believe that fix went out in the latest zstream release, so it's possible they already have it, but most likely they do not.

Comment 16 Irina Petrova 2019-12-10 14:45:34 UTC
Hi Bob,

I think you have a point:

[ipetrova@supportshell sosreport-director-ctl-02525557-2019-12-10-pjtltfq]$ grep tripleo installed-rpms 
ansible-tripleo-ipsec-8.1.1-0.20190513184007.7eb892c.el7ost.noarch Tue Dec 10 11:02:14 2019
openstack-tripleo-common-8.7.1-2.el7ost.noarch              Tue Dec 10 11:02:30 2019        <<<<<<<<<<<<<<<<<<<<<<<<<<
openstack-tripleo-common-containers-8.7.1-2.el7ost.noarch   Tue Dec 10 11:02:14 2019
openstack-tripleo-heat-templates-8.4.1-23.el7ost.noarch     Tue Dec 10 11:02:30 2019
openstack-tripleo-image-elements-8.0.3-1.el7ost.noarch      Tue Dec 10 11:02:17 2019
openstack-tripleo-puppet-elements-8.1.1-1.el7ost.noarch     Tue Dec 10 11:01:49 2019
openstack-tripleo-ui-8.3.2-3.el7ost.noarch                  Tue Dec 10 11:06:09 2019
openstack-tripleo-validations-8.5.0-2.el7ost.noarch         Tue Dec 10 11:01:48 2019
puppet-tripleo-8.5.1-3.el7ost.noarch                        Tue Dec 10 11:02:24 2019
python-tripleoclient-9.3.1-4.el7ost.noarch                  Tue Dec 10 11:02:31 2019

I just checked in the portal (access.redhat.com) and v 8.7.1-2 seems to be the latest. 

Bob, can we get them that rpm as well as part of the HF?

Comment 20 Bob Fournier 2020-03-04 22:04:50 UTC
From https://access.redhat.com/support/cases/#/case/02525557

pbabbar (Bug 1798669)
Mon, Feb 24, 2020, 09:36:50 AM Eastern Standard Time
==================================================
Moving it to verified, as the overcloud was deployed successfully with 1 controller and 4 compute nodes, and the nova_cellv2_host_discover.yaml script only ran once on one of the compute hosts, as per the logs below:

(undercloud) [stack@undercloud-0 ~]$ rpm -qa | grep openstack-tripleo-common
openstack-tripleo-common-8.7.1-12.el7ost.noarch
()[root@controller-0 /]# rpm -qa | grep openstack-tripleo
openstack-tripleo-common-container-base-8.7.1-12.el7ost.noarch

(undercloud) [stack@undercloud-0 ~]$ tail -20 overcloud_install.log 
2020-02-21 21:30:51Z [overcloud.AllNodesDeploySteps.BlockStoragePostConfig]: CREATE_IN_PROGRESS  state changed
2020-02-21 21:30:51Z [overcloud.AllNodesDeploySteps.BlockStoragePostConfig]: CREATE_COMPLETE  state changed
2020-02-21 21:30:51Z [overcloud.AllNodesDeploySteps.ObjectStoragePostConfig]: CREATE_COMPLETE  state changed
2020-02-21 21:30:51Z [overcloud.AllNodesDeploySteps.CephStoragePostConfig]: CREATE_IN_PROGRESS  state changed
2020-02-21 21:30:51Z [overcloud.AllNodesDeploySteps.CephStoragePostConfig]: CREATE_COMPLETE  state changed
2020-02-21 21:30:52Z [overcloud.AllNodesDeploySteps.ComputePostConfig]: CREATE_IN_PROGRESS  state changed
2020-02-21 21:30:52Z [overcloud.AllNodesDeploySteps.ComputePostConfig]: CREATE_COMPLETE  state changed
2020-02-21 21:30:52Z [overcloud.AllNodesDeploySteps.ControllerPostConfig]: CREATE_IN_PROGRESS  state changed
2020-02-21 21:30:52Z [overcloud.AllNodesDeploySteps.ControllerPostConfig]: CREATE_COMPLETE  state changed
2020-02-21 21:30:52Z [overcloud.AllNodesDeploySteps]: CREATE_COMPLETE  Stack CREATE completed successfully
2020-02-21 21:30:52Z [overcloud.AllNodesDeploySteps]: CREATE_COMPLETE  state changed
2020-02-21 21:30:52Z [overcloud]: CREATE_COMPLETE  Stack CREATE completed successfully

 Stack overcloud CREATE_COMPLETE 

Started Mistral Workflow tripleo.deployment.v1.get_horizon_url. Execution ID: ea20e76f-fc70-4c97-af44-119c9026acbe
Overcloud Endpoint: http://10.0.0.109:5000/
Overcloud Horizon Dashboard URL: http://10.0.0.109:80/dashboard
Overcloud rc file: /home/stack/overcloudrc
Overcloud Deployed

Comment 23 errata-xmlrpc 2020-03-10 11:23:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0760

