Bug 1956152

Summary: [OSP16.1] default libvirt certificate CAs to /etc/ipa/ca.crt
Product: Red Hat OpenStack
Reporter: shaju <shajuvk>
Component: openstack-tripleo-heat-templates
Assignee: Martin Schuppert <mschuppe>
Status: CLOSED ERRATA
QA Contact: Archit Modi <amodi>
Severity: high
Priority: high
Version: 16.1 (Train)
CC: alee, amodi, aschultz, dvd, emacchi, jpretori, mburns, mschuppe, pveiga, rdiwakar, rurena
Target Milestone: z7
Keywords: Triaged
Target Release: 16.1 (Train on RHEL 8.2)
Hardware: x86_64
OS: Linux
Fixed In Version: openstack-tripleo-heat-templates-11.3.2-1.20210628123306.29a02c1.el8ost
Doc Type: If docs needed, set a value
Clone Of:
: 1957152 (view as bug list) Environment:
Last Closed: 2021-12-09 20:19:04 UTC
Type: Bug
Bug Blocks: 1957152
Attachments:
Description Flags
sosreport-controller none

Description shaju 2021-05-03 03:25:14 UTC
Created attachment 1778809
sosreport-controller

Description of problem:

Overcloud deployment fails on the controller and compute nodes when deploying with FreeIPA.

Puppet uses /etc/pki/CA/certs/vnc.crt instead of /etc/ipa/ca.crt.

See the puppet code:
[root@overcloud-novacompute-0 heat-admin]# vi /etc/puppet/modules/tripleo/manifests/certmonger/ca/libvirt_vnc.pp
#
class tripleo::certmonger::ca::libvirt_vnc(
  $origin_ca_pem = undef,
  $certmonger_ca = hiera('certmonger_ca_vnc', 'local'),
){
  if $origin_ca_pem {
    $ensure_file = 'link'
  } else {
    $ensure_file = 'absent'
  }
  file { '/etc/pki/libvirt-vnc/ca-cert.pem':
    ensure => $ensure_file,
    mode   => '0644',
    target => $origin_ca_pem,
  }
 
To me this looks like an error: origin_ca_pem is provided by tht and it is /etc/pki/libvirt-vnc/ca-cert.pem,
so puppet tries to create a symlink to the same file without checking whether origin_ca_pem is different.


The qemu cert looks like a similar situation. The only difference from vnc is that puppet doesn't fail, but the symlink it creates is broken:
[root@overcloud-novacompute-0 heat-admin]# ls -lh /etc/pki/qemu/ca-cert.pem
lrwxrwxrwx. 1 root root 26 May  1 16:47 /etc/pki/qemu/ca-cert.pem -> /etc/pki/CA/certs/qemu.pem
[root@overcloud-novacompute-0 heat-admin]# file /etc/pki/qemu/ca-cert.pem
/etc/pki/qemu/ca-cert.pem: broken symbolic link to /etc/pki/CA/certs/qemu.pem
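
The same check can be scripted across several links. This is only a minimal sketch: the helper name check_link is hypothetical and not part of tripleo or puppet. It relies on the fact that `-L` is true for any symlink while `-e` follows the link and fails when the target is missing.

```shell
# check_link PATH
# Prints "ok" for a symlink whose target exists, "broken" for a symlink
# whose target is missing, and "not-a-link" for anything else.
# (check_link is a hypothetical helper name used for illustration.)
check_link() {
    if [ ! -L "$1" ]; then
        echo "not-a-link"
    elif [ -e "$1" ]; then
        # -e dereferences the symlink, so this branch means the target exists
        echo "ok"
    else
        echo "broken"
    fi
}
```

For example, `check_link /etc/pki/qemu/ca-cert.pem` would be expected to print "broken" on the node shown above.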
 

======

fatal: [overcloud-controller-2]: FAILED! => {"ansible_job_id": "2169753008.25540", "attempts": 43, "changed": true, "cmd": "set -o pipefail; puppet apply  --modulepath=/etc/puppet/modules:/opt/stack/puppet-modules:/usr/share/openstack-puppet/modules --detailed-exitcodes --summarize --color=false   /var/lib/tripleo-config/puppet_step_config.pp 2>&1 | logger -s -t puppet-user", "delta": "0:02:17.047808", "end": "2021-05-01 16:49:43.656355", "failed_when_result": true, "finished": 1, "msg": "non-zero return code", "rc": 6, "start": "2021-05-01 16:47:26.608547", "stderr": "<13>May  1 16:47:26 puppet-user: Warning: The function 'hiera' is deprecated in favor of using 'lookup'. See https://puppet.com/docs/puppet/5.5/deprecated_language.html\\n   (file & line not available)\n<13>May  1 16:47:32 puppet-user: Warning: /etc/puppet/hiera.yaml: Use of 'hiera.yaml' version 3 is deprecated. It should be converted to version 5\n<13>May  1 16:47:32 puppet-user:    (file: /etc/puppet/hiera.yaml)\n<13>May  1 16:47:32 puppet-user: Warning: Undefined variable '::deploy_config_name'; \\n   (file & line not available)\n<13>May  1 16:47:32 puppet-user: Warning: ModuleLoader: module 'tripleo' has unresolved dependencies - it will only see those that are resolved. Use 'puppet module list --tree' to see information about modules\\n   (file & line not available)\n<13>May  1 16:47:32 puppet-user: Warning: Undefined variable '::nova::params::vncproxy_service_name'; class nova::params has not been evaluated\\n   (file & line not available)\n<13>May  1 16:47:32 puppet-user: Warning: ModuleLoader: module 'nova' has unresolved dependencies - it will only see those that are resolved. 

How reproducible:

Deploy OSP 16.1 with FreeIPA.

Comment 1 shaju 2021-05-03 03:29:29 UTC
This is because of the hiera value provided by openstack tht:
  InternalTLSVncCAFile:
    default: '/etc/pki/CA/certs/vnc.crt'
    type: string
    description: Specifies the CA cert to use for VNC TLS.
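
If this default is what leaves libvirt pointing at a CA file that never appears, one possible override is a Heat environment file that sets the parameter explicitly. The sketch below is an illustration only: the helper name write_ca_env and the output file name are hypothetical, and /etc/ipa/ca.crt is taken from the bug summary as the intended value.

```shell
# write_ca_env FILE
# Writes a Heat environment file that overrides InternalTLSVncCAFile.
# (write_ca_env is a hypothetical helper; /etc/ipa/ca.crt is assumed
# from the bug summary to be the desired CA path.)
write_ca_env() {
    cat > "$1" <<'EOF'
parameter_defaults:
  InternalTLSVncCAFile: /etc/ipa/ca.crt
EOF
}
```

The generated file would then be passed to the deploy command with -e alongside the other environment files. Whether other libvirt CA parameters need the same override is not covered by this sketch.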

Comment 3 Martin Schuppert 2021-05-03 11:09:32 UTC
> To me this looks like an error: origin_ca_pem is provided by tht and it is /etc/pki/libvirt-vnc/ca-cert.pem,
> so puppet tries to create a symlink to the same file without checking whether origin_ca_pem is different.

The libvirt_vnc origin_ca_pem is usually not /etc/pki/libvirt-vnc/ca-cert.pem, it is /etc/pki/CA/certs/vnc.crt:

[root@compute-0 ~]# grep origin_ca_pem /etc/puppet/hieradata/*
/etc/puppet/hieradata/service_configs.json:    "tripleo::certmonger::ca::libvirt::origin_ca_pem": "/etc/ipa/ca.crt",                                                                                                
/etc/puppet/hieradata/service_configs.json:    "tripleo::certmonger::ca::libvirt_vnc::origin_ca_pem": "/etc/pki/CA/certs/vnc.crt", 

and /etc/pki/libvirt-vnc/ca-cert.pem is a link to /etc/pki/CA/certs/vnc.crt:

[root@compute-0 ~]# ll /etc/pki/libvirt-vnc/ca-cert.pem
lrwxrwxrwx. 1 root root 25 May  3 10:21 /etc/pki/libvirt-vnc/ca-cert.pem -> /etc/pki/CA/certs/vnc.crt

CA cert /etc/pki/CA/certs/vnc.crt is retrieved from IPA using tripleo::certmonger::libvirt_vnc class:
[root@compute-0 ~]# grep -A 8 libvirt_vnc_certificates_specs /etc/puppet/hieradata/*
/etc/puppet/hieradata/service_configs.json:    "libvirt_vnc_certificates_specs": {
/etc/puppet/hieradata/service_configs.json-        "libvirt-vnc-server-cert": {
/etc/puppet/hieradata/service_configs.json-            "cacertfile": "/etc/pki/CA/certs/vnc.crt",
/etc/puppet/hieradata/service_configs.json-            "hostname": "%{hiera('fqdn_internal_api')}",
/etc/puppet/hieradata/service_configs.json-            "principal": "libvirt-vnc/%{hiera('fqdn_internal_api')}",
/etc/puppet/hieradata/service_configs.json-            "service_certificate": "/etc/pki/libvirt-vnc/server-cert.pem",
/etc/puppet/hieradata/service_configs.json-            "service_key": "/etc/pki/libvirt-vnc/server-key.pem"
/etc/puppet/hieradata/service_configs.json-        }
/etc/puppet/hieradata/service_configs.json-    },

From the logs it looks like getting the ca cert /etc/pki/CA/certs/vnc.crt failed and as a result the link /etc/pki/libvirt-vnc/ca-cert.pem was not created:
May  1 16:48:53 overcloud-controller-0 puppet-user[25952]: Error: 'test -f /etc/pki/CA/certs/vnc.crt' returned 1 instead of one of [0]
May  1 16:48:53 overcloud-controller-0 puppet-user[25952]: Error: /Stage[main]/Tripleo::Profile::Base::Certmonger_user/Tripleo::Certmonger::Libvirt_vnc[libvirt-vnc-client-cert]/Exec[/etc/pki/CA/certs/vnc.crt]/returns: change from 'notrun' to ['0'] failed: 'test -f /etc/pki/CA/certs/vnc.crt' returned 1 instead of one of [0]
May  1 16:48:53 overcloud-controller-0 puppet-user[25952]: Notice: /Stage[main]/Tripleo::Profile::Base::Certmonger_user/Tripleo::Certmonger::Libvirt_vnc[libvirt-vnc-client-cert]/File[/etc/pki/CA/certs/vnc.crt]: Dependency Exec[/etc/pki/CA/certs/vnc.crt] has failures: true
May  1 16:48:53 overcloud-controller-0 puppet-user[25952]: Warning: /Stage[main]/Tripleo::Profile::Base::Certmonger_user/Tripleo::Certmonger::Libvirt_vnc[libvirt-vnc-client-cert]/File[/etc/pki/CA/certs/vnc.crt]: Skipping because of failed dependencies
May  1 16:48:53 overcloud-controller-0 puppet-user[25952]: Warning: /Stage[main]/Tripleo::Certmonger::Ca::Libvirt_vnc/File[/etc/pki/libvirt-vnc/ca-cert.pem]: Skipping because of failed dependencies

What does the libvirt_vnc_certificates_specs puppet hieradata look like in this environment?

Comment 4 Ade Lee 2021-05-03 17:29:34 UTC
I looked at the logs on supportshell, and determined the following:

1.  The error is definitely that the CA cert is not being written by certmonger in time.
2.  The patch that Brendan points to (https://github.com/openstack/puppet-tripleo/commit/2c241e393481d73161b8534bbeba388731112cc7)
    is already present, and in fact you can see that the puppet call to test for the existence of the file fails after 1 minute,
    i.e. 60 attempts one second apart.

<13>May  1 16:47:39 puppet-user: Notice: /Stage[main]/Tripleo::Profile::Base::Certmonger_user/Tripleo::Certmonger::Libvirt_vnc[libvirt-vnc-server-cert]/Certmonger_certificate[libvirt-vnc-server-cert]/ensure: created
<13>May  1 16:48:40 puppet-user: Error: 'test -f /etc/pki/CA/certs/vnc.crt' returned 1 instead of one of [0]
<13>May  1 16:48:40 puppet-user: Error: /Stage[main]/Tripleo::Profile::Base::Certmonger_user/Tripleo::Certmonger::Libvirt_vnc[libvirt-vnc-server-cert]/Exec[/etc/pki/CA/certs/vnc.crt]/returns: change from 'notrun' to ['0'] failed: 'test -f /etc/pki/CA/certs/vnc.crt' returned 1 instead of one of [0]

3. The certmonger request is correct and has all the right values.
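
The retry behaviour described in item 2 can be approximated in shell as follows. This is a sketch of the polling pattern only, not the puppet code from the patch; the function name wait_for_file and its defaults (60 tries, 1 second apart, matching the log above) are assumptions for illustration.

```shell
# wait_for_file PATH [TRIES] [DELAY]
# Polls with 'test -f' until PATH exists or TRIES attempts are used up.
# Returns 0 if the file appeared, 1 otherwise.
# (wait_for_file is a hypothetical helper; defaults mirror the
# "60 attempts one second apart" seen in the log.)
wait_for_file() {
    local path="$1"
    local tries="${2:-60}"
    local delay="${3:-1}"
    local attempt=1
    while :; do
        [ -f "$path" ] && return 0
        [ "$attempt" -ge "$tries" ] && return 1
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

With the defaults, `wait_for_file /etc/pki/CA/certs/vnc.crt` gives up after roughly a minute, which is the window certmonger missed here.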

The problem here is that certmonger doesn't behave the way we expect it to.  When we make the cert request and ask for the CA cert to be retrieved, it issues the cert and schedules the
CA cert to be retrieved asynchronously, even if you specify -w to wait for the cert.  -w blocks until the cert has been retrieved, but does not wait for the CA cert.

You can always force the retrieval to happen by restarting certmonger, and this has helped in some cases in the past, but is a less than ideal solution.
This is a bug in certmonger IMHO, in that we should expect the CA cert to be returned synchronously along with the cert if we specify -w.
I'll file a BZ for this.

The BZ for certmonger is unlikely to be fixed anytime soon though, so we need to look at other options.

Certainly, for now, using the workaround of setting the template parameters mentioned above is a good one.
But for a more permanent fix, we should rather think about why we're asking certmonger to retrieve the CA cert in the first place,
given that the CA cert is already on the system.  Why not just point to, or link to, what is there?
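
As a rough illustration of the "link to what is there" idea (not the fix that eventually shipped), a helper could refuse to create the link unless the CA file is already present, which avoids the broken symlink seen for qemu. The name link_ca is hypothetical, and the paths in the usage note are taken from this report.

```shell
# link_ca SOURCE LINK
# Points LINK at an existing CA file. Fails without creating anything
# when SOURCE is missing, so no dangling symlink is left behind.
# (link_ca is a hypothetical helper used for illustration.)
link_ca() {
    local src="$1"
    local link="$2"
    if [ ! -f "$src" ]; then
        echo "CA file $src not found" >&2
        return 1
    fi
    mkdir -p "$(dirname "$link")"
    # -f replaces an existing link, -n avoids descending into a link to a directory
    ln -sfn "$src" "$link"
}
```

Under these assumptions the VNC case would be `link_ca /etc/ipa/ca.crt /etc/pki/libvirt-vnc/ca-cert.pem`, with no certmonger round trip for the CA cert.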

I'll be happy to work with Martin and the Compute DFG to iron out a better way to do this.

Comment 7 Priscila 2021-05-04 14:33:57 UTC
Regarding comment#5

Comment 25 Rafael Urena 2021-11-15 22:28:09 UTC
I've created https://access.redhat.com/solutions/6490701 as a KCS for this issue.

Rafael Ureña

Comment 33 errata-xmlrpc 2021-12-09 20:19:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.7 (Train) bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3762