Bug 1895758 - [FFU 13-16.1][Ceph] fails with nova-join and ceph-rgw
Summary: [FFU 13-16.1][Ceph] fails with nova-join and ceph-rgw
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
urgent
high
Target Milestone: z4
: 16.1 (Train on RHEL 8.2)
Assignee: Dave Wilde
QA Contact: Jeremy Agee
URL:
Whiteboard:
Depends On: 1924106
Blocks:
 
Reported: 2020-11-09 01:09 UTC by David Sedgmen
Modified: 2024-03-25 16:59 UTC
8 users

Fixed In Version: openstack-tripleo-heat-templates-11.3.2-1.20210104205656.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-17 15:35:39 UTC
Target Upstream Version:
Embargoed:


Attachments
setup-ipa-client.log (18.11 KB, text/plain)
2020-11-11 04:42 UTC, David Sedgmen


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 770212 0 None MERGED Add missing IPA services for queens to train upgrades 2021-02-15 13:58:49 UTC
Red Hat Knowledge Base (Solution) 5581281 0 None None None 2020-11-17 22:51:18 UTC
Red Hat Product Errata RHBA-2021:0817 0 None None None 2021-03-17 15:36:03 UTC

Description David Sedgmen 2020-11-09 01:09:26 UTC
Description of problem:
This seems to fail because of the addition of "/usr/share/openstack-puppet/modules/tripleo/manifests/certmonger/ceph_rgw.pp" in OSP 16.
Now it tries to create a certificate for ceph rgw, which fails because the host has insufficient permission to add a krbprincipal for the service in the storage DNS domain.

~~~
 Insufficient access:  Insufficient 'add' privilege to add the entry  'krbprincipalname=ceph_rgw/controller-1.storage.redhat.local,cn=services,cn=accounts,dc=redhat,dc=local'
~~~
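
A quick way to confirm the missing prerequisite is to check whether the service principal already exists in IdM. This is only a diagnostic sketch, assuming it is run from a host enrolled in the IdM realm with admin credentials, using the principal name from the error above:

~~~
# Sketch: check whether the ceph_rgw service principal exists in IdM.
# Assumes an enrolled host and an admin Kerberos ticket.
kinit admin
ipa service-show ceph_rgw/controller-1.storage.redhat.local
# If the principal is missing, certmonger has to add it itself, which is the
# step that fails with the 'Insufficient add privilege' error above.
~~~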

Version-Release number of selected component (if applicable):


How reproducible:

Every time

Steps to Reproduce:
1. Deploy RHOSP 13 Integrated with IdM using novajoin https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/integrate_with_identity_service/idm-novajoin

2. Follow the  Framework for Upgrades (13 to 16.1) documentation.

Actual results:

"<13>Oct 23 14:46:24 puppet-user: Debug: /Stage[main]/Tripleo::Certmonger::Novnc_pr
oxy/Certmonger_certificate[novnc-proxy]:  The container Class[Tripleo::Certmonger::Novnc_proxy] will propagate my  refresh event", "<13>Oct 23 14:46:24 puppet-user: Debug:  Class[Tripleo::Certmonger::Novnc_proxy]: The container Stage[main] w
ill  propagate my refresh event", "<13>Oct 23 14:46:24 puppet-user:  Notice:  /Stage[main]/Tripleo::Certmonger::Ceph_rgw/Certmonger_certificate[ceph_rgw]/ensure:  created", "<13>Oct 23 14:46:24 puppet-user: Debug: Issuing  getcert command with
 args:  [\"request\", \"-I\", \"ceph_rgw\", \"-f\",  \"/etc/pki/tls/certs/ceph_rgw.crt\", \"-c\", \"IPA\", \"-N\",  \"CN=controller-0.storage.redhat.local\", \"-K\",  \"ceph_rgw/controller-0.storage.redhat.local\", \"-D\",  \"controller-0.stor
age.redhat.local\",  \"-C\", \"/usr/bin/certmonger-rgw-refresh.sh\", \"-w\", \"-k\",  \"/etc/pki/tls/private/ceph_rgw.key\"]", "<13>Oct 23 14:46:24  puppet-user: Debug: Executing: '/usr/bin/getcert request -I ceph_rgw -f  /etc/pki/tls/certs/c
eph_rgw.crt  -c IPA -N CN=controller-0.storage.redhat.local -K  ceph_rgw/controller-0.storage.redhat.local -D  controller-0.storage.redhat.local -C /usr/bin/certmonger-rgw-refresh.sh  -w -k /etc/pki/tls/private/ceph_rgw.key'", "<13>Oct 23 14:
46:24  puppet-user: Warning: Could not get certificate: Execution of  '/usr/bin/getcert request -I ceph_rgw -f /etc/pki/tls/certs/ceph_rgw.crt  -c IPA -N CN=controller-0.storage.redhat.local -K  ceph_rgw/controller-0.storage.redhat.local -D c
ontroller-0.storage.redhat.local  -C /usr/bin/certmonger-rgw-refresh.sh -w -k  /etc/pki/tls/private/ceph_rgw.key' returned 2: New signing request  \"ceph_rgw\" added.", "<13>Oct 23 14:46:24 puppet-user: Debug:  Executing: '/usr/bin/getcert li
st  -i ceph_rgw'", "<13>Oct 23 14:46:24 puppet-user: Error:  /Stage[main]/Tripleo::Certmonger::Ceph_rgw/Certmonger_certificate[ceph_rgw]:  Could not evaluate: Could not get certificate: Server at https://freeipa-0.redhat.local/ipa/xml denied
 our  request, giving up: 2100 (RPC failed at server.  Insufficient access:  Insufficient 'add' privilege to add the entry  'krbprincipalname=ceph_rgw/controller-0.storage.redhat.local,cn=services,cn=accounts,dc=redhat,dc=local'
.).",  "<13>Oct 23 14:46:24 puppet-user: Notice:  /Stage[main]/Tripleo::Profile::Base::Certmonger_user/Tripleo::Certmonger::Httpd[httpd-ctlplane]/Certmonger_certificate[httpd-ctlplane]/principal:  defined 'principal' as 'HTTP/controller-0.ct
lplane.redhat.local'",  "<13>Oct 23 14:46:24 puppet-user: Debug: Executing:  '/usr/bin/getcert resubmit -i httpd-ctlplane -f  /etc/pki/tls/certs/httpd/httpd-ctlplane.crt -c IPA -N  CN=controller-0.ctlplane.redhat.local -K HTTP/controller-0.ct
lplane.redhat.local -D controller-0.ctlplane.redhat.local -C pkill -USR1 httpd -w'", "

Expected results:

For the host to have permission to create the certificates it needs.

Additional info:

Worked around this by adding the DNS record and the service, and then allowing controller-n.redhat.local to manage the service.

[root@controller-0 ~]# kinit admin
Password for admin: 

[root@controller-0 ~]# ipa dnsrecord-add
Record name: controller-0
Zone name: storage.redhat.local
Please choose a type of DNS resource record to be added
The most common types for this type of zone are: A, AAAA

DNS resource record type: A
A IP Address: 172.17.3.132
  Record name: controller-0
  A record: 172.17.3.132

[root@controller-0 ~]# ipa service-add ceph_rgw/controller-0.storage.redhat.local
-----------------------------------------------------------------------
Added service "ceph_rgw/controller-0.storage.redhat.local"
-----------------------------------------------------------------------
  Principal name: ceph_rgw/controller-0.storage.redhat.local
  Principal alias: ceph_rgw/controller-0.storage.redhat.local
  Managed by: controller-0.storage.redhat.local

[root@controller-0 ~]# ipa service-add-host --hosts controller-0.redhat.local ceph_rgw/controller-0.storage.redhat.local
  Principal name: ceph_rgw/controller-0.storage.redhat.local
  Principal alias: ceph_rgw/controller-0.storage.redhat.local
  Managed by: controller-0.storage.redhat.local, controller-0.redhat.local
-------------------------
Number of members added 1
-------------------------

[root@controller-0 ~]# /usr/bin/getcert resubmit -i ceph_rgw
Resubmitting "ceph_rgw" to "IPA".
[root@controller-0 ~]# /usr/bin/getcert list -i ceph_rgw -v
Number of certificates and requests being tracked: 17.
Request ID 'ceph_rgw':
    status: MONITORING
    stuck: no
    key pair storage: type=FILE,location='/etc/pki/tls/private/ceph_rgw.key'
    certificate: type=FILE,location='/etc/pki/tls/certs/ceph_rgw.crt'
    CA: IPA
    issuer: CN=Certificate Authority,O=REDHAT.LOCAL
    subject: CN=controller-0.storage.redhat.local,O=REDHAT.LOCAL
    expires: 2022-10-29 01:00:25 UTC
    dns: controller-0.storage.redhat.local
    principal name: ceph_rgw/controller-0.storage.redhat.local
    key usage: digitalSignature,nonRepudiation,keyEncipherment,dataEncipherment
    eku: id-kp-serverAuth,id-kp-clientAuth
    pre-save command: 
    post-save command: /usr/bin/certmonger-rgw-refresh.sh
    track: yes
    auto-renew: yes
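
For reference, the interactive workaround above can also be run non-interactively per controller. This is a minimal sketch using the controller-0 values shown above; adjust the hostname and IP for controller-1 and controller-2.

~~~
# Sketch of the workaround above as non-interactive commands (controller-0 values).
kinit admin
ipa dnsrecord-add storage.redhat.local controller-0 --a-rec=172.17.3.132
ipa service-add ceph_rgw/controller-0.storage.redhat.local
ipa service-add-host ceph_rgw/controller-0.storage.redhat.local \
    --hosts=controller-0.redhat.local
/usr/bin/getcert resubmit -i ceph_rgw
~~~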

Comment 2 Ade Lee 2020-11-09 19:41:25 UTC
It is novajoin's responsibility to add the missing services etc. in IPA before certmonger requests the certificates.
The question, then, is why novajoin is apparently not being triggered to do this.

First off, we need to confirm that the metadata for the server has been updated. David -- Can you provide the metadata for the server once the FFU
updates the overcloud? 

Assuming that the metadata has been updated, I think I might see why novajoin might not be triggered:

TLS-E using novajoin is triggered by the following template:
https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/ipa/ipaclient-baremetal-ansible.yaml

and, in particular, in https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/ipa/ipaclient-baremetal-ansible.yaml#L185-L194
The line that triggers updates by novajoin is: 

https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/ipa/ipaclient-baremetal-ansible.yaml#L91

which gets the config data from the config drive, causing a call to novajoin as a dynamic vendor data service.  Novajoin would then look at the (updated) metadata
and add services/hosts etc. as needed.

However, as you can see at https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/ipa/ipaclient-baremetal-ansible.yaml#L194,
this only takes place when the server is not already an IPA client, which would be the case, for instance, if we were attempting a brownfield deployment.
But in this case we have a server which is already an IPA client, as all the other services were already enrolled in IPA, and so any further updates were skipped.

We'd need to examine this logic to see if we can be smarter about the ipa-client check.

We should note that this is not a problem if you choose to migrate to tripleo-ipa instead, because the correct services etc. are created beforehand as an undercloud task:

https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/ipa/ipaservices-baremetal-ansible.yaml#L98-L122
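
To illustrate the kind of check involved, the template's "is this already an ipa client" condition is roughly equivalent to testing for an existing IPA client configuration on the node. This is only a sketch of the logic described above, not the template's actual implementation:

~~~
# Rough shell equivalent of the "already an IPA client" short-circuit described above
# (assumption: enrollment is detected via the IPA client config file on the node).
if [ -f /etc/ipa/default.conf ]; then
    echo "already enrolled: setup-ipa-client.sh (and the novajoin vendor-data call) is skipped"
else
    echo "not enrolled: the template would run the IPA client setup"
fi
~~~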

Comment 3 David Sedgmen 2020-11-09 21:59:47 UTC
It looks like the novajoin_notifier was down because of a misconfigured transport_url.

Looking at the original, it had the correct rabbitmq user and password:
~~~
less /etc/novajoin/join.conf.rpmsave 

transport_url=rabbit://d4285439706d8dfc62f3cd78ff751b1a599baf51:4f238728a84593674eb967100e4ec06bb89995ed.24.1//
~~~

But the upgrade configured the transport_url with the user guest.

~~~
less /var/lib/config-data/puppet-generated/novajoin/etc/novajoin/join.conf
transport_url=rabbit://guest:4f238728a84593674eb967100e4ec06bb89995ed.redhat.local:5672/?ssl=0
~~~

After reverting this, the service is back up.
~~~
2020-11-09 21:49:00.134 7 ERROR join     (class_id, method_id), ConnectionError)
2020-11-09 21:49:00.134 7 ERROR join amqp.exceptions.AccessRefused: (0, 0): (403) ACCESS_REFUSED - Login was refused using authentication mechanism AMQPLAIN. For details see the broker logfile.
2020-11-09 21:49:00.134 7 ERROR join 
2020-11-09 21:49:04.157 7 INFO novajoin.notifications [-] Starting
2020-11-09 21:49:07.093 7 INFO novajoin.notifications [-] [3dc41ebe-db20-405b-8084-573741ae068b] compute instance update for controller-0.redhat.local
2020-11-09 21:49:07.123 7 INFO novajoin.notifications [-] [18eedb92-727c-458c-86dd-326988be8c59] compute instance update for controller-2.redhat.local
2020-11-09 21:49:08.113 7 INFO novajoin.notifications [-] [79a4d964-7e1c-4a05-9de8-82e71cfd299f] compute instance update for controller-1.redhat.local
2020-11-09 21:49:26.229 7 INFO novajoin.notifications [-] Starting
~~~
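
A quick way to spot this regression is to diff the pre-upgrade and upgraded configuration files and look at transport_url. A minimal sketch, using the paths from this comment:

~~~
# Compare the pre-upgrade novajoin config with the one generated by the upgrade
# and show how transport_url changed (paths as in the comment above).
sudo diff /etc/novajoin/join.conf.rpmsave \
    /var/lib/config-data/puppet-generated/novajoin/etc/novajoin/join.conf | grep -i transport_url
~~~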

Comment 4 Ade Lee 2020-11-10 18:50:12 UTC
David, 

That's good to know, but it doesn't necessarily affect this situation. The notifier is used primarily to clean up IPA when servers are deleted.
The question I had was - what is the metadata for the nodes?

That is, what is the output of "openstack server show <<uuid of controller-0>>", and maybe for the other controllers too?
I want to see what's in the server metadata. The server metadata should contain, for instance, ipa_enroll: True, but also the lists of compact and managed services.
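
The requested output can be collected for all three controllers in one go. A sketch, assuming it is run on the undercloud with stackrc sourced and the node names used in this deployment:

~~~
# Collect the server metadata (properties) for each controller from the undercloud.
source ~/stackrc
for node in controller-0 controller-1 controller-2; do
    openstack server show "$node" -f yaml -c name -c properties
done
~~~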

Comment 5 David Sedgmen 2020-11-11 02:11:24 UTC
OS-DCF:diskConfig: MANUAL
OS-EXT-AZ:availability_zone: nova
OS-EXT-SRV-ATTR:host: undercloud-0.redhat.local
OS-EXT-SRV-ATTR:hypervisor_hostname: 2cd08d8f-cf30-499e-89bb-ad88e89d219c
OS-EXT-SRV-ATTR:instance_name: instance-0000006d
OS-EXT-STS:power_state: Running
OS-EXT-STS:task_state: null
OS-EXT-STS:vm_state: active
OS-SRV-USG:launched_at: '2020-10-12T08:26:16.000000'
OS-SRV-USG:terminated_at: null
accessIPv4: ''
accessIPv6: ''
addresses: ctlplane=192.168.24.33
config_drive: 'True'
created: '2020-10-12T08:20:51Z'
flavor: controller (2cb99cee-b0c6-40cf-a105-f0b5aa8babd6)
hostId: e89609a4acca37ece09a0a31f5d2983c56edbd655240dad21c90de9d
id: 29cacbd7-c092-4e5a-875b-de81c66af778
image: overcloud-full_20201003T070010Z (bea5fd78-ad6b-4aa0-914b-395eea196f96)
key_name: default
name: controller-0
progress: 0
project_id: 61d63d0900df4ceaaa9ca08353af64a8
properties: compact_service_HTTP='["ctlplane", "storage", "storagemgmt", "internalapi",
  "external"]', compact_service_ceph_rgw='["storage"]', compact_service_haproxy='["ctlplane",
  "storage", "storagemgmt", "internalapi"]', compact_service_libvirt-vnc='["internalapi"]',
  compact_service_mysql='["internalapi"]', compact_service_neutron='["internalapi"]',
  compact_service_novnc-proxy='["internalapi"]', compact_service_rabbitmq='["internalapi"]',
  compact_service_redis='["internalapi"]', ipa_enroll='true', managed_service_haproxyctlplane='haproxy/overcloud.ctlplane.redhat.local',
  managed_service_haproxyexternal='haproxy/overcloud.redhat.local', managed_service_haproxyinternal_api='haproxy/overcloud.internalapi.redhat.local',
  managed_service_haproxystorage='haproxy/overcloud.storage.redhat.local', managed_service_haproxystorage_mgmt='haproxy/overcloud.storagemgmt.redhat.local',
  managed_service_mysqlinternal_api='mysql/overcloud.internalapi.redhat.local', managed_service_redisinternal_api='redis/overcloud.internalapi.redhat.local'
security_groups: name='default'
status: ACTIVE
updated: '2020-10-23T03:45:16Z'
user_id: 074c7d66e40243908472df0e417cede6
volumes_attached: ''


OS-DCF:diskConfig: MANUAL
OS-EXT-AZ:availability_zone: nova
OS-EXT-SRV-ATTR:host: undercloud-0.redhat.local
OS-EXT-SRV-ATTR:hypervisor_hostname: 85c785ae-7fdd-4efd-b087-6078263e60f4
OS-EXT-SRV-ATTR:instance_name: instance-0000006a
OS-EXT-STS:power_state: Running
OS-EXT-STS:task_state: null
OS-EXT-STS:vm_state: active
OS-SRV-USG:launched_at: '2020-10-12T08:23:38.000000'
OS-SRV-USG:terminated_at: null
accessIPv4: ''
accessIPv6: ''
addresses: ctlplane=192.168.24.35
config_drive: 'True'
created: '2020-10-12T08:20:50Z'
flavor: controller (2cb99cee-b0c6-40cf-a105-f0b5aa8babd6)
hostId: e89609a4acca37ece09a0a31f5d2983c56edbd655240dad21c90de9d
id: c1d5e113-e8d6-4412-bc8a-b12b0e3cebae
image: overcloud-full_20201003T070010Z (bea5fd78-ad6b-4aa0-914b-395eea196f96)
key_name: default
name: controller-2
progress: 0
project_id: 61d63d0900df4ceaaa9ca08353af64a8
properties: compact_service_HTTP='["ctlplane", "storage", "storagemgmt", "internalapi",
  "external"]', compact_service_ceph_rgw='["storage"]', compact_service_haproxy='["ctlplane",
  "storage", "storagemgmt", "internalapi"]', compact_service_libvirt-vnc='["internalapi"]',
  compact_service_mysql='["internalapi"]', compact_service_neutron='["internalapi"]',
  compact_service_novnc-proxy='["internalapi"]', compact_service_rabbitmq='["internalapi"]',
  compact_service_redis='["internalapi"]', ipa_enroll='true', managed_service_haproxyctlplane='haproxy/overcloud.ctlplane.redhat.local',
  managed_service_haproxyexternal='haproxy/overcloud.redhat.local', managed_service_haproxyinternal_api='haproxy/overcloud.internalapi.redhat.local',
  managed_service_haproxystorage='haproxy/overcloud.storage.redhat.local', managed_service_haproxystorage_mgmt='haproxy/overcloud.storagemgmt.redhat.local',
  managed_service_mysqlinternal_api='mysql/overcloud.internalapi.redhat.local', managed_service_redisinternal_api='redis/overcloud.internalapi.redhat.local'
security_groups: name='default'
status: ACTIVE
updated: '2020-10-23T03:45:16Z'
user_id: 074c7d66e40243908472df0e417cede6
volumes_attached: ''


OS-DCF:diskConfig: MANUAL
OS-EXT-AZ:availability_zone: nova
OS-EXT-SRV-ATTR:host: undercloud-0.redhat.local
OS-EXT-SRV-ATTR:hypervisor_hostname: e003381f-95b3-455f-a114-56390ebdbd38
OS-EXT-SRV-ATTR:instance_name: instance-0000006c
OS-EXT-STS:power_state: Running
OS-EXT-STS:task_state: null
OS-EXT-STS:vm_state: active
OS-SRV-USG:launched_at: '2020-10-12T08:23:33.000000'
OS-SRV-USG:terminated_at: null
accessIPv4: ''
accessIPv6: ''
addresses: ctlplane=192.168.24.20
config_drive: 'True'
created: '2020-10-12T08:20:50Z'
flavor: controller (2cb99cee-b0c6-40cf-a105-f0b5aa8babd6)
hostId: e89609a4acca37ece09a0a31f5d2983c56edbd655240dad21c90de9d
id: e5d1b72b-c114-4cfd-baa6-b2b7bb4d16de
image: overcloud-full_20201003T070010Z (bea5fd78-ad6b-4aa0-914b-395eea196f96)
key_name: default
name: controller-1
progress: 0
project_id: 61d63d0900df4ceaaa9ca08353af64a8
properties: compact_service_HTTP='["ctlplane", "storage", "storagemgmt", "internalapi",
  "external"]', compact_service_ceph_rgw='["storage"]', compact_service_haproxy='["ctlplane",
  "storage", "storagemgmt", "internalapi"]', compact_service_libvirt-vnc='["internalapi"]',
  compact_service_mysql='["internalapi"]', compact_service_neutron='["internalapi"]',
  compact_service_novnc-proxy='["internalapi"]', compact_service_rabbitmq='["internalapi"]',
  compact_service_redis='["internalapi"]', ipa_enroll='true', managed_service_haproxyctlplane='haproxy/overcloud.ctlplane.redhat.local',
  managed_service_haproxyexternal='haproxy/overcloud.redhat.local', managed_service_haproxyinternal_api='haproxy/overcloud.internalapi.redhat.local',
  managed_service_haproxystorage='haproxy/overcloud.storage.redhat.local', managed_service_haproxystorage_mgmt='haproxy/overcloud.storagemgmt.redhat.local',
  managed_service_mysqlinternal_api='mysql/overcloud.internalapi.redhat.local', managed_service_redisinternal_api='redis/overcloud.internalapi.redhat.local'
security_groups: name='default'
status: ACTIVE
updated: '2020-10-23T03:45:16Z'
user_id: 074c7d66e40243908472df0e417cede6
volumes_attached: ''

Comment 6 David Sedgmen 2020-11-11 04:40:00 UTC
I don't believe this has been changed or rerun since the original OSP 13 deploy.

~~~

{"join": {"hostname": "controller-1.redhat.local"
 "ipaotp": "1d163c1672dc4c32a2474051306bca07"
 "krb_realm": "REDHAT.LOCAL"} "static": {"cloud-init": "#cloud-config
packages:
 - python-simplejson
 - ipa-client
 - ipa-admintools
 - openldap-clients
 - hostname
write_files:
 - content: |
     #!/bin/sh
     
     function get_metadata_config_drive {
         if [ -f /run/cloud-init/status.json ]; then
             # Get metadata from config drive
             data=`cat /run/cloud-init/status.json`
             config_drive=`echo $data | python -c 'import json,re,sys;obj=json.load(sys.stdin);ds=obj.get(\"v1\", {}).get(\"datasource\"); print(re.findall(r\"source=(.*)]\", ds)[0])'`
             if [[ -b $config_drive ]]; then
                 temp_dir=`mktemp -d`
                 mount $config_drive $temp_dir
                 if [ -f $temp_dir/openstack/latest/vendor_data2.json ]; then
                     data=`cat $temp_dir/openstack/latest/vendor_data2.json`
                     umount $config_drive
                     rmdir $temp_dir
                 else
                     umount $config_drive
                     rmdir $temp_dir
                 fi
             else 
                 echo \"Unable to retrieve metadata from config drive.\"
                 return 1
             fi
         else
             echo \"Unable to retrieve metadata from config drive.\"
             return 1
         fi
     
         return 0
     }
     
     function get_metadata_network {
         # Get metadata over the network
         data=$(timeout 300 /bin/bash -c 'data=\"\"; while [ -z \"$data\" ]; do sleep $[ ( $RANDOM % 10 )  + 1 ]s; data=`curl -s http://169.254.169.254/openstack/2016-10-06/vendor_data2.json 2>/dev/null`; done; echo $data')
     
         if [[ $? != 0 ]] ; then
             echo \"Unable to retrieve metadata from metadata service.\"
             return 1
         fi
     }
     
     function get_fqdn {
         # Get the instance hostname out of the metadata
         fqdn=`echo $data | python -c 'import json,sys;obj=json.load(sys.stdin);print(obj.get(\"join\", {}).get(\"hostname\", \"\"))'`
         if [ -z \"$fqdn\"]; then
             echo \"Unable to determine hostname\"
             return 1
         fi
         return 0
     }
     
     if ! get_metadata_config_drive || ! get_fqdn; then
        if ! get_metadata_network || ! get_fqdn; then
            echo \"FATAL: No metadata available or could not read the hostname from the metadata\"
            exit 1
        fi
     fi
     
     realm=`echo $data | python -c 'import json,sys;obj=json.load(sys.stdin);print(obj.get(\"join\", {}).get(\"krb_realm\", \"\"))'`
     otp=`echo $data | python -c 'import json,sys;obj=json.load(sys.stdin);print(obj.get(\"join\", {}).get(\"ipaotp\", \"\"))'`
     
     if [ -z \"$otp\" ]; then
         echo \"FATAL: Could not read OTP from the metadata. This means that a host with the same name was already enrolled in IPA.\"
         exit 1
     fi
     
     # run ipa-client-install
     OPTS=\"-U -w $otp --hostname $fqdn --mkhomedir\"
     if [ -n \"$realm\" ]; then
         OPTS=\"$OPTS --realm=$realm\"
     fi
     ipa-client-install $OPTS
   path: /root/setup-ipa-client.sh
   permissions: '0700'
   owner: root:root
runcmd:
- sh -x /root/setup-ipa-client.sh > /var/log/setup-ipa-client.log 2>&1"}}

~~~

Comment 7 David Sedgmen 2020-11-11 04:42:39 UTC
Created attachment 1728260 [details]
setup-ipa-client.log

Comment 8 David Sedgmen 2020-11-11 04:48:55 UTC
{"admin_pass": "vbciaFfye5QE"
 "random_seed": "ZcJ3R+j9wVFFdFJYubnlnc5XIZr5DlIFF7Wt33Mb97bF3lU0RilTwVUaB0Y7dSqrOZjYqyrQZcWXvrUazOAspuoEmMEPMiwCQgtZPU7KOPYgr5Zf7uBuxTCuKbI1QkujxgSb+WPbR1QYsXdUqAijAdOvoMjU2rV7a+2fus0jpPE9qGhWK/sJYNOxWilMKq0hLB4lPl/pQJOoNyIOnLAtJDPbipqx3WmG2HAZaXTDhCBNy8sRRKLkiqC+AJ6g4DDpygqGtTockdMRGiatxSFNBK5i1ANiF4ecjgTWFa52xVk1ZuqMl2CdpLG5aF89kqXX0SD95O0NZ3Rq3DvKGndEOzzBHkwgwPHUFhnDBLwTnTXViRH5z6iKCx1MuC0MBTekrl4RVu3XGi+Q+NQhe/THzIIL6zeV9GqQcwgSS5oT+cWzCS2IUnF6gwP9IE2LDglKeC/eLspu8S8hv1BQtB74PkluTQDWtU270Hid8ek9kfzkgH1qawSL9tqqwaKWvzaVz03+qN4XcOBbXIy3FUkw/s7zGckdydyl8dh/kH5X0/NZa3QhVMJP1k5npYOaNAemVhKEenpWbN3quH27hzZ/W69AvWUptCVFaLZzLZEHCS7/zbctdblcYyyKLbD+n1lvYhh9bReFpPeOaD5a59OxBJOr4qPaXUE6/KvfaeUMivA="
 "uuid": "e5d1b72b-c114-4cfd-baa6-b2b7bb4d16de"
 "availability_zone": "nova"
 "keys": [{"data": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC9yrgjkk2hsgmSU2/1YjJLDFrmEs3fOnDCOCd1Qc0ETq8fQyPAiZvbC8xiYWMUTdG8741irib28+ujP1LoUmvJo5j45+JbikNQiEcgTEVCqJy1eheAKkL8ES8Xq/HQ7pTmxYwoxOQHoqOFEPOwY3ICJ6GQObdD0/7n530eNx5SwfzjfJ8Zs9bxONRXv/b38TgcIkCWpfxucFzEo8ZQFhSbtO253Vd/s7gHTKGpE9xMdx304F+A0yOGTGQIVKtH7D1FPtJj6OMmnLuaqpeA6G6ODnQdNSAmweDZNGpxvQ3/BKuGcU+s9MXJkRIHLVgxyOrK7WjENPBKLzMvEhexSSSz"
 "type": "ssh"
 "name": "default"}]
 "hostname": "controller-1"
 "launch_index": 0
 "devices": []
 "meta": {"compact_service_libvirt-vnc": "[\"internalapi\"]"
 "compact_service_novnc-proxy": "[\"internalapi\"]"
 "managed_service_mysqlinternal_api": "mysql/overcloud.internalapi.redhat.local"
 "managed_service_haproxyctlplane": "haproxy/overcloud.ctlplane.redhat.local"
 "compact_service_redis": "[\"internalapi\"]"
 "managed_service_haproxystorage": "haproxy/overcloud.storage.redhat.local"
 "compact_service_mysql": "[\"internalapi\"]"
 "compact_service_HTTP": "[\"ctlplane\", \"storage\", \"storagemgmt\", \"internalapi\", \"external\"]"
 "managed_service_haproxyexternal": "haproxy/overcloud.redhat.local"
 "compact_service_haproxy": "[\"ctlplane\", \"storage\", \"storagemgmt\", \"internalapi\"]"
 "compact_service_rabbitmq": "[\"internalapi\"]"
 "ipa_enroll": "true"
 "compact_service_neutron": "[\"internalapi\"]"
 "managed_service_haproxyinternal_api": "haproxy/overcloud.internalapi.redhat.local"
 "managed_service_redisinternal_api": "redis/overcloud.internalapi.redhat.local"
 "managed_service_haproxystorage_mgmt": "haproxy/overcloud.storagemgmt.redhat.local"}
 "public_keys": {"default": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC9yrgjkk2hsgmSU2/1YjJLDFrmEs3fOnDCOCd1Qc0ETq8fQyPAiZvbC8xiYWMUTdG8741irib28+ujP1LoUmvJo5j45+JbikNQiEcgTEVCqJy1eheAKkL8ES8Xq/HQ7pTmxYwoxOQHoqOFEPOwY3ICJ6GQObdD0/7n530eNx5SwfzjfJ8Zs9bxONRXv/b38TgcIkCWpfxucFzEo8ZQFhSbtO253Vd/s7gHTKGpE9xMdx304F+A0yOGTGQIVKtH7D1FPtJj6OMmnLuaqpeA6G6ODnQdNSAmweDZNGpxvQ3/BKuGcU+s9MXJkRIHLVgxyOrK7WjENPBKLzMvEhexSSSz"}
 "project_id": "61d63d0900df4ceaaa9ca08353af64a8"
 "name": "controller-1"}

Comment 10 Ade Lee 2020-11-12 20:50:05 UTC
David,

Thanks for the data.  From what I see in https://bugzilla.redhat.com/show_bug.cgi?id=1895758#c5, there appears to be metadata that indicates that the ceph_rgw data should be set:

properties: compact_service_HTTP='["ctlplane", "storage", "storagemgmt", "internalapi",
  "external"]', compact_service_ceph_rgw='["storage"]', compact_service_haproxy='["ctlplane", ...

With this data, this principal should have been added by novajoin (krbprincipalname=ceph_rgw/controller-1.storage.redhat.local,cn=services,cn=accounts,dc=redhat,dc=local), 
but was not -- probably because of the code issue I just mentioned.

I think at this point, we have enough information to conclude this is likely a bug.
Would you be able to test a small fix to THT to test out the theory I suggested above?

Also, what is the data in https://bugzilla.redhat.com/show_bug.cgi?id=1895758#c8 ?

Comment 11 Ade Lee 2020-11-12 20:55:01 UTC
Actually, I'm not sure we could do a simple THT fix...

Comment 15 David Sedgmen 2020-11-12 21:38:25 UTC
Sorry, I should have mentioned it in the comment: that is the meta_data from controller-1.

It was located on the first partition of the disk, so I believe this would be the metadata from the server when it was deployed in OSP 13.

How does novajoin add these services? Is it from the service on the director, or part of the IPA enrolment on the overcloud node?

Comment 17 Ade Lee 2020-11-18 15:45:38 UTC
@dsedgman, see my comment in https://bugzilla.redhat.com/show_bug.cgi?id=1895758#c2  above.  Novajoin adds this through the service on the director.
(https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/ipa/ipaclient-baremetal-ansible.yaml)

The service invokes the script /root/setup-ipa-client.sh on the node - which retrieves the metadata -> which triggers novajoin to add the services.
Unfortunately, right now, because of the logic I pointed out above, we are not running this script on upgrade.

So -> service in director -> host_prep_tasks -> run setup-ipa-client.sh script -> retrieve metadata -> invoke novajoin metadata service -> add services
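
For completeness, the network path of that flow can be exercised manually from a node, since /root/setup-ipa-client.sh simply fetches the dynamic vendor data from the metadata service. This is only a sketch of the fetch itself; whether re-running it on an already enrolled node is enough to make novajoin add the missing services is not confirmed here.

~~~
# Manually fetch the novajoin dynamic vendor data, as setup-ipa-client.sh does over
# the network path (python3 is used here since the upgraded nodes run RHEL 8).
curl -s http://169.254.169.254/openstack/2016-10-06/vendor_data2.json | python3 -m json.tool
~~~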

Comment 35 errata-xmlrpc 2021-03-17 15:35:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.4 director bug fix advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0817

