Bug 1895758

| Summary: | [FFU 13-16.1][Ceph] fails with nova-join and ceph-rgw | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | David Sedgmen <dsedgmen> |
| Component: | openstack-tripleo-heat-templates | Assignee: | Dave Wilde <dwilde> |
| Status: | CLOSED ERRATA | QA Contact: | Jeremy Agee <jagee> |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 13.0 (Queens) | CC: | alee, camorris, hrybacki, jpretori, mburns, mgarciac, michele, ramishra |
| Target Milestone: | z4 | Keywords: | Triaged, ZStream |
| Target Release: | 16.1 (Train on RHEL 8.2) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | openstack-tripleo-heat-templates-11.3.2-1.20210104205656.el8ost | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-03-17 15:35:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1924106 | | |
| Bug Blocks: | | | |
| Attachments: | | | |
Description
David Sedgmen
2020-11-09 01:09:26 UTC
It is novajoin's responsibility to add the missing services etc. in IPA before certmonger requests the certificates. The question, then, is why novajoin is presumably not being triggered to do this. First, we need to confirm that the metadata for the server has been updated. David, can you provide the metadata for the server once the FFU updates the overcloud?

Assuming that the metadata has been updated, I think I might see why novajoin might not be triggered. TLS-E using novajoin is triggered by the following template:

https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/ipa/ipaclient-baremetal-ansible.yaml

and, in particular, in:

https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/ipa/ipaclient-baremetal-ansible.yaml#L185-L194

The line that triggers updates by novajoin is:

https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/ipa/ipaclient-baremetal-ansible.yaml#L91

which gets the config data from the config drive, causing a call to novajoin as a dynamic vendor data service. Novajoin would then look at the (updated) metadata and add services/hosts etc. as needed. However, as you can see at:

https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/ipa/ipaclient-baremetal-ansible.yaml#L194

this only takes place when the server is not already an IPA client, which would be the case, for instance, if we were attempting a brownfield deployment. But in this case we have a server which is already an IPA client (all the other services were already enrolled in IPA), and so any further updates were skipped. We'd need to examine this logic to see if we can be smarter about the ipa-client check.

We should note that this is not a problem if you choose to migrate to tripleo-ipa instead, because the correct services etc.
are created as an undercloud task beforehand:

https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/ipa/ipaservices-baremetal-ansible.yaml#L98-L122

It looks like the novajoin_notifier was down because of a misconfigured transport_url. The original had the correct RabbitMQ user and password:

~~~
less /etc/novajoin/join.conf.rpmsave
transport_url=rabbit://d4285439706d8dfc62f3cd78ff751b1a599baf51:4f238728a84593674eb967100e4ec06bb89995ed.24.1//
~~~

But the upgrade configured the transport_url with the user guest:

~~~
less /var/lib/config-data/puppet-generated/novajoin/etc/novajoin/join.conf
transport_url=rabbit://guest:4f238728a84593674eb967100e4ec06bb89995ed.redhat.local:5672/?ssl=0
~~~

After reverting this, the service is back up:

~~~
2020-11-09 21:49:00.134 7 ERROR join     (class_id, method_id), ConnectionError)
2020-11-09 21:49:00.134 7 ERROR join amqp.exceptions.AccessRefused: (0, 0): (403) ACCESS_REFUSED - Login was refused using authentication mechanism AMQPLAIN. For details see the broker logfile.
2020-11-09 21:49:00.134 7 ERROR join
2020-11-09 21:49:04.157 7 INFO novajoin.notifications [-] Starting
2020-11-09 21:49:07.093 7 INFO novajoin.notifications [-] [3dc41ebe-db20-405b-8084-573741ae068b] compute instance update for controller-0.redhat.local
2020-11-09 21:49:07.123 7 INFO novajoin.notifications [-] [18eedb92-727c-458c-86dd-326988be8c59] compute instance update for controller-2.redhat.local
2020-11-09 21:49:08.113 7 INFO novajoin.notifications [-] [79a4d964-7e1c-4a05-9de8-82e71cfd299f] compute instance update for controller-1.redhat.local
2020-11-09 21:49:26.229 7 INFO novajoin.notifications [-] Starting
~~~

David, that's good to know, but it doesn't necessarily affect this situation. The notifier is used primarily to clean up IPA when servers are deleted. The question I had was: what is the metadata for the nodes?
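The two join.conf transport_url values differ in the embedded user: the upgrade fell back to "guest", which RabbitMQ rejects with ACCESS_REFUSED. A quick, hypothetical sanity check (not part of novajoin) is to parse the URL and compare the user portion; the URLs below are illustrative, not the real credentials:

```python
from urllib.parse import urlparse

def transport_user(transport_url: str) -> str:
    """Return the username embedded in an oslo.messaging transport_url."""
    # transport_url has the shape rabbit://user:password@host:port/vhost
    return urlparse(transport_url).username or ""

# The regenerated join.conf used the default "guest" user, which the
# broker refuses; the original used the deployment-specific user.
broken = "rabbit://guest:secret@controller-0.internalapi.redhat.local:5672/?ssl=0"
assert transport_user(broken) == "guest"
```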
That is, what is the output of `openstack server show <uuid of controller-0>`, and maybe for the other controllers too. I want to see what's in the server metadata. The server metadata should contain, for instance, ipa_enroll: True, but also the lists of compact and managed services.

~~~
OS-DCF:diskConfig: MANUAL
OS-EXT-AZ:availability_zone: nova
OS-EXT-SRV-ATTR:host: undercloud-0.redhat.local
OS-EXT-SRV-ATTR:hypervisor_hostname: 2cd08d8f-cf30-499e-89bb-ad88e89d219c
OS-EXT-SRV-ATTR:instance_name: instance-0000006d
OS-EXT-STS:power_state: Running
OS-EXT-STS:task_state: null
OS-EXT-STS:vm_state: active
OS-SRV-USG:launched_at: '2020-10-12T08:26:16.000000'
OS-SRV-USG:terminated_at: null
accessIPv4: ''
accessIPv6: ''
addresses: ctlplane=192.168.24.33
config_drive: 'True'
created: '2020-10-12T08:20:51Z'
flavor: controller (2cb99cee-b0c6-40cf-a105-f0b5aa8babd6)
hostId: e89609a4acca37ece09a0a31f5d2983c56edbd655240dad21c90de9d
id: 29cacbd7-c092-4e5a-875b-de81c66af778
image: overcloud-full_20201003T070010Z (bea5fd78-ad6b-4aa0-914b-395eea196f96)
key_name: default
name: controller-0
progress: 0
project_id: 61d63d0900df4ceaaa9ca08353af64a8
properties: compact_service_HTTP='["ctlplane", "storage", "storagemgmt", "internalapi", "external"]', compact_service_ceph_rgw='["storage"]', compact_service_haproxy='["ctlplane", "storage", "storagemgmt", "internalapi"]', compact_service_libvirt-vnc='["internalapi"]', compact_service_mysql='["internalapi"]', compact_service_neutron='["internalapi"]', compact_service_novnc-proxy='["internalapi"]', compact_service_rabbitmq='["internalapi"]', compact_service_redis='["internalapi"]', ipa_enroll='true', managed_service_haproxyctlplane='haproxy/overcloud.ctlplane.redhat.local', managed_service_haproxyexternal='haproxy/overcloud.redhat.local', managed_service_haproxyinternal_api='haproxy/overcloud.internalapi.redhat.local', managed_service_haproxystorage='haproxy/overcloud.storage.redhat.local', managed_service_haproxystorage_mgmt='haproxy/overcloud.storagemgmt.redhat.local', managed_service_mysqlinternal_api='mysql/overcloud.internalapi.redhat.local', managed_service_redisinternal_api='redis/overcloud.internalapi.redhat.local'
security_groups: name='default'
status: ACTIVE
updated: '2020-10-23T03:45:16Z'
user_id: 074c7d66e40243908472df0e417cede6
volumes_attached: ''
~~~

~~~
OS-DCF:diskConfig: MANUAL
OS-EXT-AZ:availability_zone: nova
OS-EXT-SRV-ATTR:host: undercloud-0.redhat.local
OS-EXT-SRV-ATTR:hypervisor_hostname: 85c785ae-7fdd-4efd-b087-6078263e60f4
OS-EXT-SRV-ATTR:instance_name: instance-0000006a
OS-EXT-STS:power_state: Running
OS-EXT-STS:task_state: null
OS-EXT-STS:vm_state: active
OS-SRV-USG:launched_at: '2020-10-12T08:23:38.000000'
OS-SRV-USG:terminated_at: null
accessIPv4: ''
accessIPv6: ''
addresses: ctlplane=192.168.24.35
config_drive: 'True'
created: '2020-10-12T08:20:50Z'
flavor: controller (2cb99cee-b0c6-40cf-a105-f0b5aa8babd6)
hostId: e89609a4acca37ece09a0a31f5d2983c56edbd655240dad21c90de9d
id: c1d5e113-e8d6-4412-bc8a-b12b0e3cebae
image: overcloud-full_20201003T070010Z (bea5fd78-ad6b-4aa0-914b-395eea196f96)
key_name: default
name: controller-2
progress: 0
project_id: 61d63d0900df4ceaaa9ca08353af64a8
properties: compact_service_HTTP='["ctlplane", "storage", "storagemgmt", "internalapi", "external"]', compact_service_ceph_rgw='["storage"]', compact_service_haproxy='["ctlplane", "storage", "storagemgmt", "internalapi"]', compact_service_libvirt-vnc='["internalapi"]', compact_service_mysql='["internalapi"]', compact_service_neutron='["internalapi"]', compact_service_novnc-proxy='["internalapi"]', compact_service_rabbitmq='["internalapi"]', compact_service_redis='["internalapi"]', ipa_enroll='true', managed_service_haproxyctlplane='haproxy/overcloud.ctlplane.redhat.local', managed_service_haproxyexternal='haproxy/overcloud.redhat.local', managed_service_haproxyinternal_api='haproxy/overcloud.internalapi.redhat.local', managed_service_haproxystorage='haproxy/overcloud.storage.redhat.local', managed_service_haproxystorage_mgmt='haproxy/overcloud.storagemgmt.redhat.local', managed_service_mysqlinternal_api='mysql/overcloud.internalapi.redhat.local', managed_service_redisinternal_api='redis/overcloud.internalapi.redhat.local'
security_groups: name='default'
status: ACTIVE
updated: '2020-10-23T03:45:16Z'
user_id: 074c7d66e40243908472df0e417cede6
volumes_attached: ''
~~~

~~~
OS-DCF:diskConfig: MANUAL
OS-EXT-AZ:availability_zone: nova
OS-EXT-SRV-ATTR:host: undercloud-0.redhat.local
OS-EXT-SRV-ATTR:hypervisor_hostname: e003381f-95b3-455f-a114-56390ebdbd38
OS-EXT-SRV-ATTR:instance_name: instance-0000006c
OS-EXT-STS:power_state: Running
OS-EXT-STS:task_state: null
OS-EXT-STS:vm_state: active
OS-SRV-USG:launched_at: '2020-10-12T08:23:33.000000'
OS-SRV-USG:terminated_at: null
accessIPv4: ''
accessIPv6: ''
addresses: ctlplane=192.168.24.20
config_drive: 'True'
created: '2020-10-12T08:20:50Z'
flavor: controller (2cb99cee-b0c6-40cf-a105-f0b5aa8babd6)
hostId: e89609a4acca37ece09a0a31f5d2983c56edbd655240dad21c90de9d
id: e5d1b72b-c114-4cfd-baa6-b2b7bb4d16de
image: overcloud-full_20201003T070010Z (bea5fd78-ad6b-4aa0-914b-395eea196f96)
key_name: default
name: controller-1
progress: 0
project_id: 61d63d0900df4ceaaa9ca08353af64a8
properties: compact_service_HTTP='["ctlplane", "storage", "storagemgmt", "internalapi", "external"]', compact_service_ceph_rgw='["storage"]', compact_service_haproxy='["ctlplane", "storage", "storagemgmt", "internalapi"]', compact_service_libvirt-vnc='["internalapi"]', compact_service_mysql='["internalapi"]', compact_service_neutron='["internalapi"]', compact_service_novnc-proxy='["internalapi"]', compact_service_rabbitmq='["internalapi"]', compact_service_redis='["internalapi"]', ipa_enroll='true', managed_service_haproxyctlplane='haproxy/overcloud.ctlplane.redhat.local', managed_service_haproxyexternal='haproxy/overcloud.redhat.local', managed_service_haproxyinternal_api='haproxy/overcloud.internalapi.redhat.local', managed_service_haproxystorage='haproxy/overcloud.storage.redhat.local', managed_service_haproxystorage_mgmt='haproxy/overcloud.storagemgmt.redhat.local', managed_service_mysqlinternal_api='mysql/overcloud.internalapi.redhat.local', managed_service_redisinternal_api='redis/overcloud.internalapi.redhat.local'
security_groups: name='default'
status: ACTIVE
updated: '2020-10-23T03:45:16Z'
user_id: 074c7d66e40243908472df0e417cede6
volumes_attached: ''
~~~

I don't believe this has been changed or rerun since the original OSP 13 deploy.
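Each compact_service_* property above names a service and the networks it listens on; novajoin derives one per-network IPA service principal from each pair. The sketch below is my own illustration of that mapping (not novajoin's actual code), using the values visible in the controller metadata:

```python
import json

def expected_principals(hostname: str, domain: str, meta: dict) -> set:
    """Derive per-network service principals from compact_service_* metadata keys."""
    principals = set()
    for key, value in meta.items():
        if key.startswith("compact_service_"):
            service = key[len("compact_service_"):]
            # The value is a JSON-encoded list of network names.
            for network in json.loads(value):
                principals.add(f"{service}/{hostname}.{network}.{domain}")
    return principals

meta = {
    "compact_service_ceph_rgw": '["storage"]',
    "compact_service_rabbitmq": '["internalapi"]',
}
principals = expected_principals("controller-1", "redhat.local", meta)
# The ceph_rgw principal on the storage network is exactly the one
# that turned out to be missing from IPA after the FFU.
assert "ceph_rgw/controller-1.storage.redhat.local" in principals
```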
~~~
{"join": {"hostname": "controller-1.redhat.local"
"ipaotp": "1d163c1672dc4c32a2474051306bca07"
"krb_realm": "REDHAT.LOCAL"} "static": {"cloud-init": "#cloud-config
packages:
- python-simplejson
- ipa-client
- ipa-admintools
- openldap-clients
- hostname
write_files:
- content: |
#!/bin/sh
function get_metadata_config_drive {
if [ -f /run/cloud-init/status.json ]; then
# Get metadata from config drive
data=`cat /run/cloud-init/status.json`
config_drive=`echo $data | python -c 'import json,re,sys;obj=json.load(sys.stdin);ds=obj.get(\"v1\", {}).get(\"datasource\"); print(re.findall(r\"source=(.*)]\", ds)[0])'`
if [[ -b $config_drive ]]; then
temp_dir=`mktemp -d`
mount $config_drive $temp_dir
if [ -f $temp_dir/openstack/latest/vendor_data2.json ]; then
data=`cat $temp_dir/openstack/latest/vendor_data2.json`
umount $config_drive
rmdir $temp_dir
else
umount $config_drive
rmdir $temp_dir
fi
else
echo \"Unable to retrieve metadata from config drive.\"
return 1
fi
else
echo \"Unable to retrieve metadata from config drive.\"
return 1
fi
return 0
}
function get_metadata_network {
# Get metadata over the network
data=$(timeout 300 /bin/bash -c 'data=\"\"; while [ -z \"$data\" ]; do sleep $[ ( $RANDOM % 10 ) + 1 ]s; data=`curl -s http://169.254.169.254/openstack/2016-10-06/vendor_data2.json 2>/dev/null`; done; echo $data')
if [[ $? != 0 ]] ; then
echo \"Unable to retrieve metadata from metadata service.\"
return 1
fi
}
function get_fqdn {
# Get the instance hostname out of the metadata
fqdn=`echo $data | python -c 'import json,sys;obj=json.load(sys.stdin);print(obj.get(\"join\", {}).get(\"hostname\", \"\"))'`
if [ -z \"$fqdn\"]; then
echo \"Unable to determine hostname\"
return 1
fi
return 0
}
if ! get_metadata_config_drive || ! get_fqdn; then
if ! get_metadata_network || ! get_fqdn; then
echo \"FATAL: No metadata available or could not read the hostname from the metadata\"
exit 1
fi
fi
realm=`echo $data | python -c 'import json,sys;obj=json.load(sys.stdin);print(obj.get(\"join\", {}).get(\"krb_realm\", \"\"))'`
otp=`echo $data | python -c 'import json,sys;obj=json.load(sys.stdin);print(obj.get(\"join\", {}).get(\"ipaotp\", \"\"))'`
if [ -z \"$otp\" ]; then
echo \"FATAL: Could not read OTP from the metadata. This means that a host with the same name was already enrolled in IPA.\"
exit 1
fi
# run ipa-client-install
OPTS=\"-U -w $otp --hostname $fqdn --mkhomedir\"
if [ -n \"$realm\" ]; then
OPTS=\"$OPTS --realm=$realm\"
fi
ipa-client-install $OPTS
path: /root/setup-ipa-client.sh
permissions: '0700'
owner: root:root
runcmd:
- sh -x /root/setup-ipa-client.sh > /var/log/setup-ipa-client.log 2>&1"}}
~~~
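The cloud-init script above shells out to Python one-liners to pull fields from vendor_data2.json. The same extraction can be sketched directly in Python (an illustration, not the shipped code), using the join section shown above:

```python
import json

def extract_join_fields(vendor_data: str):
    """Pull hostname, OTP and Kerberos realm out of novajoin's vendor_data2.json."""
    join = json.loads(vendor_data).get("join", {})
    return (join.get("hostname", ""),
            join.get("ipaotp", ""),
            join.get("krb_realm", ""))

sample = ('{"join": {"hostname": "controller-1.redhat.local", '
          '"ipaotp": "1d163c1672dc4c32a2474051306bca07", '
          '"krb_realm": "REDHAT.LOCAL"}}')
fqdn, otp, realm = extract_join_fields(sample)
assert fqdn == "controller-1.redhat.local"
assert realm == "REDHAT.LOCAL"
# An empty OTP means a host with the same name was already enrolled in
# IPA -- the script treats that case as fatal.
```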
Created attachment 1728260 [details]
setup-ipa-client.log
{"admin_pass": "vbciaFfye5QE"
"random_seed": "ZcJ3R+j9wVFFdFJYubnlnc5XIZr5DlIFF7Wt33Mb97bF3lU0RilTwVUaB0Y7dSqrOZjYqyrQZcWXvrUazOAspuoEmMEPMiwCQgtZPU7KOPYgr5Zf7uBuxTCuKbI1QkujxgSb+WPbR1QYsXdUqAijAdOvoMjU2rV7a+2fus0jpPE9qGhWK/sJYNOxWilMKq0hLB4lPl/pQJOoNyIOnLAtJDPbipqx3WmG2HAZaXTDhCBNy8sRRKLkiqC+AJ6g4DDpygqGtTockdMRGiatxSFNBK5i1ANiF4ecjgTWFa52xVk1ZuqMl2CdpLG5aF89kqXX0SD95O0NZ3Rq3DvKGndEOzzBHkwgwPHUFhnDBLwTnTXViRH5z6iKCx1MuC0MBTekrl4RVu3XGi+Q+NQhe/THzIIL6zeV9GqQcwgSS5oT+cWzCS2IUnF6gwP9IE2LDglKeC/eLspu8S8hv1BQtB74PkluTQDWtU270Hid8ek9kfzkgH1qawSL9tqqwaKWvzaVz03+qN4XcOBbXIy3FUkw/s7zGckdydyl8dh/kH5X0/NZa3QhVMJP1k5npYOaNAemVhKEenpWbN3quH27hzZ/W69AvWUptCVFaLZzLZEHCS7/zbctdblcYyyKLbD+n1lvYhh9bReFpPeOaD5a59OxBJOr4qPaXUE6/KvfaeUMivA="
"uuid": "e5d1b72b-c114-4cfd-baa6-b2b7bb4d16de"
"availability_zone": "nova"
"keys": [{"data": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC9yrgjkk2hsgmSU2/1YjJLDFrmEs3fOnDCOCd1Qc0ETq8fQyPAiZvbC8xiYWMUTdG8741irib28+ujP1LoUmvJo5j45+JbikNQiEcgTEVCqJy1eheAKkL8ES8Xq/HQ7pTmxYwoxOQHoqOFEPOwY3ICJ6GQObdD0/7n530eNx5SwfzjfJ8Zs9bxONRXv/b38TgcIkCWpfxucFzEo8ZQFhSbtO253Vd/s7gHTKGpE9xMdx304F+A0yOGTGQIVKtH7D1FPtJj6OMmnLuaqpeA6G6ODnQdNSAmweDZNGpxvQ3/BKuGcU+s9MXJkRIHLVgxyOrK7WjENPBKLzMvEhexSSSz"
"type": "ssh"
"name": "default"}]
"hostname": "controller-1"
"launch_index": 0
"devices": []
"meta": {"compact_service_libvirt-vnc": "[\"internalapi\"]"
"compact_service_novnc-proxy": "[\"internalapi\"]"
"managed_service_mysqlinternal_api": "mysql/overcloud.internalapi.redhat.local"
"managed_service_haproxyctlplane": "haproxy/overcloud.ctlplane.redhat.local"
"compact_service_redis": "[\"internalapi\"]"
"managed_service_haproxystorage": "haproxy/overcloud.storage.redhat.local"
"compact_service_mysql": "[\"internalapi\"]"
"compact_service_HTTP": "[\"ctlplane\", \"storage\", \"storagemgmt\", \"internalapi\", \"external\"]"
"managed_service_haproxyexternal": "haproxy/overcloud.redhat.local"
"compact_service_haproxy": "[\"ctlplane\", \"storage\", \"storagemgmt\", \"internalapi\"]"
"compact_service_rabbitmq": "[\"internalapi\"]"
"ipa_enroll": "true"
"compact_service_neutron": "[\"internalapi\"]"
"managed_service_haproxyinternal_api": "haproxy/overcloud.internalapi.redhat.local"
"managed_service_redisinternal_api": "redis/overcloud.internalapi.redhat.local"
"managed_service_haproxystorage_mgmt": "haproxy/overcloud.storagemgmt.redhat.local"}
"public_keys": {"default": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC9yrgjkk2hsgmSU2/1YjJLDFrmEs3fOnDCOCd1Qc0ETq8fQyPAiZvbC8xiYWMUTdG8741irib28+ujP1LoUmvJo5j45+JbikNQiEcgTEVCqJy1eheAKkL8ES8Xq/HQ7pTmxYwoxOQHoqOFEPOwY3ICJ6GQObdD0/7n530eNx5SwfzjfJ8Zs9bxONRXv/b38TgcIkCWpfxucFzEo8ZQFhSbtO253Vd/s7gHTKGpE9xMdx304F+A0yOGTGQIVKtH7D1FPtJj6OMmnLuaqpeA6G6ODnQdNSAmweDZNGpxvQ3/BKuGcU+s9MXJkRIHLVgxyOrK7WjENPBKLzMvEhexSSSz"}
"project_id": "61d63d0900df4ceaaa9ca08353af64a8"
"name": "controller-1"}
David, thanks for the data. From what I see in https://bugzilla.redhat.com/show_bug.cgi?id=1895758#c5, there is metadata indicating that the ceph_rgw data should be set:

properties: compact_service_HTTP='["ctlplane", "storage", "storagemgmt", "internalapi", "external"]', compact_service_ceph_rgw='["storage"]', compact_service_haproxy='["ctlplane", ...

With this data, this principal should have been added by novajoin (krbprincipalname=ceph_rgw/controller-1.storage.redhat.local,cn=services,cn=accounts,dc=redhat,dc=local), but was not, probably because of the code issue I just mentioned. I think at this point we have enough information to conclude this is likely a bug. Would you be able to test a small fix to THT to test out the theory I suggested above? Also, what is the data in https://bugzilla.redhat.com/show_bug.cgi?id=1895758#c8 ?

Actually, I'm not sure we could do a simple THT fix.

Sorry, I should have mentioned in the comment that this is the meta_data on controller-1. It was located on the first partition of the disk, so I believe this is the metadata from when the server was deployed on OSP 13. How does novajoin add these services: from the service on the director, or as part of the IPA enrolment on the overcloud node?

@dsedgman, see my comment in https://bugzilla.redhat.com/show_bug.cgi?id=1895758#c2 above. Novajoin adds this through the service on the director (https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/ipa/ipaclient-baremetal-ansible.yaml). The service invokes the script /root/setup-ipa-client.sh on the node, which retrieves the metadata, which in turn triggers novajoin to add the services. Unfortunately, right now, because of the logic I pointed out above, we are not running this script on upgrade.

So: service in director -> host_prep_tasks -> run setup-ipa-client.sh script -> retrieve metadata -> invoke novajoin metadata service -> add services

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.4 director bug fix advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0817