Description of problem:

The config drive does not return the same information as the metadata service, and nova-join fails to enroll new nodes because of the following logic in the setup script:

    if ! get_metadata_config_drive; then
      if ! get_metadata_network; then
        echo "FATAL: No metadata available"
        exit 1
      fi
    fi

The metadata service is only queried when the config drive cannot be read at all, so incomplete config drive data is used as-is. Compare the following outputs. From the config drive:

{
  "static": {
    "cloud-init": "#cloud-config\npackages:\n - python-simplejson\n - ipa-client\n - ipa-admintools\n - openldap-clients\n - hostname\nwrite_files:\n - content: |\n #!/bin/sh\n \n function get_metadata_config_drive {\n if [ -f /run/cloud-init/status.json ]; then\n # Get metadata from config drive\n data=`cat /run/cloud-init/status.json`\n config_drive=`echo $data | python -c 'import json,re,sys;obj=json.load(sys.stdin);ds=obj.get(\"v1\", {}).get(\"datasource\"); print(re.findall(r\"source=(.*)]\", ds)[0])'`\n if [[ -b $config_drive ]]; then\n temp_dir=`mktemp -d`\n mount $config_drive $temp_dir\n if [ -f $temp_dir/openstack/latest/vendor_data2.json ]; then\n data=`cat $temp_dir/openstack/latest/vendor_data2.json`\n umount $config_drive\n rmdir $temp_dir\n else\n umount $config_drive\n rmdir $temp_dir\n fi\n else \n echo \"Unable to retrieve metadata from config drive.\"\n return 1\n fi\n else\n echo \"Unable to retrieve metadata from config drive.\"\n return 1\n fi\n \n return 0\n }\n \n function get_metadata_network {\n # Get metadata over the network\n data=$(timeout 300 /bin/bash -c 'data=\"\"; while [ -z \"$data\" ]; do sleep $[ ( $RANDOM % 10 ) + 1 ]s; data=`curl -s http://169.254.169.254/openstack/2016-10-06/vendor_data2.json 2>/dev/null`; done; echo $data')\n \n if [[ $? != 0 ]] ; then\n echo \"Unable to retrieve metadata from metadata service.\"\n return 1\n fi\n }\n \n \n if ! get_metadata_config_drive; then\n if ! get_metadata_network; then\n echo \"FATAL: No metadata available\"\n exit 1\n fi\n fi\n \n # Get the instance hostname out of the metadata\n fqdn=`echo $data | python -c 'import json,sys;obj=json.load(sys.stdin);print(obj.get(\"join\", {}).get(\"hostname\", \"\"))'`\n \n if [ -z \"$fqdn\" ]; then\n echo \"Unable to determine hostname\"\n exit 1\n fi\n \n realm=`echo $data | python -c 'import json,sys;obj=json.load(sys.stdin);print(obj.get(\"join\", {}).get(\"krb_realm\", \"\"))'`\n otp=`echo $data | python -c 'import json,sys;obj=json.load(sys.stdin);print(obj.get(\"join\", {}).get(\"ipaotp\", \"\"))'`\n \n hostname=`/bin/hostname -f`\n \n # run ipa-client-install\n OPTS=\"-U -w $otp\"\n if [ $hostname != $fqdn ]; then\n OPTS=\"$OPTS --hostname $fqdn\"\n fi\n if [ -n \"$realm\" ]; then\n OPTS=\"$OPTS --realm=$realm\"\n fi\n ipa-client-install $OPTS\n path: /root/setup-ipa-client.sh\n permissions: '0700'\n owner: root:root\nruncmd:\n- sh -x /root/setup-ipa-client.sh > /var/log/setup-ipa-client.log 2>&1"
  },
  "join": {
    "krb_realm": "IDM.localdomain",
    "hostname": "overcloud-controller-1.idm.localdomain"
  }
}

versus the metadata service:

[root@overcloud-controller-1 ~]# curl -s http://169.254.169.254/openstack/2016-10-06/vendor_data2.json | jq .
{
  "static": {
    "cloud-init": "#cloud-config\npackages:\n - python-simplejson\n - ipa-client\n - ipa-admintools\n - openldap-clients\n - hostname\nwrite_files:\n - content: |\n #!/bin/sh\n \n function get_metadata_config_drive {\n if [ -f /run/cloud-init/status.json ]; then\n # Get metadata from config drive\n data=`cat /run/cloud-init/status.json`\n config_drive=`echo $data | python -c 'import json,re,sys;obj=json.load(sys.stdin);ds=obj.get(\"v1\", {}).get(\"datasource\"); print(re.findall(r\"source=(.*)]\", ds)[0])'`\n if [[ -b $config_drive ]]; then\n temp_dir=`mktemp -d`\n mount $config_drive $temp_dir\n if [ -f $temp_dir/openstack/latest/vendor_data2.json ]; then\n data=`cat $temp_dir/openstack/latest/vendor_data2.json`\n umount $config_drive\n rmdir $temp_dir\n else\n umount $config_drive\n rmdir $temp_dir\n fi\n else \n echo \"Unable to retrieve metadata from config drive.\"\n return 1\n fi\n else\n echo \"Unable to retrieve metadata from config drive.\"\n return 1\n fi\n \n return 0\n }\n \n function get_metadata_network {\n # Get metadata over the network\n data=$(timeout 300 /bin/bash -c 'data=\"\"; while [ -z \"$data\" ]; do sleep $[ ( $RANDOM % 10 ) + 1 ]s; data=`curl -s http://169.254.169.254/openstack/2016-10-06/vendor_data2.json 2>/dev/null`; done; echo $data')\n \n if [[ $? != 0 ]] ; then\n echo \"Unable to retrieve metadata from metadata service.\"\n return 1\n fi\n }\n \n \n if ! get_metadata_config_drive; then\n if ! get_metadata_network; then\n echo \"FATAL: No metadata available\"\n exit 1\n fi\n fi\n \n # Get the instance hostname out of the metadata\n fqdn=`echo $data | python -c 'import json,sys;obj=json.load(sys.stdin);print(obj.get(\"join\", {}).get(\"hostname\", \"\"))'`\n \n if [ -z \"$fqdn\" ]; then\n echo \"Unable to determine hostname\"\n exit 1\n fi\n \n realm=`echo $data | python -c 'import json,sys;obj=json.load(sys.stdin);print(obj.get(\"join\", {}).get(\"krb_realm\", \"\"))'`\n otp=`echo $data | python -c 'import json,sys;obj=json.load(sys.stdin);print(obj.get(\"join\", {}).get(\"ipaotp\", \"\"))'`\n \n hostname=`/bin/hostname -f`\n \n # run ipa-client-install\n OPTS=\"-U -w $otp\"\n if [ $hostname != $fqdn ]; then\n OPTS=\"$OPTS --hostname $fqdn\"\n fi\n if [ -n \"$realm\" ]; then\n OPTS=\"$OPTS --realm=$realm\"\n fi\n ipa-client-install $OPTS\n path: /root/setup-ipa-client.sh\n permissions: '0700'\n owner: root:root\nruncmd:\n- sh -x /root/setup-ipa-client.sh > /var/log/setup-ipa-client.log 2>&1"
  },
  "join": {
    "krb_realm": "IDM.localdomain",
    "ipaotp": "41cca7447d054e5eaa06b100dac38629",
    "hostname": "overcloud-controller-1.idm.localdomain"
  }
}

The ipaotp value is present in this metadata service output but missing from the first (config drive) output.

Version-Release number of selected component (if applicable):
Latest

How reproducible:
All nodes

Steps to Reproduce:
1. Deploy this overcloud in this environment.

Actual results:
Nodes fail to enroll with IPA

Expected results:
No issues

Additional info:
My interpretation of the original report is that there are two issues:

* Metadata is not read from the network when it is missing from the config drive. This is fixed in https://review.opendev.org/#/c/677765/
* The OTP token is missing from the config-drive metadata. We suspect this happens because a host with the same name was already enrolled in IPA when the node was provisioned, and was only deleted afterwards. We are adding better logging for this case in https://review.opendev.org/#/c/677455/

The host may have been deleted too late, while a new one was already being provisioned. Hosts are deleted from IPA asynchronously: nova sends notifications, and the novajoin-notifier service receives and processes them. That processing can be delayed and retried, depending on connectivity between nova <-> rabbitmq <-> novajoin <-> IPA.
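
To illustrate the first issue, here is a minimal Python sketch of a fallback that treats config-drive metadata as unusable when its "join" section is incomplete and then tries the next source. This is not the patch from the review linked above; the names `usable_join`, `select_join_metadata`, and `REQUIRED_JOIN_KEYS` are hypothetical, and the choice of required keys is an assumption based on what the setup script reads.

```python
import json

# Assumption: the enrollment script cannot proceed without these "join" keys.
REQUIRED_JOIN_KEYS = ("hostname", "ipaotp")


def usable_join(raw):
    """Return the "join" dict if the raw JSON text has every required key, else None."""
    try:
        join = json.loads(raw).get("join", {})
        if all(join.get(key) for key in REQUIRED_JOIN_KEYS):
            return join
    except (TypeError, ValueError, AttributeError):
        pass  # missing, malformed, or unexpectedly shaped metadata
    return None


def select_join_metadata(sources):
    """Try each (name, fetch) pair in order; return (name, join) for the first usable one."""
    for name, fetch in sources:
        join = usable_join(fetch())
        if join is not None:
            return name, join
    return None, None


# The two "join" sections from the report, with the "static" part omitted.
config_drive = (
    '{"join": {"krb_realm": "IDM.localdomain", '
    '"hostname": "overcloud-controller-1.idm.localdomain"}}'
)
network = (
    '{"join": {"krb_realm": "IDM.localdomain", '
    '"ipaotp": "41cca7447d054e5eaa06b100dac38629", '
    '"hostname": "overcloud-controller-1.idm.localdomain"}}'
)

source, join = select_join_metadata(
    [("config_drive", lambda: config_drive), ("network", lambda: network)]
)
print(source)  # prints "network": the config drive data lacks ipaotp
```

With the original script logic, the config drive data would have been accepted because the drive itself was readable, and the network source would never have been tried.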
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3791