Bug 1734156 - configuration drive isn't returning the same information as the metadata services and nova-join fails to enroll new nodes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-novajoin
Version: 13.0 (Queens)
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: z9
Target Release: 13.0 (Queens)
Assignee: Grzegorz Grasza
QA Contact: Jeremy Agee
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-07-29 19:28 UTC by David Hill
Modified: 2019-12-02 10:11 UTC (History)

Fixed In Version: python-novajoin-1.2.0-1.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-07 14:04:36 UTC
Target Upstream Version:


Links
System ID Priority Status Summary Last Updated
OpenStack gerrit 677455 'None' 'MERGED' 'Fix error message when OTP is missing, add logging' 2019-12-06 18:58:06 UTC
OpenStack gerrit 677765 'None' 'MERGED' 'Sync cloud-init script with tripleo ipaclient*.yaml' 2019-12-06 18:58:06 UTC
Red Hat Product Errata RHBA-2019:3791 None None None 2019-11-07 14:04:51 UTC

Description David Hill 2019-07-29 19:28:02 UTC
Description of problem:
The configuration drive does not return the same information as the metadata service, and nova-join fails to enroll new nodes because of the following code logic:

     if ! get_metadata_config_drive; then
         if ! get_metadata_network; then
             echo "FATAL: No metadata available"
             exit 1
         fi
     fi

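The following outputs show the difference: the first is the vendor_data2.json content from the config drive, the second is what the metadata service returns.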



{
  "static": {
    "cloud-init": "#cloud-config\npackages:\n - python-simplejson\n - ipa-client\n - ipa-admintools\n - openldap-clients\n - hostname\nwrite_files:\n - content: |\n     #!/bin/sh\n     \n     function get_metadata_config_drive {\n         if [ -f /run/cloud-init/status.json ]; then\n             # Get metadata from config drive\n             data=`cat /run/cloud-init/status.json`\n             config_drive=`echo $data | python -c 'import json,re,sys;obj=json.load(sys.stdin);ds=obj.get(\"v1\", {}).get(\"datasource\"); print(re.findall(r\"source=(.*)]\", ds)[0])'`\n             if [[ -b $config_drive ]]; then\n                 temp_dir=`mktemp -d`\n                 mount $config_drive $temp_dir\n                 if [ -f $temp_dir/openstack/latest/vendor_data2.json ]; then\n                     data=`cat $temp_dir/openstack/latest/vendor_data2.json`\n                     umount $config_drive\n                     rmdir $temp_dir\n                 else\n                     umount $config_drive\n                     rmdir $temp_dir\n                 fi\n             else \n                 echo \"Unable to retrieve metadata from config drive.\"\n                 return 1\n             fi\n         else\n             echo \"Unable to retrieve metadata from config drive.\"\n             return 1\n         fi\n     \n         return 0\n     }\n     \n     function get_metadata_network {\n         # Get metadata over the network\n         data=$(timeout 300 /bin/bash -c 'data=\"\"; while [ -z \"$data\" ]; do sleep $[ ( $RANDOM % 10 )  + 1 ]s; data=`curl -s http://169.254.169.254/openstack/2016-10-06/vendor_data2.json 2>/dev/null`; done; echo $data')\n     \n         if [[ $? != 0 ]] ; then\n             echo \"Unable to retrieve metadata from metadata service.\"\n             return 1\n         fi\n     }\n     \n     \n     if ! get_metadata_config_drive; then\n        if ! get_metadata_network; then\n            echo \"FATAL: No metadata available\"\n            exit 1\n        fi\n     fi\n     \n     # Get the instance hostname out of the metadata\n     fqdn=`echo $data | python -c 'import json,sys;obj=json.load(sys.stdin);print(obj.get(\"join\", {}).get(\"hostname\", \"\"))'`\n      \n     if [ -z \"$fqdn\" ]; then\n         echo \"Unable to determine hostname\"\n         exit 1\n     fi\n      \n     realm=`echo $data | python -c 'import json,sys;obj=json.load(sys.stdin);print(obj.get(\"join\", {}).get(\"krb_realm\", \"\"))'`\n     otp=`echo $data | python -c 'import json,sys;obj=json.load(sys.stdin);print(obj.get(\"join\", {}).get(\"ipaotp\", \"\"))'`\n     \n     hostname=`/bin/hostname -f`\n      \n     # run ipa-client-install\n     OPTS=\"-U -w $otp\"\n     if [ $hostname != $fqdn ]; then\n         OPTS=\"$OPTS --hostname $fqdn\"\n     fi\n     if [ -n \"$realm\" ]; then\n         OPTS=\"$OPTS --realm=$realm\"\n     fi\n     ipa-client-install $OPTS\n   path: /root/setup-ipa-client.sh\n   permissions: '0700'\n   owner: root:root\nruncmd:\n- sh -x /root/setup-ipa-client.sh > /var/log/setup-ipa-client.log 2>&1"
  },
  "join": {
    "krb_realm": "IDM.localdomain",
    "hostname": "overcloud-controller-1.idm.localdomain"
  }
}

versus:


[root@overcloud-controller-1 ~]# curl -s http://169.254.169.254/openstack/2016-10-06/vendor_data2.json | jq .
{
  "static": {
    "cloud-init": "#cloud-config\npackages:\n - python-simplejson\n - ipa-client\n - ipa-admintools\n - openldap-clients\n - hostname\nwrite_files:\n - content: |\n     #!/bin/sh\n     \n     function get_metadata_config_drive {\n         if [ -f /run/cloud-init/status.json ]; then\n             # Get metadata from config drive\n             data=`cat /run/cloud-init/status.json`\n             config_drive=`echo $data | python -c 'import json,re,sys;obj=json.load(sys.stdin);ds=obj.get(\"v1\", {}).get(\"datasource\"); print(re.findall(r\"source=(.*)]\", ds)[0])'`\n             if [[ -b $config_drive ]]; then\n                 temp_dir=`mktemp -d`\n                 mount $config_drive $temp_dir\n                 if [ -f $temp_dir/openstack/latest/vendor_data2.json ]; then\n                     data=`cat $temp_dir/openstack/latest/vendor_data2.json`\n                     umount $config_drive\n                     rmdir $temp_dir\n                 else\n                     umount $config_drive\n                     rmdir $temp_dir\n                 fi\n             else \n                 echo \"Unable to retrieve metadata from config drive.\"\n                 return 1\n             fi\n         else\n             echo \"Unable to retrieve metadata from config drive.\"\n             return 1\n         fi\n     \n         return 0\n     }\n     \n     function get_metadata_network {\n         # Get metadata over the network\n         data=$(timeout 300 /bin/bash -c 'data=\"\"; while [ -z \"$data\" ]; do sleep $[ ( $RANDOM % 10 )  + 1 ]s; data=`curl -s http://169.254.169.254/openstack/2016-10-06/vendor_data2.json 2>/dev/null`; done; echo $data')\n     \n         if [[ $? != 0 ]] ; then\n             echo \"Unable to retrieve metadata from metadata service.\"\n             return 1\n         fi\n     }\n     \n     \n     if ! get_metadata_config_drive; then\n        if ! get_metadata_network; then\n            echo \"FATAL: No metadata available\"\n            exit 1\n        fi\n     fi\n     \n     # Get the instance hostname out of the metadata\n     fqdn=`echo $data | python -c 'import json,sys;obj=json.load(sys.stdin);print(obj.get(\"join\", {}).get(\"hostname\", \"\"))'`\n      \n     if [ -z \"$fqdn\" ]; then\n         echo \"Unable to determine hostname\"\n         exit 1\n     fi\n      \n     realm=`echo $data | python -c 'import json,sys;obj=json.load(sys.stdin);print(obj.get(\"join\", {}).get(\"krb_realm\", \"\"))'`\n     otp=`echo $data | python -c 'import json,sys;obj=json.load(sys.stdin);print(obj.get(\"join\", {}).get(\"ipaotp\", \"\"))'`\n     \n     hostname=`/bin/hostname -f`\n      \n     # run ipa-client-install\n     OPTS=\"-U -w $otp\"\n     if [ $hostname != $fqdn ]; then\n         OPTS=\"$OPTS --hostname $fqdn\"\n     fi\n     if [ -n \"$realm\" ]; then\n         OPTS=\"$OPTS --realm=$realm\"\n     fi\n     ipa-client-install $OPTS\n   path: /root/setup-ipa-client.sh\n   permissions: '0700'\n   owner: root:root\nruncmd:\n- sh -x /root/setup-ipa-client.sh > /var/log/setup-ipa-client.log 2>&1"
  },
  "join": {
    "krb_realm": "IDM.localdomain",
    "ipaotp": "41cca7447d054e5eaa06b100dac38629", 
    "hostname": "overcloud-controller-1.idm.localdomain"
  }
}


The ipaotp value is missing from the first output (the config drive); the metadata service response includes it.
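
The difference between the two documents can be checked mechanically. The following is a minimal sketch, not part of novajoin: join_field and report are hypothetical helper names, a Python interpreter (python3 or python) is assumed to be on the PATH, and the two sample documents are trimmed copies of the outputs above with the "static" section left out. It uses the same json lookup as setup-ipa-client.sh.

```shell
#!/bin/sh
# Assumption: either python3 or python is installed.
PY=$(command -v python3 || command -v python)

# Hypothetical helper: print join.<field> from a vendor_data2.json
# document read on stdin, or an empty line if the field is absent.
join_field() {
    "$PY" -c 'import json,sys
obj = json.load(sys.stdin)
print(obj.get("join", {}).get(sys.argv[1], ""))' "$1"
}

# Hypothetical helper: $1 is a label, $2 is a JSON document.
report() {
    if [ -z "$(printf '%s' "$2" | join_field ipaotp)" ]; then
        echo "$1: ipaotp missing"
    else
        echo "$1: ipaotp present"
    fi
}

# Trimmed copies of the two documents shown in this report.
config_drive='{"join": {"krb_realm": "IDM.localdomain", "hostname": "overcloud-controller-1.idm.localdomain"}}'
metadata_service='{"join": {"krb_realm": "IDM.localdomain", "ipaotp": "41cca7447d054e5eaa06b100dac38629", "hostname": "overcloud-controller-1.idm.localdomain"}}'

report "config drive" "$config_drive"
report "metadata service" "$metadata_service"
```

Run against the sample data, this reports the OTP as missing for the config drive and present for the metadata service, which is the condition that leaves `otp` empty when the script later calls ipa-client-install.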





Version-Release number of selected component (if applicable):
Latest

How reproducible:
All nodes

Steps to Reproduce:
1. Deploy this overcloud in this environment.

Actual results:
Nodes fail to enroll with IPA

Expected results:
No issues

Additional info:

Comment 7 Grzegorz Grasza 2019-09-04 14:31:04 UTC
My interpretation of the original report is that there are two issues:
 * The script does not read metadata from the network when the config drive lacks it; this is fixed in https://review.opendev.org/#/c/677765/
 * The OTP token is missing from the config-drive metadata. We suspect this happened because a host with the same name was still enrolled in IPA when the node was provisioned, and was deleted only afterwards. We are adding better logging for this case in https://review.opendev.org/#/c/677455/
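
The fallback described in the first point can be sketched as follows. This is an illustration under stated assumptions, not the merged change: select_metadata, get_otp and the two fetch_* functions are hypothetical names, a Python interpreter (python3 or python) is assumed to be on the PATH, and the fetch_* stubs return canned documents instead of mounting the config drive or calling the metadata service.

```shell
#!/bin/sh
# Assumption: either python3 or python is installed.
PY=$(command -v python3 || command -v python)

# Print join.ipaotp from a JSON document on stdin (empty line if absent).
get_otp() {
    "$PY" -c 'import json,sys
print(json.load(sys.stdin).get("join", {}).get("ipaotp", ""))'
}

# Stubs standing in for the real config-drive and network readers.
fetch_config_drive() {
    echo '{"join": {"hostname": "node.example.test"}}'
}
fetch_network() {
    echo '{"join": {"hostname": "node.example.test", "ipaotp": "example-otp"}}'
}

# Try each source in order and accept the first document that carries
# an OTP, instead of accepting whatever the first source returns.
select_metadata() {
    for fetch in fetch_config_drive fetch_network; do
        data=$($fetch) || continue
        if [ -n "$(printf '%s' "$data" | get_otp)" ]; then
            printf '%s\n' "$data"
            return 0
        fi
        echo "no ipaotp in metadata from $fetch, trying next source" >&2
    done
    echo "FATAL: no metadata source provided an OTP" >&2
    return 1
}

data=$(select_metadata) && printf '%s' "$data" | get_otp
```

The design choice here is to judge a source by whether it yields the value the enrollment needs, not by whether the read itself succeeded. When no source has an OTP, the function fails with an explicit message rather than passing an empty password on.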

The host might have been deleted too late, while a new one was already being provisioned. Hosts are deleted from IPA asynchronously, through notifications that nova sends and the novajoin-notifier service receives. Processing of these notifications can be delayed and retried, depending on connectivity between nova <-> rabbitmq <-> novajoin <-> IPA.

Comment 16 Alex McLeod 2019-10-31 11:29:01 UTC
If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to -.

Comment 19 errata-xmlrpc 2019-11-07 14:04:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3791

