Description of problem:
Overcloud nodes lose a custom domain name on reboot.

Version-Release number of selected component (if applicable):
OSP 8

How reproducible:
Create a yaml environment file containing:

parameter_defaults:
  CloudDomain: thedomain

Deploy the overcloud including the environment file in the "openstack overcloud deploy" command using -e.
ssh onto an overcloud node and "cat /etc/hostname". Note that the file contains ".localdomain" as the domain name and not the one specified.
Reboot the node and note that the custom domain name is lost.

Steps to Reproduce:
1. See above.

Actual results:
Custom domain name is lost on reboot.

Expected results:
Custom domain name should be retained on reboot.

Additional info:
The following is from an email conversation with Steve Hardy:

*However* I noticed this isn't set correctly:

$ cat /etc/hostname
overcloud-controller-0.localdomain

Also these settings don't appear to survive a reboot, I rebooted the controller, then I see:

$ hostname -f
localhost
[heat-admin@overcloud-controller-0 ~]$ hostname
overcloud-controller-0.localdomain
[heat-admin@overcloud-controller-0 ~]$ cat /etc/hostname
overcloud-controller-0.localdomain

Here we can see why:

[root@overcloud-controller-0 ~]# journalctl | grep hostname | grep CLOUDINIT | grep hostnamectl
May 13 09:27:56 overcloud-controller-0 cloud-init[1086]: [CLOUDINIT] util.py[DEBUG]: Running command ['hostnamectl', 'set-hostname', 'overcloud-controller-0.localdomain'] with allowed return codes [0] (shell=False, capture=True)
May 13 09:58:42 overcloud-controller-0.localdomain cloud-init[1119]: [CLOUDINIT] util.py[DEBUG]: Running command ['hostnamectl', 'set-hostname', 'overcloud-controller-0.localdomain'] with allowed return codes [0] (shell=False, capture=True)

And this is the code that does it:

http://bazaar.launchpad.net/~cloud-init-dev/cloud-init/trunk/view/head:/cloudinit/sources/__init__.py#L195

Which gets its data from nova:

[heat-admin@overcloud-controller-0 ~]$ curl http://169.254.169.254/latest/meta-data/local-hostname
overcloud-controller-0

As we can see from the cloud-init code, it defaults to "localdomain" because local-hostname isn't a fqdn.

#### Solutions ####

So. At this point we have two options:

1 - Get local-hostname metadata to reflect the metadata we want

This is done by setting the dhcp_domain in nova.conf (not neutron as mentioned above). I set it then restarted the nova services, then re-deployed and everything (including /etc/hostname) is set correctly and persists over reboot.

2 - Stop cloud-init messing with the hostnames and make the initial setting persistent

Clearly we ideally want to stop cloud-init messing with the correct CloudDomain derived hostname, so that we can allow operators to specify a domain via CloudDomain and have things just work.

I'm looking at the cleanest ways to do this, but the basic steps will be to correctly persist the CloudDomain derived fqdn on deployment, and disable the cloud-init update_hostnames module so it survives reboot, exact implementation tbc.

I raised this upstream bug so we can track (2):

https://bugs.launchpad.net/tripleo/+bug/1581472
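For anyone following along, the cloud-init behaviour described in the email (falling back to "localdomain" when local-hostname is not a fqdn) can be sketched roughly as follows. This is a simplified illustration and not the actual cloud-init source: the function name `resolve_names` is hypothetical, and the real code also handles missing metadata and IP-address hostnames, which are left out here.

```python
def resolve_names(local_hostname, default_domain="localdomain"):
    """Return (hostname, fqdn) for a metadata local-hostname value.

    Hypothetical helper that mirrors the fallback described above:
    if local-hostname has no dot, the default domain is appended.
    """
    labels = local_hostname.split(".")
    hostname = labels[0]
    # Everything after the first dot is treated as the domain;
    # with no dot at all, fall back to the default domain.
    domain = ".".join(labels[1:]) if len(labels) > 1 else default_domain
    return hostname, f"{hostname}.{domain}"


if __name__ == "__main__":
    # What nova returned in this report: a short name with no domain.
    print(resolve_names("overcloud-controller-0"))
    # What it would need to return for CloudDomain to be preserved.
    print(resolve_names("overcloud-controller-0.thedomain"))
```

With a short name from the metadata service the result ends in ".localdomain", which matches the hostnamectl calls in the journal output. Only when local-hostname already carries the domain (option 1, via dhcp_domain) does the custom domain come through.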
"To clarify, there is a workaround for this, which is to set dhcp_domain to match CloudDomain in nova.conf, the final fix is still TODO tho." ==> AFAICT this only works on redeployment though. actually only for newly craeted nodes? Or am I wrong? Because the actual hostname/domain name comes from the configdrive, and not from the http call to metadata agent. And the config drive data persists, as far as I see it. At least at a customer's site we found that, and could find a discrepancy between: curl://169.254.169.254/openstack/2013-10-17/meta_data.json (which provided o.k. data) and between mounting the config drive: mount /dev/disk/by-label/config-2 /mnt/config cat /mnt/config/openstack/2013-10-17/meta_data.json (which did not contain the domain name) Andreas Karis (akaris) wrote a moment ago: #15 -------------------------------------------------------- Here's another workaround which is permanent (it won't be modified by director) and which can easily be pushed via postconfig or ansible: ~~~ [root@compute-0 cloud.cfg.d]# cat /etc/cloud/cloud.cfg.d/99_hostname.cfg #cloud-config hostname: compute-0 fqdn: compute-0.example.com ~~~
What is the proper procedure for using Director to deploy the Domain Name and Hostname correctly? The workarounds above are not repeatable.
Hi,

Can you please have a look at the suggested workaround here?
https://access.redhat.com/solutions/2838221

Regarding the statement in comment #4 that the workarounds above are not repeatable: the above workaround is repeatable and would not be overwritten by Director, if this is your concern. Otherwise, I may have misunderstood your statement / question. Could you clarify the question if this is the case?

Thanks,
Andreas
So that workaround requires you to configure each individual server. How would you do this in Director? I am trying to reduce touch points.
Hi Andreas,

Randy responded to your request for clarification. Would like to hear your feedback and if this bug will be fixed. They are looking for the fix in OSP 10 (as mentioned by them in comment 10 of BZ 1336955 [0]). In BZ 1336955, shardy mentions that BZ is a duplicate of this one [1].

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1336955#c10
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1336955#c5

Thanks,
Sean
Dell EMC is looking for this fix to be completed and backported to OSP 10.
Actually, it appears this BZ was fixed in OSP 11 in the following BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1391758

How do we get it backported to OSP 10? Can we use this BZ to do that or is a new BZ required? We already have three others to choose from:

BZ 1336952
BZ 1399735
BZ 1336955

Sean
Haven't seen this in JS 6.0.1 (osp8) or later. Can we close? Was it backported?
We've verified this works as expected in OSP10+. We won't be backporting any fixes for this at this time. If there are new issues, feel free to open this bug again and we can re-evaluate if there's something we can do.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days