Bug 1336952 - Overcloud nodes forget custom domain name on reboot - OSP 8
Summary: Overcloud nodes forget custom domain name on reboot - OSP 8
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Emilien Macchi
QA Contact: Gurenko Alex
URL:
Whiteboard:
Depends On:
Blocks: 1261979 1399735
 
Reported: 2016-05-17 21:25 UTC by Chris Dearborn
Modified: 2023-09-14 03:22 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1399735 (view as bug list)
Environment:
Last Closed: 2018-02-28 22:48:50 UTC
Target Upstream Version:
Embargoed:




Links
System     ID       Private  Priority  Status  Summary  Last Updated
Launchpad  1581472  0        None      None    None     2016-06-09 13:51:36 UTC

Description Chris Dearborn 2016-05-17 21:25:07 UTC
Description of problem:
Overcloud nodes lose a custom domain name on reboot.

Version-Release number of selected component (if applicable):
OSP 8

How reproducible:
Create a yaml environment file containing:
parameter_defaults:
  CloudDomain: thedomain

Deploy the overcloud including the environment file in the "openstack overcloud deploy" command using -e.

SSH into an overcloud node and run "cat /etc/hostname".  Note that the file contains ".localdomain" as the domain name and not the one specified.

Reboot the node and note that the custom domain name is lost.

Steps to Reproduce:
1. See above.

Actual results:
Custom domain name is lost on reboot.

Expected results:
Custom domain name should be retained on reboot.

Additional info:
The following is from an email conversation with Steve Hardy:

*However* I noticed this isn't set correctly:
$ cat /etc/hostname
overcloud-controller-0.localdomain

Also, these settings don't appear to survive a reboot. I rebooted the controller, and then I see:

$ hostname -f
localhost
[heat-admin@overcloud-controller-0 ~]$ hostname
overcloud-controller-0.localdomain

[heat-admin@overcloud-controller-0 ~]$ cat /etc/hostname
overcloud-controller-0.localdomain

Here we can see why:

[root@overcloud-controller-0 ~]# journalctl  | grep hostname | grep CLOUDINIT | grep hostnamectl
May 13 09:27:56 overcloud-controller-0 cloud-init[1086]: [CLOUDINIT] util.py[DEBUG]: Running command ['hostnamectl', 'set-hostname', 'overcloud-controller-0.localdomain'] with allowed return codes [0] (shell=False, capture=True)
May 13 09:58:42 overcloud-controller-0.localdomain cloud-init[1119]: [CLOUDINIT] util.py[DEBUG]: Running command ['hostnamectl', 'set-hostname', 'overcloud-controller-0.localdomain'] with allowed return codes [0] (shell=False, capture=True)
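
To check this on other nodes, the hostname that cloud-init passes to hostnamectl can be pulled out of journal lines like the ones above. This is a minimal sketch that assumes the log format shown here; the function name hostname_set_by_cloud_init is made up for illustration and is not part of cloud-init.

```python
import ast
import re

# Matches the argument list that cloud-init logs before running a command.
RUN_CMD = re.compile(r"Running command (\[.*?\]) with allowed return codes")


def hostname_set_by_cloud_init(log_line):
    """Return the name passed to 'hostnamectl set-hostname', or None."""
    match = RUN_CMD.search(log_line)
    if not match:
        return None
    # The logged argument list is a valid Python list literal.
    argv = ast.literal_eval(match.group(1))
    if len(argv) == 3 and argv[:2] == ["hostnamectl", "set-hostname"]:
        return argv[2]
    return None


# Sample line copied from the journal output in this report.
line = (
    "May 13 09:58:42 overcloud-controller-0.localdomain cloud-init[1119]: "
    "[CLOUDINIT] util.py[DEBUG]: Running command ['hostnamectl', "
    "'set-hostname', 'overcloud-controller-0.localdomain'] with allowed "
    "return codes [0] (shell=False, capture=True)"
)
print(hostname_set_by_cloud_init(line))
```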

And this is the code that does it:
http://bazaar.launchpad.net/~cloud-init-dev/cloud-init/trunk/view/head:/cloudinit/sources/__init__.py#L195

Which gets its data from nova:
[heat-admin@overcloud-controller-0 ~]$ curl http://169.254.169.254/latest/meta-data/local-hostname
overcloud-controller-0

As we can see from the cloud-init code, it defaults to "localdomain"
because local-hostname isn't a fqdn.
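
The fallback described above can be modelled roughly as follows. This is a simplified sketch of the behaviour, not the actual cloud-init source: the function name and the DEFAULT_DOMAIN constant are illustrative assumptions, and the real code handles more cases (for example, missing metadata).

```python
DEFAULT_DOMAIN = "localdomain"  # the fallback domain named in this report


def fqdn_from_local_hostname(local_hostname, default_domain=DEFAULT_DOMAIN):
    """Split on the first dot; use the default domain when none is present."""
    host, _, domain = local_hostname.partition(".")
    return "%s.%s" % (host, domain or default_domain)


# A short name, as returned by the metadata service here, gets the default.
print(fqdn_from_local_hostname("overcloud-controller-0"))
# A fully qualified name keeps its own domain.
print(fqdn_from_local_hostname("overcloud-controller-0.thedomain"))
```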

#### Solutions ####

So.  At this point we have two options:

1 - Get local-hostname metadata to reflect the metadata we want

This is done by setting the dhcp_domain in nova.conf (not neutron as mentioned above).  I set it then restarted the nova services, then re-deployed and everything (including /etc/hostname) is set correctly and persists over reboot.
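
A quick way to confirm that the two settings agree is to parse nova.conf and compare. The sketch below assumes dhcp_domain sits in the [DEFAULT] section, which the report does not state explicitly; the helper name and the sample text are hypothetical.

```python
import configparser


def dhcp_domain_matches(nova_conf_text, cloud_domain):
    """Return True if dhcp_domain in [DEFAULT] equals the CloudDomain value."""
    # interpolation=None avoids errors on literal '%' characters in values;
    # strict=False tolerates duplicate options.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(nova_conf_text)
    # Assumption: the option is set in [DEFAULT].
    return parser.defaults().get("dhcp_domain") == cloud_domain


# Hypothetical nova.conf fragment, for illustration only.
sample = "[DEFAULT]\ndhcp_domain = thedomain\n"
print(dhcp_domain_matches(sample, "thedomain"))
```

On a real node the text would come from reading the nova.conf file rather than from an inline string.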

2 - Stop cloud-init messing with the hostnames and make the initial setting persistent

Ideally we want to stop cloud-init from overriding the correct CloudDomain-derived hostname, so that operators can specify a domain via CloudDomain and have things just work.

I'm looking at the cleanest ways to do this, but the basic steps will be to correctly persist the CloudDomain-derived FQDN on deployment and to disable the cloud-init update_hostname module so that it survives reboot; exact implementation TBC.

I raised this upstream bug so we can track (2):

https://bugs.launchpad.net/tripleo/+bug/1581472

Comment 4 Andreas Karis 2016-12-28 19:48:43 UTC
"To clarify, there is a workaround for this, which is to set dhcp_domain to match CloudDomain in nova.conf, the final fix is still TODO tho."

==> AFAICT this only works on redeployment, though, and actually only for newly created nodes. Or am I wrong? The actual hostname/domain name comes from the config drive, not from the HTTP call to the metadata agent, and the config drive data persists, as far as I can see. At least, that is what we found at a customer's site, where there was a discrepancy between:
curl http://169.254.169.254/openstack/2013-10-17/meta_data.json (which provided o.k. data)
and mounting the config drive:
 mount /dev/disk/by-label/config-2 /mnt/config
 cat /mnt/config/openstack/2013-10-17/meta_data.json (which did not contain the domain name)
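
The comparison between the two sources can be scripted along these lines. This is a sketch that assumes both documents carry a top-level "hostname" key; the function name and both sample documents are made up for illustration and are not data from the customer site.

```python
import json


def hostname_mismatch(service_json, config_drive_json):
    """Return (service, drive) hostnames if they differ, else None."""
    service = json.loads(service_json).get("hostname")
    drive = json.loads(config_drive_json).get("hostname")
    return None if service == drive else (service, drive)


# Hypothetical meta_data.json contents, trimmed to the relevant key.
from_service = '{"hostname": "overcloud-controller-0.thedomain"}'
from_drive = '{"hostname": "overcloud-controller-0"}'
print(hostname_mismatch(from_service, from_drive))
```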

--------------------------------------------------------

Here's another workaround which is permanent (it won't be modified by director) and which can easily be pushed via postconfig or ansible:
~~~
[root@compute-0 cloud.cfg.d]# cat /etc/cloud/cloud.cfg.d/99_hostname.cfg
#cloud-config
hostname: compute-0
fqdn: compute-0.example.com
~~~
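
To push this to many nodes, the file body can be generated per node rather than typed by hand. The sketch below only renders the text shown above; the function name is hypothetical, and how the file is delivered (postconfig, Ansible, or otherwise) is left open.

```python
def render_hostname_cfg(short_name, domain):
    """Render the body of 99_hostname.cfg for one node."""
    return "#cloud-config\nhostname: {0}\nfqdn: {0}.{1}\n".format(
        short_name, domain
    )


# Reproduces the example file from this comment.
print(render_hostname_cfg("compute-0", "example.com"), end="")
```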

Comment 5 Randy Perryman 2017-01-11 19:00:53 UTC
What is the proper procedure for using Director to deploy the Domain Name and Hostname correctly?  

As the work arounds above are not repeatable.

Comment 6 Andreas Karis 2017-01-11 23:44:50 UTC
Hi,

Can you please have a look at the suggested workaround here?
https://access.redhat.com/solutions/2838221
It goes along with comment #4

"As the work arounds above are not repeatable"
The above workaround is repeatable and would not be overwritten by Director, if this is your concern. Otherwise, I may misunderstand your statement / question. Could you clarify the question if this is the case?

Thanks,

Andreas

Comment 7 Randy Perryman 2017-01-13 15:23:10 UTC
So that workaround requires you to configure each individual server. How would you do this in Director? I am trying to reduce touch points.

Comment 8 Sean Merrow 2017-03-15 13:40:15 UTC
Hi Andreas,

Randy responded to your request for clarification. We would like to hear your feedback and whether this bug will be fixed. They are looking for the fix in OSP 10 (as mentioned by them in comment 10 of BZ 1336955 [0]). In BZ 1336955, shardy mentions that that BZ is a duplicate of this one [1].

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1336955#c10
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1336955#c5

Thanks,
Sean

Comment 9 Sean Merrow 2017-06-15 15:07:22 UTC
Dell EMC is looking for this fix to be completed and backported to OSP 10.

Comment 10 Sean Merrow 2017-06-15 15:13:11 UTC
Actually, it appears this BZ was fixed in OSP 11 in the following BZ:

https://bugzilla.redhat.com/show_bug.cgi?id=1391758

How do we get it backported to OSP 10? Can we use this BZ to do that, or is a new BZ required? We already have three others to choose from:

BZ 1336952
BZ 1399735
BZ 1336955

Sean

Comment 11 Wayne Allen 2017-07-14 21:58:23 UTC
Haven't seen this in JS 6.0.1 (osp8) or later. Can we close? Was it backported?

Comment 14 Alex Schultz 2018-02-28 22:48:50 UTC
We've verified this works as expected in OSP10+. We won't be backporting any fixes for this at this time. If there are new issues, feel free to open this bug again and we can re-evaluate if there's something we can do.

Comment 15 Red Hat Bugzilla 2023-09-14 03:22:49 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

