1336952 – Overcloud nodes forget custom domain name on reboot - OSP 8

Summary: Overcloud nodes forget custom domain name on reboot - OSP 8

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	rhosp-director
Sub Component:
Version:	8.0 (Liberty)
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	---
Assignee:	Emilien Macchi
QA Contact:	Gurenko Alex
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1261979 1399735

Reported:	2016-05-17 21:25 UTC by Chris Dearborn
Modified:	2023-09-14 03:22 UTC (History)
CC List:	26 users (show)
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1399735 (view as bug list)
Environment:
Last Closed:	2018-02-28 22:48:50 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Launchpad	1581472	0	None	None	None	2016-06-09 13:51:36 UTC

Description Chris Dearborn 2016-05-17 21:25:07 UTC

Description of problem:
Overcloud nodes lose a custom domain name on reboot.

Version-Release number of selected component (if applicable):
OSP 8

How reproducible:
Create a yaml environment file containing:
parameter_defaults:
  CloudDomain: thedomain

Deploy the overcloud including the environment file in the "openstack overcloud deploy" command using -e.

ssh onto an overcloud node and "cat /etc/hostname".  Note that the file contains ".localdomain" as the domain name and not the one specified.

Reboot the node and note that the custom domain name is lost.

Steps to Reproduce:
1. See above.

Actual results:
Custom domain name is lost on reboot.

Expected results:
Custom domain name should be retained on reboot.

Additional info:
The following is from an email conversation with Steve Hardy:

*However* I noticed this isn't set correctly:
$ cat /etc/hostname
overcloud-controller-0.localdomain

Also these settings don't appear to survive a reboot, I rebooted the controller, then I see:

$ hostname -f
localhost
[heat-admin@overcloud-controller-0 ~]$ hostname
overcloud-controller-0.localdomain

[heat-admin@overcloud-controller-0 ~]$ cat /etc/hostname
overcloud-controller-0.localdomain

Here we can see why:

[root@overcloud-controller-0 ~]# journalctl | grep hostname | grep CLOUDINIT | grep hostnamectl
May 13 09:27:56 overcloud-controller-0 cloud-init[1086]: [CLOUDINIT] util.py[DEBUG]: Running command ['hostnamectl', 'set-hostname', 'overcloud-controller-0.localdomain'] with allowed return codes [0] (shell=False, capture=True)
May 13 09:58:42 overcloud-controller-0.localdomain cloud-init[1119]: [CLOUDINIT] util.py[DEBUG]: Running command ['hostnamectl', 'set-hostname', 'overcloud-controller-0.localdomain'] with allowed return codes [0] (shell=False, capture=True)

And this is the code that does it:
http://bazaar.launchpad.net/~cloud-init-dev/cloud-init/trunk/view/head:/cloudinit/sources/__init__.py#L195

Which gets its data from nova:
[heat-admin@overcloud-controller-0 ~]$ curl http://169.254.169.254/latest/meta-data/local-hostname
overcloud-controller-0

As we can see from the cloud-init code, it defaults to "localdomain"
because local-hostname isn't a fqdn.
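
The fallback described above can be sketched roughly as follows. This is a simplified illustration of the behaviour, not the cloud-init source: the function name split_hostname and its default_domain argument are assumptions made for this example.

```python
def split_hostname(local_hostname, default_domain="localdomain"):
    """Return (hostname, fqdn) following the fallback described above.

    Hypothetical helper for illustration only; not the cloud-init API.
    """
    if "." in local_hostname:
        # The metadata value already carries a domain, so keep it as the FQDN.
        hostname = local_hostname.split(".", 1)[0]
        return hostname, local_hostname
    # No domain in the metadata value: fall back to the default domain.
    return local_hostname, f"{local_hostname}.{default_domain}"


print(split_hostname("overcloud-controller-0"))
print(split_hostname("overcloud-controller-0.thedomain"))
```

With a bare local-hostname such as the one nova returns above, the sketch yields the ".localdomain" name seen in the journal; with a fully qualified value, the supplied domain is kept.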

#### Solutions ####

So.  At this point we have two options:

1 - Get local-hostname metadata to reflect the metadata we want

This is done by setting the dhcp_domain in nova.conf (not neutron as mentioned above).  I set it then restarted the nova services, then re-deployed and everything (including /etc/hostname) is set correctly and persists over reboot.
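
As a rough sketch of option 1, the setting could be applied to the text of nova.conf with Python's standard configparser. The helper name set_dhcp_domain, the sample file content, and the placement of dhcp_domain under [DEFAULT] are assumptions for this example; check the nova.conf shipped with your release before relying on the section name.

```python
import configparser
import io


def set_dhcp_domain(nova_conf_text, domain):
    """Return nova.conf text with dhcp_domain set to the given domain.

    Hypothetical helper; assumes the option lives under [DEFAULT].
    """
    # Disable interpolation so literal '%' characters in values are preserved.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(nova_conf_text)
    parser["DEFAULT"]["dhcp_domain"] = domain
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Illustrative input, not taken from a real deployment.
sample = "[DEFAULT]\ndhcp_domain = novalocal\n"
print(set_dhcp_domain(sample, "thedomain"))
```

Note that configparser drops comments when it rewrites a file, so this approach suits a quick test more than a production configuration. As described above, the nova services still have to be restarted and the nodes redeployed for the change to take effect.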

2 - Stop cloud-init messing with the hostnames and make the initial setting persistent

Clearly we ideally want to stop cloud-init messing with the correct CloudDomain derived hostname, so that we can allow operators to specify a domain via CloudDomain and have things just work.

I'm looking at the cleanest ways to do this, but the basic steps will be to correctly persist the CloudDomain-derived FQDN on deployment and to disable the cloud-init update_hostname module so that it survives reboot. Exact implementation TBC.

I raised this upstream bug so we can track (2):

https://bugs.launchpad.net/tripleo/+bug/1581472

Comment 4 Andreas Karis 2016-12-28 19:48:43 UTC

"To clarify, there is a workaround for this, which is to set dhcp_domain to match CloudDomain in nova.conf, the final fix is still TODO tho."

==> AFAICT this only works on redeployment, though. Actually, only for newly created nodes? Or am I wrong? The actual hostname/domain name comes from the config drive, not from the HTTP call to the metadata agent, and the config drive data persists, as far as I can see. At least, that is what we found at a customer's site, where there was a discrepancy between:
curl http://169.254.169.254/openstack/2013-10-17/meta_data.json (which provided o.k. data)
and the mounted config drive:
 mount /dev/disk/by-label/config-2 /mnt/config
 cat /mnt/config/openstack/2013-10-17/meta_data.json (which did not contain the domain name)
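
A comparison like the one above could be scripted along these lines. This is only a sketch: the helper name hostname_mismatch is hypothetical, it assumes both meta_data.json documents carry a "hostname" key, and the sample values are illustrative rather than taken from the customer site.

```python
import json


def hostname_mismatch(metadata_service_json, config_drive_json):
    """Compare the "hostname" field of two meta_data.json documents.

    Returns (differs, service_hostname, config_drive_hostname).
    """
    service_name = json.loads(metadata_service_json).get("hostname")
    drive_name = json.loads(config_drive_json).get("hostname")
    return service_name != drive_name, service_name, drive_name


# Illustrative payloads: the metadata service has the domain, the config drive does not.
service = '{"hostname": "overcloud-controller-0.thedomain"}'
drive = '{"hostname": "overcloud-controller-0"}'
print(hostname_mismatch(service, drive))
```

In practice the first argument would be the body returned by the curl command and the second the content of the file on the mounted config drive.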

--------------------------------------------------------

Here's another workaround which is permanent (it won't be modified by director) and which can easily be pushed via postconfig or ansible:
~~~
[root@compute-0 cloud.cfg.d]# cat /etc/cloud/cloud.cfg.d/99_hostname.cfg
#cloud-config
hostname: compute-0
fqdn: compute-0.example.com
~~~
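
Since this file has to exist on every node, its content could be generated per host rather than written by hand. The following is a minimal sketch, assuming a hypothetical helper render_hostname_cfg and an illustrative list of node names; how the result is pushed to the nodes (postconfig, ansible, or otherwise) is left open.

```python
def render_hostname_cfg(hostname, domain):
    """Build the cloud-config content shown above for one node."""
    return (
        "#cloud-config\n"
        f"hostname: {hostname}\n"
        f"fqdn: {hostname}.{domain}\n"
    )


# Illustrative node names; replace with the real overcloud hostnames.
for node in ("compute-0", "compute-1"):
    print(f"--- {node}: /etc/cloud/cloud.cfg.d/99_hostname.cfg")
    print(render_hostname_cfg(node, "example.com"))
```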

Comment 5 Randy Perryman 2017-01-11 19:00:53 UTC

What is the proper procedure for using Director to deploy the Domain Name and Hostname correctly?  

As the work arounds above are not repeatable.

Comment 6 Andreas Karis 2017-01-11 23:44:50 UTC

Hi,

Can you please have a look at the suggested workaround here?
https://access.redhat.com/solutions/2838221
It goes along with comment #4

"As the work arounds above are not repeatable"
The above workaround is repeatable and would not be overwritten by Director, if this is your concern. Otherwise, I may misunderstand your statement / question. Could you clarify the question if this is the case?

Thanks,

Andreas

Comment 7 Randy Perryman 2017-01-13 15:23:10 UTC

That workaround requires you to configure each server individually. How would you do this in Director? I am trying to reduce touch points.

Comment 8 Sean Merrow 2017-03-15 13:40:15 UTC

Hi Andreas,

Randy responded to your request for clarification. We would like to hear your feedback and whether this bug will be fixed. They are looking for the fix in OSP 10 (as they mentioned in comment 10 of BZ 1336955 [0]). In BZ 1336955, shardy mentions that that BZ is a duplicate of this one [1].

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1336955#c10
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1336955#c5

Thanks,
Sean

Comment 9 Sean Merrow 2017-06-15 15:07:22 UTC

Dell EMC is looking for this fix to be completed and backported to OSP 10.

Comment 10 Sean Merrow 2017-06-15 15:13:11 UTC

Actually, it appears this BZ was fixed in OSP 11 in the following BZ:

https://bugzilla.redhat.com/show_bug.cgi?id=1391758

How do we get it backported to OSP 10? Can we use this BZ to do that, or is a new BZ required? We already have three others to choose from:

BZ 1336952
BZ 1399735
BZ 1336955

Sean

Comment 11 Wayne Allen 2017-07-14 21:58:23 UTC

Haven't seen this in JS 6.0.1 (osp8) or later. Can we close? Was it backported?

Comment 14 Alex Schultz 2018-02-28 22:48:50 UTC

We've verified this works as expected in OSP10+. We won't be backporting any fixes for this at this time. If there are new issues, feel free to open this bug again and we can re-evaluate if there's something we can do.

Comment 15 Red Hat Bugzilla 2023-09-14 03:22:49 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days
