Description of problem:

The Director only supports a hostname format when assigning hostnames to overcloud nodes. We need to assign hostnames per our naming scheme, which includes the physical location of each host. This aids our operations group in physically locating hosts during troubleshooting. We also need to be able to specify which IP address is assigned to which host, as these IP addresses will already have been entered into DNS.

The Director currently supports names like: controller-0/1/2, compute-0/1/2, etc.

We need: crt-omcntl-c1n1, crt-omcom-c1n3 (cntl = controller and com = compute; c1n1 represents the chassis and the node location within the chassis).

We understand that the current product lacks a feature to accomplish the above; however, we require assistance coming up with a workaround which does not break the build chain (so we can still use the undercloud to replace overcloud nodes or grow the overcloud).

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
overcloud-without-mergepy.yaml:

  # Hostname format for each role
  # Note %index% is translated into the index of the node, e.g. 0/1/2 etc.
  # and %stackname% is replaced with OS::stack_name in the template below.
  # If you want to use the heat generated names, pass '' (empty string).
  ControllerHostnameFormat:
    type: string
    description: Format for Controller node hostnames
    default: '%stackname%-controller-%index%'
  ComputeHostnameFormat:
    type: string
    description: Format for Compute node hostnames
    default: '%stackname%-compute-%index%'
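As a sketch of how these parameters can be overridden (the filename and hostname prefixes are illustrative, not from the product docs), an environment file passed to the deploy command could set a custom format. Note that %index% always expands to the Heat resource index (0/1/2...), so the chassis/node portion of a scheme like crt-omcntl-c1n1 cannot be derived from %index% alone:

```yaml
# hostnames.yaml -- hypothetical environment file overriding the defaults above.
# %index% expands to the node's resource index, so this yields
# crt-omcntl-0, crt-omcntl-1, ... rather than the full c1n1-style suffix.
parameter_defaults:
  ControllerHostnameFormat: 'crt-omcntl-%index%'
  ComputeHostnameFormat: 'crt-omcom-%index%'
```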
This should probably be part of the test matrix, but I'm 99% sure it will break things that depend on host names.
I am thinking of a scenario where the hostnames are already set on the nodes by the following yaml:

$ cat hostnames.yaml
parameter_defaults:
  ControllerHostnameFormat: qestack-ctrl%index%
  ComputeHostnameFormat: qestack-c%index%
  CephStorageHostnameFormat: qestack-ceph%index%

--------------

I am trying to have qestack-ctrl1, qestack-ctrl2, ... get a specific IP from the internal network, based on their index.

$ cat environments/ips-from-pool.yaml
resource_registry:
  OS::TripleO::Controller::Ports::InternalApiPort: /usr/share/openstack-tripleo-heat-templates/network/ports/internal_api_from_pool.yaml
parameter_defaults:
  ControllerIPs:
    internal_api: {list_join: ['', ['192.168.124.5', {get_param: NodeIndex}]]}

Then I realised NodeIndex is not what I expected.

--------------

I tried again using %index%:

$ cat environments/ips-from-pool.yaml
resource_registry:
  OS::TripleO::Controller::Ports::InternalApiPort: /usr/share/openstack-tripleo-heat-templates/network/ports/internal_api_from_pool.yaml
parameter_defaults:
  ControllerIPs:
    internal_api: 192.168.124.5%index%

But that didn't come out as expected:

[stack@instack default_woceph2]$ nova list
+--------------------------------------+------------------------+--------+------------+-------------+---------------------+
| ID                                   | Name                   | Status | Task State | Power State | Networks            |
+--------------------------------------+------------------------+--------+------------+-------------+---------------------+
| 3064a0e4-0fcb-47c9-a4c2-a0dbab3f6c62 | overcloud-compute-0    | ACTIVE | -          | Running     | ctlplane=192.0.2.19 |
| eede9fe4-edf0-4341-9aaf-721e018631f3 | overcloud-controller-0 | ACTIVE | -          | Running     | ctlplane=192.0.2.21 |
| 7d771585-c63a-4e96-8c50-1c2276e92d02 | overcloud-controller-1 | ACTIVE | -          | Running     | ctlplane=192.0.2.18 |
| 223d62d8-bdd3-4fd2-aa29-ebe358d3f398 | overcloud-controller-2 | ACTIVE | -          | Running     | ctlplane=192.0.2.20 |
+--------------------------------------+------------------------+--------+------------+-------------+---------------------+

[stack@instack default_woceph2]$ ssh heat-admin@192.0.2.18 sudo ifconfig vlan30
vlan30: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 9.0.0.0  netmask 255.255.255.0  broadcast 9.0.0.255
        inet6 fe80::2ca8:4aff:fec2:17ab  prefixlen 64  scopeid 0x20<link>
        ether 2e:a8:4a:c2:17:ab  txqueuelen 0  (Ethernet)
        RX packets 117  bytes 4998 (4.8 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 18  bytes 1068 (1.0 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

[stack@instack default_woceph2]$ ssh heat-admin@192.0.2.20 sudo ifconfig vlan30
vlan30: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 2.0.0.0  netmask 255.255.255.0  broadcast 2.0.0.255
        inet6 fe80::305d:c2ff:fe2a:26fc  prefixlen 64  scopeid 0x20<link>
        ether 32:5d:c2:2a:26:fc  txqueuelen 0  (Ethernet)
        RX packets 146  bytes 6160 (6.0 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 18  bytes 1068 (1.0 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

[stack@instack default_woceph2]$ ssh heat-admin@192.0.2.21 sudo ifconfig vlan30
vlan30: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 1.0.0.0  netmask 255.255.255.0  broadcast 1.0.0.255
        inet6 fe80::3c60:76ff:fe86:6d3d  prefixlen 64  scopeid 0x20<link>
        ether 3e:60:76:86:6d:3d  txqueuelen 0  (Ethernet)
        RX packets 162  bytes 6832 (6.6 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 18  bytes 1068 (1.0 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

---------------------------------------------------------------------------

Need help implementing ./network/ports/from_pool.yaml in such a way that I can control which IP each %index% node of controller / compute / ceph gets.

Regards,
Jaison R
I also used something like this, just to make sure I can assign a pool to the controller's internal_api interface, but the puppet manifest fails to get this value while assigning it on the node:

$ cat environments/ips-from-pool.yaml
resource_registry:
  OS::TripleO::Controller::Ports::InternalApiPort: /usr/share/openstack-tripleo-heat-templates/network/ports/internal_api_from_pool.yaml
parameter_defaults:
  ControllerIPs:
    internal_api: {'192.168.124.50','192.168.124.51','192.168.124.52','192.168.124.53'}
I believe we have added this feature to 8, but it will not be backported to 7.x (no feature backports). Giulio, can you suggest a workaround?
Looking at comment #11, the furthest we can get in 7 is with a parameter_defaults: that looks like the following:

parameter_defaults:
  ControllerHostnameFormat: qestack-ctrl%index%
  ControllerIPs:
    internal_api:
      - 192.168.124.50
      - 192.168.124.51
      - 192.168.124.52
      - 192.168.124.53

and the expectation is that:

qestack-ctrl0 gets 192.168.124.50 (on internal_api)
qestack-ctrl1 gets 192.168.124.51
...
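For context, this is roughly how the *_from_pool.yaml port templates consume such a list. The sketch below paraphrases the upstream internal_api_from_pool.yaml rather than quoting it verbatim: each port resource receives its NodeIndex from the role template and uses it to index into the pool, which is why a plain YAML list keyed by network name is the format that works:

```yaml
# Simplified sketch of network/ports/internal_api_from_pool.yaml;
# parameter names follow the upstream template, but this is not a verbatim copy.
heat_template_version: 2015-04-30

parameters:
  IPPool:
    type: json
    default: {}
  NodeIndex:
    type: number
    default: 0

outputs:
  ip_address:
    description: The IP picked for this node from the pool
    # NodeIndex selects the n-th entry of the internal_api list, so
    # node 0 gets the first address, node 1 the second, and so on.
    value: {get_param: [IPPool, internal_api, {get_param: NodeIndex}]}
```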
Jaison, Jeremy, can you test if the workaround in comment #16 works as expected?
This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.
We recently added some features upstream which I believe address this requirement:

http://docs.openstack.org/developer/tripleo-docs/advanced_deployment/node_placement.html

The way this works is you tag nodes in ironic so we can select them for a specific node capability, e.g. "node:controller-0". Then in your environment file you can specify a scheduler hint that will cause nova to select nodes based on this capability/tag.

Additionally we added a HostnameMap interface, which enables mapping from the default hostnames (e.g. those specified via ControllerHostnameFormat) to any hostname required. E.g.:

parameter_defaults:
  ControllerSchedulerHints:
    'capabilities:node': 'controller-%index%'
  HostnameMap:
    overcloud-controller-0: overcloud-controller-prod-123-0
    overcloud-controller-1: overcloud-controller-prod-456-0
    overcloud-controller-2: overcloud-controller-prod-789-0

(Note I did not change ControllerHostnameFormat here or the keys to HostnameMap would be different.)

This can then be combined with the ControllerIPs examples from previous comments to achieve predictable placement, full control of the node hostnames, and predictable IP assignment.

Awaiting confirmation from Mike as to which downstream builds this will be available in, but as previously mentioned I do not consider this a candidate for 7.x backport.
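Putting the pieces from this thread together, here is a hedged sketch of a single environment file combining the scheduler hints, hostname mapping, and per-index IPs. The hostnames and addresses are illustrative (they reuse the naming scheme from the original report), and it assumes the matching 'node:controller-N' capabilities have already been tagged on the ironic nodes beforehand:

```yaml
# Hypothetical combined environment file; node capabilities, hostnames
# and IP addresses are illustrative, not prescribed values.
resource_registry:
  OS::TripleO::Controller::Ports::InternalApiPort: /usr/share/openstack-tripleo-heat-templates/network/ports/internal_api_from_pool.yaml

parameter_defaults:
  # Pin each controller resource index to a specific tagged ironic node.
  ControllerSchedulerHints:
    'capabilities:node': 'controller-%index%'
  # Rename the default heat-generated hostnames to the site naming scheme.
  HostnameMap:
    overcloud-controller-0: crt-omcntl-c1n1
    overcloud-controller-1: crt-omcntl-c1n2
    overcloud-controller-2: crt-omcntl-c2n1
  # Give each index a predictable internal_api address.
  ControllerIPs:
    internal_api:
      - 192.168.124.50
      - 192.168.124.51
      - 192.168.124.52
```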
Based on comments, this is resolved in OSP 8, just not updated. Moving the bug back to OSP 8 for testing.
HostnameFormat was verified in 7.3, and node placement with predictable IPs and hostname mapping were verified in 8.0.
*** Bug 1261633 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0637.html
https://bugzilla.redhat.com/show_bug.cgi?id=1278854

The controller replacement procedure relies on the node index. Will the predictable hostname feature break the controller replacement procedure by changing or obscuring the node index?
(In reply to jliberma from comment #30)
> Will the predictable hostname feature break the controller replacement
> procedure by changing or obscuring the node index?

That's a good point, and I think it was never tested. I am trying to test it now, but I'm having problems with all my test environments at the moment :(. I am also trying to contact the relevant developers and QEs (unfortunately it's a holiday), so sorry that I don't have an answer yet.
I mapped the controller host names to oc-ctl-a, oc-ctl-b and oc-ctl-c. Even though the resource index numbers are no longer featured in the controller names, it is still possible to know what the index is, because the mapping was provided by the user and you can use it to map back to the index number.

Another way to find the index number is to use heat resource-list and heat resource-show:

heat resource-list -n2 overcloud | grep "TripleO::Controller "
...
| 0 | ... | OS::TripleO::Controller | ... | overcloud-Controller-uflsluh2ssxj |

heat resource-show overcloud-Controller-uflsluh2ssxj 0
...
| logical_resource_id | 1 |
| attributes          | { ...
|                     |   "hostname": "oc-ctl-b", ...

You can then use this index in the removal policy. I successfully removed a node this way, so it looks like the feature is working; however, I was not 100% successful because corosync didn't come up on the new node and I never figured out why. You can test it also and see if you're able to complete the process, just to be sure that the failure is not related to using predictable hostnames.

Also note that the new node that will be created gets the next index number, so make sure you map that as well to a hostname of your choosing.
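To illustrate the removal-policy step mentioned above, a sketch of the environment file used when scaling down (the filename is hypothetical; the index '1' corresponds to the oc-ctl-b example, and the key is the Heat resource index, not the hostname):

```yaml
# remove-controller.yaml -- illustrative removal-policy environment file.
# resource_list names the Heat resource index of the controller to remove
# (here '1', i.e. the node mapped to oc-ctl-b), not its mapped hostname.
parameter_defaults:
  ControllerRemovalPolicies: [{'resource_list': ['1']}]
```

This is passed alongside a reduced ControllerCount so Heat removes the named index rather than the highest one.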
Udi -- thanks for looking into this.

Do you think we should document using the hostname attribute to identify the logical_resource_id in the controller replacement procedure? Currently the replacement procedure says to use the last number in the hostname as the index, but:

1) the last number may be absent if using a predictable hostname
2) the last number may be wrong if a node has been replaced

Corosync not starting after the replacement procedure might be related to this pcs bug introduced in a RHEL 7.2 z-stream update: https://bugzilla.redhat.com/show_bug.cgi?id=1326507

Thanks, Jacob
We should certainly document how to determine the index number, either by referring back to the original mappings used during the deploy, or by using the heat commands as in the example I showed. And I don't think the last number is wrong if the node was replaced (the new node gets the next index, and that index is correct).