Description of problem:

The Director only supports a hostname format when assigning hostnames to overcloud nodes. We need to assign hostnames per our naming scheme, which includes the physical location of each host. This aids our operations group in physically locating hosts during troubleshooting. We also need to be able to specify which IP address is assigned to which host, as these IP addresses will already have been entered into DNS.

The Director currently supports names like: controller-0/1/2, compute-0/1/2, etc.

We need: crt-omcntl-c1n1, crt-omcom-c1n3 (cntl = controller and com = compute; c1n1 represents the chassis and the node location within the chassis).

We understand that the current product lacks a feature to accomplish the above; however, we require assistance coming up with a workaround which does not break the build chain (so we can still use the undercloud to replace overcloud nodes or grow the overcloud).

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
overcloud-without-mergepy.yaml:

  # Hostname format for each role
  # Note %index% is translated into the index of the node, e.g. 0/1/2 etc.
  # and %stackname% is replaced with OS::stack_name in the template below.
  # If you want to use the heat generated names, pass '' (empty string).
  ControllerHostnameFormat:
    type: string
    description: Format for Controller node hostnames
    default: '%stackname%-controller-%index%'
  ComputeHostnameFormat:
    type: string
    description: Format for Compute node hostnames
    default: '%stackname%-compute-%index%'
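As a sketch of how these parameters can be overridden (the filename and hostname prefixes are illustrative, not from the product docs), an environment file passed to the deploy command could set a custom format. Note that %index% always expands to the Heat resource index (0/1/2...), so the chassis/node portion of a scheme like crt-omcntl-c1n1 cannot be derived from %index% alone:

```yaml
# hostnames.yaml -- hypothetical environment file overriding the defaults above.
# %index% expands to the node's resource index, so this yields
# crt-omcntl-0, crt-omcntl-1, ... rather than the full c1n1-style suffix.
parameter_defaults:
  ControllerHostnameFormat: 'crt-omcntl-%index%'
  ComputeHostnameFormat: 'crt-omcom-%index%'
```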
This should probably be part of the test matrix, but I'm 99% sure it will break things that depend on host names.
I am thinking of a scenario where the hostnames are already set on the nodes by the following yaml:

$ cat hostnames.yaml
parameter_defaults:
  ControllerHostnameFormat: qestack-ctrl%index%
  ComputeHostnameFormat: qestack-c%index%
  CephStorageHostnameFormat: qestack-ceph%index%

--------------

I am trying to have qestack-ctrl1, qestack-ctrl2, ... get a specific IP from the internal network, based on their index.

$ cat environments/ips-from-pool.yaml
resource_registry:
  OS::TripleO::Controller::Ports::InternalApiPort: /usr/share/openstack-tripleo-heat-templates/network/ports/internal_api_from_pool.yaml
parameter_defaults:
  ControllerIPs:
    internal_api: {list_join: ['', ['192.168.124.5', {get_param: NodeIndex}]]}

Then I realised NodeIndex is not what I expected.

--------------

I tried again using %index%:

$ cat environments/ips-from-pool.yaml
resource_registry:
  OS::TripleO::Controller::Ports::InternalApiPort: /usr/share/openstack-tripleo-heat-templates/network/ports/internal_api_from_pool.yaml
parameter_defaults:
  ControllerIPs:
    internal_api: 192.168.124.5%index%

But that didn't come out as expected:

[stack@instack default_woceph2]$ nova list
+--------------------------------------+------------------------+--------+------------+-------------+---------------------+
| ID                                   | Name                   | Status | Task State | Power State | Networks            |
+--------------------------------------+------------------------+--------+------------+-------------+---------------------+
| 3064a0e4-0fcb-47c9-a4c2-a0dbab3f6c62 | overcloud-compute-0    | ACTIVE | -          | Running     | ctlplane=192.0.2.19 |
| eede9fe4-edf0-4341-9aaf-721e018631f3 | overcloud-controller-0 | ACTIVE | -          | Running     | ctlplane=192.0.2.21 |
| 7d771585-c63a-4e96-8c50-1c2276e92d02 | overcloud-controller-1 | ACTIVE | -          | Running     | ctlplane=192.0.2.18 |
| 223d62d8-bdd3-4fd2-aa29-ebe358d3f398 | overcloud-controller-2 | ACTIVE | -          | Running     | ctlplane=192.0.2.20 |
+--------------------------------------+------------------------+--------+------------+-------------+---------------------+

[stack@instack default_woceph2]$ ssh heat-admin@192.0.2.18 sudo ifconfig vlan30
vlan30: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 9.0.0.0  netmask 255.255.255.0  broadcast 9.0.0.255
        inet6 fe80::2ca8:4aff:fec2:17ab  prefixlen 64  scopeid 0x20<link>
        ether 2e:a8:4a:c2:17:ab  txqueuelen 0  (Ethernet)
        RX packets 117  bytes 4998 (4.8 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 18  bytes 1068 (1.0 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

[stack@instack default_woceph2]$ ssh heat-admin@192.0.2.20 sudo ifconfig vlan30
vlan30: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 2.0.0.0  netmask 255.255.255.0  broadcast 2.0.0.255
        inet6 fe80::305d:c2ff:fe2a:26fc  prefixlen 64  scopeid 0x20<link>
        ether 32:5d:c2:2a:26:fc  txqueuelen 0  (Ethernet)
        RX packets 146  bytes 6160 (6.0 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 18  bytes 1068 (1.0 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

[stack@instack default_woceph2]$ ssh heat-admin@192.0.2.21 sudo ifconfig vlan30
vlan30: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 1.0.0.0  netmask 255.255.255.0  broadcast 1.0.0.255
        inet6 fe80::3c60:76ff:fe86:6d3d  prefixlen 64  scopeid 0x20<link>
        ether 3e:60:76:86:6d:3d  txqueuelen 0  (Ethernet)
        RX packets 162  bytes 6832 (6.6 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 18  bytes 1068 (1.0 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

---------------------------------------------------------------------------

Need help implementing ./network/ports/from_pool.yaml in such a way that I can control which IP each %index% node of controller / compute / ceph gets.

Regards,
Jaison R
I also used something like this, just to make sure I can assign a pool to the controller's internal_api interface, but the puppet manifest fails to get this value while assigning it on the node:

$ cat environments/ips-from-pool.yaml
resource_registry:
  OS::TripleO::Controller::Ports::InternalApiPort: /usr/share/openstack-tripleo-heat-templates/network/ports/internal_api_from_pool.yaml
parameter_defaults:
  ControllerIPs:
    internal_api: {'192.168.124.50','192.168.124.51','192.168.124.52','192.168.124.53'}
I believe we have added this feature to 8, but it will not be backported to 7.x (no feature backports). Giulio, can you suggest a workaround?
Looking at comment #11, the furthest we can get in 7 is with a parameter_defaults: that looks like the following:

parameter_defaults:
  ControllerHostnameFormat: qestack-ctrl%index%
  ControllerIPs:
    internal_api:
      - 192.168.124.50
      - 192.168.124.51
      - 192.168.124.52
      - 192.168.124.53

and the expectation is that:

qestack-ctrl0 gets 192.168.124.50 (on internal_api)
qestack-ctrl1 gets 192.168.124.51
...
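For context, this is roughly how the *_from_pool.yaml port templates consume such a list. The sketch below paraphrases the upstream internal_api_from_pool.yaml rather than quoting it verbatim: each port resource receives its NodeIndex from the role template and uses it to index into the pool, which is why a plain YAML list keyed by network name is the format that works:

```yaml
# Simplified sketch of network/ports/internal_api_from_pool.yaml;
# parameter names follow the upstream template, but this is not a verbatim copy.
heat_template_version: 2015-04-30

parameters:
  IPPool:
    type: json
    default: {}
  NodeIndex:
    type: number
    default: 0

outputs:
  ip_address:
    description: The IP picked for this node from the pool
    # NodeIndex selects the n-th entry of the internal_api list, so
    # node 0 gets the first address, node 1 the second, and so on.
    value: {get_param: [IPPool, internal_api, {get_param: NodeIndex}]}
```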
Jaison, Jeremy, can you test if the workaround in comment #16 works as expected?
This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.
We recently added some features upstream which I believe address this requirement:

http://docs.openstack.org/developer/tripleo-docs/advanced_deployment/node_placement.html

The way this works is you tag nodes in ironic so we can select them for a specific node capability, e.g. "node:controller-0". Then in your environment file you can specify a scheduler hint that will cause nova to select nodes based on this capability/tag.

Additionally we added a HostnameMap interface, which enables mapping from the default hostnames (e.g. those specified via ControllerHostnameFormat) to any hostname required. E.g.:

parameter_defaults:
  ControllerSchedulerHints:
    'capabilities:node': 'controller-%index%'
  HostnameMap:
    overcloud-controller-0: overcloud-controller-prod-123-0
    overcloud-controller-1: overcloud-controller-prod-456-0
    overcloud-controller-2: overcloud-controller-prod-789-0

(Note I did not change ControllerHostnameFormat here or the keys to HostnameMap would be different.)

This can then be combined with the ControllerIPs examples from previous comments to achieve predictable placement, full control of the node hostnames, and predictable IP assignment.

Awaiting confirmation from Mike as to which downstream builds this will be available in, but as previously mentioned I do not consider this a candidate for 7.x backport.
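Putting the pieces from this thread together, here is a hedged sketch of a single environment file combining the scheduler hints, hostname mapping, and per-index IPs. The hostnames and addresses are illustrative (they reuse the naming scheme from the original report), and it assumes the matching 'node:controller-N' capabilities have already been tagged on the ironic nodes beforehand:

```yaml
# Hypothetical combined environment file; node capabilities, hostnames
# and IP addresses are illustrative, not prescribed values.
resource_registry:
  OS::TripleO::Controller::Ports::InternalApiPort: /usr/share/openstack-tripleo-heat-templates/network/ports/internal_api_from_pool.yaml

parameter_defaults:
  # Pin each controller resource index to a specific tagged ironic node.
  ControllerSchedulerHints:
    'capabilities:node': 'controller-%index%'
  # Rename the default heat-generated hostnames to the site naming scheme.
  HostnameMap:
    overcloud-controller-0: crt-omcntl-c1n1
    overcloud-controller-1: crt-omcntl-c1n2
    overcloud-controller-2: crt-omcntl-c2n1
  # Give each index a predictable internal_api address.
  ControllerIPs:
    internal_api:
      - 192.168.124.50
      - 192.168.124.51
      - 192.168.124.52
```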
Based on comments, this is resolved in OSP 8, just not updated. Moving the bug back to OSP 8 for testing.
HostnameFormat was verified in 7.3, and node placement with predictable IPs and hostname mapping were verified in 8.0.
*** Bug 1261633 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0637.html
https://bugzilla.redhat.com/show_bug.cgi?id=1278854

The controller replacement procedure relies on the node index. Will the predictable hostname feature break the controller replacement procedure by changing or obscuring the node index?
(In reply to jliberma from comment #30)
> Will the predictable hostname feature break the controller replacement
> procedure by changing or obscuring the node index?

That's a good point, and I think it was never tested. I am trying to test it now, but I'm having problems with all my test environments at the moment :(. I am also trying to contact the relevant developers and QEs (unfortunately it's a holiday), so sorry that I don't have an answer yet.
I mapped the controller host names to oc-ctl-a, oc-ctl-b and oc-ctl-c. Even though the resource index numbers are no longer featured in the controller names, it is still possible to know what the index is, because the mapping was provided by the user and you can use it to map back to the index number.

Another way to find the index number is to use heat resource-list and heat resource-show:

heat resource-list -n2 overcloud | grep "TripleO::Controller "
...
| 0 | ... | OS::TripleO::Controller | ... | overcloud-Controller-uflsluh2ssxj |

heat resource-show overcloud-Controller-uflsluh2ssxj 0
...
| logical_resource_id | 1 |
| attributes          | { ...
|                     |   "hostname": "oc-ctl-b", ...

You can then use this index in the removal policy. I successfully removed a node this way, so it looks like the feature is working; however, I was not 100% successful because corosync didn't come up on the new node and I never figured out why. You can test it also and see if you're able to complete the process, just to be sure that the failure is not related to using predictable hostnames.

Also note that the new node that will be created gets the next index number, so make sure you map that as well to a hostname of your choosing.
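To illustrate the removal-policy step mentioned above, a sketch of the environment file used when scaling down (the filename is hypothetical; the index '1' corresponds to the oc-ctl-b example, and the key is the Heat resource index, not the hostname):

```yaml
# remove-controller.yaml -- illustrative removal-policy environment file.
# resource_list names the Heat resource index of the controller to remove
# (here '1', i.e. the node mapped to oc-ctl-b), not its mapped hostname.
parameter_defaults:
  ControllerRemovalPolicies: [{'resource_list': ['1']}]
```

This is passed alongside a reduced ControllerCount so Heat removes the named index rather than the highest one.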
Udi -- thanks for looking into this.

Do you think we should document using the hostname attribute to identify the logical_resource_id in the controller replacement procedure? Currently the replacement procedure says to use the last number in the hostname as the index, but:

1) the last number may be absent if using a predictable hostname
2) the last number may be wrong if a node has been replaced

Corosync not starting after the replacement procedure might be related to this pcs bug introduced in a RHEL 7.2 z-stream update: https://bugzilla.redhat.com/show_bug.cgi?id=1326507

Thanks, Jacob
We should certainly document how to determine the index number, either by referring back to the original mappings used during the deploy, or by using the heat commands as in the example I showed. And I don't think the last number is wrong if the node was replaced (the new node gets the next index, and that index is correct).