Red Hat Bugzilla – Bug 1613933
[Docs] The Ceph Guide for OpenStack should have a NodeDataLookup OSD list override example
Last modified: 2018-10-17 16:34:59 EDT
The Deploying an Overcloud with Containerized Red Hat Ceph document [1], section 5.1, covers Mapping the Ceph Storage Node Disk Layout. This section is good, but an additional section should be added that covers scenarios in which a particular node has a disk missing. TripleO supports this feature, and it is documented upstream [2] but not downstream.

This bug asks that a new section be added to this document, called something like "Mapping the Disk Layout to Non-Homogeneous Ceph Storage Nodes", which explains what to do in this scenario and provides an example of how to do it.

I will update this bugzilla with content to help the above be written.

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/deploying_an_overcloud_with_containerized_red_hat_ceph/
[2] https://docs.openstack.org/tripleo-docs/latest/install/advanced_deployment/node_specific_hieradata.html
The closest example we have to this already in our documentation pertains to Nova. We need an example for Ceph too. https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html-single/advanced_overcloud_customization/#sect-Customizing_Hieradata_for_Individual_Nodes
Created attachment 1474375 [details]
Heat Environment File which uses NodeDataLookup for Ceph deployment

This attachment includes a Heat environment file I used in the scale lab to deal with a server which has one disk missing. All of the servers had a devices list with 35 disks, except one which had a disk missing. This environment file overrides the default devices list for only that single node and gives it the list of 34 disks it should use instead of the global list.
Created attachment 1474376 [details]
Updated Heat Environment File which uses NodeDataLookup for Ceph deployment

Updating attachment as I accidentally attached an old version which was missing the dedicated_devices list.
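For illustration only, here is a minimal Python sketch of how an override like the one in the attachment could be generated from the global lists. It is not the attached file and not part of TripleO. The function name build_node_override is hypothetical, and the sketch assumes that devices[i] is paired with dedicated_devices[i], so dropping a missing disk also drops its paired entry. The output layout (a "parameter_defaults:" line, a "NodeDataLookup: |" line, then JSON keyed by the machine's system UUID) follows the upstream node-specific hieradata format.

```python
import json


def build_node_override(system_uuid, devices, dedicated_devices, missing):
    """Return environment-file text overriding the disk lists for one node.

    Hypothetical helper: assumes devices[i] pairs with dedicated_devices[i].
    """
    if len(devices) != len(dedicated_devices):
        raise ValueError("devices and dedicated_devices differ in length")
    # Keep every (device, dedicated device) pair except the missing disk.
    pairs = [(d, j) for d, j in zip(devices, dedicated_devices) if d != missing]
    lookup = {
        system_uuid: {
            "devices": [d for d, _ in pairs],
            "dedicated_devices": [j for _, j in pairs],
        }
    }
    body = json.dumps(lookup, indent=2)
    # Indent the JSON so it sits inside the YAML literal block scalar.
    indented = "\n".join("    " + line for line in body.splitlines())
    return "parameter_defaults:\n  NodeDataLookup: |\n" + indented + "\n"
```

Calling it with the global 35-entry lists and the one absent device path would yield a 34-entry override for that node only. The device paths and UUID passed in are whatever applies to the environment; none are implied here.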
Proposed content:

By default, all nodes of a role that hosts Ceph OSDs (indicated by the OS::TripleO::Services::CephOSD service in roles_data.yaml), for example CephStorage or ComputeHCI nodes, use the global devices list and dedicated_devices list set in section 5.1, "Mapping the Ceph Storage Node Disk Layout". This assumes that all of these servers have homogeneous hardware. If a subset of them does not, it is possible to tell director that each of these individual servers should have a different devices and dedicated_devices list. This is known as a "node-specific disk configuration".

To pass director a node-specific disk configuration, a Heat environment file, e.g. node-spec-overrides.yaml, must be passed to the `openstack overcloud deploy` command. The file's content must identify each server by a machine unique UUID and provide a list of local variables which override the global variables.

The machine unique UUID may be extracted for each individual server by running `dmidecode -s system-uuid` on that server, or it may be extracted from the Ironic database by running `openstack baremetal introspection data save NODE-ID | jq .extra.system.product.uuid` on the undercloud.

Warning: If the undercloud.conf does not have inspection_extras = true prior to undercloud installation/upgrade and introspection, then the machine unique UUID will not be in the Ironic database.

Warning: The machine unique UUID is not the Ironic UUID.

A valid node-spec-overrides.yaml file may look like the following:

parameter_defaults:
  NodeDataLookup: |
    {"32E87B4C-C4A7-418E-865B-191684A6883B": {"devices": ["/dev/sdc"]}}

All lines after the first two lines must be valid JSON. An easy way to verify that the JSON is valid is to use the `jq` command. For example, temporarily remove the first two lines ("parameter_defaults:" and "NodeDataLookup: |") from the file and run `cat node-spec-overrides.yaml | jq .`.
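As an alternative to editing the file by hand before validating it, the same check can be sketched in Python using only the standard library. This is an illustrative sketch, not a director tool: the function name parse_node_data_lookup is hypothetical, and it assumes the file has exactly the layout described here, with "parameter_defaults:" on the first line, "NodeDataLookup: |" on the second, and JSON on every line after that.

```python
import json


def parse_node_data_lookup(env_text):
    """Return the JSON embedded under 'NodeDataLookup: |' as a dict.

    Hypothetical helper: assumes the first two lines are the YAML header
    and that all remaining lines are JSON.
    """
    lines = env_text.splitlines()
    if len(lines) < 3:
        raise ValueError("expected two header lines followed by JSON")
    if lines[0].strip() != "parameter_defaults:":
        raise ValueError("first line must be 'parameter_defaults:'")
    if lines[1].strip() != "NodeDataLookup: |":
        raise ValueError("second line must be 'NodeDataLookup: |'")
    # json.loads raises json.JSONDecodeError (a ValueError) on invalid JSON.
    return json.loads("\n".join(lines[2:]))
```

Reading node-spec-overrides.yaml into a string and passing it to this function either returns the per-node overrides or raises an error that points at the invalid JSON, without modifying the file on disk.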
As the node-spec-overrides.yaml file grows, `jq` may also be used to check the content of the embedded JSON. For example, because we know the 'devices' and 'dedicated_devices' lists should be the same length, we can use the following to verify that they are the same length before starting the deployment.

(undercloud) [stack@b08-h02-r620 tht]$ cat node-spec-c05-h17-h21-h25-6048r.yaml | jq '.[] | .devices | length'
33
30
33
(undercloud) [stack@b08-h02-r620 tht]$ cat node-spec-c05-h17-h21-h25-6048r.yaml | jq '.[] | .dedicated_devices | length'
33
30
33
(undercloud) [stack@b08-h02-r620 tht]$

In the above example, node-spec-c05-h17-h21-h25-6048r.yaml describes three servers in rack c05, in slots h17, h21, and h25, that are missing disks. A more complicated example is available at https://bugzilla.redhat.com/attachment.cgi?id=1474376

After the JSON has been validated, add back the two lines which make it a valid environment YAML file ("parameter_defaults:" and "NodeDataLookup: |") and include it with a `-e` in the deployment.
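The length comparison can also be automated rather than read off by eye. The following Python sketch takes the already-parsed JSON (a dict keyed by machine UUID) and reports any node whose two lists differ in length. The function name mismatched_nodes is hypothetical, and the sketch assumes that a node which overrides only 'devices' is intentional and should be skipped rather than flagged.

```python
import json


def mismatched_nodes(lookup):
    """Return {uuid: (len(devices), len(dedicated_devices))} for mismatches.

    Assumption: nodes that do not define both lists are not checked.
    """
    problems = {}
    for uuid, overrides in lookup.items():
        if "devices" not in overrides or "dedicated_devices" not in overrides:
            continue
        n_devices = len(overrides["devices"])
        n_dedicated = len(overrides["dedicated_devices"])
        if n_devices != n_dedicated:
            problems[uuid] = (n_devices, n_dedicated)
    return problems


if __name__ == "__main__":
    # Placeholder keys and device names, for illustration only.
    sample = json.loads(
        '{"NODE-A": {"devices": ["/dev/sdb", "/dev/sdc"],'
        ' "dedicated_devices": ["/dev/nvme0n1"]}}'
    )
    for uuid, (n_dev, n_ded) in mismatched_nodes(sample).items():
        print("%s: %d devices, %d dedicated_devices" % (uuid, n_dev, n_ded))
```

An empty result means every node that defines both lists has matching lengths, which is the same condition the two `jq` commands check.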