Bug 1613933 - [Docs] The Ceph Guide for OpenStack should have a NodeDataLookup OSD list override example
Status: ON_DEV
Product: Red Hat OpenStack
Classification: Red Hat
Component: documentation
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assigned To: Kim Nylander
RHOS Documentation Team
Depends On:
Blocks:
Reported: 2018-08-08 11:05 EDT by John Fulton
Modified: 2018-10-17 16:34 EDT
CC: 2 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---


Attachments
Heat Environment File which uses NodeDataLookup for Ceph deployment (2.05 KB, text/plain) - 2018-08-08 11:14 EDT, John Fulton
Updated Heat Environment File which uses NodeDataLookup for Ceph deployment (3.73 KB, text/plain) - 2018-08-08 11:16 EDT, John Fulton

Description John Fulton 2018-08-08 11:05:27 EDT
The Deploying an Overcloud with Containerized Red Hat Ceph document [1], section 5.1, covers Mapping the Ceph Storage Node Disk Layout. This section is good, but an additional section should be added covering scenarios in which a particular node has a disk missing. TripleO supports this feature, and it is documented upstream [2] but not downstream.

This bug asks that a new section be added to this document, called something like "Mapping the Disk Layout to Non-Homogeneous Ceph Storage Nodes", which explains what to do in this scenario and provides an example of how to do it.

I will update this bugzilla with content to help the above be written.


[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/deploying_an_overcloud_with_containerized_red_hat_ceph/

[2] https://docs.openstack.org/tripleo-docs/latest/install/advanced_deployment/node_specific_hieradata.html
Comment 1 John Fulton 2018-08-08 11:07:43 EDT
The closest example we have to this already in our documentation pertains to Nova. We need an example for Ceph too. 

 https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html-single/advanced_overcloud_customization/#sect-Customizing_Hieradata_for_Individual_Nodes
Comment 2 John Fulton 2018-08-08 11:14 EDT
Created attachment 1474375 [details]
Heat Environment File which uses NodeDataLookup for Ceph deployment

This attachment includes a Heat environment file I used in the scale lab to deal with a server which has one disk missing. 

All of the servers had a devices list with 35 disks, except one of them had a disk missing. This environment file overrides the default devices list for only that single node and gives it the list of 34 disks it should use instead of the global list.
Comment 3 John Fulton 2018-08-08 11:16 EDT
Created attachment 1474376 [details]
Updated Heat Environment File which uses NodeDataLookup for Ceph deployment

Updating attachment as I accidentally attached an old version which was missing the dedicated_devices list.
Comment 4 John Fulton 2018-08-08 11:45:27 EDT
Proposed content:

By default, all nodes of a role which will host Ceph OSDs (indicated by the OS::TripleO::Services::CephOSD service in roles_data.yaml), for example CephStorage or ComputeHCI nodes, use the global devices list and dedicated_devices list set in section 5.1, "Mapping the Ceph Storage Node Disk Layout". This assumes that all of these servers have homogeneous hardware. If a subset of these servers do not have homogeneous hardware, director can be told that each of those individual servers should use a different devices and dedicated_devices list. This is known as a node-specific disk configuration.

To pass director a node-specific disk configuration, a Heat environment file, e.g. node-spec-overrides.yaml, must be passed to the `openstack overcloud deploy` command. The file's content must identify each server by its machine-unique UUID and provide a list of local variables which override the global variables.

The machine-unique UUID may be extracted for each individual server by running `dmidecode -s system-uuid` on that server, or it may be extracted from the Ironic database by running `openstack baremetal introspection data save NODE-ID | jq .extra.system.product.uuid` on the undercloud.
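For illustration, the jq extraction step can be exercised against a hand-written sample of the saved introspection data (the JSON below is a stand-in for real `openstack baremetal introspection data save` output, which contains many more fields; the UUID value is only an example):

```shell
# Write a minimal stand-in for saved introspection data.
cat > /tmp/introspection-sample.json <<'EOF'
{"extra": {"system": {"product": {"uuid": "32E87B4C-C4A7-418E-865B-191684A6883B"}}}}
EOF

# Extract the machine-unique UUID the same way as from real data;
# -r prints the raw string without JSON quoting.
jq -r .extra.system.product.uuid /tmp/introspection-sample.json
# 32E87B4C-C4A7-418E-865B-191684A6883B
```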

Warning: If undercloud.conf did not have inspection_extras = true prior to undercloud installation/upgrade and introspection, then the machine-unique UUID will not be in the Ironic database.
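That is, the [DEFAULT] section of undercloud.conf needs to contain the following before the undercloud is installed or upgraded and introspection is run (a minimal fragment, not a complete undercloud.conf):

```
[DEFAULT]
inspection_extras = true
```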

Warning: The machine unique UUID is not the Ironic UUID.

A valid node-spec-overrides.yaml file may look like the following:

parameter_defaults:
  NodeDataLookup: |
    {"32E87B4C-C4A7-418E-865B-191684A6883B": {"devices": ["/dev/sdc"]}}

All lines after the first two lines must be valid JSON. An easy way to verify that the JSON is valid is to use the `jq` command. For example, temporarily remove the first two lines ("parameter_defaults:" and "NodeDataLookup: |") from the file and run `cat node-spec-overrides.yaml | jq .`. As the node-spec-overrides.yaml file grows, `jq` may also be used to ensure that the embedded JSON remains valid. For example, because the devices and dedicated_devices lists should be the same length, the following can be used to verify that they are the same length before starting the deployment.

(undercloud) [stack@b08-h02-r620 tht]$ cat node-spec-c05-h17-h21-h25-6048r.yaml | jq '.[] | .devices | length'
33
30
33
(undercloud) [stack@b08-h02-r620 tht]$ cat node-spec-c05-h17-h21-h25-6048r.yaml | jq '.[] | .dedicated_devices | length'
33
30
33
(undercloud) [stack@b08-h02-r620 tht]$ 

In the above example, the node-spec-c05-h17-h21-h25-6048r.yaml has three servers in rack c05 in which slots h17, h21, and h25 are missing disks. 

A more complicated example is available at https://bugzilla.redhat.com/attachment.cgi?id=1474376
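As a rough sketch of what a multi-node file looks like (the second UUID and all device paths below are hypothetical; the real values come from your own servers), a two-node override might be:

```
parameter_defaults:
  NodeDataLookup: |
    {"32E87B4C-C4A7-418E-865B-191684A6883B":
       {"devices": ["/dev/sdb", "/dev/sdc"],
        "dedicated_devices": ["/dev/sdj", "/dev/sdj"]},
     "AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE":
       {"devices": ["/dev/sdb"],
        "dedicated_devices": ["/dev/sdj"]}}
```

Note that each node's devices and dedicated_devices lists keep the same length, as described above; a dedicated journal device may be repeated when several OSDs share it.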

After the JSON has been validated, add back the two lines which make it a valid environment YAML file ("parameter_defaults:" and "NodeDataLookup: |") and include it with a `-e` in the deployment.
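As an alternative to deleting and restoring those two lines by hand, the embedded JSON can be checked non-destructively by skipping them with `tail` (the file below repeats the earlier small example; this assumes the JSON body starts on line 3):

```shell
# Recreate the small example file from above.
cat > /tmp/node-spec-overrides.yaml <<'EOF'
parameter_defaults:
  NodeDataLookup: |
    {"32E87B4C-C4A7-418E-865B-191684A6883B": {"devices": ["/dev/sdc"]}}
EOF

# Skip the two YAML header lines and validate the rest as JSON;
# jq exits non-zero if the JSON is malformed.
tail -n +3 /tmp/node-spec-overrides.yaml | jq .
```

The unmodified file is then included in the deployment with, for example, `openstack overcloud deploy --templates ... -e node-spec-overrides.yaml` (other arguments elided).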
