Description of problem:
Ceph cluster creation
(Input parameters: a list of nodes with a role, OSD or MON, for each of them, plus failure domain information like zones and racks; journal size is optional.)
Cluster creation should be resilient to some failures. For example, if the creation of one or a few OSDs fails, cluster creation should still go ahead and the cluster should be created.
Calamari lite service should be configured and turned on as part of cluster creation.
Errors and failures should be communicated to the user of the API.
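For illustration only, the input and outcome described above could be modelled along these lines. Every name here (`NodeSpec`, `ClusterRequest`, `CreateResult`, `summarize` and the field names) is hypothetical and not taken from any existing API. The rule that the cluster counts as created while at least one OSD succeeds is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NodeSpec:
    # Hypothetical shape: one entry per node in the request.
    hostname: str
    role: str  # "MON" or "OSD"
    failure_domain: Dict[str, str] = field(default_factory=dict)  # e.g. {"zone": "z1", "rack": "r1"}


@dataclass
class ClusterRequest:
    name: str
    nodes: List[NodeSpec]
    journal_size_mb: Optional[int] = None  # optional, per the description


@dataclass
class CreateResult:
    created: bool
    failed_osds: Dict[str, str] = field(default_factory=dict)  # hostname -> error message


def summarize(request: ClusterRequest, osd_errors: Dict[str, str]) -> CreateResult:
    """Report the outcome; assumed policy: created if at least one OSD came up."""
    osds = [n.hostname for n in request.nodes if n.role == "OSD"]
    succeeded = [h for h in osds if h not in osd_errors]
    return CreateResult(created=bool(succeeded), failed_osds=dict(osd_errors))
```

The failed OSDs travel back with the result instead of raising, so the caller can show them to the user while the cluster stays up.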
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Mrugesh and I spoke. We concluded that USM will work out information such as the failure domain and the global cluster default configuration.
How will the request convey this information? Please share an example of what you expect to send in the request.
Create cluster will be satisfied by this workflow.
Red Hat Storage Controller will request package installation
Mariner will install packages and report status of jobs
Red Hat Storage Controller will work out how to avoid asking for monitors that violate failure domain.
Red Hat Storage Controller will request monitor cluster creation
Mariner will configure monitors and report status of jobs
Red Hat Storage Controller will work out how to avoid asking for OSDs that violate failure domain.
Red Hat Storage Controller will request OSD creation
Mariner will configure OSDs and report status of jobs
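The request/report loop in the workflow above could be sketched as follows. `FakeMariner` is an in-memory stand-in, and all of its method names and status strings are assumptions made for this example, not the installer's actual API. The failure-domain checks done by the controller are left out.

```python
import time
from typing import Callable, Dict, List, Tuple


class FakeMariner:
    """In-memory stand-in for the installer; every method name is hypothetical."""

    def __init__(self) -> None:
        self._polls_left: Dict[str, int] = {}
        self._count = 0

    def _new_task(self) -> str:
        self._count += 1
        task_id = f"task-{self._count}"
        self._polls_left[task_id] = 2  # pretend each job needs two polls to finish
        return task_id

    def install_packages(self, nodes: List[str]) -> str:
        return self._new_task()

    def create_monitors(self, mons: List[str]) -> str:
        return self._new_task()

    def create_osds(self, osds: List[str]) -> str:
        return self._new_task()

    def task_status(self, task_id: str) -> str:
        self._polls_left[task_id] -= 1
        return "succeeded" if self._polls_left[task_id] <= 0 else "running"


def wait_for(client: FakeMariner, task_id: str, interval: float = 0.0, max_polls: int = 100) -> str:
    """Poll a task until it leaves the 'running' state or the poll budget runs out."""
    for _ in range(max_polls):
        status = client.task_status(task_id)
        if status != "running":
            return status
        time.sleep(interval)
    return "timeout"


def create_cluster(client: FakeMariner, mons: List[str], osds: List[str]) -> List[str]:
    """Run the three phases in order; stop at the first phase that does not succeed."""
    phases: List[Tuple[str, Callable[[], str]]] = [
        ("install", lambda: client.install_packages(mons + osds)),
        ("monitors", lambda: client.create_monitors(mons)),
        ("osds", lambda: client.create_osds(osds)),
    ]
    log: List[str] = []
    for name, start in phases:
        status = wait_for(client, start())
        log.append(f"{name}:{status}")
        if status != "succeeded":
            break
    return log
```

Each phase starts only after the previous job has reported success, which mirrors the request/report pairs listed above.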
This is how we would like the cluster creation to happen, in keeping with your proposal. (Note: we want package installation to be a separate step rather than being clubbed with cluster creation. We had a discussion with Mrugesh on this and he agrees to this proposal.)
1. We invoke the package installation for MONs from Mariner. That is, we will provide the list-of-nodes on which Mariner needs to install the ceph packages for MONs.
2. Using the returned task id of the above step, we will poll and find the progress of this task.
2a. In the meantime, we will ask Mariner to install the ceph packages for the OSD nodes on another-list-of-nodes.
3. Once the MONs package installation is complete, we will invoke the CreateCluster API, to which we will pass-on the list-of-MONs, using which Mariner will create the cluster. It will return a task-id for this operation.
4. Using this task-id, we monitor the progress of cluster creation.
5. Once the cluster creation using MONs is successful, we will invoke the addition of OSD nodes to the cluster. We will pass a list-of-OSD-Nodes to the Mariner API "AddOSDs", which will internally add all these OSD-Nodes to the specified cluster. Along with the OSDs, we will pass disk-specific information, journal information etc. to the API. The "AddOSDs" API is expected to prepare the OSDs in parallel instead of serializing the work. The other expected behaviour is that even if one or a few OSDs fail, the task should proceed instead of aborting.
This API will return a task-id, which we will use to poll regularly for the progress of OSD addition. We want the progress reported in a format that lets us see the lists of succeeded and failed OSDs.
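The behaviour asked of "AddOSDs" in step 5 (parallel preparation, no abort on individual failures, separate succeeded and failed lists) could be sketched like this. The function name `add_osds`, the `prepare` callable and the shape of the returned dictionary are assumptions for illustration; a thread pool is just one way to get the parallelism.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def add_osds(nodes: List[str], prepare: Callable[[str], None], max_workers: int = 8) -> Dict[str, object]:
    """Prepare OSDs in parallel; a failure on one node does not abort the rest."""
    succeeded: List[str] = []
    failed: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(prepare, node): node for node in nodes}
        for future in as_completed(futures):
            node = futures[future]
            try:
                future.result()
                succeeded.append(node)
            except Exception as exc:  # record the error and keep going
                failed[node] = str(exc)
    # Sort so the report does not depend on completion order.
    return {"succeeded": sorted(succeeded), "failed": failed}
```

A caller polling the task would then see both lists in one report and could retry only the failed nodes.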
The package installation API call could be just a single call. Unlike the mon and osd creation calls, there's no need for sequential actions here - the package installation can be carried out in parallel on all the nodes in question.
We would love to have that kind of API.
But, as per Gregory, providing an API that takes a list of nodes involves some complexity in Mariner. It would be easier for them to have package installation invoked separately for each node, in parallel, so that they can return a task id for each, I suppose. We would then use each task id to poll and see how that task is progressing.
Gregory, can you confirm that?
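For reference, the per-node variant discussed above could look like this from the caller's side. `start_install` and `get_status` are hypothetical callables standing in for whatever calls the installer ends up exposing, and the status strings are assumptions.

```python
import time
from typing import Callable, Dict, List


def install_on_all(
    nodes: List[str],
    start_install: Callable[[str], str],
    get_status: Callable[[str], str],
    interval: float = 0.0,
    max_rounds: int = 50,
) -> Dict[str, str]:
    """Start one install task per node, then poll every pending task id each round."""
    task_ids = {node: start_install(node) for node in nodes}
    status = {node: "running" for node in nodes}
    for _ in range(max_rounds):
        pending = [n for n, s in status.items() if s == "running"]
        if not pending:
            break
        for node in pending:
            status[node] = get_status(task_ids[node])
        time.sleep(interval)
    return status
```

Nodes still marked "running" after `max_rounds` are returned as such, so the caller can decide whether to keep waiting.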
Failure domain configuration requires changes in ceph.conf as well as in the crush maps. The configuration is somewhat similar to the crush configuration.
These are different bucket types supported:
- type 9 region
- type 8 datacenter
- type 7 room
- type 6 pod
- type 5 pdu
- type 4 row
- type 3 rack
- type 2 chassis
- type 1 root
Each host can be placed under a single one of these buckets or under a combination of them forming a hierarchy, say region(APAC)->datacenter(BLR)->room()->rack->chassis etc. By default all hosts will be added to the root bucket.
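A minimal sketch of how a host's place in such a hierarchy could be represented and rendered as ordered type=name pairs. The function name, the ordering of `HIERARCHY` with root as the outermost level, and "default" as the name of the default root bucket are assumptions made for this example.

```python
from typing import Dict, List

# Bucket types from the list above, ordered from outermost to innermost.
# Treating "root" as the outermost level is an assumption, based on the note
# that all hosts are added to the root bucket by default.
HIERARCHY: List[str] = [
    "root", "region", "datacenter", "room", "pod",
    "pdu", "row", "rack", "chassis",
]


def crush_location(host: str, buckets: Dict[str, str]) -> str:
    """Render a host's place in the hierarchy as ordered type=name pairs."""
    unknown = set(buckets) - set(HIERARCHY)
    if unknown:
        raise ValueError(f"unsupported bucket types: {sorted(unknown)}")
    # Fall back to an assumed default root when the caller gives none.
    location = {"root": "default", **buckets}
    pairs = [f"{t}={location[t]}" for t in HIERARCHY if t in location]
    return " ".join(pairs + [f"host={host}"])
```

For example, `crush_location("node1", {"region": "APAC", "datacenter": "BLR"})` yields a string starting with `root=default region=APAC`, while a host with no buckets given ends up directly under the root.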
Configuring the Journals
There are three different use cases here.
A disk can be used as:
1. OSD Data and Journal co-existing on the same disk - We need to create two partitions here, one for OSD data and the other for the journal
2. Dedicated OSD Data - the disk is dedicated to OSD data only. Journals will be created on a separate disk
3. Dedicated Journal - Need to create multiple partitions based on the requirements. OSDs can utilize the journals created on this disk
USM will provide the journal size (system calculated or user provided), the type (one of the three above) and the disk (in case of #3) to Mariner so that Mariner can take care of creating the journals based on the input.
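The three inputs mentioned above (size, type and disk) could be carried in a small structure like the one below. All names (`DiskUse`, `JournalSpec`, `journal_partitions`) and the use of megabytes are assumptions for illustration, not an agreed format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DiskUse(Enum):
    COLOCATED = "colocated"        # case 1: data and journal on the same disk
    DATA_ONLY = "data_only"        # case 2: journal lives on a separate disk
    JOURNAL_ONLY = "journal_only"  # case 3: disk holds journals for several OSDs


@dataclass
class JournalSpec:
    size_mb: int
    use: DiskUse
    journal_disk: Optional[str] = None  # only required for case 3

    def validate(self) -> None:
        if self.size_mb <= 0:
            raise ValueError("journal size must be positive")
        if self.use is DiskUse.JOURNAL_ONLY and not self.journal_disk:
            raise ValueError("a dedicated journal needs a disk")


def journal_partitions(disk_size_mb: int, spec: JournalSpec) -> int:
    """How many journals of spec.size_mb fit on a disk of the given size."""
    spec.validate()
    return disk_size_mb // spec.size_mb
```

Validating before any partitioning is attempted lets a bad request be rejected up front rather than failing halfway through disk preparation.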
We have been thinking of allowing USM to provide a callback URL when a request to the API is made so that polling is not needed.
The callback URL would be requested once the API call has completed (whether it failed or succeeded).
This feature would not be hard to implement in the installer and I think it would allow a better way to handle updates for USM. Individual tasks would still exist and USM could still poll those if needed.
USM needs periodic updates, not only the completion status. Suppose a requested operation has 5 steps: USM should get an indication when each step is completed. This is important because the admin wants to see the progress of the task in the UI.
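Per-step notification could be sketched as below. The `notify` callable stands in for whatever delivers the update (for instance a request to a callback URL); it, the function name and the event fields are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[], None]]


def run_with_progress(task_id: str, steps: List[Step], notify: Callable[[Dict[str, object]], None]) -> bool:
    """Run steps in order and emit one event after each step, not only at the end."""
    total = len(steps)
    for index, (name, action) in enumerate(steps, start=1):
        try:
            action()
        except Exception as exc:
            # Report the failing step and how many steps had completed before it.
            notify({"task": task_id, "step": name, "done": index - 1,
                    "total": total, "state": "failed", "error": str(exc)})
            return False
        state = "succeeded" if index == total else "running"
        notify({"task": task_id, "step": name, "done": index,
                "total": total, "state": state})
    return True
```

With `done` and `total` in every event, a UI can draw a progress bar without knowing the step names in advance.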
We have documented a few things on the API in the hope that this makes things a bit clearer:
API endpoint interactions are per-host except for install tasks. Reasoning for each and expected behavior are documented:
There are no "composite" tasks.
The docs now document what a full cluster install would look like:
(In reply to Alfredo Deza from comment #14)
/etc/sudoers should not be modified. /etc/sudoers.d should be used to configure ceph-installer specific settings, including disabling requiretty for this user. No system configuration must be overridden, except for the ceph-installer user.
(In reply to Mrugesh Karnik from comment #16)
Thank you for catching that. We were no longer doing it in code, but the docs had not been updated. I have just made the changes to reflect this. We are only making changes to `/etc/sudoers.d/ceph-installer`
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see email@example.com with any questions
It is possible to create a ceph cluster. Checked with ceph-installer-1.0.11-1.el7scon.noarch -> Verified
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.