Bug 1319856
| Summary: | ceph-installer executes installation of packages in sequence for different nodes | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Shubhendu Tripathi <shtripat> |
| Component: | Ceph-Installer | Assignee: | Christina Meno <gmeno> |
| Status: | CLOSED WONTFIX | QA Contact: | ceph-qe-bugs <ceph-qe-bugs> |
| Severity: | low | Docs Contact: | |
| Priority: | low | | |
| Version: | 3.0 | CC: | adeza, anharris, aschoen, ceph-eng-bugs, flucifre, hnallurv, kdreyer, mbukatov, mhackett, mkudlej, nlevine, nthomas, sankarshan, shtripat |
| Target Milestone: | rc | | |
| Target Release: | 3.1 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-03-15 16:44:16 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1291304, 1319833 | | |
Description
Shubhendu Tripathi
2016-03-21 16:36:42 UTC
@mbukatov, you may add additional details for this.

---
Comment 5

It depends on how requests are being sent to the installer. If an install request is POSTed for each host, then the installs will be sequential. If a group of hosts (say, all the MONs) is sent in one request, then the process will use the default parallel value (5). 30 minutes doesn't sound right.

The description doesn't mention how these hosts are being installed, how the requests are being handled, or what the output of these tasks looks like. The ceph-installer captures start/end times for tasks and other useful information, like the command being used to call Ansible.

Would you please provide an example of the request you are making that results in a sequential package install?

---
Comment 7 (Shubhendu Tripathi)

For each of the hosts (mon/osd) we invoke /api/mon/install or /api/osd/install respectively, one by one, in threads. This makes sure that we are able to track, and report in the UI, a successful installation for each node.

---
Comment 8 (Alfredo Deza)

(In reply to Shubhendu Tripathi from comment #7)
> For each of the hosts (mon/osd) we invoke /api/mon/install or
> /api/osd/install respectively one by one in threads.

Then this is not a bug. Per comment #5, if an install request is POSTed for each host, the installer is forced to go sequentially. To avoid this, the client must pass multiple hosts for the install process at once (which is allowed by the API).

---
Comment 9 (Shubhendu Tripathi)

Alfredo, is this a design restriction, or is it due to ceph-ansible?

My understanding is that even if multiple HTTP POSTs are submitted to the server, it can create an async task for each POST and return the task ids to the client. These async tasks can run in parallel. This should not cause any issues within the tasks either, as they are executed for different hosts.

This is similar to what we do in USM for node accept/initialize: the UI submits multiple POSTs to the server, but the async tasks run in parallel for different hosts.

@Nishanth, anything to add here?

---
Comment 10 (Alfredo Deza)

(In reply to Shubhendu Tripathi from comment #9)
> My understanding is, even if multiple http POST are submitted to the server,
> it can create async tasks for each of the POST and return the task ids to
> the client.

This is correct, but the "async" process here places these requests in a queue that only one worker consumes from, so even though the client gets an immediate response because of the asynchronous nature, the tasks still complete one at a time because of the single worker.

> These async tasks can run in parallel. Also this should not cause any issues
> within tasks as they are executed for different hosts as such.

They cannot run in parallel now. As I mentioned, allowing parallel task execution would require more work. If parallel execution is required by the client, then it must pass multiple hosts when installing.
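To make the two request patterns under discussion concrete, here is a minimal client sketch. The base URL, port, and hostnames are illustrative, and the `{"hosts": [...]}` payload shape is an assumption here; the authoritative request format is in the install documentation cited in comment #13 below.

```python
import threading

import requests

API = "http://ceph-installer.example.com:8181/api"  # illustrative base URL
MONS = ["mon1", "mon2", "mon3"]

# Pattern 1: one POST per host (the USM approach from comment #7). Every
# POST gets its own task id back immediately, but a single queue worker
# executes the tasks one at a time, so the installs run sequentially.
def install_one(host):
    r = requests.post(f"{API}/mon/install", json={"hosts": [host]})
    print(host, "->", r.json())

threads = [threading.Thread(target=install_one, args=(h,)) for h in MONS]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Pattern 2: one POST with all hosts (what comments #8 and #10 recommend).
# A single task is created and Ansible operates on the hosts in parallel.
r = requests.post(f"{API}/mon/install", json={"hosts": MONS})
print("single task ->", r.json())
```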
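The serialized behavior described in comment #10 boils down to a single consumer on a shared queue. A toy model (not ceph-installer code) that reproduces it:

```python
import queue
import threading
import time

tasks = queue.Queue()

def worker():
    # The single consumer: tasks finish one at a time no matter how
    # concurrently they were submitted.
    while True:
        host = tasks.get()
        print(f"installing on {host} ...")
        time.sleep(1)  # stand-in for a full ansible-playbook run
        tasks.task_done()

threading.Thread(target=worker, daemon=True).start()

for host in ["mon1", "mon2", "mon3"]:
    tasks.put(host)  # returns immediately: the "async" part
tasks.join()         # total wall time is still ~3 installs, back to back
```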
See the "Install Operations" in http://docs.ceph.com/ceph-installer/docs/#install-operations From that section: The install requests to the API are allowed to pass a list of multiple hosts. This process is not sequential: all hosts are operated against at once and if a single host fails to install the entire task will report as a failure. This is expected Ansible behavior and this API adheres to that. (In reply to Alfredo Deza from comment #13) > Martin: not sure why this is blocking 1319833. Like I mentioned in comment > #8 and #10: I just noticed that this BZ was moved into "Red Hat Storage Console" product, and interpreted this as an acknowledgement of the point you mention, that the issue actually is in the RHSC rather then ceph-installer itself. But based on your comment #13, it seems that I may misunderstood the meaning of ceph-installer component of RHSC. If this is the case, I'm sorry for the confusion and feel free to revert link to BZ 1319833 back to "see also" state. From USM integration point view this is an issue. Suppose you passed 50 host to API call and one fails means the request itself fail? I dont think this is the right behaviour. Also from usm should show the user what is failed what is not. Based on current output from task, there is no way to figure out this information. So that is reason we are sending each installation request as a separate request. What is blocking you to create separate tasks for each of these and run it parallel? (In reply to Martin Bukatovic from comment #14) > (In reply to Alfredo Deza from comment #13) > > Martin: not sure why this is blocking 1319833. Like I mentioned in comment > > #8 and #10: > > I just noticed that this BZ was moved into "Red Hat Storage Console" product, > and interpreted this as an acknowledgement of the point you mention, that the > issue actually is in the RHSC rather then ceph-installer itself. But based on > your comment #13, it seems that I may misunderstood the meaning of > ceph-installer component of RHSC. If this is the case, I'm sorry for the > confusion and feel free to revert link to BZ 1319833 back to "see also" > state. The move to the RH Storage Console product simply means that we are now trying to track all our installer bugs in the RH Storage Console product. This aligns with the fact that the ceph-installer RPM and its dependencies will ship in the RH Storage Console product, not the RH Ceph Product. It's confusing to have "ceph-installer" BZ components in two products, and it's my understanding that we will disable the "ceph-installer" sub-component in the RH Ceph Storage product soon, so we need to be tracking all ceph-installer bugs here instead. (In reply to Nishanth Thomas from comment #15) > From USM integration point view this is an issue. Suppose you passed 50 host > to API call and one fails means the request itself fail? I dont think this > is the right behaviour. That is not the right behavior for USM's domain logic. It is entirely valid for Ansible. This is why it is crucial to determine what behavior is needed by the client and not the installer. I do understand that having 50 individual requests would be a problem. But that wouldn't be solved by an increment in the number of workers for the installer. For example, if we increased that number to, say, 8 workers, it would mean that the client would see five servers at a time which would still take very long to complete. > Also from usm should show the user what is failed > what is not. 
---

(In reply to Martin Bukatovic from comment #14)
> I just noticed that this BZ was moved into "Red Hat Storage Console" product,
> and interpreted this as an acknowledgement of the point you mention, that the
> issue actually is in the RHSC rather then ceph-installer itself.

The move to the RH Storage Console product simply means that we are now trying to track all our installer bugs in the RH Storage Console product. This aligns with the fact that the ceph-installer RPM and its dependencies will ship in the RH Storage Console product, not the RH Ceph Storage product. It's confusing to have "ceph-installer" BZ components in two products, and it's my understanding that we will disable the "ceph-installer" sub-component in the RH Ceph Storage product soon, so we need to track all ceph-installer bugs here instead.

---
Comment 17 (Alfredo Deza)

(In reply to Nishanth Thomas from comment #15)
> From USM integration point view this is an issue. Suppose you passed 50 host
> to API call and one fails means the request itself fail? I dont think this
> is the right behaviour.

That is not the right behavior for USM's domain logic. It is entirely valid for Ansible. This is why it is crucial to determine what behavior is needed by the client, not the installer.

I do understand that having 50 individual requests would be a problem. But that wouldn't be solved by increasing the number of workers for the installer. For example, if we increased that number to, say, 8 workers, the client would see at most eight servers at a time, which would still take very long to complete.

> Also from usm should show the user what is failed
> what is not. Based on current output from task, there is no way to figure
> out this information. So that is reason we are sending each installation
> request as a separate request.

This looks more like the problem we should solve in the installer: "if N hosts are used for a task and it fails, report back what host(s) failed".

> What is blocking you to create separate tasks
> for each of these and run it parallel?

This is tricky because it means we would now need to create distinct queues (as opposed to a single one): one for installs and another for configurations. Having two queues is not that complex, but it would require a good amount of effort to configure correctly.

Once those queues are correctly set up and separated, we would need to come up with an increased number of workers to handle the volume of requests. The common approach here is one worker per CPU/core, which will probably be 8. Even that number wouldn't help much in the case of 50 requests.

The added caveat is that the machine's load (using a worker per core) could get high enough to cause severe usage issues. Since the console is installed on the same host, this would no doubt have repercussions for it. Is that a risk that is OK to take?
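On the "report back what host(s) failed" idea: the raw material already exists in the captured output, because Ansible ends every run with a per-host PLAY RECAP. A rough sketch of recovering failed hosts from that recap (the recap format is standard Ansible; the parsing helper itself is hypothetical, not something the installer does today):

```python
import re

# Matches one host line of an Ansible PLAY RECAP, e.g.:
#   osd7    : ok=3    changed=1    unreachable=0    failed=1
RECAP_LINE = re.compile(
    r"^(?P<host>\S+)\s*:\s*ok=\d+\s+changed=\d+\s+"
    r"unreachable=(?P<unreachable>\d+)\s+failed=(?P<failed>\d+)"
)

def failed_hosts(stdout: str) -> list:
    """Return hosts whose recap shows failed or unreachable counts."""
    failed, in_recap = [], False
    for line in stdout.splitlines():
        if line.startswith("PLAY RECAP"):
            in_recap = True
            continue
        if in_recap:
            m = RECAP_LINE.match(line.strip())
            if m and (int(m["failed"]) or int(m["unreachable"])):
                failed.append(m["host"])
    return failed
```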
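And the worker-per-core option from the same comment, as a variation of the earlier toy model: N consumers on the queue raise throughput, but each busy worker is a concurrent ansible-playbook run on the installer host, which is exactly the load concern raised above. A sketch:

```python
import os
import queue
import threading
import time

tasks = queue.Queue()

def worker(n):
    while True:
        host = tasks.get()
        print(f"worker {n}: installing on {host}")
        time.sleep(1)  # each of these would really be an ansible-playbook run
        tasks.task_done()

# One worker per core, as suggested above (probably 8 on these machines).
for n in range(os.cpu_count() or 1):
    threading.Thread(target=worker, args=(n,), daemon=True).start()

for i in range(50):
    tasks.put(f"osd{i}")
tasks.join()  # roughly ceil(50 / workers) rounds instead of 50 sequential runs
```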
---

(In reply to Alfredo Deza from comment #17)
> I do understand that having 50 individual requests would be a problem. But
> that wouldn't be solved by increasing the number of workers for the
> installer.

But still, that would be a huge improvement compared to what we have today.

> This looks more like the problem we should solve in the installer: "if N
> hosts are used for a task and it fails, report back what host(s) failed"

How easy would it be for you to do this? A list of failed nodes and a list of successful nodes.

> The added caveat is that the machine's load (using a worker per core)
> could get high enough to cause severe usage issues. Since the console is
> installed on the same host, this would no doubt have repercussions for it.
> Is that a risk that is OK to take?

I think we need to do some benchmarking before we decide on this. From the USM standpoint it is a good feature to explore and implement, as we have better control if we run installation tasks separately.

---

I added a priority and severity as an experiment to see if those carry over when moving this from Console to Ceph. Looks like a successful transition.