Bug 1319833

Summary: Nodes are processed one by one (non parallel) way during Create Cluster task
Product: Red Hat Storage Console Reporter: Martin Bukatovic <mbukatov>
Component: Ceph Assignee: Shubhendu Tripathi <shtripat>
Ceph sub component: configuration QA Contact: sds-qe-bugs
Status: CLOSED WONTFIX Docs Contact:
Severity: high    
Priority: unspecified CC: nthomas
Version: 2 Keywords: TestBlocker
Target Milestone: ---   
Target Release: 3   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-11-19 05:42:18 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On: 1319856    
Bug Blocks:    

Description Martin Bukatovic 2016-03-21 15:50:26 UTC
Description of problem
======================

When machines are configured during *Create Cluster* task, the process is not
done in parallel. For example: when packages are installed on one machine,
the other machines are waiting and nothing is happening on them.

Without fixing this BZ, it would not be possible to use USM to
create production-sized clusters.

Version-Release number of selected component
============================================

osd machine:
ceph-0.94.5-9.el7cp.x86_64
ceph-common-0.94.5-9.el7cp.x86_64
ceph-osd-0.94.5-9.el7cp.x86_64
rhscon-agent-0.0.3-3.el7.noarch

mon machine:
ceph-0.94.5-9.el7cp.x86_64
ceph-common-0.94.5-9.el7cp.x86_64
ceph-mon-0.94.5-9.el7cp.x86_64
rhscon-agent-0.0.3-3.el7.noarch

usm server machine:
ceph-0.94.5-9.el7cp.x86_64
ceph-ansible-1.0.1-1.20160307gitb354445.el7.noarch
ceph-common-0.94.5-9.el7cp.x86_64
redhat-ceph-installer-0.2.3-1.20160304gitb3e3c68.el7.noarch
rhscon-ceph-0.0.6-14.el7.x86_64
rhscon-core-0.0.8-14.el7.x86_64
rhscon-ui-0.0.23-1.el7.noarch

How reproducible
================

100 %

Steps to Reproduce
==================

1. Prepare machines (following USM documentation) for USM.
   Allocate at least 3 machines for monitor machines (no extra disks), and
   4 machines (each with at least 2 extra disks for Ceph OSDs).
2. In USM web interface, accept all machines.
3. Use the *Create Cluster* wizard to set up a cluster, using all machines
   prepared in step 1.
4. Check what is happening on each machine (ssh there and monitor CPU, disk
   and memory usage, e.g. by running `top`).
5. Wait for the process to finish.

Actual results
==============

The setup takes too long because it runs on only a single machine
at a time.

This may also cause the Create Cluster task to fail with a timeout,
but that is not a concern of this BZ.

Expected results
================

A window of several machines is processed at the same time, and the
admin can configure the size of this window.

We should also evaluate if full parallel installation on all nodes is feasible.
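
The windowed processing requested above could look roughly like the sketch
below. This is only an illustration, not USM or ceph-installer code:
`configure_node`, `configure_cluster` and `window_size` are hypothetical names,
and the per-node work is simulated with a short sleep.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def configure_node(host, seconds=0.1):
    """Hypothetical stand-in for per-node setup (package install etc.)."""
    time.sleep(seconds)  # simulate slow, mostly I/O-bound work
    return host


def configure_cluster(hosts, window_size=4):
    """Configure hosts with at most `window_size` nodes in flight at once."""
    with ThreadPoolExecutor(max_workers=window_size) as pool:
        # map() returns results in the same order as the input hosts
        return list(pool.map(configure_node, hosts))


if __name__ == "__main__":
    hosts = ["node%d" % i for i in range(1, 8)]
    start = time.monotonic()
    done = configure_cluster(hosts, window_size=4)
    elapsed = time.monotonic() - start
    print("configured %d nodes in %.2f s" % (len(done), elapsed))
```

Setting `window_size=1` reproduces the current one-by-one behaviour, while
setting it to the number of hosts corresponds to the fully parallel case.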

Additional info
===============

The actual component concerned with this may be different (devs told me that
this may be a ceph-installer problem as well); devs should properly reevaluate
the component of this BZ during investigation.

Comment 2 Martin Bukatovic 2016-03-21 17:55:23 UTC
Additional information
======================

From the task details page, we can see that it took about 4 minutes to install
packages on a machine, but since all machines were processed one by one, the
whole process takes too long and would not scale to real-world clusters with
hundreds of machines:

~~~
Installing packages     Mar 18 2016, 08:34:14 PM                            
Installed packages on dhcp-126-80.lab.eng.brq.redhat.com:   Mar 18 2016, 08:39:06 PM
Installed packages on dhcp-126-85.lab.eng.brq.redhat.com:   Mar 18 2016, 08:43:10 PM
Installed packages on dhcp-126-81.lab.eng.brq.redhat.com:   Mar 18 2016, 08:46:08 PM
Installed packages on dhcp-126-84.lab.eng.brq.redhat.com:   Mar 18 2016, 08:49:21 PM
Installed packages on dhcp-126-82.lab.eng.brq.redhat.com:   Mar 18 2016, 08:50:09 PM
Installed packages on dhcp-126-79.lab.eng.brq.redhat.com:   Mar 18 2016, 08:53:41 PM
~~~
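
As a rough check of these numbers, the timestamps can be turned into per-node
durations. The sketch below assumes that each node's installation started when
the previous one finished (which is what the serial log suggests). The
"idealised parallel" figure further assumes that concurrent installs would not
slow each other down, which has not been verified.

```python
from datetime import datetime

FMT = "%b %d %Y, %I:%M:%S %p"

# Timestamps copied from the task details above; the first is the task start.
stamps = [
    "Mar 18 2016, 08:34:14 PM",
    "Mar 18 2016, 08:39:06 PM",
    "Mar 18 2016, 08:43:10 PM",
    "Mar 18 2016, 08:46:08 PM",
    "Mar 18 2016, 08:49:21 PM",
    "Mar 18 2016, 08:50:09 PM",
    "Mar 18 2016, 08:53:41 PM",
]

times = [datetime.strptime(s, FMT) for s in stamps]
# Assumption: node N starts when node N-1 finishes.
durations = [(b - a).total_seconds() for a, b in zip(times, times[1:])]

serial_total = sum(durations)  # observed: nodes handled one by one
parallel_est = max(durations)  # idealised: all nodes installed at once

print("per-node seconds:", durations)
print("serial total: %d s, idealised parallel: %d s" % (serial_total, parallel_est))
```

Under these assumptions the serial total grows linearly with the number of
nodes, while the idealised parallel time is bounded by the slowest node.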

Comment 3 Shubhendu Tripathi 2018-11-19 05:42:18 UTC
This product is EOL now