Bug 1354586

Summary: CRUSH cluster map contains 2 independent cluster hierarchies
Product: [Red Hat Storage] Red Hat Storage Console Reporter: Martin Bukatovic <mbukatov>
Component: CephAssignee: Nishanth Thomas <nthomas>
Ceph sub component: configuration QA Contact: sds-qe-bugs
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: unspecified CC: ltrilety, mbukatov, mkudlej, nthomas, shtripat, vsarmila
Version: 2   
Target Milestone: ---   
Target Release: 2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: rhscon-core-0.0.34-1.el7scon.x86_64 rhscon-ceph-0.0.33-1.el7scon.x86_64 rhscon-ui-0.0.47-1.el7scon.noarch Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-08-23 19:56:35 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1344195    

Description Martin Bukatovic 2016-07-11 15:59:16 UTC
Description of problem
======================

When a ceph cluster is created via RHSC 2.0, two independent cluster
hierarchies can be found in its CRUSH cluster map.

Version-Release
===============

On RHSC 2.0 server:

rhscon-ceph-0.0.27-1.el7scon.x86_64
rhscon-core-0.0.28-1.el7scon.x86_64
rhscon-core-selinux-0.0.28-1.el7scon.noarch
rhscon-ui-0.0.42-1.el7scon.noarch
ceph-ansible-1.0.5-23.el7scon.noarch
ceph-installer-1.0.12-3.el7scon.noarch

On Ceph Storage nodes:

rhscon-agent-0.0.13-1.el7scon.noarch
ceph-osd-10.2.2-5.el7cp.x86_64

How reproducible
================

100 %

Steps to Reproduce
==================

1. Install RHSC 2.0 following the documentation.
2. Accept a few nodes for the ceph cluster.
3. Create new ceph cluster named 'alpha'.
4. Check the CRUSH cluster map.

Actual results
==============

There are 2 independent cluster hierarchies in the cluster map:

~~~
# ceph -c /etc/ceph/alpha.conf osd tree
ID  WEIGHT  TYPE NAME                                                UP/DOWN REWEIGHT PRIMARY-AFFINITY
-10 0.03998 root general
 -6 0.00999     host mbukatov-usm1-node2.os1.phx2.redhat.com-general
  1 0.00999         osd.1                                                 up  1.00000          1.00000
 -7 0.00999     host mbukatov-usm1-node3.os1.phx2.redhat.com-general
  2 0.00999         osd.2                                                 up  1.00000          1.00000
 -8 0.00999     host mbukatov-usm1-node1.os1.phx2.redhat.com-general
  0 0.00999         osd.0                                                 up  1.00000          1.00000
 -9 0.00999     host mbukatov-usm1-node4.os1.phx2.redhat.com-general
  3 0.00999         osd.3                                                 up  1.00000          1.00000
 -1       0 root default
 -2       0     host mbukatov-usm1-node1
 -3       0     host mbukatov-usm1-node2
 -4       0     host mbukatov-usm1-node3
 -5       0     host mbukatov-usm1-node4
~~~

In this case:

 * 1st hierarchy has root with ID -10 (named "general")
 * 2nd hierarchy has root with ID -1 (named "default")

Expected results
================

There is only a single cluster hierarchy in the cluster map, so that the output
would look something like this:

~~~
# ceph -c /etc/ceph/alpha.conf osd tree
ID  WEIGHT  TYPE NAME                                                UP/DOWN REWEIGHT PRIMARY-AFFINITY
-10 0.03998 root general
 -6 0.00999     host mbukatov-usm1-node2.os1.phx2.redhat.com-general
  1 0.00999         osd.1                                                 up  1.00000          1.00000
 -7 0.00999     host mbukatov-usm1-node3.os1.phx2.redhat.com-general
  2 0.00999         osd.2                                                 up  1.00000          1.00000
 -8 0.00999     host mbukatov-usm1-node1.os1.phx2.redhat.com-general
  0 0.00999         osd.0                                                 up  1.00000          1.00000
 -9 0.00999     host mbukatov-usm1-node4.os1.phx2.redhat.com-general
  3 0.00999         osd.3                                                 up  1.00000          1.00000
~~~

Additional info
===============

For details about the CRUSH map, see:

http://docs.ceph.com/docs/master/rados/operations/crush-map/
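
One generic way to inspect the full CRUSH map is to extract and decompile it
(a sketch only; the temporary file names below are arbitrary):

~~~
# extract the compiled CRUSH map from the cluster
ceph -c /etc/ceph/alpha.conf osd getcrushmap -o /tmp/crushmap.bin

# decompile it into readable text
crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt

# both roots ("general" and "default") show up as separate root buckets
grep -A 3 '^root ' /tmp/crushmap.txt
~~~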

Comment 1 Martin Bukatovic 2016-07-11 16:06:33 UTC
I have a few additional questions here:

1) Why do we have 2 cluster hierarchies in the crush map? Is it intentional or
just a remnant of some action during cluster setup? 

2) Why does each hierarchy use different bucket naming scheme for hosts?

3) Which component created each hierarchy?

4) Why does only one hierarchy have OSDs attached while the other doesn't?

That said, based on my current understanding of the CRUSH cluster map, I don't
think it makes sense to have 2 hierarchies in the cluster map like that.

Without a clear purpose, RHSC 2.0 shouldn't create such a complicated and
error-prone configuration.

Comment 2 Nishanth Thomas 2016-07-13 10:21:21 UTC
(In reply to Martin Bukatovic from comment #1)
> I have a few additional questions here:
> 
> 1) Why do we have 2 cluster hierarchies in the crush map? Is it intentional
> or
> just a remnant of some action during cluster setup? 

You will see as many hierarchies as there are valid storage profiles applicable to the cluster.

> 
> 2) Why does each hierarchy use different bucket naming scheme for hosts?
> 

As mentioned above, the naming is based on the storage profile.

> 3) Which component created each hierarchy?

It will be created by the Ceph provider after the cluster is created.

> 
> 4) Why does only one hierarchy have OSDs attached while the other doesn't?

This was due to the earlier implementation, where the default tree is ignored. Also, Calamari won't allow an OSD to be present in two different hierarchies; hence the original tree will be empty if you move all the OSDs from the default tree to the other hierarchies.
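
(For illustration only; this is not the exact call the Ceph provider makes and
the bucket names are made up. Relocating an OSD between roots can be done with
something like the command below, after which the OSD no longer appears under
"default" because it can occupy only one position in the CRUSH map.)

~~~
ceph osd crush create-or-move osd.0 0.00999 root=general host=node1-general
~~~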

> 
> That said, based on my current understanding of the CRUSH cluster map, I don't
> think it makes sense to have 2 hierarchies in the cluster map like that.
> 
> Without a clear purpose, RHSC 2.0 shouldn't create such a complicated and
> error-prone configuration.

This is implemented based on the requirement that pools be created on top of the specified OSDs, grouped by storage profile.


As part of this patch, the issue of ignoring the default hierarchy is fixed. It is still possible to see a blank default hierarchy if the user moves all the OSDs out of default.
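
(For context, confining a pool to one storage-profile hierarchy in Jewel roughly
amounts to the steps below; this is only an illustrative sketch with made-up
names, not the exact calls the console issues.)

~~~
# create a replicated ruleset whose placement starts at the "general" root
ceph osd crush rule create-simple general_rule general host

# create a pool that uses this ruleset, so its data lands only on the OSDs
# under the "general" hierarchy
ceph osd pool create general_pool 128 128 replicated general_rule
~~~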

Comment 4 Lubos Trilety 2016-07-26 15:11:59 UTC
Tested on:
rhscon-core-0.0.36-1.el7scon.x86_64
rhscon-ui-0.0.50-1.el7scon.noarch
rhscon-ceph-0.0.36-1.el7scon.x86_64
rhscon-core-selinux-0.0.36-1.el7scon.noarch

I have several active storage profiles
# ceph osd tree
ID  WEIGHT  TYPE NAME                     UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-11 0.00999 root test                                                       
-10 0.00999     host dhcp-126-101-test                                      
  0 0.00999         osd.0                      up  1.00000          1.00000 
 -9 0.02998 root ec_test                                                    
 -6 0.00999     host dhcp-126-103-ec_test                                   
  4 0.00999         osd.4                      up  1.00000          1.00000 
 -7 0.00999     host dhcp-126-102-ec_test                                   
  3 0.00999         osd.3                      up  1.00000          1.00000 
 -8 0.00999     host dhcp-126-105-ec_test                                   
  6 0.00999         osd.6                      up  1.00000          1.00000 
 -1 0.03998 root default                                                    
 -2 0.00999     host dhcp-126-101                                           
  1 0.00999         osd.1                      up  1.00000          1.00000 
 -3 0.00999     host dhcp-126-102                                           
  2 0.00999         osd.2                      up  1.00000          1.00000 
 -4 0.00999     host dhcp-126-103                                           
  5 0.00999         osd.5                      up  1.00000          1.00000 
 -5 0.00999     host dhcp-126-105                                           
  7 0.00999         osd.7                      up  1.00000          1.00000

It works as it should: pools can be created, data can be stored, etc., from the GUI and from the CLI too.
However, because of the multiple hierarchies, Ceph statistics are not correct. In some places it seems that Ceph counts all OSDs as available for a pool.
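
(One way to cross-check which OSDs a pool really maps to versus the cluster-wide
stats; generic commands, and the pool name here is hypothetical.)

~~~
# which CRUSH ruleset does the pool use? (Jewel still calls it crush_ruleset)
ceph osd pool get test_pool crush_ruleset

# inspect the rulesets to see which root each one draws OSDs from
ceph osd crush rule dump

# compare with the cluster-wide view, which appears to count every OSD
ceph df
~~~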

Comment 5 Martin Bukatovic 2016-07-27 08:46:21 UTC
(In reply to Nishanth Thomas from comment #2)
> (In reply to Martin Bukatovic from comment #1)
> > 1) Why do we have 2 cluster hierarchies in the crush map? Is it intentional
> > or
> > just a remnant of some action during cluster setup? 
> 
> You will see as many hierarchies as there are valid storage profiles
> applicable to the cluster.

Ok, we will assume this is intended RHSC 2.0 design.

(In reply to Lubos Trilety from comment #4)
> Tested on:
> rhscon-core-0.0.36-1.el7scon.x86_64
> rhscon-ui-0.0.50-1.el7scon.noarch
> rhscon-ceph-0.0.36-1.el7scon.x86_64
> rhscon-core-selinux-0.0.36-1.el7scon.noarch
> 
> I have several active storage profiles
> # ceph osd tree
> ID  WEIGHT  TYPE NAME                     UP/DOWN REWEIGHT PRIMARY-AFFINITY 
> -11 0.00999 root test                                                       
> -10 0.00999     host dhcp-126-101-test                                      
>   0 0.00999         osd.0                      up  1.00000          1.00000 
>  -9 0.02998 root ec_test                                                    
>  -6 0.00999     host dhcp-126-103-ec_test                                   
>   4 0.00999         osd.4                      up  1.00000          1.00000 
>  -7 0.00999     host dhcp-126-102-ec_test                                   
>   3 0.00999         osd.3                      up  1.00000          1.00000 
>  -8 0.00999     host dhcp-126-105-ec_test                                   
>   6 0.00999         osd.6                      up  1.00000          1.00000 
>  -1 0.03998 root default                                                    
>  -2 0.00999     host dhcp-126-101                                           
>   1 0.00999         osd.1                      up  1.00000          1.00000 
>  -3 0.00999     host dhcp-126-102                                           
>   2 0.00999         osd.2                      up  1.00000          1.00000 
>  -4 0.00999     host dhcp-126-103                                           
>   5 0.00999         osd.5                      up  1.00000          1.00000 
>  -5 0.00999     host dhcp-126-105                                           
>   7 0.00999         osd.7                      up  1.00000          1.00000
> 
> It works as it should: pools can be created, data can be stored, etc., from
> the GUI and from the CLI too.
> However, because of the multiple hierarchies, Ceph statistics are not correct.
> In some places it seems that Ceph counts all OSDs as available for a pool.

Since Nishanth stated that the current design maintains a dedicated cluster
hierarchy for each storage profile, I would consider this behaviour correct,
and so I think it's OK to validate this BZ.

That said, personally I'm not completely sure the current design of the storage
profiles feature makes sense. But that would be a starting point for another
discussion (and another BZ).

Comment 6 Martin Bukatovic 2016-07-27 08:47:14 UTC
Based on the information provided in comment 4 and comment 5, moving to VERIFIED.

Comment 8 errata-xmlrpc 2016-08-23 19:56:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2016:1754