Bug 1354586 - CRUSH cluster map contains 2 independent cluster hierarchies
Summary: CRUSH cluster map contains 2 independent cluster hierarchies
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Storage Console
Classification: Red Hat Storage
Component: Ceph
Version: 2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 2
Assignee: Nishanth Thomas
QA Contact: sds-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks: Console-2-DevFreeze
 
Reported: 2016-07-11 15:59 UTC by Martin Bukatovic
Modified: 2016-08-23 19:56 UTC
CC: 6 users

Fixed In Version: rhscon-core-0.0.34-1.el7scon.x86_64 rhscon-ceph-0.0.33-1.el7scon.x86_64 rhscon-ui-0.0.47-1.el7scon.noarch
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-08-23 19:56:35 UTC
Embargoed:




Links
 * Red Hat Bugzilla 1354603 (unspecified, CLOSED): CRUSH cluster hierarchy gets corrupted so that entire cluster could be stuck in unusable state (last updated 2021-02-22 00:41:40 UTC)
 * Red Hat Product Errata RHEA-2016:1754 (normal, SHIPPED_LIVE): New packages: Red Hat Storage Console 2.0 (last updated 2017-04-18 19:09:06 UTC)

Internal Links: 1354603

Description Martin Bukatovic 2016-07-11 15:59:16 UTC
Description of problem
======================

When a ceph cluster is created via RHSC 2.0, two independent cluster
hierarchies can be found in its CRUSH cluster map.

Version-Release
===============

On RHSC 2.0 server:

rhscon-ceph-0.0.27-1.el7scon.x86_64
rhscon-core-0.0.28-1.el7scon.x86_64
rhscon-core-selinux-0.0.28-1.el7scon.noarch
rhscon-ui-0.0.42-1.el7scon.noarch
ceph-ansible-1.0.5-23.el7scon.noarch
ceph-installer-1.0.12-3.el7scon.noarch

On Ceph Storage nodes:

rhscon-agent-0.0.13-1.el7scon.noarch
ceph-osd-10.2.2-5.el7cp.x86_64

How reproducible
================

100 %

Steps to Reproduce
==================

1. Install RHSC 2.0 following the documentation.
2. Accept a few nodes for the ceph cluster.
3. Create a new ceph cluster named 'alpha'.
4. Check the CRUSH cluster map (one way to do this is sketched below).
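
A sketch of step 4, assuming the cluster is named 'alpha' as in step 3 (so the
configuration file is /etc/ceph/alpha.conf):

~~~
# show the bucket/OSD tree as Ceph sees it
ceph -c /etc/ceph/alpha.conf osd tree

# or dump the binary CRUSH map and decompile it into readable text
ceph -c /etc/ceph/alpha.conf osd getcrushmap -o /tmp/crushmap.bin
crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt

# every 'root' bucket in the decompiled map is the top of one hierarchy
grep -E '^(root|host) ' /tmp/crushmap.txt
~~~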

Actual results
==============

There are 2 independent cluster hierarchies in the cluster map:

~~~
# ceph -c /etc/ceph/alpha.conf osd tree
ID  WEIGHT  TYPE NAME                                                UP/DOWN REWEIGHT PRIMARY-AFFINITY
-10 0.03998 root general
 -6 0.00999     host mbukatov-usm1-node2.os1.phx2.redhat.com-general
  1 0.00999         osd.1                                                 up  1.00000          1.00000
 -7 0.00999     host mbukatov-usm1-node3.os1.phx2.redhat.com-general
  2 0.00999         osd.2                                                 up  1.00000          1.00000
 -8 0.00999     host mbukatov-usm1-node1.os1.phx2.redhat.com-general
  0 0.00999         osd.0                                                 up  1.00000          1.00000
 -9 0.00999     host mbukatov-usm1-node4.os1.phx2.redhat.com-general
  3 0.00999         osd.3                                                 up  1.00000          1.00000
 -1       0 root default
 -2       0     host mbukatov-usm1-node1
 -3       0     host mbukatov-usm1-node2
 -4       0     host mbukatov-usm1-node3
 -5       0     host mbukatov-usm1-node4
~~~

In this case:

 * 1st hierarchy has root with ID -10 (named "general")
 * 2nd hierarchy has root with ID -1 (named "default")

Expected results
================

There is only a single cluster hierarchy in the cluster map, so that the output
would look something like this:

~~~
# ceph -c /etc/ceph/alpha.conf osd tree
ID  WEIGHT  TYPE NAME                                                UP/DOWN REWEIGHT PRIMARY-AFFINITY
-10 0.03998 root general
 -6 0.00999     host mbukatov-usm1-node2.os1.phx2.redhat.com-general
  1 0.00999         osd.1                                                 up  1.00000          1.00000
 -7 0.00999     host mbukatov-usm1-node3.os1.phx2.redhat.com-general
  2 0.00999         osd.2                                                 up  1.00000          1.00000
 -8 0.00999     host mbukatov-usm1-node1.os1.phx2.redhat.com-general
  0 0.00999         osd.0                                                 up  1.00000          1.00000
 -9 0.00999     host mbukatov-usm1-node4.os1.phx2.redhat.com-general
  3 0.00999         osd.3                                                 up  1.00000          1.00000
~~~

Additional info
===============

For details about CRUSH map, see:

http://docs.ceph.com/docs/master/rados/operations/crush-map/

Comment 1 Martin Bukatovic 2016-07-11 16:06:33 UTC
I have a few additional questions here:

1) Why do we have 2 cluster hierarchies in the CRUSH map? Is it intentional or
just a remnant of some action during cluster setup?

2) Why does each hierarchy use a different bucket naming scheme for hosts?

3) Which component created each hierarchy?

4) Why does only one hierarchy have OSDs attached while the other doesn't?

That said, based on my current understanding of the CRUSH cluster map, I don't
think it makes any sense to have 2 hierarchies in the cluster map like that.

Without a clear purpose, RHSC 2.0 shouldn't create such a complicated and
error-prone configuration.

Comment 2 Nishanth Thomas 2016-07-13 10:21:21 UTC
(In reply to Martin Bukatovic from comment #1)
> I have a few additional questions here:
> 
> 1) Why do we have 2 cluster hierarchies in the CRUSH map? Is it intentional or
> just a remnant of some action during cluster setup?

You will see as many hierarchies as there are valid storage profiles applicable to the cluster.

> 
> 2) Why does each hierarchy use a different bucket naming scheme for hosts?
> 

As mentioned above, it is based on the storage profile.

> 3) Which component created each hierarchy?

It will be created by the Ceph provider after the cluster is created.

> 
> 4) Why does only one hierarchy have OSDs attached while the other doesn't?

This was due to the earlier implementation, where the default tree was ignored. Also, Calamari won't allow an OSD to be present in two different hierarchies; hence, the default tree will be empty if you move all the OSDs from default to other hierarchies.

> 
> That said, based on my current understanding of the CRUSH cluster map, I don't
> think it makes any sense to have 2 hierarchies in the cluster map like that.
> 
> Without a clear purpose, RHSC 2.0 shouldn't create such a complicated and
> error-prone configuration.

This is implemented based on the requirement that pools be created on top of specified OSDs grouped by storage profile.
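
For illustration only (a sketch of the underlying ceph CLI steps, not
necessarily what the ceph provider runs internally; the rule and pool names
below are made up), restricting a pool to the OSDs of one storage profile
comes down to a CRUSH rule that starts from that profile's root:

# ceph osd crush rule create-simple general_rule general host
# ceph osd pool create general_pool 64 64 replicated general_rule

The first command creates a replicated rule that takes the 'general' root and
picks hosts under it; the second creates a pool using that rule, so its data
is placed only on OSDs in the 'general' hierarchy.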


As part of this patch, the issue of ignoring the default hierarchy is fixed. It is still possible to see a blank default tree if the user moves all the OSDs out of default.
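
To make the last point concrete, a sketch only (the OSD id, weight and host
bucket name are taken from the example output in the description):

# ceph osd crush set osd.0 0.00999 root=general host=mbukatov-usm1-node1.os1.phx2.redhat.com-general

'ceph osd crush set' places the OSD at the given location, so it no longer
appears under 'default'. Once every OSD has been relocated like this, 'root
default' and its host buckets stay in the map but carry no OSDs, which is the
blank default tree mentioned above.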

Comment 4 Lubos Trilety 2016-07-26 15:11:59 UTC
Tested on:
rhscon-core-0.0.36-1.el7scon.x86_64
rhscon-ui-0.0.50-1.el7scon.noarch
rhscon-ceph-0.0.36-1.el7scon.x86_64
rhscon-core-selinux-0.0.36-1.el7scon.noarch

I have several active storage profiles:
# ceph osd tree
ID  WEIGHT  TYPE NAME                     UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-11 0.00999 root test                                                       
-10 0.00999     host dhcp-126-101-test                                      
  0 0.00999         osd.0                      up  1.00000          1.00000 
 -9 0.02998 root ec_test                                                    
 -6 0.00999     host dhcp-126-103-ec_test                                   
  4 0.00999         osd.4                      up  1.00000          1.00000 
 -7 0.00999     host dhcp-126-102-ec_test                                   
  3 0.00999         osd.3                      up  1.00000          1.00000 
 -8 0.00999     host dhcp-126-105-ec_test                                   
  6 0.00999         osd.6                      up  1.00000          1.00000 
 -1 0.03998 root default                                                    
 -2 0.00999     host dhcp-126-101                                           
  1 0.00999         osd.1                      up  1.00000          1.00000 
 -3 0.00999     host dhcp-126-102                                           
  2 0.00999         osd.2                      up  1.00000          1.00000 
 -4 0.00999     host dhcp-126-103                                           
  5 0.00999         osd.5                      up  1.00000          1.00000 
 -5 0.00999     host dhcp-126-105                                           
  7 0.00999         osd.7                      up  1.00000          1.00000

It works as it should: pools can be created, data can be stored, etc., from the GUI and from the CLI too.
However, because of the multiple hierarchies, Ceph statistics are not correct. In some places it seems that Ceph counts all OSDs as available for a pool.
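
One way to see the mismatch (a sketch; pool names depend on what was created
from the console) is to compare per-pool and per-tree numbers:

# ceph df
# ceph osd df tree

'ceph df' reports MAX AVAIL per pool, while 'ceph osd df tree' shows
utilization grouped by the CRUSH tree; for a pool bound to one storage
profile, MAX AVAIL appears to be derived from all OSDs instead of just that
profile's root, which matches the behaviour described above.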

Comment 5 Martin Bukatovic 2016-07-27 08:46:21 UTC
(In reply to Nishanth Thomas from comment #2)
> (In reply to Martin Bukatovic from comment #1)
> > 1) Why do we have 2 cluster hierarchies in the crush map? Is it intentional
> > or
> > just a remnant of some action during cluster setup? 
> 
> You will see as many hierarchies as there are valid storage profiles
> applicable to the cluster.

Ok, we will assume this is intended RHSC 2.0 design.

(In reply to Lubos Trilety from comment #4)
> Tested on:
> rhscon-core-0.0.36-1.el7scon.x86_64
> rhscon-ui-0.0.50-1.el7scon.noarch
> rhscon-ceph-0.0.36-1.el7scon.x86_64
> rhscon-core-selinux-0.0.36-1.el7scon.noarch
> 
> I have several active storage profiles
> # ceph osd tree
> ID  WEIGHT  TYPE NAME                     UP/DOWN REWEIGHT PRIMARY-AFFINITY 
> -11 0.00999 root test                                                       
> -10 0.00999     host dhcp-126-101-test                                      
>   0 0.00999         osd.0                      up  1.00000          1.00000 
>  -9 0.02998 root ec_test                                                    
>  -6 0.00999     host dhcp-126-103-ec_test                                   
>   4 0.00999         osd.4                      up  1.00000          1.00000 
>  -7 0.00999     host dhcp-126-102-ec_test                                   
>   3 0.00999         osd.3                      up  1.00000          1.00000 
>  -8 0.00999     host dhcp-126-105-ec_test                                   
>   6 0.00999         osd.6                      up  1.00000          1.00000 
>  -1 0.03998 root default                                                    
>  -2 0.00999     host dhcp-126-101                                           
>   1 0.00999         osd.1                      up  1.00000          1.00000 
>  -3 0.00999     host dhcp-126-102                                           
>   2 0.00999         osd.2                      up  1.00000          1.00000 
>  -4 0.00999     host dhcp-126-103                                           
>   5 0.00999         osd.5                      up  1.00000          1.00000 
>  -5 0.00999     host dhcp-126-105                                           
>   7 0.00999         osd.7                      up  1.00000          1.00000
> 
> It works as it should: pools can be created, data can be stored, etc., from
> the GUI and from the CLI too.
> However, because of the multiple hierarchies, Ceph statistics are not
> correct. In some places it seems that Ceph counts all OSDs as available for
> a pool.

Since Nishanth stated that the current design is that a dedicated cluster
hierarchy is maintained for each storage profile, I would consider this
behaviour correct, and so I think it's OK to validate this BZ.

That said, personally I'm not completely sure the current design of the storage
profiles feature makes sense. But that would be a starting point for another
discussion (and another BZ).

Comment 6 Martin Bukatovic 2016-07-27 08:47:14 UTC
Based on the information provided in comment 4 and comment 5, moving to VERIFIED.

Comment 8 errata-xmlrpc 2016-08-23 19:56:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2016:1754

