Bug 1589114 - [UPGRADE] After 12 (jewel) -> 13 (luminous) upgrade all Ceph MDS services are active
Summary: [UPGRADE] After 12 (jewel) -> 13 (luminous) upgrade all Ceph MDS services are active
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 3.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: z2
Target Release: 3.3
Assignee: Guillaume Abrioux
QA Contact: Vasishta
URL:
Whiteboard:
Depends On:
Blocks: 1578730
 
Reported: 2018-06-08 12:32 UTC by Yogev Rabl
Modified: 2020-01-27 05:15 UTC
CC List: 16 users

Fixed In Version: RHEL: ceph-ansible-3.1.0-0.1.rc7.el7cp Ubuntu: ceph-ansible_3.1.0~rc7-2redhat1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-27 05:15:31 UTC
Embargoed:


Attachments


Links
Github ceph/ceph-ansible pull 2719 (closed): Backport of 2695 in stable-3.1 (last updated 2020-01-27 05:13:54 UTC)

Description Yogev Rabl 2018-06-08 12:32:29 UTC
Description of problem:
The result of an upgrade with MDS services running on the controller nodes is that ceph-ansible, due to its limitations, sets all of the MDS services as active instead of 1 active and 2 standby.

After the upgrade the Ceph cluster status is:
$ sudo ceph -s
  cluster:
    id:     c18db176-6a52-11e8-818d-525400af03d6
    health: HEALTH_WARN
            insufficient standby MDS daemons available
            clock skew detected on mon.controller-0

  services:
    mon: 3 daemons, quorum controller-1,controller-2,controller-0
    mgr: controller-2(active), standbys: controller-0, controller-1
    mds: cephfs-3/3/3 up  {0=controller-0=up:active,1=controller-2=up:active,2=controller-1=up:active}
    osd: 5 osds: 5 up, 5 in

  data:
    pools:   8 pools, 288 pgs
    objects: 57 objects, 192 kB
    usage:   549 MB used, 99235 MB / 99784 MB avail
    pgs:     288 active+clean
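
Possible manual workaround (a sketch only, assuming the file system is named cephfs as in the output above, and luminous-era behaviour where extra active ranks have to be deactivated explicitly):

$ sudo ceph fs set cephfs max_mds 1      # cap the file system at a single active rank
$ sudo ceph mds deactivate cephfs:2      # stop the highest rank first; its daemon returns as a standby
$ sudo ceph mds deactivate cephfs:1      # wait for rank 2 to finish stopping before deactivating rank 1
$ sudo ceph -s                           # expect: mds: cephfs-1/1/1 up {0=...=up:active}, 2 up:standby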


Version-Release number of selected component (if applicable):
ceph-ansible-3.1.0-0.1.rc3.el7cp.noarch
puppet-tripleo-8.3.2-6.el7ost.noarch
openstack-tripleo-puppet-elements-8.0.0-2.el7ost.noarch
openstack-tripleo-common-8.6.1-19.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP 12 overcloud with MDS services
2. Upgrade to OSP 13 and Ceph 3.x
3. Validate the status of the Ceph cluster

Actual results:
As shown in the description, the Ceph cluster status is HEALTH_WARN with 3 active MDS services

Expected results:
The default overcloud configuration for MDS services, with 1 active and 2 standby MDS daemons
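
A quick check for the expected layout (assuming the default file system name cephfs) would be:

$ sudo ceph fs get cephfs | grep max_mds   # expect: max_mds 1
$ sudo ceph mds stat                       # expect: cephfs-1/1/1 up {0=<node>=up:active}, 2 up:standby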

Additional info:

Comment 2 John Fulton 2018-06-11 17:48:00 UTC
More details on this at:

 https://bugzilla.redhat.com/show_bug.cgi?id=1415236#c9

Comment 9 Giridhar Ramaraju 2019-08-05 13:06:34 UTC
Updating the QA Contact to Hemant. Hemant will be rerouting them to the appropriate QE Associate.

Regards,
Giri

Comment 10 Giridhar Ramaraju 2019-08-05 13:09:12 UTC
Updating the QA Contact to Hemant. Hemant will be rerouting them to the appropriate QE Associate.

Regards,
Giri

Comment 11 Yogev Rabl 2019-10-02 15:08:32 UTC
(In reply to Giridhar Ramaraju from comment #10)
> Updating the QA Contact to Hemant. Hemant will be rerouting them to the
> appropriate QE Associate.
> 
> Regards,
> Giri

Validation from the TripleO side is blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1757570.

The Ceph QE can validate this bug by upgrading Ceph 2.5 -> Ceph 3.1 with MDS running.
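
For reference, a rough sketch of such a validation run with ceph-ansible (install path and inventory location are assumptions; the documented RHCS upgrade procedure is authoritative):

$ cd /usr/share/ceph-ansible
$ cp infrastructure-playbooks/rolling_update.yml .   # the playbook is run from the top-level ceph-ansible directory
$ ansible-playbook rolling_update.yml -e ireallymeanit=yes -i /etc/ansible/hosts
# then, on a monitor node, expect 1 active MDS, 2 standbys and no MDS-related health warning:
$ sudo ceph -s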

