
Bug 1589114

Summary: [UPGRADE] After 12 (jewel) -> 13 (luminous) upgrade all Ceph MDS services are active
Product: [Red Hat Storage] Red Hat Ceph Storage
Component: Ceph-Ansible
Version: 3.1
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Yogev Rabl <yrabl>
Assignee: Guillaume Abrioux <gabrioux>
QA Contact: Vasishta <vashastr>
CC: aschoen, ceph-eng-bugs, gabrioux, gfidente, gmeno, hnallurv, jdurgin, johfulto, lhh, mburns, nthomas, shan, srevivo, tchandra, tserlin, yrabl
Target Milestone: z2
Target Release: 3.3
Hardware: x86_64
OS: Linux
Fixed In Version: RHEL: ceph-ansible-3.1.0-0.1.rc7.el7cp; Ubuntu: ceph-ansible_3.1.0~rc7-2redhat1
Type: Bug
Last Closed: 2020-01-27 05:15:31 UTC
Bug Blocks: 1578730

Description Yogev Rabl 2018-06-08 12:32:29 UTC
Description of problem:
After an upgrade with MDS services running on the controller nodes, ceph-ansible, due to its limitations, sets all of the MDS daemons to active instead of 1 active and 2 standby.

After the upgrade the Ceph cluster status is:
$ sudo ceph -s
  cluster:
    id:     c18db176-6a52-11e8-818d-525400af03d6
    health: HEALTH_WARN
            insufficient standby MDS daemons available
            clock skew detected on mon.controller-0

  services:
    mon: 3 daemons, quorum controller-1,controller-2,controller-0
    mgr: controller-2(active), standbys: controller-0, controller-1
    mds: cephfs-3/3/3 up  {0=controller-0=up:active,1=controller-2=up:active,2=controller-1=up:active}
    osd: 5 osds: 5 up, 5 in

  data:
    pools:   8 pools, 288 pgs
    objects: 57 objects, 192 kB
    usage:   549 MB used, 99235 MB / 99784 MB avail
    pgs:     288 active+clean
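
A possible manual workaround, as a sketch only (not the ceph-ansible fix tracked here), assuming the filesystem name "cephfs" shown in the status output above:

$ sudo ceph fs get cephfs | grep max_mds   # expected to show max_mds 3, matching the cephfs-3/3/3 line above
$ sudo ceph fs set cephfs max_mds 1        # request a single active rank again
# On luminous, the extra ranks may also need to be deactivated explicitly, e.g.:
$ sudo ceph mds deactivate cephfs:2
$ sudo ceph mds deactivate cephfs:1
# The stopped daemons should rejoin as standbys, clearing the
# "insufficient standby MDS daemons available" warning.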


Version-Release number of selected component (if applicable):
ceph-ansible-3.1.0-0.1.rc3.el7cp.noarch
puppet-tripleo-8.3.2-6.el7ost.noarch
openstack-tripleo-puppet-elements-8.0.0-2.el7ost.noarch
openstack-tripleo-common-8.6.1-19.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP 12 overcloud with MDS services
2. Upgrade to OSP 13 and Ceph 3.x
3. Validate the status of the Ceph cluster (see the example check below)
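
A hedged sketch of the step-3 check (again assuming the filesystem name "cephfs" from the description):

$ sudo ceph -s                             # overall health; should not warn about standby MDS daemons
$ sudo ceph mds stat                       # should report a single active rank, with the other MDS daemons up:standby
$ sudo ceph fs get cephfs | grep max_mds   # should stay at the default of 1 rather than being raised to 3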

Actual results:
As shown in the description, the Ceph cluster status is HEALTH_WARN with 3 active MDS services

Expected results:
The default overcloud configuration for MDS services, with 1 active and 2 standby daemons

Additional info:

Comment 2 John Fulton 2018-06-11 17:48:00 UTC
More details on this at:

 https://bugzilla.redhat.com/show_bug.cgi?id=1415236#c9

Comment 9 Giridhar Ramaraju 2019-08-05 13:06:34 UTC
Updating the QA Contact to Hemant. Hemant will be rerouting them to the appropriate QE Associate.

Regards,
Giri

Comment 10 Giridhar Ramaraju 2019-08-05 13:09:12 UTC
Updating the QA Contact to Hemant. Hemant will be rerouting them to the appropriate QE Associate.

Regards,
Giri

Comment 11 Yogev Rabl 2019-10-02 15:08:32 UTC
(In reply to Giridhar Ramaraju from comment #10)
> Updating the QA Contact to Hemant. Hemant will be rerouting them to the
> appropriate QE Associate. 
> 
> Regards,
> Giri

The validation from TripleO side is blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1757570.

The Ceph QE can validate this bug by upgrading Ceph 2.5 -> Ceph 3.1 with MDS running.