Bug 1548026 - dedicated monitor node scale up or monitor replacement causes stack update to fail and takes monitors out of quorum
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z2
Target Release: 13.0 (Queens)
Assignee: John Fulton
QA Contact: Yogev Rabl
URL:
Whiteboard:
Duplicates: 1553676
Depends On:
Blocks: 1600202
TreeView+ depends on / blocked
 
Reported: 2018-02-22 14:48 UTC by Yogev Rabl
Modified: 2022-03-13 14:47 UTC (History)
CC List: 17 users

Fixed In Version: openstack-tripleo-common-8.6.3-4.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1553676 1613847 (view as bug list)
Environment:
Last Closed: 2018-08-29 16:34:51 UTC
Target Upstream Version:
Embargoed:


Attachments
ceph-install-workflow.log from the dedicated monitor nodes (2.51 MB, text/plain), attached 2018-02-22 19:34 UTC by Yogev Rabl


Links
Launchpad 1769769 (last updated 2018-05-07 23:35:47 UTC)
OpenStack gerrit 567782: master: MERGED tripleo-common: Persist ceph-ansible fetch_directory using mistral (I4b576a6e7fbfb18fa13221e2d080bf7876a8303e) (last updated 2018-07-19 01:33:49 UTC)
OpenStack gerrit 583229: stable/queens: MERGED tripleo-common: Persist ceph-ansible fetch_directory using mistral (I4b576a6e7fbfb18fa13221e2d080bf7876a8303e) (last updated 2018-07-19 01:33:43 UTC)
Red Hat Issue Tracker OSP-13568 (last updated 2022-03-13 14:47:40 UTC)
Red Hat Product Errata RHBA-2018:2574 (last updated 2018-08-29 16:35:49 UTC)

Description Yogev Rabl 2018-02-22 14:48:28 UTC
Description of problem:
A scale-up of the number of dedicated Ceph monitor nodes failed. After a successful deployment of a Ceph cluster as part of the overcloud with a dedicated monitor node, the cluster lost its OSD count and failed to deploy one of the additional Ceph monitors.

The state of the cluster after the scale-up:
# ceph -s
  cluster:
    id:     6ed05b60-1655-11e8-99ef-525400a0203f
    health: HEALTH_WARN
            1/3 mons down, quorum monitor-1,monitor-2

  services:
    mon: 3 daemons, quorum monitor-1,monitor-2, out of quorum: monitor-0
    mgr: no daemons active
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:

# ceph osd tree
ID CLASS WEIGHT TYPE NAME    STATUS REWEIGHT PRI-AFF
-1            0 root default

[heat-admin@ceph-0 ~]$ sudo -i
[root@ceph-0 ~]# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda    252:0    0   20G  0 disk
├─vda1 252:1    0    1M  0 part
└─vda2 252:2    0   20G  0 part /
vdb    252:16   0   40G  0 disk
├─vdb1 252:17   0 39.5G  0 part
└─vdb2 252:18   0  512M  0 part
vdc    252:32   0   40G  0 disk
├─vdc1 252:33   0 39.5G  0 part
└─vdc2 252:34   0  512M  0 part

[root@ceph-0 ~]# docker ps
CONTAINER ID        IMAGE                                                     COMMAND             CREATED             STATUS              PORTS               NAMES
70157ddf180a        192.168.24.1:8787/rhosp13/openstack-cron:2018-02-14.1     "kolla_start"       17 hours ago        Up 17 hours                             logrotate_crond
6b259bb37f59        registry.access.redhat.com/rhceph/rhceph-3-rhel7:latest   "/entrypoint.sh"    20 hours ago        Up 20 hours                             ceph-osd-ceph-0-vdc
ec073eb81645        registry.access.redhat.com/rhceph/rhceph-3-rhel7:latest   "/entrypoint.sh"    20 hours ago        Up 20 hours                             ceph-osd-ceph-0-vdb


Version-Release number of selected component (if applicable):
ceph-ansible-3.0.25-1.el7cp.noarch

How reproducible:
unknown

Steps to Reproduce:
1. Deploy an overcloud with 3 controller nodes, 1 dedicated monitor node, 1 compute node and 3 Ceph storage nodes (each with 2 OSDs)
2. Update the overcloud by adding 2 additional dedicated Ceph monitor nodes (a hedged sketch of the kind of commands involved is shown below)
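
For illustration, the kind of deploy and scale-up invocation involved looks roughly like the following. This is a hypothetical sketch: the role name, template paths and parameter names are assumptions based on standard TripleO conventions, not the exact commands used in this environment.

# Hypothetical sketch only; role name (CephMon), file paths and counts are
# assumptions. Node counts follow the <RoleName>Count convention.
cat > ~/node-counts.yaml <<'EOF'
parameter_defaults:
  ControllerCount: 3
  ComputeCount: 1
  CephStorageCount: 3
  CephMonCount: 1        # bumped to 3 for the scale-up stack update
EOF

openstack overcloud deploy --templates \
  -r ~/roles_data.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
  -e ~/node-counts.yaml

The scale-up itself is the same deploy command re-run with the monitor count increased, which is what triggers the stack update described in this bug.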


Actual results:
The stack update failed and left the Ceph cluster degraded: one monitor is out of quorum and the cluster no longer reports any OSDs.

Expected results:
The scale-up completes successfully and the new monitors join the existing quorum.

Additional info:

Comment 1 Giulio Fidente 2018-02-22 16:06:07 UTC
This seems to be an issue with ceph-ansible; can you attach the ceph-install-workflow.log file?

Can you also paste/link to the specific deploy and scaleup commands used?
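
For reference, with TripleO driving ceph-ansible through Mistral that log normally ends up on the undercloud; the path below is the usual OSP 13 default and is stated here as an assumption about this environment:

# on the undercloud node
less /var/log/mistral/ceph-install-workflow.log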

Comment 2 Yogev Rabl 2018-02-22 19:34:48 UTC
Created attachment 1399508 [details]
ceph-install-workflow.log from the dedicated monitor nodes

Please look at the last installation runs in the log.

Comment 3 Giulio Fidente 2018-02-23 09:57:17 UTC
Thanks, this looks like an issue with the container restarts.

Comment 7 Sébastien Han 2018-03-28 17:17:35 UTC
Yogev, please give us access to the env or tell us why the ceph-mgr is not running.
You don't see the OSDs/Pools/PGs because the ceph-mgr is not started.

When it comes to the ceph-mon scale issue, unfortunately, the logs are not enough so we need access to the env.

Thanks in advance.

Comment 8 Yogev Rabl 2018-03-28 17:34:10 UTC
(In reply to leseb from comment #7)
> Yogev, please give us access to the env or tell us why the ceph-mgr is not
> running.
> You don't see the OSDs/Pools/PGs because the ceph-mgr is not started.
> 
> When it comes to the ceph-mon scale issue, unfortunately, the logs are not
> enough so we need access to the env.
> 
> Thanks in advance.

There's an environment ready for you to test.

Comment 9 Sébastien Han 2018-03-28 17:50:37 UTC
Thanks Yogev, I'm presently looking into this.

Comment 10 Sébastien Han 2018-03-28 20:54:57 UTC
After some investigation, it appears that ooo purges the fetch_directory at the end of the play; see https://github.com/openstack/tripleo-common/blob/master/workbooks/ceph-ansible.yaml#L157-L159. This directory is critical because it records what is already present in the cluster and what is not.

In a containerized scenario, we rely on the content of this directory to deploy each monitor. When you deploy 3 monitors in a single play this works, since the fetch_directory exists until the end of the play. In a scale-up scenario the fetch_directory no longer exists, so during their bootstrap the new monitors believe they are brand new ones (no keys were copied).
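
To make that concrete, the failure mode amounts to roughly the following sequence (a sketch with placeholder paths, not the literal workbook content):

# initial deploy: ceph-ansible writes the cluster fsid and bootstrap keys
# into the fetch_directory handed to the playbook
ansible-playbook ... -e fetch_directory=/tmp/ceph_fetch ...

# the tripleo-common workbook then removes that directory at the end of the run
rm -rf /tmp/ceph_fetch

# scale-up run: the fetch_directory is empty again, so the newly added mons
# generate fresh keys and try to bootstrap a new cluster instead of joining
# the existing quorum
ansible-playbook ... -e fetch_directory=/tmp/ceph_fetch ...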

John Fulton and Yogev are working on removing the purge and validating whether this is the culprit.

FYI: there are ideas about removing the need for the fetch_directory, but they are just ideas at the moment. It has never been explicitly stated that this directory is optional.

Comment 11 John Fulton 2018-03-28 21:31:26 UTC
I wanted to share some additional details from when Seb looked into this with us. As per the reproduction steps: 

1. Deploy an overcloud with 3 controller nodes, 1 dedicated monitor node, 1 compute node and 3 Ceph storage nodes (each with 2 osds) 

During step 1 above there was one mon in the ansible inventory:

mons:
  hosts:
    192.168.24.16: {}

2. Update the overcloud by adding 2 additional dedicated Ceph monitor nodes

During this step the other two mons, .10 and .20, were brought up and added to the ceph-ansible inventory:

mons:
  hosts:
    192.168.24.10: {}
    192.168.24.16: {}
    192.168.24.20: {}

The ceph.conf that was generated during the run contained the following:

mon initial members = monitor-1,monitor-2,monitor-0
mon host = 172.17.3.18,172.17.3.14,172.17.3.12

The 172.17 addresses map one-to-one to the 192.168 addresses and simply refer to the ceph-storage network. 

The root cause seems to be the order of the monitors as indicated by mon_initial_members. What should have happened is that ceph-ansible went back to the original monitor first. As per Seb's comment #10, the state of the deployment is preserved in the fetch directory.

TripleO currently creates and destroys this directory on every ceph-ansible run:

https://github.com/openstack/tripleo-common/blob/master/workbooks/ceph-ansible.yaml#L157-L159

If this ends up being the culprit, then the above workbook may need some integration with the undercloud Swift: export the fetch directory to Swift after the first run of the playbook and import it from Swift before each subsequent run, so that this state information is preserved.
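
To make the idea concrete, here is a minimal sketch of such an export/import using the swift client against the undercloud; the container name and paths are illustrative assumptions only:

# after the first successful ceph-ansible run: save the fetch directory in
# undercloud swift
tar -czf ceph_fetch.tar.gz -C /tmp/ceph_fetch .
swift upload ceph-ansible-fetch ceph_fetch.tar.gz

# before any later run (scale-up, monitor replacement): restore it so the
# existing cluster state is visible to ceph-ansible
swift download ceph-ansible-fetch ceph_fetch.tar.gz
mkdir -p /tmp/ceph_fetch
tar -xzf ceph_fetch.tar.gz -C /tmp/ceph_fetch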

Next step: modify the workbook to hard-code that directory.

NeedInfo Yogev: can you please set this test up again and ping me before you run step 1 of the reproduction steps so that I can modify the workbook to not delete the fetch directory in your env?

Comment 15 John Fulton 2018-03-30 00:07:09 UTC
Preserving the fetch directory during the initial deployment and then doing the monitor scale-up with the existing fetch directory worked; I didn't hit the issues reported.

The order of the monitors in ceph.conf doesn't seem any different from what we saw during the initial investigation of the issue:

[root@monitor-0 ~]# grep mon /etc/ceph/ceph.conf 
mon host = 172.17.3.12,172.17.3.18,172.17.3.15
mon initial members = monitor-1,monitor-0,monitor-2
[root@monitor-0 ~]# 

However, the monitors are in quorum [1] and the stack update succeeded [2]. 

Next step: modify the workflow to preserve the fetch directory. 

[1]
[root@monitor-0 ~]# ceph -s
  cluster:
    id:     b667b35e-3353-11e8-8fad-525400a45353
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum monitor-1,monitor-2,monitor-0
    mgr: monitor-0(active), standbys: monitor-2, monitor-1
    osd: 5 osds: 5 up, 5 in
 
  data:
    pools:   6 pools, 192 pgs
    objects: 0 objects, 0 bytes
    usage:   541 MB used, 99243 MB / 99784 MB avail
    pgs:     192 active+clean
 
[root@monitor-0 ~]#

[2]

2018-03-29 22:12:12Z [AllNodesDeploySteps]: UPDATE_COMPLETE  state changed
2018-03-29 22:12:18Z [overcloud]: UPDATE_COMPLETE  Stack UPDATE completed successfully

 Stack overcloud UPDATE_COMPLETE 

Started Mistral Workflow tripleo.deployment.v1.get_horizon_url. Execution ID: 08d1c10e-b8fc-4bd6-95fd-82a7fd89df92
Overcloud Endpoint: http://10.0.0.108:5000/
Overcloud Horizon Dashboard URL: http://10.0.0.108:80/dashboard
Overcloud rc file: /home/stack/overcloudrc
Overcloud Deployed
(undercloud) [stack@undercloud-0 ~]$

Comment 17 Giulio Fidente 2018-03-30 10:58:13 UTC
*** Bug 1553676 has been marked as a duplicate of this bug. ***

Comment 20 John Fulton 2018-07-12 14:55:25 UTC
*** Bug 1600202 has been marked as a duplicate of this bug. ***

Comment 21 John Fulton 2018-07-12 14:59:59 UTC
As per bug 1600202, this also affects monitor replacement.

Comment 23 John Fulton 2018-07-17 13:21:52 UTC
Fix merged in the master branch [1]; the backport to queens is in review [2].

[1] https://review.openstack.org/#/c/567782
[2] https://review.openstack.org/#/c/583229

Comment 24 John Fulton 2018-07-18 12:11:39 UTC
The queens backport https://review.openstack.org/#/c/583229 has merged.
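
To check whether a given undercloud already carries the fix, comparing the installed package with the Fixed In Version noted above should be enough (assuming the package is installed as an RPM):

rpm -q openstack-tripleo-common
# a fixed system should report openstack-tripleo-common-8.6.3-4.el7ost or later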

Comment 34 Joanne O'Flynn 2018-08-15 07:39:39 UTC
This bug is marked for inclusion in the errata but does not currently contain draft documentation text. To ensure the timely release of this advisory please provide draft documentation text for this bug as soon as possible.

If you do not think this bug requires errata documentation, set the requires_doc_text flag to "-".


To add draft documentation text:

* Select the documentation type from the "Doc Type" drop down field.

* A template will be provided in the "Doc Text" field based on the "Doc Type" value selected. Enter draft text in the "Doc Text" field.

Comment 35 Gal Amado 2018-08-19 07:55:05 UTC
Verified with:
core_puddle=2018-08-16.1
ceph-ansible-3.1.0-0.1.rc10.el7cp.noarch

Deployed 3 controllers, 2 computes, 3 Ceph storage nodes and 1 dedicated monitor node.
After scaling up to 2 dedicated monitors:

ceph -s
  cluster:
    id:     71492142-a2af-11e8-929f-525400fee3e1
    health: HEALTH_OK
 
  services:
    mon: 2 daemons, quorum monitor-1,monitor-0
    mgr: monitor-0(active), standbys: monitor-1
    osd: 15 osds: 15 up, 15 in
 
  data:
    pools:   5 pools, 160 pgs
    objects: 5588 objects, 301 MB
    usage:   2198 MB used, 155 GB / 157 GB avail
    pgs:     160 active+clean
 
[heat-admin@monitor-0 ~]$ ceph osd tree
ID CLASS WEIGHT  TYPE NAME       STATUS REWEIGHT PRI-AFF 
-1       0.15289 root default                            
-3       0.05096     host ceph-0                         
 0   hdd 0.01019         osd.0       up  1.00000 1.00000 
 4   hdd 0.01019         osd.4       up  1.00000 1.00000 
 7   hdd 0.01019         osd.7       up  1.00000 1.00000 
 9   hdd 0.01019         osd.9       up  1.00000 1.00000 
11   hdd 0.01019         osd.11      up  1.00000 1.00000 
-7       0.05096     host ceph-1                         
 1   hdd 0.01019         osd.1       up  1.00000 1.00000 
 3   hdd 0.01019         osd.3       up  1.00000 1.00000 
 6   hdd 0.01019         osd.6       up  1.00000 1.00000 
13   hdd 0.01019         osd.13      up  1.00000 1.00000 
14   hdd 0.01019         osd.14      up  1.00000 1.00000 
-5       0.05096     host ceph-2                         
 2   hdd 0.01019         osd.2       up  1.00000 1.00000 
 5   hdd 0.01019         osd.5       up  1.00000 1.00000 
 8   hdd 0.01019         osd.8       up  1.00000 1.00000 
10   hdd 0.01019         osd.10      up  1.00000 1.00000 
12   hdd 0.01019         osd.12      up  1.00000 1.00000 



sudo -i
[root@ceph-0 ~]# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda    252:0    0   10G  0 disk 
├─vda1 252:1    0    1M  0 part 
└─vda2 252:2    0   10G  0 part /
vdb    252:16   0   11G  0 disk 
├─vdb1 252:17   0 10.5G  0 part 
└─vdb2 252:18   0  512M  0 part 
vdc    252:32   0   11G  0 disk 
├─vdc1 252:33   0 10.5G  0 part 
└─vdc2 252:34   0  512M  0 part 
vdd    252:48   0   11G  0 disk 
├─vdd1 252:49   0 10.5G  0 part 
└─vdd2 252:50   0  512M  0 part 
vde    252:64   0   11G  0 disk 
├─vde1 252:65   0 10.5G  0 part 
└─vde2 252:66   0  512M  0 part 
vdf    252:80   0   11G  0 disk 
├─vdf1 252:81   0 10.5G  0 part 
└─vdf2 252:82   0  512M  0 part 
[root@ceph-0 ~]# docker ps
CONTAINER ID        IMAGE                                                   COMMAND             CREATED             STATUS              PORTS               NAMES
a8efd64506e3        192.168.24.1:8787/rhosp13/openstack-cron:2018-08-16.1   "kolla_start"       21 hours ago        Up 21 hours                             logrotate_crond
f4f31ac96980        192.168.24.1:8787/rhceph:3-11                           "/entrypoint.sh"    23 hours ago        Up 23 hours                             ceph-osd-ceph-0-vdf
8c5ed83c9000        192.168.24.1:8787/rhceph:3-11                           "/entrypoint.sh"    23 hours ago        Up 23 hours                             ceph-osd-ceph-0-vde
92b0c6da5f78        192.168.24.1:8787/rhceph:3-11                           "/entrypoint.sh"    23 hours ago        Up 23 hours                             ceph-osd-ceph-0-vdd
678ad98caaf3        192.168.24.1:8787/rhceph:3-11                           "/entrypoint.sh"    23 hours ago        Up 23 hours                             ceph-osd-ceph-0-vdc
90d6ceed9e78        192.168.24.1:8787/rhceph:3-11                           "/entrypoint.sh"    23 hours ago        Up 23 hours                             ceph-osd-ceph-0-vdb

Comment 37 errata-xmlrpc 2018-08-29 16:34:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2574

