Bug 1622688 - ceph-ansible should drop dependency on state information in the fetch directory
Summary: ceph-ansible should drop dependency on state information in the fetch directory
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: rc
Target Release: 4.0
Assignee: Guillaume Abrioux
QA Contact: Yogev Rabl
URL:
Whiteboard:
Depends On:
Blocks: 1624388 1769719 1809602
 
Reported: 2018-08-27 18:19 UTC by John Fulton
Modified: 2020-03-03 13:16 UTC
CC List: 17 users

Fixed In Version: ceph-ansible-4.0.7-1.el8cp, ceph-ansible-4.0.7-1.el7cp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-31 12:44:52 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph-ansible pull 3481 0 None closed facts: clean code 2020-09-30 18:26:53 UTC
Github ceph ceph-ansible pull 4875 0 None closed handler: fix bug 2020-09-30 18:26:52 UTC
Github ceph ceph-ansible pull 4904 0 None closed mon: support replacing a mon 2020-09-30 18:26:52 UTC
Red Hat Product Errata RHBA-2020:0312 0 None None None 2020-01-31 12:45:28 UTC

Description John Fulton 2018-08-27 18:19:50 UTC
To replace or scale Ceph monitors, the ceph-ansible fetch_directory [1] from the original Ceph cluster deployment must exist and be referenced by ceph-ansible during the run that scales up or replaces a monitor. If it does not exist, you will encounter bugs like bz 1600202 and bz 1548026.

It would be better if ceph-ansible could retrieve its state information from the existing Ceph deployment and act accordingly, without needing to derive that state from the fetch_directory. This is a request for ceph-ansible to handle monitor scale-up or replacement without a fetch_directory.
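
For illustration only, a rough sketch of the kind of state retrieval being requested, using plain ceph CLI commands against a live monitor. This assumes an admin keyring on the node where the commands run; the /tmp paths are placeholders, and this is not ceph-ansible's actual implementation:

ceph fsid                                            # cluster fsid that would otherwise be read from the fetch directory
ceph auth get mon. -o /tmp/ceph.mon.keyring          # monitor keyring needed to bootstrap a new mon
ceph mon getmap -o /tmp/monmap                       # current monmap for the new mon to join
ceph config generate-minimal-conf > /tmp/ceph.conf   # minimal ceph.conf (Nautilus and later) instead of the fetched copy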

OSPd (TripleO) works around this bug by backing up the fetch_directory in Swift and restoring it, when necessary, before running subsequent ceph-ansible operations; implementation details are linked from bz 1548026 and bz 1613847. However, a future version of TripleO plans to remove the Swift service from the undercloud, and this bug would be a blocker for that.
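
For reference, that workaround boils down to swift CLI calls along these lines (the container name matches the snippet in comment 16 below; "ceph_ansible_fetch_dir" is a placeholder path, not the exact TripleO implementation):

source ~/stackrc
# back up the fetch directory after a deployment (run from the directory containing it)
swift upload overcloud_ceph_ansible_fetch_dir ceph_ansible_fetch_dir/
# restore it into the current directory before a later ceph-ansible run, e.g. a monitor replacement
swift download overcloud_ceph_ansible_fetch_dir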


[1] https://github.com/ceph/ceph-ansible/blob/b7b8aba47bcf92d5c972dea37677205fa5f7b4a4/roles/ceph-config/tasks/main.yml#L30

Comment 11 Giridhar Ramaraju 2019-08-05 13:11:01 UTC
Updating the QA Contact to Hemant. Hemant will be rerouting them to the appropriate QE Associate.

Regards,
Giri

Comment 12 Giridhar Ramaraju 2019-08-05 13:12:02 UTC
Updating the QA Contact to Hemant. Hemant will be rerouting them to the appropriate QE Associate.

Regards,
Giri

Comment 16 John Fulton 2019-10-10 13:36:56 UTC
How to test this:

- Deploy OSP16 with the ceph-ansible build from "Fixed In Version", with 3 mons on the controllers
- Remove the fetch directory backup [1]
- Follow the procedure to replace a controller node [2] (this should include removing an existing mon and adding a new mon)
- Verify that you did not encounter the symptoms of bug 1600202 (if you did not, the test passes and I'll remove fetch directory management [3] from upstream OpenStack TripleO U and T)

[1] 
source ~/stackrc
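# remove every object from the Swift container where TripleO backs up the ceph-ansible fetch directory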
for F in $(swift list overcloud_ceph_ansible_fetch_dir); do
  swift delete overcloud_ceph_ansible_fetch_dir "$F"
done

[2] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/15/html-single/director_installation_and_usage/index#replacing-controller-nodes

[3] https://github.com/openstack/tripleo-ansible/tree/master/tripleo_ansible/roles/tripleo-ceph-fetch-dir

Comment 23 Yaniv Kaul 2020-01-08 13:39:38 UTC
Any updates? Is this going to be fixed for RHCS 4.0?

Comment 30 Yogev Rabl 2020-01-29 17:44:50 UTC
verified

Comment 33 errata-xmlrpc 2020-01-31 12:44:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0312

