Description of problem:

Our CI spotted the issue [1], which may also happen in a customer environment. The environment was in HEALTH_WARN state because PGs were scrubbing, so the actual PG state was active+clean+scrubbing+, which is fine for consistency checking. Running

# ceph osd set noout
# ceph osd set noscrub
# ceph osd set nodeep-scrub

before the upgrade would have helped, as stated in [4]; however, these commands were not run during the upgrade. The upstream commit [3] resolves the issue, but it requires a backport because ceph-ansible-3.2.43-1.el7cp.noarch doesn't include it.

[1] http://cougar11.scl.lab.tlv.redhat.com/DFG-upgrades-ffu-ffu-upgrade-10-13_director-rhel-virthost-3cont_2comp_3ceph-ipv6-vxlan-HA/36/undercloud-0.tar.gz?undercloud-0/var/log/mistral/ceph-install-workflow.log
[2] https://access.redhat.com/solutions/3362431
[3] https://github.com/ceph/ceph-ansible/commit/b91d60d38456f9e316bee3daeb2f72dda0315cae

How reproducible:

Steps to Reproduce:
1. Install OSP10.
2. Start the upgrade to OSP13.
3. Perform scrubbing in the middle of the upgrade.

Actual results:
Ceph upgrade failed

Expected results:
Ceph upgrade completes successfully

Additional info:
ceph-ansible-3.2.43-1.el7cp.noarch
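For reference, a minimal sketch of the full manual workaround sequence around the upgrade (the commands are standard Ceph CLI; the exact ordering and the verification step are my assumptions, not taken from any playbook):

Before starting the upgrade, prevent OSDs from being marked out and pause (deep-)scrubbing:
# ceph osd set noout
# ceph osd set noscrub
# ceph osd set nodeep-scrub
# ceph -s
(verify the flags are listed and the PGs settle to active+clean)

Then run the OSP10 -> OSP13 / ceph-ansible rolling upgrade.

After the upgrade finishes, remove the flags so normal scrubbing resumes:
# ceph osd unset noout
# ceph osd unset noscrub
# ceph osd unset nodeep-scrub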
I see this as a request to backport the following commit to ceph-ansible 3.x: https://github.com/ceph/ceph-ansible/commit/b91d60d38456f9e316bee3daeb2f72dda0315cae
The backport is done: https://github.com/ceph/ceph-ansible/pull/5425. It doesn't have a tag yet.
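Until a tagged build is available, one rough way to check whether an installed ceph-ansible already carries the backported change (the file path and the grep pattern are assumptions based on the nature of the upstream commit, not verified against a specific build):

# rpm -q ceph-ansible
# grep -n scrub /usr/share/ceph-ansible/infrastructure-playbooks/rolling_update.yml
(if the rolling_update playbook contains no scrub-related handling, the build predates the fix)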
Created attachment 1698946 [details] ceph-install-workflow.log
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage 3.3 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:3504