Bug 1848134 - [Upgrades] OSP10 -> OSP13 ceph-ansible doesn't perform rolling_update after switch-to-containers because the cluster is left with NOUP flag set
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 3.3
Hardware: All
OS: All
Priority: unspecified
Severity: high
Target Milestone: z6
Target Release: 3.3
Assignee: Guillaume Abrioux
QA Contact: Vasishta
URL:
Whiteboard:
Depends On:
Blocks: 1578730
 
Reported: 2020-06-17 18:44 UTC by Sergii Golovatiuk
Modified: 2020-08-18 18:06 UTC
CC List: 19 users

Fixed In Version: RHEL: ceph-ansible-3.2.45-1.el7cp Ubuntu: ceph-ansible_3.2.45-2redhat1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-18 18:05:58 UTC
Embargoed:


Attachments
ceph-install-workflow.log (3.30 MB, text/plain)
2020-06-26 17:27 UTC, Giulio Fidente


Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph-ansible pull 5425 0 None closed switch_to_containers: don't set noup flag 2021-02-10 03:02:05 UTC
Github ceph ceph-ansible pull 5460 0 None closed switch-to-containers: set and unset osd flags 2021-02-10 03:02:06 UTC
Red Hat Product Errata RHSA-2020:3504 0 None None None 2020-08-18 18:06:29 UTC

Description Sergii Golovatiuk 2020-06-17 18:44:42 UTC
Description of problem:

Our CI spotted an issue [1] that may happen in customer environments as well. The environment was in HEALTH_WARN state: PGs were scrubbing, so the actual state was active+clean+scrubbing+, which is fine since it only checks consistency. Running

# ceph osd set noout
# ceph osd set noscrub
# ceph osd set nodeep-scrub

before the upgrade helps, as stated in [4]. However, these commands were not run during the upgrade. [3] resolves the issue, but it requires a backport because ceph-ansible-3.2.43-1.el7cp.noarch does not include it.
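For completeness, the matching flags need to be cleared again once the upgrade finishes. A minimal sketch using the standard Ceph CLI, assuming the flags were set manually as above:

# ceph osd unset noout
# ceph osd unset noscrub
# ceph osd unset nodeep-scrub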


[1] http://cougar11.scl.lab.tlv.redhat.com/DFG-upgrades-ffu-ffu-upgrade-10-13_director-rhel-virthost-3cont_2comp_3ceph-ipv6-vxlan-HA/36/undercloud-0.tar.gz?undercloud-0/var/log/mistral/ceph-install-workflow.log
[2] https://access.redhat.com/solutions/3362431
[3] https://github.com/ceph/ceph-ansible/commit/b91d60d38456f9e316bee3daeb2f72dda0315cae


How reproducible:


Steps to Reproduce:
1. Install OSP10, start the upgrade to OSP13, and perform scrubbing in the middle (a sketch of how to trigger it follows).
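A deep scrub can be forced manually to reproduce the scrubbing state at the right moment; a sketch using the standard Ceph CLI, where the PG id 1.0 is only an illustration (pick any PG id reported by ceph pg dump):

# ceph pg dump pgs_brief | head
# ceph pg deep-scrub 1.0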

Actual results:
Ceph upgrade failed

Expected results:
Ceph upgrade succeeds


Additional info:
ceph-ansible-3.2.43-1.el7cp.noarch
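
If a cluster is already stuck with the leftover noup flag, it can be checked and cleared manually before re-running rolling_update; this is a workaround sketch using the standard Ceph CLI, not the ceph-ansible fix itself:

# ceph osd dump | grep flags
# ceph health detail
# ceph osd unset noup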

Comment 1 John Fulton 2020-06-17 19:57:14 UTC
I see this as a request to backport the following to ceph-ansible 3.x

 https://github.com/ceph/ceph-ansible/commit/b91d60d38456f9e316bee3daeb2f72dda0315cae

Comment 3 John Fulton 2020-06-18 14:24:33 UTC
The backport is done: https://github.com/ceph/ceph-ansible/pull/5425. It doesn't have a tag yet.

Comment 12 Giulio Fidente 2020-06-26 17:27:17 UTC
Created attachment 1698946 [details]
ceph-install-workflow.log

Comment 31 errata-xmlrpc 2020-08-18 18:05:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 3.3 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3504

