Bug 1848134

Summary: [Upgrades] OSP10 -> OSP13 ceph-ansible doesn't perform rolling_update after switch-to-containers because the cluster is left with NOUP flag set
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Sergii Golovatiuk <sgolovat>
Component: Ceph-AnsibleAssignee: Guillaume Abrioux <gabrioux>
Status: CLOSED ERRATA QA Contact: Vasishta <vashastr>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.3CC: anharris, aschoen, ceph-eng-bugs, ceph-qe-bugs, dsavinea, fpantano, gcharot, gfidente, gmeno, jfrancoa, johfulto, jpretori, mbollo, morazi, nthomas, pgrist, tchandra, tserlin, ykaul
Target Milestone: z6   
Target Release: 3.3   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: RHEL: ceph-ansible-3.2.45-1.el7cp Ubuntu: ceph-ansible_3.2.45-2redhat1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-08-18 18:05:58 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1578730    
Attachments:
Description Flags
ceph-install-workflow.log none

Description Sergii Golovatiuk 2020-06-17 18:44:42 UTC
Description of problem:

Our CI spotted the issue [1], which may also happen in a customer's environment. Our environment was in HEALTH_WARN state: PGs were being scrubbed, so the actual state was active+clean+scrubbing+, which is perfectly normal for a consistency check. Running

# ceph osd set noout
# ceph osd set noscrub
# ceph osd set nodeep-scrub

before the upgrade helped, as stated in [2]. However, these commands were not run during the upgrade. [3] resolves the issue, but it requires a backport because ceph-ansible-3.2.43-1.el7cp.noarch doesn't include it.
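The workaround above can be sketched as a small wrapper script. This is a minimal sketch, not part of the bug report or of the ceph-ansible fix: the flag names (`noout`, `noscrub`, `nodeep-scrub`, `noup`) are real ceph OSD flags, but the script structure and the guard for hosts without the `ceph` CLI are illustrative assumptions.

```shell
# Sketch: set the recommended OSD flags before the upgrade, clear them after.
FLAGS="noout noscrub nodeep-scrub"

if command -v ceph >/dev/null 2>&1; then
    # Prevent OSDs from being marked out and pause (deep) scrubbing
    # for the duration of the rolling upgrade.
    for f in $FLAGS; do
        ceph osd set "$f"
    done

    # ... run the OSP10 -> OSP13 ceph-ansible rolling upgrade here ...

    for f in $FLAGS; do
        ceph osd unset "$f"
    done

    # If an interrupted run left noup set (the symptom in this bug), clear it:
    ceph osd unset noup
else
    echo "ceph CLI not found; skipping flag changes"
fi
```

A cluster hit by this bug can be recovered manually the same way: `ceph osd unset noup`, then re-run the upgrade.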


[1] http://cougar11.scl.lab.tlv.redhat.com/DFG-upgrades-ffu-ffu-upgrade-10-13_director-rhel-virthost-3cont_2comp_3ceph-ipv6-vxlan-HA/36/undercloud-0.tar.gz?undercloud-0/var/log/mistral/ceph-install-workflow.log
[2] https://access.redhat.com/solutions/3362431
[3] https://github.com/ceph/ceph-ansible/commit/b91d60d38456f9e316bee3daeb2f72dda0315cae


How reproducible:


Steps to Reproduce:
1. Install OSP10, then start the upgrade to OSP13. Trigger scrubbing in the middle of the upgrade.

Actual results:
The Ceph upgrade fails.

Expected results:
The Ceph upgrade completes successfully.


Additional info:
ceph-ansible-3.2.43-1.el7cp.noarch

Comment 1 John Fulton 2020-06-17 19:57:14 UTC
I see this as a request to backport the following to ceph-ansible 3.x

 https://github.com/ceph/ceph-ansible/commit/b91d60d38456f9e316bee3daeb2f72dda0315cae

Comment 3 John Fulton 2020-06-18 14:24:33 UTC
The backport is done (https://github.com/ceph/ceph-ansible/pull/5425), but it doesn't have a tag yet.

Comment 12 Giulio Fidente 2020-06-26 17:27:17 UTC
Created attachment 1698946 [details]
ceph-install-workflow.log

Comment 31 errata-xmlrpc 2020-08-18 18:05:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 3.3 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3504