Description of problem:

Our CI spotted the issue [1], which may also happen in a customer environment. The environment was in HEALTH_WARN state because PGs were scrubbing, so the actual PG state was active+clean+scrubbing+, which is fine for consistency checking. Running

# ceph osd set noout
# ceph osd set noscrub
# ceph osd set nodeep-scrub

before the upgrade would have helped, as stated in [4]; however, these commands were not run during the upgrade. The upstream commit [3] resolves the issue, but it requires a backport because ceph-ansible-3.2.43-1.el7cp.noarch doesn't include it.

[1] http://cougar11.scl.lab.tlv.redhat.com/DFG-upgrades-ffu-ffu-upgrade-10-13_director-rhel-virthost-3cont_2comp_3ceph-ipv6-vxlan-HA/36/undercloud-0.tar.gz?undercloud-0/var/log/mistral/ceph-install-workflow.log
[2] https://access.redhat.com/solutions/3362431
[3] https://github.com/ceph/ceph-ansible/commit/b91d60d38456f9e316bee3daeb2f72dda0315cae

How reproducible:

Steps to Reproduce:
1. Install OSP10.
2. Start the upgrade to OSP13.
3. Perform scrubbing in the middle of the upgrade.

Actual results:
Ceph upgrade failed

Expected results:
Ceph upgrade completes successfully

Additional info:
ceph-ansible-3.2.43-1.el7cp.noarch
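For reference, a minimal sketch of the full manual workaround sequence around the upgrade (the commands are standard Ceph CLI; the exact ordering and the verification step are my assumptions, not taken from any playbook):

Before starting the upgrade, prevent OSDs from being marked out and pause (deep-)scrubbing:
# ceph osd set noout
# ceph osd set noscrub
# ceph osd set nodeep-scrub
# ceph -s
(verify the flags are listed and the PGs settle to active+clean)

Then run the OSP10 -> OSP13 / ceph-ansible rolling upgrade.

After the upgrade finishes, remove the flags so normal scrubbing resumes:
# ceph osd unset noout
# ceph osd unset noscrub
# ceph osd unset nodeep-scrub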
I see this as a request to backport the following commit to ceph-ansible 3.x: https://github.com/ceph/ceph-ansible/commit/b91d60d38456f9e316bee3daeb2f72dda0315cae
The backport is done: https://github.com/ceph/ceph-ansible/pull/5425. It doesn't have a tag yet.
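Until a tagged build is available, one rough way to check whether an installed ceph-ansible already carries the backported change (the file path and the grep pattern are assumptions based on the nature of the upstream commit, not verified against a specific build):

# rpm -q ceph-ansible
# grep -n scrub /usr/share/ceph-ansible/infrastructure-playbooks/rolling_update.yml
(if the rolling_update playbook contains no scrub-related handling, the build predates the fix)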
Created attachment 1698946 [details] ceph-install-workflow.log
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage 3.3 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:3504