1548071 – [CEE/SD][ceph-ansible][RHCS2]rolling-update.yml does not set noout,noscrub and nodeep-scrub flags



Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Ceph Storage
Classification:	Red Hat Storage
Component:	Ceph-Ansible
Sub Component:
Version:	2.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	z1
Target Release:	2.5
Assignee:	Sébastien Han
QA Contact:	Vasishta
Docs Contact:	Aron Gunn
URL:
Whiteboard:
Depends On:
Blocks:	1536401

Reported:	2018-02-22 16:10 UTC by Tomas Petr
Modified:	2021-09-09 13:18 UTC
CC List:	13 users
Fixed In Version:	RHEL: ceph-ansible-3.0.35-1.el7cp Ubuntu: ceph-ansible_3.0.35-2redhat1
Doc Type:	Bug Fix
Doc Text:	.Relocated some OSD options in the `rolling-update.yml` Ceph Ansible playbook Previously, when doing a minor Ceph upgrade, for example, upgrading version 10.2.9 to 10.2.10, the `noout`, `noscrub` and `nodeep-scrub` OSD options did not get applied. Since a Ceph Manager (`ceph-mgr`) daemon does not exist for these versions, the `mgr` section in the `rolling-update.yml` file, which set these options, was skipped. With this release, the OSD options are set properly after all the Ceph Monitors have been upgraded.
Clone Of:
Environment:
Last Closed:	2018-07-26 18:06:41 UTC
Embargoed:
Dependent Products:

Attachments
moving tasks to mons post_task section resolves this issue (17.19 KB, text/x-vhdl) 2018-02-22 18:00 UTC, Tomas Petr

Links
System	ID	Priority	Status	Summary	Last Updated
Github	ceph ceph-ansible pull 2594	None	closed	rolling_update: move osd flag section	2021-02-10 03:03:05 UTC
Red Hat Issue Tracker	RHCEPH-1548	None	None	None	2021-09-09 13:18:53 UTC
Red Hat Knowledge Base (Solution)	3362431	None	None	None	2018-02-23 07:22:21 UTC
Red Hat Product Errata	RHSA-2018:2261	None	None	None	2018-07-26 18:07:49 UTC

Description Tomas Petr 2018-02-22 16:10:30 UTC

Description of problem:
rolling-update.yml does not set the noout, noscrub and nodeep-scrub flags, because the tasks that set them are now part of the post_tasks of the ceph-mgr update play, which is skipped for RHCS 2.x.

As a result, the rolling update fails while waiting for the PGs to become active+clean, and OSDs may get marked out if updating the packages takes too long.

Version-Release number of selected component (if applicable):
ceph-ansible-3.0.25-1.el7cp.noarch

How reproducible:
Always

Steps to Reproduce:
1. install ceph-ansible-3.0.25-1.el7cp.noarch
2. run rolling-update.yml
3. watch that the noout, noscrub and nodeep-scrub flags are not set before the OSDs are updated

Actual results:
The noout, noscrub and nodeep-scrub flags are not set before the OSDs are updated

Expected results:
The noout, noscrub and nodeep-scrub flags are set before the OSDs are updated
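
One way to check the expected result is to read the flags line of `ceph osd dump` from a monitor just before the OSD play starts. The tasks below are a minimal sketch and not part of the shipped playbook: the task names and the `osd_flags` variable are made up for illustration, and `cluster` and `mon_group_name` are assumed to be defined as in the playbook quoted in this bug.

```yaml
# Hypothetical check, not part of rolling-update.yml.
# Reads the "flags ..." line of the OSD map from the first monitor.
- name: read current osd flags
  shell: ceph --cluster {{ cluster }} osd dump | grep '^flags'
  register: osd_flags
  changed_when: false
  delegate_to: "{{ groups[mon_group_name][0] }}"

# Simple substring test against the flags line; fails the run if any
# of the three flags is missing.
- name: fail if an expected flag is missing
  assert:
    that:
      - "item in osd_flags.stdout"
  with_items:
    - noout
    - noscrub
    - nodeep-scrub
```

If `grep` finds no flags line at all, the first task fails with a non-zero exit code, which also signals that the flags were not set.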

Additional info:

Comment 3 Tomas Petr 2018-02-22 16:12:33 UTC

The tasks that set the flags should move from the mgr section to the post_tasks of the mon section, or get a section of their own.

Currently they are in the mgr play:
-----------
- name: upgrade ceph mgr node

  vars:
    upgrade_ceph_packages: True

  hosts:
    - "{{ mgr_group_name|default('mgrs') }}"

  serial: 1
  become: True

  pre_tasks:
    # this task has a failed_when: false to handle the scenario where no mgr existed before the upgrade
    - name: stop ceph mgr
      systemd:
        name: ceph-mgr@{{ ansible_hostname }}
        state: stopped
        enabled: yes
      failed_when: false
      when:
        - not containerized_deployment

  roles:
    - ceph-defaults
    - { role: ceph-common, when: not containerized_deployment }
    - { role: ceph-docker-common, when: containerized_deployment }
    - ceph-config
    - { role: ceph-mgr,
        when: "(ceph_release_num[ceph_release] >= ceph_release_num.luminous) or
               (ceph_release_num[ceph_release] < ceph_release_num.luminous and rolling_update)" }

  post_tasks:   #<----------------------
    - name: start ceph mgr
      systemd:
        name: ceph-mgr@{{ ansible_hostname }}
        state: started
        enabled: yes
      when:
        - not containerized_deployment

    - name: restart containerized ceph mgr
      systemd:
        name: ceph-mgr@{{ ansible_hostname }}
        state: restarted
        enabled: yes
        daemon_reload: yes
      when:
        - containerized_deployment

    - name: set osd flags   #<----------------------
      command: ceph --cluster {{ cluster }} osd set {{ item }}
      with_items:
        - noout
        - noscrub
        - nodeep-scrub
      delegate_to: "{{ groups[mon_group_name][0] }}"
      when: not containerized_deployment

    - name: set containerized osd flags
      command: |
        docker exec ceph-mon-{{ hostvars[groups[mon_group_name][0]]['ansible_hostname'] }} ceph --cluster {{ cluster }} osd set {{ item }}
      with_items:
        - noout
        - noscrub
        - nodeep-scrub
      delegate_to: "{{ groups[mon_group_name][0] }}"
      when: containerized_deployment
-----------
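
The first option, moving the flag tasks into the monitor play, could look roughly like the sketch below. It is an illustration only, not the attached patch or the merged fix: the play name is an assumption, the existing pre_tasks, roles and monitor restart tasks are left out, and the flag task is copied unchanged from the mgr play quoted above (the containerized variant would move the same way).

```yaml
- name: upgrade ceph mon node   # assumed play name

  hosts:
    - "{{ mon_group_name|default('mons') }}"

  serial: 1
  become: True

  # pre_tasks, roles and the existing mon post_tasks are omitted here

  post_tasks:
    # Same task as in the mgr play, now run from the mon play so it is
    # not skipped on releases without ceph-mgr.
    - name: set osd flags
      command: ceph --cluster {{ cluster }} osd set {{ item }}
      with_items:
        - noout
        - noscrub
        - nodeep-scrub
      delegate_to: "{{ groups[mon_group_name][0] }}"
      when: not containerized_deployment
```

With `serial: 1` the task runs once per monitor. Setting a flag that is already set leaves the cluster unchanged, so the repetition should be harmless.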

Comment 4 Tomas Petr 2018-02-22 18:00:17 UTC

Created attachment 1399495
moving tasks to mons post_task section resolves this issue

Comment 6 Sébastien Han 2018-04-18 14:25:01 UTC

I'm looking into this, Tomas do you expect some kind of backport on 2.5 for this?

Comment 7 Tomas Petr 2018-04-19 10:26:48 UTC

(In reply to leseb from comment #6)
> I'm looking into this, Tomas do you expect some kind of backport on 2.5 for
> this?

Hi Seb,
as we now have pretty much only one ceph-ansible for both RHCS 2.x and RHCS 3.x, the rolling update could be tuned to fit both versions.
For RHCS 2.x the "mgrs" section gets skipped, so the flags are not set before the "osds" are updated.
The attachment contains a rolling_update.yml that sets the flags in the "mons" post_task section, which resolves this issue for RHCS 2.x and has no impact on functionality for RHCS 3.x.

Any other solution that sets the noout, noscrub and nodeep-scrub flags before the OSDs are updated, regardless of the Ceph version, would be fine too.
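
A version-independent variant, sketched under the assumption that a separate play is placed between the monitor and OSD plays, might look like this. The play name is hypothetical, the `default()` filters stand in for values normally supplied by ceph-defaults, and only the non-containerized case is shown.

```yaml
# Hypothetical standalone play. It does not depend on the mgr play,
# so it is not skipped on releases without ceph-mgr.
- name: set osd flags before upgrading osds

  hosts:
    - "{{ mon_group_name|default('mons') }}"

  become: True

  tasks:
    - name: set osd flags
      command: ceph --cluster {{ cluster|default('ceph') }} osd set {{ item }}
      with_items:
        - noout
        - noscrub
        - nodeep-scrub
      run_once: true   # the flags are cluster-wide, one monitor is enough
      when: not (containerized_deployment|default(false))
```

Because this play has no `serial` setting, `run_once` executes the task a single time on the first monitor in the group, after the monitor play has finished.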

Comment 8 Sébastien Han 2018-05-16 14:13:51 UTC

Sorry, Tomas, for not taking care of this one earlier. I didn't realize you had even sent a patch in one of your comments. Thanks for that. I've now sent a PR that fixes this with a few modifications to your patch. I've also added you as a co-author of the commit.

Thanks.

Comment 13 Vasishta 2018-07-24 15:33:09 UTC

Working fine with ceph-ansible-3.0.39-1.el7cp.noarch and 3.0.36-2redhat1

Moving to VERIFIED state.

Regards,
Vasishta

Comment 15 errata-xmlrpc 2018-07-26 18:06:41 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2261
