Bug 1548071 - [CEE/SD][ceph-ansible][RHCS2]rolling-update.yml does not set noout,noscrub and nodeep-scrub flags
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Ceph-Ansible
Version: 2.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z1
Target Release: 2.5
Assignee: leseb
QA Contact: Vasishta
Docs Contact: Aron Gunn
URL:
Whiteboard:
Depends On:
Blocks: 1536401
 
Reported: 2018-02-22 16:10 UTC by Tomas Petr
Modified: 2018-07-26 18:07 UTC (History)
CC List: 13 users

Fixed In Version: RHEL: ceph-ansible-3.0.35-1.el7cp Ubuntu: ceph-ansible_3.0.35-2redhat1
Doc Type: Bug Fix
Doc Text:
.Relocated some OSD options in the `rolling-update.yml` Ceph Ansible playbook
Previously, when doing a minor Ceph upgrade, for example, upgrading version 10.2.9 to 10.2.10, the `noout`, `noscrub` and `nodeep-scrub` OSD options did not get applied. Since a Ceph Manager (`ceph-mgr`) daemon does not exist for these versions, the `mgr` section in the `rolling-update.yml` file, which set these options, was skipped. With this release, the OSD options are set properly after all the Ceph Monitors have been upgraded.
Clone Of:
Environment:
Last Closed: 2018-07-26 18:06:41 UTC
Target Upstream Version:


Attachments
moving tasks to mons post_task section resolves this issue (17.19 KB, text/x-vhdl)
2018-02-22 18:00 UTC, Tomas Petr


Links
System                             ID                           Priority  Status  Summary  Last Updated
Red Hat Product Errata             RHSA-2018:2261               None      None    None     2018-07-26 18:07:49 UTC
Github                             ceph ceph-ansible pull 2594  None      None    None     2018-05-16 14:13:51 UTC
Red Hat Knowledge Base (Solution)  3362431                      None      None    None     2018-02-23 07:22:21 UTC

Description Tomas Petr 2018-02-22 16:10:30 UTC
Description of problem:
rolling-update.yml does not set the noout, noscrub and nodeep-scrub flags, because setting them is now part of the post_tasks of the ceph-mgr upgrade play, which is skipped for RHCS 2.x.

As a result, the rolling update fails while waiting for the PGs to become active+clean, and OSDs may get marked out if updating packages takes too long.

Version-Release number of selected component (if applicable):
ceph-ansible-3.0.25-1.el7cp.noarch

How reproducible:
Always

Steps to Reproduce:
1. install ceph-ansible-3.0.25-1.el7cp.noarch
2. run rolling-update.yml
3. observe that the noout, noscrub and nodeep-scrub flags are not set while the OSDs are updated

Actual results:
The noout, noscrub and nodeep-scrub flags are not set while the OSDs are updated.

Expected results:
The noout, noscrub and nodeep-scrub flags are set before the OSDs are updated.
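
One way to tell the actual state from the expected one is to read the OSD map flags right before the OSD upgrade starts. The tasks below are a minimal sketch and not part of ceph-ansible: they assume that `ceph osd dump --format json` reports the active flags as a comma-separated string under a `flags` key, and they reuse the `cluster` and `mon_group_name` variables from the playbook. The task names and the `osd_dump` register variable are hypothetical, and only the non-containerized case is covered.

```yaml
# Sketch only: verify that the flags are set before any OSD is touched.
- name: read osd map flags
  command: ceph --cluster {{ cluster }} osd dump --format json
  register: osd_dump            # hypothetical variable name
  changed_when: false           # read-only query, never reports a change
  delegate_to: "{{ groups[mon_group_name][0] }}"

- name: check that the expected osd flags are set
  assert:
    that:
      # assumption: "flags" is a comma-separated string in the JSON dump
      - "item in (osd_dump.stdout | from_json).flags.split(',')"
    msg: "osd flag {{ item }} is not set"
  with_items:
    - noout
    - noscrub
    - nodeep-scrub
```

With the behaviour reported here, the assert would be expected to fail on an RHCS 2.x cluster, because nothing has set the flags by the time the OSD play starts.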

Additional info:

Comment 3 Tomas Petr 2018-02-22 16:12:33 UTC
The tasks that set the flags should move from the mgr section to the post_tasks of the mon section, or get a section of their own.

Currently in:
-----------
- name: upgrade ceph mgr node

  vars:
    upgrade_ceph_packages: True

  hosts:
    - "{{ mgr_group_name|default('mgrs') }}"

  serial: 1
  become: True

  pre_tasks:
    # this task has a failed_when: false to handle the scenario where no mgr existed before the upgrade
    - name: stop ceph mgr
      systemd:
        name: ceph-mgr@{{ ansible_hostname }}
        state: stopped
        enabled: yes
      failed_when: false
      when:
        - not containerized_deployment

  roles:
    - ceph-defaults
    - { role: ceph-common, when: not containerized_deployment }
    - { role: ceph-docker-common, when: containerized_deployment }
    - ceph-config
    - { role: ceph-mgr,
        when: "(ceph_release_num[ceph_release] >= ceph_release_num.luminous) or
               (ceph_release_num[ceph_release] < ceph_release_num.luminous and rolling_update)" }

  post_tasks:   #<----------------------
    - name: start ceph mgr
      systemd:
        name: ceph-mgr@{{ ansible_hostname }}
        state: started
        enabled: yes
      when:
        - not containerized_deployment

    - name: restart containerized ceph mgr
      systemd:
        name: ceph-mgr@{{ ansible_hostname }}
        state: restarted
        enabled: yes
        daemon_reload: yes
      when:
        - containerized_deployment

    - name: set osd flags   #<----------------------
      command: ceph --cluster {{ cluster }} osd set {{ item }}
      with_items:
        - noout
        - noscrub
        - nodeep-scrub
      delegate_to: "{{ groups[mon_group_name][0] }}"
      when: not containerized_deployment

    - name: set containerized osd flags
      command: |
        docker exec ceph-mon-{{ hostvars[groups[mon_group_name][0]]['ansible_hostname'] }} ceph --cluster {{ cluster }} osd set {{ item }}
      with_items:
        - noout
        - noscrub
        - nodeep-scrub
      delegate_to: "{{ groups[mon_group_name][0] }}"
      when: containerized_deployment
-----------
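
The second option mentioned at the top of this comment, a section of its own, could look roughly like the play below, placed between the mon upgrade play and the OSD upgrade play. It is a sketch under assumptions rather than the merged fix: the play name is made up, and it assumes that the `ceph-defaults` role provides `cluster`, `mon_group_name` and `containerized_deployment`, as it does for the mgr play quoted above. Only the non-containerized variant is shown.

```yaml
# Sketch only: a dedicated play that runs whether or not any mgr hosts exist.
- name: set osd flags before upgrading osds   # hypothetical play name

  hosts:
    - "{{ mon_group_name|default('mons') }}"

  become: True

  roles:
    - ceph-defaults

  tasks:
    - name: set osd flags
      command: ceph --cluster {{ cluster }} osd set {{ item }}
      with_items:
        - noout
        - noscrub
        - nodeep-scrub
      delegate_to: "{{ groups[mon_group_name][0] }}"
      run_once: true   # one invocation is enough; the flags are cluster-wide
      when: not containerized_deployment
```

Targeting the mon group instead of the mgr group is the point of the design: every cluster has monitors, so the play is not skipped on releases that have no ceph-mgr daemon.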

Comment 4 Tomas Petr 2018-02-22 18:00:17 UTC
Created attachment 1399495
moving tasks to mons post_task section resolves this issue

Comment 6 leseb 2018-04-18 14:25:01 UTC
I'm looking into this. Tomas, do you expect some kind of backport to 2.5 for this?

Comment 7 Tomas Petr 2018-04-19 10:26:48 UTC
(In reply to leseb from comment #6)
> I'm looking into this, Tomas do you expect some kind of backport on 2.5 for
> this?

Hi Seb,
as we now have pretty much only one ceph-ansible for both RHCS 2.x and RHCS 3.x, the rolling update could be tuned to fit both versions.
Because the "mgrs" section gets skipped for RHCS 2.x, the flags are not set before the "osds" are updated.
The attachment contains a rolling_update.yml that sets the flags in the "mons" post_tasks section, which resolves this issue for RHCS 2.x and has no impact on functionality for RHCS 3.x.

Any other solution that sets the noout, noscrub and nodeep-scrub flags before the OSDs are updated, regardless of the Ceph version, would be fine too.
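
The approach described in this comment amounts to appending the flag tasks to the `post_tasks` of the play that upgrades the monitors. The fragment below is a sketch built from the tasks quoted in comment 3, not the attachment itself, and the rest of the mon play is omitted. If that play handles one monitor at a time, as the mgr play does with `serial: 1`, the tasks run once per monitor, which is harmless because setting a flag that is already set changes nothing.

```yaml
  # Sketch only: appended to the existing post_tasks of the mon upgrade play.
  post_tasks:
    - name: set osd flags
      command: ceph --cluster {{ cluster }} osd set {{ item }}
      with_items:
        - noout
        - noscrub
        - nodeep-scrub
      delegate_to: "{{ groups[mon_group_name][0] }}"
      when: not containerized_deployment

    - name: set containerized osd flags
      command: |
        docker exec ceph-mon-{{ hostvars[groups[mon_group_name][0]]['ansible_hostname'] }} ceph --cluster {{ cluster }} osd set {{ item }}
      with_items:
        - noout
        - noscrub
        - nodeep-scrub
      delegate_to: "{{ groups[mon_group_name][0] }}"
      when: containerized_deployment
```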

Comment 8 leseb 2018-05-16 14:13:51 UTC
Sorry, Tomas, for not taking care of this one earlier. I didn't realize you had even sent a patch in one of your comments. Thanks for that. I've now sent a PR that fixes this with a few modifications to your patch, and I've added you as a co-author of the commit.

Thanks.

Comment 13 Vasishta 2018-07-24 15:33:09 UTC
Working fine with ceph-ansible-3.0.39-1.el7cp.noarch and 3.0.36-2redhat1

Moving to VERIFIED state.

Regards,
Vasishta

Comment 15 errata-xmlrpc 2018-07-26 18:06:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2261

