Description of problem:
rolling-update.yml does not set the noout, noscrub and nodeep-scrub flags, because setting them is now part of the post_tasks of the ceph-mgr update play, which is skipped for RHCS 2.x. As a result, the rolling update fails while waiting for the PGs to become active+clean, and OSDs may get marked out if updating packages takes too long.

Version-Release number of selected component (if applicable):
ceph-ansible-3.0.25-1.el7cp.noarch

How reproducible:
Always

Steps to Reproduce:
1. Install ceph-ansible-3.0.25-1.el7cp.noarch
2. Run rolling-update.yml
3. Watch the noout, noscrub and nodeep-scrub flags not being set for the OSD update

Actual results:
The noout, noscrub and nodeep-scrub flags are not set for the OSD update.

Expected results:
The noout, noscrub and nodeep-scrub flags are set for the OSD update.

Additional info:
The setting of the flags should move from the mgr section to the mon section post_tasks, or get a section of its own.

Currently in:
-----------
- name: upgrade ceph mgr node

  vars:
    upgrade_ceph_packages: True

  hosts:
    - "{{ mgr_group_name|default('mgrs') }}"

  serial: 1
  become: True

  pre_tasks:
    # this task has a failed_when: false to handle the scenario where no mgr existed before the upgrade
    - name: stop ceph mgr
      systemd:
        name: ceph-mgr@{{ ansible_hostname }}
        state: stopped
        enabled: yes
      failed_when: false
      when:
        - not containerized_deployment

  roles:
    - ceph-defaults
    - { role: ceph-common, when: not containerized_deployment }
    - { role: ceph-docker-common, when: containerized_deployment }
    - ceph-config
    - { role: ceph-mgr,
        when: "(ceph_release_num[ceph_release] >= ceph_release_num.luminous) or
               (ceph_release_num[ceph_release] < ceph_release_num.luminous and rolling_update)" }

  post_tasks:    #<----------------------
    - name: start ceph mgr
      systemd:
        name: ceph-mgr@{{ ansible_hostname }}
        state: started
        enabled: yes
      when:
        - not containerized_deployment

    - name: restart containerized ceph mgr
      systemd:
        name: ceph-mgr@{{ ansible_hostname }}
        state: restarted
        enabled: yes
        daemon_reload: yes
      when:
        - containerized_deployment

    - name: set osd flags    #<----------------------
      command: ceph --cluster {{ cluster }} osd set {{ item }}
      with_items:
        - noout
        - noscrub
        - nodeep-scrub
      delegate_to: "{{ groups[mon_group_name][0] }}"
      when: not containerized_deployment

    - name: set containerized osd flags
      command: |
        docker exec ceph-mon-{{ hostvars[groups[mon_group_name][0]]['ansible_hostname'] }} ceph --cluster {{ cluster }} osd set {{ item }}
      with_items:
        - noout
        - noscrub
        - nodeep-scrub
      delegate_to: "{{ groups[mon_group_name][0] }}"
      when: containerized_deployment
-----------
Created attachment 1399495
Moving the tasks to the mons post_tasks section resolves this issue.
I'm looking into this. Tomas, do you expect some kind of backport to 2.5 for this?
(In reply to leseb from comment #6)
> I'm looking into this, Tomas do you expect some kind of backport on 2.5 for
> this?

Hi Seb,

as we now have pretty much only one ceph-ansible for both RHCS 2.x and RHCS 3.x, the rolling update could be tuned to fit both versions. Because the "mgrs" section gets skipped for RHCS 2.x, the flags are not set before the "osds" update. The attachment contains a rolling_update.yml that sets the flags in the "mons" post_tasks section, which resolves this issue for RHCS 2.x and has no impact on functionality for RHCS 3.x. Any other solution that sets the noout, noscrub and nodeep-scrub flags before the OSDs are updated, regardless of the Ceph version, would be fine as well.
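
For illustration, the relocation described above might look roughly like the sketch below. This is not the attached file nor the fix that was eventually merged. The play name and the mons host pattern are assumptions made by analogy with the mgr play quoted earlier, and the monitors' own pre_tasks and roles are left out. The two flag tasks are copied unchanged from the mgr post_tasks.

```yaml
# Sketch only: play name and host pattern are assumptions modeled on the
# mgr play quoted in the description; they are not taken from the attachment.
- name: upgrade ceph mon cluster                 # assumed play name
  hosts:
    - "{{ mon_group_name|default('mons') }}"     # assumed, by analogy with mgrs
  serial: 1
  become: True

  # pre_tasks and roles of the monitor play are omitted from this sketch

  post_tasks:
    # Same tasks as in the mgr play, now in a play that is not skipped
    # on RHCS 2.x, so the flags are in place before the osds play starts.
    - name: set osd flags
      command: ceph --cluster {{ cluster }} osd set {{ item }}
      with_items:
        - noout
        - noscrub
        - nodeep-scrub
      delegate_to: "{{ groups[mon_group_name][0] }}"
      when: not containerized_deployment

    - name: set containerized osd flags
      command: |
        docker exec ceph-mon-{{ hostvars[groups[mon_group_name][0]]['ansible_hostname'] }} ceph --cluster {{ cluster }} osd set {{ item }}
      with_items:
        - noout
        - noscrub
        - nodeep-scrub
      delegate_to: "{{ groups[mon_group_name][0] }}"
      when: containerized_deployment
```

With serial: 1 these post_tasks would run once per monitor host. Setting a flag that is already set should be harmless, so the repetition is not expected to matter, but that is an assumption worth checking on a test cluster.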
Sorry, Tomas, for not taking care of this one earlier. I didn't realize you had even sent a patch in one of your comments. Thanks for that. I've now sent a PR to fix this, with a few modifications to your patch. I've also added you as a co-author on the commit. Thanks.
Working fine with ceph-ansible-3.0.39-1.el7cp.noarch and 3.0.36-2redhat1.
Moving to VERIFIED state.

Regards,
Vasishta
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2261