Bug 1548357

Summary: [ceph-ansible] [ceph-container] : during rolling update playbook failing trying to restart mgr without copying restart script to mgr
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Vasishta <vashastr>
Component: Ceph-Ansible
Assignee: Sébastien Han <shan>
Status: CLOSED ERRATA
QA Contact: Vasishta <vashastr>
Severity: high
Priority: unspecified
Version: 3.0
CC: adeza, agunn, aschoen, ceph-eng-bugs, gabrioux, gmeno, hnallurv, kdreyer, nthomas, sankarshan, vashastr
Target Milestone: z1
Target Release: 3.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: RHEL: ceph-ansible-3.0.27-1.el7cp Ubuntu: ceph-ansible_3.0.27-2redhat1
Doc Type: Bug Fix
Doc Text:
.A required script is copied when doing a rolling upgrade with Ansible
Previously, when the active Ceph Manager node was not the first node to be upgraded, the `ceph-ansible` rolling update playbook did not copy a required restart script to the Ceph Manager node, causing the rolling update to fail. With this release, the required script is copied to the Ceph Manager node and the rolling update completes as expected.
Story Points: ---
Last Closed: 2018-03-08 15:54:03 UTC
Type: Bug
Attachments: File contains contents of inventory file, ansible-playbook log

Description Vasishta 2018-02-23 10:03:36 UTC
Created attachment 1399797 [details]
File contains contents of inventory file, ansible-playbook log

Description of problem:
During a rolling update, the playbook fails in the restart handler: it tries to restart the mgr daemon without first copying the restart script to the mgr node.

Based on observation, the issue occurs when the first node the playbook processes is not the active mgr.

Version-Release number of selected component (if applicable):
ceph-ansible-3.0.26-1.el7cp.noarch

How reproducible:
Always (2/2)

Steps to Reproduce:
1. Configure containerized cluster of (latest-1) version 
2. Upgrade to latest version running rolling_update

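The reproduction steps above can be sketched as the following commands. This is a hedged sketch only: the install path, inventory location, and the `ireallymeanit` confirmation variable follow common ceph-ansible conventions and are not taken from the attached log, so adjust them to the local environment.

```shell
# Sketch of the upgrade run (paths assumed; adjust to your environment).
cd /usr/share/ceph-ansible

# After pointing ceph_docker_image_tag in group_vars at the new container
# image, run the rolling update playbook against the cluster inventory.
ansible-playbook -i /etc/ansible/hosts \
    infrastructure-playbooks/rolling_update.yml \
    -e ireallymeanit=yes
```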

Actual results:
RUNNING HANDLER [ceph-defaults : restart ceph mgr daemon(s) - container] **
failed: [magna035 -> magna043] (item=magna043) => {"changed": false, "cmd": "/tmp/restart_mgr_daemon.sh", "item": "magna043", "msg": "[Errno 2] No such file or directory", "rc": 2}
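The failure shows the handler executing `/tmp/restart_mgr_daemon.sh` on a node where the script was never deployed. A hedged sketch of the pattern the fix requires, in Ansible YAML (task and template names here are illustrative, inferred from the error output, not copied from the actual ceph-ansible source):

```yaml
# Illustrative only: the restart script must be deployed to every mgr
# node before any handler can invoke it there.
- name: copy mgr restart script
  template:
    src: restart_mgr_daemon.sh.j2   # name inferred from the error output
    dest: /tmp/restart_mgr_daemon.sh
    mode: "0750"
  delegate_to: "{{ item }}"
  with_items: "{{ groups['mgrs'] }}"

- name: restart ceph mgr daemon(s) - container
  command: /tmp/restart_mgr_daemon.sh
  delegate_to: "{{ item }}"
  with_items: "{{ groups['mgrs'] }}"
```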


Expected results:
The playbook must copy the restart script to the node before attempting to restart the daemon.

Additional info:

Comment 3 Guillaume Abrioux 2018-02-23 10:19:21 UTC
will be included in v3.0.27

Comment 4 Vasishta 2018-02-23 13:40:35 UTC
Working fine with a workaround -

List the active mgr's hostname at the top of the mon group (when monitors and mgrs are collocated) in the inventory file and run rolling_update again.
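For example, if `magna043` is the active mgr (hostnames taken from the log above; group names follow the usual ceph-ansible inventory layout), the inventory would list it first in the collocated groups. A sketch:

```ini
# Inventory sketch: active mgr (magna043 here) listed first so the
# rolling_update playbook processes it before the standby.
[mons]
magna043
magna035

[mgrs]
magna043
magna035
```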

Comment 5 Ken Dreyer (Red Hat) 2018-02-23 21:52:00 UTC
Based on the workaround, retargeting to z2

Comment 12 Vasishta 2018-03-01 18:10:41 UTC
1) Used - ceph-ansible-3.0.27-1.el7cp.noarch

2) Ensured that the active mgr is not the first host listed in either the mon group or the mgr group.

3) The rolling update worked fine, upgrading the cluster from 3.0 live to ceph-3.0-rhel-7-docker-candidate-99411-20180228192608.


Moving to VERIFIED state.

Regards,
Vasishta Shastry
AQE, Ceph

Comment 16 errata-xmlrpc 2018-03-08 15:54:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0474