Bug 1543284

Summary: [ceph-ansible] [ceph-container] : playbook failed trying to restart dmcrypt OSDs
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Vasishta <vashastr>
Component: Ceph-Ansible
Assignee: Guillaume Abrioux <gabrioux>
Status: CLOSED ERRATA
QA Contact: Vasishta <vashastr>
Severity: urgent
Priority: unspecified
Version: 3.0
CC: adeza, aschoen, ceph-eng-bugs, gabrioux, gmeno, hnallurv, nthomas, sankarshan
Target Milestone: z1
Target Release: 3.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: RHEL: ceph-ansible-3.0.25-1.el7cp Ubuntu: ceph-ansible_3.0.25-2redhat1
Last Closed: 2018-03-08 15:54:03 UTC
Type: Bug
Attachments:
File contains contents of inventory file, ansible-playbook log (flags: none)

Description Vasishta 2018-02-08 07:17:10 UTC
Created attachment 1393032
File contains contents of inventory file, ansible-playbook log

Description of problem:
The playbook fails while running the handler 'ceph-defaults : restart ceph osds daemon(s) - container' on collocated+dmcrypt OSDs.

Version-Release number of selected component (if applicable):
ceph-ansible-3.0.23-1.el7cp.noarch
ceph-3.0-rhel-7-docker-candidate-28895-20180204092708 

How reproducible:
Always (3/3)

Steps to Reproduce:
1. Configure a containerized cluster with collocated+dmcrypt OSDs.
2. Run a rolling update.

Actual results:
ansible-playbook fails while trying to restart OSDs with dmcrypt enabled:

"msg": "non-zero return code", 
    "rc": 1, 
    "start": "2018-02-08 06:27:08.037939", 
    "stderr": "Error response from daemon: No such container: b2a7aacd1819", 
    "stderr_lines": [
        "Error response from daemon: No such container: b2a7aacd1819"
    ], 
    "stdout": "Socket file /var/run/ceph/mno34-osd.Timed out while trying to look for a Ceph OSD socket.\nAbort mission!.asok could not be found, which means the osd daemon is not running.", 
    "stdout_lines": [
        "Socket file /var/run/ceph/mno34-osd.Timed out while trying to look for a Ceph OSD socket.", 
        "Abort mission!.asok could not be found, which means the osd daemon is not running."

Expected results:
ansible-playbook must complete its run successfully.

Additional info:
When the playbook is re-run, the issue does not recur.
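
The stdout under "Actual results" reads as though an error message ended up in the position where an OSD id was expected in the socket file name (`mno34-osd.<message>.asok`). The handler's actual script is not shown in this report, so the following is only a minimal sketch, under stated assumptions, of the kind of check that would avoid such a message: it validates the id before building the path and then polls for the socket. The function names, the retry count, and the `<cluster>-osd.<id>.asok` naming under `/var/run/ceph` are assumptions for illustration, not the ceph-ansible implementation.

```python
import os
import time


def socket_path_for(cluster, osd_id, run_dir="/var/run/ceph"):
    """Build the admin socket path for one OSD (hypothetical helper).

    Rejects anything that is not a plain numeric id, so that an error
    message can never be interpolated into the file name.
    """
    if not str(osd_id).isdigit():
        raise ValueError(f"not a numeric OSD id: {osd_id!r}")
    return os.path.join(run_dir, f"{cluster}-osd.{osd_id}.asok")


def wait_for_osd_socket(socket_path, retries=10, delay=1.0):
    """Poll until socket_path exists (hypothetical helper).

    Returns True as soon as the path is found, False once all
    retries are used up.
    """
    for _ in range(retries):
        if os.path.exists(socket_path):
            return True
        time.sleep(delay)
    return False


if __name__ == "__main__":
    # "mno34" is the prefix seen in the log above; the id 0 is illustrative.
    path = socket_path_for("mno34", 0)
    if not wait_for_osd_socket(path, retries=3, delay=1.0):
        print(f"Socket file {path} could not be found")
```

Keeping the id validation separate from the polling loop means a lookup failure is reported on its own, instead of being folded into the "socket file could not be found" message.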

Comment 6 Guillaume Abrioux 2018-02-08 12:48:44 UTC
Fixed by https://github.com/ceph/ceph-ansible/pull/2380/commits/7d179e2abe33e8363aab48db8a392b230dcfc47a

The fix will be included in v3.0.25.

Comment 7 Harish NV Rao 2018-02-12 07:05:08 UTC
@Guillaume, when will the fix be available for testing?

Comment 9 Guillaume Abrioux 2018-02-14 01:26:11 UTC
Hi Harish, the fix is available in v3.0.25

Comment 14 Vasishta 2018-02-23 13:42:59 UTC
Hi, 

Working fine using ceph-ansible-3.0.26-1.el7cp.noarch to upgrade to ceph-3.0-rhel-7-docker-candidate-38019-20180222163657 from rhceph-3-rhel7.

While upgrading a cluster whose OSDs use collocated journals, I hit the issue reported in Bug 1548357 and followed the workaround mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1548357#c4 (mention the mgr's name at the top of the mons group in the inventory file).

Moving to VERIFIED state.

Regards,
Vasishta Shatsry
AQE, Ceph

Comment 17 errata-xmlrpc 2018-03-08 15:54:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0474