1543284 – [ceph-ansible] [ceph-container] : playbook failed trying to restart dmcrypt OSDs

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Ceph Storage
Classification:	Red Hat Storage
Component:	Ceph-Ansible
Sub Component:
Version:	3.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	urgent
Target Milestone:	z1
Target Release:	3.0
Assignee:	Guillaume Abrioux
QA Contact:	Vasishta
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-02-08 07:17 UTC by Vasishta
Modified:	2018-03-08 15:54 UTC
CC List:	8 users
Fixed In Version:	RHEL: ceph-ansible-3.0.25-1.el7cp Ubuntu: ceph-ansible_3.0.25-2redhat1
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-03-08 15:54:03 UTC
Embargoed:
Dependent Products:

Attachments
File contains contents of inventory file, ansible-playbook log (9.59 MB, text/plain) 2018-02-08 07:17 UTC, Vasishta

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	ceph ceph-ansible pull 2380	0	'None'	'closed'	'osd: fix osd restart when dmcrypt'	2019-12-06 16:32:03 UTC
Red Hat Product Errata	RHBA-2018:0474	0	normal	SHIPPED_LIVE	Red Hat Ceph Storage 3.0 bug fix update	2018-03-08 20:51:53 UTC

Description Vasishta 2018-02-08 07:17:10 UTC

Created attachment 1393032
File contains contents of inventory file, ansible-playbook log

Description of problem:
The playbook fails while running the handler 'ceph-defaults : restart ceph osds daemon(s) - container' on collocated+dmcrypt OSDs.

Version-Release number of selected component (if applicable):
ceph-ansible-3.0.23-1.el7cp.noarch
ceph-3.0-rhel-7-docker-candidate-28895-20180204092708 

How reproducible:
Always (3/3)

Steps to Reproduce:
1. Configure a containerized cluster with collocated+dmcrypt OSDs.
2. Run a rolling update.
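
For reference, the configuration in step 1 roughly corresponds to group_vars settings like the ones below. This is a minimal sketch, assuming the ceph-ansible 3.0 variable names `containerized_deployment`, `osd_scenario`, `dmcrypt` and `devices`; the device paths are hypothetical, and the values actually used are in the attached inventory file and log.

```yaml
# group_vars/all.yml (sketch; values are assumptions, not taken from the attachment)
containerized_deployment: true   # run the Ceph daemons in containers

# group_vars/osds.yml
osd_scenario: collocated         # journal lives on the same device as the data
dmcrypt: true                    # encrypt the OSD devices with dm-crypt
devices:                         # hypothetical device paths
  - /dev/sdb
  - /dev/sdc
```

Step 2 would then typically be the rolling update playbook shipped with ceph-ansible (infrastructure-playbooks/rolling_update.yml), run against the same inventory.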

Actual results:
ansible-playbook fails while trying to restart OSDs with dmcrypt enabled:

"msg": "non-zero return code", 
    "rc": 1, 
    "start": "2018-02-08 06:27:08.037939", 
    "stderr": "Error response from daemon: No such container: b2a7aacd1819", 
    "stderr_lines": [
        "Error response from daemon: No such container: b2a7aacd1819"
    ], 
    "stdout": "Socket file /var/run/ceph/mno34-osd.Timed out while trying to look for a Ceph OSD socket.\nAbort mission!.asok could not be found, which means the osd daemon is not running.", 
    "stdout_lines": [
        "Socket file /var/run/ceph/mno34-osd.Timed out while trying to look for a Ceph OSD socket.", 
        "Abort mission!.asok could not be found, which means the osd daemon is not running."

Expected results:
ansible-playbook must complete its run successfully.

Additional info:
When the playbook is re-run, the issue does not recur.

Comment 6 Guillaume Abrioux 2018-02-08 12:48:44 UTC

Fixed by https://github.com/ceph/ceph-ansible/pull/2380/commits/7d179e2abe33e8363aab48db8a392b230dcfc47a

The fix will be included in v3.0.25.

Comment 7 Harish NV Rao 2018-02-12 07:05:08 UTC

@Guillaume, when will the fix be available for testing?

Comment 9 Guillaume Abrioux 2018-02-14 01:26:11 UTC

Hi Harish, the fix is available in v3.0.25

Comment 14 Vasishta 2018-02-23 13:42:59 UTC

Hi, 

The upgrade from rhceph-3-rhel7 to ceph-3.0-rhel-7-docker-candidate-38019-20180222163657 works fine using ceph-ansible-3.0.26-1.el7cp.noarch.

While upgrading a cluster whose OSDs have collocated journals, I hit the issue reported in Bug 1548357 and followed the workaround mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1548357#c4 (put the mgr's name at the top of the mons group in the inventory file).

Moving to VERIFIED state.

Regards,
Vasishta Shatsry
AQE, Ceph

Comment 17 errata-xmlrpc 2018-03-08 15:54:03 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0474
