1455357 – [ceph-container] : MON docker instances keep restarting after MON hosts reboot

Summary: [ceph-container] : MON docker instances keep restarting after MON hosts reboot

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Ceph Storage
Classification:	Red Hat Storage
Component:	Container
Sub Component:
Version:	2.3
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	rc
Target Release:	2.3
Assignee:	Andrew Schoen
QA Contact:	Rachana Patel
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-05-24 22:19 UTC by Rachana Patel
Modified:	2017-06-19 13:23 UTC
CC List:	9 users
Fixed In Version:	ceph-2-rhel-7-docker-candidate-96406-20170601145625
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-06-19 13:23:04 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHEA-2017:1498	0	normal	SHIPPED_LIVE	updated rhceph-2-rhel7 container image	2017-06-19 17:22:05 UTC

Description Rachana Patel 2017-05-24 22:19:35 UTC

Description of problem:
========================
If more than one MON host node is rebooted (say 2 out of 3, or all 3), the MON docker instances on 2 of the 3 MONs keep restarting.


Version-Release number of selected component (if applicable):
=============================================================
ceph-2-rhel-7-docker-candidate-20170516172622
ceph-ansible-2.2.7-1.el7scon.noarch
ansible-2.2.3.0-1.el7.noarch



How reproducible:
=================
2/2
case 1: reboot 2 of the 3 MONs
case 2: shut down the entire cluster (all MONs and OSDs)


Steps to Reproduce:
===================
case 1:

1. Create a containerized cluster with 3 MON nodes and 3 OSD nodes (plus 1 rbd-mirror node).
2. Create some rbd images and set up two-way mirroring with another cluster.
3. Reboot 2 MON nodes at the same time.


case 2:

1. Create a containerized cluster with 3 MON nodes and 3 OSD nodes (plus 1 rbd-mirror node).
2. Create some rbd images and set up two-way mirroring with another cluster.
3. Reboot all MONs and OSDs of the cluster at the same time.


Actual results:
===============
The MON docker instances on 2 MON nodes keep restarting and never join quorum.
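
For context, Ceph monitors need a strict majority to form quorum, so in a 3-MON cluster both reproduction cases take the cluster below quorum while the hosts are down. The sketch below only shows that arithmetic. The function names are illustrative and not part of any Ceph tool, and it does not model the restart loop itself, whose root cause is not described in this report.

```python
def quorum_size(total_mons: int) -> int:
    """Smallest number of monitors that is a strict majority."""
    return total_mons // 2 + 1


def has_quorum(total_mons: int, mons_up: int) -> bool:
    """True if the monitors still running can form a majority."""
    return mons_up >= quorum_size(total_mons)


if __name__ == "__main__":
    total = 3  # cluster size used in this report
    for rebooted in (1, 2, 3):
        up = total - rebooted
        print(f"{rebooted} of {total} MONs down: "
              f"quorum possible = {has_quorum(total, up)}")
```

With 3 MONs, quorum returns only once at least 2 are running again, which is why monitors that never rejoin leave the cluster unhealthy.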



Expected results:


Additional info:

Comment 4 Andrew Schoen 2017-05-25 21:24:40 UTC

Proposed fix upstream: https://github.com/ceph/ceph-docker/pull/654

Comment 7 seb 2017-05-30 14:00:26 UTC

I just pushed a new commit downstream; that should trigger a new image build.

Comment 16 seb 2017-06-01 11:04:21 UTC

The error log is invalid then. As figured out, there is a conflict between the 2 mons trying to start.
Can we have the logs from the initial failure?

Thanks.

Comment 17 seb 2017-06-01 13:18:59 UTC

I think I've found the issue; I'm working on a fix.

Comment 19 seb 2017-06-01 14:22:39 UTC

New commit, please re-test:

remote: *** Checking commit aee9726f6457dcbef8ef633c21c704111f3d1dfc
remote: *** Resolves:
remote: ***   Approved:
remote: ***     rhbz#1455357 (pm_ack+)
remote: *** Commit aee9726f6457dcbef8ef633c21c704111f3d1dfc allowed

Comment 22 Rachana Patel 2017-06-01 21:38:09 UTC

Verified with version ceph-2-rhel-7-docker-candidate-96406-20170601145625.

Used a minimal setup (no rgw or rbd I/O going on) and rebooted all MON nodes and, separately, 2 of the 3 MON nodes. In both cases the cluster was able to reach a healthy state, hence moving to verified.
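
A check like this could be scripted after each reboot. The sketch below is a minimal example, assuming the JSON layout printed by `ceph quorum_status --format json` (a `quorum_names` list and `monmap.mons[].name` entries). The helper name and the sample monitor names are hypothetical and are not taken from this cluster.

```python
import json
from typing import List


def mons_out_of_quorum(quorum_status_json: str) -> List[str]:
    """Return names of monitors in the monmap that are not in quorum.

    Assumes the input has the shape of `ceph quorum_status --format json`.
    """
    status = json.loads(quorum_status_json)
    all_mons = [mon["name"] for mon in status["monmap"]["mons"]]
    in_quorum = set(status["quorum_names"])
    return [name for name in all_mons if name not in in_quorum]


if __name__ == "__main__":
    # Hypothetical sample: three monitors known, only two in quorum.
    sample = json.dumps({
        "quorum_names": ["mon-a", "mon-b"],
        "monmap": {"mons": [{"name": "mon-a"},
                            {"name": "mon-b"},
                            {"name": "mon-c"}]},
    })
    print(mons_out_of_quorum(sample))
```

An empty list means every monitor in the monmap has rejoined quorum. A non-empty list names the monitors to inspect.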

Comment 24 errata-xmlrpc 2017-06-19 13:23:04 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1498
