Bug 1455357 - [ceph-container] : MON docker instances keep restarting after MON hosts reboot
Summary: [ceph-container] : MON docker instances keep restarting after MON hosts reboot
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Container
Version: 2.3
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 2.3
Assignee: Andrew Schoen
QA Contact: Rachana Patel
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-05-24 22:19 UTC by Rachana Patel
Modified: 2017-06-19 13:23 UTC
CC: 9 users

Fixed In Version: ceph-2-rhel-7-docker-candidate-96406-20170601145625
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-19 13:23:04 UTC
Embargoed:




Links
Red Hat Product Errata RHEA-2017:1498 (normal, SHIPPED_LIVE): updated rhceph-2-rhel7 container image, last updated 2017-06-19 17:22:05 UTC

Description Rachana Patel 2017-05-24 22:19:35 UTC
Description of problem:
========================
If we reboot more than one MON host node (say 2 out of 3, or all 3), the docker instances on 2 of the 3 MONs keep restarting.
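
One way to observe the restart loop (a sketch; the container name ceph-mon-<short hostname> follows the upstream ceph-docker convention and may differ in a given deployment):

  # list the mon container and its current status / restart behaviour
  docker ps -a --filter "name=ceph-mon"
  # tail the container log to see why it keeps dying
  docker logs --tail 100 ceph-mon-$(hostname -s)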


Version-Release number of selected component (if applicable):
=============================================================
ceph-2-rhel-7-docker-candidate-20170516172622
ceph-ansible-2.2.7-1.el7scon.noarch
ansible-2.2.3.0-1.el7.noarch



How reproducible:
=================
2/2
Case 1: reboot 2 of the 3 MONs
Case 2: shut down the entire cluster (all MONs and OSDs)


Steps to Reproduce:
===================
Case 1:

1. Create a containerized cluster with 3 MON nodes and 3 OSD nodes (plus 1 rbd-mirror node).
2. Create some rbd images and set up two-way mirroring with another cluster.
3. Reboot 2 MON nodes at the same time (a command sketch for both cases follows the Case 2 steps).


Case 2:
1. Create a containerized cluster with 3 MON nodes and 3 OSD nodes (plus 1 rbd-mirror node).
2. Create some rbd images and set up two-way mirroring with another cluster.
3. Reboot all MONs and OSDs of the cluster at the same time (see the sketch below).
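
The reboot in step 3 of either case can be triggered along these lines (a sketch; "mons" and "osds" are the usual ceph-ansible inventory groups, and mon1/mon2 are hypothetical hostnames):

  # case 1: reboot 2 of the 3 MON hosts at the same time
  ansible mon1,mon2 -b -m command -a reboot
  # case 2: reboot every MON and OSD host at the same time
  ansible 'mons:osds' -b -m command -a reboot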


Actual results:
===============
The MON docker instances on 2 of the MON nodes keep restarting and never join the quorum.
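
Quorum membership can be checked from any surviving MON, e.g. (a sketch, assuming the admin keyring is available inside the mon container):

  # which monitors are currently in quorum
  docker exec ceph-mon-$(hostname -s) ceph quorum_status --format json-pretty
  # overall cluster state
  docker exec ceph-mon-$(hostname -s) ceph -s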



Expected results:
=================
After the reboots, the MON containers should start cleanly and all 3 MONs should rejoin the quorum.

Additional info:

Comment 4 Andrew Schoen 2017-05-25 21:24:40 UTC
Proposed fix upstream: https://github.com/ceph/ceph-docker/pull/654

Comment 7 seb 2017-05-30 14:00:26 UTC
I just pushed a new commit downstream, which should trigger a new image build.

Comment 16 seb 2017-06-01 11:04:21 UTC
That error log is not the relevant one, then. As we figured out, there is a conflict between the 2 mons trying to start.
Can we have the logs from the initial failure?

Thanks.

Comment 17 seb 2017-06-01 13:18:59 UTC
I think I've found the issue, I'm working on a fix.

Comment 19 seb 2017-06-01 14:22:39 UTC
New commit, please re-test:

remote: *** Checking commit aee9726f6457dcbef8ef633c21c704111f3d1dfc
remote: *** Resolves:
remote: ***   Approved:
remote: ***     rhbz#1455357 (pm_ack+)
remote: *** Commit aee9726f6457dcbef8ef633c21c704111f3d1dfc allowed

Comment 22 Rachana Patel 2017-06-01 21:38:09 UTC
Verified with version ceph-2-rhel-7-docker-candidate-96406-20170601145625.

Used a minimal setup (no rgw or rbd I/O going on) and rebooted all MON nodes in one run, and 2 of the 3 MON nodes in another. In both cases the cluster was able to reach a healthy state, hence moving to verified.
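
For reference, on a containerized deployment the mon container is usually driven by a systemd unit, so the post-reboot check can look like this (a sketch; the unit name ceph-mon@<hostname> is the common ceph-ansible convention and may differ):

  # confirm the unit is active and no longer restart-looping
  systemctl status ceph-mon@$(hostname -s)
  # confirm the cluster reached a healthy state
  docker exec ceph-mon-$(hostname -s) ceph -s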

Comment 24 errata-xmlrpc 2017-06-19 13:23:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1498

