Bug 1537003 - [ceph-ansible] [ceph-container] : playbook stuck for indefinite time checking whether ceph is running already
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 2.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: 2.5
Assignee: Guillaume Abrioux
QA Contact: Vasishta
URL:
Whiteboard:
Depends On: 1541065
Blocks:
 
Reported: 2018-01-22 09:05 UTC by Vasishta
Modified: 2018-02-21 19:48 UTC
CC: 13 users

Fixed In Version: RHEL: ceph-ansible-3.0.21-1.el7cp Ubuntu: ceph-ansible_3.0.21-2redhat1
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 1541065 (view as bug list)
Environment:
Last Closed: 2018-02-21 19:48:14 UTC
Embargoed:


Attachments
Contains the ansible-playbook log and the contents of all.yml and the inventory file (462.56 KB, text/plain)
2018-01-22 09:05 UTC, Vasishta


Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph-ansible pull 2348 0 None closed defaults: avoid getting stuck (ceph --connect-timeout) 2020-08-27 17:49:24 UTC
Red Hat Product Errata RHBA-2018:0340 0 normal SHIPPED_LIVE Red Hat Ceph Storage 2.5 bug fix and enhancement update 2018-02-22 00:50:32 UTC

Description Vasishta 2018-01-22 09:05:42 UTC
Created attachment 1384266 [details]
File contains the ansible-playbook log and the contents of all.yml and the inventory file

Description of problem:
The playbook gets stuck indefinitely while checking whether Ceph is already running.

It appeared that the existing mon was trying to contact the other mons, which were not yet configured.

Version-Release number of selected component (if applicable):
ceph-ansible-3.0.18-1.el7cp.noarch

How reproducible:
Always (2/2)

Steps to Reproduce:
1. Configure ceph-ansible to initialize a Ceph cluster with more than one mon.
2. Run playbook.

Actual results:
The playbook gets stuck at this task:
TASK [ceph-defaults : is ceph running already?]

Expected results:
The cluster should be initialized successfully.

Additional info:

Observed that Ansible seemed to be waiting for "sudo docker exec ceph-mon-magna029 ceph --cluster c1 fsid --connect-timeout 3" to finish, but running the same command manually also hung. This might be an issue with "--connect-timeout 3": without that argument the command times out after 300 seconds, since the other mons were not yet configured.
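For reference, a hanging check like this can be bounded externally with the coreutils "timeout" wrapper, so the task fails fast instead of blocking the playbook forever. This is a minimal sketch, not necessarily the exact change made in pull 2348; "sleep 60" stands in for the hanging "docker exec ... ceph fsid" call:

```shell
# Hypothetical illustration: bound a potentially-hanging cluster check
# with coreutils `timeout`. `sleep 60` stands in for a hanging
# `docker exec ceph-mon-<host> ceph --cluster c1 fsid` invocation.
timeout 2 sleep 60
rc=$?
echo "exit code: $rc"   # 124 means the deadline was hit before the command finished
```

With this pattern the playbook task returns within the deadline regardless of whether the mon can reach quorum.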

Comment 5 Harish NV Rao 2018-01-23 10:37:53 UTC
@Guillaume/Sebastien, can you please let us know when we can expect the fix for this?

Comment 6 Guillaume Abrioux 2018-01-23 13:31:12 UTC
@Harish, the environment you provided to debug this issue has been reset. Since I can't reproduce this issue in my own environment, is there any chance you can restore your environment to the state it was in yesterday so I can reproduce this bug?

Comment 7 Harish NV Rao 2018-01-23 13:42:35 UTC
@Guillaume, we had provided the test system yesterday evening, but this morning we needed it to move ahead with other 2.5 testing. Please get in touch with Vasishta for a system with the issue reproduced on it.

Comment 8 Harish NV Rao 2018-01-24 09:56:34 UTC
Hi Guillaume,

We have hit issue 1537003 on one more setup (we have emailed you the details of this setup) while trying to install an IPv6-based RHEL Ceph cluster.

Please check the system and let us know if you need more info.

Regards,
Harish

Comment 9 Guillaume Abrioux 2018-01-24 10:16:08 UTC
Hi Harish,

Thanks for the details, I'm currently taking a look at this.

Comment 10 Guillaume Abrioux 2018-01-24 10:29:13 UTC
One thing I noticed in your env is that you have set the monitor_address variable in all.yml like this:

monitor_address: 2620:52:0:880:225:90ff:fefc:2770

The result is that your ceph.conf contains this on all monitors:

mon host = [2620:52:0:880:225:90ff:fefc:2770],[2620:52:0:880:225:90ff:fefc:2770],[2620:52:0:880:225:90ff:fefc:2770],[2620:52:0:880:225:90ff:fefc:2770]

As mentioned here: https://github.com/ceph/ceph-ansible/blob/stable-3.0/group_vars/all.yml.sample#L313

monitor_address should be set in the inventory host file so that each monitor gets the address it will bind to.

Could you try setting monitor_address that way, or using monitor_interface in all.yml instead?
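For illustration, a per-host inventory layout along those lines might look like this (the hostnames and addresses below are hypothetical, not taken from this setup):

```ini
; Hypothetical inventory: each monitor carries its own bind address,
; instead of one monitor_address shared by all hosts via all.yml.
[mons]
mon1 monitor_address=2620:52:0:880::10
mon2 monitor_address=2620:52:0:880::11
mon3 monitor_address=2620:52:0:880::12
```

With per-host values, the generated "mon host" line in ceph.conf lists one distinct address per monitor rather than the same address repeated.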

Comment 11 Vasishta 2018-01-24 14:36:27 UTC
Hi Guillaume,

Thanks for pointing out and correcting it.

I've hit this issue in the container scenario - please check magna071 (admin and mon).

Please let us know once we can use the setup.

Regards,
Vasishta

Comment 12 Guillaume Abrioux 2018-01-25 09:22:14 UTC
Hi Vasishta,

You can use the setup, I'm not using it anymore.

The fix for this issue will be in v3.0.20

Comment 24 errata-xmlrpc 2018-02-21 19:48:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0340

