While using ceph-ansible-3.1.0.0-0.beta6.1.el7.noarch to configure a server to communicate with an external Ceph cluster, the deployment fails on the task "create cephx key(s)" [1]. I was able to reproduce the problem by running the same command that ceph-ansible tried to run [2]. When examining the server where the task failed, I found that the ceph-create-keys container (with the same container ID) had been running at some point but had since exited:

[root@rhosp-ctrl0 ~]# docker ps -a
CONTAINER ID        IMAGE                                                     COMMAND             CREATED             STATUS                         PORTS               NAMES
6d391bc9842c        registry.access.redhat.com/rhceph/rhceph-3-rhel7:latest   "sleep 300"         About an hour ago   Exited (0) About an hour ago                       ceph-create-keys

Maybe there is a race condition and the dummy container is no longer available by the time it is needed? (just a theory) I'll attach more logs.

The task in question:
https://github.com/ceph/ceph-ansible/blob/f711c51f395df181c5f821fca7ef879af79fdb64/roles/ceph-client/tasks/create_users_keys.yml#L15-L26

[1]
2018-04-16 19:54:55,600 p=2591 u=mistral | TASK [ceph-client : create cephx key(s)] ***************************************
2018-04-16 19:54:55,601 p=2591 u=mistral | Monday 16 April 2018 19:54:55 +0000 (0:00:00.048)       0:01:06.258 **********
2018-04-16 19:59:55,593 p=2591 u=mistral | failed: [192.168.213.214] (item={'caps': {'mds': u'', 'osd': u'allow class-read object_prefix rbd_children, allow rwx pool=rhosp-volumes, allow rwx pool=rhosp-backup, allow rwx pool=rhosp-vms, allow rwx pool=rhosp-images, allow rwx pool=rhosp-metrics', 'mon': u'allow r', 'mgr': u'allow *'}, 'mode': u'0600', 'key': u'AQC8ZSlakbFkMBAAAZFQjIgWVZQ+HVnKc3FpTw==', 'name': u'client.rhosp'}) => {"changed": true, "cmd": ["docker", "exec", "ceph-create-keys", "ceph-authtool", "--create-keyring", "/etc/ceph/ceph.client.rhosp.keyring", "--name", "client.rhosp", "--add-key", "AQC8ZSlakbFkMBAAAZFQjIgWVZQ+HVnKc3FpTw==", "--cap", "mds", "", "--cap", "osd", "allow class-read object_prefix rbd_children, allow rwx pool=rhosp-volumes, allow rwx pool=rhosp-backup, allow rwx pool=rhosp-vms, allow rwx pool=rhosp-images, allow rwx pool=rhosp-metrics", "--cap", "mon", "allow r", "--cap", "mgr", "allow *"], "delta": "0:04:59.598733", "end": "2018-04-16 15:59:55.574075", "item": {"caps": {"mds": "", "mgr": "allow *", "mon": "allow r", "osd": "allow class-read object_prefix rbd_children, allow rwx pool=rhosp-volumes, allow rwx pool=rhosp-backup, allow rwx pool=rhosp-vms, allow rwx pool=rhosp-images, allow rwx pool=rhosp-metrics"}, "key": "AQC8ZSlakbFkMBAAAZFQjIgWVZQ+HVnKc3FpTw==", "mode": "0600", "name": "client.rhosp"}, "msg": "non-zero return code", "rc": 126, "start": "2018-04-16 15:54:55.975342", "stderr": "", "stderr_lines": [], "stdout": "rpc error: code = 2 desc = oci runtime error: exec failed: container \"6d391bc9842c993ae6123f023d24da305a96dfe8d64e3607973c665cd8880129\" does not exist", "stdout_lines": ["rpc error: code = 2 desc = oci runtime error: exec failed: container \"6d391bc9842c993ae6123f023d24da305a96dfe8d64e3607973c665cd8880129\" does not exist"]}

[2]
[root@rhosp-ctrl0 ~]# docker exec ceph-create-keys ceph-authtool --create-keyring /etc/ceph/ceph.client.rhosp.keyring --name client.rhosp --add-key AQC8ZSlakbFkMBAAAZFQjIgWVZQ+HVnKc3FpTw== --cap mds --cap osd allow class-read object_prefix rbd_children, allow rwx pool=rhosp-volumes, allow rwx pool=rhosp-backup, allow rwx pool=rhosp-vms, allow rwx pool=rhosp-images, allow rwx pool=rhosp-metrics --cap mon allow r --cap mgr allow *
Error response from daemon: Container 6d391bc9842c993ae6123f023d24da305a96dfe8d64e3607973c665cd8880129 is not running
[root@rhosp-ctrl0 ~]#
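The suspected race can be sketched without Docker at all. Below is a hypothetical, minimal demo (not the actual ceph-ansible task) that assumes only a POSIX shell: a background sleep stands in for the containerized "sleep 300" helper, and anything that tries to use the helper after its fixed lifetime fails, analogous to the "docker exec" above.

```shell
#!/bin/sh
# Start a short-lived helper; this plays the role of:
#   docker run -d --name ceph-create-keys ... sleep 300
sleep 1 &
helper=$!

# Simulate the playbook taking longer to reach the exec step than the
# helper's fixed lifetime (300 seconds in the real deployment).
sleep 2

# Signal 0 only checks whether the process still exists, like probing
# the container before "docker exec".
if kill -0 "$helper" 2>/dev/null; then
    status="helper still running"
else
    status="helper already exited"   # analogous to "container ... does not exist"
fi
echo "$status"
```

Under this theory, any "create cephx key(s)" run that starts more than 300 seconds after the dummy container was launched would hit the same rc 126 failure.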
I also reproduced this for the ceph clients when deploying a new Ceph cluster with TripleO. Here is the output of ansible-playbook -vvv: https://ptpb.pw/-QC_ And here is the ceph-install log from the original report, though that run did not use -vvv: http://ix.io/17Y7
*** Bug 1568234 has been marked as a duplicate of this bug. ***
*** Bug 1569258 has been marked as a duplicate of this bug. ***
Reproduced in OSP13 puddle 2018-04-13.1 with the TLS everywhere scenario and SELinux disabled on the overcloud nodes.
Also reproduced in OSP13 puddle 2018-04-13.1 with the TLS everywhere scenario and SELinux disabled on both the overcloud nodes and the undercloud node.
An RHOS OSP13 deployment with the latest puddle is failing due to this issue.
Part of the fix for this is the following change: https://github.com/ceph/ceph-ansible/commit/90e47c5fb0c95f4b1a17cdf2a019bdcebc77a773, which landed in https://github.com/ceph/ceph-ansible/releases/tag/v3.1.0beta8
Verified on ceph-ansible-3.1.0-0.1.rc2.el7cp.noarch.