Bug 1619098 - [Ceph-Ansible][Container] [Filestore] RGW Installation failed
Summary: [Ceph-Ansible][Container] [Filestore] RGW Installation failed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: 3.1
Assignee: Sébastien Han
QA Contact: Persona non grata
Docs Contact: Aron Gunn
URL:
Whiteboard:
Depends On:
Blocks: 1584264
 
Reported: 2018-08-20 05:37 UTC by Persona non grata
Modified: 2018-09-26 18:24 UTC
CC List: 12 users

Fixed In Version: RHEL: ceph-ansible-3.1.0-0.1.rc19.el7cp Ubuntu: ceph-ansible_3.1.0~rc19-2redhat1
Doc Type: Bug Fix
Doc Text:
.Installing the Object Gateway no longer fails for container deployments

When installing the Object Gateway into a container, the following error was observed:

----
fatal: [aio1_ceph-rgw_container-fc588f0a]: FAILED! => {"changed": false, "cmd": "ceph --cluster ceph -s -f json", "msg": "[Errno 2] No such file or directory"
----

The task failed because the `ceph-common` package was not installed on the host where it ran. The task is now delegated to a Ceph Monitor node, which allows the execution to happen in the correct order.
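For illustration only, a minimal sketch of the kind of delegation described above, assuming the standard `mons` inventory group and a ceph-ansible style `docker_exec_cmd` prefix for containerized monitors; the play layout and variable defaults here are illustrative, not copied from the linked pull request:

----
# Illustrative sketch, not the actual ceph-ansible patch:
# run the status check on the first monitor instead of on the host being
# configured, so it still works on hosts without ceph-common installed.
- hosts: all
  gather_facts: false
  tasks:
    - name: get current cluster status (if already running)
      # docker_exec_cmd is assumed to expand to "docker exec ceph-mon-<host>"
      # on containerized monitors and to be empty on bare-metal ones.
      command: "{{ docker_exec_cmd | default('') }} ceph --cluster {{ cluster | default('ceph') }} -s -f json"
      register: ceph_current_status
      changed_when: false
      failed_when: false
      delegate_to: "{{ groups['mons'][0] }}"
      when: groups.get('mons', []) | length > 0
----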
Clone Of:
Environment:
Last Closed: 2018-09-26 18:23:45 UTC
Embargoed:


Attachments (Terms of Use)
ansible log (3.55 MB, text/plain)
2018-08-20 08:50 UTC, Persona non grata
ansible log with -vvv (3.66 MB, text/plain)
2018-08-20 10:29 UTC, Persona non grata


Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph-ansible pull 3015 0 None None None 2018-08-20 10:56:25 UTC
Red Hat Product Errata RHBA-2018:2819 0 None None None 2018-09-26 18:24:47 UTC

Description Persona non grata 2018-08-20 05:37:48 UTC
Description of problem:
While setting up a Ceph cluster with filestore in containers, the Ansible playbook failed at this task:
Task [ceph-defaults : get current cluster status (if already running)] 
The cluster was up but RGW was not installed.
============================
Host file:

[mons]
magna006
magna059.ceph.redhat.com
magna061.ceph.redhat.com 

[osds]
magna059.ceph.redhat.com dmcrypt="true" devices="['/dev/sdb','/dev/sdc','/dev/sdd']" osd_scenario="collocated"  osd_objectstore="filestore"
magna061.ceph.redhat.com dmcrypt="true" dedicated_devices="['/dev/sdb']" devices="['/dev/sdc','/dev/sdd']" osd_scenario="non-collocated" osd_objectstore="filestore"
magna064 devices="['/dev/sdb','/dev/sdc','/dev/sdd']" osd_scenario="collocated"  osd_objectstore="filestore"

[mgrs]
magna006

[rgws]
magna059.ceph.redhat.com radosgw_interface=eno1

[nfss]
magna064

[mdss]
magna006
==================================


Version-Release number of selected component (if applicable):

ansible-2.4.6.0-1.el7ae.noarch

ceph-ansible-3.1.0-0.1.rc18.el7cp.noarch

ceph version 12.2.5-39.el7cp (f12d44e46a54948a86dd27b16c77d97475ba2d4e) luminous (stable)

How reproducible:
Always

Steps to Reproduce:
1. Try to set up a Ceph cluster in containers with 3 MONs (collocated), 3 OSDs, 1 MGR, 1 RGW, 1 NFS, and 1 MDS, using 'filestore'


Actual results:

TASK [ceph-defaults : get current cluster status (if already running)] **************
Friday 17 August 2018  16:15:29 +0000 (0:00:00.250)       0:00:30.021 ********* 
skipping: [magna006]
skipping: [magna061.ceph.redhat.com]
skipping: [magna064]
fatal: [magna059.ceph.redhat.com]: FAILED! => {"changed": true, "cmd": ["docker", "exec", "ceph-mon-magna006", "ceph", "--cluster", "local", "-s", "-f", "json"], "delta": "0:00:00.022128", "end": "2018-08-17 16:15:29.721539", "failed": true, "msg": "non-zero return code", "rc": 1, "start": "2018-08-17 16:15:29.699411", "stderr": "Error response from daemon: No such container: ceph-mon-magna006", "stderr_lines": ["Error response from daemon: No such container: ceph-mon-magna006"], "stdout": "", "stdout_lines": []}


===========
Cluster status:
  cluster:
    id:     297811f5-605d-4093-81d6-4b25bb72cc99
    health: HEALTH_WARN
            Degraded data redundancy: 242/726 objects degraded (33.333%), 89 pgs degraded, 384 pgs undersized
 
  services:
    mon:     3 daemons, quorum magna006,magna059,magna061
    mgr:     magna006(active)
    mds:     cephfs-1/1/1 up  {0=magna006=up:active}
    osd:     5 osds: 5 up, 5 in
    rgw-nfs: 1 daemon active
 
  data:
    pools:   6 pools, 384 pgs
    objects: 242 objects, 3721 bytes
    usage:   901 MB used, 4650 GB / 4651 GB avail
    pgs:     242/726 objects degraded (33.333%)
             295 active+undersized
==========================

Expected results:

The cluster should be active and clean with RGW installed

Additional info:

Comment 4 Sébastien Han 2018-08-20 08:45:25 UTC
I either need the complete log of the ansible run or access to the env. Without one of the two I cannot help you with this.
Thanks.

Comment 5 Persona non grata 2018-08-20 08:50:16 UTC
Created attachment 1477100 [details]
ansible log

Comment 6 Sébastien Han 2018-08-20 10:18:37 UTC
How many runs do you have in this file?
The last action is a mon failing to restart.

If rgw wasn't deployed, this means it was not declared in the inventory file; I don't see any ceph-rgw statements, so nothing ran.

This is hard to read.
Please clarify what you did.

Comment 7 Persona non grata 2018-08-20 10:29:36 UTC
Created attachment 1477124 [details]
ansible log with -vvv

Comment 8 Persona non grata 2018-08-20 10:33:15 UTC
(In reply to leseb from comment #6)
> How many runs do you have in this file?
> The last action is a mon failing to restart.
> 
> If rgw wasn't deployed, this means it was not declared in the inventory
> file; I don't see any ceph-rgw statements, so nothing ran.
> 
> This is hard to read.
> Please clarify what you did.

That log file had both purge and site-docker logs. I've attached an ansible log (only one run) with verbose mode enabled.

Comment 9 Sébastien Han 2018-08-20 10:56:26 UTC
Thanks, I see what's going on now.

Comment 17 errata-xmlrpc 2018-09-26 18:23:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2819

