Bug 1619098

Summary: [Ceph-Ansible][Container] [Filestore] RGW Installation failed
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Persona non grata <nobody+410372>
Component: Ceph-Ansible
Assignee: Sébastien Han <shan>
Status: CLOSED ERRATA
QA Contact: Persona non grata <nobody+410372>
Severity: urgent
Docs Contact: Aron Gunn <agunn>
Priority: unspecified
Version: 3.1
CC: agunn, aschoen, ceph-eng-bugs, gmeno, hgurav, hnallurv, jbrier, nobody+410372, nthomas, sankarshan, shan, ykaul
Target Milestone: rc
Keywords: Regression
Target Release: 3.1   
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: RHEL: ceph-ansible-3.1.0-0.1.rc19.el7cp Ubuntu: ceph-ansible_3.1.0~rc19-2redhat1
Doc Type: Bug Fix
Doc Text:
.Installing the Object Gateway no longer fails for container deployments

When installing the Object Gateway into a container, the following error was observed:

----
fatal: [aio1_ceph-rgw_container-fc588f0a]: FAILED! => {"changed": false, "cmd": "ceph --cluster ceph -s -f json", "msg": "[Errno 2] No such file or directory"
----

The task failed because the `ceph-common` package was not installed. With this fix, the Ansible task is delegated to a Ceph Monitor node, which allows the execution to happen in the correct order.
Last Closed: 2018-09-26 18:23:45 UTC
Type: Bug
Bug Blocks: 1584264    
Attachments:
ansible log (flags: none)
ansible log with -vvv (flags: none)

Description Persona non grata 2018-08-20 05:37:48 UTC
Description of problem:
While setting up a Ceph cluster with FileStore in containers, the Ansible playbook failed in this task:
TASK [ceph-defaults : get current cluster status (if already running)]
The cluster was up, but RGW was not installed.
============================
Host file:

[mons]
magna006
magna059.ceph.redhat.com
magna061.ceph.redhat.com 

[osds]
magna059.ceph.redhat.com dmcrypt="true" devices="['/dev/sdb','/dev/sdc','/dev/sdd']" osd_scenario="collocated"  osd_objectstore="filestore"
magna061.ceph.redhat.com dmcrypt="true" dedicated_devices="['/dev/sdb']" devices="['/dev/sdc','/dev/sdd']" osd_scenario="non-collocated" osd_objectstore="filestore"
magna064 devices="['/dev/sdb','/dev/sdc','/dev/sdd']" osd_scenario="collocated"  osd_objectstore="filestore"

[mgrs]
magna006

[rgws]
magna059.ceph.redhat.com radosgw_interface=eno1

[nfss]
magna064

[mdss]
magna006
==================================


Version-Release number of selected component (if applicable):

ansible-2.4.6.0-1.el7ae.noarch

ceph-ansible-3.1.0-0.1.rc18.el7cp.noarch

ceph version 12.2.5-39.el7cp (f12d44e46a54948a86dd27b16c77d97475ba2d4e) luminous (stable)

How reproducible:
Always

Steps to Reproduce:
1. Try to set up a Ceph cluster in containers with 3 MONs (collocated), 3 OSDs, 1 MGR, 1 RGW, 1 NFS, and 1 MDS, using 'filestore'


Actual results:

TASK [ceph-defaults : get current cluster status (if already running)] **************
Friday 17 August 2018  16:15:29 +0000 (0:00:00.250)       0:00:30.021 ********* 
skipping: [magna006]
skipping: [magna061.ceph.redhat.com]
skipping: [magna064]
fatal: [magna059.ceph.redhat.com]: FAILED! => {"changed": true, "cmd": ["docker", "exec", "ceph-mon-magna006", "ceph", "--cluster", "local", "-s", "-f", "json"], "delta": "0:00:00.022128", "end": "2018-08-17 16:15:29.721539", "failed": true, "msg": "non-zero return code", "rc": 1, "start": "2018-08-17 16:15:29.699411", "stderr": "Error response from daemon: No such container: ceph-mon-magna006", "stderr_lines": ["Error response from daemon: No such container: ceph-mon-magna006"], "stdout": "", "stdout_lines": []}
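
The check fails because it runs `docker exec ceph-mon-magna006` on magna059, where no such monitor container exists. As the Doc Text above describes, the fix delegates this status check to a Ceph Monitor node. The following is only a rough, hypothetical sketch of such a delegated task; the variable and group names are assumptions, not the actual ceph-ansible source:

# Hypothetical Ansible task sketch: run the status check on the first
# monitor host, where a ceph-mon container actually exists.
- name: get current cluster status (if already running)
  command: >
    docker exec ceph-mon-{{ hostvars[groups['mons'][0]]['ansible_hostname'] }}
    ceph --cluster {{ cluster | default('ceph') }} -s -f json
  register: ceph_current_status
  changed_when: false
  failed_when: false
  delegate_to: "{{ groups['mons'][0] }}"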


===========
Cluster status:
  cluster:
    id:     297811f5-605d-4093-81d6-4b25bb72cc99
    health: HEALTH_WARN
            Degraded data redundancy: 242/726 objects degraded (33.333%), 89 pgs degraded, 384 pgs undersized
 
  services:
    mon:     3 daemons, quorum magna006,magna059,magna061
    mgr:     magna006(active)
    mds:     cephfs-1/1/1 up  {0=magna006=up:active}
    osd:     5 osds: 5 up, 5 in
    rgw-nfs: 1 daemon active
 
  data:
    pools:   6 pools, 384 pgs
    objects: 242 objects, 3721 bytes
    usage:   901 MB used, 4650 GB / 4651 GB avail
    pgs:     242/726 objects degraded (33.333%)
             295 active+undersized
==========================

Expected results:

The cluster should be active and clean with RGW installed

Additional info:

Comment 4 Sébastien Han 2018-08-20 08:45:25 UTC
I need either the complete log of the Ansible run or access to the environment. Without one of the two, I cannot help you with this.
Thanks.

Comment 5 Persona non grata 2018-08-20 08:50:16 UTC
Created attachment 1477100 [details]
ansible log

Comment 6 Sébastien Han 2018-08-20 10:18:37 UTC
How many runs do you have in this file?
The last action is a mon failing to restart.

If RGW wasn't deployed, that means it was not declared in the inventory file; I don't see any ceph-rgw statements, so nothing ran.

This is hard to read.
Please clarify what you did.

Comment 7 Persona non grata 2018-08-20 10:29:36 UTC
Created attachment 1477124 [details]
ansible log with -vvv

Comment 8 Persona non grata 2018-08-20 10:33:15 UTC
(In reply to leseb from comment #6)
> How many runs do you have in this file?
> The last action is a mon failing to restart.
> 
> If RGW wasn't deployed, that means it was not declared in the inventory
> file; I don't see any ceph-rgw statements, so nothing ran.
> 
> This is hard to read.
> Please clarify what you did.

That log file had both purge and site-docker logs. I've attached an ansible log (only one run) with verbose mode enabled.

Comment 9 Sébastien Han 2018-08-20 10:56:26 UTC
Thanks, I see what's going on now.

Comment 17 errata-xmlrpc 2018-09-26 18:23:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2819