Bug 1619098

Summary: [Ceph-Ansible][Container] [Filestore] RGW Installation failed
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Persona non grata <nobody+410372>
Component: Ceph-Ansible
Assignee: Sébastien Han <shan>
Status: CLOSED ERRATA
QA Contact: Persona non grata <nobody+410372>
Severity: urgent
Docs Contact: Aron Gunn <agunn>
Priority: unspecified
Version: 3.1
CC: agunn, aschoen, ceph-eng-bugs, gmeno, hgurav, hnallurv, jbrier, nobody+410372, nthomas, sankarshan, shan, ykaul
Target Milestone: rc
Keywords: Regression
Target Release: 3.1   
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: RHEL: ceph-ansible-3.1.0-0.1.rc19.el7cp Ubuntu: ceph-ansible_3.1.0~rc19-2redhat1
Doc Type: Bug Fix
Doc Text:
.Installing the Object Gateway no longer fails for container deployments

When installing the Object Gateway into a container, the following error was observed:

----
fatal: [aio1_ceph-rgw_container-fc588f0a]: FAILED! => {"changed": false, "cmd": "ceph --cluster ceph -s -f json", "msg": "[Errno 2] No such file or directory"
----

The task failed because the `ceph-common` package was not installed. With this fix, the Ansible task is delegated to a Ceph Monitor node, which allows the execution to happen in the correct order.
Last Closed: 2018-09-26 18:23:45 UTC
Type: Bug
Bug Blocks: 1584264    
Attachments:
ansible log (flags: none)
ansible log with -vvv (flags: none)

Description Persona non grata 2018-08-20 05:37:48 UTC
Description of problem:
While setting up a Ceph cluster with FileStore in containers, the Ansible playbook failed in this task:
TASK [ceph-defaults : get current cluster status (if already running)]
The cluster was up, but RGW was not installed.
============================
Host file:

[mons]
magna006
magna059.ceph.redhat.com
magna061.ceph.redhat.com 

[osds]
magna059.ceph.redhat.com dmcrypt="true" devices="['/dev/sdb','/dev/sdc','/dev/sdd']" osd_scenario="collocated"  osd_objectstore="filestore"
magna061.ceph.redhat.com dmcrypt="true" dedicated_devices="['/dev/sdb']" devices="['/dev/sdc','/dev/sdd']" osd_scenario="non-collocated" osd_objectstore="filestore"
magna064 devices="['/dev/sdb','/dev/sdc','/dev/sdd']" osd_scenario="collocated"  osd_objectstore="filestore"

[mgrs]
magna006

[rgws]
magna059.ceph.redhat.com radosgw_interface=eno1

[nfss]
magna064

[mdss]
magna006
==================================


Version-Release number of selected component (if applicable):

ansible-2.4.6.0-1.el7ae.noarch

ceph-ansible-3.1.0-0.1.rc18.el7cp.noarch

ceph version 12.2.5-39.el7cp (f12d44e46a54948a86dd27b16c77d97475ba2d4e) luminous (stable)

How reproducible:
Always

Steps to Reproduce:
1. Try to set up a Ceph cluster in containers with 3 MONs (collocated), 3 OSDs, 1 MGR, 1 RGW, 1 NFS, and 1 MDS, using 'filestore'


Actual results:

TASK [ceph-defaults : get current cluster status (if already running)] **************
Friday 17 August 2018  16:15:29 +0000 (0:00:00.250)       0:00:30.021 ********* 
skipping: [magna006]
skipping: [magna061.ceph.redhat.com]
skipping: [magna064]
fatal: [magna059.ceph.redhat.com]: FAILED! => {"changed": true, "cmd": ["docker", "exec", "ceph-mon-magna006", "ceph", "--cluster", "local", "-s", "-f", "json"], "delta": "0:00:00.022128", "end": "2018-08-17 16:15:29.721539", "failed": true, "msg": "non-zero return code", "rc": 1, "start": "2018-08-17 16:15:29.699411", "stderr": "Error response from daemon: No such container: ceph-mon-magna006", "stderr_lines": ["Error response from daemon: No such container: ceph-mon-magna006"], "stdout": "", "stdout_lines": []}
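
The check fails because it runs `docker exec ceph-mon-magna006` on magna059, where no such monitor container exists. As the Doc Text above describes, the fix delegates this status check to a Ceph Monitor node. The following is only a rough, hypothetical sketch of such a delegated task; the variable and group names are assumptions, not the actual ceph-ansible source:

# Hypothetical Ansible task sketch: run the status check on the first
# monitor host, where a ceph-mon container actually exists.
- name: get current cluster status (if already running)
  command: >
    docker exec ceph-mon-{{ hostvars[groups['mons'][0]]['ansible_hostname'] }}
    ceph --cluster {{ cluster | default('ceph') }} -s -f json
  register: ceph_current_status
  changed_when: false
  failed_when: false
  delegate_to: "{{ groups['mons'][0] }}"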


===========
Cluster status:
  cluster:
    id:     297811f5-605d-4093-81d6-4b25bb72cc99
    health: HEALTH_WARN
            Degraded data redundancy: 242/726 objects degraded (33.333%), 89 pgs degraded, 384 pgs undersized
 
  services:
    mon:     3 daemons, quorum magna006,magna059,magna061
    mgr:     magna006(active)
    mds:     cephfs-1/1/1 up  {0=magna006=up:active}
    osd:     5 osds: 5 up, 5 in
    rgw-nfs: 1 daemon active
 
  data:
    pools:   6 pools, 384 pgs
    objects: 242 objects, 3721 bytes
    usage:   901 MB used, 4650 GB / 4651 GB avail
    pgs:     242/726 objects degraded (33.333%)
             295 active+undersized
==========================

Expected results:

The cluster should be active and clean with RGW installed

Additional info:

Comment 4 Sébastien Han 2018-08-20 08:45:25 UTC
I need either the complete log of the Ansible run or access to the environment. Without one of the two, I cannot help you with this.
Thanks.

Comment 5 Persona non grata 2018-08-20 08:50:16 UTC
Created attachment 1477100 [details]
ansible log

Comment 6 Sébastien Han 2018-08-20 10:18:37 UTC
How many runs do you have in this file?
The last action is a mon failing to restart.

If RGW wasn't deployed, that means it was not declared in the inventory file; I don't see any ceph-rgw statements, so nothing ran.

This is hard to read.
Please clarify what you did.

Comment 7 Persona non grata 2018-08-20 10:29:36 UTC
Created attachment 1477124 [details]
ansible log with -vvv

Comment 8 Persona non grata 2018-08-20 10:33:15 UTC
(In reply to leseb from comment #6)
> How many runs do you have in this file?
> The last action is a mon failing to restart.
> 
> If RGW wasn't deployed, that means it was not declared in the inventory
> file; I don't see any ceph-rgw statements, so nothing ran.
> 
> This is hard to read.
> Please clarify what you did.

That log file had both purge and site-docker logs. I've attached an ansible log (only one run) with verbose mode enabled.

Comment 9 Sébastien Han 2018-08-20 10:56:26 UTC
Thanks, I see what's going on now.

Comment 17 errata-xmlrpc 2018-09-26 18:23:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2819