Bug 1525209
Summary: | During upgrade, ceph-ansible does not disable the radosgw system service | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Keith Schincke <kschinck>
Component: | Ceph-Ansible | Assignee: | Guillaume Abrioux <gabrioux>
Status: | CLOSED ERRATA | QA Contact: | Yogev Rabl <yrabl>
Severity: | high | Docs Contact: | Bara Ancincova <bancinco>
Priority: | unspecified | |
Version: | 3.0 | CC: | adeza, agunn, aschoen, ceph-eng-bugs, ceph-qe-bugs, gabrioux, gfidente, gkadam, gmeno, hnallurv, kdreyer, kschinck, nthomas, sankarshan, yrabl
Target Milestone: | rc | Keywords: | Triaged
Target Release: | 2.5 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | RHEL: ceph-ansible-3.0.18-1.el7cp; Ubuntu: ceph-ansible_3.0.18-2redhat1 | Doc Type: | Bug Fix
Doc Text: | .`ceph-ansible` now disables the Ceph Object Gateway service as expected when upgrading the OpenStack container<br>When upgrading the OpenStack container from version 11 to 12, the `ceph-ansible` utility did not properly disable the Ceph Object Gateway service provided by the overcloud image. Consequently, the containerized Ceph Object Gateway service entered a failed state because the port it used was bound. The `ceph-ansible` utility has been updated to properly disable the system Ceph Object Gateway service. As a result, the containerized Ceph Object Gateway service starts as expected after upgrading the OpenStack container from version 11 to 12. | |
Story Points: | --- | |
Clone Of: | | |
: | 1528430 1539738 (view as bug list) | Environment: |
Last Closed: | 2018-02-21 19:46:24 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 1528430 | |
Bug Blocks: | 1536401, 1539738 | |
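
The Doc Text above describes the fix at a high level: before the containerized Ceph Object Gateway is started, the radosgw service installed on the host by the overcloud image has to be stopped and disabled so that its port is freed. The commands below are only a hand-run sketch of that idea, not the actual tasks shipped in the fixed ceph-ansible release; the unit name is an assumption and varies between deployments.

```sh
# Hand-run sketch only -- not the tasks that ceph-ansible 3.0.18 actually runs.
# Stop and disable the packaged radosgw unit so the containerized ceph-rgw
# service can bind the same port (8080 in this report).
# The instance name "rgw.$(hostname -s)" is an assumption; list the real unit
# with: systemctl list-units 'ceph-radosgw*'
systemctl stop "ceph-radosgw@rgw.$(hostname -s).service"
systemctl disable "ceph-radosgw@rgw.$(hostname -s).service"
systemctl stop ceph-radosgw.target 2>/dev/null || true   # stop the target too, if present
```

In ceph-ansible 3.0.18 and later (the Fixed In Version above), the switch-from-non-containerized-to-containerized-ceph-daemons.yml playbook is expected to take care of this step itself.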
Description
Keith Schincke
2017-12-12 19:28:55 UTC
Just to be sure, is it when running rolling_update.yml?

(In reply to leseb from comment #1)
> Just to be sure, is it when running rolling_update.yml?

No, this is with switch-from-non-containerized-to-containerized-ceph-daemons.yml.

The verification failed. The rgw system service is running and causes the rgw container to fail - it uses the same port. Post-upgrade output:

    [root@controller-1 ~]# systemctl -a | grep ceph
    ceph-radosgw.service   loaded active     running        Ceph rados gateway
    ceph-radosgw.service   loaded activating auto-restart   Ceph RGW

    # netstat -tuplan4 | grep 8080
    tcp   0   0 172.17.3.19:8080   0.0.0.0:*   LISTEN   158261/haproxy
    tcp   0   0 10.0.0.110:8080    0.0.0.0:*   LISTEN   158261/haproxy
    tcp   0   0 172.17.3.20:8080   0.0.0.0:*   LISTEN   93050/radosgw

The verification failed.

Tested with v3.0.21 and it passed. @Yogev, I think we can move this BZ to 'VERIFIED'?

Hi Bara, it looks good to me.

I have come across a problem in the upgrade that is not related to this bug; we are checking for a workaround and will verify this.

This is a work in progress, will test it tomorrow at the latest.

With the latest version of ceph-ansible and the latest version of the ceph docker image the verification failed:

    [root@controller-0 ~]# ceph -s
        cluster c9e9f454-0ce9-11e8-a5d6-5254007feace
         health HEALTH_WARN
                too many PGs per OSD (480 > max 300)
         monmap e1: 3 mons at {controller-0=172.17.3.15:6789/0,controller-1=172.17.3.21:6789/0,controller-2=172.17.3.13:6789/0}
                election epoch 38, quorum 0,1,2 controller-2,controller-0,controller-1
         osdmap e48: 3 osds: 3 up, 3 in
                flags sortbitwise,require_jewel_osds,recovery_deletes
          pgmap v3255: 480 pgs, 14 pools, 1588 bytes data, 171 objects
                138 MB used, 104 GB / 104 GB avail
                     480 active+clean

    [root@controller-0 ~]# docker ps | grep ceph
    61b91cc92326  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhceph:ceph-2-rhel-7-docker-candidate-81064-20180205070134  "/entrypoint.sh"  2 minutes ago  Up 2 minutes  ceph-mon-controller-0
    dc6bbfae758b  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhceph:ceph-2-rhel-7-docker-candidate-81064-20180205070134  "/entrypoint.sh"  3 minutes ago  Up 3 minutes  ceph-rgw-controller-0

    [root@controller-0 ~]# systemctl -a | grep ceph
    ceph-mon                              loaded active running  Ceph Monitor
    ceph-radosgw.service                  loaded active running  Ceph RGW
    system-ceph\x2dcreate\x2dkeys.slice   loaded active active   system-ceph\x2dcreate\x2dkeys.slice
    system-ceph\x2dmon.slice              loaded active active   system-ceph\x2dmon.slice
    system-ceph\x2dradosgw.slice          loaded active active   system-ceph\x2dradosgw.slice
    ceph-mds.target                       loaded active active   ceph target allowing to start/stop all ceph-mds@.service instances at once
    ceph-mon.target                       loaded active active   ceph target allowing to start/stop all ceph-mon@.service instances at once
    ceph-osd.target                       loaded active active   ceph target allowing to start/stop all ceph-osd@.service instances at once
    ceph-radosgw.target                   loaded active active   ceph target allowing to start/stop all ceph-radosgw@.service instances at once
    ceph.target                           loaded active active   ceph target allowing to start/stop all ceph*@.service

The verification was successful after all:

    [root@controller-2 ~]# systemctl status ceph-radosgw.service
    ● ceph-radosgw.service - Ceph RGW
       Loaded: loaded (/etc/systemd/system/ceph-radosgw@.service; enabled; vendor preset: disabled)
       Active: active (running) since Fri 2018-02-09 18:28:47 UTC; 2 days ago
     Main PID: 729787 (docker-current)
       CGroup: /system.slice/system-ceph\x2dradosgw.slice/ceph-radosgw.service
               └─729787 /usr/bin/docker-current run --rm --net=host --memory=1g --cpu-quota=100000 -v /var/lib/ceph:/var/lib/ceph -v /etc/ceph:/etc/ceph -e RGW_CIVETWEB_IP=172.17.3.13 -v /etc/...

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0340
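
For anyone re-checking a similar deployment, the state the verification comments above are looking for can be summarized with the same commands used in this report. This is only an illustrative checklist; the port (8080) and the unit and container names are taken from the outputs quoted here and are assumptions for other environments.

```sh
# Illustrative post-upgrade re-check, mirroring the outputs quoted above.
# Port 8080 and the ceph-rgw / ceph-radosgw names come from this report and
# are assumptions for any other deployment.

# The packaged radosgw unit should no longer be running:
systemctl -a | grep ceph-radosgw

# 8080 should be held by haproxy and the containerized radosgw, not the packaged one:
netstat -tuplan4 | grep 8080

# The ceph-rgw container should be up:
docker ps | grep ceph-rgw
```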