Bug 1888630 - [ceph-ansible]: Multi realm create workflow fails on restart RGW step
Summary: [ceph-ansible]: Multi realm create workflow fails on restart RGW step
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 4.2
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.2z1
Assignee: Guillaume Abrioux
QA Contact: Uday kurundwade
URL:
Whiteboard:
Duplicates: 1917144
Depends On:
Blocks:
 
Reported: 2020-10-15 12:00 UTC by Tejas
Modified: 2021-04-28 20:13 UTC (History)
CC List: 9 users

Fixed In Version: ceph-ansible-4.0.43-1.el8cp, ceph-ansible-4.0.43-1.el7cp
Doc Type: Bug Fix
Doc Text:
Cause: When the RGW is collocated with a mon, mgr, or osd, switching from a single-site to a multisite RGW setup failed because handlers were triggered between the Ansible play of the collocated daemon and the play of the RGW.
Consequence: The multi realm creation workflow fails on the RGW restart task because the multisite changes have not yet been applied, which makes the handlers fail.
Fix: Run the multisite configuration from the ceph-handler role before the restart happens, so the restart no longer fails because of a non-existent multisite configuration. (This also applies when simply changing an existing multisite configuration.)
Result: The multi realm creation workflow no longer fails on the RGW restart task.
Clone Of:
Environment:
Last Closed: 2021-04-28 20:12:32 UTC
Embargoed:




Links
Red Hat Product Errata RHSA-2021:1452 (last updated 2021-04-28 20:13:04 UTC)

Description Tejas 2020-10-15 12:00:24 UTC
Description of problem:
  RGW fails to restart in a scenario where we are trying to set up multiple replicated realms.

Version-Release number of selected component (if applicable):
ceph version 14.2.11-49.el7cp
ansible-2.9.14-1.el7ae.noarch
ceph-ansible-4.0.34-1.el7cp.noarch
How reproducible:


RUNNING HANDLER [ceph-handler : restart ceph rgw daemon(s)] **************************************************************************************************
Thursday 15 October 2020  02:58:27 -0400 (0:00:00.997)       0:05:04.726 ****** 
skipping: [ceph-tejas-1602577535750-node2-monrgwosd] => (item=ceph-tejas-1602577535750-node4-rgw) 
failed: [ceph-tejas-1602577535750-node2-monrgwosd -> ceph-tejas-1602577535750-node3-monrgwosd] (item=ceph-tejas-1602577535750-node3-monrgwosd) => changed=true 
  ansible_loop_var: item
  cmd:
  - /usr/bin/env
  - bash
  - /tmp/restart_rgw_daemon.sh
  delta: '0:01:40.472566'
  end: '2020-10-15 03:00:15.202965'
  item: ceph-tejas-1602577535750-node3-monrgwosd
  msg: non-zero return code
  rc: 1
  start: '2020-10-15 02:58:34.730399'
  stderr: ''
  stderr_lines: <omitted>
  stdout: |-
    Socket file  could not be found, which means Rados Gateway is not running. Showing ceph-rgw unit logs now:

    Oct 15 02:58:34 ceph-tejas-1602577535750-node3-monrgwosd systemd[1]: Stopping Ceph rados gateway...
    Oct 15 02:58:34 ceph-tejas-1602577535750-node3-monrgwosd radosgw[14486]: 2020-10-15 02:58:34.746 7feb29b07700 -1 received  signal: Terminated from /usr/lib/systemd/systemd --system --deserialize 17  (PID: 1) UID: 0
    Oct 15 02:58:34 ceph-tejas-1602577535750-node3-monrgwosd radosgw[14486]: 2020-10-15 02:58:34.747 7feb3efd6900 -1 shutting down
    Oct 15 02:58:35 ceph-tejas-1602577535750-node3-monrgwosd systemd[1]: Stopped Ceph rados gateway.
    Oct 15 02:58:35 ceph-tejas-1602577535750-node3-monrgwosd systemd[1]: Started Ceph rados gateway.
    Oct 15 02:58:35 ceph-tejas-1602577535750-node3-monrgwosd radosgw[34532]: 2020-10-15 02:58:35.364 7f08ebc70900 -1 Couldn't init storage provider (RADOS)
    Oct 15 02:58:35 ceph-tejas-1602577535750-node3-monrgwosd systemd[1]: ceph-radosgw.rgw0.service: main process exited, code=exited, status=5/NOTINSTALLED
    Oct 15 02:58:35 ceph-tejas-1602577535750-node3-monrgwosd systemd[1]: Unit ceph-radosgw.rgw0.service entered failed state.
    Oct 15 02:58:35 ceph-tejas-1602577535750-node3-monrgwosd systemd[1]: ceph-radosgw.rgw0.service failed.
    Oct 15 02:58:35 ceph-tejas-1602577535750-node3-monrgwosd systemd[1]: ceph-radosgw.rgw0.service holdoff time over, scheduling restart.


Checking the ceph.conf:
[client.rgw.ceph-tejas-1602577535750-node2-monrgwosd.rgw0]
host = ceph-tejas-1602577535750-node2-monrgwosd
keyring = /var/lib/ceph/radosgw/ceph-rgw.ceph-tejas-1602577535750-node2-monrgwosd.rgw0/keyring
log file = /var/log/ceph/ceph-rgw-ceph-tejas-1602577535750-node2-monrgwosd.rgw0.log
rgw frontends = beast endpoint=10.0.102.58:8080
rgw thread pool size = 512
rgw_realm = france
rgw_zone = paris
rgw_zonegroup = idf


But no realm has been created yet:
[root@ceph-tejas-1602577535750-node2-monrgwosd cephuser]# radosgw-admin realm list
{
    "default_info": "",
    "realms": []
}
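
For reference, the realm, zonegroup, and zone named in ceph.conf must exist before radosgw can start with those settings. Creating them by hand would look roughly like the following sketch (names taken from the config above; standard radosgw-admin commands, shown only to illustrate the multisite step the playbook had not yet applied when the handler restarted the daemon):

# radosgw-admin realm create --rgw-realm=france --default
# radosgw-admin zonegroup create --rgw-zonegroup=idf --rgw-realm=france --master --default
# radosgw-admin zone create --rgw-zonegroup=idf --rgw-zone=paris --master --default
# radosgw-admin period update --commit

Until that configuration exists, radosgw exits with "Couldn't init storage provider (RADOS)", which is exactly what the unit log above shows.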


After commenting out the following lines, the RGW starts successfully:
rgw_realm = france
rgw_zone = paris
rgw_zonegroup = idf

]# systemctl status ceph-radosgw.rgw0
● ceph-radosgw.rgw0.service - Ceph rados gateway
   Loaded: loaded (/usr/lib/systemd/system/ceph-radosgw@.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2020-10-15 04:01:05 EDT; 4s ago
 Main PID: 33128 (radosgw)
   CGroup: /system.slice/system-ceph\x2dradosgw.slice/ceph-radosgw.rgw0.service
           └─33128 /usr/bin/radosgw -f --cluster ceph --name client.rgw.ceph-tejas-1602577535750-node2-monrgwosd.rgw0 --setuser ceph --setgroup ceph
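
The Doc Text above attributes the failure to handler ordering when the RGW is collocated with a mon, mgr, or osd: the restart handler notified during the collocated daemon's play fires before the RGW play has created the realm, zonegroup, and zone. A minimal sketch of that ordering (hypothetical host group names, not the actual ceph-ansible playbook) is:

- hosts: monrgwosd_nodes            # hypothetical group of collocated mon/rgw/osd hosts
  roles:
    - ceph-config                   # writes ceph.conf and ends up notifying "restart ceph rgw daemon(s)"
    - ceph-mon
  # Handlers flush at the end of this play: the RGW gets restarted with
  # rgw_realm/rgw_zone/rgw_zonegroup already present in ceph.conf, but the
  # realm itself does not exist yet, so radosgw cannot start.
- hosts: rgw_nodes                  # hypothetical group name
  roles:
    - ceph-rgw                      # only in this later play would the realm/zonegroup/zone be created

The fix shipped in ceph-ansible-4.0.43 runs the multisite configuration from the ceph-handler role before the restart, so the restarted daemon finds a valid realm.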

Comment 5 Guillaume Abrioux 2021-01-18 19:06:19 UTC
*** Bug 1917144 has been marked as a duplicate of this bug. ***

Comment 12 errata-xmlrpc 2021-04-28 20:12:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage security, bug fix, and enhancement Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:1452

