Bug 1551637
Summary: | [RGW Container] options added to the ceph.conf are not being accepted to the running config of the RGW process upon restart | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | jquinn <jquinn> |
Component: | Container | Assignee: | Sébastien Han <shan> |
Status: | CLOSED ERRATA | QA Contact: | Vasishta <vashastr> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 3.0 | CC: | dang, gabrioux, hchen, jim.curtis, jtudelag, kdreyer, mhackett, pprakash, rperiyas, shan, tpetr, tserlin, vashastr, vumrao |
Target Milestone: | z4 | ||
Target Release: | 3.0 | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | RHEL: ceph-ansible-3.0.36-1.el7cp Ubuntu: ceph-ansible_3.0.36-2redhat1 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2018-07-11 18:34:04 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1553254, 1572368 |
Description
jquinn
2018-03-05 15:32:57 UTC
In further testing, the configuration below works for pushing config to the process. I tried using the FQDN section configured in ceph_conf_overrides during the install, but it would fail. After the install, I am able to simply update ceph.conf to reflect the FQDN, and upon restarts of the service it will pull in the config from ceph.conf.

** Current ceph.conf, with the grace period value updated in the new section created using the FQDN **

```
[root@vm250-102 /]# ll /var/log/ceph
total 0
[root@vm250-102 /]# cat /etc/ceph/ceph.conf
#[client.rgw.vm250-102]
#debug_rgw = 20
#host = vm250-102
#keyring = /var/lib/ceph/radosgw/ceph-rgw.vm250-102/keyring
#log file = /var/log/ceph/ceph-rgw-vm250-102.log
#rgw frontends = civetweb port=10.74.250.102:8080 num_threads=100
#osd_heartbeat_grace = 40

[global]
cluster network = 10.74.250.0/21
fsid = c8a35a99-331d-4ac6-b010-ee292b3f6816
mon host = 10.74.250.108
public network = 10.74.250.0/21
rgws_use_fqdn = true

[client.rgw.vm250-102.gsslab.pnq2.redhat.com]
debug_rgw = 20
osd_heartbeat_grace = 60
host = vm250-102
keyring = /var/lib/ceph/radosgw/ceph-rgw.vm250-102/keyring
log file = /var/log/ceph/ceph-rgw-vm250-102.log
rgw frontends = civetweb port=10.74.250.102:8080 num_threads=100
```

** Restarted the docker container, and now the value is updated **

```
[root@vm250-102 /]# ceph daemon /var/run/ceph/ceph-client.rgw.vm250-102.gsslab.pnq2.redhat.com.asok config show | grep -i heartbeat_grace
    "mon_osd_adjust_heartbeat_grace": "true",
    "osd_heartbeat_grace": "60",
[root@vm250-102 /]#
```

Thanks,
Joe

When trying to add a multi-site configuration to the RGW running config, rgw_zone does not get picked up as part of the running config for the RGW process inside the container. It appears that the rgw_zone parameter cannot be passed in via ceph.conf, and has to be passed in as part of the container startup. See the steps below to work around this issue.
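For reference, the FQDN section above would normally be expressed through ceph-ansible's ceph_conf_overrides variable, roughly as sketched below (values taken from the ceph.conf shown in this report; the reporter notes this form failed during install and only worked by editing ceph.conf after deployment):

```yaml
# all.yml (sketch built from the values above; not verified against
# the reporter's playbook)
ceph_conf_overrides:
  "client.rgw.vm250-102.gsslab.pnq2.redhat.com":
    debug_rgw: 20
    osd_heartbeat_grace: 60
    rgw frontends: "civetweb port=10.74.250.102:8080 num_threads=100"
```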
**** Configuration of the multi-site steps all completes successfully. I have added "rgw_zone = joezone" to the ceph.conf per the docs ****

```
[root@vm250-102 /]# ceph daemon /var/run/ceph/ceph-client.rgw.vm250-102.gsslab.pnq2.redhat.com.asok config show | grep -i rgw_zone
    "rgw_zone": "",
    "rgw_zone_root_pool": ".rgw.root",
    "rgw_zonegroup": "",
    "rgw_zonegroup_root_pool": ".rgw.root",
[root@vm250-102 /]#
```

** After a restart, rgw_zone is still not populated **

```
[root@vm250-102 /]# ceph daemon /var/run/ceph/ceph-client.rgw.vm250-102.gsslab.pnq2.redhat.com.asok config show | grep -i rgw_zone
    "rgw_zone": "",
    "rgw_zone_root_pool": ".rgw.root",
    "rgw_zonegroup": "",
    "rgw_zonegroup_root_pool": ".rgw.root",
[root@vm250-102 /]#
```

*** Upon docker startup the option for rgw_zone is not included — maybe part of a private class within the code? ***

```
[root@vm250-102 ~]# docker inspect 2ce0c97944a2 | grep -i rgw
    "Name": "/ceph-rgw-vm250-102",
    "RGW_CIVETWEB_IP=10.74.250.102",
    "CEPH_DAEMON=RGW",
    "RGW_CIVETWEB_PORT=8080",
[root@vm250-102 ~]#
```

** I added the below to rgws.yml, ran the playbook again, and updated the ceph.conf to reflect the FQDN for the client (per the original issue in this BZ) and the rgw_zone **

```
ceph_rgw_docker_extra_env: "-e RGW_ZONE=joezone"
```

** The rgw_zone is now passed in as part of the initial container configuration **

```
[root@vm250-102 ~]# docker inspect 7a9e4a9272fb | grep -i rgw
    "Name": "/ceph-rgw-vm250-102",
    "RGW_CIVETWEB_IP=10.74.250.102",
    "CEPH_DAEMON=RGW",
    "RGW_CIVETWEB_PORT=8080",
    "RGW_ZONE=joezone",
[root@vm250-102 ~]#
```

** And now the running process has the rgw_zone configuration **

```
[root@vm250-102 /]# ceph daemon /var/run/ceph/ceph-client.rgw.vm250-102.gsslab.pnq2.redhat.com.asok config show | grep rgw_zone
    "rgw_zone": "joezone",
    "rgw_zone_root_pool": ".rgw.root",
    "rgw_zonegroup": "",
    "rgw_zonegroup_root_pool": ".rgw.root",
[root@vm250-102 /]#
```

Thanks,
Joe

First, we should try to understand why the ceph.conf is ignored on containerized deployments.
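One plausible reason the RGW_ZONE environment variable works where ceph.conf did not: the container entrypoint turns the variable into a command-line option when launching the daemon. The sketch below illustrates that idea; the exact logic in ceph-container's start_rgw.sh is an assumption and may differ.

```shell
# Sketch (assumption: simplified paraphrase of the container entrypoint,
# not the actual ceph-container code).
RGW_ZONE=joezone   # injected via ceph_rgw_docker_extra_env: "-e RGW_ZONE=joezone"
rgw_args=""
if [ -n "$RGW_ZONE" ]; then
  # A command-line option takes precedence over ceph.conf, which would
  # explain why the env-var route works while editing ceph.conf did not.
  rgw_args="--rgw-zone=$RGW_ZONE"
fi
echo "radosgw $rgw_args"
```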
I don't think it's ignored; it's just being overwritten by the container CLI. It's just a matter of variable precedence. Let me work on a patch now.

Hi Seb,

I am fighting with setting performance tuning for RGW in containers, specifically this in all.yml:

```
radosgw_civetweb_num_threads: 512  # default 100
```

In a non-container environment it is part of the rgw_frontends variable:

```
"rgw_frontends": "civetweb port=10.10.92.91:8080 num_threads=512",
```

rgw_frontends is overridden by the container start of the RGW, as there is no num_threads option:

```
# find / -name "start_rgw.sh"
/var/lib/docker/overlay2/439572f5c98fe1a2b438171a4cc834e13745676116f914f1ae888375f673570b/diff/start_rgw.sh
/var/lib/docker/overlay2/263d778874297c8fee362cdd06cad80924c2b8bb312e06ab3251d1df2555776b/merged/start_rgw.sh

[root@rgws-1 ~]# grep num_threads /var/lib/docker/overlay2/439572f5c98fe1a2b438171a4cc834e13745676116f914f1ae888375f673570b/diff/start_rgw.sh /var/lib/docker/overlay2/263d778874297c8fee362cdd06cad80924c2b8bb312e06ab3251d1df2555776b/merged/start_rgw.sh
/var/lib/docker/overlay2/439572f5c98fe1a2b438171a4cc834e13745676116f914f1ae888375f673570b/diff/start_rgw.sh:    local rgw_frontends="civetweb port=$RGW_CIVETWEB_IP:$RGW_CIVETWEB_PORT"
/var/lib/docker/overlay2/263d778874297c8fee362cdd06cad80924c2b8bb312e06ab3251d1df2555776b/merged/start_rgw.sh:    local rgw_frontends="civetweb port=$RGW_CIVETWEB_IP:$RGW_CIVETWEB_PORT"
```

```
[root@rgws-1 ~]# docker exec ceph-rgw-rgws-1 ceph --admin-daemon /var/run/ceph/ceph-client.rgw.rgws-1.asok config diff
{
    "diff": {
        "current": {
            "admin_socket": "/var/run/ceph/ceph-client.rgw.rgws-1.asok",
            "cluster_network": "192.168.1.0/28",
            "err_to_stderr": "true",
            "fsid": "d5a1e891-afca-4e6f-b2dc-e7c08db07dbf",
            "internal_safe_to_start_threads": "true",
            "keyring": "/var/lib/ceph/radosgw/ceph-rgw.rgws-1/keyring",
            "log_max_recent": "10000",
            "mds_data": "/var/lib/ceph/mds/ceph-rgw.rgws-1",
            "mgr_data": "/var/lib/ceph/mgr/ceph-rgw.rgws-1",
            "mon_cluster_log_file": "default=/var/log/ceph/ceph.$channel.log cluster=/var/log/ceph/ceph.log",
            "mon_data": "/var/lib/ceph/mon/ceph-rgw.rgws-1",
            "mon_debug_dump_location": "/var/log/ceph/ceph-client.rgw.rgws-1.tdump",
            "mon_host": "10.10.93.0,10.10.92.94,10.10.93.107",
            "osd_data": "/var/lib/ceph/osd/ceph-rgw.rgws-1",
            "osd_journal": "/var/lib/ceph/osd/ceph-rgw.rgws-1/journal",
            "public_network": "10.10.92.0/22",
            "rgw_data": "/var/lib/ceph/radosgw/ceph-rgw.rgws-1",
            "rgw_frontends": "civetweb port=10.10.92.91:8080",    <-----
            "rgw_thread_pool_size": "512",
            "setgroup": "ceph",
            "setuser": "ceph"
```

```
cat /etc/ceph/ceph.conf
[client.rgw.rgws-1]
host = rgws-1
keyring = /var/lib/ceph/radosgw/ceph-rgw.rgws-1/keyring
log file = /var/log/ceph/ceph-rgw-rgws-1.log
rgw frontends = civetweb port=10.10.92.91:8080 num_threads=512
rgw_thread_pool_size = 512
```

Editing these files is the only way to set this parameter in a container environment:

```
local rgw_frontends="civetweb port=$RGW_CIVETWEB_IP:$RGW_CIVETWEB_PORT num_threads=512"
```

```
[root@rgws-1 ~]# docker exec ceph-rgw-rgws-1 ceph --admin-daemon /var/run/ceph/ceph-client.rgw.rgws-1.asok config diff
{
    "diff": {
        "current": {
            "admin_socket": "/var/run/ceph/ceph-client.rgw.rgws-1.asok",
            "cluster_network": "192.168.1.0/28",
            "err_to_stderr": "true",
            "fsid": "d5a1e891-afca-4e6f-b2dc-e7c08db07dbf",
            "internal_safe_to_start_threads": "true",
            "keyring": "/var/lib/ceph/radosgw/ceph-rgw.rgws-1/keyring",
            "log_max_recent": "10000",
            "mds_data": "/var/lib/ceph/mds/ceph-rgw.rgws-1",
            "mgr_data": "/var/lib/ceph/mgr/ceph-rgw.rgws-1",
            "mon_cluster_log_file": "default=/var/log/ceph/ceph.$channel.log cluster=/var/log/ceph/ceph.log",
            "mon_data": "/var/lib/ceph/mon/ceph-rgw.rgws-1",
            "mon_debug_dump_location": "/var/log/ceph/ceph-client.rgw.rgws-1.tdump",
            "mon_host": "10.10.93.0,10.10.92.94,10.10.93.107",
            "osd_data": "/var/lib/ceph/osd/ceph-rgw.rgws-1",
            "osd_journal": "/var/lib/ceph/osd/ceph-rgw.rgws-1/journal",
            "public_network": "10.10.92.0/22",
            "rgw_data": "/var/lib/ceph/radosgw/ceph-rgw.rgws-1",
            "rgw_frontends": "civetweb port=10.10.92.91:8080 num_threads=512",
            "rgw_thread_pool_size": "512",
            "setgroup": "ceph",
            "setuser": "ceph"
```

Tomas, can you please open another BZ for this? I'll fix this.

(In reply to leseb from comment #11)
> Tomas, can you please open another BZ for this?
> I'll fix this.

Hi Seb, new BZ#1582411, thanks.

Thanks Tomas!

Hi, I tried different RGW configs and some random values; working fine for me. A conf file snippet:

```
debug rgw = 10
rgw_enable_usage_log = true
rgw_bucket_default_quota_max_size = 1024000
rgw bucket default quota max objects = 10000
rgw user max buckets = 333
```

Please let me know if there are any concerns or suggestions.

Regards,
Vasishta Shastry
AQE, Ceph

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2178
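The rgw_frontends precedence problem discussed in the comments above reduces to the following sketch. The rgw_frontends assignment is verbatim from the grep output quoted earlier; the radosgw invocation is a simplified assumption about how the entrypoint launches the daemon.

```shell
# Values as seen on the rgws-1 host above.
RGW_CIVETWEB_IP=10.10.92.91
RGW_CIVETWEB_PORT=8080

# Verbatim logic from start_rgw.sh: num_threads is never appended.
rgw_frontends="civetweb port=$RGW_CIVETWEB_IP:$RGW_CIVETWEB_PORT"

# Passed on the command line (simplified), this value shadows the
# "rgw frontends = ... num_threads=512" line in ceph.conf, because
# command-line options take precedence over the config file.
echo "radosgw --rgw-frontends=$rgw_frontends"
```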