Bug 1858884

Summary: [cephadm] rgw daemons are not coming up with the documented commands
Product: [Red Hat Storage] Red Hat Ceph Storage
Component: Cephadm
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Version: 5.0
Target Release: 5.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: ceph-16.0.0-7209.el8cp
Doc Type: No Doc Update
Last Closed: 2021-08-30 08:26:28 UTC
Type: Bug
Reporter: Vikhyat Umrao <vumrao>
Assignee: Juan Miguel Olmo <jolmomar>
QA Contact: Vasishta <vashastr>
Docs Contact: Karen Norteman <knortema>
CC: pnataraj, sewagner, tserlin, twilkins, vereddy

Description Vikhyat Umrao 2020-07-20 16:47:55 UTC
Description of problem:
rgw daemons are not coming up with the documented commands
https://docs.ceph.com/docs/master/cephadm/install/#deploy-rgws

# radosgw-admin realm create --rgw-realm=myorg --default
# radosgw-admin zonegroup create --rgw-zonegroup=default --master --default
# radosgw-admin zone create --rgw-zonegroup=default --rgw-zone=us-east-1 --master --default

The following command does not start the rgw daemons:

# ceph orch apply rgw myorg us-east-1 --placement="2 myhost1 myhost2"

Instead, we need to run manual steps, adding each daemon individually:

ceph orch daemon add rgw test us-east --placement dell-per630-13.gsslab.pnq2.redhat.com
ceph orch daemon add rgw test us-east --placement dell-per630-12.gsslab.pnq2.redhat.com
ceph orch daemon add rgw test us-east --placement dell-per630-11.gsslab.pnq2.redhat.com
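
For reference, whether the orchestrator actually scheduled the service and started the daemons can be checked with the commands below (a sketch; the exact output format depends on the cephadm version). `ceph orch ls rgw` shows the rgw service and its running/expected daemon count, and `ceph orch ps --daemon-type rgw` lists the individual daemons and the hosts they landed on:

# ceph orch ls rgw
# ceph orch ps --daemon-type rgw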

# cephadm version
INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15
ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)


## cluster status


[root@dell-per630-13 ~]# ceph -s
  cluster:
    id:     c365eda6-c766-11ea-8cfb-b083fee95e35
    health: HEALTH_ERR
            Module 'cephadm' has failed: auth get failed: failed to find client.crash.dell-per630-13 in keyring retval: -2
 
  services:
    mon: 3 daemons, quorum dell-per630-13.gsslab.pnq2.redhat.com,dell-per630-12,dell-per630-11 (age 3d)
    mgr: dell-per630-13.gsslab.pnq2.redhat.com.ubgekg(active, since 4d), standbys: dell-per630-12.awkxnp
    osd: 6 osds: 6 up (since 3d), 6 in (since 3d)
    rgw: 3 daemons active (test.us-east.dell-per630-11.dbcagc, test.us-east.dell-per630-12.kovdgi, test.us-east.dell-per630-13.yikuae)
 
  task status:
 
  data:
    pools:   6 pools, 137 pgs
    objects: 205 objects, 5.3 KiB
    usage:   6.1 GiB used, 1.7 TiB / 1.7 TiB avail
    pgs:     137 active+clean

Comment 1 RHEL Program Management 2020-07-20 16:48:02 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 2 Vikhyat Umrao 2020-07-20 17:51:14 UTC
It looks like there is already an upstream report for this: https://tracker.ceph.com/issues/46385.

Comment 3 Juan Miguel Olmo 2020-07-21 08:40:18 UTC
@Vikhyat Umrao: In all the commands you put in the values for your own hosts, except in:

# ceph orch apply rgw myorg us-east-1 --placement="2 myhost1 myhost2"

Why did you use "myhost1 myhost2" instead of:

# ceph orch apply rgw myorg us-east-1 --placement="3 dell-per630-13.gsslab.pnq2.redhat.com dell-per630-12.gsslab.pnq2.redhat.com dell-per630-11.gsslab.pnq2.redhat.com"


Also, why are the server names shown in the "ceph -s" output for the running RGW daemons different from the ones you used to create the RGW daemons?

...
rgw: 3 daemons active (test.us-east.dell-per630-11.dbcagc, test.us-east.dell-per630-12.kovdgi, test.us-east.dell-per630-13.yikuae)
...

but you said that you executed:

ceph orch daemon add rgw test us-east --placement dell-per630-13.gsslab.pnq2.redhat.com
ceph orch daemon add rgw test us-east --placement dell-per630-12.gsslab.pnq2.redhat.com
ceph orch daemon add rgw test us-east --placement dell-per630-11.gsslab.pnq2.redhat.com
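
As a side note, the same placement can also be expressed as a service specification file and applied with `ceph orch apply -i <file>`. The sketch below assumes the Octopus-era RGW spec format, where the service_id is <realm>.<zone>; it is illustrative rather than a verified reproduction step, and the filename is arbitrary:

service_type: rgw
service_id: myorg.us-east-1
placement:
  count: 3
  hosts:
    - dell-per630-13.gsslab.pnq2.redhat.com
    - dell-per630-12.gsslab.pnq2.redhat.com
    - dell-per630-11.gsslab.pnq2.redhat.com

# ceph orch apply -i rgw-spec.yaml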

Comment 4 Juan Miguel Olmo 2020-07-21 09:01:49 UTC
Sorry, my second question is wrong... :-)

Comment 5 Vikhyat Umrao 2020-07-21 09:56:53 UTC
(In reply to Juan Miguel Olmo from comment #3)
> @Vikhyat Umrao: In all the command you have put the values needed to use
> your hosts except in:
> 
> # ceph orch apply rgw myorg us-east-1 --placement="2 myhost1 myhost2"
> 
> Why did you use "myhost1 myhost2" ? instead of:

> # ceph orch apply rgw myorg us-east-1 --placement="3
> dell-per630-13.gsslab.pnq2.redhat.com dell-per630-12.gsslab.pnq2.redhat.com
> dell-per630-11.gsslab.pnq2.redhat.com"

Yes, the first time I ran the same command you mentioned above; I quoted the doc example only to show that the documented method did not help.


> 
> 
> Why when you expose the "ceph -s" command output the names of the servers
> running RGW daemons are different from the one you have used to create the
> RGW daemons?

Yes, I ran the following commands:

ceph orch daemon add rgw test us-east --placement dell-per630-13.gsslab.pnq2.redhat.com
ceph orch daemon add rgw test us-east --placement dell-per630-12.gsslab.pnq2.redhat.com
ceph orch daemon add rgw test us-east --placement dell-per630-11.gsslab.pnq2.redhat.com

but the rgw daemon names in `ceph -s` came up with the short hostnames, not the FQDNs.

> 
> ...
> rgw: 3 daemons active (test.us-east.dell-per630-11.dbcagc,
> test.us-east.dell-per630-12.kovdgi, test.us-east.dell-per630-13.yikuae)
> ...
> 
> but you said that you executed:
> 
> ceph orch daemon add rgw test us-east --placement
> dell-per630-13.gsslab.pnq2.redhat.com
> ceph orch daemon add rgw test us-east --placement
> dell-per630-12.gsslab.pnq2.redhat.com
> ceph orch daemon add rgw test us-east --placement
> dell-per630-11.gsslab.pnq2.redhat.com

Comment 6 Juan Miguel Olmo 2020-07-21 10:36:42 UTC
Can you attach the active manager log file from when you issued the command:

ceph orch apply rgw myorg us-east-1 --placement="3 dell-per630-13.gsslab.pnq2.redhat.com dell-per630-12.gsslab.pnq2.redhat.com dell-per630-11.gsslab.pnq2.redhat.com"

Comment 7 Vikhyat Umrao 2020-07-21 15:22:58 UTC
(In reply to Juan Miguel Olmo from comment #6)
> Can you add the active manager log file?, when you issued the command:
> 
> ceph orch apply rgw myorg us-east-1 --placement="3
> dell-per630-13.gsslab.pnq2.redhat.com dell-per630-12.gsslab.pnq2.redhat.com
> dell-per630-11.gsslab.pnq2.redhat.com"

It looks like cephadm is not configuring the mgr and mon log files. I have created the following bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1859267

Comment 8 Vikhyat Umrao 2020-07-21 17:18:38 UTC
Juan - as I mentioned in the email, this time with the downstream version I was able to install the rgws with the following command:

[root@dell-per630-13 ~]# ceph orch apply rgw test us-east --placement="3 dell-per630-13 dell-per630-12 dell-per630-11"

gdoc - https://docs.google.com/document/d/1Q8gPi0-Z_VJKe7uWH-vfuETi5BLj5ffetRw2llCJVN8/edit#

I would say I am not able to reproduce this with the downstream version. Additionally, this time I made the FQDN and the short hostname match, so cephadm does not get confused between the two.
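
One way to check this is to compare against how the hosts were registered with the orchestrator; `ceph orch host ls` prints the host names cephadm knows about (a sketch; the exact columns vary by version):

# ceph orch host ls

If the hosts were added with short names, the placement passed to `ceph orch apply` should use the same short names, and likewise for FQDNs.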

Comment 9 Vikhyat Umrao 2020-07-23 09:02:11 UTC
Tim - as you mentioned on IRC, for you command [1] did not work to create the rgws even with the downstream bits, and you had to go with command [2].

[1]ceph orch apply rgw <realm> <zone> --placement="3 host1 host2 host3 ... hostN"
[2]ceph orch daemon add rgw <realm> <zone> --placement "host1"

Tim - can you please provide the active mgr logs? This is the command to get them:

cephadm logs --fsid <fsid from `ceph -s` command> --name <mgr name from `ceph -s` command>
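
Using the fsid and active mgr name from the `ceph -s` output earlier in this bug, the invocation would look roughly like this (substitute the values from your own cluster):

# cephadm logs --fsid c365eda6-c766-11ea-8cfb-b083fee95e35 --name mgr.dell-per630-13.gsslab.pnq2.redhat.com.ubgekg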

Comment 19 Preethi 2020-11-19 12:03:07 UTC
@Juan, moving this to the verified state, as the issue is not seen with the latest downstream image.

Comment 22 errata-xmlrpc 2021-08-30 08:26:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 5.0 bug fix and enhancement), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3294