Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
This project is now read-only. Starting Monday, February 2, please use https://ibm-ceph.atlassian.net/ for all bug tracking.

Bug 1914936

Summary: [RGW][cephadm] RGW daemon fails to deploy
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Madhavi Kasturi <mkasturi>
Component: Cephadm
Assignee: Venky Shankar <vshankar>
Status: CLOSED ERRATA
QA Contact: Vasishta <vashastr>
Severity: high
Docs Contact: Karen Norteman <knortema>
Priority: unspecified
Version: 5.0
CC: dfuller, kdreyer, mgowri, pdonnell, pnataraj, sangadi, sewagner, sunnagar, vereddy
Keywords: Automation, Regression, TestBlocker
Target Milestone: ---
Target Release: 5.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: ceph-16.1.0-486.el8cp
Doc Type: No Doc Update
Last Closed: 2021-08-30 08:27:52 UTC
Type: Bug

Comment 9 Preethi 2021-01-18 05:25:36 UTC
@Juan, the issue is seen with the latest compose as well.

cephadm-16.0.0-8633

Image details: registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.0-rhel-8-containers-candidate-97619-20210115184956

Error snippet:

2021-01-17 18:39:58,677 - ceph.ceph_admin - INFO - 0/2 rgw daemon(s) up....
2021-01-17 18:39:58,678 - ceph.ceph_admin - INFO - re-checking... 0.0 times left
2021-01-17 18:40:03,679 - __main__ - ERROR - Traceback (most recent call last):
  File "run.py", line 589, in run
    ceph_cluster_dict=ceph_cluster_dict, clients=clients)
  File "/home/pnataraj/cephci/tests/ceph_installer/test_cephadm.py", line 74, in run
    cephadm.add_daemons()
  File "/home/pnataraj/cephci/ceph/ceph_admin.py", line 214, in add_daemons
    self.ceph_rgws(self.cluster.get_nodes(role="rgw"))
  File "/home/pnataraj/cephci/ceph/ceph_admin.py", line 378, in ceph_rgws
    timeout=len(rgws) * self.TIMEOUT,
AssertionError

2021-01-17 18:40:03,680 - __main__ - INFO - ceph_clusters_file rerun/ceph-snapshot-1610907666787
2021-01-17 18:40:03,680 - __main__ - INFO - Test <module 'test_cephadm' from '/home/pnataraj/cephci/tests/ceph_installer/test_cephadm.py'> failed
2021-01-17 18:40:03,681 - __main__ - INFO - Aborting on test failure
2021-01-17 18:40:03,681 - __main__ - INFO - 
All test logs located here: http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-1610907456925

CephCI Logs: http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-1610907456925/cephadm_deployment_0.log
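For context, the failure in the log above comes from a poll-until-timeout check: the harness repeatedly counts RGW daemons that have come up and raises `AssertionError` once the retries are exhausted. A minimal sketch of that pattern (this is not the actual cephci code; `daemons_up`, the counts, and the intervals are hypothetical):

```python
import time


def wait_for_daemons(daemons_up, expected, timeout=60, interval=5):
    """Poll daemons_up() until `expected` daemons are running or the timeout expires.

    daemons_up: callable returning the number of daemons currently up.
    Raises AssertionError on timeout, mirroring the failure mode in the log above.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        up = daemons_up()
        print(f"{up}/{expected} rgw daemon(s) up....")
        if up >= expected:
            return up
        time.sleep(interval)
    # Same failure mode as the cephci run: the count never reached `expected`.
    raise AssertionError(
        f"only {daemons_up()}/{expected} rgw daemon(s) up after {timeout}s"
    )
```

In the failing run the count stayed at 0/2 for the whole timeout window, so the loop fell through to the bare `AssertionError` seen in the traceback.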

Comment 10 Preethi 2021-01-19 10:01:47 UTC
The issue is seen with the latest compose, 16.0.9150, as well.
Log for reference: http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-1611047955800/cephadm_deployment_0.log

Comment 11 Douglas Fuller 2021-01-21 19:13:09 UTC
Venky, can you have a look at this while Patrick is out?

Comment 12 Ken Dreyer (Red Hat) 2021-01-21 21:32:58 UTC
Based on what JuanMi and I discussed earlier this week, as soon as we cherry-pick https://github.com/ceph/ceph/pull/38910 to pacific, we'll rebase onto that downstream and QE can re-qualify that build for the next RHCS 5.0 alpha (CLOUDBLD-3172).

Comment 13 Ken Dreyer (Red Hat) 2021-01-22 17:14:28 UTC
JuanMi has this commit in https://github.com/ceph/ceph/pull/39021

Comment 14 Venky Shankar 2021-01-25 12:39:15 UTC
(In reply to Douglas Fuller from comment #11)
> Venky, can you have a look at this while Patrick is out?

Looks like others already figured out the fix.

Comment 15 Venky Shankar 2021-01-25 14:42:41 UTC
(In reply to Venky Shankar from comment #14)
> (In reply to Douglas Fuller from comment #11)
> > Venky, can you have a look at this while Patrick is out?
> 
> Looks like others already figured out the fix.

Spoke to Doug. So this needs to be cherry-picked downstream. I'll get to this later this week (probably Wednesday, 27th Jan).

Comment 16 Ken Dreyer (Red Hat) 2021-01-25 19:42:25 UTC
Before v16.2.0, I rebase to the tip of pacific every Monday. Ideally we stabilize pacific upstream (this issue included), and then we can take this fix as we rebase.

Comment 19 Preethi 2021-02-02 05:52:07 UTC
@Juan, we are unable to verify this issue with the latest compose due to https://bugzilla.redhat.com/show_bug.cgi?id=1923719. Hence, we are still blocked.

Comment 20 Ken Dreyer (Red Hat) 2021-03-03 00:44:57 UTC
bug 1923719 is resolved now, so I'm setting Fixed In Version to the current RH Ceph Storage 5 build.

Comment 23 Preethi 2021-03-08 13:44:18 UTC
The issue is not seen in the latest alpha drop.

[root@magna011 ubuntu]# sudo cephadm shell
Inferring fsid d8a1d97c-7cbb-11eb-82af-002590fc26f6
Inferring config /var/lib/ceph/d8a1d97c-7cbb-11eb-82af-002590fc26f6/mon.magna011/config
Using recent ceph image registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:b2ca10515af7e243732ac10b43f68a0d218d9a34421ec3b807bdc33d58c5c00f
WARNING: The same type, major and minor should not be used for multiple devices.
WARNING: The same type, major and minor should not be used for multiple devices.
WARNING: The same type, major and minor should not be used for multiple devices.
[ceph: root@magna011 /]# ceph -s
  cluster:
    id:     d8a1d97c-7cbb-11eb-82af-002590fc26f6
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum magna011,magna014,magna013 (age 4d)
    mgr: magna011.vpdjxa(active, since 4d), standbys: magna014.pmkeku, magna013.evxipz
    osd: 12 osds: 12 up (since 4d), 12 in (since 4d)
    rgw: 5 daemons active (rgw_bz.TESTzone.magna014.mpujcu, rgw_bz.magna013.agpwvb, rgw_bz.magna014.oivsyb, rgw_bz.magna016.jerqyn, rgw_bz_new.TESTzone_new.magna016.usqivp)
 
  data:
    pools:   10 pools, 928 pgs
    objects: 754 objects, 166 KiB
    usage:   3.3 GiB used, 11 TiB / 11 TiB avail
    pgs:     928 active+clean
 
  io:
    client:   21 KiB/s rd, 0 B/s wr, 20 op/s rd, 10 op/s wr
 
  progress:
    Global Recovery Event (87m)
      [====........................] (remaining: 8h)
 
[ceph: root@magna011 /]# 


Ceph orch ls

[ceph: root@magna011 /]# ceph orch ls
NAME                         RUNNING  REFRESHED  AGE   PLACEMENT                           IMAGE NAME                                                                                                                    IMAGE ID      
alertmanager                     0/1  -          -     count:1                             <unknown>                                                                                                                     <unknown>     
crash                            4/4  8m ago     4d    *                                   registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:b2ca10515af7e243732ac10b43f68a0d218d9a34421ec3b807bdc33d58c5c00f  38e52bf51cef  
grafana                          0/1  -          -     count:1                             <unknown>                                                                                                                     <unknown>     
mgr                              3/3  8m ago     4d    magna011;magna013;magna014;count:3  mix                                                                                                                           38e52bf51cef  
mon                              3/3  8m ago     4d    magna011;magna013;magna014;count:3  mix                                                                                                                           38e52bf51cef  
node-exporter                    4/4  8m ago     4d    *                                   registry.redhat.io/openshift4/ose-prometheus-node-exporter:v4.5                                                               f0a5cfd22f16  
osd.all-available-devices      12/12  8m ago     4d    *                                   registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:b2ca10515af7e243732ac10b43f68a0d218d9a34421ec3b807bdc33d58c5c00f  38e52bf51cef  
prometheus                       0/1  -          -     count:1                             <unknown>                                                                                                                     <unknown>     
rgw.rgw_bz                       3/3  8m ago     3d    magna013;magna014;magna016          registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:b2ca10515af7e243732ac10b43f68a0d218d9a34421ec3b807bdc33d58c5c00f  38e52bf51cef  
rgw.rgw_bz.TESTzone              1/1  8m ago     112m  magna014;count:1                    registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:b2ca10515af7e243732ac10b43f68a0d218d9a34421ec3b807bdc33d58c5c00f  38e52bf51cef  
rgw.rgw_bz_new.TESTzone_new      1/1  7m ago     89m   magna016;count:1                    registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:b2ca10515af7e243732ac10b43f68a0d218d9a34421ec3b807bdc33d58c5c00f  38e52bf51cef  
[ceph: root@magna011 /]#
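The manual check above (comparing RUNNING against the placement count for each rgw.* service in `ceph orch ls`) can also be done programmatically from the orchestrator's JSON output. A hedged sketch, assuming the `service_name` and `status.running`/`status.size` fields of `ceph orch ls --format json` (verify the field names against your cephadm version):

```python
import json
import subprocess


def rgw_services_healthy(orch_ls_json=None):
    """Return True when every rgw.* service reports all placements running.

    orch_ls_json: JSON string as produced by `ceph orch ls --format json`;
    when None, the command is invoked to fetch it from the live cluster.
    """
    if orch_ls_json is None:
        orch_ls_json = subprocess.check_output(
            ["ceph", "orch", "ls", "--format", "json"]
        )
    services = json.loads(orch_ls_json)
    rgw = [s for s in services if s["service_name"].startswith("rgw.")]
    # A service is healthy when its running count matches the requested size,
    # e.g. the 3/3 and 1/1 rows in the `ceph orch ls` output above.
    return bool(rgw) and all(
        s["status"]["running"] == s["status"]["size"] for s in rgw
    )
```

Against the cluster shown above this would return True, since all three rgw services report full counts (3/3, 1/1, 1/1).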

Comment 26 errata-xmlrpc 2021-08-30 08:27:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 5.0 bug fix and enhancement), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3294