Bug 1914936 - [RGW][cephadm] RGW daemon fails to deploy
Summary: [RGW][cephadm] RGW daemon fails to deploy
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Cephadm
Version: 5.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 5.0
Assignee: Venky Shankar
QA Contact: Vasishta
Docs Contact: Karen Norteman
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-01-11 14:19 UTC by Madhavi Kasturi
Modified: 2021-08-30 08:28 UTC
CC: 9 users

Fixed In Version: ceph-16.1.0-486.el8cp
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-30 08:27:52 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph pull 38910 0 None closed cephadm: fix rgw osd cap tag 2021-02-15 22:07:07 UTC
Github ceph ceph pull 39021 0 None closed pacific: cephadm: batch backport January 2021-02-15 22:07:08 UTC
Red Hat Issue Tracker RHCEPH-1214 0 None None None 2021-08-30 00:17:10 UTC
Red Hat Product Errata RHBA-2021:3294 0 None None None 2021-08-30 08:28:07 UTC

Comment 9 Preethi 2021-01-18 05:25:36 UTC
@Juan, the issue is seen with the latest compose as well.

cephadm-16.0.0-8633

Image details: registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.0-rhel-8-containers-candidate-97619-20210115184956

Error snippet:

2021-01-17 18:39:58,677 - ceph.ceph_admin - INFO - 0/2 rgw daemon(s) up....
2021-01-17 18:39:58,678 - ceph.ceph_admin - INFO - re-checking... 0.0 times left
2021-01-17 18:40:03,679 - __main__ - ERROR - Traceback (most recent call last):
  File "run.py", line 589, in run
    ceph_cluster_dict=ceph_cluster_dict, clients=clients)
  File "/home/pnataraj/cephci/tests/ceph_installer/test_cephadm.py", line 74, in run
    cephadm.add_daemons()
  File "/home/pnataraj/cephci/ceph/ceph_admin.py", line 214, in add_daemons
    self.ceph_rgws(self.cluster.get_nodes(role="rgw"))
  File "/home/pnataraj/cephci/ceph/ceph_admin.py", line 378, in ceph_rgws
    timeout=len(rgws) * self.TIMEOUT,
AssertionError

2021-01-17 18:40:03,680 - __main__ - INFO - ceph_clusters_file rerun/ceph-snapshot-1610907666787
2021-01-17 18:40:03,680 - __main__ - INFO - Test <module 'test_cephadm' from '/home/pnataraj/cephci/tests/ceph_installer/test_cephadm.py'> failed
2021-01-17 18:40:03,681 - __main__ - INFO - Aborting on test failure
2021-01-17 18:40:03,681 - __main__ - INFO - 
All test logs located here: http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-1610907456925

CephCI Logs: http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-1610907456925/cephadm_deployment_0.log
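
For context, a minimal sketch of the kind of wait-and-assert loop behind the traceback above, assuming it polls `ceph orch ps` for running RGW daemons; the function name, polling interval, and JSON field used here are assumptions, and this is not the actual cephci ceph_admin.py code:

```python
# Hypothetical reconstruction of the failing check (not the cephci code itself):
# poll the orchestrator until the expected number of RGW daemons report
# "running", and fail with AssertionError once the timeout budget is spent.
import json
import subprocess
import time


def wait_for_rgw_daemons(expected: int, timeout: int, interval: int = 5) -> None:
    deadline = time.time() + timeout
    while time.time() < deadline:
        out = subprocess.check_output(
            ["ceph", "orch", "ps", "--daemon-type", "rgw", "--format", "json"]
        )
        running = sum(
            1 for d in json.loads(out) if d.get("status_desc") == "running"
        )
        print(f"{running}/{expected} rgw daemon(s) up....")
        if running >= expected:
            return
        time.sleep(interval)
    # Same failure mode as the AssertionError in the traceback above.
    assert False, f"timed out waiting for {expected} rgw daemon(s)"
```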

Comment 10 Preethi 2021-01-19 10:01:47 UTC
The issue is seen with the latest compose (16.0.9150) as well.
Log for reference: http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-1611047955800/cephadm_deployment_0.log

Comment 11 Douglas Fuller 2021-01-21 19:13:09 UTC
Venky, can you have a look at this while Patrick is out?

Comment 12 Ken Dreyer (Red Hat) 2021-01-21 21:32:58 UTC
Based on what JuanMi and I discussed earlier this week, as soon as we cherry-pick https://github.com/ceph/ceph/pull/38910 to pacific, we'll rebase onto that downstream and QE can re-qualify that build for the next RHCS 5.0 alpha (CLOUDBLD-3172).

Comment 13 Ken Dreyer (Red Hat) 2021-01-22 17:14:28 UTC
JuanMi has this commit in https://github.com/ceph/ceph/pull/39021
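
The upstream PR title ("cephadm: fix rgw osd cap tag") points at the osd capability tag in the keyring cephadm generates for RGW daemons. A minimal sketch of how one could inspect that cap on a deployed daemon is below; the entity naming, the JSON shape of `ceph auth get`, and the expected "tag rgw" substring are assumptions for illustration, not taken from the PR:

```python
# Hypothetical check of the osd cap on an RGW daemon's auth entity.
# Assumes the cephadm naming scheme client.rgw.<service_id>.<host>.<suffix>
# and that the correct cap is tag-scoped (e.g. "allow rwx tag rgw *=*").
import json
import subprocess


def rgw_osd_cap(daemon_id: str) -> str:
    entity = f"client.rgw.{daemon_id}"
    out = subprocess.check_output(
        ["ceph", "auth", "get", entity, "--format", "json"]
    )
    return json.loads(out)[0]["caps"].get("osd", "")


if __name__ == "__main__":
    # Daemon id taken from the "ceph -s" output in comment 23 below.
    cap = rgw_osd_cap("rgw_bz.magna013.agpwvb")
    assert "tag rgw" in cap, f"unexpected osd cap: {cap}"
```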

Comment 14 Venky Shankar 2021-01-25 12:39:15 UTC
(In reply to Douglas Fuller from comment #11)
> Venky, can you have a look at this while Patrick is out?

Looks like others already figured out the fix.

Comment 15 Venky Shankar 2021-01-25 14:42:41 UTC
(In reply to Venky Shankar from comment #14)
> (In reply to Douglas Fuller from comment #11)
> > Venky, can you have a look at this while Patrick is out?
> 
> Looks like others already figured out the fix.

Spoke to Doug. So this needs to be cherry-picked downstream. I'll get to this later this week (probably Wednesday, 27th Jan).

Comment 16 Ken Dreyer (Red Hat) 2021-01-25 19:42:25 UTC
Before v16.2.0, I rebase to the tip of pacific every Monday. Ideally we stabilize pacific upstream (this issue included), and then we can take this fix as we rebase.

Comment 19 Preethi 2021-02-02 05:52:07 UTC
@Juan, unable to verify this issue with the latest compose due to https://bugzilla.redhat.com/show_bug.cgi?id=1923719. Hence, we are still blocked.

Comment 20 Ken Dreyer (Red Hat) 2021-03-03 00:44:57 UTC
bug 1923719 is resolved now, so I'm setting Fixed In Version to the current RH Ceph Storage 5 build.

Comment 23 Preethi 2021-03-08 13:44:18 UTC
The issue is not seen in the latest alpha drop.

[root@magna011 ubuntu]# sudo cephadm shell
Inferring fsid d8a1d97c-7cbb-11eb-82af-002590fc26f6
Inferring config /var/lib/ceph/d8a1d97c-7cbb-11eb-82af-002590fc26f6/mon.magna011/config
Using recent ceph image registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:b2ca10515af7e243732ac10b43f68a0d218d9a34421ec3b807bdc33d58c5c00f
WARNING: The same type, major and minor should not be used for multiple devices.
WARNING: The same type, major and minor should not be used for multiple devices.
WARNING: The same type, major and minor should not be used for multiple devices.
[ceph: root@magna011 /]# ceph -s
  cluster:
    id:     d8a1d97c-7cbb-11eb-82af-002590fc26f6
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum magna011,magna014,magna013 (age 4d)
    mgr: magna011.vpdjxa(active, since 4d), standbys: magna014.pmkeku, magna013.evxipz
    osd: 12 osds: 12 up (since 4d), 12 in (since 4d)
    rgw: 5 daemons active (rgw_bz.TESTzone.magna014.mpujcu, rgw_bz.magna013.agpwvb, rgw_bz.magna014.oivsyb, rgw_bz.magna016.jerqyn, rgw_bz_new.TESTzone_new.magna016.usqivp)
 
  data:
    pools:   10 pools, 928 pgs
    objects: 754 objects, 166 KiB
    usage:   3.3 GiB used, 11 TiB / 11 TiB avail
    pgs:     928 active+clean
 
  io:
    client:   21 KiB/s rd, 0 B/s wr, 20 op/s rd, 10 op/s wr
 
  progress:
    Global Recovery Event (87m)
      [====........................] (remaining: 8h)
 
[ceph: root@magna011 /]# 


Ceph orch ls

[ceph: root@magna011 /]# ceph orch ls
NAME                         RUNNING  REFRESHED  AGE   PLACEMENT                           IMAGE NAME                                                                                                                    IMAGE ID      
alertmanager                     0/1  -          -     count:1                             <unknown>                                                                                                                     <unknown>     
crash                            4/4  8m ago     4d    *                                   registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:b2ca10515af7e243732ac10b43f68a0d218d9a34421ec3b807bdc33d58c5c00f  38e52bf51cef  
grafana                          0/1  -          -     count:1                             <unknown>                                                                                                                     <unknown>     
mgr                              3/3  8m ago     4d    magna011;magna013;magna014;count:3  mix                                                                                                                           38e52bf51cef  
mon                              3/3  8m ago     4d    magna011;magna013;magna014;count:3  mix                                                                                                                           38e52bf51cef  
node-exporter                    4/4  8m ago     4d    *                                   registry.redhat.io/openshift4/ose-prometheus-node-exporter:v4.5                                                               f0a5cfd22f16  
osd.all-available-devices      12/12  8m ago     4d    *                                   registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:b2ca10515af7e243732ac10b43f68a0d218d9a34421ec3b807bdc33d58c5c00f  38e52bf51cef  
prometheus                       0/1  -          -     count:1                             <unknown>                                                                                                                     <unknown>     
rgw.rgw_bz                       3/3  8m ago     3d    magna013;magna014;magna016          registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:b2ca10515af7e243732ac10b43f68a0d218d9a34421ec3b807bdc33d58c5c00f  38e52bf51cef  
rgw.rgw_bz.TESTzone              1/1  8m ago     112m  magna014;count:1                    registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:b2ca10515af7e243732ac10b43f68a0d218d9a34421ec3b807bdc33d58c5c00f  38e52bf51cef  
rgw.rgw_bz_new.TESTzone_new      1/1  7m ago     89m   magna016;count:1                    registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:b2ca10515af7e243732ac10b43f68a0d218d9a34421ec3b807bdc33d58c5c00f  38e52bf51cef  
[ceph: root@magna011 /]#

Comment 26 errata-xmlrpc 2021-08-30 08:27:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 5.0 bug fix and enhancement), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3294

