Bug 1914936
| Summary: | [RGW][cephadm] RGW daemon fails to deploy | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Madhavi Kasturi <mkasturi> |
| Component: | Cephadm | Assignee: | Venky Shankar <vshankar> |
| Status: | CLOSED ERRATA | QA Contact: | Vasishta <vashastr> |
| Severity: | high | Docs Contact: | Karen Norteman <knortema> |
| Priority: | unspecified | | |
| Version: | 5.0 | CC: | dfuller, kdreyer, mgowri, pdonnell, pnataraj, sangadi, sewagner, sunnagar, vereddy |
| Target Milestone: | --- | Keywords: | Automation, Regression, TestBlocker |
| Target Release: | 5.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | ceph-16.1.0-486.el8cp | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-08-30 08:27:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Issue is seen with latest compose 16.0.9150 as well. Below log for reference: http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-1611047955800/cephadm_deployment_0.log

Venky, can you have a look at this while Patrick is out?

Based on what JuanMi and I discussed earlier this week, as soon as we cherry-pick https://github.com/ceph/ceph/pull/38910 to pacific, we'll rebase onto that downstream and QE can re-qualify that build for the next RHCS 5.0 alpha (CLOUDBLD-3172).

JuanMi has this commit in https://github.com/ceph/ceph/pull/39021

(In reply to Douglas Fuller from comment #11)
> Venky, can you have a look at this while Patrick is out?

Looks like others already figured out the fix.

(In reply to Venky Shankar from comment #14)
> (In reply to Douglas Fuller from comment #11)
> > Venky, can you have a look at this while Patrick is out?
>
> Looks like others already figured out the fix.

Spoke to Doug. So this needs to be cherry-picked downstream. I'll get to this later this week (probably Wednesday, 27th Jan).

Before v16.2.0, I rebase to the tip of pacific every Monday. Ideally we stabilize pacific upstream (this issue included), and then we can take this fix as we rebase.

@Juan, unable to verify this issue with latest compose due to https://bugzilla.redhat.com/show_bug.cgi?id=1923719. Hence, we are still blocked.

Bug 1923719 is resolved now, so I'm setting Fixed In Version to the current RH Ceph Storage 5 build.

Issue is not seen in the latest alpha drop:
[root@magna011 ubuntu]# sudo cephadm shell
Inferring fsid d8a1d97c-7cbb-11eb-82af-002590fc26f6
Inferring config /var/lib/ceph/d8a1d97c-7cbb-11eb-82af-002590fc26f6/mon.magna011/config
Using recent ceph image registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:b2ca10515af7e243732ac10b43f68a0d218d9a34421ec3b807bdc33d58c5c00f
WARNING: The same type, major and minor should not be used for multiple devices.
WARNING: The same type, major and minor should not be used for multiple devices.
WARNING: The same type, major and minor should not be used for multiple devices.
[ceph: root@magna011 /]# ceph -s
cluster:
id: d8a1d97c-7cbb-11eb-82af-002590fc26f6
health: HEALTH_OK
services:
mon: 3 daemons, quorum magna011,magna014,magna013 (age 4d)
mgr: magna011.vpdjxa(active, since 4d), standbys: magna014.pmkeku, magna013.evxipz
osd: 12 osds: 12 up (since 4d), 12 in (since 4d)
rgw: 5 daemons active (rgw_bz.TESTzone.magna014.mpujcu, rgw_bz.magna013.agpwvb, rgw_bz.magna014.oivsyb, rgw_bz.magna016.jerqyn, rgw_bz_new.TESTzone_new.magna016.usqivp)
data:
pools: 10 pools, 928 pgs
objects: 754 objects, 166 KiB
usage: 3.3 GiB used, 11 TiB / 11 TiB avail
pgs: 928 active+clean
io:
client: 21 KiB/s rd, 0 B/s wr, 20 op/s rd, 10 op/s wr
progress:
Global Recovery Event (87m)
[====........................] (remaining: 8h)
[ceph: root@magna011 /]#
Output of ceph orch ls:
[ceph: root@magna011 /]# ceph orch ls
NAME RUNNING REFRESHED AGE PLACEMENT IMAGE NAME IMAGE ID
alertmanager 0/1 - - count:1 <unknown> <unknown>
crash 4/4 8m ago 4d * registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:b2ca10515af7e243732ac10b43f68a0d218d9a34421ec3b807bdc33d58c5c00f 38e52bf51cef
grafana 0/1 - - count:1 <unknown> <unknown>
mgr 3/3 8m ago 4d magna011;magna013;magna014;count:3 mix 38e52bf51cef
mon 3/3 8m ago 4d magna011;magna013;magna014;count:3 mix 38e52bf51cef
node-exporter 4/4 8m ago 4d * registry.redhat.io/openshift4/ose-prometheus-node-exporter:v4.5 f0a5cfd22f16
osd.all-available-devices 12/12 8m ago 4d * registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:b2ca10515af7e243732ac10b43f68a0d218d9a34421ec3b807bdc33d58c5c00f 38e52bf51cef
prometheus 0/1 - - count:1 <unknown> <unknown>
rgw.rgw_bz 3/3 8m ago 3d magna013;magna014;magna016 registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:b2ca10515af7e243732ac10b43f68a0d218d9a34421ec3b807bdc33d58c5c00f 38e52bf51cef
rgw.rgw_bz.TESTzone 1/1 8m ago 112m magna014;count:1 registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:b2ca10515af7e243732ac10b43f68a0d218d9a34421ec3b807bdc33d58c5c00f 38e52bf51cef
rgw.rgw_bz_new.TESTzone_new 1/1 7m ago 89m magna016;count:1 registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:b2ca10515af7e243732ac10b43f68a0d218d9a34421ec3b807bdc33d58c5c00f 38e52bf51cef
[ceph: root@magna011 /]#
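For reference, an RGW service of the shape shown above (rgw.rgw_bz placed on magna013, magna014 and magna016) can be declared with a cephadm service specification and applied from the cephadm shell. This is only a minimal sketch: the file path /tmp/rgw_bz.yaml is illustrative, and the exact spec fields supported can vary with the build.
# Minimal sketch of (re)deploying the rgw.rgw_bz service via a spec file.
# The file path is illustrative; the hosts are taken from the placement shown above.
cat > /tmp/rgw_bz.yaml <<'EOF'
service_type: rgw
service_id: rgw_bz
placement:
  hosts:
    - magna013
    - magna014
    - magna016
EOF
ceph orch apply -i /tmp/rgw_bz.yaml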
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 5.0 bug fix and enhancement), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2021:3294
@Juan, Issue is seen with latest compose as well: cephadm-16.0.0-8633

Image details: registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.0-rhel-8-containers-candidate-97619-20210115184956

Error snippet:
2021-01-17 18:39:58,677 - ceph.ceph_admin - INFO - 0/2 rgw daemon(s) up....
2021-01-17 18:39:58,678 - ceph.ceph_admin - INFO - re-checking... 0.0 times left
2021-01-17 18:40:03,679 - __main__ - ERROR - Traceback (most recent call last):
  File "run.py", line 589, in run
    ceph_cluster_dict=ceph_cluster_dict, clients=clients)
  File "/home/pnataraj/cephci/tests/ceph_installer/test_cephadm.py", line 74, in run
    cephadm.add_daemons()
  File "/home/pnataraj/cephci/ceph/ceph_admin.py", line 214, in add_daemons
    self.ceph_rgws(self.cluster.get_nodes(role="rgw"))
  File "/home/pnataraj/cephci/ceph/ceph_admin.py", line 378, in ceph_rgws
    timeout=len(rgws) * self.TIMEOUT,
AssertionError
2021-01-17 18:40:03,680 - __main__ - INFO - ceph_clusters_file rerun/ceph-snapshot-1610907666787
2021-01-17 18:40:03,680 - __main__ - INFO - Test <module 'test_cephadm' from '/home/pnataraj/cephci/tests/ceph_installer/test_cephadm.py'> failed
2021-01-17 18:40:03,681 - __main__ - INFO - Aborting on test failure
2021-01-17 18:40:03,681 - __main__ - INFO - All test logs located here: http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-1610907456925

CephCI Logs: http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-1610907456925/cephadm_deployment_0.log
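The AssertionError above is cephci's bounded wait for the requested RGW daemons timing out ("0/2 rgw daemon(s) up"). A rough manual equivalent from the cephadm shell is sketched below; the expected daemon count, retry count and sleep interval are illustrative values, not taken from the cephci code.
# Poll until the expected number of RGW daemons report as running, then list them.
# EXPECTED, the retry count and the sleep interval are illustrative.
EXPECTED=2
for i in $(seq 1 30); do
    up=$(ceph orch ps --daemon_type rgw | grep -c running)
    [ "$up" -ge "$EXPECTED" ] && break
    sleep 10
done
ceph orch ps --daemon_type rgw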