@Juan, issue is seen with the latest compose as well.

cephadm-16.0.0-8633

Image details:
registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.0-rhel-8-containers-candidate-97619-20210115184956

Error snippet:

2021-01-17 18:39:58,677 - ceph.ceph_admin - INFO - 0/2 rgw daemon(s) up....
2021-01-17 18:39:58,678 - ceph.ceph_admin - INFO - re-checking... 0.0 times left
2021-01-17 18:40:03,679 - __main__ - ERROR - Traceback (most recent call last):
  File "run.py", line 589, in run
    ceph_cluster_dict=ceph_cluster_dict, clients=clients)
  File "/home/pnataraj/cephci/tests/ceph_installer/test_cephadm.py", line 74, in run
    cephadm.add_daemons()
  File "/home/pnataraj/cephci/ceph/ceph_admin.py", line 214, in add_daemons
    self.ceph_rgws(self.cluster.get_nodes(role="rgw"))
  File "/home/pnataraj/cephci/ceph/ceph_admin.py", line 378, in ceph_rgws
    timeout=len(rgws) * self.TIMEOUT,
AssertionError
2021-01-17 18:40:03,680 - __main__ - INFO - ceph_clusters_file rerun/ceph-snapshot-1610907666787
2021-01-17 18:40:03,680 - __main__ - INFO - Test <module 'test_cephadm' from '/home/pnataraj/cephci/tests/ceph_installer/test_cephadm.py'> failed
2021-01-17 18:40:03,681 - __main__ - INFO - Aborting on test failure
2021-01-17 18:40:03,681 - __main__ - INFO - All test logs located here: http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-1610907456925

CephCI logs: http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-1610907456925/cephadm_deployment_0.log
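For context, the AssertionError above comes from a daemon-availability wait loop: cephci repeatedly polls the cluster for running rgw daemons ("0/2 rgw daemon(s) up....") and asserts once the retry budget is exhausted. A minimal sketch of that retry pattern, assuming a caller-supplied count callback (function and parameter names here are hypothetical, not cephci's actual API):

```python
import time

def wait_for_daemons(get_running_count, expected, timeout=60, interval=5):
    """Poll until `expected` daemons are up, or fail after `timeout` seconds.

    `get_running_count` is a hypothetical stand-in for cephci's
    orchestrator query; it returns the number of daemons currently up.
    """
    deadline = time.time() + timeout
    running = 0
    while time.time() < deadline:
        running = get_running_count()
        print(f"{running}/{expected} rgw daemon(s) up....")
        if running >= expected:
            return True
        time.sleep(interval)
    # Mirrors the bare assert that produced the AssertionError in the log.
    assert False, f"timed out: only {running}/{expected} daemon(s) up"

# Example: a simulated cluster where the daemons come up on the third poll.
counts = iter([0, 1, 2])
assert wait_for_daemons(lambda: next(counts), expected=2, timeout=30, interval=0)
```

In the failing run the count never reached 2/2 within `len(rgws) * TIMEOUT` seconds, so the final assert fired.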
Issue is seen with the latest compose, 16.0.9150, as well. Log for reference:
http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-1611047955800/cephadm_deployment_0.log
Venky, can you have a look at this while Patrick is out?
Based on what JuanMi and I discussed earlier this week, as soon as we cherry-pick https://github.com/ceph/ceph/pull/38910 to pacific, we'll rebase onto that downstream and QE can re-qualify that build for the next RHCS 5.0 alpha (CLOUDBLD-3172).
JuanMi has included this commit in https://github.com/ceph/ceph/pull/39021.
(In reply to Douglas Fuller from comment #11)
> Venky, can you have a look at this while Patrick is out?

Looks like others already figured out the fix.
(In reply to Venky Shankar from comment #14)
> (In reply to Douglas Fuller from comment #11)
> > Venky, can you have a look at this while Patrick is out?
>
> Looks like others already figured out the fix.

Spoke to Doug. So this needs to be cherry-picked downstream. I'll get to this later this week (probably Wednesday, 27th Jan).
Until v16.2.0 is released, I rebase onto the tip of pacific every Monday. Ideally we stabilize pacific upstream (this issue included), and then we pick up this fix as part of the regular rebase.
@Juan, unable to verify this issue with the latest compose due to https://bugzilla.redhat.com/show_bug.cgi?id=1923719. Hence, we are still blocked.
Bug 1923719 is resolved now, so I'm setting Fixed In Version to the current RH Ceph Storage 5 build.
Issue is not seen in the latest alpha drop.

[root@magna011 ubuntu]# sudo cephadm shell
Inferring fsid d8a1d97c-7cbb-11eb-82af-002590fc26f6
Inferring config /var/lib/ceph/d8a1d97c-7cbb-11eb-82af-002590fc26f6/mon.magna011/config
Using recent ceph image registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:b2ca10515af7e243732ac10b43f68a0d218d9a34421ec3b807bdc33d58c5c00f
WARNING: The same type, major and minor should not be used for multiple devices.
WARNING: The same type, major and minor should not be used for multiple devices.
WARNING: The same type, major and minor should not be used for multiple devices.

[ceph: root@magna011 /]# ceph -s
  cluster:
    id:     d8a1d97c-7cbb-11eb-82af-002590fc26f6
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum magna011,magna014,magna013 (age 4d)
    mgr: magna011.vpdjxa(active, since 4d), standbys: magna014.pmkeku, magna013.evxipz
    osd: 12 osds: 12 up (since 4d), 12 in (since 4d)
    rgw: 5 daemons active (rgw_bz.TESTzone.magna014.mpujcu, rgw_bz.magna013.agpwvb, rgw_bz.magna014.oivsyb, rgw_bz.magna016.jerqyn, rgw_bz_new.TESTzone_new.magna016.usqivp)

  data:
    pools:   10 pools, 928 pgs
    objects: 754 objects, 166 KiB
    usage:   3.3 GiB used, 11 TiB / 11 TiB avail
    pgs:     928 active+clean

  io:
    client: 21 KiB/s rd, 0 B/s wr, 20 op/s rd, 10 op/s wr

  progress:
    Global Recovery Event (87m)
      [====........................] (remaining: 8h)

[ceph: root@magna011 /]# Ceph orch ls
[ceph: root@magna011 /]# ceph orch ls
NAME                         RUNNING  REFRESHED  AGE   PLACEMENT                           IMAGE NAME  IMAGE ID
alertmanager                 0/1      -          -     count:1                             <unknown>  <unknown>
crash                        4/4      8m ago     4d    *                                   registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:b2ca10515af7e243732ac10b43f68a0d218d9a34421ec3b807bdc33d58c5c00f  38e52bf51cef
grafana                      0/1      -          -     count:1                             <unknown>  <unknown>
mgr                          3/3      8m ago     4d    magna011;magna013;magna014;count:3  mix  38e52bf51cef
mon                          3/3      8m ago     4d    magna011;magna013;magna014;count:3  mix  38e52bf51cef
node-exporter                4/4      8m ago     4d    *                                   registry.redhat.io/openshift4/ose-prometheus-node-exporter:v4.5  f0a5cfd22f16
osd.all-available-devices    12/12    8m ago     4d    *                                   registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:b2ca10515af7e243732ac10b43f68a0d218d9a34421ec3b807bdc33d58c5c00f  38e52bf51cef
prometheus                   0/1      -          -     count:1                             <unknown>  <unknown>
rgw.rgw_bz                   3/3      8m ago     3d    magna013;magna014;magna016          registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:b2ca10515af7e243732ac10b43f68a0d218d9a34421ec3b807bdc33d58c5c00f  38e52bf51cef
rgw.rgw_bz.TESTzone          1/1      8m ago     112m  magna014;count:1                    registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:b2ca10515af7e243732ac10b43f68a0d218d9a34421ec3b807bdc33d58c5c00f  38e52bf51cef
rgw.rgw_bz_new.TESTzone_new  1/1      7m ago     89m   magna016;count:1                    registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:b2ca10515af7e243732ac10b43f68a0d218d9a34421ec3b807bdc33d58c5c00f  38e52bf51cef
[ceph: root@magna011 /]#
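The verification above boils down to confirming that every service row in `ceph orch ls` reports its expected daemon count (e.g. rgw.rgw_bz at 3/3, not 0/2 as in the original failure). A small sketch of that check in Python, assuming only the plain-text table layout shown above (this is an illustrative helper, not cephci or cephadm code):

```python
def all_services_running(orch_ls_output: str) -> bool:
    """Return True if every parseable service row reports running == expected.

    Reads the RUNNING column ("3/3", "0/1", ...) from plain-text
    `ceph orch ls` output; the header row and rows without an "x/y"
    second field are skipped.
    """
    rows = orch_ls_output.strip().splitlines()[1:]  # drop the header line
    for row in rows:
        fields = row.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue
        running, expected = fields[1].split("/")
        if int(running) < int(expected):
            return False
    return True

# Example using a trimmed version of the output above: alertmanager at 0/1
# makes the overall check fail even though mon and rgw are fully up.
sample = """NAME  RUNNING  REFRESHED  AGE  PLACEMENT
mon  3/3  8m ago  4d  magna011;magna013;magna014;count:3
rgw.rgw_bz  3/3  8m ago  3d  magna013;magna014;magna016
alertmanager  0/1  -  -  count:1
"""
assert all_services_running(sample) is False
```

Note that in the transcript above the monitoring services (alertmanager, grafana, prometheus) show 0/1, so a strict all-services check would still flag them; the fix under test only concerns the rgw daemon counts.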
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 5.0 bug fix and enhancement), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:3294