Description of problem:

Once cephadm bootstrap has been performed in a disconnected environment, cephadm fails to create a local OSD (ceph orch daemon add osd ceph1:/dev/sdc): it tries to pull an external container image instead of the one provided by the local registry.

Version-Release number of selected component (if applicable):
ceph version 16.2.0-117.el8cp (0e34bb74700060ebfaa22d99b7d2cdc037b28a57) pacific (stable)

How reproducible:
Always.

Steps to Reproduce:

1. Set up the environment

Install a RHEL 8.4 OS (we used a VM) with a minimal install and update it to the latest packages:
=> Linux ceph1 4.18.0-305.19.1.el8_4.x86_64 #1 SMP Tue Sep 7 07:07:31 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux

Create a local RPM repository for the following RPM channels:
=> Ansible 2.9 for Red Hat Enterprise Linux 8 x86_64 (RPMs)
=> Red Hat Ceph Storage Tools 5 for RHEL 8 x86_64 (RPMs)
=> Red Hat Enterprise Linux 8 for x86_64 - AppStream (RPMs)
=> Red Hat Enterprise Linux 8 for x86_64 - BaseOS (RPMs)

Create a local registry and add the following containers (I used skopeo sync to synchronise these containers between registry.redhat.io and the local registry; a rough mirroring sketch is included after the cephadm package details below):
=> rhceph/rhceph-5-rhel8
=> rhceph/rhceph-5-dashboard-rhel8
=> openshift4/ose-prometheus
=> openshift4/ose-prometheus-alertmanager
=> openshift4/ose-prometheus-node-exporter

2. Perform the cephadm bootstrap installation

Install cephadm:

dnf install cephadm

rpm -qi cephadm
Name        : cephadm
Epoch       : 2
Version     : 16.2.0
Release     : 117.el8cp
Architecture: noarch
Install Date: Sun 10 Oct 2021 05:04:44 PM CEST
Group       : Unspecified
Size        : 301088
License     : LGPL-2.1 and LGPL-3.0 and CC-BY-SA-3.0 and GPL-2.0 and BSL-1.0 and BSD-3-Clause and MIT
Signature   : RSA/SHA256, Wed 18 Aug 2021 11:17:49 PM CEST, Key ID 199e2f91fd431d51
Source RPM  : ceph-16.2.0-117.el8cp.src.rpm
Build Date  : Wed 18 Aug 2021 08:26:50 PM CEST
Build Host  : x86-vm-06.build.eng.bos.redhat.com
Relocations : (not relocatable)
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Vendor      : Red Hat, Inc.
URL         : http://ceph.com/
Summary     : Utility to bootstrap Ceph clusters
Description : Utility to bootstrap a Ceph cluster and manage Ceph daemons deployed with systemd and podman.
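(For reference, the image mirroring mentioned in step 1 can be done roughly as follows. This is only a sketch: registry.lab is the local registry used in this report, the tags are taken from the commands further below, per-image "skopeo copy" is shown instead of the "skopeo sync" actually used, and flags such as destination credentials may need adjusting for your environment.)

skopeo copy --all \
    docker://registry.redhat.io/rhceph/rhceph-5-rhel8:latest \
    docker://registry.lab/rhceph/rhceph-5-rhel8:latest
skopeo copy --all \
    docker://registry.redhat.io/openshift4/ose-prometheus:v4.6 \
    docker://registry.lab/openshift4/ose-prometheus:v4.6
# ...and likewise for rhceph-5-dashboard-rhel8, ose-prometheus-alertmanager
# and ose-prometheus-node-exporter.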
Launch the bootstrap (note that the local registry needed no username/password, but it was mandatory to provide one):

cephadm --image registry.lab/rhceph/rhceph-5-rhel8:latest bootstrap --fsid c3a016e5-cbfb-4539-963d-75bf160f6d6a --mon-ip 10.2.41.250 --initial-dashboard-user admin --initial-dashboard-password redhat123 --dashboard-password-noupdate --no-minimize-config --registry-url registry.lab --registry-username admin --registry-password admin

Verify the cluster has started:

[root@ceph1 ~]# ceph status
  cluster:
    id:     c3a016e5-cbfb-4539-963d-75bf160f6d6a
    health: HEALTH_WARN
            failed to probe daemons or devices
            OSD count 0 < osd_pool_default_size 3

  services:
    mon: 1 daemons, quorum ceph1 (age 32m)
    mgr: ceph1.rmjhul(active, since 31m)
    osd: 0 osds: 0 up, 0 in (since 22h)

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:

Modify the ceph config to be able to install the Ceph Dashboard components:

ceph config set mgr mgr/cephadm/container_image_base registry.lab/rhceph/rhceph-5-rhel8:latest
ceph config set mgr mgr/cephadm/container_image_alertmanager registry.lab/openshift4/ose-prometheus-alertmanager:v4.6
ceph config set mgr mgr/cephadm/container_image_prometheus registry.lab/openshift4/ose-prometheus:v4.6
ceph config set mgr mgr/cephadm/container_image_grafana registry.lab/rhceph/rhceph-5-dashboard-rhel8:latest
ceph config set mgr mgr/cephadm/container_image_node_exporter registry.lab/openshift4/ose-prometheus-no

Verify that prometheus, grafana, alertmanager and node-exporter are running fine (it seems that ceph crash has an issue and does not start):

# ceph orch ps
NAME                 HOST   STATUS         REFRESHED  AGE  PORTS          VERSION           IMAGE ID      CONTAINER ID
alertmanager.ceph1   ceph1  running (38m)  2m ago     23h  *:9093 *:9094  0.21.0            cfa7ac9e2c00  38a4bb8d163b
grafana.ceph1        ceph1  running (38m)  2m ago     23h  *:3000         6.7.4             09cf77100f6a  c53950b23b99
mgr.ceph1.rmjhul     ceph1  running (38m)  2m ago     23h  *:9283         16.2.0-117.el8cp  2142b60d7974  437dbf146288
mon.ceph1            ceph1  running (38m)  2m ago     23h  -              16.2.0-117.el8cp  2142b60d7974  09feeaa61cb2
node-exporter.ceph1  ceph1  running (38m)  2m ago     23h  *:9100         1.0.1             4afad9935fbf  fca595401f65
prometheus.ceph1     ceph1  running (38m)  2m ago     23h  *:9095         2.22.2            ed805e9dbe13  bf4598bff2c5

# ceph orch ls
NAME           RUNNING  REFRESHED  AGE  PLACEMENT
alertmanager   1/1      2m ago     23h  count:1
crash          0/1      -          23h  ceph1
grafana        1/1      2m ago     23h  count:1
mgr            1/1      2m ago     23h  count:1
mon            1/1      2m ago     23h  count:1
node-exporter  1/1      2m ago     23h  ceph1
prometheus     1/1      2m ago     23h  count:1
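(Side note, a sketch only: after applying the overrides above, the effective image settings can be double-checked with the standard config commands; the grep pattern is just an illustration and nothing here is taken from the original report beyond the option names it already uses.)

ceph config dump | grep container_image
ceph config get mgr mgr/cephadm/container_image_base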
3. Add an OSD (local /dev/sdc) to the Ceph cluster

# ceph orch daemon add osd ceph1:/dev/sdc
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1345, in _handle_command
    return self.handle_command(inbuf, cmd)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 167, in handle_command
    return dispatch[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 390, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 107, in <lambda>
    wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)  # noqa: E731
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 96, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/module.py", line 794, in _daemon_add_osd
    raise_if_exception(completion)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 224, in raise_if_exception
    raise e
RuntimeError: cephadm exited with an error code: 1, stderr:Non-zero exit code 125 from /bin/podman run --rm --ipc=host --authfile=/etc/ceph/podman-auth.json --net=host --entrypoint stat --init -e CONTAINER_IMAGE=docker.io/ceph/daemon-base:latest-pacific-devel -e NODE_NAME=ceph1 -e CEPH_USE_RANDOM_NONCE=1 docker.io/ceph/daemon-base:latest-pacific-devel -c %u %g /var/lib/ceph
stat: stderr Trying to pull docker.io/ceph/daemon-base:latest-pacific-devel...
stat: stderr Error: Error initializing source docker://ceph/daemon-base:latest-pacific-devel: error pinging docker registry registry-1.docker.io: Get "https://registry-1.docker.io/v2/": dial tcp 3.209.182.229:443: i/o timeout
Traceback (most recent call last):
  File "/var/lib/ceph/c3a016e5-cbfb-4539-963d-75bf160f6d6a/cephadm.d7a73386d1e46cffff151775b8e1d098069c88b89aea56cab15b079c1a1f555f", line 8140, in <module>
    main()
  File "/var/lib/ceph/c3a016e5-cbfb-4539-963d-75bf160f6d6a/cephadm.d7a73386d1e46cffff151775b8e1d098069c88b89aea56cab15b079c1a1f555f", line 8128, in main
    r = ctx.func(ctx)
  File "/var/lib/ceph/c3a016e5-cbfb-4539-963d-75bf160f6d6a/cephadm.d7a73386d1e46cffff151775b8e1d098069c88b89aea56cab15b079c1a1f555f", line 1624, in _infer_fsid
    return func(ctx)
  File "/var/lib/ceph/c3a016e5-cbfb-4539-963d-75bf160f6d6a/cephadm.d7a73386d1e46cffff151775b8e1d098069c88b89aea56cab15b079c1a1f555f", line 1708, in _infer_image
    return func(ctx)
  File "/var/lib/ceph/c3a016e5-cbfb-4539-963d-75bf160f6d6a/cephadm.d7a73386d1e46cffff151775b8e1d098069c88b89aea56cab15b079c1a1f555f", line 4518, in command_ceph_volume
    make_log_dir(ctx, ctx.fsid)
  File "/var/lib/ceph/c3a016e5-cbfb-4539-963d-75bf160f6d6a/cephadm.d7a73386d1e46cffff151775b8e1d098069c88b89aea56cab15b079c1a1f555f", line 1810, in make_log_dir
    uid, gid = extract_uid_gid(ctx)
  File "/var/lib/ceph/c3a016e5-cbfb-4539-963d-75bf160f6d6a/cephadm.d7a73386d1e46cffff151775b8e1d098069c88b89aea56cab15b079c1a1f555f", line 2514, in extract_uid_gid
    raise RuntimeError('uid/gid not found')
RuntimeError: uid/gid not found

Actual results:
The OSD is never created; cephadm falls back to docker.io/ceph/daemon-base:latest-pacific-devel, which cannot be pulled from the disconnected environment.

Expected results:
The OSD should be created correctly using the local container image rhceph/rhceph-5-rhel8.

Additional info:
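One mitigation sometimes used for disconnected installs (offered here only as a sketch, not verified against this build) is to point the cluster-wide container_image option at the local image so that cephadm does not fall back to its compiled-in default of docker.io/ceph/daemon-base:latest-pacific-devel:

ceph config set global container_image registry.lab/rhceph/rhceph-5-rhel8:latest

After that, retrying "ceph orch daemon add osd ceph1:/dev/sdc" should at least attempt the pull from registry.lab rather than registry-1.docker.io; whether this fully works around the bug in 16.2.0-117.el8cp is not confirmed in this report.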
Vasishta, is this BZ related to your BZ rhbz#1935044?
Sebastian, no, the issue tracked in BZ 1935044 seems to be different from the one reported here.
*** Bug 2038414 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage Security, Bug Fix, and Enhancement Update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5997