Description of problem:
OSP 17 ceph satellite deploys fail with error:

FATAL | Wait for expected number of osds to be running | controller-0

Overcloud deploy failures with the satellite are specific to when ceph is used. LVM satellite deploys are successful.

After overcloud deploy fails it is seen that the ceph containers are present on the undercloud:

(undercloud) [stack@undercloud-0 ~]$ openstack tripleo container image list | grep ceph
| docker://undercloud-0.ctlplane.redhat.local:8787/default_organization-ceph-5-containers-openshift-ose-prometheus-alertmanager:v4.10 |
| docker://undercloud-0.ctlplane.redhat.local:8787/default_organization-ceph-5-containers-grafana:latest |
| docker://undercloud-0.ctlplane.redhat.local:8787/default_organization-ceph-5-containers-openshift-ose-prometheus:v4.10 |
| docker://undercloud-0.ctlplane.redhat.local:8787/default_organization-ceph-5-containers-openshift-ose-prometheus-node-exporter:v4.10 |
| docker://undercloud-0.ctlplane.redhat.local:8787/default_organization-ceph-5-containers-rhceph:5-359 |

It is also seen that on a ceph node the /var/log/ceph/cephadm.log contains 404 errors saying it can't find the ceph images on the undercloud:

2023-02-07 12:16:05,207 7f2f5388e740 DEBUG stat: Trying to pull undercloud-0.ctlplane.redhat.local:8787/default_organization-ceph-5-containers-rhceph@sha256:61ca086e93f6c433d6673afbe4d224b9bc51defed2cd88baaf9849a6a81940ce...
2023-02-07 12:16:05,214 7f2f5388e740 DEBUG stat: Error: initializing source docker://undercloud-0.ctlplane.redhat.local:8787/default_organization-ceph-5-containers-rhceph@sha256:61ca086e93f6c433d6673afbe4d224b9bc51defed2cd88baaf9849a6a81940ce: reading manifest sha256:61ca086e93f6c433d6673afbe4d224b9bc51defed2cd88baaf9849a6a81940ce in undercloud-0.ctlplane.redhat.local:8787/default_organization-ceph-5-containers-rhceph: StatusCode: 404, <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">...
2023-02-07 12:16:05,217 7f2f5388e740 INFO Non-zero exit code 125 from /bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint stat --init -e CONTAINER_IMAGE=undercloud-0.ctlplane.redhat.local:8787/default_organization-ceph-5-containers-rhceph@sha256:61ca086e93f6c433d6673afbe4d224b9bc51defed2cd88baaf9849a6a81940ce -e NODE_NAME=ceph-0 -e CEPH_USE_RANDOM_NONCE=1 undercloud-0.ctlplane.redhat.local:8787/default_organization-ceph-5-containers-rhceph@sha256:61ca086e93f6c433d6673afbe4d224b9bc51defed2cd88baaf9849a6a81940ce -c %u %g /var/lib/ceph

So it's in a state where the ceph nodes can't find container images that do seem to be present on the undercloud.

One other thing found while debugging is that controller-0 could pull the ceph image:

[heat-admin@controller-0 ~]$ sudo podman images
REPOSITORY                                                                             TAG    IMAGE ID      CREATED      SIZE
undercloud-0.ctlplane.redhat.local:8787/default_organization-ceph-5-containers-rhceph  5-359  412d7e4d681e  4 weeks ago  986 MB

However, controllers 1 and 2 have the same 404 in /var/log/ceph/cephadm.log that the ceph nodes have.

Version-Release number of selected component (if applicable):
OSP17

How reproducible:
Every time

Steps to Reproduce:
1. Do a satellite deployment with ceph

Actual results:
Overcloud deploy fails with error: FATAL | Wait for expected number of osds to be running | controller-0

Expected results:
Overcloud successfully deploys

Additional info:
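To compare nodes quickly, something like the following could be run against each node's cephadm.log to list the distinct image references that hit the 404. This is only a rough sketch: failed_pull_refs is a hypothetical helper name (not part of cephadm or TripleO), and it assumes the log lines have the same shape as the excerpt above. Note that the failing pulls reference the image by digest (@sha256:...), while the undercloud listing and controller-0 show it by tag (5-359).

```shell
# Hypothetical helper, not part of cephadm or TripleO.
# Prints each distinct image reference for which the given cephadm log
# recorded a "StatusCode: 404" manifest error, one reference per line.
# Assumes log lines shaped like the excerpt in this report.
failed_pull_refs() {
  # $1: path to the log file (defaults to the usual cephadm log location)
  grep 'StatusCode: 404' "${1:-/var/log/ceph/cephadm.log}" \
    | grep -oE 'initializing source docker://[^ ]+@sha256:[0-9a-f]{64}' \
    | sed 's/^initializing source //' \
    | sort -u
}
```

Example use on a node: sudo cat /var/log/ceph/cephadm.log > /tmp/cephadm.log; failed_pull_refs /tmp/cephadm.log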