Description of problem:
Overcloud deployment fails if the Ceph dashboard is deployed using cephadm. The exact error is:

FATAL | Run pacemaker restart if the config file for the service changed | central-controller-0 | error={"changed": false, "error": "Failed running command", "msg": "Error running /var/lib/container-config-scripts/pacemaker_restart_bundle.sh haproxy haproxy-bundle haproxy-bundle Started. rc: 1, stdout: , stderr: Waiting for the cluster to apply configuration changes (timeout: 600 seconds)...\nresource 'haproxy-bundle' is not running on any node\nWaiting for the cluster to apply configuration changes (timeout: 600 seconds)...\nError: resource 'haproxy-bundle' is not running on any node\nError: Errors have occurred, therefore pcs is unable to continue\n"}

This happens because haproxy fails to bind the socket for the Ceph dashboard proxy:

Starting proxy ceph_dashboard: cannot bind socket (Address already in use) [192.168.24.82:8444]

The bind fails because ceph-mgr is already bound to that port on all interfaces:

# netstat -putna | grep 8444
tcp6 0 0 :::8444 :::* LISTEN 6444/ceph-mgr

ceph-mgr should not be listening on all interfaces (which is the default unless specifically configured); a specific IP address should be set for each ceph-mgr host instead. The specific IP addresses on which the Ceph dashboard module should listen are in fact set for each host in the Ceph cluster:

# ceph config dump | grep server_addr
mgr advanced mgr/dashboard/central-controller-0-pwaxfm/server_addr 172.23.1.11 *
mgr advanced mgr/dashboard/central-controller-1-iczifz/server_addr 172.23.1.42 *
mgr advanced mgr/dashboard/central-controller-2-qxfada/server_addr 172.23.1.25

The problem is that the host names in these keys do not match the name with which ceph-mgr was started:

# ps ax | grep ceph-mgr
6442 ? Ss 0:00 /dev/init -- /usr/bin/ceph-mgr -n mgr.central-controller-0.pwaxfm -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug

The difference is the last "-" versus ".":

central-controller-0-pwaxfm vs. central-controller-0.pwaxfm

It probably comes from this code:
https://opendev.org/openstack/tripleo-ansible/src/branch/master/tripleo_ansible/roles/tripleo_cephadm/tasks/dashboard/configure_dashboard_backends.yml#L17-L33
where the name of the ceph-mgr node is taken from the container name, which contains a dash rather than a dot, whereas the ceph-mgr process is started with a name that includes a dot.

Version-Release number of selected component (if applicable):
tripleo-ansible-3.3.1-0.20220407091528.0bc2994.el9ost.noarch

How reproducible:
Always
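
The dash-versus-dot mismatch described above can be illustrated with a small Python sketch. It is not the tripleo-ansible fix; it only shows one way a dashed name such as central-controller-0-pwaxfm could be mapped to the dotted daemon name that ceph-mgr was started with. The function names are hypothetical, and the sketch assumes that the random suffix after the last dash never contains a dash itself.

```python
def mgr_daemon_id(dashed_name: str) -> str:
    """Replace only the last dash with a dot: 'host-suffix' -> 'host.suffix'.

    Assumption: the trailing random suffix contains no dash, so the last
    dash is the separator between the host name and the suffix.
    """
    host, sep, suffix = dashed_name.rpartition("-")
    if not sep:
        # No dash present: nothing to convert.
        return dashed_name
    return f"{host}.{suffix}"


def server_addr_key(dashed_name: str) -> str:
    """Build the per-daemon dashboard config key using the dotted daemon name."""
    return f"mgr/dashboard/{mgr_daemon_id(dashed_name)}/server_addr"


if __name__ == "__main__":
    # Names taken from the 'ceph config dump' output in this report.
    for name in (
        "central-controller-0-pwaxfm",
        "central-controller-1-iczifz",
        "central-controller-2-qxfada",
    ):
        print(server_addr_key(name))
```

With keys built this way, the name in the server_addr option would match the name passed to ceph-mgr via -n (without the mgr. prefix), so the per-host address would presumably be picked up instead of the all-interfaces default.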
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2022:6543