Description of problem:

My customer deployed RHCS 6.1.3 using RHOSP 17.1.2 director. However, Prometheus cannot start on two of the controller nodes because of a port conflict on 0.0.0.0:9092.

~~~
[ceph: root@overcloud-controller-0 /]# ceph health detail
HEALTH_WARN Failed to place 2 daemon(s)
[WRN] CEPHADM_DAEMON_PLACE_FAIL: Failed to place 2 daemon(s)
    Failed while placing prometheus.overcloud-controller-2 on overcloud-controller-2: cephadm exited with an error code: 1, stderr: Non-zero exit code 125 from /bin/podman container inspect --format {{.State.Status}} ceph-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX-prometheus-overcloud-controller-2
/bin/podman: stderr Error: inspecting object: no such container ceph-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX-prometheus-overcloud-controller-2
Non-zero exit code 125 from /bin/podman container inspect --format {{.State.Status}} ceph-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX-prometheus.overcloud-controller-2
/bin/podman: stderr Error: inspecting object: no such container ceph-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX-prometheus.overcloud-controller-2
Deploy daemon prometheus.overcloud-controller-2 ...
Verifying port 9092 ...
Cannot bind to IP 0.0.0.0 port 9092: [Errno 98] Address already in use
ERROR: TCP Port(s) '9092' required for prometheus already in use
    Failed while placing prometheus.overcloud-controller-0 on overcloud-controller-0: cephadm exited with an error code: 1, stderr: Non-zero exit code 125 from /bin/podman container inspect --format {{.State.Status}} ceph-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX-prometheus-overcloud-controller-0
/bin/podman: stderr Error: inspecting object: no such container ceph-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX-prometheus-overcloud-controller-0
Non-zero exit code 125 from /bin/podman container inspect --format {{.State.Status}} ceph-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX-prometheus.overcloud-controller-0
/bin/podman: stderr Error: inspecting object: no such container ceph-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX-prometheus.overcloud-controller-0
Deploy daemon prometheus.overcloud-controller-0 ...
Verifying port 9092 ...
Cannot bind to IP 0.0.0.0 port 9092: [Errno 98] Address already in use
ERROR: TCP Port(s) '9092' required for prometheus already in use
~~~

The OpenStack HAProxy is already listening on X.X.X.10:9092, which is why Prometheus cannot bind to 0.0.0.0:9092. Interestingly, only the Prometheus instance on controller-1 is listening, on X.X.X.12:9092, and is working correctly.

~~~
controller-0/sos_commands/networking/netstat_-W_-neopa
tcp        0      0 X.X.X.10:9092      0.0.0.0:*      LISTEN      0      55804      9027/haproxy        off (0.00/0/0)

controller-1/sos_commands/networking/netstat_-W_-neopa
tcp        0      0 X.X.X.12:9092      0.0.0.0:*      LISTEN      0      66705      4816/prometheus     off (0.00/0/0)
tcp        0      0 X.X.X.10:9092      0.0.0.0:*      LISTEN      0      25484      9258/haproxy        off (0.00/0/0)

controller-2/sos_commands/networking/netstat_-W_-neopa
tcp        0      0 X.X.X.10:9092      0.0.0.0:*      LISTEN      0      48449      8994/haproxy        off (0.00/0/0)
~~~

I think every Prometheus daemon should listen on a specific IP address such as X.X.X.12:9092, not on 0.0.0.0:9092.

I found a similar bug:

- https://bugzilla.redhat.com/show_bug.cgi?id=2246440

However, there are some differences between this case and the bug above:

- In this case the issue occurs with Prometheus.
- In this case only two Prometheus daemons fail to start.
- In this case the environment uses IPv4.

Do you think this is the same issue as the bug above? Is there a workaround to make Prometheus listen on a specific IP address?
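One workaround I am considering is restricting the prometheus service to the subnet that carries X.X.X.12 via the cephadm service spec. This is only a sketch: it assumes the `networks` field of the service spec is honored by the prometheus daemon in this cephadm build (which may be exactly what the referenced bug is about), and X.X.X.0/24 is a placeholder for the real subnet. The commands themselves (`ceph orch ls --export`, `ceph orch apply -i`, `ceph orch redeploy`) are standard cephadm orchestrator commands.

~~~
# Export the prometheus service spec currently managed by cephadm
ceph orch ls --service-name prometheus --export > prometheus.yaml

# Edit prometheus.yaml and add a "networks" entry, for example
# (X.X.X.0/24 is a placeholder for the subnet carrying X.X.X.12):
#
#   service_type: prometheus
#   service_name: prometheus
#   placement:
#     count: 3
#   networks:
#     - X.X.X.0/24
#   spec:
#     port: 9092
#
# "port: 9092" should match whatever port the director-generated spec
# already uses; do not change it here.

# Re-apply the spec and redeploy the prometheus daemons
ceph orch apply -i prometheus.yaml
ceph orch redeploy prometheus
~~~

Afterwards, `ceph orch ps --daemon-type prometheus` and the listening sockets on each controller should show whether the daemons came up bound to the specific address. If this cephadm version ignores `networks` for prometheus, the spec will apply but the daemon will still try to bind to 0.0.0.0:9092, so this may not help until a fixed build is available.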
Version-Release number of selected component (if applicable):
RHOSP 17.1.2
RHCS 6.1.3 (cephadm-17.2.6-167.el9cp.noarch)

How reproducible:

Steps to Reproduce:
1. Deploy RHCS 6.1.3 using RHOSP 17.1.2 director.

Actual results:
Prometheus on two controller nodes cannot start due to "Cannot bind to IP 0.0.0.0 port 9092: [Errno 98] Address already in use".

Expected results:
Prometheus starts and works correctly on all controller nodes.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Critical: Red Hat Ceph Storage 7.1 security, enhancements, and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:3925