Bug 2264812 - prometheus listens on 0.0.0.0:9092 and conflicts with HAProxy on director-deployed ceph env
Summary: prometheus listens on 0.0.0.0:9092 and conflicts with HAProxy on director-deployed ceph env
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Cephadm
Version: 6.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 7.1
Assignee: Adam King
QA Contact: Mohit Bisht
Docs Contact: Akash Raj
URL:
Whiteboard:
Depends On:
Blocks: 2267614 2298578 2298579
 
Reported: 2024-02-19 04:54 UTC by yatanaka
Modified: 2024-10-01 03:42 UTC
CC: 6 users

Fixed In Version: ceph-18.2.1-87.el9cp
Doc Type: Enhancement
Doc Text:
.Prometheus now binds to an IP within a specific network on a host, rather than always binding to 0.0.0.0

With this enhancement, when a Prometheus specification file includes both a `networks` section listing the network that Prometheus should bind an IP on, and `only_bind_port_on_networks: true` in the `spec` section of the specification, Cephadm configures the Prometheus daemon to bind to an IP within that network rather than to 0.0.0.0. This enables users to use the same port that Prometheus uses for another service on the same host, as long as that service listens on a different IP. If the specification update does not cause the daemons to be redeployed automatically, `ceph orch redeploy prometheus` can be run to pick up the changed settings.

Prometheus specification file:

----
service_type: prometheus
service_name: prometheus
placement:
  count: 1
networks:
- 10.0.208.0/22
spec:
  only_bind_port_on_networks: true
----
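For reference, a minimal sketch of applying such a specification and redeploying the daemon (assuming the file is saved as `prometheus.yaml`; the filename is illustrative):

----
# Apply the updated Prometheus service specification
ceph orch apply -i prometheus.yaml
# Redeploy so the running daemon picks up the new bind settings
ceph orch redeploy prometheus
----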
Clone Of:
Environment:
Last Closed: 2024-06-13 14:26:55 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHCEPH-8327 0 None None None 2024-02-19 04:55:02 UTC
Red Hat Product Errata RHSA-2024:3925 0 None None None 2024-06-13 14:27:02 UTC

Description yatanaka 2024-02-19 04:54:23 UTC
Description of problem:

My customer deployed RHCS 6.1.3 using RHOSP 17.1.2 director.

However, prometheus cannot start due to a conflict on IP 0.0.0.0 port 9092.

~~~
[ceph: root@overcloud-controller-0 /]# ceph health detail
HEALTH_WARN Failed to place 2 daemon(s)
[WRN] CEPHADM_DAEMON_PLACE_FAIL: Failed to place 2 daemon(s)
    Failed while placing prometheus.overcloud-controller-2 on overcloud-controller-2: cephadm exited with an error code: 1, stderr: Non-zero exit code 125 from /bin/podman container inspect --format {{.State.Status}} ceph-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX-prometheus-overcloud-controller-2
/bin/podman: stderr Error: inspecting object: no such container ceph-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX-prometheus-overcloud-controller-2
Non-zero exit code 125 from /bin/podman container inspect --format {{.State.Status}} ceph-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX-prometheus.overcloud-controller-2
/bin/podman: stderr Error: inspecting object: no such container ceph-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX-prometheus.overcloud-controller-2
Deploy daemon prometheus.overcloud-controller-2 ...
Verifying port 9092 ...
Cannot bind to IP 0.0.0.0 port 9092: [Errno 98] Address already in use
ERROR: TCP Port(s) '9092' required for prometheus already in use
    Failed while placing prometheus.overcloud-controller-0 on overcloud-controller-0: cephadm exited with an error code: 1, stderr: Non-zero exit code 125 from /bin/podman container inspect --format {{.State.Status}} ceph-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX-prometheus-overcloud-controller-0
/bin/podman: stderr Error: inspecting object: no such container ceph-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX-prometheus-overcloud-controller-0
Non-zero exit code 125 from /bin/podman container inspect --format {{.State.Status}} ceph-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX-prometheus.overcloud-controller-0
/bin/podman: stderr Error: inspecting object: no such container ceph-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX-prometheus.overcloud-controller-0
Deploy daemon prometheus.overcloud-controller-0 ...
Verifying port 9092 ...
Cannot bind to IP 0.0.0.0 port 9092: [Errno 98] Address already in use
ERROR: TCP Port(s) '9092' required for prometheus already in use
~~~

HAProxy of OpenStack is listening on X.X.X.10:9092.
That's why prometheus cannot listen on 0.0.0.0:9092.
One interesting point is that only the prometheus on controller-1 is listening on X.X.X.12:9092 and working well.

~~~
controller-0/sos_commands/networking/netstat_-W_-neopa
tcp        0      0 X.X.X.10:9092     0.0.0.0:*               LISTEN      0          55804      9027/haproxy         off (0.00/0/0)

controller-1/sos_commands/networking/netstat_-W_-neopa
tcp        0      0 X.X.X.12:9092     0.0.0.0:*               LISTEN      0          66705      4816/prometheus      off (0.00/0/0)
tcp        0      0 X.X.X.10:9092     0.0.0.0:*               LISTEN      0          25484      9258/haproxy         off (0.00/0/0)

controller-2/sos_commands/networking/netstat_-W_-neopa
tcp        0      0 X.X.X.10:9092     0.0.0.0:*               LISTEN      0          48449      8994/haproxy         off (0.00/0/0)
~~~

I suppose all prometheus daemons should listen on a specific IP address like X.X.X.12:9092, not on 0.0.0.0:9092.

I found a similar bug below:

- https://bugzilla.redhat.com/show_bug.cgi?id=2246440

However, there are some differences between this case and the above bug:
- In this case, the issue is occurring with prometheus
- In this case, only two prometheus daemons cannot start
- In this case, the issue is occurring in an IPv4 environment

Do you think this is the same as the above bug?
Is there any workaround to start prometheus on specific IP addresses?


Version-Release number of selected component (if applicable):
RHOSP 17.1.2
RHCS 6.1.3 (cephadm-17.2.6-167.el9cp.noarch)

How reproducible:
Steps to Reproduce:
1. Deploy RHCS 6.1.3 using the RHOSP 17.1.2 director.


Actual results:
Prometheus on two controller nodes cannot start due to "Cannot bind to IP 0.0.0.0 port 9092: [Errno 98] Address already in use"

Expected results:
Prometheus works well.

Comment 11 errata-xmlrpc 2024-06-13 14:26:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Critical: Red Hat Ceph Storage 7.1 security, enhancements, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:3925

