Bug 2356354
| Summary: | Skip port conflict check in case of RGW | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | John Fulton <johfulto> |
| Component: | Cephadm | Assignee: | Adam King <adking> |
| Status: | POST | QA Contact: | Sayalee <saraut> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 5.3 | CC: | cephqe-warriors, mcaldeir, mobisht |
| Target Milestone: | --- | | |
| Target Release: | 5.3z9 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
Description
John Fulton
2025-03-31 21:33:25 UTC
How does this impact OpenStack customers who are running Ceph RGW on their OpenStack control plane?

In OpenStack, Ceph often runs on the controller nodes, which also run haproxy. The controller nodes have multiple interfaces, and haproxy listens on the interface used for API communications. Ceph services should only listen on the storage (Ceph public) or storage management (Ceph private) networks. However, some Ceph services appear to try to listen on 0.0.0.0, and when they do, they run into port conflicts with the ports haproxy has open on the OpenStack API network.

In this specific situation, the RGW spec is configured to bring RGW up on the Ceph network:

---
[ceph: root@host42 /]# ceph orch ls --export
(...)
service_type: rgw
service_id: host42
service_name: rgw.host42
placement:
  count_per_host: 1
  hosts:
  - host42
networks:
- 10.1.42.0/24
- 10.0.42.0/24
- 10.2.42.0/24
extra_container_args:
- -v
- /etc/pki/ca-trust:/etc/pki/ca-trust:ro
spec:
  rgw_frontend_port: 8080
---
service_type: rgw
service_id: host43
service_name: rgw.host43
placement:
  count_per_host: 1
  hosts:
  - host43
networks:
- 10.1.42.0/24
- 10.0.42.0/24
- 10.2.42.0/24
extra_container_args:
- -v
- /etc/pki/ca-trust:/etc/pki/ca-trust:ro
spec:
  rgw_frontend_port: 8080
---
service_type: rgw
service_id: host44
service_name: rgw.host44
placement:
  count_per_host: 1
  hosts:
  - host44
networks:
- 10.1.42.0/24
- 10.0.42.0/24
- 10.2.42.0/24
extra_container_args:
- -v
- /etc/pki/ca-trust:/etc/pki/ca-trust:ro
spec:
  rgw_frontend_port: 8080
---

However, error messages in 'ceph health detail' show that cephadm attempts to bind on *:8080, which conflicts with haproxy:

---
[ceph: root@host42 /]# ceph health detail
HEALTH_WARN Failed to place 1 daemon(s); 3 failed cephadm daemon(s)
[WRN] CEPHADM_DAEMON_PLACE_FAIL: Failed to place 1 daemon(s)
    Failed while placing rgw.host42.host42.foo on host42: cephadm exited with an error code: 1, stderr:
    Non-zero exit code 125 from /bin/podman container inspect --format {{.State.Status}} ceph-5ffc7906-2722-4602-9478-e2fe6ad3ff49-rgw-host42-host42-foo
    /bin/podman: stderr Error: error inspecting object: no such container ceph-5ffc7906-2722-4602-9478-e2fe6ad3ff49-rgw-host42-host42-foo
    Non-zero exit code 125 from /bin/podman container inspect --format {{.State.Status}} ceph-5ffc7906-2722-4602-9478-e2fe6ad3ff49-rgw.host42.host42.foo
    /bin/podman: stderr Error: error inspecting object: no such container ceph-5ffc7906-2722-4602-9478-e2fe6ad3ff49-rgw.host42.host42.foo
    Deploy daemon rgw.host42.host42.foo ...
    Verifying port 8080 ...
    Cannot bind to IP 0.0.0.0 port 8080: [Errno 98] Address already in use
    ERROR: TCP Port(s) '8080' required for rgw already in use
---
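The failure mode is easy to reproduce outside of cephadm. The following is a minimal standalone sketch (not cephadm's actual code) of a bind-based port probe; it assumes a host where haproxy holds port 8080 on the API interface while the storage-network IP from this report (10.0.42.20) is free:

---
# Minimal sketch, not cephadm source: a pre-deployment port probe that
# tests availability by attempting to bind the address itself.
import socket

def port_free(ip: str, port: int) -> bool:
    """Return True if ip:port can be bound, False if it is in use."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((ip, port))
            return True
        except OSError:
            # e.g. [Errno 98] Address already in use, as in the log above
            return False

# A wildcard probe collides with ANY listener on the port, including
# haproxy on the OpenStack API interface:
print(port_free("0.0.0.0", 8080))     # False while haproxy holds the port

# Probing the address RGW will actually use (10.0.42.20 in this report)
# succeeds, since nothing listens on 8080 on the storage network:
print(port_free("10.0.42.20", 8080))  # True on the affected controller
---

This illustrates why the check fails even though the address RGW is configured to use is free.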
To work around the issue, we temporarily stopped haproxy with 'pcs resource disable haproxy-bundle'. Once haproxy was stopped, RGW started on its own and bound to the expected network instead of 0.0.0.0:

---
[ceph: root@host42 /]# ceph orch ps
(...)
rgw.host42.host42.qfeedh  host42  10.0.42.20:8080  running (62s)  58s ago  62s  60.1M  -  16.2.10-275.el8cp  d7a74ab527fa  b60d550cdc91
rgw.host43.host43.ykpwef  host43  10.0.42.21:8080  running (65s)  58s ago  64s  58.9M  -  16.2.10-275.el8cp  d7a74ab527fa  ddea7b33bfc9
rgw.host44.host44.tsepgo  host44  10.0.42.22:8080  running (56s)  51s ago  55s  62.2M  -  16.2.10-275.el8cp  d7a74ab527fa  c1e87e8744ce
---

It appears that something checks availability on 0.0.0.0:8080 while RGW is coming up, before the daemon actually binds to the network(s) specified in the spec, and if that check fails the daemon does not start. This BZ tracks removing that unnecessary check, since RGW can start on the IP specified in the spec.
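One possible shape for the fix, sketched below with illustrative names (this is not the actual cephadm implementation): when an RGW spec pins the daemon to specific networks, derive the probe addresses from those networks instead of probing the wildcard address. The API-network address 192.0.2.10 is a hypothetical value used only for illustration:

---
# Hypothetical sketch of the requested behaviour; function and variable
# names are illustrative, not cephadm's.
import ipaddress

def addrs_to_check(service_type: str, networks: list[str],
                   host_ips: list[str]) -> list[str]:
    """Pick port-probe addresses from the host IPs that fall inside the
    networks pinned in the service spec; fall back to the wildcard."""
    if service_type == "rgw" and networks:
        nets = [ipaddress.ip_network(n) for n in networks]
        return [ip for ip in host_ips
                if any(ipaddress.ip_address(ip) in net for net in nets)]
    return ["0.0.0.0"]  # legacy behaviour: probe every interface

# With the spec from this report, only the storage-network IP would be
# probed, so haproxy's listener on the API network no longer blocks RGW:
print(addrs_to_check("rgw",
                     ["10.1.42.0/24", "10.0.42.0/24", "10.2.42.0/24"],
                     ["10.0.42.20", "192.0.2.10"]))
# -> ['10.0.42.20']
---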