Bug 2262134 - rook-ceph-mon pods listen to both 3300 and 6789 port
Summary: rook-ceph-mon pods listen to both 3300 and 6789 port
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.14
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ODF 4.16.0
Assignee: Travis Nielsen
QA Contact: Itzhak
URL:
Whiteboard:
Depends On:
Blocks: 2260844
Reported: 2024-01-31 18:39 UTC by Eran Tamir
Modified: 2024-07-17 13:13 UTC (History)
CC: 6 users

Fixed In Version: 4.16.0-92
Doc Type: Bug Fix
Doc Text:
.Rook-ceph-mon pods listen on both ports 3300 and 6789
Previously, when a cluster was deployed with MSGRv2, the mon pods unnecessarily listened on port 6789 for MSGR1 traffic. With this fix, the mon daemons start with flags that suppress listening on the v1 port 6789 and listen exclusively on the v2 port 3300, thereby reducing the attack surface.
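The fix described above works by passing a daemon flag to the mons. A minimal sketch of how the flag could be checked, assuming the fix surfaces Ceph's `ms_bind_msgr1` option as a `--ms-bind-msgr1=false` command-line flag (the exact flag name is an assumption based on the Ceph option, not confirmed in this report):

```shell
# Check whether a mon's command line disables the msgr v1 port.
# The sample args below are hypothetical, shown only to illustrate the grep;
# on a live cluster the args would come from the deployment, e.g.:
#   oc get deploy rook-ceph-mon-a -o jsonpath='{.spec.template.spec.containers[0].args}'
sample_args='ceph-mon --fsid=96146cd1-acfa-47e9-bb80-10087b75c44b --ms-bind-msgr1=false --public-addr=172.30.207.86'

if printf '%s' "$sample_args" | grep -q -- '--ms-bind-msgr1=false'; then
  echo "msgr1 disabled"
else
  echo "msgr1 still enabled"
fi
```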
Clone Of:
Environment:
Last Closed: 2024-07-17 13:12:52 UTC
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github red-hat-storage rook pull 640 0 None open Bug 2262134: mon: Disable v1 port if msgr2 is required 2024-05-01 22:07:12 UTC
Github rook rook pull 14147 0 None open mon: Disable the msgr v1 port if msgr2 is required 2024-05-01 18:31:51 UTC
Red Hat Product Errata RHSA-2024:4591 0 None None None 2024-07-17 13:13:02 UTC

Description Eran Tamir 2024-01-31 18:39:16 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

In a cluster deployed with MSGRv2, both ports 3300 and 6789 are open. 

Version of all relevant components (if applicable):

4.13, 4.14

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?

No
Rate the complexity of the scenario that caused this bug from 1 (very simple)
to 5 (very complex):


Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy a cluster with MSGRv2 and scan for open ports.
2. Alternatively, check the mon logs.
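The log check in step 2 can be scripted by grepping the mon startup line for a legacy v1 bind address. A sketch, using the actual startup line from this report as input (on a live cluster, pipe `oc logs <mon-pod> -c mon` through the same grep):

```shell
# Detect a legacy msgr v1 bind (port 6789) in a mon startup line.
# The sample line is the real log captured in this report's "Actual results".
log_line='debug 2024-01-31T18:24:36.763+0000 ffffb1ea4040  0 starting mon.a rank 0 at public addrs v2:10.111.162.89:3300/0 at bind addrs [v2:10.244.0.10:3300/0,v1:10.244.0.10:6789/0] mon_data /var/lib/ceph/mon/ceph-a fsid 6e5077ea-85bf-4afc-8869-852fc4f5e046'

if printf '%s\n' "$log_line" | grep -q 'v1:[0-9.]*:6789'; then
  echo "BUG: mon also binds the v1 port 6789"
else
  echo "OK: mon binds only the v2 port 3300"
fi
```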

Actual results:

debug 2024-01-31T18:24:36.763+0000 ffffb1ea4040  0 starting mon.a rank 0 at public addrs v2:10.111.162.89:3300/0 at bind addrs [v2:10.244.0.10:3300/0,v1:10.244.0.10:6789/0] mon_data /var/lib/ceph/mon/ceph-a fsid 6e5077ea-85bf-4afc-8869-852fc4f5e046

Expected results:
Only port 3300 should be open.

Additional info:

Comment 14 Itzhak 2024-06-04 14:53:57 UTC
I didn't succeed in running the commands in https://bugzilla.redhat.com/show_bug.cgi?id=2262134#c10. 
I got this output:
$ oc rsh rook-ceph-mon-a-77cf76f8f8-4sstf 
Defaulted container "mon" out of: mon, log-collector, chown-container-data-dir (init), init-mon-fs (init)
sh-5.1# yum install net-tools -y
sh: yum: command not found
sh-5.1# dnf install net-tools -y
sh: dnf: command not found
sh-5.1# 
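Since `yum`/`dnf` and net-tools are unavailable in the mon container, one alternative that needs no extra packages is bash's `/dev/tcp` pseudo-device. A sketch (the addresses and ports to probe inside the pod are up to the tester):

```shell
# Probe a TCP port using bash's built-in /dev/tcp redirection,
# avoiding the need for netstat/ss/nmap inside the container.
probe() {
  local host=$1 port=$2
  if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
    echo "$host:$port open"
  else
    echo "$host:$port closed"
  fi
}

# Inside the mon pod one would probe its own ports, e.g.:
#   probe 127.0.0.1 3300   # expected open after the fix
#   probe 127.0.0.1 6789   # expected closed after the fix
probe 127.0.0.1 1   # demo: a port that should normally be closed
```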

Anyway, I checked the mon logs, and I saw that the port was 3300, as expected.
$ oc logs rook-ceph-mon-a-77cf76f8f8-4sstf -c mon | grep "at bind addrs"
debug 2024-06-04T10:23:30.922+0000 7f9adf119b00  0 starting mon.a rank 0 at public addrs v2:172.30.207.86:3300/0 at bind addrs v2:10.128.2.35:3300/0 mon_data /var/lib/ceph/mon/ceph-a fsid 96146cd1-acfa-47e9-bb80-10087b75c44b

$ oc logs rook-ceph-mon-b-5676d6b8b6-5r9jn -c mon | grep "at bind addrs"
debug 2024-06-04T10:24:03.082+0000 7f2850268b00  0 starting mon.b rank 0 at public addrs v2:172.30.155.101:3300/0 at bind addrs v2:10.129.2.25:3300/0 mon_data /var/lib/ceph/mon/ceph-b fsid 96146cd1-acfa-47e9-bb80-10087b75c44b

$ oc logs rook-ceph-mon-c-d86f4d766-qftbv -c mon | grep "at bind addrs"
debug 2024-06-04T10:24:23.564+0000 7fa150c01b00  0 starting mon.c rank 1 at public addrs v2:172.30.189.25:3300/0 at bind addrs v2:10.131.0.24:3300/0 mon_data /var/lib/ceph/mon/ceph-c fsid 96146cd1-acfa-47e9-bb80-10087b75c44b


Let me know if this suffices.

Comment 15 Travis Nielsen 2024-06-04 17:14:20 UTC
The tools can be tricky to install in different container environments (even upstream vs downstream).
The mon logs do show the expected binding, so I do see that as sufficient, thanks!

Comment 16 Itzhak 2024-06-05 08:12:47 UTC
Okay, thanks for the clarification. 

The steps I performed to verify the BZ:
1. Deploy an IBMCloud 4.16 cluster.
2. Check the logs in the rook-ceph-mon pods and verify that we see only the 3300 port:
$ oc logs rook-ceph-mon-a-77cf76f8f8-4sstf -c mon | grep "at bind addrs"
debug 2024-06-04T10:23:30.922+0000 7f9adf119b00  0 starting mon.a rank 0 at public addrs v2:172.30.207.86:3300/0 at bind addrs v2:10.128.2.35:3300/0 mon_data /var/lib/ceph/mon/ceph-a fsid 96146cd1-acfa-47e9-bb80-10087b75c44b

$ oc logs rook-ceph-mon-b-5676d6b8b6-5r9jn -c mon | grep "at bind addrs"
debug 2024-06-04T10:24:03.082+0000 7f2850268b00  0 starting mon.b rank 0 at public addrs v2:172.30.155.101:3300/0 at bind addrs v2:10.129.2.25:3300/0 mon_data /var/lib/ceph/mon/ceph-b fsid 96146cd1-acfa-47e9-bb80-10087b75c44b

$ oc logs rook-ceph-mon-c-d86f4d766-qftbv -c mon | grep "at bind addrs"
debug 2024-06-04T10:24:23.564+0000 7fa150c01b00  0 starting mon.c rank 1 at public addrs v2:172.30.189.25:3300/0 at bind addrs v2:10.131.0.24:3300/0 mon_data /var/lib/ceph/mon/ceph-c fsid 96146cd1-acfa-47e9-bb80-10087b75c44b
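The three per-pod checks above can be rolled into one pass that fails if any mon still binds a v1 address. A sketch using the startup lines captured in this run as input (on a live cluster, the here-doc would be replaced by `oc logs` output for each mon pod, e.g. pods matching the `app=rook-ceph-mon` label):

```shell
# Count v1 bind addresses across all mon startup lines; expect zero.
# Live input would be gathered with something like:
#   for p in $(oc get po -l app=rook-ceph-mon -o name); do
#     oc logs "$p" -c mon | grep "at bind addrs"
#   done
v1_count=$(grep -c 'v1:' <<'EOF'
debug 2024-06-04T10:23:30.922+0000 7f9adf119b00  0 starting mon.a rank 0 at public addrs v2:172.30.207.86:3300/0 at bind addrs v2:10.128.2.35:3300/0 mon_data /var/lib/ceph/mon/ceph-a fsid 96146cd1-acfa-47e9-bb80-10087b75c44b
debug 2024-06-04T10:24:03.082+0000 7f2850268b00  0 starting mon.b rank 0 at public addrs v2:172.30.155.101:3300/0 at bind addrs v2:10.129.2.25:3300/0 mon_data /var/lib/ceph/mon/ceph-b fsid 96146cd1-acfa-47e9-bb80-10087b75c44b
debug 2024-06-04T10:24:23.564+0000 7fa150c01b00  0 starting mon.c rank 1 at public addrs v2:172.30.189.25:3300/0 at bind addrs v2:10.131.0.24:3300/0 mon_data /var/lib/ceph/mon/ceph-c fsid 96146cd1-acfa-47e9-bb80-10087b75c44b
EOF
)
if [ "$v1_count" -eq 0 ]; then
  echo "PASS: no mon binds a v1 address"
else
  echo "FAIL: $v1_count mon(s) still bind a v1 address"
fi
```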

Link to the Jenkins job: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/38285/.

Versions:

OC version:
Client Version: 4.10.24
Server Version: 4.16.0-0.nightly-2024-06-03-060250
Kubernetes Version: v1.29.5+87992f4


OCS version:
ocs-operator.v4.16.0-118.stable              OpenShift Container Storage        4.16.0-118.stable              Succeeded

Cluster version
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-0.nightly-2024-06-03-060250   True        False         5h11m   Cluster version is 4.16.0-0.nightly-2024-06-03-060250

Rook version:
2024/06/04 15:21:14 maxprocs: Leaving GOMAXPROCS=16: CPU quota undefined
rook: v4.16.0-0.a2396a5186cc038b22154e857e0f7865e709d06a
go: go1.21.9 (Red Hat 1.21.9-1.el9_4)

Ceph version:
ceph version 18.2.1-188.el9cp (b1ae9c989e2f41dcfec0e680c11d1d9465b1db0e) reef (stable)

Comment 18 errata-xmlrpc 2024-07-17 13:12:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:4591

