Bug 2262134

Summary: rook-ceph-mon pods listen on both ports 3300 and 6789
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Eran Tamir <etamir>
Component: rook
Assignee: Travis Nielsen <tnielsen>
Status: CLOSED ERRATA
QA Contact: Itzhak <ikave>
Severity: medium
Priority: unspecified
Version: 4.14
CC: asriram, kbg, mrajanna, odf-bz-bot, rzarzyns, tnielsen
Keywords: AutomationBackLog
Target Release: ODF 4.16.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: 4.16.0-92
Doc Type: Bug Fix
Doc Text:
.Rook-ceph-mon pods listen on both ports 3300 and 6789

Previously, when a cluster was deployed with MSGRv2, the mon pods also listened on port 6789 for MSGRv1 traffic, which was unnecessary. With this fix, the mon daemons start with flags that suppress listening on the v1 port 6789, so they listen only on the v2 port 3300, which reduces the attack surface.
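The Doc Text does not name the flags. A minimal sketch of the idea, assuming the flag is Ceph's `ms_bind_msgr1` option passed on the command line as `--ms-bind-msgr1=false`, and that Rook applies it when msgr2 is required (the CephCluster setting `spec.network.connections.requireMsgr2`). The helpers `build_mon_args` and `expected_ports` are hypothetical and are not Rook's code; they only show how one extra flag changes the set of bound ports.

```python
"""Hypothetical sketch: mon flags and the ports they imply.

Assumption: Ceph's ms_bind_msgr1 option controls whether a daemon binds
the v1 port (6789); the v2 port (3300) stays bound either way.
"""

MSGR2_PORT = 3300  # messenger v2 port
MSGR1_PORT = 6789  # legacy messenger v1 port
NO_V1_FLAG = "--ms-bind-msgr1=false"  # assumed flag name, see lead-in


def build_mon_args(mon_id: str, require_msgr2: bool) -> list[str]:
    """Return a hypothetical subset of ceph-mon arguments for one mon."""
    args = ["--id", mon_id]
    if require_msgr2:
        # Suppress the v1 listener so only the v2 port is bound.
        args.append(NO_V1_FLAG)
    return args


def expected_ports(args: list[str]) -> set[int]:
    """Ports the mon is expected to bind, given the arguments above."""
    ports = {MSGR2_PORT}
    if NO_V1_FLAG not in args:
        ports.add(MSGR1_PORT)
    return ports
```

With `require_msgr2=True` the expected set is `{3300}`; without it, the mon also binds 6789, which matches the behavior reported in this bug.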
Last Closed: 2024-07-17 13:12:52 UTC
Type: Bug
Bug Blocks: 2260844    

Description Eran Tamir 2024-01-31 18:39:16 UTC
Description of problem (please be as detailed as possible and provide log snippets):

In a cluster deployed with MSGRv2, the mon pods listen on both port 3300 (msgr v2) and port 6789 (msgr v1).

Version of all relevant components (if applicable):

4.13, 4.14

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?

No
Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy a cluster with MSGRv2 and scan the mon pods for open ports.
2. Alternatively, check the mon logs for the bind addresses.
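Step 1 can be sketched as a plain TCP connect check against the two messenger ports. This is a minimal sketch, assuming it runs from a host that can reach the pod network; `is_port_open` and `scan_mon_ports` are hypothetical helpers, not part of any ODF tooling.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat the port as not open.
        return False


def scan_mon_ports(host: str, ports=(3300, 6789)) -> dict[int, bool]:
    """Check the msgr v2 (3300) and v1 (6789) ports on one mon address."""
    return {port: is_port_open(host, port) for port in ports}
```

For example, `scan_mon_ports("10.244.0.10")` against the mon pod IP from the log line under Actual results would be expected to report both ports open on an affected cluster, and only 3300 once the fix is in place.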

Actual results:

debug 2024-01-31T18:24:36.763+0000 ffffb1ea4040  0 starting mon.a rank 0 at public addrs v2:10.111.162.89:3300/0 at bind addrs [v2:10.244.0.10:3300/0,v1:10.244.0.10:6789/0] mon_data /var/lib/ceph/mon/ceph-a fsid 6e5077ea-85bf-4afc-8869-852fc4f5e046

Expected results:
Only port 3300 should be open.
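The difference between the actual and expected results lies in the `at bind addrs` field of the mon startup line: a bracketed list with a `v1:...:6789` entry when both ports are bound, or a single `v2:...:3300` address otherwise. A minimal parser for that field, assuming IPv4 addresses as in the logs here; `parse_bind_addrs` and `binds_only_msgr2` are hypothetical names.

```python
import re

# "at bind addrs" is followed by either [addr,addr,...] or a single addr.
BIND_RE = re.compile(r"at bind addrs (\[[^\]]*\]|\S+)")
# One entity address, e.g. "v2:10.244.0.10:3300/0" (protocol:ip:port/nonce).
ADDR_RE = re.compile(r"^(v[12]):(.+):(\d+)/\d+$")


def parse_bind_addrs(line: str) -> list[tuple[str, str, int]]:
    """Return (protocol, ip, port) tuples from a 'starting mon' log line."""
    match = BIND_RE.search(line)
    if match is None:
        return []
    raw = match.group(1).strip("[]")
    result = []
    for addr in raw.split(","):
        m = ADDR_RE.match(addr)
        if m:
            proto, ip, port = m.groups()
            result.append((proto, ip, int(port)))
    return result


def binds_only_msgr2(line: str) -> bool:
    """True if the line has bind addresses and all are msgr v2 on port 3300."""
    addrs = parse_bind_addrs(line)
    return bool(addrs) and all(p == "v2" and port == 3300 for p, _, port in addrs)
```

Applied to the Actual results line, `parse_bind_addrs` yields a `v2` entry on 3300 and a `v1` entry on 6789, so `binds_only_msgr2` returns False.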

Additional info:

Comment 14 Itzhak 2024-06-04 14:53:57 UTC
I wasn't able to run the commands from https://bugzilla.redhat.com/show_bug.cgi?id=2262134#c10.
I got this output:
$ oc rsh rook-ceph-mon-a-77cf76f8f8-4sstf 
Defaulted container "mon" out of: mon, log-collector, chown-container-data-dir (init), init-mon-fs (init)
sh-5.1# yum install net-tools -y
sh: yum: command not found
sh-5.1# dnf install net-tools -y
sh: dnf: command not found
sh-5.1# 

Instead, I checked the mon logs and confirmed that each mon binds only to port 3300, as expected:
$ oc logs rook-ceph-mon-a-77cf76f8f8-4sstf -c mon | grep "at bind addrs"
debug 2024-06-04T10:23:30.922+0000 7f9adf119b00  0 starting mon.a rank 0 at public addrs v2:172.30.207.86:3300/0 at bind addrs v2:10.128.2.35:3300/0 mon_data /var/lib/ceph/mon/ceph-a fsid 96146cd1-acfa-47e9-bb80-10087b75c44b

$ oc logs rook-ceph-mon-b-5676d6b8b6-5r9jn -c mon | grep "at bind addrs"
debug 2024-06-04T10:24:03.082+0000 7f2850268b00  0 starting mon.b rank 0 at public addrs v2:172.30.155.101:3300/0 at bind addrs v2:10.129.2.25:3300/0 mon_data /var/lib/ceph/mon/ceph-b fsid 96146cd1-acfa-47e9-bb80-10087b75c44b

$ oc logs rook-ceph-mon-c-d86f4d766-qftbv -c mon | grep "at bind addrs"
debug 2024-06-04T10:24:23.564+0000 7fa150c01b00  0 starting mon.c rank 1 at public addrs v2:172.30.189.25:3300/0 at bind addrs v2:10.131.0.24:3300/0 mon_data /var/lib/ceph/mon/ceph-c fsid 96146cd1-acfa-47e9-bb80-10087b75c44b


Let me know if this is sufficient.

Comment 15 Travis Nielsen 2024-06-04 17:14:20 UTC
The tools can be tricky to install in different container environments (even upstream vs downstream).
The mon logs do show the expected binding, so I do see that as sufficient, thanks!
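Since the mon image has no package manager, another way to see listening sockets without net-tools is to read the kernel's socket table, for example with `oc exec <mon-pod> -c mon -- cat /proc/net/tcp /proc/net/tcp6` (assuming `cat` is present in the image). Unless the mon uses host networking, this shows only the pod's own sockets. A minimal parser, with the hypothetical name `listening_ports`: in these files the local port is the hex number after the colon in `local_address`, and state `0A` means LISTEN (3300 is `0CE4`, 6789 is `1A85`).

```python
TCP_LISTEN = "0A"  # state code for LISTEN in /proc/net/tcp and tcp6


def listening_ports(proc_net_tcp: str) -> set[int]:
    """Return local ports in LISTEN state from /proc/net/tcp(6) text."""
    ports = set()
    for line in proc_net_tcp.splitlines()[1:]:  # skip the first header row
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state = fields[1], fields[3]
        # A second header row (e.g. from tcp6) has state "st" and is skipped.
        if state == TCP_LISTEN:
            # local_address is "<hex ip>:<hex port>"; the port is plain hex.
            ports.add(int(local_address.rsplit(":", 1)[1], 16))
    return ports
```

On a fixed mon, the result would be expected to include 3300 and not 6789.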

Comment 16 Itzhak 2024-06-05 08:12:47 UTC
Okay, thanks for the clarification. 

The steps I followed to test the BZ:
1. Deploy an IBMCloud 4.16 cluster.
2. Check the logs of the rook-ceph-mon pods and verify that only port 3300 appears in the bind addresses:
$ oc logs rook-ceph-mon-a-77cf76f8f8-4sstf -c mon | grep "at bind addrs"
debug 2024-06-04T10:23:30.922+0000 7f9adf119b00  0 starting mon.a rank 0 at public addrs v2:172.30.207.86:3300/0 at bind addrs v2:10.128.2.35:3300/0 mon_data /var/lib/ceph/mon/ceph-a fsid 96146cd1-acfa-47e9-bb80-10087b75c44b

$ oc logs rook-ceph-mon-b-5676d6b8b6-5r9jn -c mon | grep "at bind addrs"
debug 2024-06-04T10:24:03.082+0000 7f2850268b00  0 starting mon.b rank 0 at public addrs v2:172.30.155.101:3300/0 at bind addrs v2:10.129.2.25:3300/0 mon_data /var/lib/ceph/mon/ceph-b fsid 96146cd1-acfa-47e9-bb80-10087b75c44b

$ oc logs rook-ceph-mon-c-d86f4d766-qftbv -c mon | grep "at bind addrs"
debug 2024-06-04T10:24:23.564+0000 7fa150c01b00  0 starting mon.c rank 1 at public addrs v2:172.30.189.25:3300/0 at bind addrs v2:10.131.0.24:3300/0 mon_data /var/lib/ceph/mon/ceph-c fsid 96146cd1-acfa-47e9-bb80-10087b75c44b
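Step 2 can be scripted without hard-coding pod names. A minimal sketch, assuming ODF's default namespace `openshift-storage` and that Rook labels mon pods with `app=rook-ceph-mon`; `run_oc`, `mon_bind_lines`, and `mons_failing_check` are hypothetical helpers, and the `run` parameter allows substituting a fake `oc` for testing.

```python
import subprocess
from typing import Callable

Runner = Callable[[list[str]], str]


def run_oc(args: list[str]) -> str:
    """Run an oc command and return its stdout (raises on failure)."""
    return subprocess.run(["oc", *args], check=True, capture_output=True, text=True).stdout


def mon_bind_lines(namespace: str = "openshift-storage", run: Runner = run_oc) -> dict[str, str]:
    """Map each mon pod to its last 'at bind addrs' log line ('' if none)."""
    pods = run(["get", "pods", "-n", namespace, "-l", "app=rook-ceph-mon", "-o", "name"]).split()
    result = {}
    for pod in pods:
        logs = run(["logs", "-n", namespace, pod, "-c", "mon"])
        lines = [ln for ln in logs.splitlines() if "at bind addrs" in ln]
        # Keep the most recent start line in the current container log.
        result[pod] = lines[-1] if lines else ""
    return result


def mons_failing_check(bind_lines: dict[str, str]) -> list[str]:
    """Pods with no bind line, or whose bind addresses include msgr v1."""
    failing = []
    for pod, line in bind_lines.items():
        # Isolate the text between "at bind addrs" and " mon_data".
        bind_part = line.split("at bind addrs", 1)[-1].split(" mon_data", 1)[0]
        if not line or "v1:" in bind_part:
            failing.append(pod)
    return failing
```

An empty list from `mons_failing_check(mon_bind_lines())` would correspond to the manual result above, where every mon binds only `v2:...:3300`.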

Link to the Jenkins job: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/38285/.

Versions:

OC version:
Client Version: 4.10.24
Server Version: 4.16.0-0.nightly-2024-06-03-060250
Kubernetes Version: v1.29.5+87992f4


OCS version:
ocs-operator.v4.16.0-118.stable              OpenShift Container Storage        4.16.0-118.stable              Succeeded

Cluster version
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-0.nightly-2024-06-03-060250   True        False         5h11m   Cluster version is 4.16.0-0.nightly-2024-06-03-060250

Rook version:
2024/06/04 15:21:14 maxprocs: Leaving GOMAXPROCS=16: CPU quota undefined
rook: v4.16.0-0.a2396a5186cc038b22154e857e0f7865e709d06a
go: go1.21.9 (Red Hat 1.21.9-1.el9_4)

Ceph version:
ceph version 18.2.1-188.el9cp (b1ae9c989e2f41dcfec0e680c11d1d9465b1db0e) reef (stable)

Comment 18 errata-xmlrpc 2024-07-17 13:12:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:4591