Bug 2108970

Summary: Monitor does not honor the setting of 'ms bind msgr1 = false' and binds to v1 port anyway
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Brad Hubbard <bhubbard>
Component: RADOS
Assignee: Brad Hubbard <bhubbard>
Status: NEW
QA Contact: Pawan <pdhiran>
Severity: medium
Priority: medium
Version: 5.0
CC: akupczyk, amathuri, bhubbard, ceph-eng-bugs, cephqe-warriors, choffman, jeff.a.smith, ksirivad, lflores, nojha, pdhange, radhika.chirra, rfriedma, rzarzyns, skanta, sseshasa, vumrao
Target Milestone: ---   
Target Release: 7.1   
Hardware: Unspecified   
OS: Unspecified   
Doc Type: If docs needed, set a value
Type: Bug

Description Brad Hubbard 2022-07-20 07:33:55 UTC
Description of problem:

The other daemons do not bind to a v1 port when 'ms bind msgr1 = false' is set; however, the mon does not honor this config setting and binds to a v1 port regardless.


Version-Release number of selected component (if applicable):
16.2.0-143.el8cp (but current main branch HEAD exhibits the same behavior)

How reproducible:
100%

Steps to Reproduce:
1. Set 'ms bind msgr1 = false' in ceph.conf

Actual results:

tcp        0      0 localhost:40991         0.0.0.0:*               LISTEN      1543641/ceph-mon    
tcp        0      0 localhost:40992         0.0.0.0:*               LISTEN      1543641/ceph-mon 

Expected results:

tcp        0      0 localhost:40991         0.0.0.0:*               LISTEN      1543641/ceph-mon

Additional info:
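
The difference between the actual and expected results is the number of listening sockets held by ceph-mon. A minimal sketch of how that check could be automated, assuming the `netstat -tlpn` column layout shown above; the helper name `mon_listen_ports` is hypothetical and not part of any Ceph tooling:

```python
from typing import List


def mon_listen_ports(netstat_output: str) -> List[int]:
    """Return the sorted local ports on which ceph-mon is listening.

    Assumes the seven-column `netstat -tlpn` layout:
    proto, recv-q, send-q, local address, foreign address, state, pid/program.
    """
    ports = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 7 or fields[5] != "LISTEN":
            continue
        # The last column looks like "1543641/ceph-mon" or "597319/bin/ceph-mon".
        if not fields[6].endswith("ceph-mon"):
            continue
        # The local address is "host:port"; take everything after the last colon.
        ports.append(int(fields[3].rsplit(":", 1)[1]))
    return sorted(ports)


sample = """\
tcp        0      0 localhost:40991         0.0.0.0:*               LISTEN      1543641/ceph-mon
tcp        0      0 localhost:40992         0.0.0.0:*               LISTEN      1543641/ceph-mon
"""

ports = mon_listen_ports(sample)
if len(ports) > 1:
    print("ceph-mon is listening on more than one port:", ports)
```

Note that netstat alone does not say which port is v1 and which is v2; the sketch only counts the listeners.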

Comment 1 Brad Hubbard 2022-08-02 03:32:15 UTC
This is achievable on a new cluster by making sure that the initial monmap is
created with only a v2 address. This can be done either by passing
"--addv a v2:127.0.0.1:40564" when creating the initial monmap, or by
specifying something similar to the following in the initial ceph.conf:

mon host =  v2:127.0.0.1:40261

Of course, a real deployment would use a real address:port.

In the case of an already established cluster it's currently a little trickier,
as ms_bind_msgr1 is essentially ignored by the monitor if it finds its own
address in the current monmap at startup [0]. This can be worked around by
editing the existing monmap, injecting it back into the cluster, and then
restarting the monitor. For the monitor mon.a it looks something like this:

# ceph mon getmap -o monmap.bin
got monmap epoch 1
# monmaptool --print monmap.bin
monmaptool: monmap file monmap.bin
epoch 1
fsid 42d25439-4eea-4014-8720-74d7a48ceeef
last_changed 2022-08-01T12:41:02.534665+1000
created 2022-08-01T12:41:02.534665+1000
min_mon_release 17 (quincy)
election_strategy: 1
0: [v2:127.0.0.1:40259/0,v1:127.0.0.1:40260/0] mon.a
# monmaptool --rm a monmap.bin
monmaptool: monmap file monmap.bin
monmaptool: removing a
monmaptool: writing epoch 1 to monmap.bin (0 monitors)
# monmaptool --addv a [v2:127.0.0.1:40259/0] monmap.bin
monmaptool: monmap file monmap.bin
monmaptool: writing epoch 1 to monmap.bin (1 monitors)
# monmaptool --print monmap.bin
monmaptool: monmap file monmap.bin
epoch 1
fsid 42d25439-4eea-4014-8720-74d7a48ceeef
last_changed 2022-08-01T12:41:02.534665+1000
created 2022-08-01T12:41:02.534665+1000
min_mon_release 17 (quincy)
election_strategy: 1
0: v2:127.0.0.1:40259/0 mon.a
# ceph-mon -i a --inject-monmap monmap.bin
# netstat -tlpn|grep ceph-mon
tcp        0      0 127.0.0.1:40485         0.0.0.0:*               LISTEN      597319/bin/ceph-mon
tcp        0      0 127.0.0.1:40484         0.0.0.0:*               LISTEN      597319/bin/ceph-mon
# init-ceph stop mon
# ceph-mon -i a --inject-monmap monmap.bin
# init-ceph start mon
# netstat -tlpn|grep ceph-mon
tcp        0      0 127.0.0.1:40259         0.0.0.0:*               LISTEN      598525/bin/ceph-mon

So this is definitely achievable, but I'm going to follow up with some of my
colleagues to see if they can think of an easier way or if we need to add some
facility to make this easier. I'll update within a day or two.

[0] https://github.com/ceph/ceph/blob/e70ca62d9cfd5c5891d4327739ba6fc4159b5c03/src/ceph_mon.cc#L747-L770
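
Before editing a monmap it can help to see which monitors still advertise a v1 address. The following is a rough sketch that scans the text printed by `monmaptool --print`, assuming the line format shown in the transcript above (`<rank>: <addrs> mon.<name>`); the function name `mons_with_v1` is hypothetical and this is not an existing Ceph utility:

```python
import re
from typing import Dict, List

# Matches monitor lines such as:
#   0: [v2:127.0.0.1:40259/0,v1:127.0.0.1:40260/0] mon.a
#   0: v2:127.0.0.1:40259/0 mon.a
MON_LINE = re.compile(r"^\d+:\s+(\S+)\s+mon\.(\S+)$")


def mons_with_v1(monmap_print: str) -> Dict[str, List[str]]:
    """Map each monitor name to the v1 addresses it still advertises."""
    result: Dict[str, List[str]] = {}
    for line in monmap_print.splitlines():
        match = MON_LINE.match(line.strip())
        if not match:
            continue  # header lines such as "epoch 1" or "fsid ..."
        addrvec, name = match.groups()
        # Strip the optional brackets and split the address vector.
        addrs = addrvec.strip("[]").split(",")
        v1_addrs = [a for a in addrs if a.startswith("v1:")]
        if v1_addrs:
            result[name] = v1_addrs
    return result


before = """\
epoch 1
min_mon_release 17 (quincy)
election_strategy: 1
0: [v2:127.0.0.1:40259/0,v1:127.0.0.1:40260/0] mon.a
"""
print(mons_with_v1(before))
```

An empty result for the edited monmap would indicate that no monitor entry carries a v1 address any more.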

Comment 2 Brad Hubbard 2022-08-03 23:07:26 UTC
Could you try creating the cluster with a command like the following using a
custom ceph.conf [0] and see if that resolves the issue for you?

# cephadm bootstrap --config initial-ceph.conf

[0] https://docs.ceph.com/en/quincy/cephadm/install/#further-information-about-cephadm-bootstrap

Comment 3 Radhika Chirra 2022-08-08 22:46:29 UTC
We haven't had a chance to try the initial ceph.conf method, and the monmap method seems to be an intrusive way to get the desired behavior of disabling v1.

Could this be fixed so that monitors behave like the other deployed daemons and honor the ms_bind_msgr1 setting?

Comment 4 Brad Hubbard 2022-08-15 03:42:36 UTC
Looking into the feasibility of this.