Bug 1267636 - [RFE] Add config option to cause the mons to complain to the cluster log when any client older than the configured version tries to authenticate
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 1.3.0
Hardware: Unspecified
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 3.0
Assignee: Kefu Chai
QA Contact: Vidushi Mishra
Docs Contact: Bara Ancincova
URL:
Whiteboard:
Depends On:
Blocks: 1258382 1494421
Reported: 2015-09-30 14:33 UTC by Mike Hackett
Modified: 2019-12-16 04:59 UTC

Fixed In Version: RHEL: ceph-12.1.4-1.el7cp Ubuntu: ceph_12.1.4-2redhat1xenial
Doc Type: Enhancement
Doc Text:
.New ways to identify client versions

This update adds the following features, which help determine which clients run an old version of Red Hat Ceph Storage:

* The `ceph osd set-require-min-compat-client` command sets a minimum required release for clients and prevents new connections from older clients. In new clusters, it defaults to `jewel`. To view its value, use the `ceph osd dump` command.
* The `ceph features` command reports the total number of clients and daemons, grouped by their features and releases.
* If the debugging level for Monitors is set to `10` (`debug mon = 10`), the addresses and features of connecting and disconnecting clients are logged to a log file on the local file system.
Clone Of:
Environment:
Last Closed: 2017-12-05 23:29:38 UTC
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Ceph Project Bug Tracker | 13301 | 0 | None | None | None | Never
Red Hat Knowledge Base (Solution) | 2476101 | 0 | None | None | None | 2016-07-29 09:59:58 UTC
Red Hat Product Errata | RHBA-2017:3387 | 0 | normal | SHIPPED_LIVE | Red Hat Ceph Storage 3.0 bug fix and enhancement update | 2017-12-06 03:03:45 UTC

Comment 2 Mike Hackett 2015-09-30 14:55:09 UTC
Upstream tracker: #13301

Comment 4 Vikhyat Umrao 2016-07-19 08:05:41 UTC
- One more enhancement would be great here.
- We recommend running the Ceph server-side daemons and the clients at the same version. If they are not at the same version, a warning should be logged in ceph.log (the cluster log).

- This would help a lot with the upgrade procedure for Nova/KVM client instances. For example, suppose the Nova instance (qemu-kvm) processes are running firefly and the Ceph cluster daemons are running hammer.

- Now we upgrade the nova-compute nodes to hammer, but we neither stop and start the Nova instance (qemu-kvm) processes (which needs downtime) nor live-migrate them to other nova-compute nodes (which needs no downtime).

- As a result, the in-memory qemu-kvm processes still run firefly code while the servers run hammer code. If we then change the tunables to hammer (and the bucket algorithm to straw2), all instances that still run firefly code in memory crash, with the following in the instance logs:

~~~~

terminate called after throwing an instance of 'ceph::buffer::malformed_input'
  what():  buffer::malformed_input: *unsupported bucket algorithm: 5*

~~~~

- This feature would help to avoid such mistakes.
- If a version mismatch warning still appears in the cluster log (ceph.log) after the clients are upgraded, we know that we still need to either stop and start the instances or live-migrate them.

Comment 6 Samuel Just 2016-07-19 13:29:52 UTC
Hmm, we probably shouldn't store information for every connected client on the mon.  However, adding a config that causes a warning when a client with an older version than the config connects would be plausible.

Comment 14 Samuel Just 2016-09-21 20:36:19 UTC
The log line should include the ip of the connecting client.
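
A minimal sketch of the check proposed in comments 6 and 14, assuming the client's release name is already known (how it is obtained is left open here). The helper name `audit_client` and the message wording are hypothetical and are not taken from Ceph; the only point is that the warning is driven by a configured minimum and carries the client's address.

```python
# Ceph release names in chronological order, up to luminous.
RELEASES = [
    "argonaut", "bobtail", "cuttlefish", "dumpling", "emperor", "firefly",
    "giant", "hammer", "infernalis", "jewel", "kraken", "luminous",
]


def audit_client(addr, client_release, min_release):
    """Return a warning line for a too-old client, or None if it is new enough.

    Hypothetical helper: neither the function nor the message format
    comes from Ceph.
    """
    if RELEASES.index(client_release) < RELEASES.index(min_release):
        return (f"client {addr} runs {client_release}, "
                f"older than the configured minimum {min_release}")
    return None


# A firefly client against a hammer minimum yields a warning with its address.
print(audit_client("192.0.2.10", "firefly", "hammer"))
# A client at the minimum release yields no warning.
print(audit_client("192.0.2.11", "hammer", "hammer"))
```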

Comment 15 Kefu Chai 2016-09-22 11:08:59 UTC
there are two places where we can/should do the auditing:

1. when the messenger rejects a connect message from a client with CEPH_MSGR_TAG_FEATURES. this happens in the messenger stack,
  - in Pipe::accept() // simple msgr
  - in AsyncConnection::handle_connect_msg() // async msgr
2. when AuthMonitor authorizes the client and starts a session
  - in AuthMonitor::prep_auth()

we can print the client's address and its feature bits to the log at the two places above if the peer's feature set is smaller than the local one.

the feature bits are just a bitmap, though. to make the report human readable, we need a lookup table from the bits to the corresponding version. but it would be a rough mapping, and we might need an offline tool to do this translation, because:

1. we add feature bits in new point releases,
2. a feature bit can be reused once it's deprecated and no longer checked by the server side.


case 1. the server has features 0b111 while the client has 0b101, and the client has all the features required by the assigned policy. feature 0b010 was deprecated and then reused; the monitor sets it to 1 because it supports the new feature. so in this case, the difference is expected.

questions:

- but what if the client has 0b111, but its 0b010 predates the time when 0b010 was marked deprecated?

case 2. client is rejected by messenger.

questions:

- we don't have access to clog. (Messenger does not have a LogClient, which in turn actually uses Messenger.)

so, it's not hard to implement this with the local log (not clog), but then we need to collect the log files from all monitors to find out the distribution of the different versions (feature bits) of clients connected to (or rejected by) the cluster.
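
The feature-bit comparison and the rough bit-to-release lookup discussed in this comment could look roughly like the sketch below. The table `RELEASE_MASKS` and its bit values are invented for illustration, reusing the three-bit example from case 1; they are not Ceph's real feature bits, and the helper names are hypothetical.

```python
# Hypothetical table: cumulative feature masks per release, oldest first.
# The bit values are made up for illustration, not Ceph's real feature bits.
RELEASE_MASKS = [
    ("firefly", 0b001),
    ("hammer", 0b011),
    ("jewel", 0b111),
]


def missing_features(local, peer):
    """Bits that the local side advertises but the peer does not."""
    return local & ~peer


def rough_release(features):
    """Newest release whose whole mask is contained in `features`, or None."""
    best = None
    for release, mask in RELEASE_MASKS:
        if (features & mask) == mask:
            best = release
    return best


# Case 1 from the comment: server has 0b111, client has 0b101.
print(bin(missing_features(0b111, 0b101)))
print(rough_release(0b101))
```

With this table, a peer advertising 0b101 maps to the oldest release even though it carries a later bit, which illustrates why the mapping is only rough. A reused bit would make the table ambiguous in the same way, since the bitmap alone does not say which meaning of the bit the peer implements.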

Comment 16 Kefu Chai 2016-09-22 11:11:27 UTC
but as the version string is not part of our messaging protocol, i don't think it's worth adding it to MAuth. maybe the version derived from the feature bits would suffice.

Comment 17 Kefu Chai 2016-09-22 11:16:39 UTC
and maybe we can simply ignore the clients rejected by msgr? as they are pretty visible (not functional).

Comment 24 Josh Durgin 2017-07-19 00:45:21 UTC
Implemented as 'ceph features' to report on connected clients and 'ceph osd set-require-min-compat-client $releasename' to guard against earlier clients being able to connect.

https://github.com/ceph/ceph/pull/15371

Comment 26 Josh Durgin 2017-08-21 16:55:15 UTC
This is what's merged in luminous:

1) the ability to set a minimum required release for clients, to prevent new connections from older clients ('ceph osd set-require-min-compat-client jewel') - this defaults to jewel in new clusters, and can be viewed as part of 'ceph osd dump'.

2) 'ceph features' to report the total number of clients and daemons at given featuresets and releases, e.g.:

    {
        "mon": {
            "group": {
                "features": "0x1ffddff8eea4fffb",
                "release": "luminous",
                "num": 3
            }
        },
        "mds": {
            "group": {
                "features": "0x1ffddff8eea4fffb",
                "release": "luminous",
                "num": 3
            }
        },
        "osd": {
            "group": {
                "features": "0x1ffddff8eea4fffb",
                "release": "luminous",
                "num": 3
            }
        },
        "client": {
            "group": {
                "features": "0x1ffddff8eea4fffb",
                "release": "luminous",
                "num": 1
            }
        }
    }

3) logging at debug mon = 10 in the monitor logs for connecting/disconnecting client address and features
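
As a rough illustration of how the 'ceph features' report from item 2 could be used, the sketch below scans JSON of the shape shown above for client groups older than a chosen release. It assumes that exact shape (a "client" object whose values each carry "features", "release" and "num"); other Ceph versions may lay the output out differently. The function name `clients_older_than` is hypothetical.

```python
import json

# Ceph release names in chronological order, up to luminous.
RELEASES = [
    "argonaut", "bobtail", "cuttlefish", "dumpling", "emperor", "firefly",
    "giant", "hammer", "infernalis", "jewel", "kraken", "luminous",
]
RANK = {name: i for i, name in enumerate(RELEASES)}


def clients_older_than(features_json, min_release):
    """List (release, features, num) for client groups older than min_release.

    Releases missing from RELEASES get rank -1, so they are reported too.
    """
    report = json.loads(features_json)
    floor = RANK[min_release]
    old = []
    for group in report.get("client", {}).values():
        if RANK.get(group["release"], -1) < floor:
            old.append((group["release"], group["features"], group["num"]))
    return old


# Trimmed copy of the example output above (mon and client sections only).
SAMPLE = """
{
  "mon": {"group": {"features": "0x1ffddff8eea4fffb",
                    "release": "luminous", "num": 3}},
  "client": {"group": {"features": "0x1ffddff8eea4fffb",
                       "release": "luminous", "num": 1}}
}
"""

print(clients_older_than(SAMPLE, "jewel"))  # prints []
```

An empty result for a given release suggests that raising the minimum with 'ceph osd set-require-min-compat-client' to that release would not lock out any currently connected client.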

Since this bug is pretty cluttered already, I'll close it referencing the relevant PRs:

https://github.com/ceph/ceph/pull/15371
https://github.com/ceph/ceph/pull/16128

Further changes can be handled in new tickets.

Comment 29 Kefu Chai 2017-10-30 13:28:50 UTC
@Bara, sorry for the latency.

> * If the debugging level for Monitors is set to `10` (`debug mon = 10`), addresses and features of connecting and disconnecting clients are logged to the main cluster log.

this is not accurate. we have a "cluster" log, which is persisted by the monitors and can be watched using "ceph -w". in addition to the "cluster" log, we can also watch log messages in the "audit" channel.

but the "log" mentioned by Josh in the context of

> logging at debug mon = 10 in the monitor logs for connecting/disconnecting client address and features

normally consists of log messages written to a file on the local file system[0].


--
[0] but this is configurable: if log_to_syslog is enabled and syslog is configured to send logs to a remote syslog server, the log is written to that remote server. this might be out of the scope of this bz; i put it here just for completeness.

Comment 37 errata-xmlrpc 2017-12-05 23:29:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3387

