Description of problem:
The active MGR logs the message "Detected new or changed devices" every 30 minutes, even when no activity or change is observed in the cluster. The requirement is to log this message only when there is an actual device change on the node.

We looked at the code that generates the log message:

~~~
# mgr/cephadm/inventory.py
def devices_changed(self, host: str, b: List[inventory.Device]) -> bool:
    a = self.devices[host]
    if len(a) != len(b):
        return True
    aj = {d.path: d.to_json() for d in a}
    bj = {d.path: d.to_json() for d in b}
    if aj != bj:
        self.mgr.log.info("Detected new or changed devices on %s" % host)
        return True
    return False
~~~

When dumping the values of both aj and bj, we observed that the only difference is the "created" field, whose two timestamps are roughly 30 minutes apart. That 30-minute interval, after which the function above is called again, comes from the configuration parameter mgr/cephadm/device_cache_timeout.

~~~
aj: {'/dev/sdX': {'ceph_device': None, 'rejected_reasons': [], 'available': True, 'path': '/dev/sdX', 'sys_api': {'human_readable_size': '10.00 GB', 'locked': 0, 'model': 'QEMU HARDDISK', 'nr_requests': '256', 'partitions': {}, 'path': '/dev/sdX', 'removable': '0', 'rev': '2.5+', 'ro': '0', 'rotational': '1', 'sas_address': '', 'sas_device_handle': '', 'scheduler_mode': 'mq-deadline', 'sectors': 0, 'sectorsize': '512', 'size': 10737418240.0, 'support_discard': '4096', 'vendor': 'QEMU'}, 'created': '2022-10-20T01:07:32.604710Z', 'lvs': [], 'human_readable_type': 'hdd', 'device_id': 'QEMU_QEMU_HARDDISK_15bbXXXX-XXXX-40e8-XXXX-946a21dXXXXXX', 'lsm_data': {}}}
-------------------------------------------------
bj: {'/dev/sdX': {'ceph_device': None, 'rejected_reasons': [], 'available': True, 'path': '/dev/sdX', 'sys_api': {'human_readable_size': '10.00 GB', 'locked': 0, 'model': 'QEMU HARDDISK', 'nr_requests': '256', 'partitions': {}, 'path': '/dev/sdX', 'removable': '0', 'rev': '2.5+', 'ro': '0', 'rotational': '1', 'sas_address': '', 'sas_device_handle': '', 'scheduler_mode': 'mq-deadline', 'sectors': 0, 'sectorsize': '512', 'size': 10737418240.0, 'support_discard': '4096', 'vendor': 'QEMU'}, 'created': '2022-10-20T01:38:30.355023Z', 'lvs': [], 'human_readable_type': 'hdd', 'device_id': 'QEMU_QEMU_HARDDISK_15bbXXXX-XXXX-40e8-XXXX-946a21dXXXXX', 'lsm_data': {}}}
~~~

Looking further into the code, we observed that the value of the "created" field is assigned in only one place:
- https://github.com/ceph/ceph/blob/main/src/python-common/ceph/deployment/inventory.py#L74-L77

Version-Release number of selected component (if applicable):
RHCS 5.2

How reproducible:
Always

Steps to Reproduce:
1. Deploy or upgrade an RHCS 5 environment.
2. Monitor the active MGR logs for the message "Detected new or changed devices".

Actual results:
The "Detected new or changed devices" log message keeps appearing even when there is no actual change.

Expected results:
The message is logged only when there is an actual device change on the node.
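For illustration, one way to keep the refresh timestamp from triggering the message is to drop it before comparing. The sketch below works on plain dicts shaped like the aj/bj dumps rather than real `inventory.Device` objects; the helper names `diff_keys`, `normalize` and `devices_changed` and the `VOLATILE_FIELDS` set are hypothetical, and this is not necessarily how the shipped fix was implemented.

```python
from typing import Any, Dict, List

# Fields that change on every inventory refresh without reflecting a real
# device change (assumption: only 'created' is volatile, as observed above).
VOLATILE_FIELDS = {"created"}


def diff_keys(old: Dict[str, Any], new: Dict[str, Any]) -> List[str]:
    """Return the top-level keys whose values differ between two device dicts."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))


def normalize(dev: Dict[str, Any]) -> Dict[str, Any]:
    """Copy a device dict without its volatile fields."""
    return {k: v for k, v in dev.items() if k not in VOLATILE_FIELDS}


def devices_changed(old_devs: List[Dict[str, Any]],
                    new_devs: List[Dict[str, Any]]) -> bool:
    """Hypothetical stand-in for the cephadm check, on to_json()-style dicts."""
    if len(old_devs) != len(new_devs):
        return True
    aj = {d["path"]: normalize(d) for d in old_devs}
    bj = {d["path"]: normalize(d) for d in new_devs}
    return aj != bj


old = [{"path": "/dev/sdX", "available": True, "created": "2022-10-20T01:07:32.604710Z"}]
new = [{"path": "/dev/sdX", "available": True, "created": "2022-10-20T01:38:30.355023Z"}]

print(diff_keys(old[0], new[0]))  # ['created']
print(devices_changed(old, new))  # False: only the timestamp moved

new[0]["available"] = False
print(devices_changed(old, new))  # True: a real attribute changed
```

Keeping the ignored fields in one set makes it easy to extend if other refresh-only fields turn up; the trade-off is that a genuine change to a listed field would no longer be reported.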
While waiting for a patch for Ceph 5.2, is there any way our customers can silence this warning message so that their logs/prometheus are not inundated with these messages?
(In reply to Matthew Secaur from comment #5)
> While waiting for a patch for Ceph 5.2, is there any way our customers can
> silence this warning message so that their logs/prometheus are not inundated
> with these messages?

These logs are at info level, so I think you could just set cephadm's log_to_cluster_level to warning (ceph config set mgr mgr/cephadm/log_to_cluster_level warning). You might miss a handful of info log messages, but typically anything important enough that it really needs to be seen is logged at warning or error level.
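The workaround relies on a level threshold: a message is forwarded only if its severity is at or above the configured level, so info messages such as "Detected new or changed devices" are dropped once the threshold is warning. The sketch below illustrates that behaviour with Python's standard `logging` module as an analogy only; it is not cephadm's cluster-log code, and the logger name `cephadm-demo` and the warning text are made up.

```python
import io
import logging

# Capture what a handler set to WARNING would forward (stands in for the cluster log).
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setLevel(logging.WARNING)  # analogous to log_to_cluster_level=warning

log = logging.getLogger("cephadm-demo")  # hypothetical logger name
log.setLevel(logging.DEBUG)  # the logger itself still emits every record
log.addHandler(handler)
log.propagate = False  # keep the demo output away from the root logger

log.info("Detected new or changed devices on host1")  # dropped: INFO < WARNING
log.warning("some warning-level event")  # forwarded to the handler

print(stream.getvalue().splitlines())
```

The records are still produced at the source; only the forwarding threshold changes, which is why the suggestion costs a few info messages but keeps warnings and errors visible.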
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage 6.1 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:3623