Description of problem: When an OSD becomes full/nearfull it's being flagged in the ceph -s output. When we check the ceph -s -f json-pretty output though it's flagged under the "health" section as nearfull, but under the osdmap section the "full" and "nearfull" values remain set to false. Version-Release number of selected component (if applicable):12.2.8-89.el7cp How reproducible: Can reproduce by filling OSD Steps to Reproduce: 1. Create files in /var/lib/ceph/osd/ceph-<osd id> to fill to full or almost full threshold 2. ceph -s -f json-pretty 3. Actual results: [root@site1-2 ceph-2]# ceph -s -f json-pretty { "fsid": "e0b84dbb-63c6-4227-bf4d-439c28b540c2", "health": { "checks": { "OSD_NEARFULL": { "severity": "HEALTH_WARN", "summary": { "message": "1 nearfull osd(s)" } }, "POOL_NEARFULL": { "severity": "HEALTH_WARN", "summary": { "message": "30 pool(s) nearfull" } }, "MANY_OBJECTS_PER_PG": { "severity": "HEALTH_WARN", "summary": { "message": "1 pools have many more objects per pg than average" } }, "POOL_APP_NOT_ENABLED": { "severity": "HEALTH_WARN", "summary": { "message": "application not enabled on 1 pool(s)" } } }, "status": "HEALTH_WARN", "summary": [ { "severity": "HEALTH_WARN", "summary": "'ceph health' JSON format has changed in luminous. If you see this your monitoring system is scraping the wrong fields. Disable this with 'mon health preluminous compat warning = false'" } ], "overall_status": "HEALTH_WARN" }, "election_epoch": 51, "quorum": [ 0 ], "quorum_names": [ "site1-1" ], "monmap": { "epoch": 2, "fsid": "e0b84dbb-63c6-4227-bf4d-439c28b540c2", "modified": "2019-01-15 12:05:42.550312", "created": "2019-01-15 12:05:42.550312", "features": { "persistent": [ "kraken", "luminous" ], "optional": [] }, "mons": [ { "rank": 0, "name": "site1-1", "addr": "10.74.131.9:6789/0", "public_addr": "10.74.131.9:6789/0" } ] }, "osdmap": { "osdmap": { "epoch": 1647, "num_osds": 9, "num_up_osds": 8, "num_in_osds": 8, "full": false, "nearfull": false, "num_remapped_pgs": 0 } }, "pgmap": { "pgs_by_state": [ { "state_name": "active+clean", "count": 368 } ], "num_pgs": 368, "num_pools": 30, "num_objects": 406438, "data_bytes": 415559695, "bytes_used": 19873714176, "bytes_avail": 64045756416, "bytes_total": 83919470592, "read_bytes_sec": 17560, "read_op_per_sec": 17, "write_op_per_sec": 0 }, "fsmap": { "epoch": 1, "by_rank": [] }, "mgrmap": { "epoch": 335, "active_gid": 425143, "active_name": "site1-1", "active_addr": "10.74.131.9:6800/11457", "available": true, "standbys": [], "modules": [ "prometheus", "status" ], "available_modules": [ "balancer", "dashboard", "influx", "localpool", "prometheus", "restful", "selftest", "status", "zabbix" ], "services": { "prometheus": "http://site1-1.home:9283/" } }, "servicemap": { "epoch": 71, "modified": "2019-07-24 15:04:27.562961", "services": { "rgw": { "daemons": { "summary": "", "site1-5": { "start_epoch": 71, "start_stamp": "2019-07-24 15:04:27.028689", "gid": 425078, "addr": "10.74.130.180:0/3735715916", "metadata": { "arch": "x86_64", "ceph_version": "ceph version 12.2.8-52.el7cp (3af3ca15b68572a357593c261f95038d02f46201) luminous (stable)", "cpu": "Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz", "distro": "rhel", "distro_description": "Employee SKU", "distro_version": "7.5", "frontend_config#0": "civetweb port=10.74.130.180:8080 num_threads=100", "frontend_type#0": "civetweb", "hostname": "site1-5", "kernel_description": "#1 SMP Wed Mar 21 18:14:51 EDT 2018", "kernel_version": "3.10.0-862.el7.x86_64", "mem_swap_kb": "421884", "mem_total_kb": "1015508", "num_handles": "1", "os": "Linux", "pid": "24993", "zone_id": "2a4ab455-30b5-42f6-bb65-54bab4cd9de4", "zone_name": "merritt", "zonegroup_id": "61fb9103-18d3-4e77-a3e4-7b6bf9ad1b0f", "zonegroup_name": "us-zonegroup" } } } } } } } Expected results: The nearfull and full lines in the osdmap section should have the same status as the health section above. "osdmap": { "osdmap": { "epoch": 1647, "num_osds": 9, "num_up_osds": 8, "num_in_osds": 8, "full": False, <<---- these lines should update. "nearfull": True, <<---- these lines should update. "num_remapped_pgs": 0 Additional info: I tested marking an OSD down to see if other sections of the OSDMap are being updated, and they are working correctly. ** If I stop an OSD it recognized that the OSD is now 'down' ** "osdmap": { "osdmap": { "epoch": 1644, "num_osds": 9, "num_up_osds": 7, "num_in_osds": 8, "full": false, "nearfull": false, "num_remapped_pgs": 0 ** I bring the osd back 'up' and it's reflected ** "osdmap": { "osdmap": { "epoch": 1647, "num_osds": 9, "num_up_osds": 8, "num_in_osds": 8, "full": false, "nearfull": false, "num_remapped_pgs": 0 } },
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0312