Description of problem:
The `rados df` CLI command shows a number of degraded objects greater than the total number of objects when some of the PGs are degraded.

Version-Release number of selected component (if applicable):

Mon rpms:
rpm -qa | grep ceph
ceph-selinux-10.2.2-5.el7cp.x86_64
python-cephfs-10.2.2-5.el7cp.x86_64
ceph-common-10.2.2-5.el7cp.x86_64
ceph-base-10.2.2-5.el7cp.x86_64
libcephfs1-10.2.2-5.el7cp.x86_64
ceph-mon-10.2.2-5.el7cp.x86_64

OSD rpms:
rpm -qa | grep ceph
ceph-selinux-10.2.2-9.el7cp.x86_64
ceph-common-10.2.2-9.el7cp.x86_64
ceph-base-10.2.2-9.el7cp.x86_64
libcephfs1-10.2.2-9.el7cp.x86_64
python-cephfs-10.2.2-9.el7cp.x86_64
ceph-osd-10.2.2-9.el7cp.x86_64

How reproducible:
Frequently

Steps to Reproduce:
1. Prepare a pool and add several objects to it, e.g. 4 objects.
2. Remove some OSDs so that there are fewer OSDs than the pool's replica count requires.
3. Create another object.

Actual results:
Degraded object count > total object count.

Expected results:
The degraded object count should not exceed the total object count.
Additional info:

rados df --cluster c1 --format json
{
  "pools": [
    {
      "name": "p1",
      "id": "1",
      "size_bytes": "114",
      "size_kb": "1",
      "num_objects": "4",
      "num_object_clones": "0",
      "num_object_copies": "12",
      "num_objects_missing_on_primary": "0",
      "num_objects_unfound": "0",
      "num_objects_degraded": "8",
      "read_ops": "5483",
      "read_bytes": "4009984",
      "write_ops": "8",
      "write_bytes": "2048"
    },
    {
      "name": "p2",
      "id": "2",
      "size_bytes": "0",
      "size_kb": "0",
      "num_objects": "1",
      "num_object_clones": "0",
      "num_object_copies": "3",
      "num_objects_missing_on_primary": "0",
      "num_objects_unfound": "0",
      "num_objects_degraded": "2",
      "read_ops": "0",
      "read_bytes": "0",
      "write_ops": "2",
      "write_bytes": "0"
    }
  ],
  "total_objects": "5",
  "total_used": "74284",
  "total_avail": "31360428",
  "total_space": "31434712"
}

ceph -s --cluster c1
    cluster ef7329fe-01e5-4b60-8427-71112db95c9d
     health HEALTH_WARN
            256 pgs degraded
            256 pgs stuck unclean
            256 pgs undersized
            recovery 10/15 objects degraded (66.667%)
     monmap e1: 1 mons at {dhcp41-235=10.70.41.235:6789/0}
            election epoch 3, quorum 0 dhcp41-235
     osdmap e37: 2 osds: 2 up, 2 in
            flags sortbitwise
      pgmap v700: 256 pgs, 2 pools, 114 bytes data, 5 objects
            74284 kB used, 30625 MB / 30697 MB avail
            10/15 objects degraded (66.667%)
                 256 active+undersized+degraded
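A quick cross-check of the `rados df` JSON above (a minimal sketch, with only the relevant fields of the reported JSON pasted inline) shows where the 10/15 figure in `ceph -s` comes from: the per-pool num_objects_degraded values sum to 10 and the num_object_copies values sum to 15, while total_objects is only 5.

```python
import json

# Trimmed-down copy of the rados df output from the report above
# (jewel emits all values as strings).
rados_df = json.loads('''{
  "pools": [
    {"name": "p1", "num_objects": "4", "num_object_copies": "12", "num_objects_degraded": "8"},
    {"name": "p2", "num_objects": "1", "num_object_copies": "3",  "num_objects_degraded": "2"}
  ],
  "total_objects": "5"
}''')

degraded = sum(int(p["num_objects_degraded"]) for p in rados_df["pools"])
copies = sum(int(p["num_object_copies"]) for p in rados_df["pools"])
total = int(rados_df["total_objects"])

print(degraded, copies, total)  # 10 15 5 -- matches the "10/15 objects degraded" line
print(degraded > total)         # True -- the surprising part of this report
```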
The builds listed above are pretty old. Please confirm that this is still happening with the latest builds (10.2.2-24.el7cp)
It's actually ok for there to be more degraded objects than objects: if the pool is configured for 4 replicas but only 2 are available, each object is counted as degraded twice. In this case, however, the report shows 2 OSDs with pool size = 3, so each object should have been counted as degraded only once. Possibly a bug in constructing the stats. I do not think this should be a 2.0 blocker.
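The accounting described above can be sketched as simple arithmetic (an illustration only, not the actual OSD stats code): each object contributes one degraded count per missing replica.

```python
def expected_degraded(num_objects: int, pool_size: int, replicas_available: int) -> int:
    """Each object is counted as degraded once per missing replica."""
    missing = max(pool_size - replicas_available, 0)
    return num_objects * missing

# Pool size 4 with only 2 replicas available: each object degraded twice,
# so degraded > total is legitimate.
print(expected_degraded(5, 4, 2))  # 10

# The situation in this report: size 3, 2 OSDs up -> one missing replica
# per object, so 5 degraded would be expected -- yet rados df reported 10,
# hence the suspicion of a stats-construction bug.
print(expected_degraded(5, 3, 2))  # 5
```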
I haven't been able to reproduce this myself. One possible cause would be PGs getting mapped to a smaller acting set than expected, as can happen with older CRUSH tunables. Can you reproduce with OSD debugging enabled (debug osd = 20, debug ms = 1) and post the OSD logs and the output of 'ceph pg dump'?
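The small-acting-set theory can be checked from a 'ceph pg dump --format json' capture. A sketch, assuming the jewel-era pg dump JSON shape where each pg_stats entry carries a "pgid" string and an "acting" OSD list:

```python
def undersized_pgs(pg_stats, pool_size):
    """Return (pgid, acting) for PGs whose acting set is smaller than the pool size."""
    return [(pg["pgid"], pg["acting"])
            for pg in pg_stats
            if len(pg["acting"]) < pool_size]

# Toy sample standing in for the real pg_stats array from 'ceph pg dump --format json'.
sample = [
    {"pgid": "1.0", "acting": [0, 1]},     # only 2 OSDs acting
    {"pgid": "1.1", "acting": [1, 0, 2]},  # fully sized
]
print(undersized_pgs(sample, 3))  # [('1.0', [0, 1])]
```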
Closing on the assumption that it's just the normal behavior absent any other information.