Description of problem: While running RGW system tests with a large number of evenly sized objects, we noticed uneven distribution and usage of disk across OSDs, even though all the disks are of the same weight and size. Since we used the default CRUSH settings when setting up the cluster, I dug into which other parameters might be set wrong and noticed that the CRUSH profile is set to firefly by default instead of optimal. I believe optimal should be set for fresh installs; upgrade cases will be tricky and are better left to the admin.

I am not sure the uneven distribution is related only to the firefly profile, but as soon as I set the profile to optimal, a large number of objects were reported as misplaced:

    318169/515498 objects misplaced (61.721%)
    v8650: 1216 pgs: 377 active+recovery_wait+degraded, 110 active+remapped+wait_backfill, 5 active+remapped+backfilling, 724 active+clean; 407 GB data, 1238 GB used, 2774 GB / 4013 GB avail; 25000 kB/s wr, 60 op/s; 5107/515498 objects degraded (0.991%); 318169/515498 objects misplaced (61.721%); 339 MB/s, 88 objects/s recovering

With the default firefly CRUSH settings, I have seen one of the OSDs fill up much sooner than the others, which could cause a disk-full condition while the cluster still has room, unless the CRUSH tunables are reset, which in turn causes a lot of data movement.
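To make the symptom concrete, here is a small sketch that quantifies the imbalance. The misplaced fraction is taken directly from the status line above; the per-OSD utilization figures are hypothetical stand-ins for the %USE column of `ceph osd df`, not values from this cluster:

```python
# Quantify the imbalance described above.
from statistics import mean, pstdev

# Misplaced fraction, straight from the "ceph -s" output above.
misplaced, total = 318169, 515498
print(f"misplaced: {misplaced / total:.3%}")  # matches the reported 61.721%

# Hypothetical %USE values for equally weighted disks (illustration only):
# one OSD gets disproportionately more data and will fill up first.
osd_use = [24.0, 26.5, 23.8, 41.2, 25.1, 24.9]

print(f"mean use:  {mean(osd_use):.1f}%")
print(f"stddev:    {pstdev(osd_use):.1f}")
print(f"spread:    {max(osd_use) - min(osd_use):.1f} points")
```

A spread this wide on equally weighted OSDs is the signal that placement, not capacity, is the problem.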
We are not changing the crush tunables with ceph-ansible. Does that mean we should? I'd think this should reside in Ceph itself.
Sage recently set this to hammer in https://github.com/ceph/ceph/pull/14959, so this will be in Luminous / RHCEPH 3.0.
Hmm, I'd actually like to set the luminous (3.0) defaults to jewel tunables; that's the last disruptive tunable we added (chooseleaf_stable) that requires lots of data movement to adjust/fix. We should confirm that the RHEL kernel has that support backported, though, and probably document which kernel it is. For jewel downstream it's pretty similar: we don't care so much about old userspace clients connecting to a new cluster, but we do want to make sure RHEL clients can connect. In that case I'd suggest changing it downstream, though, and not modifying upstream jewel this late in its lifecycle.
Sage updated the default tunables again to jewel. https://github.com/ceph/ceph/pull/15370 Ilya, what should we document here for RHCEPH 3.0's RHEL kernel version requirements?
Default values:

    [cephuser@ceph-jenkins-build-run236-node8-rgw ~]$ sudo ceph osd crush show-tunables
    {
        "choose_local_tries": 0,
        "choose_local_fallback_tries": 0,
        "choose_total_tries": 50,
        "chooseleaf_descend_once": 1,
        "chooseleaf_vary_r": 1,
        "chooseleaf_stable": 1,
        "straw_calc_version": 1,
        "allowed_bucket_algs": 54,
        "profile": "jewel",
        "optimal_tunables": 1,
        "legacy_tunables": 0,
        "minimum_required_version": "jewel",
        "require_feature_tunables": 1,
        "require_feature_tunables2": 1,
        "has_v2_rules": 0,
        "require_feature_tunables3": 1,
        "has_v3_rules": 0,
        "has_v4_buckets": 1,
        "require_feature_tunables5": 1,
        "has_v5_rules": 0
    }
    [cephuser@ceph-jenkins-build-run236-node8-rgw ~]$ sudo ceph --version
    ceph version 12.2.0-2.el7cp (3137b4f525c5dcc2a34fef5b0f6bcf4477312db9) luminous (rc)
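For anyone reading the output above: `allowed_bucket_algs` is a bitmask over the CRUSH bucket algorithm IDs defined in Ceph's crush.h (1 = uniform, 2 = list, 3 = tree, 4 = straw, 5 = straw2). A small sketch to decode it:

```python
# Decode the allowed_bucket_algs bitmask reported by
# `ceph osd crush show-tunables`. Algorithm IDs per Ceph's crush.h.
ALGS = {1: "uniform", 2: "list", 3: "tree", 4: "straw", 5: "straw2"}

def decode_bucket_algs(mask: int) -> list[str]:
    """Return the bucket algorithms whose bit is set in the mask."""
    return [name for alg_id, name in ALGS.items() if mask & (1 << alg_id)]

print(decode_bucket_algs(54))  # ['uniform', 'list', 'straw', 'straw2']
```

So the jewel profile's value of 54 permits every bucket type except tree, and the presence of straw2 is what the `has_v4_buckets` flag refers to.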
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3387