Description of problem:

pg_num_check() should take the pool's crush rule into account. Bug 2153654 reported an error when setting the pool size to 1 with a specified crush rule. In the interest of the dev-freeze timelines, the PR that introduced that regression was reverted. Since that same PR is what made pg_num_check() work accurately, pg_num_check() now needs to be fixed to work accurately again.

How reproducible:

Always

Steps to Reproduce:
1. On a cluster with 6 OSDs, set up a new crush rule whose root contains only 1 OSD (see "Additional info").
2. Assuming the `mon_max_pg_per_osd` configuration value is 250, create a pool with pg_num/pgp_num of 256 using the new crush rule from step 1.

Actual results:

Although the `mon_max_pg_per_osd` limit is exceeded, the pool is created successfully. Because the crush rule is not taken into account during the check, the projected PG count is divided by *all* of the OSDs in the cluster; it should be divided only by the OSDs under the crush rule's root.

```
pool 'pool_test' created
```

Notice osd.0:

```
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP  META    AVAIL    %USE  VAR   PGS  STATUS
 0  ssd    0.09859   1.00000  101 GiB  1.0 GiB  780 KiB  0 B   23 MiB  100 GiB  0.99  1.00  257      up
 1  ssd    0.09859   1.00000  101 GiB  1.0 GiB  812 KiB  0 B   23 MiB  100 GiB  0.99  1.00    3      up
 2  ssd    0.09859   1.00000  101 GiB  1.0 GiB  780 KiB  0 B   19 MiB  100 GiB  0.99  1.00    2      up
 3  ssd    0.09859   1.00000  101 GiB  1.0 GiB  360 KiB  0 B   18 MiB  100 GiB  0.99  1.00    1      up
 4  ssd    0.09859   1.00000  101 GiB  1.0 GiB  328 KiB  0 B   18 MiB  100 GiB  0.99  1.00    1      up
 5  ssd    0.09859   1.00000  101 GiB  1.0 GiB  360 KiB  0 B   22 MiB  100 GiB  0.99  1.00    1      up
```

Expected results:

pg_num_check() should disallow the creation of this pool, since its pg_num would exceed the per-OSD limit with this crush rule:

```
Error ERANGE: pg_num 256 size 3 for this pool would result in 256 cumulative PGs per OSD (768 total PG replicas on 3 'in' root OSDs by crush rule) which exceeds the mon_max_pg_per_osd value of 250
```

Additional info:

Output of `ceph osd tree`; note the new crush root `osd_test` used by the new rule:

```
ID  CLASS  WEIGHT   TYPE NAME        STATUS  REWEIGHT  PRI-AFF
-5         0.19717  root osd_test
 0  ssd    0.09859      osd.0            up   1.00000  1.00000
-1         0.59151  root default
-3         0.59151      host folio
 0  ssd    0.09859          osd.0        up   1.00000  1.00000
 1  ssd    0.09859          osd.1        up   1.00000  1.00000
 2  ssd    0.09859          osd.2        up   1.00000  1.00000
 3  ssd    0.09859          osd.3        up   1.00000  1.00000
 4  ssd    0.09859          osd.4        up   1.00000  1.00000
 5  ssd    0.09859          osd.5        up   1.00000  1.00000
```
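For reference, a minimal sketch of the arithmetic the corrected check should perform. This is written in Python rather than the actual monitor C++ code; the function name, parameters, and the idea of passing in the count of 'in' OSDs under the rule's root are illustrative assumptions, not the real Ceph implementation:

```python
def pg_num_check_ok(pg_num, pool_size, crush_root_osds, mon_max_pg_per_osd,
                    existing_pg_replicas_on_root=0):
    """Return True if creating the pool keeps every OSD under the limit.

    crush_root_osds: number of 'in' OSDs reachable from the pool's crush
    rule root -- NOT the total number of OSDs in the cluster, which is the
    mistake described in this report.
    """
    if crush_root_osds == 0:
        return False  # rule maps to no OSDs; reject outright
    total_replicas = pg_num * pool_size + existing_pg_replicas_on_root
    projected_per_osd = total_replicas // crush_root_osds
    return projected_per_osd <= mon_max_pg_per_osd


# Numbers from the expected error above: pg_num=256, size=3, 3 'in' root
# OSDs -> 768 replicas / 3 = 256 > 250, so creation should fail with ERANGE.
print(pg_num_check_ok(256, 3, 3, 250))   # False (expected behaviour)
# The buggy check divides by all 6 OSDs in the cluster instead:
# 768 / 6 = 128 <= 250, so the pool is (wrongly) allowed.
print(pg_num_check_ok(256, 3, 6, 250))   # True (current, incorrect behaviour)
```

The two calls mirror the "Expected results" and "Actual results" above: the only difference is whether the divisor is the OSD count under the rule's root or the cluster-wide OSD count.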
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 6.1 Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:4473