Description copied over from https://bugzilla.redhat.com/show_bug.cgi?id=2302230, which is the corresponding BZ for 8.x. This BZ is meant to target a 7.x version.

..............................

Description of problem:
PGs are not scaled down on a pool after the bulk flag is disabled on that pool. The newly calculated pg_num is displayed in autoscale-status and in "ceph osd pool ls detail", but the actual scale-down of the pool's PGs never happens.

# ceph osd pool ls detail
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 37 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 20.00
pool 2 'cephfs.cephfs.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode on last_change 91 lfor 0/0/54 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs read_balance_score 2.51
pool 3 'cephfs.cephfs.data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 512 pgp_num 512 autoscale_mode on last_change 73 lfor 0/0/71 flags hashpspool,bulk stripe_width 0 application cephfs read_balance_score 1.05
pool 8 'balancer_test_pool' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 243 pgp_num 231 pg_num_target 32 pgp_num_target 32 autoscale_mode on last_change 395 lfor 0/393/391 flags hashpspool stripe_width 0 application rados read_balance_score 1.32

[root@ceph-pdhiran-cdx69q-node1-installer ~]# ceph osd pool autoscale-status
POOL                  SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
.mgr                602.6k                3.0        499.9G  0.0000                                  1.0       1              on         False
cephfs.cephfs.meta   32768                3.0        499.9G  0.0000                                  4.0      16              on         False
balancer_test_pool   2004k                3.0        499.9G  0.0000                                  1.0      32              on         False
cephfs.cephfs.data       0                3.0        499.9G  0.0000                                  1.0     512              on         True

# ceph df detail
--- RAW STORAGE ---
CLASS     SIZE    AVAIL    USED  RAW USED  %RAW USED
hdd    500 GiB  489 GiB  11 GiB    11 GiB       2.24
TOTAL  500 GiB  489 GiB  11 GiB    11 GiB       2.24

--- POOLS ---
POOL                ID  PGS   STORED   (DATA)  (OMAP)  OBJECTS     USED   (DATA)  (OMAP)  %USED  MAX AVAIL  QUOTA OBJECTS  QUOTA BYTES  DIRTY  USED COMPR  UNDER COMPR
.mgr                 1    1  598 KiB  598 KiB     0 B        2  1.8 MiB  1.8 MiB     0 B      0    154 GiB            N/A          N/A    N/A         0 B          0 B
cephfs.cephfs.meta   2   16  2.4 KiB  2.4 KiB     0 B       22   96 KiB   96 KiB     0 B      0    154 GiB            N/A          N/A    N/A         0 B          0 B
cephfs.cephfs.data   3  512      0 B      0 B     0 B        0      0 B      0 B     0 B      0    154 GiB            N/A          N/A    N/A         0 B          0 B
balancer_test_pool   8  243  2.0 MiB  2.0 MiB     0 B      501  5.9 MiB  5.9 MiB     0 B      0    154 GiB            N/A          N/A    N/A         0 B          0 B

Version-Release number of selected component (if applicable):
# ceph version
ceph version 19.1.0-4.el9cp (b2c7ded5f7885ce1d488a241a30cba80f58d28bc) squid (rc)

How reproducible:
5/5 via automated runs

Steps to Reproduce:
1. Set the balancer mode on the cluster to upmap-read.
   cmd : ceph balancer mode upmap-read
2. Create a test pool (balancer_test_pool), enable an application on the pool and write some data.
3. Once a few objects have been created in the pool, enable the bulk flag.
   cmd : ceph osd pool set balancer_test_pool bulk true
4. Once the bulk flag is set, a new pg_num is calculated for the pool and its PGs are split to reach the desired count. In this case the new pg_num was calculated to be 256, and the pool scaled up to 256 PGs.
5. Once the scale-up is complete, remove the bulk flag from the pool.
   cmd : ceph osd pool set balancer_test_pool bulk false
6. After removing the bulk flag, a new PG count is calculated for the pool again; in this case, 32 PGs.
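A condensed reproduction sketch of the steps above (pool name matches this run; the rados bench parameters used to generate a few objects are illustrative, any small write workload will do):

ceph balancer mode upmap-read
ceph osd pool create balancer_test_pool
ceph osd pool application enable balancer_test_pool rados
rados bench -p balancer_test_pool 30 write --no-cleanup   # write a few hundred objects
ceph osd pool set balancer_test_pool bulk true            # autoscaler splits the pool (up to 256 PGs here)
ceph osd pool autoscale-status                            # wait until the scale-up completes
ceph osd pool set balancer_test_pool bulk false           # autoscaler now targets 32 PGs
ceph osd pool ls detail | grep balancer_test_pool         # pg_num should start decreasing, but stays stuck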
However, the PG scale-down is stuck: the PG count on the pool does not decrease even after 30 minutes. Note that there is no I/O going on in the cluster at this time.

From the balancer logs:

    "balancer_test_pool": {
        "application_metadata": {
            "rados": {}
        },
        "auid": 0,
        "cache_min_evict_age": 0,
        "cache_min_flush_age": 0,
        "cache_mode": "none",
        "cache_target_dirty_high_ratio_micro": 600000,
        "cache_target_dirty_ratio_micro": 400000,
        "cache_target_full_ratio_micro": 800000,
        "create_time": "2024-08-01T09:23:57.179867+0000",
        "crush_rule": 0,
        "erasure_code_profile": "",
        "expected_num_objects": 0,
        "fast_read": false,
        "flags": 1,
        "flags_names": "hashpspool",
        "grade_table": [],
        "hit_set_count": 0,
        "hit_set_grade_decay_rate": 0,
        "hit_set_params": {
            "type": "none"
        },
        "hit_set_period": 0,
        "hit_set_search_last_n": 0,
        "last_change": "395",
        "last_force_op_resend": "0",
        "last_force_op_resend_preluminous": "391",
        "last_force_op_resend_prenautilus": "393",
        "last_pg_merge_meta": {
            "last_epoch_clean": 392,
            "last_epoch_started": 392,
            "ready_epoch": 393,
            "source_pgid": "8.f3",
            "source_version": "162'27",
            "target_version": "162'27"
        },
        "min_read_recency_for_promote": 0,
        "min_size": 2,
        "min_write_recency_for_promote": 0,
        "object_hash": 2,
        "options": {},
        "peering_crush_bucket_barrier": 0,
        "peering_crush_bucket_count": 0,
        "peering_crush_bucket_mandatory_member": 2147483647,
        "peering_crush_bucket_target": 0,
        "pg_autoscale_mode": "on",
        "pg_num": 243,
        "pg_num_pending": 243,
        "pg_num_target": 32,
        "pg_placement_num": 231,
        "pg_placement_num_target": 32,
        "pool": 8,
        "pool_name": "balancer_test_pool",
        "pool_snaps": [],
        "quota_max_bytes": 0,
        "quota_max_objects": 0,
        "read_balance": {
            "average_primary_affinity": 1.0,
            "average_primary_affinity_weighted": 1.0,
            "optimal_score": 1.0,
            "primary_affinity_weighted": 1.0000001192092896,
            "raw_score_acting": 1.3200000524520874,
            "raw_score_stable": 1.3200000524520874,
            "score_acting": 1.3200000524520874,
            "score_stable": 1.3200000524520874,
            "score_type": "Fair distribution"
        },
        "read_tier": -1,
        "removed_snaps": "[]",
        "size": 3,
        "snap_epoch": 0,
        "snap_mode": "selfmanaged",
        "snap_seq": 0,
        "stripe_width": 0,
        "target_max_bytes": 0,
        "target_max_objects": 0,
        "tier_of": -1,
        "tiers": [],
        "type": 1,
        "use_gmt_hitset": true,
        "write_tier": -1
    },
}

2024-08-01 10:00:24,781 [Dummy-2] [DEBUG] [root] root_ids [-1, -1, -1, -1] pools [1, 2, 3, 8] with 20 osds, pg_target 2000
2024-08-01 10:00:24,781 [Dummy-2] [INFO] [root] effective_target_ratio 0.0 0.0 0 536787025920
2024-08-01 10:00:24,781 [Dummy-2] [INFO] [root] Pool '.mgr' root_id -1 using 3.449025238318487e-06 of space, bias 1.0, pg target 0.0022993501588789915 quantized to 1 (current 1)
2024-08-01 10:00:24,782 [Dummy-2] [INFO] [root] effective_target_ratio 0.0 0.0 0 536787025920
2024-08-01 10:00:24,782 [Dummy-2] [INFO] [root] Pool 'cephfs.cephfs.meta' root_id -1 using 1.8313408345053914e-07 of space, bias 4.0, pg target 0.0004883575558681044 quantized to 16 (current 16)
2024-08-01 10:00:24,782 [Dummy-2] [INFO] [root] effective_target_ratio 0.0 0.0 0 536787025920
2024-08-01 10:00:24,782 [Dummy-2] [INFO] [root] effective_target_ratio 0.0 0.0 0 536787025920
2024-08-01 10:00:24,782 [Dummy-2] [INFO] [root] Pool 'balancer_test_pool' root_id -1 using 1.1468771976090014e-05 of space, bias 1.0, pg target 0.0076458479840600104 quantized to 32 (current 32)
2024-08-01 10:00:24,782 [Dummy-2] [INFO] [root] effective_target_ratio 0.0 0.0 0 536787025920
2024-08-01 10:00:24,782 [Dummy-2] [INFO] [root] effective_target_ratio 0.0 0.0 0 536787025920
2024-08-01 10:00:24,782 [Dummy-2] [INFO] [root] Pool 'cephfs.cephfs.data' root_id -1 using 0.0 of space, bias 1.0, pg target 666.6666666666666 quantized to 512 (current 512)
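In the pool dump above, pg_num is still 243 with pg_num_target 32 (and pg_placement_num 231 with pg_placement_num_target 32), so the merge appears to have started from 256 but then stops making progress. The stuck state can be watched directly with the commands below (jq is assumed to be installed; the field names are exactly those shown in the dump above):

ceph osd pool get balancer_test_pool pg_num
ceph osd dump --format json | jq '.pools[] | select(.pool_name == "balancer_test_pool") | {pg_num, pg_num_target, pg_num_pending, pg_placement_num, pg_placement_num_target}'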
Actual results:
The PGs are not scaled down on the pool once the new PG count has been identified by the PG autoscaler.

Expected results:
The PGs should be scaled down on the pool once the new PG count has been identified by the PG autoscaler.
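For verification once this is fixed, the pool should converge to the new target after the bulk flag is removed, e.g. (illustrative check):

ceph osd pool get balancer_test_pool pg_num   # should eventually report pg_num: 32
ceph osd pool autoscale-status                # PG_NUM 32 and BULK False for balancer_test_pool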