Bug 2354646 - [8.0] [Read Balancer] Make rm-pg-upmap-primary able to remove mappings by force
Summary: [8.0] [Read Balancer] Make rm-pg-upmap-primary able to remove mappings by force
Keywords:
Status: CLOSED DUPLICATE of bug 2357063
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 7.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 8.0z4
Assignee: Laura Flores
QA Contact: Pawan
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2025-03-24 22:25 UTC by Laura Flores
Modified: 2025-04-03 00:12 UTC (History)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Feature: A new command, `ceph osd rm-pg-upmap-primary-all`, has been added that lets users clear all pg-upmap-primary mappings in the osdmap when desired. As with the existing command `ceph osd rm-pg-upmap-primary <pgid>`, this new command should be used with caution: it directly modifies primary PG mappings and can impact read performance, although it does not trigger any data movement.
Reason: Users who want to remove all pg-upmap-primary mappings can now do so with a single command. The command can also be used to remove invalid mappings left over from a bug in which pg-upmap-primary entries remained in the osdmap after users deleted a pool.
Result: If a user has pg-upmap-primary mappings in their osdmap, running the new command should remove all of them from the cluster, both valid and invalid.
Clone Of:
Environment:
Last Closed: 2025-04-03 00:00:29 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph pull 62421 0 None Merged squid: mon, osd: add command to remove invalid pg-upmap-primary entries 2025-03-24 22:25:53 UTC
Red Hat Issue Tracker RHCEPH-10949 0 None None None 2025-03-24 22:26:50 UTC

Description Laura Flores 2025-03-24 22:25:53 UTC
This bug was initially created as a copy of Bug #2349077

I am copying this bug because: 
This needs to be backported to the 8.0 z-stream in addition to 8.1.


Description of problem:

Corresponding upstream tracker here: https://tracker.ceph.com/issues/69760

Essentially, the user was running a v18.2.1 cluster and hit BZ#2290580, which we know occurs when clients older than Reef are erroneously allowed to connect to the cluster when pg_upmap_primary, a strictly-Reef feature, is employed.

The user also hit BZ#2348970, which occurs when a pool is deleted and "phantom" pg_upmap_primary entries for that pool are left in the OSDMap. As a result, the user cannot apply the suggested workaround for BZ#2290580, which is to remove the pg_upmap_primary entries before upgrading from the broken encoder to the fixed encoder.
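
For illustration only, here is a minimal Python sketch of how such "phantom" entries could be spotted in a parsed `ceph osd dump --format json` document. The field names used (`pools[].pool`, `pg_upmap_primaries[].pgid`, `primary_osd`) are assumptions about the JSON layout, and the helper name `find_phantom_upmap_primaries` is hypothetical; this is not the detection logic used by the fix.

```python
def find_phantom_upmap_primaries(osd_dump):
    """Return pg_upmap_primary entries whose pool no longer exists.

    `osd_dump` is a dict parsed from OSDMap JSON. The key names below are
    assumptions about that layout, not a confirmed schema.
    """
    # Numeric ids of the pools that still exist in the map (assumed key: "pool").
    live_pools = {p["pool"] for p in osd_dump.get("pools", [])}

    phantoms = []
    for entry in osd_dump.get("pg_upmap_primaries", []):
        # A pgid has the form "<pool-id>.<seed>", e.g. "7.1f"; the pool id is decimal.
        pool_id = int(entry["pgid"].split(".", 1)[0])
        if pool_id not in live_pools:
            phantoms.append(entry)
    return phantoms


# Hypothetical sample: pool 7 was deleted, but its mapping is still present.
sample = {
    "pools": [{"pool": 1, "pool_name": "rbd"}],
    "pg_upmap_primaries": [
        {"pgid": "1.0", "primary_osd": 3},
        {"pgid": "7.1f", "primary_osd": 5},
    ],
}
print(find_phantom_upmap_primaries(sample))
```

In this sample, only the entry for pgid "7.1f" is reported, because pool 7 is absent from the pool list.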

The idea for a fix is to provide an option to force-remove a "phantom" pg_upmap_primary mapping, and potentially to relax the assertion in the OSDMap encoder.

The net effect: Although fixes for BZ#2290580 are already included in v18.2.4, the user still experiences difficulty if they hit the crash and then try to upgrade.

Version-Release number of selected component (if applicable):
v18.2.1

Comment 2 Storage PM bot 2025-03-24 22:26:01 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

