Bug 2354646

Summary: [8.0] [Read Balancer] Make rm-pg-upmap-primary able to remove mappings by force
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Laura Flores <lflores>
Component: RADOS
Assignee: Laura Flores <lflores>
Status: CLOSED DUPLICATE
QA Contact: Pawan <pdhiran>
Severity: high
Docs Contact:
Priority: unspecified
Version: 7.1
CC: bhubbard, ceph-eng-bugs, cephqe-warriors, nojha, vumrao
Target Milestone: ---
Target Release: 8.0z4
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Feature: A new command, `ceph osd rm-pg-upmap-primary-all`, has been added that allows users to clear all pg-upmap-primary mappings in the osdmap when desired. As with the existing command `ceph osd rm-pg-upmap-primary <pgid>`, the new command should be used with caution, since it directly modifies primary PG mappings and can impact read performance, although it does not cause any data movement.
Reason: Users who want to remove all pg-upmap-primary mappings can now do so with a single command. The command can also be used to remove invalid mappings left over from a bug in which pg-upmap-primary entries remained in the osdmap after a pool was deleted.
Result: After running the new command on a cluster with pg-upmap-primary mappings in its osdmap, all such mappings, both valid and invalid, are removed.
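A rough sketch of the cleanup workflow described above, assuming a cluster where the new command is available (the PG ID passed to the single-mapping command is illustrative, and the exact `ceph osd dump` output format may vary between releases):

```shell
# Check for pg-upmap-primary entries currently in the osdmap.
ceph osd dump | grep pg_upmap_primary

# Remove a single mapping by PG ID (existing command); "1.0" is an example PG ID.
ceph osd rm-pg-upmap-primary 1.0

# Remove every pg-upmap-primary mapping at once (the new command from this enhancement).
ceph osd rm-pg-upmap-primary-all
```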
Story Points: ---
Clone Of:
Environment:
Last Closed: 2025-04-03 00:00:29 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Laura Flores 2025-03-24 22:25:53 UTC
This bug was initially created as a copy of Bug #2349077

I am copying this bug because: 
This needs to be backported to the 8.0 z-stream in addition to 8.1.


Description of problem:

Corresponding upstream tracker here: https://tracker.ceph.com/issues/69760

Essentially, the user was running a v18.2.1 cluster and hit BZ#2290580, which we know occurs when clients older than Reef are erroneously allowed to connect to the cluster while pg_upmap_primary, a feature available only in Reef and later, is employed.

The user also hit BZ#2348970, which occurs when a pool is deleted and "phantom" pg_upmap_primary entries for that pool are left in the OSDMap. Therefore, the user cannot remove the pg_upmap_primary entries prior to upgrading from the broken encoder to the fixed encoder, which is the suggested workaround for BZ#2290580.

The idea for a fix is to provide an option to force removal of a "phantom" pg_upmap_primary mapping, and potentially to relax the assertion in the OSDMap encoder.
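One way to spot such phantom entries is to compare the pool prefix of each mapped PG against the pools that actually exist. A sketch (the exact output format of these commands may vary between releases):

```shell
# List current pools; the numeric ID at the start of each PG ID
# (e.g. the "1" in "1.0") must match an existing pool ID.
ceph osd pool ls detail

# List pg_upmap_primary entries in the osdmap; any entry whose pool
# prefix does not correspond to an existing pool is a "phantom"
# mapping left behind by a deleted pool.
ceph osd dump | grep pg_upmap_primary
```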

The net effect: Although fixes for BZ#2290580 are already included in v18.2.4, the user still experiences difficulty if they hit the crash and then try to upgrade.

Version-Release number of selected component (if applicable):
v18.2.1

Comment 2 Storage PM bot 2025-03-24 22:26:01 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.