Bug 2080982

Summary: [rbd-mirror] : force promote crashed for some images - snapshot/PromoteRequest.cc: 261: FAILED ceph_assert(info != nullptr)
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Vasishta <vashastr>
Component: RBD-Mirror
Assignee: Ilya Dryomov <idryomov>
Status: ASSIGNED
Severity: high
Priority: unspecified
Version: 5.1
CC: amagrawa, ceph-eng-bugs, idryomov, jdurgin, kseeger, mmurthy, sostapov, vereddy
Target Milestone: ---
Target Release: 7.0
Hardware: Unspecified
OS: Unspecified
Type: Bug
Bug Blocks: 2135372

Description Vasishta 2022-05-02 14:04:25 UTC
Description of problem:
When a force promote was issued on a set of images, the promote operation crashed for some of them.

Version-Release number of selected component (if applicable):
16.2.7-109.el8cp

How reproducible:
Tried only once

Steps to Reproduce:
1. Configure snapshot-based mirroring between two clusters.
2. Create 50 images (with an EC pool) on one cluster and 50 images (without an EC pool) on the other cluster.
3. Promote the images on the opposite clusters with the --force option.
4. The issue was observed while the non-EC-pool images were being promoted on one of the clusters. (The mirroring daemon of the opposite cluster had gone down because rbd_mirror_die_after_seconds was being tested.)
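
For anyone trying to reproduce this, steps 2 and 3 above could be scripted roughly as follows. This is only a sketch: the pool names, the image naming scheme (img0, img1, ...) and the image size are assumptions that do not come from this report, and the peer bootstrap from step 1 is not covered. The function only builds the rbd command lines; nothing is executed.

```python
from typing import Dict, List, Optional


def repro_commands(pool: str,
                   ec_data_pool: Optional[str],
                   count: int = 50,
                   size: str = "1G") -> Dict[str, List[List[str]]]:
    """Build rbd command lines for steps 2 and 3.

    pool, ec_data_pool, size and the img<N> naming are hypothetical
    placeholders; the report does not state the values that were used.
    """
    origin: List[List[str]] = []    # run on the cluster that creates the images
    opposite: List[List[str]] = []  # run on the peer cluster
    for i in range(count):
        spec = f"{pool}/img{i}"
        create = ["rbd", "create", "--size", size]
        if ec_data_pool:
            # Place the image data in an EC pool (the "with ec pool" case).
            create += ["--data-pool", ec_data_pool]
        create.append(spec)
        origin.append(create)
        # Enable snapshot-based mirroring for the new image.
        origin.append(["rbd", "mirror", "image", "enable", spec, "snapshot"])
        # Step 3: force promote the non-primary copy on the opposite cluster.
        opposite.append(["rbd", "mirror", "image", "promote", "--force", spec])
    return {"origin": origin, "opposite": opposite}


if __name__ == "__main__":
    for cmd in repro_commands("pool_a", "ec_data", count=2)["opposite"]:
        print(" ".join(cmd))
```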

Actual results:
    -2> 2022-05-02T13:06:32.854+0000 7fc111a46700 10 monclient: get_auth_request con 0x7fc0f400b830 auth_method 0
    -1> 2022-05-02T13:06:32.871+0000 7fc110a44700 -1 /builddir/build/BUILD/ceph-16.2.7/src/librbd/mirror/snapshot/PromoteRequest.cc: In function 'void librbd::mirror::snapshot::PromoteRequest<ImageCtxT>::rollback() [with ImageCtxT = librbd::ImageCtx]' thread 7fc110a44700 time 2022-05-02T13:06:32.871080+0000
/builddir/build/BUILD/ceph-16.2.7/src/librbd/mirror/snapshot/PromoteRequest.cc: 261: FAILED ceph_assert(info != nullptr)

Impact:
Some images do not get promoted.

Expected results:
The promote operation should not crash.

Workaround:
Note down the images that were not promoted and retry the promote for them.
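
The workaround could be automated along these lines. This is a minimal sketch, not a tested procedure: the pool and image names and the retry limit are assumptions, and it treats any non-zero exit status of the rbd CLI (including an abort caused by the assert) as "not promoted".

```python
import subprocess
from typing import Callable, Iterable, List


def promote_cmd(pool: str, image: str, force: bool = True) -> List[str]:
    """Build the rbd command line that promotes one image."""
    cmd = ["rbd", "mirror", "image", "promote"]
    if force:
        cmd.append("--force")
    cmd.append(f"{pool}/{image}")
    return cmd


def run_cmd(cmd: List[str]) -> int:
    """Run a command and return its exit status (negative if killed by a signal)."""
    return subprocess.run(cmd, capture_output=True, text=True).returncode


def promote_with_retry(pool: str,
                       images: Iterable[str],
                       runner: Callable[[List[str]], int] = run_cmd,
                       max_attempts: int = 3) -> List[str]:
    """Promote every image, retrying only the ones whose promote failed.

    Returns the images that were still not promoted after max_attempts
    rounds. max_attempts is an arbitrary choice for this sketch.
    """
    pending = list(images)
    for _ in range(max_attempts):
        if not pending:
            break
        # Keep only the images whose promote command did not exit cleanly.
        pending = [img for img in pending
                   if runner(promote_cmd(pool, img)) != 0]
    return pending


if __name__ == "__main__":
    # Hypothetical pool and image names.
    leftover = promote_with_retry("pool_a", ["img0", "img1"])
    print("still not promoted:", leftover)
```

Passing the runner as a parameter keeps the retry logic testable without a live cluster; any image that remains in the returned list still needs manual attention.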

Additional information:
# ceph crash ls
ID                                                                ENTITY        NEW  
2022-05-02T13:04:17.888612Z_212a8a21-a839-4d29-a36b-b8f68dc913e8  client.admin   *   
2022-05-02T13:04:49.777751Z_a40ab99b-d13e-4c90-830f-b5002d11515b  client.admin   *   
2022-05-02T13:06:32.873214Z_b0b9c415-4008-4031-ab0c-4478824d707f  client.admin   *

Comment 3 Ilya Dryomov 2022-06-29 13:57:36 UTC
*** Bug 2102107 has been marked as a duplicate of this bug. ***

Comment 13 Scott Ostapovicz 2023-02-06 16:53:36 UTC
Missed the 5.3 z1 window. Moving to 6.1. Please advise if this is a problem.

Comment 14 Josh Durgin 2023-03-22 23:03:57 UTC
As discussed in the DR meetings, force promote fixes will take longer to land. Moving out of 6.1.