Bug 2134786 - Recovery started even though the norecover flag is raised in the stretch cluster
Summary: Recovery started even though the norecover flag is raised in the stretch clu...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 6.0
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: 7.1
Assignee: Aishwarya Mathuria
QA Contact: skanta
Akash Raj
URL:
Whiteboard:
Depends On:
Blocks: 2267614 2298578 2298579
TreeView+ depends on / blocked
 
Reported: 2022-10-14 10:13 UTC by skanta
Modified: 2024-08-13 03:02 UTC (History)
18 users (show)

Fixed In Version: ceph-18.2.1-4.el9cp
Doc Type: Bug Fix
Doc Text:
.Autoscaler no longer runs while the `norecover` flag is set Previously, the autoscaler would run while the `norecover` flag was set leading to creation of new PGs and these PGs requiring to be backfilled. Running of autoscaler while the `norecover` flag is set allowed in cases where I/O is blocked on missing or degraded objects in order to avoid client I/O hanging indefinitely. With this fix, the autoscaler does not run while the `norecover` flag is set.
Clone Of:
Environment:
Last Closed: 2024-06-13 14:19:42 UTC
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph pull 54708 0 None open mgr/pg_autoscaler: add check for norecover flag 2023-12-14 20:42:50 UTC
Red Hat Issue Tracker RHCEPH-5444 0 None None None 2022-10-14 10:35:08 UTC
Red Hat Product Errata RHSA-2024:3925 0 None None None 2024-06-13 14:19:48 UTC

Comment 38 Aishwarya Mathuria 2023-08-25 06:12:46 UTC
From the cluster logs we could make out that the PGs went into recovering state after the autoscaler kicked in and PG splitting had occurred. 
The cluster logs unfortunately did not contain the logs that captured recovery messages. However, I was able to reproduce this issue on a vstart cluster. 
To do this, I followed the following steps:
1. Set norecovery and nobackfill flags
2. Created a pool
3. Used radosbench to write objects to this pool 
4. While objects are being written to this pool, I increased the number of PGs (pg_num) of this pool.

Once I did this, I observed that a few PGs went into recovering state and using the PG query command I could see the num_objects_recovered increasing. 
I believe the reason I could not capture these logs in the cluster is that by the time we started recording debug_osd=30 OSD logs, all I/O on the cluster had been stopped.
From the logs, we figured out that recovery starts when we try write to a degraded object or try to read a missing object.
Both of these instances cause maybe_kick_recovery() to start recovery for that particular object (recover_object). 
This work-flow does not check if the norecover flag is set or not. 
I will be working on fixing this issue. Attaching the logs below for reference:

Write to a degraded object: 

2023-08-24T10:53:21.916+0530 7ffa5831e6c0 10 osd.1 pg_epoch: 23 pg[1.1( v 21'468 (0'0,21'468] local-lis/les=22/23 n=117 ec=22/18 lis/c=22/18 les/c/f=23/19/0 sis=22) [1,0,2] r=0 lpr=22 pi=[18,22)/1 crt=21'468 lcod 0'0 mlcod 0'0 active+recovery_wait+degraded mbc={255={(2+0)=2}}] do_op osd_op(client.4155.0:466 1.1 1:9dc90fba:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object465:head [set-alloc-hint object_size 4194304 write_size 4194304,write 0~4194304 in=4194304b] snapc 0=[] RETRY=1 ondisk+retry+write+known_if_redirected+supports_pool_eio e22) v8 may_write -> write-ordered flags ondisk+retry+write+known_if_redirected+supports_pool_eio
2023-08-24T10:53:21.916+0530 7ffa5831e6c0 10 osd.1 pg_epoch: 23 pg[1.1( v 21'468 (0'0,21'468] local-lis/les=22/23 n=117 ec=22/18 lis/c=22/18 les/c/f=23/19/0 sis=22) [1,0,2] r=0 lpr=22 pi=[18,22)/1 crt=21'468 lcod 0'0 mlcod 0'0 active+recovery_wait+degraded mbc={255={(2+0)=2}}] readable_with_acting: locs:1,2
2023-08-24T10:53:21.916+0530 7ffa5831e6c0 10 osd.1 pg_epoch: 23 pg[1.1( v 21'468 (0'0,21'468] local-lis/les=22/23 n=117 ec=22/18 lis/c=22/18 les/c/f=23/19/0 sis=22) [1,0,2] r=0 lpr=22 pi=[18,22)/1 crt=21'468 lcod 0'0 mlcod 0'0 active+recovery_wait+degraded mbc={255={(2+0)=2}}] do_op degraded_or_backfilling_object
2023-08-24T10:53:21.916+0530 7ffa5831e6c0 10 osd.1 pg_epoch: 23 pg[1.1( v 21'468 (0'0,21'468] local-lis/les=22/23 n=117 ec=22/18 lis/c=22/18 les/c/f=23/19/0 sis=22) [1,0,2] r=0 lpr=22 pi=[18,22)/1 crt=21'468 lcod 0'0 mlcod 0'0 active+recovery_wait+degraded mbc={255={(2+0)=2}}] do_op wait_for_degraded_object
2023-08-24T10:53:21.916+0530 7ffa5831e6c0  7 osd.1 pg_epoch: 23 pg[1.1( v 21'468 (0'0,21'468] local-lis/les=22/23 n=117 ec=22/18 lis/c=22/18 les/c/f=23/19/0 sis=22) [1,0,2] r=0 lpr=22 pi=[18,22)/1 crt=21'468 lcod 0'0 mlcod 0'0 active+recovery_wait+degraded mbc={255={(2+0)=2}}] object 1:9dc90fba:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object465:head v 21'466, recovering.
2023-08-24T10:53:21.916+0530 7ffa5831e6c0 10 osd.1 pg_epoch: 23 pg[1.1( v 21'468 (0'0,21'468] local-lis/les=22/23 n=117 ec=22/18 lis/c=22/18 les/c/f=23/19/0 sis=22) [1,0,2] r=0 lpr=22 pi=[18,22)/1 crt=21'468 lcod 0'0 mlcod 0'0 active+recovery_wait+degraded mbc={255={(2+0)=2}}] prep_object_replica_pushes: on 1:9dc90fba:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object465:head

Starting recovery_op and building push op: 

2023-08-24T10:53:21.916+0530 7ffa5831e6c0 20 osd.1 pg_epoch: 23 pg[1.1( v 21'468 (0'0,21'468] local-lis/les=22/23 n=117 ec=22/18 lis/c=22/18 les/c/f=23/19/0 sis=22) [1,0,2] r=0 lpr=22 pi=[18,22)/1 crt=21'468 lcod 0'0 mlcod 0'0 active+recovery_wait+degraded mbc={255={(2+0)=2}}] recovery got recovery read lock on 1:9dc90fba:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object465:head
2023-08-24T10:53:21.916+0530 7ffa5831e6c0 10 osd.1 pg_epoch: 23 pg[1.1( v 21'468 (0'0,21'468] local-lis/les=22/23 n=117 ec=22/18 lis/c=22/18 les/c/f=23/19/0 sis=22) [1,0,2] r=0 lpr=22 pi=[18,22)/1 crt=21'468 lcod 0'0 mlcod 0'0 active+recovery_wait+degraded mbc={255={(2+0)=2}}] start_recovery_op 1:9dc90fba:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object465:head
2023-08-24T10:53:21.916+0530 7ffa5831e6c0 10 osd.1 23 start_recovery_op pg[1.1( v 21'468 (0'0,21'468] local-lis/les=22/23 n=117 ec=22/18 lis/c=22/18 les/c/f=23/19/0 sis=22) [1,0,2] r=0 lpr=22 pi=[18,22)/1 crt=21'468 lcod 0'0 mlcod 0'0 active+recovery_wait+degraded rops=1 mbc={255={(2+0)=2}}] 1:9dc90fba:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object465:head (0/3 rops)
2023-08-24T10:53:21.916+0530 7ffa5831e6c0 10 osd.1 pg_epoch: 23 pg[1.1( v 21'468 (0'0,21'468] local-lis/les=22/23 n=117 ec=22/18 lis/c=22/18 les/c/f=23/19/0 sis=22) [1,0,2] r=0 lpr=22 pi=[18,22)/1 crt=21'468 lcod 0'0 mlcod 0'0 active+recovery_wait+degraded rops=1 mbc={255={(2+0)=2}}] recover_object: 1:9dc90fba:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object465:head
2023-08-24T10:53:21.916+0530 7ffa5831e6c0 20 osd.1 pg_epoch: 23 pg[1.1( v 21'468 (0'0,21'468] local-lis/les=22/23 n=117 ec=22/18 lis/c=22/18 les/c/f=23/19/0 sis=22) [1,0,2] r=0 lpr=22 pi=[18,22)/1 crt=21'468 lcod 0'0 mlcod 0'0 active+recovery_wait+degraded rops=1 mbc={255={(2+0)=2}}] start_pushes soid 1:9dc90fba:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object465:head
2023-08-24T10:53:21.916+0530 7ffa5831e6c0 10 osd.1 pg_epoch: 23 pg[1.1( v 21'468 (0'0,21'468] local-lis/les=22/23 n=117 ec=22/18 lis/c=22/18 les/c/f=23/19/0 sis=22) [1,0,2] r=0 lpr=22 pi=[18,22)/1 crt=21'468 lcod 0'0 mlcod 0'0 active+recovery_wait+degraded rops=1 mbc={255={(2+0)=2}}] prep_push_to_replica: 1:9dc90fba:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object465:head v21'466 size 4194304 to osd.0
2023-08-24T10:53:21.916+0530 7ffa5831e6c0 15 osd.1 pg_epoch: 23 pg[1.1( v 21'468 (0'0,21'468] local-lis/les=22/23 n=117 ec=22/18 lis/c=22/18 les/c/f=23/19/0 sis=22) [1,0,2] r=0 lpr=22 pi=[18,22)/1 crt=21'468 lcod 0'0 mlcod 0'0 active+recovery_wait+degraded rops=1 mbc={255={(2+0)=2}}] push_to_replica snapset is 0=[]:{}
2023-08-24T10:53:21.916+0530 7ffa5831e6c0 10 osd.1 pg_epoch: 23 pg[1.1( v 21'468 (0'0,21'468] local-lis/les=22/23 n=117 ec=22/18 lis/c=22/18 les/c/f=23/19/0 sis=22) [1,0,2] r=0 lpr=22 pi=[18,22)/1 crt=21'468 lcod 0'0 mlcod 0'0 active+recovery_wait+degraded rops=1 mbc={255={(2+0)=2}}] calc_head_subsets 1:9dc90fba:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object465:head clone_overlap {}
2023-08-24T10:53:21.916+0530 7ffa5831e6c0 10 osd.1 pg_epoch: 23 pg[1.1( v 21'468 (0'0,21'468] local-lis/les=22/23 n=117 ec=22/18 lis/c=22/18 les/c/f=23/19/0 sis=22) [1,0,2] r=0 lpr=22 pi=[18,22)/1 crt=21'468 lcod 0'0 mlcod 0'0 active+recovery_wait+degraded rops=1 mbc={255={(2+0)=2}}] calc_head_subsets 1:9dc90fba:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object465:head data_subset [0~4194304]
2023-08-24T10:53:21.916+0530 7ffa5831e6c0 10 osd.1 pg_epoch: 23 pg[1.1( v 21'468 (0'0,21'468] local-lis/les=22/23 n=117 ec=22/18 lis/c=22/18 les/c/f=23/19/0 sis=22) [1,0,2] r=0 lpr=22 pi=[18,22)/1 crt=21'468 lcod 0'0 mlcod 0'0 active+recovery_wait+degraded rops=1 mbc={255={(2+0)=2}}] skipping clone, nothing needs to clone
2023-08-24T10:53:21.916+0530 7ffa5831e6c0  7 osd.1 pg_epoch: 23 pg[1.1( v 21'468 (0'0,21'468] local-lis/les=22/23 n=117 ec=22/18 lis/c=22/18 les/c/f=23/19/0 sis=22) [1,0,2] r=0 lpr=22 pi=[18,22)/1 crt=21'468 lcod 0'0 mlcod 0'0 active+recovery_wait+degraded rops=1 mbc={255={(2+0)=2}}] build_push_op 1:9dc90fba:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object465:head v 21'466 size 4194304 recovery_info: ObjectRecoveryInfo(1:9dc90fba:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object465:head@21'466, size: 4194304, copy_subset: [0~4194304], clone_subset: {}, snapset: 0=[]:{}, object_exist: 0)

Send pushes to osd.0 from osd.1:

2023-08-24T10:53:22.049+0530 7ffa5831e6c0 20 osd.1 pg_epoch: 23 pg[1.1( v 21'468 (0'0,21'468] local-lis/les=22/23 n=117 ec=22/18 lis/c=22/18 les/c/f=23/19/0 sis=22) [1,0,2] r=0 lpr=22 pi=[18,22)/1 crt=21'468 lcod 0'0 mlcod 0'0 active+recovery_wait+degraded rops=2 mbc={255={(2+0)=2}}] send_pushes: sending push PushOp(1:bb40a492:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object466:head, version: 21'467, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(1:bb40a492:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object466:head@21'467, size: 4194304, copy_subset: [0~4194304], clone_subset: {}, snapset: 0=[]:{}, object_exist: 0), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true, error:false), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false, error:false)) to osd.0
2023-08-24T10:53:22.049+0530 7ffa5831e6c0  1 -- [v2:192.168.68.109:6812/1081787642,v1:192.168.68.109:6813/1081787642] --> [v2:192.168.68.109:6804/2591898341,v1:192.168.68.109:6805/2591898341] -- MOSDPGPush(1.1 23/22 [PushOp(1:bb40a492:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object466:head, version: 21'467, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(1:bb40a492:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object466:head@21'467, size: 4194304, copy_subset: [0~4194304], clone_subset: {}, snapset: 0=[]:{}, object_exist: 0), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true, error:false), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false, error:false))]) v4 -- 0x10d031e0 con 0xa3fd180


osd.0 logs queueing PGRecoveryMsg:

2023-08-24T10:53:22.060+0530 7f212a5546c0  1 -- [v2:192.168.68.109:6804/2591898341,v1:192.168.68.109:6805/2591898341] <== osd.1 v2:192.168.68.109:6812/1081787642 514 ==== MOSDPGPush(1.1 23/22 [PushOp(1:bb40a492:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object466:head, version: 21'467, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(1:bb40a492:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object466:head@21'467, size: 4194304, copy_subset: [0~4194304], clone_subset: {}, snapset: 0=[]:{}, object_exist: 0), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true, error:false), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false, error:false))]) v4 ==== 4195427+0+0 (crc 0 0 0) 0xb78e680 con 0x8acdb00
2023-08-24T10:53:22.060+0530 7f212a5546c0 15 osd.0 23 enqueue_op MOSDPGPush(1.1 23/22 [PushOp(1:bb40a492:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object466:head, version: 21'467, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(1:bb40a492:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object466:head@21'467, size: 4194304, copy_subset: [0~4194304], clone_subset: {}, snapset: 0=[]:{}, object_exist: 0), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true, error:false), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false, error:false))]) v4 prio 196 type 105 cost 4195304 latency 0.008027 epoch 23 MOSDPGPush(1.1 23/22 [PushOp(1:bb40a492:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object466:head, version: 21'467, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(1:bb40a492:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object466:head@21'467, size: 4194304, copy_subset: [0~4194304], clone_subset: {}, snapset: 0=[]:{}, object_exist: 0), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true, error:false), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false, error:false))]) v4
2023-08-24T10:53:22.060+0530 7f212a5546c0 20 osd.0 op_wq(1) _enqueue OpSchedulerItem(1.1 PGRecoveryMsg(op=MOSDPGPush(1.1 23/22 [PushOp(1:bb40a492:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object466:head, version: 21'467, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(1:bb40a492:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object466:head@21'467, size: 4194304, copy_subset: [0~4194304], clone_subset: {}, snapset: 0=[]:{}, object_exist: 0), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true, error:false), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false, error:false))]) v4) class_id 2 prio 196 cost 4195304 e23)
2023-08-24T10:53:22.139+0530 7f210f51e6c0 10 osd.0 23 dequeue_op MOSDPGPush(1.1 23/22 [PushOp(1:9dc90fba:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object465:head, version: 21'466, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(1:9dc90fba:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object465:head@21'466, size: 4194304, copy_subset: [0~4194304], clone_subset: {}, snapset: 0=[]:{}, object_exist: 0), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true, error:false), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false, error:false))]) v4 finish
2023-08-24T10:53:22.139+0530 7f210f51e6c0 20 osd.0 op_wq(1) _process 1.1 to_process <> waiting <> waiting_peering {}
2023-08-24T10:53:22.139+0530 7f210f51e6c0 20 osd.0 op_wq(1) _process OpSchedulerItem(1.1 PGRecoveryMsg(op=MOSDPGPush(1.1 23/22 [PushOp(1:bb40a492:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object466:head, version: 21'467, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(1:bb40a492:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object466:head@21'467, size: 4194304, copy_subset: [0~4194304], clone_subset: {}, snapset: 0=[]:{}, object_exist: 0), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true, error:false), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false, error:false))]) v4) class_id 2 prio 196 cost 4195304 e23) queued
2023-08-24T10:53:22.139+0530 7f210f51e6c0 30 osd.0 pg_epoch: 23 pg[1.1( v 21'468 lc 21'466 (0'0,21'468] local-lis/les=22/23 n=117 ec=22/18 lis/c=22/18 les/c/f=23/19/0 sis=22) [1,0,2] r=1 lpr=22 pi=[18,22)/1 luod=0'0 crt=21'468 lcod 0'0 mlcod 0'0 active m=1 mbc={}] lock
2023-08-24T10:53:22.139+0530 7f210f51e6c0 20 osd.0 op_wq(1) _process 1.1 to_process <OpSchedulerItem(1.1 PGRecoveryMsg(op=MOSDPGPush(1.1 23/22 [PushOp(1:bb40a492:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object466:head, version: 21'467, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(1:bb40a492:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object466:head@21'467, size: 4194304, copy_subset: [0~4194304], clone_subset: {}, snapset: 0=[]:{}, object_exist: 0), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true, error:false), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false, error:false))]) v4) class_id 2 prio 196 cost 4195304 e23)> waiting <> waiting_peering {}
2023-08-24T10:53:22.140+0530 7f210f51e6c0 20 osd.0 op_wq(1) _process OpSchedulerItem(1.1 PGRecoveryMsg(op=MOSDPGPush(1.1 23/22 [PushOp(1:bb40a492:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object466:head, version: 21'467, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(1:bb40a492:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object466:head@21'467, size: 4194304, copy_subset: [0~4194304], clone_subset: {}, snapset: 0=[]:{}, object_exist: 0), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true, error:false), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false, error:false))]) v4) class_id 2 prio 196 cost 4195304 e23) pg 0xb772000
2023-08-24T10:53:22.140+0530 7f210f51e6c0 10 osd.0 23 dequeue_op MOSDPGPush(1.1 23/22 [PushOp(1:bb40a492:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object466:head, version: 21'467, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(1:bb40a492:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object466:head@21'467, size: 4194304, copy_subset: [0~4194304], clone_subset: {}, snapset: 0=[]:{}, object_exist: 0), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true, error:false), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false, error:false))]) v4 prio 196 cost 4195304 latency 0.087563 MOSDPGPush(1.1 23/22 [PushOp(1:bb40a492:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object466:head, version: 21'467, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(1:bb40a492:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object466:head@21'467, size: 4194304, copy_subset: [0~4194304], clone_subset: {}, snapset: 0=[]:{}, object_exist: 0), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true, error:false), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false, error:false))]) v4 pg pg[1.1( v 21'468 lc 21'466 (0'0,21'468] local-lis/les=22/23 n=117 ec=22/18 lis/c=22/18 les/c/f=23/19/0 sis=22) [1,0,2] r=1 lpr=22 pi=[18,22)/1 luod=0'0 crt=21'468 lcod 0'0 mlcod 0'0 active m=1 mbc={}]
2023-08-24T10:53:22.140+0530 7f210f51e6c0 10 osd.0 23 maybe_share_map: con v2:192.168.68.109:6812/1081787642 our osdmap epoch of 23 is not newer than session's projected_epoch of 23
2023-08-24T10:53:22.140+0530 7f210f51e6c0 10 osd.0 pg_epoch: 23 pg[1.1( v 21'468 lc 21'466 (0'0,21'468] local-lis/les=22/23 n=117 ec=22/18 lis/c=22/18 les/c/f=23/19/0 sis=22) [1,0,2] r=1 lpr=22 pi=[18,22)/1 luod=0'0 crt=21'468 lcod 0'0 mlcod 0'0 active m=1 mbc={}] _handle_message: MOSDPGPush(1.1 23/22 [PushOp(1:bb40a492:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object466:head, version: 21'467, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(1:bb40a492:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object466:head@21'467, size: 4194304, copy_subset: [0~4194304], clone_subset: {}, snapset: 0=[]:{}, object_exist: 0), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true, error:false), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false, error:false))]) v4
2023-08-24T10:53:22.140+0530 7f210f51e6c0 10 osd.0 pg_epoch: 23 pg[1.1( v 21'468 lc 21'466 (0'0,21'468] local-lis/les=22/23 n=117 ec=22/18 lis/c=22/18 les/c/f=23/19/0 sis=22) [1,0,2] r=1 lpr=22 pi=[18,22)/1 luod=0'0 crt=21'468 lcod 0'0 mlcod 0'0 active m=1 mbc={}] handle_push ObjectRecoveryInfo(1:bb40a492:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object466:head@21'467, size: 4194304, copy_subset: [0~4194304], clone_subset: {}, snapset: 0=[]:{}, object_exist: 0)ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true, error:false)
2023-08-24T10:53:22.140+0530 7f210f51e6c0 20 osd.0 pg_epoch: 23 pg[1.1( v 21'468 lc 21'466 (0'0,21'468] local-lis/les=22/23 n=117 ec=22/18 lis/c=22/18 les/c/f=23/19/0 sis=22) [1,0,2] r=1 lpr=22 pi=[18,22)/1 luod=0'0 crt=21'468 lcod 0'0 mlcod 0'0 active m=1 mbc={}] submit_push_data recovering object 1:bb40a492:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object466:head copy_subset: [0~4194304] intervals_included: [0~4194304] data_zeros: []
2023-08-24T10:53:22.140+0530 7f210f51e6c0 10 osd.0 pg_epoch: 23 pg[1.1( v 21'468 lc 21'466 (0'0,21'468] local-lis/les=22/23 n=117 ec=22/18 lis/c=22/18 les/c/f=23/19/0 sis=22) [1,0,2] r=1 lpr=22 pi=[18,22)/1 luod=0'0 crt=21'468 lcod 0'0 mlcod 0'0 active m=1 mbc={}] on_local_recover: 1:bb40a492:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object466:head
2023-08-24T10:53:22.140+0530 7f210f51e6c0 10 osd.0 pg_epoch: 23 pg[1.1( v 21'468 lc 21'466 (0'0,21'468] local-lis/les=22/23 n=117 ec=22/18 lis/c=22/18 les/c/f=23/19/0 sis=22) [1,0,2] r=1 lpr=22 pi=[18,22)/1 luod=0'0 crt=21'468 lcod 0'0 mlcod 0'0 active m=1 mbc={}] PeeringState::recover_got got missing 1:bb40a492:::benchmark_data_li-9e0a7d4c-34fa-11b2-a85c-bf_356346_object466:head v 21'467
2023-08-24T10:53:22.140+0530 7f210f51e6c0 10 osd.0 pg_epoch: 23 pg[1.1( v 21'468 (0'0,21'468] local-lis/les=22/23 n=117 ec=22/18 lis/c=22/18 les/c/f=23/19/0 sis=22) [1,0,2] r=1 lpr=22 pi=[18,22)/1 luod=0'0 crt=21'468 lcod 0'0 mlcod 0'0 active mbc={}] PeeringState::recover_got last_complete now 21'468 log.complete_to at end

Comment 40 Aishwarya Mathuria 2023-12-14 04:41:23 UTC
PR currently being reviewed - https://github.com/ceph/ceph/pull/54708

Comment 46 Aishwarya Mathuria 2024-03-06 13:47:50 UTC
Hi Akash, 

I've added the doc text and type.

Comment 47 errata-xmlrpc 2024-06-13 14:19:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Critical: Red Hat Ceph Storage 7.1 security, enhancements, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:3925


Note You need to log in before you can comment on or make changes to this bug.