Description of problem:

>>> Upstream tracker: https://tracker.ceph.com/issues/66289

During the regular Release Criteria testing of 7.1 (and elsewhere), it has been observed that when the cluster goes through a recovery phase, the average client throughput captured during this time is inconsistent over several runs. The test workflow where this was found is called OSDFailure testing and consists of the following rounds:

a. Warm up the OSDs with pure writes (a.k.a. the Fill workload)
b. Measure the performance of the cluster over a period of 1 hour using a hybrid workload
c. Inject failure by bringing down one OSD host with continuous hybrid IOs
d. Inject failure by bringing down a second OSD host with continuous hybrid IOs
e. Bring the down OSD hosts back up and monitor the performance of the cluster during the recovery phase with continuous hybrid IOs

While the client IO performance of the cluster has been consistent during phases a, b and c, the same cannot be said once the cluster goes through phases d and e.

By contrast, performance has remained stable and consistent with the WPQ osd_op_queue throughout the testing.

Test phase descriptions specific to our use case:

Phase 1: Warm up the cluster, followed by one measurement round
- The 192 OSDs are distributed across 8 nodes with 24 OSDs per node.
- 300 RGW buckets are created and each is filled with 750K objects.
- Objects are in the small object size range [1KiB, 4KiB, 16KiB, 64KiB, 256KiB].
- 5 clients together fill the RGW pool with around 225 million objects.
- The client workload is initiated using the warp tool (a representative invocation is sketched after the Steps to Reproduce).

Phase 2: OSD node 1 failure - inject failure by bringing down one OSD host with continuous hybrid IOs
- One OSD node (24 OSDs) is brought down and the cluster is monitored while client and recovery metrics are collected.

Phase 3: OSD node 2 failure - inject failure by bringing down a second OSD host with continuous hybrid IOs
- After around a couple of hours, another OSD node (24 OSDs) is brought down and the same metrics are collected as above.

Phase 4: Bring both OSD nodes back up and monitor the cluster
- Both failed OSD nodes are brought back up to gauge recovery performance, and the cluster recovery/backfill is monitored along with the client metrics. The test does not wait for the recovery/backfill process to complete fully due to the long recovery times.

Version-Release number of selected component (if applicable):
18.2.1-188.el9cp

How reproducible:
5/5

Steps to Reproduce:
1. Warm up the cluster by filling it up to 10% capacity (e.g. with the warp fill workload sketched below).
2. Bring one OSD node down with continuous background IOs for 2 hours and observe the cluster behaviour during this time.
3. Bring another OSD node down with continuous background IOs for 2 hours and observe the cluster behaviour during this time.
4. Bring all down OSD nodes back up with continuous background IOs for two hours and let the cluster recover (a command-level sketch for steps 2-4 follows the step list).
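For reference, the fill and hybrid client load are driven with the warp S3 benchmark against the RGW endpoints. The exact command lines used by the test harness are not captured in this report; the following is only an illustrative sketch, and the endpoint, credentials, bucket name, object size, duration and concurrency shown are placeholders:

# Fill phase (illustrative only; placeholder endpoint/credentials/bucket, object size picked from the small-object range above)
warp put --host <rgw-endpoint>:80 --access-key <ACCESS_KEY> --secret-key <SECRET_KEY> \
    --bucket warp-bucket-001 --obj.size 64KiB --concurrent 64

# Measurement and failure phases use a mixed read+write ("hybrid") workload, e.g.
warp mixed --host <rgw-endpoint>:80 --access-key <ACCESS_KEY> --secret-key <SECRET_KEY> \
    --bucket warp-bucket-001 --obj.size 64KiB --duration 2h --concurrent 64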
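Node failure is injected by stopping every OSD daemon on the chosen host (see the "Stop all OSDs on f27-h13-000-6048r" entries in the timelines below). A minimal sketch of the failure injection and of the monitoring used to compare client vs recovery throughput, assuming a cephadm-deployed cluster, would be along these lines:

# On the OSD host selected for failure: stop all Ceph daemons on that host
# (alternatively, stop only the OSD daemons, e.g. ceph orch daemon stop osd.<id> per OSD)
systemctl stop ceph.target

# From an admin node, while the hybrid workload keeps running:
ceph -s                 # overall health, degraded/misplaced objects, recovery rate
ceph osd pool stats     # per-pool client I/O and recovery I/O rates
ceph pg stat            # PG states (active+clean vs backfilling/recovering)

# After the failure window, bring the node back and let backfill/recovery proceed
systemctl start ceph.target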
Actual results:
Client throughput observed during the OSD down scenarios is inconsistent across multiple runs.

Expected results:
Client throughput with background recovery, whether the recovery rate is low or high, should be consistent across multiple runs.

Additional info:

Results from different runs
=============================================================================================================================================================================
Result doc: https://docs.google.com/spreadsheets/d/1mdyRqcaQAtY4McMV3TLplXhYd3QU8baZUlYMQe98NSs/edit?gid=1735478984#gid=1735478984

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| First run | small sized object | mClock | (RHCS 7.1 - 18.2.1-188.el9cp) |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Job | Workload | Total Read Throughput | Total Write Throughput | Avg Latency | Avg RGW | Avg RGW | Avg OSD | Avg OSD | Avg Recovery | Recovery Time |
| ID  |          | MB/s | Objs/s         | MB/s | Objs/s          | (ms)        | %CPU    | %Mem    | %CPU    | %Mem    | with IO (MB/s) | w/o IO (MB/s) | hh:mm |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| w1 | fill 225M objs    | -   | -    | 1128 | 17529 | 86  | | | | | --  | -- |             |
| w2 | hybrid noFailure  | 485 | 7384 | 376  | 5745  | 66  | | | | | --  | -- | 2313        |
| w2 | hybrid OSDnode1   | 234 | 3539 | 182  | 2754  | 133 | | | | | 138 | -- | PGs Unclean |
| w2 | hybrid OSDnode2   | 329 | 5030 | 257  | 3914  | 94  | | | | | 52  | -- | after       |
| w2 | hybrid OSDrecover | 397 | 6040 | 308  | 4699  | 81  | | | | | 9   | 22 | 24 hours    |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

>>> Timeline for phase 3 from job w2
2024/05/25-03:16:43: Starting warp hybrid phase 3
2024/05/25-03:20:25: Stop all OSDs on f27-h13-000-6048r
2024/05/25-05:35:06: Completed warp hybrid phase 3

OSD perf dump - http://f28-h28-000-r630.rdu2.scalelab.redhat.com/RESULTS/w2_240524-220748_runtest-71_small/osd-perfdumps/
OSD logs: root@f28-h28-000-r630:~/rc-2024/OSDfailure-warp/RESULTS/w2_240524-220748_runtest-71_small/osd-logs
       or http://f28-h28-000-r630.rdu2.scalelab.redhat.com/RESULTS/w2_240524-220748_runtest-71_small/osd-logs/
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Second run | small sized object | mClock | (RHCS 7.1 - 18.2.1-188.el9cp) |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Job | Workload | Total Read Throughput | Total Write Throughput | Avg Latency | Avg RGW | Avg RGW | Avg OSD | Avg OSD | Avg Recovery | Recovery Time |
| ID  |          | MB/s | Objs/s         | MB/s | Objs/s          | (ms)        | %CPU    | %Mem    | %CPU    | %Mem    | with IO (MB/s) | w/o IO (MB/s) | hh:mm |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| w3 | fill 225M objs    | -   | -    | 1152 | 17855 | 84  | | | | | --  | -- |             |
| w4 | hybrid noFailure  | 491 | 7493 | 382  | 5829  | 65  | | | | | --  | -- | 2180        |
| w4 | hybrid OSDnode1   | 234 | 3579 | 183  | 2785  | 131 | | | | | 120 | -- | PGs Unclean |
| w4 | hybrid OSDnode2   | 241 | 3658 | 188  | 2846  | 126 | | | | | 70  | -- | after       |
| w4 | hybrid OSDrecover | 400 | 6088 | 310  | 4736  | 80  | | | | | 12  | 23 | 24 hours    |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

>>> Timeline for phase 3 from job w4
2024/05/27-07:22:56: Starting warp hybrid phase 3
2024/05/27-07:26:37: Stop all OSDs on f27-h13-000-6048r
2024/05/27-09:36:56: Completed warp hybrid phase 3

OSD perf dump - http://f28-h28-000-r630.rdu2.scalelab.redhat.com/RESULTS/w4/osd-perfdumps/
OSD logs: Not available

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Third run | small sized object | mClock | (RHCS 7.1 - 18.2.1-188.el9cp) |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Job | Workload | Total Read Throughput | Total Write Throughput | Avg Latency | Avg RGW | Avg RGW | Avg OSD | Avg OSD | Avg Recovery | Recovery Time |
| ID  |          | MB/s | Objs/s         | MB/s | Objs/s          | (ms)        | %CPU    | %Mem    | %CPU    | %Mem    | with IO (MB/s) | w/o IO (MB/s) | hh:mm |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| w11 | fill 225M objs    | -   | -    | 1305 | 20224 | 74  | 721 | 0.21 | 91 | 2.3 | --  | -- |             |
| w12 | hybrid noFailure  | 492 | 7493 | 382  | 5829  | 65  | 524 | 0.2  | 69 | 2.8 | --  | -- | 2363        |
| w12 | hybrid OSDnode1   | 228 | 3449 | 178  | 2684  | 137 | 434 | 0.21 | 57 | 2.7 | 147 | -- | PGs Unclean |
| w12 | hybrid OSDnode2   | 331 | 5035 | 258  | 3917  | 93  | 375 | 0.2  | 51 | 2.7 | 52  | -- | after       |
| w12 | hybrid OSDrecover | 414 | 6282 | 320  | 4888  | 79  | 354 | 0.2  | 43 | 2.2 | 6   | 28 | 24 hours    |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

>>> Timeline for phase 3 from job w12
2024/06/10-10:02:42: Starting warp hybrid phase 3
2024/06/10-10:06:23: Stop all OSDs on f27-h13-000-6048r
2024/06/10-12:21:08: Completed warp hybrid phase 3

OSD perf dump - http://f28-h28-000-r630.rdu2.scalelab.redhat.com/RESULTS/w12_240610-044907_runtest-71_runtest_small_repeat2/osd-perfdumps/
OSD logs: http://f28-h28-000-r630.rdu2.scalelab.redhat.com/RESULTS/w12_240610-044907_runtest-71_runtest_small_repeat2/osd-logs/
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Fourth run | small sized object | mClock | (RHCS 7.1 - 18.2.1-188.el9cp) |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Job | Workload | Total Read Throughput | Total Write Throughput | Avg Latency | Avg RGW | Avg RGW | Avg OSD | Avg OSD | Avg Recovery | Recovery Time |
| ID  |          | MB/s | Objs/s         | MB/s | Objs/s          | (ms)        | %CPU    | %Mem    | %CPU    | %Mem    | with IO (MB/s) | w/o IO (MB/s) | hh:mm |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| w13 | fill 225M objs    | -   | -    | 1306 | 20259 | 74  | 537 | 0.2 | 69 | 2.3 | --  | -- |             |
| w14 | hybrid noFailure  | 491 | 7485 | 381  | 5823  | 66  | 291 | 0.2 | 38 | 2.8 | --  | -- | 2470        |
| w14 | hybrid OSDnode1   | 217 | 3309 | 170  | 2575  | 145 | 277 | 0.2 | 36 | 2.7 | 140 | -- | PGs Unclean |
| w14 | hybrid OSDnode2   | 341 | 5193 | 266  | 4040  | 91  | 260 | 0.2 | 35 | 2.7 | 59  | -- | after       |
| w14 | hybrid OSDrecover | 411 | 6237 | 318  | 4852  | 79  | 260 | 0.2 | 33 | 2.2 | 7   | 25 | 24 hours    |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

>>> Timeline for phase 3 from job w14
2024/06/12-04:17:03: Starting warp hybrid phase 3
2024/06/12-04:20:44: Stop all OSDs on f27-h13-000-6048r
2024/06/12-06:36:01: Completed warp hybrid phase 3

OSD perf dump - http://f28-h28-000-r630.rdu2.scalelab.redhat.com/RESULTS/w14_240611-225522_runtest-71_runtest_small_repeat3/osd-perfdumps/
OSD logs: http://f28-h28-000-r630.rdu2.scalelab.redhat.com/RESULTS/w14_240611-225522_runtest-71_runtest_small_repeat3/osd-logs/

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| First run | small sized object | WPQ | (RHCS 7.1 - 18.2.1-188.el9cp) |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Job | Workload | Total Read Throughput | Total Write Throughput | Avg Latency | Avg RGW | Avg RGW | Avg OSD | Avg OSD | Avg Recovery | Recovery Time |
| ID  |          | MB/s | Objs/s         | MB/s | Objs/s          | (ms)        | %CPU    | %Mem    | %CPU    | %Mem    | with IO (MB/s) | w/o IO (MB/s) | hh:mm |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| w9  | fill 225M objs    | -   | -    | 1266 | 19365 | 77  | 482 | 0.2 | 64 | 2.3 | --  | -- |             |
| w10 | hybrid noFailure  | 488 | 7433 | 379  | 5783  | 65  | 206 | 0.2 | 29 | 2.8 | --  | -- | 3835        |
| w10 | hybrid OSDnode1   | 399 | 6075 | 309  | 4726  | 80  | 213 | 0.2 | 29 | 2.7 | 76  | -- | PGs Unclean |
| w10 | hybrid OSDnode2   | 386 | 5860 | 299  | 4559  | 82  | 219 | 0.2 | 30 | 2.7 | 42  | -- | after       |
| w10 | hybrid OSDrecover | 410 | 6233 | 317  | 4849  | 81  | 224 | 0.2 | 29 | 2.2 | 5   | 6  | 24 hours    |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

>>> Timeline for phase 3 from job w10
2024/06/08-15:05:03: Starting warp hybrid phase 3
2024/06/08-15:08:45: Stop all OSDs on f27-h13-000-6048r
2024/06/08-17:26:07: Completed warp hybrid phase 3

OSD perf dump - http://f28-h28-000-r630.rdu2.scalelab.redhat.com/RESULTS/w10_240608-093336_runtest-71_runtest_small_wpq/osd-perfdumps/
OSD logs: http://f28-h28-000-r630.rdu2.scalelab.redhat.com/RESULTS/w10_240608-093336_runtest-71_runtest_small_wpq/osd-logs/
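The WPQ run above uses the wpq osd_op_queue instead of the mClock default. The exact procedure the harness used to switch schedulers is not recorded here, but a minimal sketch for repeating the comparison would be (osd_op_queue is not changeable at runtime, so the OSDs must be restarted for it to take effect):

# Switch the scheduler cluster-wide (default in this release is mclock_scheduler)
ceph config set osd osd_op_queue wpq

# Restart the OSDs for the setting to take effect, e.g. one daemon at a time
ceph orch daemon restart osd.<id>

# Verify the active setting
ceph config get osd osd_op_queue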
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 8.0 security, bug fixes, and enhancement updates), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2025:2457