From Paul Cuzner's testing of rbd-mirror: introducing a network delay between clusters for a small workload (100 images, 2,500 IOPS) showed the following:

o Measuring the effect of 20ms latency applied before the rbd-mirror relationships were created
  ▪ After several hours, tests with as few as 50 images (250GB of data) were not able to achieve synchronization
o Measuring the effect of various network latencies after the initial synchronization was complete
  ▪ At 10ms latency, the sync time is extended by at least 30%, but replication success remains consistent.
  ▪ Based on cloudping data, there are NO compatible AWS regions that exhibit this latency.
  ▪ At 20ms latency, network bandwidth and CPU load imply replication is not happening, but snapshot timestamps are changing: replication is still occurring, just very slowly.
  ▪ Changes to concurrent_image_syncs are needed to force rbd-mirror to run more concurrent sync sessions to accommodate the 20ms network delay. The downside of this strategy is increased CPU load, as more sync tasks are handled concurrently.
  ▪ Using the cloudping data with a 20ms ceiling, there are 10 regions that have the potential to support snapshot-based rbd-mirror across 14 region-to-region combinations (code and output; see the sketch after this list).
  ▪ At 50ms latency, with 50 concurrent image syncs, the images do not replicate within the replication interval. Snapshots are taken at the primary cluster, but after 2 hours the secondary site has not been able to reach a consistent state with the primary cluster.
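The "code and output" referenced above are not reproduced in this comment. The following is a minimal Python sketch of the same kind of region-pair filtering, assuming a cloudping-style latency matrix has already been exported to a local JSON file; the file name, data structure, and units are assumptions for illustration, not part of the original test.

```python
#!/usr/bin/env python3
"""Filter AWS region pairs by inter-region latency (illustrative sketch).

Assumes a cloudping-style latency matrix exported locally as JSON of the
form {"us-east-1": {"us-east-2": 11.2, ...}, ...} with values in
milliseconds. File name and structure are assumptions, not the format
served by cloudping.co.
"""

import itertools
import json

LATENCY_CEILING_MS = 20.0  # threshold below which snapshot rbd-mirror kept up in testing


def viable_pairs(matrix: dict[str, dict[str, float]], ceiling: float):
    """Yield (region_a, region_b, latency_ms) for pairs at or under the ceiling."""
    for a, b in itertools.combinations(sorted(matrix), 2):
        # Use the worse of the two directions, since mirroring traffic
        # may flow either way over the life of a mirrored pool.
        samples = [matrix.get(a, {}).get(b), matrix.get(b, {}).get(a)]
        samples = [s for s in samples if s is not None]
        if not samples:
            continue
        worst = max(samples)
        if worst <= ceiling:
            yield a, b, worst


def main() -> None:
    with open("cloudping_latency_ms.json") as f:  # assumed local export
        matrix = json.load(f)

    pairs = list(viable_pairs(matrix, LATENCY_CEILING_MS))
    regions = sorted({r for a, b, _ in pairs for r in (a, b)})

    print(f"{len(regions)} regions across {len(pairs)} region-to-region combinations")
    for a, b, ms in sorted(pairs, key=lambda p: p[2]):
        print(f"  {a:<15} <-> {b:<15} {ms:5.1f} ms")


if __name__ == "__main__":
    main()
```

Taking the worse of the two measured directions is a deliberately conservative choice; a per-direction analysis would list a few additional one-way combinations.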
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage 6.1 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:3623