Bug 2045094

Summary: [rbd-mirror] : mirror image status - snapshot - each element needs to be consistent with each other - bytes_per_second and replay_state
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Vasishta <vashastr>
Component: RBD-Mirror Assignee: Ilya Dryomov <idryomov>
Status: NEW --- QA Contact:
Severity: low Docs Contact:
Priority: unspecified    
Version: 5.0 CC: ceph-eng-bugs, idryomov, jdurgin, vereddy
Target Milestone: ---   
Target Release: 8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug

Description Vasishta 2022-01-25 15:26:54 UTC
Description of problem:
In the mirror image status output, the individual elements need to be consistent with each other.
Example: bytes_per_second and replay_state

As I understand it, bytes_per_second is the rate at which the sync is progressing, and replay_state is the status of the sync.
replay_state should be idle only when the sync is done, i.e. when bytes_per_second has reached zero.

Version-Release number of selected component (if applicable):
16.2.0-146.el8cp

How reproducible:
Also tried on 4.3; it was reproducible there as well.

Steps to Reproduce:
1. Configure snapshot-based mirroring
2. Add some data to an image and take a mirror snapshot
3. Observe the mirror image status at the secondary site

Actual results:
$ sudo rbd mirror image status two_way_rep_snapshot_image/image01
image01:
  global_id:   40118ceb-d5a1-4079-bfeb-89375e23e681
  state:       up+replaying
  description: replaying, {"bytes_per_second":244032232.73,"bytes_per_snapshot":2684354560.0,"local_snapshot_timestamp":1643123739,"remote_snapshot_timestamp":1643123739,"replay_state":"idle"}

Expected results:
bytes_per_second should be zero by the time replay_state is reported as idle
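
The expectation above can be sketched as a small check against the description string shown under Actual results. This is only an illustration: the helper names parse_description and is_inconsistent are hypothetical and not part of rbd or any Ceph tooling, and the check assumes the description always has the form "<state>, <JSON object>" as in the output above.

```python
import json


def parse_description(description):
    """Split '<state>, {json}' into the state word and the metrics dict.

    Assumes the layout seen in the status output above; a description
    without a JSON payload yields an empty metrics dict.
    """
    state, _, payload = description.partition(",")
    metrics = json.loads(payload) if payload.strip() else {}
    return state.strip(), metrics


def is_inconsistent(metrics):
    """True when replay_state is idle but a non-zero rate is still reported."""
    return (
        metrics.get("replay_state") == "idle"
        and metrics.get("bytes_per_second", 0) > 0
    )


# Description string copied from the status output in this report.
description = (
    'replaying, {"bytes_per_second":244032232.73,'
    '"bytes_per_snapshot":2684354560.0,'
    '"local_snapshot_timestamp":1643123739,'
    '"remote_snapshot_timestamp":1643123739,'
    '"replay_state":"idle"}'
)

state, metrics = parse_description(description)
print(state, is_inconsistent(metrics))
```

With the values from this report the check flags the status as inconsistent, since replay_state is idle while bytes_per_second is still non-zero.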

Additional info:
Observed this for images of various sizes from 1G to 50G, for both snapshot sync and complete resync

Comment 1 Sunny Kumar 2022-05-19 12:55:27 UTC
Hi Vasishta,

You might observe some delay because the stats are only updated after the image is re-scanned, which is needed to calculate the average number of bytes copied per snapshot.

Hope this helps.
Thanks,
Sunny

Comment 2 Vasishta 2022-05-19 13:04:32 UTC
Hi Sunny,

Thanks for the insights on the slight delay in the update of bytes_per_second.
Though I agree that the delay is very small, I think it would be nice to keep related stats coordinated with each other, if bytes_per_second is a metric that denotes the data being copied from the peer cluster.

Regards,
Vasishta