1640262 – RBD mirroring - mirror status of some images remains outdated

Summary: RBD mirroring - mirror status of some images remains outdated

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Ceph Storage
Classification:	Red Hat Storage
Component:	RBD-Mirror
Sub Component:
Version:	3.2
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	medium
Target Milestone:	rc
Target Release:	3.2
Assignee:	Jason Dillaman
QA Contact:	Vasishta
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-10-17 17:21 UTC by Vasishta
Modified:	2019-01-03 19:02 UTC
CC List:	7 users
Fixed In Version:	RHEL: ceph-12.2.8-27.el7cp Ubuntu: ceph_12.2.8-25redhat1
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-01-03 19:02:09 UTC
Embargoed:
Dependent Products:

Links
System	ID	Priority	Status	Summary	Last Updated
Ceph Project Bug Tracker	36500	None	None	None	2018-10-17 20:50:26 UTC
Ceph Project Bug Tracker	36659	None	None	None	2018-10-31 18:31:13 UTC
Github	ceph ceph pull 24320	None	closed	luminous: rbd: [rbd-mirror] failed assertion when updating mirror status	2021-02-10 12:17:03 UTC
Github	ceph ceph pull 24646	None	closed	rbd-mirror: always attempt to restart canceled status update task	2021-02-10 12:17:03 UTC
Red Hat Product Errata	RHBA-2019:0020	None	None	None	2019-01-03 19:02:17 UTC

Description Vasishta 2018-10-17 17:21:57 UTC

Description of problem:
The mirror status of some RBD images remains outdated for an indefinite amount of time, which causes confusion about the state of data sync.

Version-Release number of selected component (if applicable):
rbd-mirror-12.2.8-16.el7cp.x86_64

How reproducible:
Seen on 2-3 of every 10 images created.

Steps to Reproduce:
1. Configure two clusters and set up rbd-mirroring between them (we had established multi-secondary, one-way mirroring).
2. Create some images and wait for a while.
3. Check the mirror image status and compare 'last_update' with the current system time.

Actual results:
The mirror image status of some images remains outdated for an indefinite time.
(Compare the status of xyz9 and xyz10 below: xyz9's status has not been updated for a long time.)

xyz9:
  global_id:   aa8c6814-c4ba-4ad6-a770-abca58aefbcf
  state:       up+replaying
  description: replaying, master_position=[object_number=3, tag_tid=1, entry_tid=7], mirror_position=[], entries_behind_master=7
  last_update: 2018-10-17 15:48:09
xyz10:
  global_id:   618ef890-c376-4822-a688-50d11347989f
  state:       up+replaying
  description: replaying, master_position=[object_number=1, tag_tid=2, entry_tid=1], mirror_position=[object_number=1, tag_tid=2, entry_tid=1], entries_behind_master=0
  last_update: 2018-10-17 17:13:40
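
The comparison in step 3 can be automated. The following is a minimal sketch, not part of rbd-mirror or the rbd CLI: it assumes the plain-text layout shown above, assumes 'last_update' is printed in the same clock and time zone as the reference time, and uses a hypothetical helper name (stale_images) and an arbitrary 30-minute threshold. The reference time in the example is the bug's report time, standing in for "now".

```python
from datetime import datetime, timedelta

# Status text copied from the report above (descriptions shortened).
SAMPLE = """\
xyz9:
  global_id:   aa8c6814-c4ba-4ad6-a770-abca58aefbcf
  state:       up+replaying
  description: replaying, entries_behind_master=7
  last_update: 2018-10-17 15:48:09
xyz10:
  global_id:   618ef890-c376-4822-a688-50d11347989f
  state:       up+replaying
  description: replaying, entries_behind_master=0
  last_update: 2018-10-17 17:13:40
"""


def stale_images(status_text, now, max_age=timedelta(minutes=30)):
    """Return {image_name: age} for images whose last_update is older than max_age.

    Hypothetical helper: the 30-minute default is an arbitrary choice,
    not a value taken from Ceph.
    """
    stale = {}
    image = None
    for line in status_text.splitlines():
        if not line.strip():
            continue
        if not line.startswith(" ") and line.rstrip().endswith(":"):
            # An unindented "name:" line starts a new image block.
            image = line.strip()[:-1]
        elif image is not None and line.strip().startswith("last_update:"):
            stamp = line.split(":", 1)[1].strip()
            updated = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
            age = now - updated
            if age > max_age:
                stale[image] = age
    return stale


if __name__ == "__main__":
    # Assumed reference time: when the bug was reported.
    reference = datetime(2018, 10, 17, 17, 21, 57)
    for name, age in stale_images(SAMPLE, reference).items():
        print(f"{name}: last updated {age} ago")
```

With that reference time, only xyz9 exceeds the threshold; xyz10 was updated a few minutes earlier.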


Expected results:
The mirror status of every image should be kept up to date.

Comment 8 Vasishta 2018-10-26 06:12:09 UTC

Hi Jason,

It was observed that the mirror status of an image that was forcefully promoted (while working on a failover scenario) is not updated for an indefinite time. Is this expected?

--------------------------------
$ sudo rbd mirror image promote data/big --force
Image promoted to primary

$ date "+%T"
06:11:37

$  sudo rbd mirror image status data/big
big:
  global_id:   9bfdf3e1-29d0-4c48-9bd1-e0ed3848d20f
  state:       up+replaying
  description: replaying, master_position=[object_number=257, tag_tid=4, entry_tid=131609], mirror_position=[object_number=257, tag_tid=4, entry_tid=131609], entries_behind_master=0
  last_update: 2018-10-26 05:47:10

$ sudo rbd info data/big
rbd image 'big':
	size 10GiB in 2560 objects
	order 22 (4MiB objects)
	block_name_prefix: rbd_data.10806b8b4567
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff, journaling
	flags: 
	create_timestamp: Thu Oct 25 09:04:32 2018
	journal: 10806b8b4567
	mirroring state: enabled
	mirroring global id: 9bfdf3e1-29d0-4c48-9bd1-e0ed3848d20f
	mirroring primary: true

--------------------------------
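
For reference, the gap between the `date` output and 'last_update' in the transcript above can be worked out as follows. This is a small sketch with a hypothetical helper name (status_age_seconds); since `date "+%T"` prints only the time of day, it assumes both timestamps fall on the same day and use the same clock.

```python
from datetime import datetime


def status_age_seconds(last_update, wall_clock_time):
    """Seconds between last_update and an HH:MM:SS wall-clock reading.

    Hypothetical helper: assumes both readings are from the same day.
    """
    updated = datetime.strptime(last_update, "%Y-%m-%d %H:%M:%S")
    clock = datetime.strptime(wall_clock_time, "%H:%M:%S").time()
    # Borrow the calendar date from last_update (same-day assumption).
    now = datetime.combine(updated.date(), clock)
    return int((now - updated).total_seconds())


age = status_age_seconds("2018-10-26 05:47:10", "06:11:37")
print(f"status is {age // 60} min {age % 60} s old")
```

Under that assumption the status is a little over 24 minutes old.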

Is this expected ?

Regards,
Vasishta Shatsry
QE, Ceph

Comment 9 Jason Dillaman 2018-10-26 12:42:28 UTC

@Vasishta: do you have any additional steps to repeat this behavior? Was this cluster the primary cluster or one of the secondary clusters?

Comment 10 Vasishta 2018-10-26 15:44:22 UTC

Jason, 

I just put down primary and one of the secondaries, and promoted the image from the one remaining secondary.

Comment 11 Jason Dillaman 2018-10-26 17:39:55 UTC

@Vasishta: "put down primary" as in shut down the cluster? That would result in the rbd-mirror daemon in-flight ops to that cluster getting hung, so at least that would explain why it wasn't getting updated.

Comment 12 Vasishta 2018-10-27 03:33:57 UTC

(In reply to Jason Dillaman from comment #11)
> @Vasishta: "put down primary" as in shut down the cluster? 

Yes, I had shut down the cluster.


> That would result in the rbd-mirror daemon in-flight ops to that cluster getting hung, so at
> least that would explain why it wasn't getting updated.

Oh okay, thanks for the clarification.

Comment 18 errata-xmlrpc 2019-01-03 19:02:09 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0020
