.Unable to write data on a promoted image after a non-orderly shutdown
In an RBD mirroring configuration, after a non-orderly shutdown of the local cluster, images are demoted to non-primary on the local cluster and promoted to primary on the remote cluster. If the `rbd-mirror` daemon is not restarted on the remote cluster afterwards, it is not possible to write data to the promoted image, because the `rbd-mirror` daemon still considers the demoted image on the local cluster to be the primary one. To avoid this issue, restart the `rbd-mirror` daemon to regain read/write access to the promoted image.
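The workaround described above can be sketched as follows. The pool name `data`, image name `vol1`, and the `admin` daemon instance in the systemd unit name are placeholders; adjust them to your deployment.

```shell
# On the remote (promoted) cluster:

# Force-promote the image to primary. --force is needed after a
# non-orderly shutdown, since a clean demotion never happened on
# the old primary.
rbd mirror image promote data/vol1 --force

# Restart the rbd-mirror daemon so it re-reads the mirroring state
# and stops treating the demoted image on the old primary as
# authoritative.
systemctl restart ceph-rbd-mirror@admin.service

# Confirm the image now reports itself as primary.
rbd info data/vol1
```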
Note: this issue impacts the integration of the Cinder RBD replication driver. A forced promotion results in a read-only image for OpenStack instances attempting to access the volume.
Upstream, master branch PR: https://github.com/ceph/ceph/pull/11090
Added a dependency on the two upstream issues that this one requires for a clean cherry-pick.
Verified with 10.2.3-8.el7cp.x86_64: able to write to the promoted image, hence moving to verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.
Able to reproduce the issue when the remote is unreachable and the admin performs a failover after a non-orderly shutdown, hence reopening this bug.
Steps to Reproduce:
1. Have 3 clusters: site A is primary, and sites B and C are secondary.
(Site B has a bidirectional relationship with A, while C has a one-directional relationship.)
2. Enable pool-level or image-level mirroring for a few images.
3. Create images and let them sync to the secondary sites.
4. Perform a non-orderly shutdown of the master and fail over to site B.
Result: images on site B are read-only even after successful promotion of the images.
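The steps above can be sketched with the `rbd` CLI; the pool and image names, the `--cluster` aliases, and the image size are placeholders, and pool-level mirroring is assumed (image-level works similarly via `rbd mirror image enable`).

```shell
# Hypothetical names; adjust to your environment.
POOL=mirrorpool
IMAGE=test1

# Step 2: enable pool-level mirroring on site A.
rbd --cluster siteA mirror pool enable $POOL pool

# Step 3: create an image with the journaling feature so it is
# mirrored, then check that it reaches the secondary.
rbd --cluster siteA create $POOL/$IMAGE --size 1G \
    --image-feature exclusive-lock,journaling
rbd --cluster siteB mirror image status $POOL/$IMAGE

# Step 4: after a non-orderly shutdown of site A (e.g. powering
# off its nodes), force-promote on site B.
rbd --cluster siteB mirror image promote $POOL/$IMAGE --force

# Observed bug: writes to the promoted image fail until the
# rbd-mirror daemon on site B is restarted.
rbd --cluster siteB bench-write $POOL/$IMAGE --io-total 4M
```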
This worked in 2.1 (see comment 15) and is failing in 2.2 now. It looks like a regression, so setting the target release to 2.2.
@Harish: your assumption is not correct -- when this was validated for 2.1, the remote cluster was never shut down. This has always been an issue and has never been fixed (thus it has never regressed). Moving back to 2.3 since this will not be fixed in time for 2.2.
@rachan, @harish, let's rewrite the test case. Shutdown should not be "orderly" for this test.
If you want to be gentler on the primary cluster than pulling plugs would be, I think pulling the network link to the secondary is just fine. Jason, do you agree?
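One way to "pull the network link" without touching hardware is to drop traffic between the clusters with iptables. The monitor address 192.0.2.10 is a placeholder, and 6789 is the default Ceph monitor port; this only simulates the partition, not a power loss.

```shell
# On the primary cluster's nodes, block traffic to the secondary
# cluster's monitor (placeholder address).
iptables -A OUTPUT -d 192.0.2.10 -p tcp --dport 6789 -j DROP

# ... run the failover test while the link is down ...

# Restore connectivity afterwards by deleting the same rule.
iptables -D OUTPUT -d 192.0.2.10 -p tcp --dport 6789 -j DROP
```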
Since this BZ was attached to a shipped errata, but the issue is unfixed, I recommend we open another BZ to track this, because we cannot attach it to any advisory now.
Jason and Harish, are you ok with this plan?