1435984 – CEPH RBD mirroring - cinder failover-host replicated volumes in state error


Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-cinder
Sub Component:
Version:	11.0 (Ocata)
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Jon Bernard
QA Contact:	Tzach Shefi
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1412804

Reported:	2017-03-26 12:16 UTC by Tzach Shefi
Modified:	2019-07-23 19:15 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-07-23 19:15:06 UTC
Target Upstream Version:
Embargoed:

Attachments
Cinder logs and config (75.73 KB, application/x-gzip) 2017-03-26 12:16 UTC, Tzach Shefi

Description Tzach Shefi 2017-03-26 12:16:25 UTC

Created attachment 1266509
Cinder logs and config

Description of problem: On an RBD mirroring deployment, after running the cinder failover-host command, replicated volumes end up in error status. A replicated volume's snapshot, however, remains in available status.


Version-Release number of selected component (if applicable):
rhel7.3
openstack-cinder-10.0.1-0.20170310192919.b05afc3.el7ost.noarch
python-cinderclient-1.11.0-1.el7ost.noarch
puppet-cinder-10.3.0-1.el7ost.noarch
python-cinder-10.0.1-0.20170310192919.b05afc3.el7ost.noarch

How reproducible:


Steps to Reproduce:
1. Create three Cinder volumes: two replicated and one non-replicated.

cinder list
+--------------------------------------+-----------+-------------+------+-------------+----------+-------------+
| ID                                   | Status    | Name        | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+-------------+------+-------------+----------+-------------+
| 95bf822c-5390-4618-bd26-494e4767435d | available | NoneReplvol | 1    | -           | false    |             |
| a21307f8-5c58-4ad4-8245-92fbe05d3951 | available | RepVolume2  | 1    | REPL        | false    |             |
| e7447352-06fa-443b-ab18-9be707b90f8b | available | RepVolume1  | 1    | REPL        | false    |             |
+--------------------------------------+-----------+-------------+------+-------------+----------+-------------+

2. Create snapshots of all volumes:

cinder snapshot-list
+--------------------------------------+--------------------------------------+-----------+-----------------+------+
| ID                                   | Volume ID                            | Status    | Name            | Size |
+--------------------------------------+--------------------------------------+-----------+-----------------+------+
| 0ae49ade-13ca-4e4c-b894-26650a457352 | e7447352-06fa-443b-ab18-9be707b90f8b | available | SnapRepVolume1  | 1    |
| 5726b798-8ece-4984-bd36-5728bdd6b177 | a21307f8-5c58-4ad4-8245-92fbe05d3951 | available | SnapRepVolume2  | 1    |
| c51fba12-e31e-4b3d-9da1-6f32ffae13bc | 95bf822c-5390-4618-bd26-494e4767435d | available | SnapNoneReplvol | 1    |
+--------------------------------------+--------------------------------------+-----------+-----------------+------+

I later deleted two snapshots from the list above, but that is irrelevant to this bug.

3. Fail over the host (wait a few minutes), then check the service list:

# cinder failover-host
# cinder service-list --withreplication
+------------------+------------------------+------+----------+-------+----------------------------+--------------------+-------------------+--------+-----------------+
| Binary           | Host                   | Zone | Status   | State | Updated_at                 | Replication Status | Active Backend ID | Frozen | Disabled Reason |
+------------------+------------------------+------+----------+-------+----------------------------+--------------------+-------------------+--------+-----------------+
| cinder-scheduler | hostgroup              | nova | enabled  | up    | 2017-03-26T12:00:34.000000 |                    |                   |        | -               |
| cinder-volume    | hostgroup@tripleo_ceph | nova | disabled | up    | 2017-03-26T12:00:33.000000 | failed-over        | cephb             | False  | failed-over     |
+------------------+------------------------+------+----------+-------+----------------------------+--------------------+-------------------+--------+-----------------+



As shown above, the backend has now failed over to cephb.
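The failover check above can also be scripted. The following is only a sketch: the helper name failover_state is hypothetical (not part of the cinder client), and it assumes the table layout printed by cinder service-list --withreplication matches the output shown in this report.

```shell
# Hypothetical helper: reads `cinder service-list --withreplication` table
# output on stdin and prints, for each cinder-volume row:
#   <host> <replication status> <active backend id>
# Assumes the column order shown in this report (Binary, Host, ...,
# Replication Status, Active Backend ID, ...).
failover_state() {
  awk -F'|' '
    # Strip leading and trailing blanks from a table cell.
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    trim($2) == "cinder-volume" { print trim($3), trim($8), trim($9) }
  '
}
```

Usage: cinder service-list --withreplication | failover_state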

4. Notice that all three volumes are in error state. This is expected, but only for the non-replicated volume!

 [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+--------+-------------+------+-------------+----------+-------------+
| ID                                   | Status | Name        | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+--------+-------------+------+-------------+----------+-------------+
| 95bf822c-5390-4618-bd26-494e4767435d | error  | NoneReplvol | 1    | -           | false    |             |
| a21307f8-5c58-4ad4-8245-92fbe05d3951 | error  | RepVolume2  | 1    | REPL        | false    |             |
| e7447352-06fa-443b-ab18-9be707b90f8b | error  | RepVolume1  | 1    | REPL        | false    |             |
+--------------------------------------+--------+-------------+------+-------------+----------+-------------+
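To pick out the volumes that contradict the expected behavior, the cinder list output can be filtered. This is a minimal sketch: replicated_in_error is a hypothetical helper name, and it assumes replicated volumes are identified by their volume type (REPL in this report, overridable via the first argument).

```shell
# Hypothetical helper: reads `cinder list` table output on stdin and prints
# "<id> <name>" for every volume of the given type that is in error status.
# The volume type defaults to REPL, the replicated type used in this report.
replicated_in_error() {
  awk -F'|' -v rtype="${1:-REPL}" '
    # Strip leading and trailing blanks from a table cell.
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    trim($3) == "error" && trim($6) == rtype { print trim($2), trim($4) }
  '
}
```

Usage: cinder list | replicated_in_error REPL
Any line printed after a failover is a replicated volume that should have been available.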

5. Even odder: the snapshot of a replicated volume, the only snapshot left before I failed over, is still available.

 [stack@undercloud-0 ~]$ cinder snapshot-list
+--------------------------------------+--------------------------------------+-----------+----------------+------+
| ID                                   | Volume ID                            | Status    | Name           | Size |
+--------------------------------------+--------------------------------------+-----------+----------------+------+
| 5726b798-8ece-4984-bd36-5728bdd6b177 | a21307f8-5c58-4ad4-8245-92fbe05d3951 | available | SnapRepVolume2 | 1    |
+--------------------------------------+--------------------------------------+-----------+----------------+------+


Actual results:

All volumes are in error state; this is expected only for the non-replicated volume.

It's odd that a replicated volume's snapshot is available, yet its base volume isn't.

Expected results:

Non-replicated volumes should indeed be in error state, but replicated volumes should be available.

Additional info:

Comment 1 Paul Grist 2017-04-06 03:09:20 UTC

Targeting the replication bugs found during testing to OSP12.

Comment 6 John Visser 2019-07-23 19:15:06 UTC

Will be resolved when replication is correctly deployed (OSP-17) and TripleO can deploy a replicated volume.
