Bug 1435984

Summary: CEPH RBD mirroring - cinder failover-host replicated volumes in state error
Product: Red Hat OpenStack
Reporter: Tzach Shefi <tshefi>
Component: openstack-cinder
Assignee: Jon Bernard <jobernar>
Status: CLOSED WONTFIX
QA Contact: Tzach Shefi <tshefi>
Severity: medium
Docs Contact:
Priority: medium
Version: 11.0 (Ocata)
CC: eharney, geguileo, jvisser, pgrist, scohen, srevivo, tvignaud
Target Milestone: ---
Keywords: Triaged, ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Last Closed: 2019-07-23 19:15:06 UTC
Type: Bug
Bug Blocks: 1412804    
Attachments:
Cinder logs and config

Description Tzach Shefi 2017-03-26 12:16:25 UTC
Created attachment 1266509 [details]
Cinder logs and config

Description of problem: On an RBD mirroring deployment, after running the cinder failover-host command the replicated volumes remain in status error. A replicated volume's snapshot, however, remains in status available.
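
For context, replication for the RBD backend is driven by a replication_device entry in cinder.conf pointing at the secondary cluster. A minimal sketch of what such a backend section might look like; aside from the backend name and the cephb backend_id (taken from the output below), the pool, user and conf paths are assumptions, and the real settings are in the attached config:

[tripleo_ceph]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = tripleo_ceph
rbd_pool = volumes
rbd_user = openstack
rbd_ceph_conf = /etc/ceph/ceph.conf
# secondary cluster used as failover target; its backend_id shows up as "Active Backend ID" after failover
replication_device = backend_id:cephb,conf:/etc/ceph/cephb.conf,user:openstack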


Version-Release number of selected component (if applicable):
rhel7.3
openstack-cinder-10.0.1-0.20170310192919.b05afc3.el7ost.noarch
python-cinderclient-1.11.0-1.el7ost.noarch
puppet-cinder-10.3.0-1.el7ost.noarch
python-cinder-10.0.1-0.20170310192919.b05afc3.el7ost.noarch

How reproducible:


Steps to Reproduce:
1. Create 3 Cinder volumes, two replicated and one non-replicated (a sketch of the creation commands follows the listing below).

cinder list
+--------------------------------------+-----------+-------------+------+-------------+----------+-------------+
| ID                                   | Status    | Name        | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+-------------+------+-------------+----------+-------------+
| 95bf822c-5390-4618-bd26-494e4767435d | available | NoneReplvol | 1    | -           | false    |             |
| a21307f8-5c58-4ad4-8245-92fbe05d3951 | available | RepVolume2  | 1    | REPL        | false    |             |
| e7447352-06fa-443b-ab18-9be707b90f8b | available | RepVolume1  | 1    | REPL        | false    |             |
+--------------------------------------+-----------+-------------+------+-------------+----------+-------------+
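
A sketch of the commands that would produce the listing above; the REPL type name matches the output, and replication_enabled='<is> True' is the standard extra spec for marking a volume type as replicated:

# volume type that routes volumes to the replicated backend
cinder type-create REPL
cinder type-key REPL set replication_enabled='<is> True'
# two replicated volumes and one non-replicated volume, 1 GB each
cinder create 1 --volume-type REPL --name RepVolume1
cinder create 1 --volume-type REPL --name RepVolume2
cinder create 1 --name NoneReplvol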

2. Create snapshots of all volumes (a sketch of the commands follows the listing below):

cinder snapshot-list
+--------------------------------------+--------------------------------------+-----------+-----------------+------+
| ID                                   | Volume ID                            | Status    | Name            | Size |
+--------------------------------------+--------------------------------------+-----------+-----------------+------+
| 0ae49ade-13ca-4e4c-b894-26650a457352 | e7447352-06fa-443b-ab18-9be707b90f8b | available | SnapRepVolume1  | 1    |
| 5726b798-8ece-4984-bd36-5728bdd6b177 | a21307f8-5c58-4ad4-8245-92fbe05d3951 | available | SnapRepVolume2  | 1    |
| c51fba12-e31e-4b3d-9da1-6f32ffae13bc | 95bf822c-5390-4618-bd26-494e4767435d | available | SnapNoneReplvol | 1    |
+--------------------------------------+--------------------------------------+-----------+-----------------+------+

I've since deleted two of the snapshots from the list above, but that is irrelevant to this bug.
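
The snapshots would have been created along these lines (volume IDs taken from the listing in step 1):

cinder snapshot-create e7447352-06fa-443b-ab18-9be707b90f8b --name SnapRepVolume1
cinder snapshot-create a21307f8-5c58-4ad4-8245-92fbe05d3951 --name SnapRepVolume2
cinder snapshot-create 95bf822c-5390-4618-bd26-494e4767435d --name SnapNoneReplvol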

3. # cinder failover-host   (wait for a few minutes; the full invocation is sketched below)
# cinder service-list --withreplication

cinder service-list --withreplication
+------------------+------------------------+------+----------+-------+----------------------------+--------------------+-------------------+--------+-----------------+
| Binary           | Host                   | Zone | Status   | State | Updated_at                 | Replication Status | Active Backend ID | Frozen | Disabled Reason |
+------------------+------------------------+------+----------+-------+----------------------------+--------------------+-------------------+--------+-----------------+
| cinder-scheduler | hostgroup              | nova | enabled  | up    | 2017-03-26T12:00:34.000000 |                    |                   |        | -               |
| cinder-volume    | hostgroup@tripleo_ceph | nova | disabled | up    | 2017-03-26T12:00:33.000000 | failed-over        | cephb             | False  | failed-over     |
+------------------+------------------------+------+----------+-------+----------------------------+--------------------+-------------------+--------+-----------------+



As can be seen, the system has now failed over to cephb.
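
For completeness, failover-host takes the cinder-volume host and, optionally, the backend to fail over to; in this setup the invocation would presumably be something like:

cinder failover-host hostgroup@tripleo_ceph --backend_id cephb

Failing back later should be the same command with --backend_id default.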

4. Notice that all three volumes are in the error state; this is expected only for the non-replicated volume!

 [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+--------+-------------+------+-------------+----------+-------------+
| ID                                   | Status | Name        | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+--------+-------------+------+-------------+----------+-------------+
| 95bf822c-5390-4618-bd26-494e4767435d | error  | NoneReplvol | 1    | -           | false    |             |
| a21307f8-5c58-4ad4-8245-92fbe05d3951 | error  | RepVolume2  | 1    | REPL        | false    |             |
| e7447352-06fa-443b-ab18-9be707b90f8b | error  | RepVolume1  | 1    | REPL        | false    |             |
+--------------------------------------+--------+-------------+------+-------------+----------+-------------+

5. Even more odd, the snapshot of a replicated volume (the only snapshot left before I failed over) is still available.

 [stack@undercloud-0 ~]$ cinder snapshot-list
+--------------------------------------+--------------------------------------+-----------+----------------+------+
| ID                                   | Volume ID                            | Status    | Name           | Size |
+--------------------------------------+--------------------------------------+-----------+----------------+------+
| 5726b798-8ece-4984-bd36-5728bdd6b177 | a21307f8-5c58-4ad4-8245-92fbe05d3951 | available | SnapRepVolume2 | 1    |
+--------------------------------------+--------------------------------------+-----------+----------------+------+


Actual results:

All volumes are in the error state; this is expected only for the non-replicated volume.

It's odd that a replicated volume's snapshot is available, yet its base volume isn't.
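
One way to cross-check from the Ceph side would be to query the mirroring state of the replicated volume's image on the secondary cluster; the pool name, the cephb cluster name, and the standard volume-<id> image naming are assumptions here:

rbd --cluster cephb mirror image status volumes/volume-e7447352-06fa-443b-ab18-9be707b90f8b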

Expected results:

Non-replicated volumes should indeed be in the error state, but replicated volumes should be available.

Additional info:

Comment 1 Paul Grist 2017-04-06 03:09:20 UTC
Targeting the bugs found during replication testing to OSP12.

Comment 6 John Visser 2019-07-23 19:15:06 UTC
Will be resolved when replication is correctly deployed (OSP-17) and TripleO can deploy a replicated volume.