Bug 1435984

Summary: CEPH RBD mirroring - cinder failover-host replicated volumes in state error
Product: Red Hat OpenStack
Reporter: Tzach Shefi <tshefi>
Component: openstack-cinder
Assignee: Jon Bernard <jobernar>
Status: CLOSED WONTFIX
QA Contact: Tzach Shefi <tshefi>
Severity: medium
Docs Contact:
Priority: medium
Version: 11.0 (Ocata)
CC: eharney, geguileo, jvisser, pgrist, scohen, srevivo, tvignaud
Target Milestone: ---
Keywords: Triaged, ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Last Closed: 2019-07-23 19:15:06 UTC
Type: Bug
Bug Blocks: 1412804    
Attachments:
Cinder logs and config

Description Tzach Shefi 2017-03-26 12:16:25 UTC
Created attachment 1266509 [details]
Cinder logs and config

Description of problem: On an RBD mirroring deployment, after running the cinder failover-host command the replicated volumes remain in status error. A replicated volume's snapshot, however, remains in status available.
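
For context, replication for the RBD backend is driven by a replication_device entry in cinder.conf pointing at the secondary cluster. A minimal sketch of what such a backend section might look like; aside from the backend name and the cephb backend_id (taken from the output below), the pool, user and conf paths are assumptions, and the real settings are in the attached config:

[tripleo_ceph]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = tripleo_ceph
rbd_pool = volumes
rbd_user = openstack
rbd_ceph_conf = /etc/ceph/ceph.conf
# secondary cluster used as failover target; its backend_id shows up as "Active Backend ID" after failover
replication_device = backend_id:cephb,conf:/etc/ceph/cephb.conf,user:openstack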


Version-Release number of selected component (if applicable):
rhel7.3
openstack-cinder-10.0.1-0.20170310192919.b05afc3.el7ost.noarch
python-cinderclient-1.11.0-1.el7ost.noarch
puppet-cinder-10.3.0-1.el7ost.noarch
python-cinder-10.0.1-0.20170310192919.b05afc3.el7ost.noarch

How reproducible:


Steps to Reproduce:
1. Create 3 Cinder volumes, two replicated and one non-replicated (a sketch of the creation commands follows the listing below).

cinder list
+--------------------------------------+-----------+-------------+------+-------------+----------+-------------+
| ID                                   | Status    | Name        | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+-------------+------+-------------+----------+-------------+
| 95bf822c-5390-4618-bd26-494e4767435d | available | NoneReplvol | 1    | -           | false    |             |
| a21307f8-5c58-4ad4-8245-92fbe05d3951 | available | RepVolume2  | 1    | REPL        | false    |             |
| e7447352-06fa-443b-ab18-9be707b90f8b | available | RepVolume1  | 1    | REPL        | false    |             |
+--------------------------------------+-----------+-------------+------+-------------+----------+-------------+
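
A sketch of the commands that would produce the listing above; the REPL type name matches the output, and replication_enabled='<is> True' is the standard extra spec for marking a volume type as replicated:

# volume type that routes volumes to the replicated backend
cinder type-create REPL
cinder type-key REPL set replication_enabled='<is> True'
# two replicated volumes and one non-replicated volume, 1 GB each
cinder create 1 --volume-type REPL --name RepVolume1
cinder create 1 --volume-type REPL --name RepVolume2
cinder create 1 --name NoneReplvol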

2. Create snapshots of all volumes (a sketch of the commands follows the listing below):

cinder snapshot-list
+--------------------------------------+--------------------------------------+-----------+-----------------+------+
| ID                                   | Volume ID                            | Status    | Name            | Size |
+--------------------------------------+--------------------------------------+-----------+-----------------+------+
| 0ae49ade-13ca-4e4c-b894-26650a457352 | e7447352-06fa-443b-ab18-9be707b90f8b | available | SnapRepVolume1  | 1    |
| 5726b798-8ece-4984-bd36-5728bdd6b177 | a21307f8-5c58-4ad4-8245-92fbe05d3951 | available | SnapRepVolume2  | 1    |
| c51fba12-e31e-4b3d-9da1-6f32ffae13bc | 95bf822c-5390-4618-bd26-494e4767435d | available | SnapNoneReplvol | 1    |
+--------------------------------------+--------------------------------------+-----------+-----------------+------+

I've since deleted two of the snapshots from the list above, but that is irrelevant to this bug.
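
The snapshots would have been created along these lines (volume IDs taken from the listing in step 1):

cinder snapshot-create e7447352-06fa-443b-ab18-9be707b90f8b --name SnapRepVolume1
cinder snapshot-create a21307f8-5c58-4ad4-8245-92fbe05d3951 --name SnapRepVolume2
cinder snapshot-create 95bf822c-5390-4618-bd26-494e4767435d --name SnapNoneReplvol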

3. # cinder failover-host   (wait for a few minutes; the full invocation is sketched below)
# cinder service-list --withreplication

cinder service-list --withreplication
+------------------+------------------------+------+----------+-------+----------------------------+--------------------+-------------------+--------+-----------------+
| Binary           | Host                   | Zone | Status   | State | Updated_at                 | Replication Status | Active Backend ID | Frozen | Disabled Reason |
+------------------+------------------------+------+----------+-------+----------------------------+--------------------+-------------------+--------+-----------------+
| cinder-scheduler | hostgroup              | nova | enabled  | up    | 2017-03-26T12:00:34.000000 |                    |                   |        | -               |
| cinder-volume    | hostgroup@tripleo_ceph | nova | disabled | up    | 2017-03-26T12:00:33.000000 | failed-over        | cephb             | False  | failed-over     |
+------------------+------------------------+------+----------+-------+----------------------------+--------------------+-------------------+--------+-----------------+



As can be seen, the system has now failed over to cephb.
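
For completeness, failover-host takes the cinder-volume host and, optionally, the backend to fail over to; in this setup the invocation would presumably be something like:

cinder failover-host hostgroup@tripleo_ceph --backend_id cephb

Failing back later should be the same command with --backend_id default.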

4. Notice that all three volumes are in the error state; this is expected only for the non-replicated volume!

 [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+--------+-------------+------+-------------+----------+-------------+
| ID                                   | Status | Name        | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+--------+-------------+------+-------------+----------+-------------+
| 95bf822c-5390-4618-bd26-494e4767435d | error  | NoneReplvol | 1    | -           | false    |             |
| a21307f8-5c58-4ad4-8245-92fbe05d3951 | error  | RepVolume2  | 1    | REPL        | false    |             |
| e7447352-06fa-443b-ab18-9be707b90f8b | error  | RepVolume1  | 1    | REPL        | false    |             |
+--------------------------------------+--------+-------------+------+-------------+----------+-------------+

5. Even more odd, the snapshot of a replicated volume (the only snapshot left before I failed over) is still available.

 [stack@undercloud-0 ~]$ cinder snapshot-list
+--------------------------------------+--------------------------------------+-----------+----------------+------+
| ID                                   | Volume ID                            | Status    | Name           | Size |
+--------------------------------------+--------------------------------------+-----------+----------------+------+
| 5726b798-8ece-4984-bd36-5728bdd6b177 | a21307f8-5c58-4ad4-8245-92fbe05d3951 | available | SnapRepVolume2 | 1    |
+--------------------------------------+--------------------------------------+-----------+----------------+------+


Actual results:

All volumes are in the error state; this is expected only for the non-replicated volume.

It's odd that a replicated volume's snapshot is available, yet its base volume isn't.
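
One way to cross-check from the Ceph side would be to query the mirroring state of the replicated volume's image on the secondary cluster; the pool name, the cephb cluster name, and the standard volume-<id> image naming are assumptions here:

rbd --cluster cephb mirror image status volumes/volume-e7447352-06fa-443b-ab18-9be707b90f8b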

Expected results:

Non-replicated volumes should indeed be in the error state, but replicated volumes should be available.

Additional info:

Comment 1 Paul Grist 2017-04-06 03:09:20 UTC
Targeting the bugs found during replication testing to OSP12.

Comment 6 John Visser 2019-07-23 19:15:06 UTC
Will be resolved when replication is correctly deployed (OSP-17) and TripleO can deploy a replicated volume.