Bug 1435984
| Summary: | CEPH RBD mirroring - cinder failover-host replicated volumes in state error | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Tzach Shefi <tshefi> |
| Component: | openstack-cinder | Assignee: | Jon Bernard <jobernar> |
| Status: | CLOSED WONTFIX | QA Contact: | Tzach Shefi <tshefi> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 11.0 (Ocata) | CC: | eharney, geguileo, jvisser, pgrist, scohen, srevivo, tvignaud |
| Target Milestone: | --- | Keywords: | Triaged, ZStream |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-07-23 19:15:06 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1412804 | | |
| Attachments: | | | |
Targeting the replication testing bugs that were found to OSP12. This will be resolved when replication is correctly deployed (OSP-17) and TripleO can deploy a replicated volume.
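Not part of the original report, but for context on what "correctly deployed" replication means for this driver: the RBD backend needs a replication_device entry pointing at the secondary cluster, and the volume type used for replicated volumes needs replication enabled. A minimal sketch of the backend section in cinder.conf, assuming the backend name from the report (tripleo_ceph) and the failover target cephb; the pool, user and config paths are illustrative assumptions, not taken from this environment:

```ini
[tripleo_ceph]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = tripleo_ceph
# Primary cluster (illustrative paths/user, not taken from this setup)
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = openstack
rbd_pool = volumes
# Secondary cluster used as the failover target; backend_id matches the
# "Active Backend ID" (cephb) shown after the failover in the report below
replication_device = backend_id:cephb,conf:/etc/ceph/cephb.conf,user:openstack
```

A replicated volume type such as the REPL type used in step 1 below would then carry replication in its extra specs, e.g. `cinder type-key REPL set volume_backend_name=tripleo_ceph replication_enabled='<is> True'`.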
Created attachment 1266509 [details]
Cinder logs and config

Description of problem:
On an RBD mirroring deployment, after the cinder failover-host command, replicated volumes remain in status "error". A replicated volume's snapshot, however, remains in status "available".

Version-Release number of selected component (if applicable):
rhel7.3
openstack-cinder-10.0.1-0.20170310192919.b05afc3.el7ost.noarch
python-cinderclient-1.11.0-1.el7ost.noarch
puppet-cinder-10.3.0-1.el7ost.noarch
python-cinder-10.0.1-0.20170310192919.b05afc3.el7ost.noarch

How reproducible:

Steps to Reproduce:

1. Create three Cinder volumes, two replicated and one non-replicated.

cinder list
+--------------------------------------+-----------+-------------+------+-------------+----------+-------------+
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+-------------+------+-------------+----------+-------------+
| 95bf822c-5390-4618-bd26-494e4767435d | available | NoneReplvol | 1 | - | false | |
| a21307f8-5c58-4ad4-8245-92fbe05d3951 | available | RepVolume2 | 1 | REPL | false | |
| e7447352-06fa-443b-ab18-9be707b90f8b | available | RepVolume1 | 1 | REPL | false | |
+--------------------------------------+-----------+-------------+------+-------------+----------+-------------+

2. Create snapshots of all volumes:

cinder snapshot-list
+--------------------------------------+--------------------------------------+-----------+-----------------+------+
| ID | Volume ID | Status | Name | Size |
+--------------------------------------+--------------------------------------+-----------+-----------------+------+
| 0ae49ade-13ca-4e4c-b894-26650a457352 | e7447352-06fa-443b-ab18-9be707b90f8b | available | SnapRepVolume1 | 1 |
| 5726b798-8ece-4984-bd36-5728bdd6b177 | a21307f8-5c58-4ad4-8245-92fbe05d3951 | available | SnapRepVolume2 | 1 |
| c51fba12-e31e-4b3d-9da1-6f32ffae13bc | 95bf822c-5390-4618-bd26-494e4767435d | available | SnapNoneReplvol | 1 |
+--------------------------------------+--------------------------------------+-

(I have since deleted two snapshots from the list above, but that is irrelevant to this bug.)

3. Run "cinder failover-host" (wait for a few minutes), then check the services with "cinder service-list --withreplication":

cinder service-list --withreplication
+------------------+------------------------+------+----------+-------+----------------------------+--------------------+-------------------+--------+-----------------+
| Binary | Host | Zone | Status | State | Updated_at | Replication Status | Active Backend ID | Frozen | Disabled Reason |
+------------------+------------------------+------+----------+-------+----------------------------+--------------------+-------------------+--------+-----------------+
| cinder-scheduler | hostgroup | nova | enabled | up | 2017-03-26T12:00:34.000000 | | | | - |
| cinder-volume | hostgroup@tripleo_ceph | nova | disabled | up | 2017-03-26T12:00:33.000000 | failed-over | cephb | False | failed-over |
+------------------+------------------------+------+----------+-------+----------------------------+--------------------+-------------------+--------+-----------------+

As can be seen, the system has now failed over to cephb (an RBD-level sketch of what this should mean per replicated image is included under Additional info below).
4. Notice that all three volumes are now in error state; this is expected, but only for the non-replicated volume!

[stack@undercloud-0 ~]$ cinder list
+--------------------------------------+--------+-------------+------+-------------+----------+-------------+
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+--------+-------------+------+-------------+----------+-------------+
| 95bf822c-5390-4618-bd26-494e4767435d | error | NoneReplvol | 1 | - | false | |
| a21307f8-5c58-4ad4-8245-92fbe05d3951 | error | RepVolume2 | 1 | REPL | false | |
| e7447352-06fa-443b-ab18-9be707b90f8b | error | RepVolume1 | 1 | REPL | false | |
+--------------------------------------+--------+-------------+------+-------------+----------+-------------+

5. Even more odd, the snapshot of a replicated volume (the only snapshot left before I failed over) is still available.

[stack@undercloud-0 ~]$ cinder snapshot-list
+--------------------------------------+--------------------------------------+-----------+----------------+------+
| ID | Volume ID | Status | Name | Size |
+--------------------------------------+--------------------------------------+-----------+----------------+------+
| 5726b798-8ece-4984-bd36-5728bdd6b177 | a21307f8-5c58-4ad4-8245-92fbe05d3951 | available | SnapRepVolume2 | 1 |
+--------------------------------------+--------------------------------------+-----------+----------------+------+

Actual results:
All volumes are in error state; this is expected only for the non-replicated volume. It is odd that a replicated volume's snapshot is available while its base volume is not.

Expected results:
The non-replicated volume should indeed be in error state, but the replicated volumes should be available.

Additional info:
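Not from the original report, but as context for step 3: for journal-mirrored RBD volumes, a successful failover-host should leave each replicated image usable (primary) on the secondary cluster, which at the RBD level is roughly equivalent to promoting the mirrored image there. A hand-run illustration against the failover target, assuming the conventional "volumes" pool and the driver's standard volume-<UUID> image naming; the cluster name cephb and the volume ID (RepVolume1) are taken from the output above:

```shell
# Promote the mirrored image on the secondary cluster (what a failover amounts to per image);
# --force is needed when the primary cluster is unreachable
rbd --cluster cephb mirror image promote --force volumes/volume-e7447352-06fa-443b-ab18-9be707b90f8b

# Confirm the image is now primary and healthy on cephb
rbd --cluster cephb mirror image status volumes/volume-e7447352-06fa-443b-ab18-9be707b90f8b
```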
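Likewise not from the report: one way to narrow down whether the error status on the replicated volumes is only a Cinder-side status problem or a real mirroring failure is to inspect the image on the failover target; if the image itself is healthy, an admin can reset the volume status as a stop-gap. Again assuming the "volumes" pool; the volume ID is RepVolume2 from the listing above:

```shell
# Is the replicated image actually present, primary and healthy on the failover target?
rbd --cluster cephb mirror image status volumes/volume-a21307f8-5c58-4ad4-8245-92fbe05d3951

# If only the Cinder status is wrong, an admin can reset it as a temporary workaround
cinder reset-state --state available a21307f8-5c58-4ad4-8245-92fbe05d3951
```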