| Summary: | [Ceph] Failed to delete a volume when running tempest tests | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | lkuchlan <lkuchlan> |
| Component: | openstack-cinder | Assignee: | Jon Bernard <jobernar> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Tzach Shefi <tshefi> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | high | | |
| Version: | 10.0 (Newton) | CC: | dsariel, egafford, eharney, jobernar, lkuchlan, pgrist, srevivo, yfried |
| Target Milestone: | --- | Keywords: | Automation, AutomationBlocker, Triaged, ZStream |
| Target Release: | 11.0 (Ocata) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-04-06 17:25:06 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
This also fails on cleanup in both versions (V1 and V2) of tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern:
Traceback (most recent call last):
  File "/root/tempest-dir/tempest/lib/common/rest_client.py", line 864, in wait_for_resource_deletion
    raise exceptions.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: (TestVolumeBootPatternV2:_run_cleanups) Failed to delete volume be27b1ce-fec1-4831-a0fc-91c6fe66d9e1 within the required time (300 s).
https://rhos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/Nightly/job/qe-nightly-8_director-rhel-7.2-virthost-1cont_1comp-ipv4-vxlan-ceph-external/105/
The test is cleaning up a volume that has a server and a snapshot attached to it. Even though the server and snapshot are already deleted from the OpenStack (Nova and Cinder) databases, Ceph takes longer to clear its "watchers" on the attached volume, so when the cinder delete request is submitted, the backend refuses to delete the volume. The Cinder delete API, however, is asynchronous, so it returns 202 to the REST delete request anyway.
The result is that the user (Tempest) believes the resource (volume) is being deleted and waits for it to disappear (looping on GET requests) until the timeout is reached and the test fails.
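For reference, here is a minimal sketch of how one could confirm the lingering watchers from the Ceph side. The pool name ("volumes") and the "volume-&lt;uuid&gt;" image naming are assumptions based on Cinder's default RBD configuration, and the helper itself is hypothetical (not part of Cinder or Tempest):

```python
# Hypothetical helper: list lingering RBD watchers on the image backing a
# Cinder volume. Assumes Cinder's default RBD pool name ("volumes") and
# image naming convention ("volume-<uuid>"); adjust for the deployment.
import json
import subprocess

def rbd_watchers(volume_id, pool="volumes"):
    """Return the watcher list reported by `rbd status --format json`."""
    image = "%s/volume-%s" % (pool, volume_id)
    out = subprocess.check_output(
        ["rbd", "status", image, "--format", "json"])
    return json.loads(out.decode()).get("watchers", [])

# A non-empty list here, while Nova and Cinder already report the
# attachment gone, is exactly the race described above.
print(rbd_watchers("be27b1ce-fec1-4831-a0fc-91c6fe66d9e1"))
```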
In my opinion, Cinder should set the volume status to ERROR when a DELETE is refused by the backend (the same as Nova does). A client-side delete-retry loop could also be a nice-to-have, if there isn't one already.
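A minimal sketch of what such a delete-retry loop could look like against the raw REST API. The endpoint and token are placeholders, and the loop assumes that a refused delete bounces the volume back to "available" status (which matches the watcher behavior described above); a real client would go through cinderclient or Tempest's rest_client instead of `requests`:

```python
# Sketch of a client-side delete-retry loop: instead of trusting a single
# async 202, re-issue DELETE whenever the backend has bounced the volume
# back to a deletable state. Endpoint and token below are placeholders.
import time
import requests

CINDER = "http://controller:8776/v2/<tenant_id>"  # placeholder endpoint
HEADERS = {"X-Auth-Token": "<token>"}             # placeholder token

def delete_volume_with_retry(volume_id, timeout=300, interval=5):
    url = "%s/volumes/%s" % (CINDER, volume_id)
    deadline = time.time() + timeout
    requests.delete(url, headers=HEADERS)         # returns 202 (async)
    while time.time() < deadline:
        resp = requests.get(url, headers=HEADERS)
        if resp.status_code == 404:
            return                                # volume is really gone
        status = resp.json()["volume"]["status"]
        if status in ("available", "error_deleting"):
            # The backend refused the previous attempt (e.g. the RBD
            # image was still busy with watchers); try the delete again.
            requests.delete(url, headers=HEADERS)
        time.sleep(interval)
    raise RuntimeError("volume %s not deleted within %s s"
                       % (volume_id, timeout))
```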
Jon, can you take a look and see whether the launchpad bug here should be linked to the patch that looks like it may be the issue: https://review.openstack.org/#/c/281550/

The theory described here is different from what the posted patch addresses, so there might be another issue. I'm not sure I could change the volume status in that circumstance, so the solution may lie in Tempest; I will look closer.

No recent progress on this issue. Moving to RHOS 11 for triage.

This is no longer reproducing, so closing it out.

Thanks for the update.
Description of problem:
Failed to delete a volume (timeout) when the volume is created from a snapshot, when running tempest tests using a Ceph backend.

How reproducible:
100%

Steps to Reproduce:
Run tempest tests:
1. testr init
2. testr run tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_volume_from_snapshot
3. testr run tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern

Actual results:
Failed to delete a volume snapshot (timeout) while using a Ceph backend.

Captured traceback:
~~~~~~~~~~~~~~~~~~~
Traceback (most recent call last):
  File "tempest/lib/common/utils/test_utils.py", line 84, in call_and_ignore_notfound_exc
    return func(*args, **kwargs)
  File "tempest/lib/common/rest_client.py", line 864, in wait_for_resource_deletion
    raise exceptions.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: (VolumesV1SnapshotTestJSON:_run_cleanups) Failed to delete volume-snapshot 09b80e0f-8598-4bb1-a823-c30cecb4fd03 within the required time (196 s).

Expected results:
Volume snapshot should be deleted successfully.

Additional info:
{1} tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_volume_from_snapshot [200.510540s] ... FAILED

This is the only information I have related to this test:
http://logs.openstack.org/62/372062/10/check/gate-tempest-dsvm-full-devstack-plugin-ceph-ubuntu-xenial/6468ae2/console.html
https://projects.engineering.redhat.com/browse/RHOSINFRA-313
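For context on why these tests fail with a timeout rather than an error, Tempest's deletion wait is roughly equivalent to the following simplified paraphrase (not the actual source; the real implementation lives in tempest/lib/common/rest_client.py):

```python
# Simplified paraphrase of Tempest's wait_for_resource_deletion loop:
# poll until the resource 404s or the build timeout expires. Since
# Cinder already answered 202, the loop has no signal that the backend
# refused the delete -- it can only time out.
import time

def wait_for_resource_deletion(is_resource_deleted, resource_id,
                               build_timeout=196, build_interval=1):
    start = time.time()
    while not is_resource_deleted(resource_id):
        if time.time() - start >= build_timeout:
            raise Exception(
                "Failed to delete %s within the required time (%s s)."
                % (resource_id, build_timeout))
        time.sleep(build_interval)
```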