Bug 1530603 - snapshots.list following delete sometimes fails (using the API)
Summary: snapshots.list following delete sometimes fails (using the API)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: ovirt-4.2.1
: ---
Assignee: Daniel Erez
QA Contact: Lilach Zitnitski
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-01-03 13:13 UTC by Yedidyah Bar David
Modified: 2018-02-12 11:48 UTC
3 users

Fixed In Version: ovirt-engine-4.2.1.2
Clone Of:
Environment:
Last Closed: 2018-02-12 11:48:09 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.2+




Links
System | ID | Private | Priority | Status | Summary | Last Updated
oVirt gerrit | 86132 | 0 | master | MERGED | restapi: SnapshotsResource - ignore GetVmConfiguration failure | 2018-01-14 10:35:59 UTC

Description Yedidyah Bar David 2018-01-03 13:13:53 UTC
Description of problem:

See [1].

ovirt-requests-log has:

[03/Jan/2018:04:04:55 -0500] 192.168.200.1 "Correlation-Id: e04649fe-d5a8-4960-a4c2-62317c4b44ef" "Duration: 492586us" "DELETE /ovirt-engine/api/vms/45283516-8016-4fc9-b1c3-0f954f461102/snapshots/41e33b04-329c-4de5-acf4-7b9ab9bb4e5f HTTP/1.1" 254

and later:

[03/Jan/2018:04:05:40 -0500] 192.168.200.1 "Correlation-Id: 61dc1c6a-c97a-4ab2-85c9-7980fadb9c46" "Duration: 88061us" "GET /ovirt-engine/api/vms?search=name%3Dvm0 HTTP/1.1" 7004

[03/Jan/2018:04:05:40 -0500] 192.168.200.1 "Correlation-Id: 6ac1a177-37b0-4c32-981b-5ad2613af58e" "Duration: 425023us" "GET /ovirt-engine/api/vms/45283516-8016-4fc9-b1c3-0f954f461102/snapshots HTTP/1.1" 8750

[03/Jan/2018:04:05:43 -0500] 192.168.200.1 "Correlation-Id: 9d222519-d2b4-4072-97c1-ee1d08569a51" "Duration: 88089us" "GET /ovirt-engine/api/vms?search=name%3Dvm0 HTTP/1.1" 7004

[03/Jan/2018:04:05:43 -0500] 192.168.200.1 "Correlation-Id: 37d73c43-ccb7-4f68-b684-7dbdccdf0b44" "Duration: 52298us" "GET /ovirt-engine/api/vms/45283516-8016-4fc9-b1c3-0f954f461102/snapshots HTTP/1.1" 155

engine.log has:

2018-01-03 04:05:43,759-05 WARN  [org.ovirt.engine.core.bll.GetVmConfigurationBySnapshotQuery] (default task-3) [37d73c43-ccb7-4f68-b684-7dbdccdf0b44] Snapshot '41e33b04-329c-4de5-acf4-7b9ab9bb4e5f' does not exist

2018-01-03 04:05:43,759-05 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-3) [] Operation Failed: Entity not found: null

This fails the test. lago.log has:

2018-01-03 09:05:43,758::testlib.py::assert_equals_within::227::ovirtlago.testlib::ERROR::    * Unhandled exception in <function <lambda> at 0x477ec80>
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 219, in assert_equals_within
    res = func()
  File "/home/jenkins/workspace/ovirt-system-tests_master_check-patch-el7-x86_64/ovirt-system-tests/he-basic-ansible-suite-master/test-scenarios/004_basic_sanity.py", line 190, in <lambda>
    (len(api.vms.get(VM0_NAME).snapshots.list()) == 2) and
  File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/brokers.py", line 34602, in list
    headers={"All-Content":all_content}
  File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/proxy.py", line 46, in get
    return self.request(method='GET', url=url, headers=headers, cls=cls)
  File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/proxy.py", line 122, in request
    persistent_auth=self.__persistent_auth
  File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/connectionspool.py", line 79, in do_request
    persistent_auth)
  File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/connectionspool.py", line 162, in __do_request
    raise errors.RequestError(response_code, response_reason, response_body)
RequestError: 
status: 404
reason: Not Found
detail: Entity not found: null

The code doing this, in that test, is [2] (this is a link to a gerrit change, but that change does not touch the test code - so you can check he-basic-master code in current master):

 187     api.vms.get(VM0_NAME).snapshots.list()[-2].delete()
 188     testlib.assert_true_within_short(
 189         lambda:
 190         (len(api.vms.get(VM0_NAME).snapshots.list()) == 2) and
 191         (api.vms.get(VM0_NAME).snapshots.list()[-1].snapshot_status
 192          == 'ok'),
 193     )

In theory, this might be a bug in the test; perhaps we need to refresh something in the SDK.
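If the test itself were made tolerant of this race, a polling check could treat a transient error raised by its predicate as "condition not met yet" and keep polling until a timeout. The sketch below is a generic, hypothetical helper: `wait_until`, its parameters and the `allowed_exceptions` tuple are illustrative assumptions, not the ovirtlago `testlib` API. Note that it would only hide the race in the test; it does not fix the engine behaviour.

```python
import time


def wait_until(predicate, timeout=60.0, interval=1.0, allowed_exceptions=()):
    """Poll predicate() until it returns a truthy value or timeout seconds pass.

    Hypothetical helper: exceptions listed in allowed_exceptions (for example
    a transient 404 while a snapshot is being removed) are treated as
    "not yet" instead of failing immediately. Any other exception propagates.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if predicate():
                return True
        except allowed_exceptions:
            # Transient failure: fall through and retry after the interval.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage: a check that fails twice with a simulated "Entity not found" error.
calls = {"n": 0}


def flaky_check():
    calls["n"] += 1
    if calls["n"] < 3:
        raise LookupError("Entity not found: null")
    return True


print(wait_until(flaky_check, timeout=5, interval=0.01,
                 allowed_exceptions=(LookupError,)))  # → True
```

Passing an empty tuple (the default) to `except` catches nothing, so by default the helper behaves like a plain poll and lets every exception surface.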

[1] http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/3275/artifact/exported-artifacts/he-basic-ansible_suite_master__logs/test_logs/he-basic-ansible-suite-master/post-004_basic_sanity.py/

[2] https://gerrit.ovirt.org/gitweb?p=ovirt-system-tests.git;a=blob;f=he-basic-suite-master/test-scenarios/004_basic_sanity.py;h=d49c06476061eb96947e13e780c113e2c176ad3c;hb=f85558977e325636164b8bf6113a10ba3cd44c75#l151

Comment 1 Allon Mureinik 2018-01-04 15:01:32 UTC
Daniel, I have a vague recollection you handled something similar with domains missing some/all of their connections?

Comment 2 Daniel Erez 2018-01-09 16:15:33 UTC
(In reply to Yedidyah Bar David from comment #0)
> 
> The code doing this, in that test, is [2] (this is a link to a gerrit
> change, but that change does not touch the test code - so you can check
> he-basic-master code in current master):
> 
>  187     api.vms.get(VM0_NAME).snapshots.list()[-2].delete()
>  188     testlib.assert_true_within_short(
>  189         lambda:
>  190         (len(api.vms.get(VM0_NAME).snapshots.list()) == 2) and
>  191         (api.vms.get(VM0_NAME).snapshots.list()[-1].snapshot_status
>  192          == 'ok'),
>  193     )

The error [1] originated from GetVmConfigurationBySnapshotQuery, which is part of the flow of retrieving snapshots (GetVmConfigurationBySnapshot). The snapshot is missing because it is being deleted. To avoid such races, the suggested patch ignores a failure of GetVmConfigurationBySnapshot instead of failing the whole request.
This should fix the issue. However, I would suggest simplifying the test by checking whether the deleted snapshot still exists, e.g. something like:

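  import ovirtsdk4 as sdk

  # vm_service and testlib are assumed to come from the surrounding test module
  def get_snapshot(snapshots_service, snapshot_id):
      # Return the snapshot, or None once the engine reports it as not found (404)
      try:
          return snapshots_service.snapshot_service(snapshot_id).get()
      except sdk.NotFoundError:
          return None

  snapshots_service = vm_service.snapshots_service()
  snapshot = snapshots_service.list()[-2]
  snapshot_service = snapshots_service.snapshot_service(snapshot.id)
  snapshot_service.remove()
  # Poll until the removed snapshot can no longer be fetched
  testlib.assert_true_within_short(
      lambda: get_snapshot(snapshots_service, snapshot.id) is None
  )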


[1] [org.ovirt.engine.core.bll.GetVmConfigurationBySnapshotQuery] (default task-3) [37d73c43-ccb7-4f68-b684-7dbdccdf0b44] Snapshot '41e33b04-329c-4de5-acf4-7b9ab9bb4e5f' does not exist
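The race and the server-side change described in this comment can be illustrated with a small, self-contained Python simulation. Everything in it is hypothetical: `FakeEngine`, `list_snapshots` and `SnapshotNotFound` stand in for the engine's snapshot lookup, GetVmConfigurationBySnapshotQuery and the 404 "Entity not found" fault; the real fix is in the Java `SnapshotsResource` (gerrit 86132). The sketch assumes the snapshot list is read first and each configuration is loaded afterwards, so a snapshot removed in between makes the per-snapshot lookup fail. Keeping the entry without its configuration is also an assumption of this sketch.

```python
class SnapshotNotFound(Exception):
    """Stands in for the engine's "Entity not found" (HTTP 404) failure."""


class FakeEngine:
    def __init__(self):
        self.snapshots = {}  # snapshot id -> description

    def add(self, snap_id, description):
        self.snapshots[snap_id] = description

    def remove(self, snap_id):
        self.snapshots.pop(snap_id, None)

    def list_ids(self):
        return list(self.snapshots)

    def get_vm_configuration(self, snap_id):
        # Analogue of GetVmConfigurationBySnapshotQuery.
        if snap_id not in self.snapshots:
            raise SnapshotNotFound(snap_id)
        return "<ovf for %s>" % self.snapshots[snap_id]


def list_snapshots(engine, tolerant, on_listed=None):
    """Return (id, configuration) pairs for all listed snapshots."""
    ids = engine.list_ids()
    if on_listed is not None:
        on_listed()  # simulate a concurrent removal inside the race window
    result = []
    for snap_id in ids:
        try:
            config = engine.get_vm_configuration(snap_id)
        except SnapshotNotFound:
            if not tolerant:
                raise  # old behaviour: the whole list request fails
            config = None  # fixed behaviour: ignore the failed lookup
        result.append((snap_id, config))
    return result


def make_engine():
    engine = FakeEngine()
    engine.add("a", "base")
    engine.add("b", "to-delete")
    return engine


engine = make_engine()
try:
    list_snapshots(engine, tolerant=False, on_listed=lambda: engine.remove("b"))
except SnapshotNotFound as err:
    print("list failed:", err)  # list failed: b

engine = make_engine()
print(list_snapshots(engine, tolerant=True, on_listed=lambda: engine.remove("b")))
# [('a', '<ovf for base>'), ('b', None)]
```

Without a concurrent removal both variants return the same result; they differ only when a snapshot disappears between the two steps.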

Comment 3 Lilach Zitnitski 2018-01-28 09:30:00 UTC
Daniel, can you add steps to reproduce?

Comment 4 Daniel Erez 2018-01-28 09:43:15 UTC
(In reply to Lilach Zitnitski from comment #3)
> Daniel, can you add steps to reproduce?

The steps are described in the description [*]; the issue was encountered on OST (oVirt System Tests).
In a nutshell, the scenario (using the API) is:
* Delete a snapshot
* Immediately get snapshots list

[*]
"
The code doing this, in that test, is [2] (this is a link to a gerrit change, but that change does not touch the test code - so you can check he-basic-master code in current master):

 187     api.vms.get(VM0_NAME).snapshots.list()[-2].delete()
 188     testlib.assert_true_within_short(
 189         lambda:
 190         (len(api.vms.get(VM0_NAME).snapshots.list()) == 2) and
 191         (api.vms.get(VM0_NAME).snapshots.list()[-1].snapshot_status
 192          == 'ok'),
 193     )
"

Comment 5 Allon Mureinik 2018-01-28 10:15:08 UTC
Daniel - looking at the patch, it seems this bug is not specific to the V3 API. Can you please confirm (and edit the title) or refute (and explain how I've got it wrong)?

Comment 6 Daniel Erez 2018-01-28 10:19:50 UTC
(In reply to Allon Mureinik from comment #5)
> Daniel - looking at the patch, it seems this bug is not specific to the V3
> API. Can you please confirm (and edit the title) or refute (and explain how
> I've got it wrong)?

Indeed, it has also been reproduced in v4.

Comment 7 Lilach Zitnitski 2018-01-28 16:29:58 UTC
--------------------------------------
Tested with the following code:
----------------------------------------
rhvm-4.2.1.3-0.1.el7.noarch
vdsm-4.20.17-1.el7ev.x86_64

Tested with the following scenario:

Steps to Reproduce:
1. Using the API, remove a snapshot.
2. Using the API, get the snapshot list.

Actual results:
No errors appear in the log

Expected results:

Moving to VERIFIED!

Comment 8 Sandro Bonazzola 2018-02-12 11:48:09 UTC
This bugzilla is included in the oVirt 4.2.1 release, published on Feb 12th 2018.

Since the problem described in this bug report should be resolved in the oVirt 4.2.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

