Bug 1448399 - Listing storagedomains fails with 404
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: ---
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.2.0
Target Release: 4.2.0
Assignee: Daniel Erez
QA Contact: Kevin Alon Goldblatt
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-05-05 11:15 UTC by Fabrice Bacchella
Modified: 2017-12-22 07:22 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-20 10:44:43 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.2+
rule-engine: exception+


Attachments
storage_domain_static dump (2.20 KB, application/x-gzip)
2017-05-11 10:02 UTC, Fabrice Bacchella
storage_server_connections dump (929 bytes, application/x-gzip)
2017-05-11 10:02 UTC, Fabrice Bacchella


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 77322 0 master MERGED restapi: use runQuery to get a StorageServerConnection 2020-11-13 17:24:18 UTC

Description Fabrice Bacchella 2017-05-05 11:15:54 UTC
The following snippet

sds_service = connection.system_service().storage_domains_service()
sd = sds_service.list(search='name=aname')[0]

 fails with:

ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "Entity not found: Storage server connection: id=6860d96f-557e-4d82-a209-401d72bd6e16". HTTP response code is 404.

In the client log, I got:
GET /ovirt-engine/api/storagedomains?search=name%3Daname HTTP/1.1
...
HTTP/1.1 404 Not Found
...
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<fault>
    <detail>Entity not found: Storage server connection: id=6860d96f-557e-4d82-a209-401d72bd6e16</detail>
    <reason>Operation Failed</reason>
</fault>
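
For illustration only, here is a minimal sketch of how a client could pull the reason, the detail and the missing connection id out of a fault body like the one above, using only the Python standard library. `parse_fault` is a hypothetical helper and not part of ovirtsdk4, and the assumption that the id always appears as `id=<uuid>` in the detail text rests on this single log.

```python
import re
import xml.etree.ElementTree as ET

# Fault body copied from the client log in this report.
FAULT_XML = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<fault>
    <detail>Entity not found: Storage server connection: id=6860d96f-557e-4d82-a209-401d72bd6e16</detail>
    <reason>Operation Failed</reason>
</fault>"""


def parse_fault(body):
    """Return (reason, detail, missing_id) from a <fault> document.

    Hypothetical helper for this sketch; not an ovirtsdk4 function.
    """
    root = ET.fromstring(body)
    reason = root.findtext("reason")
    detail = root.findtext("detail")
    # The id is embedded in free text, so return None if the wording differs.
    match = re.search(r"id=([0-9a-f-]{36})", detail or "")
    return reason, detail, match.group(1) if match else None


reason, detail, missing_id = parse_fault(FAULT_XML)
print(reason, missing_id)
```

Keeping the id separate from the HTTP status makes it easier to see that the "not found" entity is an internal storage server connection and not the requested URL.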

But if I list the available services, the search link is advertised:
GET /ovirt-engine/api HTTP/1.1
...
<link href="/ovirt-engine/api/storagedomains?search={query}" rel="storagedomains/search"/>
...
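
As a small sketch of how the advertised link template maps to the request that failed, the `{query}` placeholder can be filled with a percent-encoded search expression. `expand_search` is a hypothetical helper written for this illustration; only `urllib.parse.quote` is a real standard-library call.

```python
from urllib.parse import quote

# Link template as advertised by GET /ovirt-engine/api (copied from the log).
SEARCH_TEMPLATE = "/ovirt-engine/api/storagedomains?search={query}"


def expand_search(template, query):
    """Fill the {query} placeholder with a percent-encoded search expression."""
    # safe="" so that "=" and "/" inside the expression are encoded as well.
    return template.replace("{query}", quote(query, safe=""))


url = expand_search(SEARCH_TEMPLATE, "name=aname")
print(url)
```

The result matches the path of the failing request in the client log, which suggests the URL itself was well formed.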

And in engine.log:
2017-05-05 13:11:38,395+02 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-17) [] Operation Failed: Entity not found: Storage server connection: id=6860d96f-557e-4d82-a209-401d72bd6e16

Nothing else.

The oVirt version I use:
<product_info>
 <name>oVirt Engine</name>
 <vendor>ovirt.org</vendor>
 <version>
   <build>1</build>
   <full_version>4.1.1.8-1.el7.centos</full_version>
   <major>4</major>
   <minor>1</minor>
   <revision>0</revision>
 </version>
</product_info>

Comment 1 Ondra Machacek 2017-05-05 11:37:34 UTC
I think we shouldn't fail with 404 if a storage connection doesn't exist; we should ignore it and continue.
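
A minimal sketch of that "ignore and continue" behaviour, using toy in-memory data and not the engine's code. `ConnectionNotFound`, `DOMAINS`, `CONNECTIONS`, `load_connection` and `list_domains` are all hypothetical names chosen for this illustration.

```python
class ConnectionNotFound(Exception):
    """Hypothetical error for a storage connection record that is missing."""


# Toy data: domain name -> connection id; "c2" has no matching connection.
DOMAINS = {"vmdata01": "c1", "ng316": "c2", "ng314": "c3"}
CONNECTIONS = {"c1": "/data/vmdata01", "c3": "/data/ng314"}


def load_connection(conn_id):
    """Return the connection path, or raise if the record is missing."""
    try:
        return CONNECTIONS[conn_id]
    except KeyError:
        raise ConnectionNotFound(conn_id)


def list_domains():
    """List every domain; a missing connection is recorded, not fatal."""
    listed, skipped = [], []
    for name, conn_id in DOMAINS.items():
        try:
            listed.append((name, load_connection(conn_id)))
        except ConnectionNotFound:
            # Keep the domain visible, but without connection details.
            listed.append((name, None))
            skipped.append(name)
    return listed, skipped


listed, skipped = list_domains()
print(listed, skipped)
```

With this shape, one broken domain degrades a single entry of the listing without failing the whole request.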

Comment 2 Ondra Machacek 2017-05-05 11:38:07 UTC
To work around this issue, just remove the nonexistent storage connection.

Comment 3 Fabrice Bacchella 2017-05-05 12:40:27 UTC
sds_service.list() fails too:

GET /ovirt-engine/api/storagedomains HTTP/1.1
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<fault>
    <detail>Entity not found: Storage server connection: id=6860d96f-557e-4d82-a209-401d72bd6e16</detail>
    <reason>Operation Failed</reason>
</fault>

Comment 4 Daniel Erez 2017-05-09 12:25:59 UTC
Hi Fabrice,

* Can you please attach the full logs and the script?
* What's the status of the relevant storage domain? Did you manually remove the storage connection?
* Also, please attach a db dump or the content of the storage_server_connections table.

Comment 5 Fabrice Bacchella 2017-05-11 10:02:22 UTC
Created attachment 1277812 [details]
storage_domain_static dump

Comment 6 Fabrice Bacchella 2017-05-11 10:02:57 UTC
Created attachment 1277813 [details]
storage_server_connections dump

Comment 7 Fabrice Bacchella 2017-05-11 10:35:10 UTC
Using the content of storage_domain_static, I now have a list of my domains. So I'm running the following code, using my SDK:

# `context` comes from my own SDK; the ids are taken from storage_domain_static.
domain_ids = (
    '2a9fe2d7-ea38-4ced-a274-32734b7b571b', '072fbaa1-08f3-4a40-9f34-a5ca22dd1d74',
    'f38b1422-82f2-44ff-b081-d3183ac2c11e', '90765c23-f911-4ab0-ba9e-dfcc11c83acb',
    '42e01e9c-a5f5-441d-8e90-a36eddd8eb03', '2ea4a078-3a66-4d1c-9239-622fbd45dd3b',
    '3d086f11-03a2-4fe7-ae84-6efdcb9c950a', '74e1dc39-ee39-4385-988f-a3e8ced63d84',
    '5dd13cd0-2fe9-4bd4-9769-613a4f700c7b', 'de424c0c-8c33-46d7-a08f-ebe769239a26',
    '814984aa-c521-4c4f-b066-05317b4d8daf', '7c5291d3-11e2-420f-99ad-47a376013671',
)

for i in domain_ids:
    try:
        print(context.storagedomains.get(id=i).name)
    except Exception as e:
        print(e)

try:
    print(context.storagedomains.list())
except Exception as e:
    print(e)

And I get the following results:
ISO_DOMAIN
ovirt-image-repository
vmsys01
3dpse-data01
vmslow01
vmdata01
ng318
3dpse-data02
Fault reason is "Operation Failed". Fault detail is "Entity not found: Storage server connection: id=739c5f7e-e09c-4b96-a8a7-576b7d56be21". HTTP response code is 404.
Fault reason is "Operation Failed". Fault detail is "Entity not found: Storage server connection: id=6860d96f-557e-4d82-a209-401d72bd6e16". HTTP response code is 404.
ng319
ng314
Fault reason is "Operation Failed". Fault detail is "Entity not found: Storage server connection: id=6860d96f-557e-4d82-a209-401d72bd6e16". HTTP response code is 404.

I have two broken domains, ng317 and ng316. If I try to manage them using the GUI, I get an uncaught exception when I click on "Manage domain".

So there are inconsistencies in my database; I don't know where they come from or how old they are. But I should not get a 404; I should get a 500 instead. My request is good, and the error is coming from the engine.

Comment 8 Fabrice Bacchella 2017-05-11 10:37:18 UTC
These are staging domains in a staging datacenter, so I'm not afraid of losing them and can run any tests you want on them.

Comment 9 Daniel Erez 2017-05-14 11:17:57 UTC
The storage connections for domains ng316 and ng317 are indeed missing. This is an unexpected situation that could have been caused by a bug or by manual manipulation of the db. So the given error code is actually correct, as the entities are missing (i.e. the issue here is the missing entities). As we don't know what triggered the issue or how to reproduce it, I'm closing the bug for now. Please re-open if reproduced.

Comment 10 Fabrice Bacchella 2017-05-15 09:17:37 UTC
I don't agree with closing this bug.

OK, I agree you can't help me with the missing storage connections, which leave my storage domains in a broken state.

But there are still several problems:

Look at this log:
DEBUG:root:GET /ovirt-engine/api/storagedomains HTTP/1.1
DEBUG:root:User-Agent: PythonSDK/4.1.3
DEBUG:root:Version: 4
DEBUG:root:Content-Type: application/xml
DEBUG:root:Accept: application/xml
DEBUG:root:Content-Length: 0
DEBUG:root:
DEBUG:root:HTTP/1.1 404 Not Found
DEBUG:root:Date: Mon, 15 May 2017 08:39:36 GMT
DEBUG:root:Server: Apache
DEBUG:root:Content-Type: application/xml
DEBUG:root:Content-Length: 217
DEBUG:root:HTTP error before end of send, stop sending
DEBUG:root:
DEBUG:root:?
DEBUG:root:<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
DEBUG:root:<fault>
DEBUG:root:    <detail>Entity not found: Storage server connection: id=6860d96f-557e-4d82-a209-401d72bd6e16</detail>
DEBUG:root:    <reason>Operation Failed</reason>
DEBUG:root:</fault>
DEBUG:root:Closing connection 1

I just wanted to enumerate domains, and I get a 404. I should still be able to see the good ones; in the current situation, I can see none.

And even if storage domains are broken, they still exist, so they should be returned when listing all domains. It's individual domain manipulation that should fail. I can see them in the web UI, so why not in the REST API?

I'm getting a 404. In REST, I expect the HTTP error code to be meaningful. There is an internal error in oVirt and the request is good, so I should get a 500, and only when doing a GET /ovirt-engine/api/storagedomains/<bad-domain-UUID>
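
The distinction argued for here can be written down as a small decision function. This is only a sketch of the reasoning in this comment, not the engine's logic; `status_for` and its two parameters are hypothetical.

```python
from http import HTTPStatus


def status_for(requested_exists, dependencies_intact):
    """Pick a status for a GET, following the argument in this comment.

    requested_exists: the resource named by the URL is known to the server.
    dependencies_intact: the internal records the server needs are present.
    """
    if not requested_exists:
        # The client asked for something that is not there.
        return HTTPStatus.NOT_FOUND
    if not dependencies_intact:
        # Valid request, but the server-side data is inconsistent.
        return HTTPStatus.INTERNAL_SERVER_ERROR
    return HTTPStatus.OK


# The domain exists but its storage server connection record is missing.
print(status_for(requested_exists=True, dependencies_intact=False))
```

Under this reading, 404 is reserved for what the client named in the URL, and a missing internal record behind an existing resource is a server-side fault.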

Comment 11 Daniel Erez 2017-05-15 11:20:13 UTC
@Fabrice - I understand your point and frustration. To mitigate the issue, I suggest manually adding the missing storage connections (with mock values):

E.g.

ng316:
INSERT INTO storage_server_connections VALUES ('6860d96f-557e-4d82-a209-401d72bd6e16', '/data/ng316', null, null, null, null, null, 1, null, null, null, null, null, null);

ng317:
INSERT INTO storage_server_connections VALUES ('739c5f7e-e09c-4b96-a8a7-576b7d56be21', '/data/ng317', null, null, null, null, null, 1, null, null, null, null, null, null);


Since a storage connection is a crucial part of the storage domain entity, we can't support a situation where connection entities are missing. I.e. the solution should be either finding the root cause of the issue or adding the connections manually.
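
Before adding rows by hand, it may help to list which domains point at a connection id that has no matching row. The sketch below shows the idea with a LEFT JOIN on an in-memory SQLite database. The two-table schema and its column names are simplified assumptions made for this illustration, not the real oVirt schema, and SQLite is used here only so the example is self-contained.

```python
import sqlite3

# Simplified, hypothetical schema: the real tables have many more columns,
# and the column names below are assumptions made for this sketch.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE storage_server_connections (id TEXT PRIMARY KEY, connection TEXT);
CREATE TABLE storage_domain_static (id TEXT PRIMARY KEY, storage_name TEXT, storage TEXT);
INSERT INTO storage_server_connections VALUES ('conn-1', '/data/ng314');
INSERT INTO storage_domain_static VALUES ('dom-1', 'ng314', 'conn-1');
INSERT INTO storage_domain_static VALUES ('dom-2', 'ng316', '6860d96f-557e-4d82-a209-401d72bd6e16');
""")

# Domains whose referenced connection id has no row in the connections table.
ORPHANS_SQL = """
SELECT d.storage_name, d.storage
FROM storage_domain_static AS d
LEFT JOIN storage_server_connections AS c ON c.id = d.storage
WHERE c.id IS NULL
ORDER BY d.storage_name
"""

orphans = db.execute(ORPHANS_SQL).fetchall()
print(orphans)
```

Each returned pair names a domain and the connection id that would need either a root-cause investigation or a manually added row.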

Comment 12 Fabrice Bacchella 2017-05-15 11:34:12 UTC
Thanks for the tip. But what about the HTTP status? Do you see no problem with an internal db error being turned into a 404?

The actual problem with those particular domains is not a big deal for me: as they are only test domains, I can drop them and continue my work (I hope I can do that without making the database more inconsistent). And I understand that with such a problem I can't expect a 100% perfect solution for my use case, as you can't handle every failure mode.

But I think good and clear error management, even for end users, is an important aspect of good software. When I saw a 404 and no domains returned, I started to think that the URL /ovirt-engine/api/storagedomains was broken. The message was good, but I was confronted with inconsistent diagnostic messages: was the 404 for the URL or for the missing storage connection? It made the problem more difficult to resolve. I think that's the main point of this bug.

Comment 13 Daniel Erez 2017-05-15 11:43:21 UTC
(In reply to Fabrice Bacchella from comment #12)
> Thanks for the tip. But what about the HTTP status? Do you see no problem
> with an internal db error being turned into a 404?

@Juan - What's your take on that? Is a 404 status code acceptable in such a scenario?
I.e. listing storage domains (with the Python SDK) when a storage connection entity is missing from the db (the storage connection entity is mandatory and should always be available in a proper environment).

> 
> The actual problem with those particular domains is not a big deal for me:
> as they are only test domains, I can drop them and continue my work (I hope
> I can do that without making the database more inconsistent). And I
> understand that with such a problem I can't expect a 100% perfect solution
> for my use case, as you can't handle every failure mode.
> 
> But I think good and clear error management, even for end users, is an
> important aspect of good software. When I saw a 404 and no domains returned,
> I started to think that the URL /ovirt-engine/api/storagedomains was
> broken. The message was good, but I was confronted with inconsistent
> diagnostic messages: was the 404 for the URL or for the missing storage
> connection? It made the problem more difficult to resolve. I think that's
> the main point of this bug.

Comment 14 Juan Hernández 2017-05-17 08:14:15 UTC
Returning 404 in this case isn't correct, it should be 500.

This is quite similar to bug 1332881. I think we should investigate it deeper, find the root cause and fix it.

Comment 15 Daniel Erez 2017-05-17 09:39:29 UTC
(In reply to Juan Hernández from comment #14)
> Returning 404 in this case isn't correct, it should be 500.
> 
> This is quite similar to bug 1332881. I think we should investigate it
> deeper, find the root cause and fix it.

The root cause is the missing storage connection, but we didn't find a scenario that reproduces it (and, afaik, this issue occurred only once). In bug 1332881, it was a cleanup issue, i.e. a stale storage connection remained in the db.
Anyway, is there anything we can do in the REST API in order to return code 500 in such a scenario?

Comment 16 Juan Hernández 2017-05-17 11:23:00 UTC
Currently in the API we are using the 'getEntity' method to find the storage server connection. That method, by default, generates the 404 error response if the entity can't be found. We can use the 'runQuery' method instead, explicitly check the result and generate the 500 error response. Something like this:

  private StorageServerConnections getStorageServerConnection(String id) {
    VdcQueryReturnValue result = runQuery(
      VdcQueryType.GetStorageServerConnectionById,
      new StorageServerConnectionQueryParametersBase(id)
    );
    if (result.getSucceeded() && result.getReturnValue() != null) {
      return (StorageServerConnections) result.getReturnValue();
    }
    throw new WebFaultException(
      null,
      "Can't find storage server connection for id '" + id + "'.",
      Status.INTERNAL_SERVER_ERROR
    );
  }

There are other places in the BackendStorageDomainsResource class that are using the 'getEntity' method in a similar way. For each of them, please consider whether it is correct to return 404.
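
To make the difference between the two lookup styles concrete, here is a language-neutral model written in Python. It is not the engine's Java code: `WebFault`, `get_entity`, `run_query` and `get_storage_server_connection` are hypothetical stand-ins that only mirror the behaviour described in this comment.

```python
class WebFault(Exception):
    """Hypothetical stand-in for a REST fault carrying an HTTP status."""

    def __init__(self, status, message):
        super().__init__(message)
        self.status = status


CONNECTIONS = {"c1": "/data/vmdata01"}  # toy backend; "c2" is missing


def get_entity(conn_id):
    """Lookup that turns every miss into a 404 by default."""
    if conn_id not in CONNECTIONS:
        raise WebFault(404, "Entity not found: %s" % conn_id)
    return CONNECTIONS[conn_id]


def run_query(conn_id):
    """Lookup that only reports what it found; the caller decides what a miss means."""
    return CONNECTIONS.get(conn_id)


def get_storage_server_connection(conn_id):
    """Internal lookup: a miss means inconsistent server data, so raise 500."""
    result = run_query(conn_id)
    if result is None:
        raise WebFault(500, "Can't find storage server connection for id '%s'." % conn_id)
    return result


for lookup in (get_entity, get_storage_server_connection):
    try:
        lookup("c2")
    except WebFault as fault:
        print(lookup.__name__, fault.status)
```

The point of the second style is that the status code is chosen at the call site, where it is known whether the id came from the client or from the server's own data.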

Comment 17 Red Hat Bugzilla Rules Engine 2017-05-25 08:58:15 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 18 Kevin Alon Goldblatt 2017-07-24 17:26:56 UTC
Please provide steps to reproduce this issue in order to verify the fix

Comment 19 Daniel Erez 2017-07-25 08:30:50 UTC
(In reply to Kevin Alon Goldblatt from comment #18)
> Please provide steps to reproduce this issue in order to verify the fix

We haven't found a specific scenario for reproducing the issue. However, you can simulate it manually by removing the relevant storage server connection from the DB (or just temporarily renaming its id in the 'storage_server_connections' table). Then, querying the storage domain should return error code 500.

Comment 20 Kevin Alon Goldblatt 2017-07-25 11:05:23 UTC
Verified with the following code:
---------------------------------------
ovirt-engine-4.2.0-0.0.master.20170723141021.git463826a.el7.centos.noarch
vdsm-4.20.1-218.git1b7671f.el7.centos.x86_64

Verified with the following scenario:
---------------------------------------
1. Changed the id of a storage domain in the database
2. Ran a query via the browser as follows:
https://xxx.xxx.xxx.xxx.redhat.com/ovirt-engine/api/storagedomains >>>> this returned error 500

Moving to VERIFIED!

Comment 21 Sandro Bonazzola 2017-12-20 10:44:43 UTC
This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be resolved in oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

