Bug 1636463 - nova-compute's UUID doesn't correlate with nova_api's resource_providers table contents
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 12.0 (Pike)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: OSP DFG:Compute
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-10-05 13:14 UTC by Alex Stupnikov
Modified: 2023-03-21 19:00 UTC
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-15 12:03:37 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Issue Tracker
ID: OSP-11696
Private: 0
Priority: None
Status: None
Summary: None
Last Updated: 2021-12-10 18:01:23 UTC

Description Alex Stupnikov 2018-10-05 13:14:41 UTC
Description of problem:

After a major upgrade from RHOSP 10 to RHOSP 12, the customer is unable to migrate instances to one compute node. The reason is simple: the resource_providers table (in the nova_api DB) contains a record for this compute node with an incorrect UUID.

As a result, we get the following errors in the controller's nova-placement-api.log:


nova-placement-api.log.4:2018-09-29 03:00:33.567 13 INFO nova.api.openstack.placement.requestlog [req-6b4052bd-320b-423f-9e4c-70ff982b1cc2 05dece580b6045f69b78773126d70e9f d25886a9aad140bc9bd68e4847d07a83 - default default] 172.31.0.21 "GET /placement/resource_providers/COMPUTE_UUID_THAT_DOESN'T_EXIST_IN_nova_api_DB" status: 404 len: 227 microversion: 1.0
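
To see which resource provider UUIDs placement is rejecting, the request log can be scanned for 404 responses on the resource_providers endpoint. The following is a minimal sketch, assuming log lines shaped like the one above; the function name, the regular expression, and the sample UUIDs are illustrative and not part of nova.

```python
import re

# Matches placement request-log entries such as:
#   "GET /placement/resource_providers/<uuid>" status: 404
# The pattern is an assumption based on the log line quoted in this report.
REQUEST_RE = re.compile(
    r'"GET /placement/resource_providers/(?P<uuid>[^"/\s]+)[^"]*" '
    r'status: (?P<status>\d+)'
)


def providers_returning_404(lines):
    """Return the set of provider UUIDs that placement answered with 404."""
    missing = set()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group("status") == "404":
            missing.add(match.group("uuid"))
    return missing


# Hypothetical sample lines; the UUIDs are placeholders.
sample_lines = [
    'INFO nova.api.openstack.placement.requestlog [req-1] 172.31.0.21 '
    '"GET /placement/resource_providers/11111111-2222-3333-4444-555555555555" '
    'status: 404 len: 227 microversion: 1.0',
    'INFO nova.api.openstack.placement.requestlog [req-2] 172.31.0.21 '
    '"GET /placement/resource_providers/aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee/inventories" '
    'status: 200 len: 400 microversion: 1.0',
]

missing_uuids = providers_returning_404(sample_lines)
```

Any UUID returned here can then be compared with what the compute node reports for itself.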
Version-Release number of selected component (if applicable):


At the same time, the following errors occur in the compute node's nova-compute.log:

  2018-09-30 04:00:46.130 1 INFO nova.scheduler.client.report [req-bc2109d9-fff0-42a0-b537-1750bfdc13c0 - - - - -] [req-a4bb47ce-552a-4b0e-ab23-8d8c4e4dc26f] Another thread already created a resource provider with the UUID COMPUTE_UUID_THAT_DOESN'T_EXIST_IN_nova_api_DB. Grabbing that record from the placement API.
  2018-09-30 04:00:46.142 1 WARNING nova.scheduler.client.report [req-bc2109d9-fff0-42a0-b537-1750bfdc13c0 - - - - -] Unable to refresh my resource provider record
  2018-09-30 04:00:46.213 1 WARNING nova.scheduler.client.report [req-bc2109d9-fff0-42a0-b537-1750bfdc13c0 - - - - -] Unable to submit allocation for instance d38c4fd9-f66c-4656-95a9-717c51d067cb (400 <html>
   <head>
    <title>400 Bad Request</title>
   </head>
   <body>
    <h1>400 Bad Request</h1>
    The server could not comply with the request since it is either malformed or otherwise incorrect.<br /><br />
  Allocation for resource provider 'COMPUTE_UUID_THAT_DOESN'T_EXIST_IN_nova_api_DB' that does not exist.
  
  
   </body>
  </html>)


It looks like the problem is in the DB, since COMPUTE_UUID_THAT_DOESN'T_EXIST_IN_nova_api_DB appears in the "nova hypervisor-list" output.
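
The suspected mismatch can be checked by pairing each hypervisor with its resource provider by hostname and comparing the UUIDs. This is only a sketch: it assumes the rows have already been exported (for example from "nova hypervisor-list" and from the resource_providers table) into dictionaries, and that the provider name equals the hypervisor hostname. The function name, field names, and sample values are hypothetical.

```python
def find_uuid_mismatches(compute_nodes, resource_providers):
    """Return (hostname, compute_uuid, provider_uuid) for hosts whose UUIDs differ.

    provider_uuid is None when no resource provider has the host's name.
    """
    # Assumption: a provider's "name" is the hypervisor hostname.
    providers_by_name = {rp["name"]: rp["uuid"] for rp in resource_providers}
    mismatches = []
    for node in compute_nodes:
        host = node["hypervisor_hostname"]
        provider_uuid = providers_by_name.get(host)
        if provider_uuid != node["uuid"]:
            mismatches.append((host, node["uuid"], provider_uuid))
    return mismatches


# Hypothetical exported rows; hostnames and UUIDs are placeholders.
compute_nodes = [
    {"hypervisor_hostname": "compute-0.localdomain", "uuid": "aaaa-0000"},
    {"hypervisor_hostname": "compute-1.localdomain", "uuid": "bbbb-1111"},
]
resource_providers = [
    {"name": "compute-0.localdomain", "uuid": "aaaa-0000"},
    {"name": "compute-1.localdomain", "uuid": "cccc-2222"},
]

mismatches = find_uuid_mismatches(compute_nodes, resource_providers)
for host, compute_uuid, provider_uuid in mismatches:
    print(f"{host}: compute UUID {compute_uuid}, provider UUID {provider_uuid}")
```

A host listed here has a provider record under its name but with a different UUID, which would be consistent with the 404 and 400 responses quoted above.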

I will provide links to appropriate sosreports in the next message.

Setting Severity to High because the customer's production environment doesn't work properly due to this issue. At this point I need advice on a quick fix or workaround (for example, changing something in the DB or in nova-compute's config) and can provide more data to investigate this further.

