Bug 1602852 - Instance launch consistently fails after launching 4 instances with nova placement-api in RHOS 13
Summary: Instance launch consistently fails after launching 4 instances with nova placement-api in RHOS 13
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z2
Target Release: 13.0 (Queens)
Assignee: OSP DFG:Compute
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-07-18 16:05 UTC by Punit Kundal
Modified: 2023-09-18 00:14 UTC
CC: 22 users

Fixed In Version: openstack-nova-17.0.5-2.d7864fbgit.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-08-29 16:40:38 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-9351 0 None None None 2021-12-10 16:47:08 UTC
Red Hat Knowledge Base (Solution) 3550271 0 None None None 2018-08-02 06:36:26 UTC
Red Hat Product Errata RHBA-2018:2588 0 None None None 2018-08-29 16:41:08 UTC

Comment 6 Matthew Booth 2018-07-19 12:01:20 UTC
There's a NoValidHost at 2018-07-18 19:08:56.092. At 2018-07-18 19:08:20.200 we see nova compute getting inventory and allocations from placement:

Inventory:
{
    "resource_provider_generation": 48,
    "inventories": {
        "VCPU": {"allocation_ratio": 16.0, "total": 48, "reserved": 0, "step_size": 1, "min_unit": 1, "max_unit": 48},
        "MEMORY_MB": {"allocation_ratio": 1.0, "total": 262050, "reserved": 4096, "step_size": 1, "min_unit": 1, "max_unit": 262050},
        "DISK_GB": {"allocation_ratio": 1.0, "total": 558, "reserved": 0, "step_size": 1, "min_unit": 1, "max_unit": 558}
    }
}

Usage:
{
    "resource_provider_generation": 48,
    "allocations": {
        "33ff4e37-c55b-414d-b8b3-87b5981ab690":
            {"resources": {"VCPU": 1, "MEMORY_MB": 2048, "DISK_GB": 30}},
        "6dac0468-767c-4ad5-85c8-221206e4296e":
            {"resources": {"VCPU": 1, "MEMORY_MB": 2048, "DISK_GB": 30}},
        "b578bc98-ca19-4cc5-b3a8-1d1e5af58bac":
            {"resources": {"VCPU": 3, "MEMORY_MB": 6144, "DISK_GB": 90}},
        "2c53a4a8-bfa6-4d16-9a6f-f314c4d33d53":
            {"resources": {"VCPU": 3, "MEMORY_MB": 6144, "DISK_GB": 90}},
        "a8f4f605-8d8b-41be-8d48-c23348ef41e8":
            {"resources": {"VCPU": 1, "MEMORY_MB": 2048, "DISK_GB": 30}},
        "4b941155-a99a-486a-adcb-bd5ec67c72db":
            {"resources": {"VCPU": 3, "MEMORY_MB": 6144, "DISK_GB": 90}},
        "6bdaf3e2-c08d-40d5-946b-41ddf1213abe":
            {"resources": {"VCPU": 3, "MEMORY_MB": 6144, "DISK_GB": 90}},
        "050e4e1b-73e7-4615-9762-66902bc0148b":
            {"resources": {"VCPU": 1, "MEMORY_MB": 2048, "DISK_GB": 30}},
        "f7d6d55c-1a09-4ee4-b5b5-a28d5ba8db74":
            {"resources": {"VCPU": 2, "MEMORY_MB": 4096, "DISK_GB": 60}}
    }
}
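For reference, a minimal sketch (not nova or placement source code) of the arithmetic placement applies to the two payloads above; per resource class, capacity is (total - reserved) * allocation_ratio and free capacity is capacity minus the sum of the allocations:

# Minimal sketch of placement's capacity arithmetic for the payloads above.
# The dicts below are hand-copied from the JSON quoted in this comment.
inventories = {
    "VCPU": {"allocation_ratio": 16.0, "total": 48, "reserved": 0},
    "MEMORY_MB": {"allocation_ratio": 1.0, "total": 262050, "reserved": 4096},
    "DISK_GB": {"allocation_ratio": 1.0, "total": 558, "reserved": 0},
}

# Per-consumer allocations collapsed into per-resource-class totals.
used = {"VCPU": 18, "MEMORY_MB": 36864, "DISK_GB": 540}

for rc, inv in inventories.items():
    capacity = (inv["total"] - inv["reserved"]) * inv["allocation_ratio"]
    free = capacity - used[rc]
    print(f"{rc}: capacity={capacity:.0f} used={used[rc]} free={free:.0f}")

# DISK_GB: capacity=558 used=540 free=18 -> a flavor asking for 30 GB or
# more of root disk cannot fit, hence the NoValidHost.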

So we can see that placement believes the compute is effectively out of disk: only 18 GB of DISK_GB remain free, less than the 30 GB requested by the smallest flavor in use here. However, I have a suspicion that some of those allocations don't relate to instances actually on the compute. We see several instances of:

2018-07-18 19:07:20.328 [e-0/N-CPU] 1 INFO nova.compute.resource_tracker [req-c07e604d-8e10-4448-805c-4ae08a79ad72 46cc17d1cdd74e7db8fe0e445f13f7d2 a658040386a545d28a20f94272734aa8 - default default] Instance f7d6d55c-1a09-4ee4-b5b5-a28d5ba8db74 has allocations against this compute host but is not found in the database.

These are not necessarily bugs, but I'm investigating.

Comment 7 Matthew Booth 2018-07-19 13:12:14 UTC
It looks to me as though all the 'phantom' allocations are associated with instances which encountered a messaging error during creation. That problem is one we should continue to investigate.

To get the customer going again, we should manually delete the allocations from placement. Please could you re-fetch allocations, then confirm the status of each instance and delete allocations for any which no longer exist using:

https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-allocation-delete
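As a rough sketch of that cleanup (the placement endpoint, admin token, resource provider UUID and the set of still-existing instance UUIDs below are all placeholders), list the allocations on the compute's resource provider and delete the allocations of any consumer UUID that no longer corresponds to an instance; the osc-placement command linked above wraps the same placement API calls:

# Rough sketch of the manual cleanup described above, talking to the
# placement REST API directly. All endpoint/token/UUID values are
# placeholders for illustration.
import requests

PLACEMENT = "http://placement.example:8778"    # placeholder endpoint
HEADERS = {"X-Auth-Token": "<admin token>"}    # placeholder token
provider_uuid = "<compute node resource provider uuid>"

# UUIDs of instances that really exist (e.g. from `openstack server list`).
existing_instances = {"33ff4e37-c55b-414d-b8b3-87b5981ab690"}

resp = requests.get(
    f"{PLACEMENT}/resource_providers/{provider_uuid}/allocations",
    headers=HEADERS)
resp.raise_for_status()

for consumer_uuid in resp.json()["allocations"]:
    if consumer_uuid not in existing_instances:
        # Equivalent to: openstack resource provider allocation delete <uuid>
        requests.delete(f"{PLACEMENT}/allocations/{consumer_uuid}",
                        headers=HEADERS).raise_for_status()
        print(f"deleted orphaned allocation for {consumer_uuid}")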

Comment 15 melanie witt 2018-07-27 17:33:26 UTC
I looked through the logs for instance f7d6d55c-1a09-4ee4-b5b5-a28d5ba8db74, the one comment 6 flagged in nova-compute.log with "Instance f7d6d55c-1a09-4ee4-b5b5-a28d5ba8db74 has allocations against this compute host but is not found in the database."

In nova-api.log.8 I found that instance f7d6d55c-1a09-4ee4-b5b5-a28d5ba8db74 was "local deleted" because it had never been scheduled (instance.host = None), presumably because of the MQ timeout issues:

[instance: f7d6d55c-1a09-4ee4-b5b5-a28d5ba8db74] instance's host None is down, deleting from database

And I did *not* see the expected "Deleted allocation for instance" logging for it, nor any request in nova-placement-api.log to remove the instance's allocation.

In installed-rpms I found the deployment has openstack-nova-*-17.0.3 installed, which predates the fix for the bug "Allocations are not cleaned up in placement for instance 'local delete' case" mentioned in comment 13. That fix was released in nova 17.0.5.

You will need to install openstack-nova-*-17.0.5 to get the fix that deletes allocations when deleting instances that failed to schedule or whose compute node is down.
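For context, a hedged sketch of the behaviour that fix adds (illustrative only, not nova source; the endpoint, token and db_delete callable are placeholders): when the API "local deletes" an instance whose host is None, it must also delete the instance's consumer allocations in placement, otherwise they linger as the phantom allocations seen in comment 6:

# Illustrative sketch only, not nova source code. PLACEMENT, the token and
# db_delete are placeholders; the point is that a "local delete" must also
# remove the instance's placement allocations (DELETE /allocations/<uuid>).
import requests

PLACEMENT = "http://placement.example:8778"    # placeholder endpoint
HEADERS = {"X-Auth-Token": "<admin token>"}    # placeholder token

def local_delete(instance_uuid, instance_host, db_delete):
    """db_delete stands in for the API-side database removal."""
    if instance_host is None:
        db_delete(instance_uuid)
        # The step the fix ensures happens: free the consumer's
        # VCPU/MEMORY_MB/DISK_GB usage on the compute's resource provider.
        requests.delete(f"{PLACEMENT}/allocations/{instance_uuid}",
                        headers=HEADERS).raise_for_status()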

Comment 20 Joanne O'Flynn 2018-08-13 10:07:06 UTC
This bug is marked for inclusion in the errata but does not currently contain draft documentation text. To ensure the timely release of this advisory please provide draft documentation text for this bug as soon as possible.

If you do not think this bug requires errata documentation, set the requires_doc_text flag to "-".


To add draft documentation text:

* Select the documentation type from the "Doc Type" drop down field.

* A template will be provided in the "Doc Text" field based on the "Doc Type" value selected. Enter draft text in the "Doc Text" field.

Comment 22 errata-xmlrpc 2018-08-29 16:40:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2588

Comment 23 Red Hat Bugzilla 2023-09-18 00:14:06 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.

