Bug 1441349 - ceilometer-polling: libvirt: QEMU Driver error : Domain not found: no domain with matching uuid
Summary: ceilometer-polling: libvirt: QEMU Driver error : Domain not found: no domain with matching uuid
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ceilometer
Version: 9.0 (Mitaka)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 9.0 (Mitaka)
Assignee: Mehdi ABAAKOUK
QA Contact: Sasha Smolyak
URL:
Whiteboard:
Depends On: 1390846 1454576
Blocks:
 
Reported: 2017-04-11 18:00 UTC by Mehdi ABAAKOUK
Modified: 2017-06-19 19:10 UTC
CC: 7 users

Fixed In Version: openstack-ceilometer-6.1.5-6.el7ost
Doc Type: Bug Fix
Doc Text:
Cause: The ceilometer compute agent retrieves the list of instances residing on the Compute node and caches it. When an instance is deleted, the cache is not cleaned. Consequence: The ceilometer compute agent logs many useless messages about missing instances on the compute node until it is restarted. Fix: The cache is now updated when instances are deleted (see the sketch after these header fields). Result: When the cache is refreshed, the ceilometer compute agent no longer logs incorrect messages about missing instances.
Clone Of: 1390846
Environment:
Last Closed: 2017-06-19 19:10:47 UTC
Target Upstream Version:
Embargoed:
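
The Doc Text above compresses the mechanics, so here is a minimal Python sketch of the intended cache behaviour. This is an illustration under assumptions, not ceilometer's actual discovery code: nova_client.list_instances() is a hypothetical helper standing in for the Nova "changes-since" API call.

    import time

    class InstanceCache(object):
        """Minimal sketch of a per-compute-node instance cache.

        Illustration only -- not ceilometer's real discovery code.
        ``nova_client.list_instances()`` is a hypothetical helper.
        """

        def __init__(self, nova_client):
            self._nova = nova_client
            self._cache = {}        # instance id -> instance object
            self._last_run = None   # timestamp of the previous refresh

        def refresh(self):
            # Ask Nova only for instances changed since the last refresh
            # ("changes-since" semantics), which includes deleted ones.
            for inst in self._nova.list_instances(since=self._last_run):
                if inst.status == 'deleted':
                    # The bug: without this branch, deleted instances stayed
                    # in the cache and were polled (and logged) forever.
                    self._cache.pop(inst.id, None)
                else:
                    self._cache[inst.id] = inst
            self._last_run = time.time()
            return list(self._cache.values())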




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:1507 0 normal SHIPPED_LIVE openstack-ceilometer bug fix advisory 2017-06-19 23:10:01 UTC

Description Mehdi ABAAKOUK 2017-04-11 18:00:31 UTC
+++ This bug was initially created as a clone of Bug #1390846 +++

Description of problem:

The following message is output to /var/log/messages:

ceilometer-polling: libvirt: QEMU Driver error : Domain not found: no domain with matching uuid <UUID>

Version-Release number of selected component (if applicable):

OSP 8.0

How reproducible:

100%

Steps to Reproduce:

One possible reproduction method:

1. Create an instance
2. virsh undefine <domain>
3. Wait for 10 minutes and check /var/log/messages

Actual results:

The above message is output on every polling cycle.

Expected results:

Can we add some exception handling for instances that have been deleted? (A sketch of such handling follows.)
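
For illustration, a hedged sketch of what such handling could look like. libvirt.libvirtError, get_error_code(), and libvirt.VIR_ERR_NO_DOMAIN are real libvirt-python names; the surrounding function is hypothetical, not ceilometer's actual inspector code.

    import libvirt

    def lookup_domain(conn, uuid):
        """Return the libvirt domain for ``uuid``, or None if it is gone.

        Sketch only: treats "domain not found" as an expected condition
        instead of logging an error on every polling cycle.
        """
        try:
            return conn.lookupByUUIDString(uuid)
        except libvirt.libvirtError as e:
            if e.get_error_code() == libvirt.VIR_ERR_NO_DOMAIN:
                return None  # instance deleted behind our back; skip quietly
            raise

    # Note: the "libvirt: QEMU Driver error" line itself is written by
    # libvirt's default error callback; fully silencing it also requires
    # libvirt.registerErrorHandler(lambda ctx, err: None, None).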


Additional info:

--- Additional comment from Mehdi ABAAKOUK on 2016-11-18 15:17:55 CET ---

If the domain is destroyed in libvirt but not in Nova, that is the expected message.

--- Additional comment from Chen on 2016-11-18 16:44:23 CET ---

Hi Mehdi,

Thank you for your reply.

Is there any chance of handling this better, so that the message is not output to the logs? Polling a non-existent instance is meaningless, so can we make this error message silent?

Best Regards,
Chen

--- Additional comment from Mehdi ABAAKOUK on 2016-11-21 14:10:24 CET ---

If you want the message to disappear, you must delete the instance in Nova.
For Ceilometer, the instance should exist because Nova tells us it exists.
If something is wrong or out of sync between libvirt and Nova, we can't really know why from a Ceilometer PoV, so we print a message. We can't do more.

--- Additional comment from Chen on 2017-03-14 07:29:58 CET ---

Hi Mehdi,

Could you please confirm that our RHOSP 8 is affected by the following bugs?

https://bugs.launchpad.net/ceilometer/+bug/1656166
https://review.openstack.org/#/c/333129/

Best Regards,
Chen

--- Additional comment from Mehdi ABAAKOUK on 2017-03-14 09:31:22 CET ---

Good find,

Yes, the caching mechanism was introduced in RHOSP 8, and it has this issue.

But the fix can't be backported alone; it depends on another feature introduced in RHOSP 9: https://review.openstack.org/#/c/284322/

These changes also introduce two new configuration options.
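
The two options are not named in this report. Assuming they are the compute agent's resource_update_interval and resource_cache_expiry from upstream ceilometer (verify the names against your release before use), a ceilometer.conf sketch:

    [compute]
    # Assumed option names -- check your ceilometer release documentation.
    # How often (seconds) to ask Nova for instances changed since the
    # last poll.
    resource_update_interval = 600
    # How often (seconds) to rebuild the local instance cache from
    # scratch, dropping entries for instances deleted elsewhere.
    resource_cache_expiry = 3600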

The bug also affects RHOSP 9 and 10.

--- Additional comment from Chen on 2017-03-16 07:22:59 CET ---

Hi Mehdi,

Thank you for your reply.

So just to clarify, is this issue:

1. impossible to backport to OSP 8 due to https://review.openstack.org/#/c/284322/, or

2. possible to backport to OSP 8 despite https://review.openstack.org/#/c/284322/, but it would take more time?

Which one is correct?

Best Regards,
Chen

--- Additional comment from Mehdi ABAAKOUK on 2017-03-16 08:49:48 CET ---

It's option 2. Also, these changes have to be backported to OSP 8, OSP 9, and OSP 10 to ensure that future upgrades do not reintroduce the regression.

Comment 8 errata-xmlrpc 2017-06-19 19:10:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1507

