Bug 1390846

Summary: ceilometer-polling: libvirt: QEMU Driver error : Domain not found: no domain with matching uuid
Product: Red Hat OpenStack Reporter: Chen <cchen>
Component: openstack-ceilometer Assignee: Mehdi ABAAKOUK <mabaakou>
Status: CLOSED ERRATA QA Contact: Sasha Smolyak <ssmolyak>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 8.0 (Liberty) CC: augol, cshastri, jbiao, jruzicka, knakai, mabaakou, mlopes, srevivo
Target Milestone: --- Keywords: Reopened, Triaged, ZStream
Target Release: 8.0 (Liberty)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-ceilometer-5.0.5-6.el7ost Doc Type: Bug Fix
Doc Text:
Previously, the ceilometer compute agent retrieved the list of instances residing on the Compute node and cached it, but when an instance was deleted, the cache was not cleaned. Consequently, the ceilometer compute agent would log many superfluous messages about missing instances on the Compute node until it was restarted. With this update, deleted instances are processed to update the cache accordingly. As a result, when the cache is refreshed, the ceilometer compute agent stops logging messages about missing instances.
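The fix described above can be sketched roughly as follows. This is an illustrative model of the cache behavior, not ceilometer's actual discovery code; the class and method names are hypothetical.

```python
# Hypothetical sketch of the cache fix: when a refresh shows that an
# instance no longer exists on the compute node, drop it from the cache
# instead of keeping the stale entry until the agent restarts.

class InstanceCache:
    def __init__(self):
        self._cache = {}  # instance uuid -> instance record

    def refresh(self, instances_from_nova):
        """Rebuild the cache from nova's current view of this node."""
        current = {inst['uuid']: inst for inst in instances_from_nova}
        # Process deleted instances: remove them so the agent stops
        # polling (and logging about) domains that are gone.
        for uuid in set(self._cache) - set(current):
            del self._cache[uuid]
        self._cache.update(current)
        return list(self._cache.values())
```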
Story Points: ---
Clone Of:
: 1441349 1454576 (view as bug list) Environment:
Last Closed: 2017-06-20 12:47:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1441349, 1454576    

Description Chen 2016-11-02 05:12:05 UTC
Description of problem:

ceilometer-polling: libvirt: QEMU Driver error : Domain not found: no domain with matching uuid <UUID> is output in /var/log/messages.

Version-Release number of selected component (if applicable):

OSP 8.0

How reproducible:

100%

Steps to Reproduce:

One possible reproduce method:

1. Create an instance
2. virsh undefine <domain>
3. Wait for 10 minutes and check /var/log/messages

Actual results:

The above message will be output

Expected results:

Can we add some exception handling for instances that have been deleted ?
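The exception handling requested above might look something like this minimal sketch. DomainNotFound and lookup_domain are stand-ins for libvirt's libvirtError and virConnect.lookupByUUIDString; the real change would live in ceilometer's compute inspector.

```python
# Minimal sketch, assuming a libvirt-style lookup that raises when the
# domain no longer exists. Names here are illustrative stand-ins, not
# the real libvirt or ceilometer API.

class DomainNotFound(Exception):
    pass

def poll_instance(lookup_domain, uuid, log):
    """Poll one instance, skipping domains libvirt no longer knows about."""
    try:
        return lookup_domain(uuid)
    except DomainNotFound:
        # Instance was deleted underneath us: record a quiet note
        # instead of flooding /var/log/messages with errors.
        log.append("skipped deleted domain %s" % uuid)
        return None
```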


Additional info:

Comment 1 Mehdi ABAAKOUK 2016-11-18 14:17:55 UTC
If the domain is destroyed in libvirt but not in nova, that is the expected message.

Comment 2 Chen 2016-11-18 15:44:23 UTC
Hi Mehdi,

Thank you for your reply.

Is there any chance of better handling so that this message is not written to the logs ? Polling a non-existent instance is meaningless, so can this error message be silenced ?

Best Regards,
Chen

Comment 3 Mehdi ABAAKOUK 2016-11-21 13:10:24 UTC
If you want the message to disappear, you must delete the instance in nova.
From ceilometer's point of view, the instance should exist because nova tells us it exists.
If something is wrong or out of sync between libvirt and nova, we can't really know why from a Ceilometer PoV, so we print a message. We can't do more.

Comment 4 Chen 2017-03-14 06:29:58 UTC
Hi Mehdi,

Could you please confirm whether our RHOSP 8 is affected by the following bugs ?

https://bugs.launchpad.net/ceilometer/+bug/1656166
https://review.openstack.org/#/c/333129/

Best Regards,
Chen

Comment 5 Mehdi ABAAKOUK 2017-03-14 08:31:22 UTC
Good finding,

Yes, the caching mechanism was introduced in RHOSP 8 and has this issue.

But the fix can't be backported alone; it depends on another feature introduced in RHOSP 9: https://review.openstack.org/#/c/284322/

That change also introduces two new configuration options.

The bug also affects RHOSP 9 and 10.

Comment 6 Chen 2017-03-16 06:22:59 UTC
Hi Mehdi,

Thank you for your reply.

So just to clarify, is this issue

1. impossible to backport to OSP8 because of https://review.openstack.org/#/c/284322/, or

2. possible to backport to OSP8 despite https://review.openstack.org/#/c/284322/, but it would take more time ?

Which one is correct ?

Best Regards,
Chen

Comment 7 Mehdi ABAAKOUK 2017-03-16 07:49:48 UTC
It's option 2. These changes also have to be backported to OSP8, OSP9 and OSP10 to ensure future upgrades will not reintroduce the regression.

Comment 10 Mehdi ABAAKOUK 2017-05-16 05:06:32 UTC
Everything is already done; I just don't get why this BZ has not moved to the ON_QA phase.

Comment 17 Mehdi ABAAKOUK 2017-05-23 06:22:44 UTC
For further OSP10 question about this bug the clone is here: https://bugzilla.redhat.com/show_bug.cgi?id=1454576

Comment 23 Sasha Smolyak 2017-06-15 14:51:44 UTC
No message in /var/log/messages after undefining the instance.

Comment 25 errata-xmlrpc 2017-06-20 12:47:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1543