Bug 1546168 - Some libvirt domains created by nova have an empty "<nova:owner>" attribute in the embedded metadata
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 12.0 (Pike)
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: z5
Target Release: 11.0 (Ocata)
Assignee: Lee Yarwood
QA Contact: Joe H. Rahme
URL:
Whiteboard:
Depends On:
Blocks: 1558103
Reported: 2018-02-16 13:41 UTC by Lars Kellogg-Stedman
Modified: 2023-01-13 10:46 UTC (History)
CC List: 12 users

Fixed In Version: openstack-nova-15.0.8-13.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1558103 (view as bug list)
Environment:
Last Closed: 2018-05-18 17:14:38 UTC
Target Upstream Version:
Embargoed:


Attachments
a list of nova servers with bad metadata (987 bytes, text/plain)
2018-02-16 13:41 UTC, Lars Kellogg-Stedman
output of openstack server event list (8.76 KB, text/plain)
2018-02-16 13:42 UTC, Lars Kellogg-Stedman


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-21390 0 None None None 2023-01-13 10:46:25 UTC
Red Hat Product Errata RHBA-2018:1624 0 None None None 2018-05-18 17:16:29 UTC

Description Lars Kellogg-Stedman 2018-02-16 13:41:47 UTC
Created attachment 1397038 [details]
a list of nova servers with bad metadata

Description of problem:

I am working with a Red Hat academic partner. They have deployed OSP12 manually (no director). Some of the compute instances have an empty "<nova:owner>" element in the metadata embedded in the libvirt domain XML. This in turn causes ceilometer to fail.

For example:

  <metadata>
    <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
      <nova:package version="14.0.3-9.el7ost"/>
      <nova:name>wm-solr</nova:name>
      <nova:creationTime>2017-05-24 15:29:24</nova:creationTime>
      <nova:flavor name="m1.large">
        <nova:memory>8192</nova:memory>
        <nova:disk>80</nova:disk>
        <nova:swap>0</nova:swap>
        <nova:ephemeral>0</nova:ephemeral>
        <nova:vcpus>4</nova:vcpus>
      </nova:flavor>
      <nova:owner/>
      <nova:root type="image" uuid="a332ad63-8b38-4700-b818-b39fa69233a9"/>
    </nova:instance>
  </metadata>

This causes ceilometer to fail because it tries to retrieve the owner/user and owner/project elements, which are missing. Out of 144 servers, about 14 instances exhibit this problem. The servers still exist in the Nova inventory, as do the owning project and user. These instances were all created with previous versions of OSP.

I've attached the result of a simple validation that I ran across the compute nodes that shows, for each server exhibiting this problem, which version of nova was used to create it.
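A check along these lines can be sketched in Python. This is a minimal illustration, not the script that produced the attachment: the function name owner_is_empty and the trimmed SAMPLE document are hypothetical, and the sketch assumes the domain XML has already been obtained as a string (for example from "virsh dumpxml").

```python
import xml.etree.ElementTree as ET

# Namespace taken from the metadata example in this report.
NOVA_NS = {"nova": "http://openstack.org/xmlns/libvirt/nova/1.0"}


def owner_is_empty(domain_xml):
    """Return True if <nova:owner> is absent or lacks a user or project child.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(domain_xml)
    owner = root.find(".//nova:owner", NOVA_NS)
    if owner is None:
        return True
    return (owner.find("nova:user", NOVA_NS) is None
            or owner.find("nova:project", NOVA_NS) is None)


# Trimmed-down domain XML modelled on the example above (assumed shape).
SAMPLE = """<domain>
  <metadata>
    <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
      <nova:name>wm-solr</nova:name>
      <nova:owner/>
    </nova:instance>
  </metadata>
</domain>"""

print(owner_is_empty(SAMPLE))
```

Running the helper over the XML of every domain on a compute node would list the affected instances.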


Version-Release number of selected component (if applicable):

openstack-nova-compute-16.0.2-9.el7ost.noarch

lyarwood points out that https://review.openstack.org/#/c/399679/ landed upstream recently and may be relevant to this issue.

Comment 1 Lars Kellogg-Stedman 2018-02-16 13:42:46 UTC
Created attachment 1397039 [details]
output of openstack server event list

lyarwood suggests that the result of 'openstack server event list' for one of the servers might be of interest.

Comment 2 Lee Yarwood 2018-02-16 14:04:35 UTC
(In reply to Lars Kellogg-Stedman from comment #1)
> Created attachment 1397039 [details]
> output of openstack server event list
> 
> lyarwood suggests that the result of 'openstack server event list' for one
> of the servers might be of interest.

I think this could be linked to the evacuation of the instance, but I've not attempted to reproduce it. It appears we tried to use a request context with user_name, project_id and project_name all set to None, which results in LibvirtConfigGuestMetaNovaOwner.format_dom() returning <nova:owner/>:

nova/virt/libvirt/driver.py

    def _get_guest_config_meta(self, context, instance):
        """Get metadata config for guest."""
[..]
        if context is not None:
            ometa = vconfig.LibvirtConfigGuestMetaNovaOwner()
            ometa.userid = context.user_id
            ometa.username = context.user_name
            ometa.projectid = context.project_id
            ometa.projectname = context.project_name
            meta.owner = ometa

nova/virt/libvirt/config.py

class LibvirtConfigGuestMetaNovaOwner(LibvirtConfigObject):
[..]
    def format_dom(self):
        meta = super(LibvirtConfigGuestMetaNovaOwner, self).format_dom()
        if self.userid is not None and self.username is not None:
            user = self._text_node("user", self.username)
            user.set("uuid", self.userid)
            meta.append(user)
        if self.projectid is not None and self.projectname is not None:
            project = self._text_node("project", self.projectname)
            project.set("uuid", self.projectid)
            meta.append(project)
        return meta
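
The effect of those two guards can be reproduced outside nova with a small stand-in. The sketch below is only an illustration: OwnerMeta is a hypothetical class built on the standard library, not nova's LibvirtConfigGuestMetaNovaOwner, it omits the nova: namespace prefix, and the ids and names are made up.

```python
import xml.etree.ElementTree as ET


class OwnerMeta:
    """Hypothetical stand-in for LibvirtConfigGuestMetaNovaOwner (not nova code)."""

    def __init__(self, userid=None, username=None,
                 projectid=None, projectname=None):
        self.userid = userid
        self.username = username
        self.projectid = projectid
        self.projectname = projectname

    def format_dom(self):
        meta = ET.Element("owner")
        # Same guards as the excerpt: a child element is only emitted when
        # both the id and the name are set.
        if self.userid is not None and self.username is not None:
            user = ET.SubElement(meta, "user")
            user.text = self.username
            user.set("uuid", self.userid)
        if self.projectid is not None and self.projectname is not None:
            project = ET.SubElement(meta, "project")
            project.text = self.projectname
            project.set("uuid", self.projectid)
        return meta


# A context with only a user id: both guards fail, so the owner has no children.
empty = OwnerMeta(userid="some-user-id").format_dom()
# A fully populated owner gets both a user and a project child.
full = OwnerMeta("some-user-id", "alice", "some-project-id", "demo").format_dom()

print(len(empty), len(full))
```

The first element has no children and the second has two, which matches the empty <nova:owner/> seen in the affected domains.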

https://review.openstack.org/#/c/399679/ actually landed in Pike and switched nova to using the instance object to populate these fields. I'll try to backport this to Ocata and Newton for OSP to help avoid this going forward.

For this customer, the best way to work around this now is to stop and start the instances, forcing the domain XML to be recreated with the correct owner details on Pike. Can you confirm that this resolves the issue with Ceilometer? FWIW, I'd also suggest following up with the Ceilometer team so that they handle this situation.

Comment 3 Lars Kellogg-Stedman 2018-02-16 14:07:37 UTC
I will ask the customer about stopping/starting these servers. That may not be possible at this time.

On the ceilometer side, I have opened https://bugs.launchpad.net/ceilometer/+bug/1749960 upstream and submitted a fix that would make ceilometer less sensitive to this sort of issue.
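
For illustration, a tolerant lookup might look like the sketch below. It is not the patch submitted upstream and does not use ceilometer's real code: get_owner_ids and the sample document are hypothetical, and the sketch assumes the consumer wants the uuid attributes of the user and project elements.

```python
import xml.etree.ElementTree as ET

NOVA_NS = {"nova": "http://openstack.org/xmlns/libvirt/nova/1.0"}


def get_owner_ids(metadata_xml):
    """Return (user_uuid, project_uuid), with None for anything missing.

    Hypothetical helper: it checks each lookup result before reading the
    attribute, so an empty <nova:owner/> yields (None, None) instead of
    raising AttributeError.
    """
    root = ET.fromstring(metadata_xml)
    user = root.find(".//nova:owner/nova:user", NOVA_NS)
    project = root.find(".//nova:owner/nova:project", NOVA_NS)
    user_id = user.get("uuid") if user is not None else None
    project_id = project.get("uuid") if project is not None else None
    return user_id, project_id


# Metadata with an empty owner, modelled on the example in the description.
EMPTY_OWNER = """<metadata>
  <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
    <nova:name>wm-solr</nova:name>
    <nova:owner/>
  </nova:instance>
</metadata>"""

print(get_owner_ids(EMPTY_OWNER))
```

A caller can then decide whether to skip the sample or fall back to another source for the ids.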

Comment 4 Lars Kellogg-Stedman 2018-02-16 14:16:45 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1546176 is the bugzilla version of the upstream bug.

Comment 14 errata-xmlrpc 2018-05-18 17:14:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1624

