Bug 1516952 - Cannot boot vm with sriov port after upgrade OSP11 to OSP12
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 12.0 (Pike)
Hardware: x86_64 Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 12.0 (Pike)
Assigned To: Stephen Finucane
QA Contact: Eran Kuris
Keywords: Triaged
Depends On: 1507225 1516634
Blocks: 1518879
Reported: 2017-11-23 11:29 EST by Eran Kuris
Modified: 2018-02-05 14:18 EST (History)
CC List: 18 users

See Also:
Fixed In Version: openstack-nova-16.0.2-3.el7ost
Clone Of:
: 1518879 (view as bug list)
Last Closed: 2017-12-13 17:23:28 EST
Type: Bug


Attachments
comput_sos (17.06 MB, application/x-xz)
2017-11-23 11:29 EST, Eran Kuris


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Launchpad | 1735188 | None | None | None | 2017-11-29 09:56 EST
OpenStack gerrit | 523914 | None | master: NEW | nova: Fix ValueError when loading old pci device record (I5de0979e280004c1ce0acc99d69cc96089a704f8) | 2017-11-29 11:19 EST
Red Hat Product Errata | RHEA-2017:3462 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 12.0 Enhancement Advisory | 2018-02-15 20:43:25 EST

Description Eran Kuris 2017-11-23 11:29:22 EST
Created attachment 1358311
comput_sos

Description of problem:
After upgrading from OSP11 to OSP12 (with SR-IOV and composable roles), booting a VM with an SR-IOV port fails with an error.
The nova logs show this trace:
2017-11-23 12:09:36.028 1 INFO nova.service [req-af2ce51c-73fc-4ea4-9b67-0c71c80f031a - - - - -] Updating service version for nova-compute on compute-0.localdomain from 16 to 22
2017-11-23 12:09:36.284 1 WARNING nova.compute.monitors [req-af2ce51c-73fc-4ea4-9b67-0c71c80f031a - - - - -] Excluding nova.compute.monitors.cpu monitor virt_driver. Not in the list of enabled monitors (CONF.compute_monitors).
2017-11-23 12:09:36.942 1 WARNING nova.pci.utils [req-af2ce51c-73fc-4ea4-9b67-0c71c80f031a - - - - -] No net device was found for VF 0000:05:11.0: PciDeviceNotFoundById: PCI device 0000:05:11.0 not found
2017-11-23 12:09:37.479 1 ERROR nova.compute.manager [req-af2ce51c-73fc-4ea4-9b67-0c71c80f031a - - - - -] Error updating resources for node compute-0.localdomain.: ValueError: Field `uuid' cannot be None
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 123, in _object_dispatch
    return getattr(target, method)(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 184, in wrapper
    result = fn(cls, context, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/nova/objects/pci_device.py", line 458, in get_by_compute_node
    db_dev_list)
  File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 1121, in obj_make_list
    **extra_args)
  File "/usr/lib/python2.7/site-packages/nova/objects/pci_device.py", line 194, in _from_db_object
    setattr(pci_device, key, db_dev[key])
  File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 72, in setter
    field_value = field.coerce(self, name, value)
  File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/fields.py", line 193, in coerce
    return self._null(obj, attr)

  File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/fields.py", line 171, in _null
    raise ValueError(_("Field `%s' cannot be None") % attr)

ValueError: Field `uuid' cannot be None
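
The last frames of the trace can be illustrated with a minimal sketch. It assumes a simplified stand-in for a versioned-object field: the `Field` and `PciDeviceSketch` classes and the `load_from_row` helper are hypothetical and are not the oslo.versionedobjects or nova API. It only shows why copying a NULL `uuid` column onto a non-nullable field raises this error.

```python
class Field:
    """Hypothetical, simplified field type (not the oslo API)."""

    def __init__(self, nullable=False):
        self.nullable = nullable

    def coerce(self, attr, value):
        # A non-nullable field rejects None, mirroring the message in the trace.
        if value is None and not self.nullable:
            raise ValueError("Field `%s' cannot be None" % attr)
        return value


class PciDeviceSketch:
    """Hypothetical object with a non-nullable uuid field."""

    fields = {"address": Field(), "uuid": Field(nullable=False)}

    def __setattr__(self, name, value):
        # Every assignment to a declared field goes through coerce().
        if name in self.fields:
            value = self.fields[name].coerce(name, value)
        object.__setattr__(self, name, value)


def load_from_row(row):
    """Copy every declared field from a DB row, with no special cases."""
    dev = PciDeviceSketch()
    for key in dev.fields:
        setattr(dev, key, row[key])
    return dev


# A record written before the uuid column existed has uuid = NULL.
old_row = {"address": "0000:05:11.0", "uuid": None}
try:
    load_from_row(old_row)
    error = None
except ValueError as exc:
    error = str(exc)
print(error)
```

Under these assumptions, any device record created before the upgrade would trip the same check as soon as it is loaded.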


Version-Release number of selected component (if applicable):
OSP12
rpm -qa |grep nova 
python-nova-16.0.2-2.el7ost.noarch
python-novaclient-9.1.1-1.el7ost.noarch
openstack-nova-placement-api-16.0.2-2.el7ost.noarch
openstack-nova-console-16.0.2-2.el7ost.noarch
openstack-nova-scheduler-16.0.2-2.el7ost.noarch
puppet-nova-11.4.0-2.el7ost.noarch
openstack-nova-novncproxy-16.0.2-2.el7ost.noarch
openstack-nova-common-16.0.2-2.el7ost.noarch
openstack-nova-api-16.0.2-2.el7ost.noarch
openstack-nova-conductor-16.0.2-2.el7ost.noarch
[root@compute-0 ~]# rpm -qa |grep sriov
openstack-neutron-sriov-nic-agent-11.0.1-5.el7ost.noarch
[root@compute-0 ~]# rpm -qa |grep openvs
openstack-neutron-openvswitch-11.0.1-5.el7ost.noarch
openvswitch-ovn-host-2.7.2-4.git20170719.el7fdp.x86_64
openvswitch-2.7.2-4.git20170719.el7fdp.x86_64
openvswitch-ovn-common-2.7.2-4.git20170719.el7fdp.x86_64
openvswitch-ovn-central-2.7.2-4.git20170719.el7fdp.x86_64
python-openvswitch-2.7.2-4.git20170719.el7fdp.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP11 with SR-IOV and composable roles.
2. Run the upgrade to OSP12 using this guide: https://gitlab.cee.redhat.com/mcornea/OSP11-OSP12-Upgrade/blob/master/README.md
3. After the upgrade process completes, try to boot a VM with an SR-IOV port.

Actual results:
Booting the VM fails with an error; nova logs the ValueError shown in the trace above.

Expected results:


Additional info:
A VM with a normal port can be booted and works well.
The old instances from OSP11 are still working with full connectivity.
Comment 1 Eran Kuris 2017-11-23 11:38:14 EST
According to the logs and debugging with Dev, a constraint is being violated in the communication between the nova-compute manager and the nova conductor: "Field 'uuid' cannot be None".
It may turn out that neutron isn't returning some kind of payload on an existing port that is supposed to match up with something in the database, but the stack trace is specific to nova's handling of PCI resource management.

Thanks to Brent Eagles & Marius Cornea for their help.
Comment 3 Stephen Finucane 2017-11-29 08:51:38 EST
This looks like an issue with commit 15ac5b688bf6d91ac42ca33860d187d80289d82d in upstream nova, which added the UUID field to the PciDevice model (pci_devices table). This change contained an online migration to populate the field with a UUID but that clearly isn't being applied here. This could be an issue with upgrades or with the change itself. My money's on the latter.
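
For readers unfamiliar with the idea, an online migration that backfills a new column can be sketched roughly as follows. This is only an illustration using an in-memory SQLite table. The reduced `pci_devices` schema, the `backfill_uuids` helper, and the `max_count` batching parameter are assumptions made for the example and are not nova's actual migration code.

```python
import sqlite3
import uuid


def backfill_uuids(conn, max_count=50):
    """Give a UUID to up to max_count rows whose uuid is still NULL.

    Hypothetical helper; returns the number of rows it updated.
    """
    rows = conn.execute(
        "SELECT id FROM pci_devices WHERE uuid IS NULL LIMIT ?", (max_count,)
    ).fetchall()
    for (row_id,) in rows:
        conn.execute(
            "UPDATE pci_devices SET uuid = ? WHERE id = ?",
            (str(uuid.uuid4()), row_id),
        )
    conn.commit()
    return len(rows)


# Assumed, heavily reduced schema: only the columns the example needs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pci_devices (id INTEGER PRIMARY KEY, address TEXT, uuid TEXT)"
)
conn.executemany(
    "INSERT INTO pci_devices (address, uuid) VALUES (?, ?)",
    [
        ("0000:05:11.0", None),  # pre-upgrade record, no uuid yet
        ("0000:05:11.1", None),  # pre-upgrade record, no uuid yet
        ("0000:05:11.2", str(uuid.uuid4())),  # record created after the change
    ],
)

migrated = backfill_uuids(conn)
remaining = conn.execute(
    "SELECT COUNT(*) FROM pci_devices WHERE uuid IS NULL"
).fetchone()[0]
print(migrated, remaining)
```

Counting the rows where `uuid` is still NULL, as the last query does, is also one way to check whether such a backfill has run on an upgraded deployment.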
Comment 4 Lee Yarwood 2017-11-29 09:07:42 EST
(In reply to Stephen Finucane from comment #3)
> This looks like an issue with commit
> 15ac5b688bf6d91ac42ca33860d187d80289d82d in upstream nova, which added the
> UUID field to the PciDevice model (pci_devices table). This change contained
> an online migration to populate the field with a UUID but that clearly isn't
> being applied here. This could be an issue with upgrades or with the change
> itself. My money's on the latter.

Well, either way we need controller logs ASAP from the upgraded node to confirm if the migrations were run for n-api.

In addition, I'd like more details on the roles used here. We've seen issues with the use of roles shipped within infrared, so I wouldn't be surprised if that's causing an issue here.
Comment 5 Eran Kuris 2017-11-29 09:32:14 EST
(In reply to Lee Yarwood from comment #4)
> (In reply to Stephen Finucane from comment #3)
> > This looks like an issue with commit
> > 15ac5b688bf6d91ac42ca33860d187d80289d82d in upstream nova, which added the
> > UUID field to the PciDevice model (pci_devices table). This change contained
> > an online migration to populate the field with a UUID but that clearly isn't
> > being applied here. This could be an issue with upgrades or with the change
> > itself. My money's on the latter.
> 
> Well, either way, we need controller logs ASAP from the upgraded node to
> confirm if the migrations were run for n-API.

I am working on deploying a new setup to reproduce the issue.
 
> In addition, I'd like more details on the roles used here, we've seen issues
> with the use of roles shipped within infrared so I wouldn't be surprised if
> that's causing an issue here.

These are the templates that I am using; the roles are "Compute" & "Controller":

https://code.engineering.redhat.com/gerrit/gitweb?p=Neutron-QE.git;a=tree;f=BM_heat_template/ospd-11-multiple-nic-vlans-sriov-hybrid-ha;h=085c2382ab582545c193d3829b07dbcb207f196a;hb=refs/heads/master

I will let you know when I have a setup that reproduces the issue.
Comment 6 Matt Riedemann 2017-11-29 09:46:39 EST
(8:45:18 AM) mriedem: i see the problem
(8:45:26 AM) mriedem: _from_db_object isn't handling the uuid column properly
(8:45:40 AM) mriedem: https://review.openstack.org/#/c/469147/2/nova/objects/pci_device.py@194
(8:45:45 AM) mriedem: there should be a skip in there
(8:46:13 AM) mriedem: if key not in ('extra_info', 'uuid'):
(8:46:21 AM) mriedem: stephenfin: do you have a launchpad bug yet?
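
The skip suggested in the chat above can be sketched as follows. This is a simplified illustration and not the merged patch. The dict-based `from_db_row` loader, the `FIELDS` tuple, and the choice to generate a fresh UUID for old rows are assumptions made for the example. Persisting the generated value back to the database is left out.

```python
import uuid

# Hypothetical field list for the example; the real object has many more.
FIELDS = ("address", "uuid", "extra_info")


def from_db_row(row):
    """Build a device dict from a DB row, tolerating a NULL uuid column."""
    dev = {}
    for key in FIELDS:
        # 'extra_info' and 'uuid' are skipped in the generic copy loop,
        # as proposed in the chat, and handled separately.
        if key not in ("extra_info", "uuid"):
            dev[key] = row[key]

    if row["uuid"] is not None:
        dev["uuid"] = row["uuid"]
    else:
        # Assumption: records that predate the uuid column get one on load.
        dev["uuid"] = str(uuid.uuid4())
    return dev


old_dev = from_db_row(
    {"address": "0000:05:11.0", "uuid": None, "extra_info": "{}"}
)
new_dev = from_db_row(
    {
        "address": "0000:05:11.1",
        "uuid": "11111111-2222-3333-4444-555555555555",
        "extra_info": "{}",
    }
)
print(old_dev["uuid"], new_dev["uuid"])
```

With this shape, loading a pre-upgrade record no longer assigns None to the non-nullable field, and records that already carry a UUID keep it unchanged.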
Comment 7 Matt Riedemann 2017-11-29 10:53:58 EST
https://review.openstack.org/#/c/523914/
Comment 10 Eran Kuris 2017-11-30 11:27:45 EST
Fix verified: the upgrade from OSP11 to OSP12 (puddle 2017-11-29.2) passes.
Old instances worked well, as expected.
I successfully booted new instances with a normal port and with SR-IOV ports (PF and VF).

rpm -qa | grep nova 
python-novaclient-9.1.1-1.el7ost.noarch
openstack-nova-compute-16.0.2-3.el7ost.noarch
openstack-nova-scheduler-16.0.2-3.el7ost.noarch
openstack-nova-conductor-16.0.2-3.el7ost.noarch
openstack-nova-common-16.0.2-3.el7ost.noarch
python-nova-16.0.2-3.el7ost.noarch
openstack-nova-placement-api-16.0.2-3.el7ost.noarch
openstack-nova-novncproxy-16.0.2-3.el7ost.noarch
openstack-nova-migration-16.0.2-3.el7ost.noarch
openstack-nova-console-16.0.2-3.el7ost.noarch
puppet-nova-11.4.0-2.el7ost.noarch
openstack-nova-api-16.0.2-3.el7ost.noarch
Comment 13 errata-xmlrpc 2017-12-13 17:23:28 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462
