Created attachment 1358311 [details]
comput_sos

Description of problem:
After upgrading OSP11 to OSP12 (with SR-IOV & composable roles), an error occurs when trying to boot a VM with an SR-IOV port. In the nova logs I see this trace:

2017-11-23 12:09:36.028 1 INFO nova.service [req-af2ce51c-73fc-4ea4-9b67-0c71c80f031a - - - - -] Updating service version for nova-compute on compute-0.localdomain from 16 to 22
2017-11-23 12:09:36.284 1 WARNING nova.compute.monitors [req-af2ce51c-73fc-4ea4-9b67-0c71c80f031a - - - - -] Excluding nova.compute.monitors.cpu monitor virt_driver. Not in the list of enabled monitors (CONF.compute_monitors).
2017-11-23 12:09:36.942 1 WARNING nova.pci.utils [req-af2ce51c-73fc-4ea4-9b67-0c71c80f031a - - - - -] No net device was found for VF 0000:05:11.0: PciDeviceNotFoundById: PCI device 0000:05:11.0 not found
2017-11-23 12:09:37.479 1 ERROR nova.compute.manager [req-af2ce51c-73fc-4ea4-9b67-0c71c80f031a - - - - -] Error updating resources for node compute-0.localdomain.: ValueError: Field `uuid' cannot be None
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 123, in _object_dispatch
    return getattr(target, method)(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 184, in wrapper
    result = fn(cls, context, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/nova/objects/pci_device.py", line 458, in get_by_compute_node
    db_dev_list)
  File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 1121, in obj_make_list
    **extra_args)
  File "/usr/lib/python2.7/site-packages/nova/objects/pci_device.py", line 194, in _from_db_object
    setattr(pci_device, key, db_dev[key])
  File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 72, in setter
    field_value = field.coerce(self, name, value)
  File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/fields.py", line 193, in coerce
    return self._null(obj, attr)
  File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/fields.py", line 171, in _null
    raise ValueError(_("Field `%s' cannot be None") % attr)
ValueError: Field `uuid' cannot be None

Version-Release number of selected component (if applicable):
OSP12

rpm -qa | grep nova
python-nova-16.0.2-2.el7ost.noarch
python-novaclient-9.1.1-1.el7ost.noarch
openstack-nova-placement-api-16.0.2-2.el7ost.noarch
openstack-nova-console-16.0.2-2.el7ost.noarch
openstack-nova-scheduler-16.0.2-2.el7ost.noarch
puppet-nova-11.4.0-2.el7ost.noarch
openstack-nova-novncproxy-16.0.2-2.el7ost.noarch
openstack-nova-common-16.0.2-2.el7ost.noarch
openstack-nova-api-16.0.2-2.el7ost.noarch
openstack-nova-conductor-16.0.2-2.el7ost.noarch

[root@compute-0 ~]# rpm -qa | grep sriov
openstack-neutron-sriov-nic-agent-11.0.1-5.el7ost.noarch

[root@compute-0 ~]# rpm -qa | grep openvs
openstack-neutron-openvswitch-11.0.1-5.el7ost.noarch
openvswitch-ovn-host-2.7.2-4.git20170719.el7fdp.x86_64
openvswitch-2.7.2-4.git20170719.el7fdp.x86_64
openvswitch-ovn-common-2.7.2-4.git20170719.el7fdp.x86_64
openvswitch-ovn-central-2.7.2-4.git20170719.el7fdp.x86_64
python-openvswitch-2.7.2-4.git20170719.el7fdp.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP-11 SR-IOV with composable roles.
2. Run the upgrade to OSP12 using this guide: https://gitlab.cee.redhat.com/mcornea/OSP11-OSP12-Upgrade/blob/master/README.md
3. After the upgrade process completes, try to boot a VM with an SR-IOV port.

Actual results:
Booting the VM fails with the error above.

Expected results:
The VM boots successfully with the SR-IOV port.

Additional info:
A VM with a normal port can be booted and works well. The old instances from OSP11 are still working with full connectivity.
According to the log, and after debugging with Dev, some kind of constraint is being violated ("Field `uuid' cannot be None") in the communication between the nova-compute manager and the nova conductor. It may turn out that neutron isn't returning some payload on an existing port that is supposed to match up with something in the database, but the stack trace is specific to nova's handling of PCI resource management.

Thanks to Brent Eagles & Marius Cornea for the help.
This looks like an issue with commit 15ac5b688bf6d91ac42ca33860d187d80289d82d in upstream nova, which added the UUID field to the PciDevice model (pci_devices table). This change contained an online migration to populate the field with a UUID but that clearly isn't being applied here. This could be an issue with upgrades or with the change itself. My money's on the latter.
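To make the suspected failure mode concrete, here is a minimal standalone sketch of what an online data migration like the one in that commit is expected to do: backfill a newly added `uuid` column for rows that are still NULL. The function name and the dict-based row format are illustrative assumptions, not nova's actual code (the real migration lives in nova and operates through the DB API).

```python
# Illustrative sketch only: backfill a NULL `uuid` column, returning
# (found, done) counts in the style of nova's online migration helpers.
# Row format and function name are hypothetical, not nova's real code.
import uuid

def backfill_uuids(rows, max_count):
    """Assign a UUID to up to `max_count` rows whose 'uuid' is None."""
    found = done = 0
    for row in rows:
        if row.get('uuid') is None:
            found += 1
            if done < max_count:
                row['uuid'] = str(uuid.uuid4())
                done += 1
    return found, done

devices = [{'address': '0000:05:11.0', 'uuid': None},
           {'address': '0000:05:11.1', 'uuid': None}]
found, done = backfill_uuids(devices, max_count=50)
```

If this migration never ran (or ran before the compute rows existed), any row with a NULL `uuid` would later fail field coercion exactly as seen in the traceback above.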
(In reply to Stephen Finucane from comment #3)
> This looks like an issue with commit
> 15ac5b688bf6d91ac42ca33860d187d80289d82d in upstream nova, which added the
> UUID field to the PciDevice model (pci_devices table). This change contained
> an online migration to populate the field with a UUID but that clearly isn't
> being applied here. This could be an issue with upgrades or with the change
> itself. My money's on the latter.

Well, either way, we need controller logs ASAP from the upgraded node to confirm whether the migrations were run for n-api. In addition, I'd like more details on the roles used here; we've seen issues with the use of roles shipped within infrared, so I wouldn't be surprised if that's causing an issue here.
(In reply to Lee Yarwood from comment #4)
> (In reply to Stephen Finucane from comment #3)
> > This looks like an issue with commit
> > 15ac5b688bf6d91ac42ca33860d187d80289d82d in upstream nova, which added the
> > UUID field to the PciDevice model (pci_devices table). This change contained
> > an online migration to populate the field with a UUID but that clearly isn't
> > being applied here. This could be an issue with upgrades or with the change
> > itself. My money's on the latter.
>
> Well, either way, we need controller logs ASAP from the upgraded node to
> confirm if the migrations were run for n-api.

I am working on deploying a new setup to reproduce the issue.

> In addition, I'd like more details on the roles used here, we've seen issues
> with the use of roles shipped within infrared so I wouldn't be surprised if
> that's causing an issue here.

These are the template files that I am using; the roles are "Compute" & "Controller":
https://code.engineering.redhat.com/gerrit/gitweb?p=Neutron-QE.git;a=tree;f=BM_heat_template/ospd-11-multiple-nic-vlans-sriov-hybrid-ha;h=085c2382ab582545c193d3829b07dbcb207f196a;hb=refs/heads/master

I will let you know when I have a setup with a reproduction.
(8:45:18 AM) mriedem: i see the problem
(8:45:26 AM) mriedem: _from_db_object isn't handling the uuid column properly
(8:45:40 AM) mriedem: https://review.openstack.org/#/c/469147/2/nova/objects/pci_device.py@194
(8:45:45 AM) mriedem: there should be a skip in there
(8:46:13 AM) mriedem: if key not in ('extra_info', 'uuid'):
(8:46:21 AM) mriedem: stephenfin: do you have a launchpad bug yet?
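The skip mriedem suggests can be modelled with a standalone toy class. This is an illustration of the pattern only (the class, field list, and row format below are simplified stand-ins, not nova's actual PciDevice): when hydrating the object from a DB row, columns that need special handling are excluded from the generic setattr loop, so a NULL `uuid` from a not-yet-migrated row never reaches field coercion.

```python
# Toy stand-in for nova's PciDevice, not nova code. It demonstrates the
# pattern from the IRC log: skip 'extra_info' and 'uuid' in the generic
# loop, then handle them explicitly, tolerating a NULL uuid.
class PciDevice:
    fields = ('address', 'status', 'extra_info', 'uuid')

    @classmethod
    def _from_db_object(cls, db_dev):
        dev = cls()
        for key in cls.fields:
            # Before the fix only 'extra_info' was skipped; a NULL 'uuid'
            # then hit field coercion and raised
            # "ValueError: Field `uuid' cannot be None".
            if key not in ('extra_info', 'uuid'):
                setattr(dev, key, db_dev[key])
        # Handle the skipped columns explicitly.
        dev.extra_info = db_dev.get('extra_info') or {}
        if db_dev.get('uuid') is not None:
            dev.uuid = db_dev['uuid']
        return dev

row = {'address': '0000:05:11.0', 'status': 'available',
       'extra_info': None, 'uuid': None}
dev = PciDevice._from_db_object(row)
```

With the skip in place, a row whose `uuid` hasn't been backfilled yet hydrates cleanly instead of blowing up the resource-update loop on the compute node.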
https://review.openstack.org/#/c/523914/
Fix verified during upgrade from OSP11 to OSP12, puddle 2017-11-29.2: pass. Old instances worked well, as expected. I was able to boot a new instance with a normal port and with SR-IOV ports (PF & VF).

rpm -qa | grep nova
python-novaclient-9.1.1-1.el7ost.noarch
openstack-nova-compute-16.0.2-3.el7ost.noarch
openstack-nova-scheduler-16.0.2-3.el7ost.noarch
openstack-nova-conductor-16.0.2-3.el7ost.noarch
openstack-nova-common-16.0.2-3.el7ost.noarch
python-nova-16.0.2-3.el7ost.noarch
openstack-nova-placement-api-16.0.2-3.el7ost.noarch
openstack-nova-novncproxy-16.0.2-3.el7ost.noarch
openstack-nova-migration-16.0.2-3.el7ost.noarch
openstack-nova-console-16.0.2-3.el7ost.noarch
puppet-nova-11.4.0-2.el7ost.noarch
openstack-nova-api-16.0.2-3.el7ost.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462