Bug 1516952
Summary: | Cannot boot vm with sriov port after upgrade OSP11 to OSP12
---|---
Product: | Red Hat OpenStack
Reporter: | Eran Kuris <ekuris>
Component: | openstack-nova
Assignee: | Stephen Finucane <stephenfin>
Status: | CLOSED ERRATA
QA Contact: | Eran Kuris <ekuris>
Severity: | urgent
Priority: | urgent
Version: | 12.0 (Pike)
CC: | berrange, dasmith, eglynn, ekuris, jlibosva, jschluet, kchamart, lyarwood, mcornea, mriedem, oblaut, sbauza, sferdjao, sgordon, srevivo, stephenfin, vromanso
Target Milestone: | rc
Keywords: | Triaged
Target Release: | 12.0 (Pike)
Hardware: | x86_64
OS: | Linux
Fixed In Version: | openstack-nova-16.0.2-3.el7ost
Last Closed: | 2017-12-13 22:23:28 UTC
Type: | Bug
Bug Depends On: | 1507225, 1516634
Bug Blocks: | 1518879
Description
Eran Kuris
2017-11-23 16:29:22 UTC
According to the logs and debugging with Dev, a constraint is being violated in the communication between the nova-compute manager and the nova conductor: "Field 'uuid' cannot be None". It may turn out that neutron isn't returning some payload on an existing port that is supposed to match something in the database, but the stack trace is specific to nova's handling of PCI resource management.

Thanks to Brent Eagles & Marius Cornea for their help.

This looks like an issue with commit 15ac5b688bf6d91ac42ca33860d187d80289d82d in upstream nova, which added the UUID field to the PciDevice model (pci_devices table). This change contained an online migration to populate the field with a UUID, but that clearly isn't being applied here. This could be an issue with upgrades or with the change itself. My money's on the latter.

(In reply to Stephen Finucane from comment #3)
> This looks like an issue with commit
> 15ac5b688bf6d91ac42ca33860d187d80289d82d in upstream nova, which added the
> UUID field to the PciDevice model (pci_devices table). This change contained
> an online migration to populate the field with a UUID but that clearly isn't
> being applied here. This could be an issue with upgrades or with the change
> itself. My money's on the latter.

Well, either way we need controller logs ASAP from the upgraded node to confirm whether the migrations were run for n-api. In addition, I'd like more details on the roles used here; we've seen issues with the use of roles shipped within infrared, so I wouldn't be surprised if that's causing an issue here.

(In reply to Lee Yarwood from comment #4)
> Well, either way we need controller logs ASAP from the upgraded node to
> confirm if the migrations were run for n-api.

I am working on deploying a new setup to reproduce the issue.

> In addition I'd like more details on the roles used here, we've seen issues
> with the use of roles shipped within infrared so I wouldn't be surprised if
> that's causing an issue here.

These are the templates I am using; the roles I am using are "Compute" and "Controller":
https://code.engineering.redhat.com/gerrit/gitweb?p=Neutron-QE.git;a=tree;f=BM_heat_template/ospd-11-multiple-nic-vlans-sriov-hybrid-ha;h=085c2382ab582545c193d3829b07dbcb207f196a;hb=refs/heads/master

I will let you know when I have a setup with the reproduction.

(8:45:18 AM) mriedem: i see the problem
(8:45:26 AM) mriedem: _from_db_object isn't handling the uuid column properly
(8:45:40 AM) mriedem: https://review.openstack.org/#/c/469147/2/nova/objects/pci_device.py@194
(8:45:45 AM) mriedem: there should be a skip in there
(8:46:13 AM) mriedem: if key not in ('extra_info', 'uuid'):
(8:46:21 AM) mriedem: stephenfin: do you have a launchpad bug yet?

Fix verified during upgrade from OSP11 to OSP12, puddle 2017-11-29.2: pass.
Old instances worked as expected.
I succeeded in booting a new instance with a normal port and with SR-IOV ports (PF and VF).

rpm -qa | grep nova
python-novaclient-9.1.1-1.el7ost.noarch
openstack-nova-compute-16.0.2-3.el7ost.noarch
openstack-nova-scheduler-16.0.2-3.el7ost.noarch
openstack-nova-conductor-16.0.2-3.el7ost.noarch
openstack-nova-common-16.0.2-3.el7ost.noarch
python-nova-16.0.2-3.el7ost.noarch
openstack-nova-placement-api-16.0.2-3.el7ost.noarch
openstack-nova-novncproxy-16.0.2-3.el7ost.noarch
openstack-nova-migration-16.0.2-3.el7ost.noarch
openstack-nova-console-16.0.2-3.el7ost.noarch
puppet-nova-11.4.0-2.el7ost.noarch
openstack-nova-api-16.0.2-3.el7ost.noarch

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2017:3462
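
For readers following the IRC excerpt about `_from_db_object`, the failure mode and the suggested skip can be illustrated with a small standalone sketch. This is not nova's code: `PciDeviceSketch`, `from_db_row_buggy`, and `from_db_row_fixed` are hypothetical names, the non-nullable check is a simplified stand-in for the object's field validation, and generating a fresh UUID for a legacy row at load time is an assumption about how such rows could be handled.

```python
import uuid


class PciDeviceSketch:
    """Hypothetical stand-in for an object with a non-nullable 'uuid' field."""

    fields = ("id", "address", "uuid")

    def __setattr__(self, name, value):
        # Mimic a non-nullable field: assigning None is rejected.
        if name == "uuid" and value is None:
            raise ValueError("Field 'uuid' cannot be None")
        super().__setattr__(name, value)


def from_db_row_buggy(row):
    """Copy every column blindly; fails on rows created before the uuid column."""
    dev = PciDeviceSketch()
    for key in dev.fields:
        setattr(dev, key, row[key])
    return dev


def from_db_row_fixed(row):
    """Skip a missing uuid while copying, then fill it in afterwards."""
    dev = PciDeviceSketch()
    for key in dev.fields:
        if key == "uuid" and row["uuid"] is None:
            continue  # legacy row: nothing valid to copy
        setattr(dev, key, row[key])
    if row["uuid"] is None:
        # Assumption: legacy rows receive a newly generated UUID on load.
        dev.uuid = str(uuid.uuid4())
    return dev


# A row as it might look after the upgrade if the online migration never ran.
legacy_row = {"id": 1, "address": "0000:05:00.1", "uuid": None}
device = from_db_row_fixed(legacy_row)
```

With the buggy loader, the same `legacy_row` raises the "Field 'uuid' cannot be None" error quoted in the description. The fixed loader tolerates rows that the online migration has not populated yet.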