Bug 1809212 - [OSP15->16] Instance fails to create after Controllers upgrade.
Summary: [OSP15->16] Instance fails to create after Controllers upgrade.
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 16.0 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Rajesh Tailor
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-03-02 15:29 UTC by Jose Luis Franco
Modified: 2023-03-21 19:28 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-20 11:34:27 UTC
Target Upstream Version:
Embargoed:


Attachments:

Description Jose Luis Franco 2020-03-02 15:29:30 UTC
Description of problem:

After upgrading the controllers from OSP15 to OSP16, the upgrade jobs run a workload test that creates a VM. In this run, the instance failed to create:

(qe-Cloud-0) [stack@undercloud-0 ~]$ openstack server list
+--------------------------------------+---------------------+--------+----------+-----------------------------+--------+
| ID                                   | Name                | Status | Networks | Image                       | Flavor |
+--------------------------------------+---------------------+--------+----------+-----------------------------+--------+
| 507efff8-911a-4a61-b816-27c1946544f7 | instance_4e8ddddea4 | ERROR  |          | upgrade_workload_4e8ddddea4 |        |
+--------------------------------------+---------------------+--------+----------+-----------------------------+--------+
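
The job output does not include the fault recorded for the instance. If the environment is still up, the fault message could be pulled with something along these lines (not run as part of this report; the ID is the one from the listing above):

    openstack server show 507efff8-911a-4a61-b816-27c1946544f7 -c status -c fault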

When checking the compute node logs, the following keeps repeating in nova-compute.log:

    2020-03-02 13:50:05.353 8 DEBUG oslo_service.periodic_task [req-3dc94050-2da0-40c5-9f68-9c3a78252418 - - - - -] Running periodic task ComputeManager._instance_usage_audit run_periodic_tasks /usr/lib/python3.6/site-packages/oslo_service/periodic_task.py:217
    2020-03-02 13:50:11.343 8 DEBUG oslo_service.periodic_task [req-3dc94050-2da0-40c5-9f68-9c3a78252418 - - - - -] Running periodic task ComputeManager._heal_instance_info_cache run_periodic_tasks /usr/lib/python3.6/site-packages/oslo_service/periodic_task.py:217
    2020-03-02 13:50:11.343 8 DEBUG nova.compute.manager [req-3dc94050-2da0-40c5-9f68-9c3a78252418 - - - - -] Starting heal instance info cache _heal_instance_info_cache /usr/lib/python3.6/site-packages/nova/compute/manager.py:7520
    2020-03-02 13:50:11.344 8 DEBUG nova.compute.manager [req-3dc94050-2da0-40c5-9f68-9c3a78252418 - - - - -] Rebuilding the list of instances to heal _heal_instance_info_cache /usr/lib/python3.6/site-packages/nova/compute/manager.py:7524
    2020-03-02 13:50:11.359 8 DEBUG nova.compute.manager [req-3dc94050-2da0-40c5-9f68-9c3a78252418 - - - - -] Didn't find any instances for network info cache update. _heal_instance_info_cache /usr/lib/python3.6/site-packages/nova/compute/manager.py:7606
    2020-03-02 13:50:12.341 8 DEBUG oslo_service.periodic_task [req-3dc94050-2da0-40c5-9f68-9c3a78252418 - - - - -] Running periodic task ComputeManager._sync_scheduler_instance_info run_periodic_tasks /usr/lib/python3.6/site-packages/oslo_service/periodic_task.py:217
    2020-03-02 13:50:23.331 8 DEBUG oslo_service.periodic_task [req-3dc94050-2da0-40c5-9f68-9c3a78252418 - - - - -] Running periodic task ComputeManager.update_available_resource run_periodic_tasks /usr/lib/python3.6/site-packages/oslo_service/periodic_task.py:217
    2020-03-02 13:50:23.359 8 DEBUG nova.compute.resource_tracker [req-3dc94050-2da0-40c5-9f68-9c3a78252418 - - - - -] Auditing locally available compute resources for compute-0.redhat.local (node: compute-0.redhat.local) update_available_resource /usr/lib/python3.6/site-packages/nova/compute/resource_tracker.py:731
    2020-03-02 13:50:23.359 8 DEBUG oslo_concurrency.processutils [req-3dc94050-2da0-40c5-9f68-9c3a78252418 - - - - -] Running cmd (subprocess): ceph df --format=json --id openstack --conf /etc/ceph/ceph.conf execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:372
    2020-03-02 13:50:23.791 8 DEBUG oslo_concurrency.processutils [req-3dc94050-2da0-40c5-9f68-9c3a78252418 - - - - -] CMD "ceph df --format=json --id openstack --conf /etc/ceph/ceph.conf" returned: 0 in 0.432s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:409
    2020-03-02 13:50:23.831 8 DEBUG nova.compute.resource_tracker [req-3dc94050-2da0-40c5-9f68-9c3a78252418 - - - - -] Hypervisor/Node resource view: name=compute-0.redhat.local free_ram=5796MB free_disk=150.0GB free_vcpus=4 pci_devices=[{"dev_id": "pci_0000_00_06_0", "address": "0000:00:06.0", "product_id": "2934", "vendor_id": "8086", "numa_node": null, "label": "label_8086_2934", "dev_type": "type-PCI"}, {"dev_id": "pci_0000_00_00_0", "address": "0000:00:00.0", "product_id": "1237", "vendor_id": "8086", "numa_node": null, "label": "label_8086_1237", "dev_type": "type-PCI"}, {"dev_id": "pci_0000_00_01_0", "address": "0000:00:01.0", "product_id": "7000", "vendor_id": "8086", "numa_node": null, "label": "label_8086_7000", "dev_type": "type-PCI"}, {"dev_id": "pci_0000_00_01_1", "address": "0000:00:01.1", "product_id": "7010", "vendor_id": "8086", "numa_node": null, "label": "label_8086_7010", "dev_type": "type-PCI"}, {"dev_id": "pci_0000_00_03_0", "address": "0000:00:03.0", "product_id": "1000", "vendor_id": "1af4", "numa_node": null, "label": "label_1af4_1000", "dev_type": "type-PCI"}, {"dev_id": "pci_0000_00_09_0", "address": "0000:00:09.0", "product_id": "1002", "vendor_id": "1af4", "numa_node": null, "label": "label_1af4_1002", "dev_type": "type-PCI"}, {"dev_id": "pci_0000_00_0a_0", "address": "0000:00:0a.0", "product_id": "1005", "vendor_id": "1af4", "numa_node": null, "label": "label_1af4_1005", "dev_type": "type-PCI"}, {"dev_id": "pci_0000_00_02_0", "address": "0000:00:02.0", "product_id": "00b8", "vendor_id": "1013", "numa_node": null, "label": "label_1013_00b8", "dev_type": "type-PCI"}, {"dev_id": "pci_0000_00_08_0", "address": "0000:00:08.0", "product_id": "1001", "vendor_id": "1af4", "numa_node": null, "label": "label_1af4_1001", "dev_type": "type-PCI"}, {"dev_id": "pci_0000_00_06_2", "address": "0000:00:06.2", "product_id": "2936", "vendor_id": "8086", "numa_node": null, "label": "label_8086_2936", "dev_type": "type-PCI"}, {"dev_id": "pci_0000_00_06_1", "address": "0000:00:06.1", "product_id": "2935", "vendor_id": "8086", "numa_node": null, "label": "label_8086_2935", "dev_type": "type-PCI"}, {"dev_id": "pci_0000_00_01_3", "address": "0000:00:01.3", "product_id": "7113", "vendor_id": "8086", "numa_node": null, "label": "label_8086_7113", "dev_type": "type-PCI"}, {"dev_id": "pci_0000_00_04_0", "address": "0000:00:04.0", "product_id": "1000", "vendor_id": "1af4", "numa_node": null, "label": "label_1af4_1000", "dev_type": "type-PCI"}, {"dev_id": "pci_0000_00_05_0", "address": "0000:00:05.0", "product_id": "1000", "vendor_id": "1af4", "numa_node": null, "label": "label_1af4_1000", "dev_type": "type-PCI"}, {"dev_id": "pci_0000_00_06_7", "address": "0000:00:06.7", "product_id": "293a", "vendor_id": "8086", "numa_node": null, "label": "label_8086_293a", "dev_type": "type-PCI"}, {"dev_id": "pci_0000_00_07_0", "address": "0000:00:07.0", "product_id": "1003", "vendor_id": "1af4", "numa_node": null, "label": "label_1af4_1003", "dev_type": "type-PCI"}] _report_hypervisor_resource_view /usr/lib/python3.6/site-packages/nova/compute/resource_tracker.py:875
    2020-03-02 13:50:23.832 8 DEBUG oslo_concurrency.lockutils [req-3dc94050-2da0-40c5-9f68-9c3a78252418 - - - - -] Lock "compute_resources" acquired by "nova.compute.resource_tracker.ResourceTracker._update_available_resource" :: waited 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:327
    2020-03-02 13:50:24.225 8 ERROR nova.compute.resource_tracker [req-3dc94050-2da0-40c5-9f68-9c3a78252418 - - - - -] Skipping removal of allocations for deleted instances: The request you have made requires authentication. (HTTP 401) (Request-ID: req-2bf5cd56-b911-450f-a8ef-3e8846ea00bf): keystoneauth1.exceptions.http.Unauthorized: The request you have made requires authentication. (HTTP 401) (Request-ID: req-2bf5cd56-b911-450f-a8ef-3e8846ea00bf)
    2020-03-02 13:50:24.225 8 DEBUG nova.compute.resource_tracker [req-3dc94050-2da0-40c5-9f68-9c3a78252418 - - - - -] Total usable vcpus: 4, total allocated vcpus: 0 _report_final_resource_view /usr/lib/python3.6/site-packages/nova/compute/resource_tracker.py:891
    2020-03-02 13:50:24.226 8 DEBUG nova.compute.resource_tracker [req-3dc94050-2da0-40c5-9f68-9c3a78252418 - - - - -] Final resource view: name=compute-0.redhat.local phys_ram=5796MB used_ram=4096MB phys_disk=150GB used_disk=0GB total_vcpus=4 used_vcpus=0 pci_stats=[] _report_final_resource_view /usr/lib/python3.6/site-packages/nova/compute/resource_tracker.py:913
    2020-03-02 13:50:24.931 8 DEBUG oslo_concurrency.lockutils [req-3dc94050-2da0-40c5-9f68-9c3a78252418 - - - - -] Lock "compute_resources" released by "nova.compute.resource_tracker.ResourceTracker._update_available_resource" :: held 1.099s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:339
    2020-03-02 13:50:24.931 8 ERROR nova.compute.manager [req-3dc94050-2da0-40c5-9f68-9c3a78252418 - - - - -] Error updating resources for node compute-0.redhat.local.: nova.exception.ResourceProviderCreationFailed: Failed to create resource provider compute-0.redhat.local
    2020-03-02 13:50:24.931 8 ERROR nova.compute.manager Traceback (most recent call last):
    2020-03-02 13:50:24.931 8 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 8256, in _update_available_resource_for_node
    2020-03-02 13:50:24.931 8 ERROR nova.compute.manager     startup=startup)
    2020-03-02 13:50:24.931 8 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/nova/compute/resource_tracker.py", line 748, in update_available_resource
    2020-03-02 13:50:24.931 8 ERROR nova.compute.manager     self._update_available_resource(context, resources, startup=startup)
    2020-03-02 13:50:24.931 8 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 328, in inner
    2020-03-02 13:50:24.931 8 ERROR nova.compute.manager     return f(*args, **kwargs)
    2020-03-02 13:50:24.931 8 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/nova/compute/resource_tracker.py", line 829, in _update_available_resource
    2020-03-02 13:50:24.931 8 ERROR nova.compute.manager     self._update(context, cn, startup=startup)
    2020-03-02 13:50:24.931 8 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/nova/compute/resource_tracker.py", line 1036, in _update
    2020-03-02 13:50:24.931 8 ERROR nova.compute.manager     self._update_to_placement(context, compute_node, startup)
    2020-03-02 13:50:24.931 8 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/retrying.py", line 68, in wrapped_f
    2020-03-02 13:50:24.931 8 ERROR nova.compute.manager     return Retrying(*dargs, **dkw).call(f, *args, **kw)
    2020-03-02 13:50:24.931 8 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/retrying.py", line 223, in call
    2020-03-02 13:50:24.931 8 ERROR nova.compute.manager     return attempt.get(self._wrap_exception)
    2020-03-02 13:50:24.931 8 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/retrying.py", line 261, in get
    2020-03-02 13:50:24.931 8 ERROR nova.compute.manager     six.reraise(self.value[0], self.value[1], self.value[2])
    2020-03-02 13:50:24.931 8 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
    2020-03-02 13:50:24.931 8 ERROR nova.compute.manager     raise value
    2020-03-02 13:50:24.931 8 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/retrying.py", line 217, in call
    2020-03-02 13:50:24.931 8 ERROR nova.compute.manager     attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
    2020-03-02 13:50:24.931 8 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/nova/compute/resource_tracker.py", line 962, in _update_to_placement
    2020-03-02 13:50:24.931 8 ERROR nova.compute.manager     context, compute_node.uuid, name=compute_node.hypervisor_hostname)
    2020-03-02 13:50:24.931 8 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/nova/scheduler/client/report.py", line 873, in get_provider_tree_and_ensure_root
    2020-03-02 13:50:24.931 8 ERROR nova.compute.manager     parent_provider_uuid=parent_provider_uuid)
    2020-03-02 13:50:24.931 8 ERROR nova.compute.manager   File "/usr/lib/python3.6/site-packages/nova/scheduler/client/report.py", line 667, in _ensure_resource_provider
    2020-03-02 13:50:24.931 8 ERROR nova.compute.manager     name=name or uuid)
    2020-03-02 13:50:24.931 8 ERROR nova.compute.manager nova.exception.ResourceProviderCreationFailed: Failed to create resource provider compute-0.redhat.local
    2020-03-02 13:50:24.931 8 ERROR nova.compute.manager


An environment with the issue reproduced is available, in case it is useful.
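
The HTTP 401 above is raised by keystoneauth1 while nova-compute tries to (re)create its resource provider in Placement, which points at the service credentials nova-compute presents to Keystone. A rough manual check from the compute node, assuming the default OSP 16 containerized layout (container name nova_compute; none of the values below come from this report):

    # Look at the credentials nova-compute is configured with
    sudo podman exec nova_compute grep -A 10 '^\[placement\]' /etc/nova/nova.conf

    # Request a token with those same credentials; a 401 here would confirm
    # that Keystone rejects them after the controller upgrade
    openstack --os-auth-url <auth_url from the section above> \
              --os-username <username from the section above> \
              --os-password <password from the section above> \
              --os-project-name <project_name from the section above> \
              --os-user-domain-name <user_domain_name from the section above> \
              --os-project-domain-name <project_domain_name from the section above> \
              token issue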



Version-Release number of selected component (if applicable):

[root@compute-0 heat-admin]# sudo rpm -qa | grep nova
python3-novaclient-13.0.2-0.20191118050438.c7e793c.el8ost.noarch
puppet-nova-14.4.1-0.20191213020436.334f4be.el8ost.noarch
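
The host RPM list above only shows the client and puppet packages; on OSP 16 the nova-compute service itself runs in a container, so its exact version could be checked with something like the following (container name assumed to be the default nova_compute):

    sudo podman ps --format '{{.Names}} {{.Image}}' | grep nova
    sudo podman exec nova_compute rpm -qa | grep openstack-nova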


How reproducible:


Steps to Reproduce:
1. Run the CI job: https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/upgrades/view/upgrade/job/DFG-upgrades-upgrade-upgrade-15-16_director-rhel-virthost-3cont_2comp_3ceph-ipv4-geneve-HA/ 
2.
3.

Actual results:


Expected results:


Additional info:

