Bug 1851239 - [13->16.1] Instances post upgrade shutdown on "nova-compute attempted direct database access which is not allowed by policy"
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z3
Target Release: ---
Assignee: Ollie Walsh
QA Contact: David Rosenfeld
URL:
Whiteboard:
Depends On: 1849235
Blocks:
Reported: 2020-06-25 21:07 UTC by Archit Modi
Modified: 2020-11-20 14:48 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-20 14:48:10 UTC
Target Upstream Version:
Embargoed:


Links
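+------------------+---------+---------+----------+--------+-------------------------+-------------------------+
| System           | ID      | Private | Priority | Status | Summary                 | Last Updated            |
+------------------+---------+---------+----------+--------+-------------------------+-------------------------+
| Launchpad        | 1871482 | 0       | None     | None   | None                    | 2020-06-25 21:07:56 UTC |
| OpenStack gerrit | 718552  | 0       | None     | MERGED | Refactor nova db config | 2021-01-04 09:00:29 UTC |
+------------------+---------+---------+----------+--------+-------------------------+-------------------------+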

Description Archit Modi 2020-06-25 21:07:56 UTC
Description of problem: After a successful 13->16.1 upgrade, instances that were spun up before or during the upgrade (on RHOS-13) are shut down with the following error:
nova.exception.DBNotAllowed: nova-compute attempted direct database access which is not allowed by policy

How reproducible: always

Steps to Reproduce:
1. Deploy RHOS-13 and start the upgrade process to 16.1
2. During the upgrade, spin up an instance and live-migrate it to another compute host
3. After the FFWD2 upgrade completes successfully, the instance is shown as ACTIVE but its power state is Shutdown (task state powering-off)

Actual results:
# spin up an instance
+--------------------------------------+-----------+--------+------------+-------------+------------------------------------+------------------------------+--------------------------------------+-------------+--------------------------------------+-------------------+------------------------+------------+
| ID                                   | Name      | Status | Task State | Power State | Networks                           | Image Name                   | Image ID                             | Flavor Name | Flavor ID                            | Availability Zone | Host                   | Properties |
+--------------------------------------+-----------+--------+------------+-------------+------------------------------------+------------------------------+--------------------------------------+-------------+--------------------------------------+-------------------+------------------------+------------+
| 6f9d6d6f-6d32-4350-a217-896ea440fcec | test-7370 | ACTIVE | None       | Running     | private=192.168.100.13, 10.0.0.216 | cirros-0.4.0-x86_64-disk.img | 7f21c88a-daad-4e17-890c-b5ab5fdc07e7 | m1.tiny     | f4c46705-fff1-45b3-ab78-248bb6951e7f | nova              | compute-1.redhat.local |            |
+--------------------------------------+-----------+--------+------------+-------------+------------------------------------+------------------------------+--------------------------------------+-------------+--------------------------------------+-------------------+------------------------+------------+
(qe-Cloud-0) [stack@undercloud-0 ~]$ openstack server migrate test-7370 --block-migration --live-migration
(qe-Cloud-0) [stack@undercloud-0 ~]$ openstack server list --long
+--------------------------------------+-----------+-----------+------------+-------------+------------------------------------+------------------------------+--------------------------------------+-------------+--------------------------------------+-------------------+------------------------+------------+
| ID                                   | Name      | Status    | Task State | Power State | Networks                           | Image Name                   | Image ID                             | Flavor Name | Flavor ID                            | Availability Zone | Host                   | Properties |
+--------------------------------------+-----------+-----------+------------+-------------+------------------------------------+------------------------------+--------------------------------------+-------------+--------------------------------------+-------------------+------------------------+------------+
| 6f9d6d6f-6d32-4350-a217-896ea440fcec | test-7370 | MIGRATING | migrating  | Running     | private=192.168.100.13, 10.0.0.216 | cirros-0.4.0-x86_64-disk.img | 7f21c88a-daad-4e17-890c-b5ab5fdc07e7 | m1.tiny     | f4c46705-fff1-45b3-ab78-248bb6951e7f | nova              | compute-1.redhat.local |            |
+--------------------------------------+-----------+-----------+------------+-------------+------------------------------------+------------------------------+--------------------------------------+-------------+--------------------------------------+-------------------+------------------------+------------+
(qe-Cloud-0) [stack@undercloud-0 ~]$ openstack server list --long
+--------------------------------------+-----------+--------+------------+-------------+------------------------------------+------------------------------+--------------------------------------+-------------+--------------------------------------+-------------------+------------------------+------------+
| ID                                   | Name      | Status | Task State | Power State | Networks                           | Image Name                   | Image ID                             | Flavor Name | Flavor ID                            | Availability Zone | Host                   | Properties |
+--------------------------------------+-----------+--------+------------+-------------+------------------------------------+------------------------------+--------------------------------------+-------------+--------------------------------------+-------------------+------------------------+------------+
| 6f9d6d6f-6d32-4350-a217-896ea440fcec | test-7370 | ACTIVE | None       | Running     | private=192.168.100.13, 10.0.0.216 | cirros-0.4.0-x86_64-disk.img | 7f21c88a-daad-4e17-890c-b5ab5fdc07e7 | m1.tiny     | f4c46705-fff1-45b3-ab78-248bb6951e7f | nova              | compute-0.redhat.local |            |
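+--------------------------------------+-----------+--------+------------+-------------+------------------------------------+------------------------------+--------------------------------------+-------------+--------------------------------------+-------------------+------------------------+------------+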
# post ffwd2 upgrade
(qe-Cloud-0) [stack@undercloud-0 ~]$ openstack server list --long
+--------------------------------------+-----------+--------+--------------+-------------+------------------------------------+------------------------------+--------------------------------------+-------------+--------------------------------------+-------------------+------------------------+------------+
| ID                                   | Name      | Status | Task State   | Power State | Networks                           | Image Name                   | Image ID                             | Flavor Name | Flavor ID                            | Availability Zone | Host                   | Properties |
+--------------------------------------+-----------+--------+--------------+-------------+------------------------------------+------------------------------+--------------------------------------+-------------+--------------------------------------+-------------------+------------------------+------------+
| 6f9d6d6f-6d32-4350-a217-896ea440fcec | test-7370 | ACTIVE | powering-off | Shutdown    | private=192.168.100.13, 10.0.0.216 | cirros-0.4.0-x86_64-disk.img | 7f21c88a-daad-4e17-890c-b5ab5fdc07e7 | m1.tiny     | f4c46705-fff1-45b3-ab78-248bb6951e7f | nova              | compute-0.redhat.local |            |
+--------------------------------------+-----------+--------+--------------+-------------+------------------------------------+------------------------------+--------------------------------------+-------------+--------------------------------------+-------------------+------------------------+------------+
Expected results:
Instances remain running (Power State: Running) after the upgrade.
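
The mismatch in the last listing (Status ACTIVE while Power State is Shutdown) can be checked for mechanically. Below is a minimal Python sketch, assuming the table text printed by `openstack server list --long` has been captured as a string. The function names `parse_server_table` and `find_power_mismatches` and the second sample row are hypothetical, for illustration only.

```python
def parse_server_table(text):
    """Parse an ASCII table (as printed by the openstack CLI) into a list of dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, shell prompts, and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first piped line holds the column names
        elif cells != header:
            rows.append(dict(zip(header, cells)))
    return rows


def find_power_mismatches(rows):
    """Return rows that report Status ACTIVE but a Power State other than Running."""
    return [
        row for row in rows
        if row.get("Status") == "ACTIVE" and row.get("Power State") != "Running"
    ]


# Trimmed sample: the first row mirrors the post-upgrade listing in this report,
# the second row is a made-up healthy instance for contrast.
SAMPLE = """
+-----------+--------+--------------+-------------+
| Name      | Status | Task State   | Power State |
+-----------+--------+--------------+-------------+
| test-7370 | ACTIVE | powering-off | Shutdown    |
| healthy-1 | ACTIVE | None         | Running     |
+-----------+--------+--------------+-------------+
"""

for row in find_power_mismatches(parse_server_table(SAMPLE)):
    print(row["Name"], row["Task State"], row["Power State"])
```

The parser only relies on the pipe-delimited layout, so it works on the full-width table as well as on the trimmed sample.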

Additional info:

Comment 2 Ollie Walsh 2020-07-08 15:17:33 UTC
Right now this only occurs when UpgradeLevelNovaCompute is set to auto. There is no reason to set this for FFU, so the simple fix in https://bugzilla.redhat.com/show_bug.cgi?id=1849235 is sufficient for now.
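
As a rough aid for spotting affected deployments, the sketch below scans the text of a heat environment file for UpgradeLevelNovaCompute set to auto. It is a plain regex check, assuming the parameter appears on its own line as in a typical `parameter_defaults` section; the function name is hypothetical and this is not part of the fix in either bug.

```python
import re

# Matches an uncommented "UpgradeLevelNovaCompute: auto" line, with or without quotes.
_AUTO_RE = re.compile(
    r"^\s*UpgradeLevelNovaCompute\s*:\s*['\"]?auto['\"]?\s*(#.*)?$",
    re.MULTILINE,
)


def sets_compute_upgrade_level_auto(env_text):
    """Return True if the environment text sets UpgradeLevelNovaCompute to auto."""
    return bool(_AUTO_RE.search(env_text))


# Hypothetical environment snippet for illustration.
example_env = "parameter_defaults:\n  UpgradeLevelNovaCompute: auto\n"
print(sets_compute_upgrade_level_auto(example_env))
```

A line-based regex avoids a YAML dependency, at the cost of missing values set through anchors or multi-line constructs.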

Long term we need to rework the nova database config in t-h-t. I've started this in https://review.opendev.org/718552. However, this would cause issues for deployments that omit controllers (e.g. scale-up deployments sometimes do this to speed up the process), so I need to implement an alternative approach for this case.

Comment 13 Ollie Walsh 2020-11-20 14:48:10 UTC
Closing this as the upgrade issue was resolved. Will track the larger TripleO refactor in BZ1899982.

