Bug 1748483 - [Update] Undercloud update failed on db migration
Summary: [Update] Undercloud update failed on db migration
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 15.0 (Stein)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: Upstream M1
Target Release: 17.0
Assignee: melanie witt
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks: 1768673 1768676
Reported: 2019-09-03 16:53 UTC by mathieu bultel
Modified: 2023-03-21 19:21 UTC

Fixed In Version: openstack-nova-21.1.0-0.20200624041944.1cae0cd.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1768673 1768676
Environment:
Last Closed: 2021-07-07 09:26:24 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1824435 | 0 | None | None | None | 2019-10-11 21:37:51 UTC
OpenStack gerrit | 688206 | 0 | 'None' | MERGED | Remove redundant call to get/create default security group | 2020-07-16 22:13:15 UTC
Red Hat Issue Tracker | OSP-5949 | 0 | None | None | None | 2022-08-11 10:41:30 UTC

Description mathieu bultel 2019-09-03 16:53:27 UTC
The nova db sync fails during the Undercloud update step of an OSP 15 minor update.

More information is in the related Launchpad bug:
https://bugs.launchpad.net/nova/+bug/1824435

I submitted a fix in Nova:
https://review.opendev.org/678776

But I think a different fix, one that can be merged faster than the nova one, could solve the issue.

Comment 2 melanie witt 2019-09-20 01:28:33 UTC
Adding a comment here ahead of the next triage call. This bug has come up on the triage call twice so far. Because it is only triggered by running 'nova-manage db sync' twice (instead of only once), we asked deployment specialists to investigate (1) whether the deploy routine runs 'nova-manage db sync' more than once and (2) if so, whether it could be changed to run 'nova-manage db sync' only once.

Martin Schuppert checked the deployment routines and found that 'nova-manage db sync' is run only once during normal deploys and is run twice only during FFUs. So it is still unclear how the bug was hit in the case reported here.

Reproduction steps for this bug are described in the upstream bug [1]:

"(2:44:55 PM) imacdonn: mriedem: FWIW, I can reproduce my original problem with this sequence: 1) create an instance 2) run migrations 3) archive 4) run migrations"

and a paste of reproduction steps for devstack is also included on the upstream bug [2]:

http://paste.openstack.org/show/749391

[1] https://bugs.launchpad.net/nova/+bug/1824435
[2] https://bugs.launchpad.net/nova/+bug/1824435/comments/8
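The quoted sequence can be sketched as a small script. This is only an illustration of steps 2 to 4 as quoted above, not the contents of the devstack paste. It assumes 'nova-manage' is on the PATH of a test node, and it leaves step 1 (creating an instance) to be done by hand beforehand. The helper name run_repro and its dry_run parameter are hypothetical.

```python
import subprocess

# Steps 2-4 of the quoted sequence: run migrations, archive, run migrations.
# Step 1 (create an instance) is assumed to have been done already.
REPRO_COMMANDS = [
    ["nova-manage", "db", "sync"],
    ["nova-manage", "db", "archive_deleted_rows"],
    ["nova-manage", "db", "sync"],
]


def run_repro(dry_run=True):
    """Return the command lines in order; execute them unless dry_run is set."""
    planned = []
    for command in REPRO_COMMANDS:
        if not dry_run:
            # check=True stops at the first failing command, which is where
            # the second 'db sync' would be expected to fail if the bug hits.
            subprocess.run(command, check=True)
        planned.append(" ".join(command))
    return planned


steps = run_repro()  # dry run: nothing is executed
for step in steps:
    print(step)
```

Calling run_repro(dry_run=False) on a disposable devstack would actually execute the commands.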

Comment 3 Alex Schultz 2019-09-20 02:48:39 UTC
JFYI, the expectation from deployment tooling is that the db sync command should be idempotent. It may run twice if the first execution times out or it may be triggered again (we've seen this historically when the db sync was super slow). Every other service's db sync can be run any number of times.
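To make "idempotent" concrete, here is a minimal sketch of a sync routine that records which migrations it has applied and skips them on later runs. It uses Python's sqlite3 module with an in-memory database. The table names, the MIGRATIONS list and the sync() helper are hypothetical and are not nova's real schema or code.

```python
import sqlite3

# Hypothetical migrations for illustration only; not nova's real schema.
MIGRATIONS = [
    (1, "CREATE TABLE widgets (id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE widgets ADD COLUMN color TEXT"),
]


def sync(conn):
    """Apply only the migrations not yet recorded; return how many ran."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
    )
    current = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM schema_version"
    ).fetchone()[0]
    applied = 0
    for version, statement in MIGRATIONS:
        if version <= current:
            continue  # already applied by an earlier run
        conn.execute(statement)
        conn.execute(
            "INSERT INTO schema_version (version) VALUES (?)", (version,)
        )
        applied += 1
    conn.commit()
    return applied


conn = sqlite3.connect(":memory:")
first = sync(conn)   # applies both migrations
second = sync(conn)  # nothing left to do, and no error
print(first, second)
```

The second call is a no-op rather than a failure, which is the behavior deployment tooling relies on when a sync is retried.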

Comment 4 melanie witt 2019-09-20 03:37:31 UTC
(In reply to Alex Schultz from comment #3)
> JFYI, the expectation from deployment tooling is that the db sync command
> should be idempotent. It may run twice if the first execution times out or
> it may be triggered again (we've seen this historically when the db sync was
> super slow).  Every other service's db sync can be run any number of times.

Yes, sorry, I took for granted that we maintain the same assumption, that 'nova-manage db sync' is intended and expected to be idempotent. My comment 2 was in reference specifically to the idea in comment 0 of a quicker, interim workaround while the nova change is being worked on upstream. Once we have the correct approach and test coverage on the nova change, the 'nova-manage db sync' idempotency will be fixed and the interim workaround would be removed.

That said, so far we're not aware that anyone has been able to reproduce this issue with OSP and we have observed that our deployment tooling is not running db sync more than once, so as far as we know, there is nothing to do right now other than work on the nova change upstream to fix the idempotency of the db sync command.

Comment 5 melanie witt 2019-10-11 21:37:51 UTC
I have proposed a different patch for the root cause in nova after investigating the bug in a local test environment.

TL;DR: I found that our code for "get or create default security group" is creating a duplicate default group for project_id=NULL. The duplicate is accepted despite a unique constraint on project_id because unique constraints are only enforced on non-NULL values [1]. It is created because the group create happens in a separate database transaction but a later read happens in the same/current database transaction. The fix I propose is to do the read in a separate transaction, similar to how the create is done.

[1] https://dev.mysql.com/doc/refman/8.0/en/create-index.html#create-index-unique
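The NULL behavior of unique constraints is easy to see in isolation. The sketch below uses Python's sqlite3 module as a stand-in for MySQL (SQLite likewise lets a UNIQUE column hold multiple NULLs). The security_groups table here is a simplified, hypothetical one, not nova's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical table: only project_id carries a UNIQUE constraint.
conn.execute(
    "CREATE TABLE security_groups ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " project_id TEXT UNIQUE)"
)

# Two 'default' groups with project_id NULL: both inserts are accepted,
# because NULL values are never considered equal by the unique constraint.
conn.execute(
    "INSERT INTO security_groups (name, project_id) VALUES ('default', NULL)"
)
conn.execute(
    "INSERT INTO security_groups (name, project_id) VALUES ('default', NULL)"
)
null_rows = conn.execute(
    "SELECT COUNT(*) FROM security_groups WHERE project_id IS NULL"
).fetchone()[0]

# The same duplicate with a non-NULL project_id is rejected.
conn.execute(
    "INSERT INTO security_groups (name, project_id) VALUES ('default', 'p1')"
)
try:
    conn.execute(
        "INSERT INTO security_groups (name, project_id) VALUES ('default', 'p1')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(null_rows, duplicate_rejected)
```

So the constraint alone cannot stop a second default group for project_id=NULL; the get-or-create logic has to see the existing row itself.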

Comment 8 melanie witt 2019-10-18 20:48:58 UTC
Patch https://review.opendev.org/688206 has received a +2 and is favored over the other approach.

Comment 9 melanie witt 2019-11-04 23:22:22 UTC
https://review.opendev.org/688206 has merged upstream.

