Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1748483

Summary: [Update] Undercloud update failed on db migration
Product: Red Hat OpenStack
Reporter: mathieu bultel <mbultel>
Component: openstack-nova
Assignee: melanie witt <mwitt>
Status: CLOSED EOL
QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: high
Docs Contact:
Priority: high
Version: 15.0 (Stein)
CC: aschultz, dasmith, eglynn, jhakimra, kchamart, lyarwood, mburns, mwitt, nlevinki, sbauza, sgordon, vromanso
Target Milestone: Upstream M1
Keywords: Triaged
Target Release: 17.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-nova-21.1.0-0.20200624041944.1cae0cd.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1768673 1768676
Environment:
Last Closed: 2021-07-07 09:26:24 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1768673, 1768676

Description mathieu bultel 2019-09-03 16:53:27 UTC
The nova db sync fails during the OSP 15 minor update of the undercloud.

More information is available in the related Launchpad bug:
https://bugs.launchpad.net/nova/+bug/1824435

I submitted a fix in Nova:
https://review.opendev.org/678776

However, I think another fix, one that could be merged faster than the nova change, may solve the issue.

Comment 2 melanie witt 2019-09-20 01:28:33 UTC
Adding a comment here ahead of the next triage call. This bug has come up on the triage call twice so far. Because it is only triggered by running 'nova-manage db sync' twice (instead of once), we asked deployment specialists to investigate (1) whether the deploy routine runs 'nova-manage db sync' more than once and (2) if so, whether it could be changed to run 'nova-manage db sync' only once.

Martin Schuppert checked the deployment routines and found that 'nova-manage db sync' is run only once during normal deploys and is run twice only during FFUs (fast-forward upgrades). So it is still unclear how the reporter encountered this bug.

Steps to reproduce this bug are described in the upstream bug [1]:

"(2:44:55 PM) imacdonn: mriedem: FWIW, I can reproduce my original problem with this sequence: 1) create an instance 2) run migrations 3) archive 4) run migrations"

A paste with reproduction steps for devstack is also included in the upstream bug [2]:

http://paste.openstack.org/show/749391

[1] https://bugs.launchpad.net/nova/+bug/1824435
[2] https://bugs.launchpad.net/nova/+bug/1824435/comments/8

Comment 3 Alex Schultz 2019-09-20 02:48:39 UTC
JFYI, the deployment tooling expects the db sync command to be idempotent. It may run twice if the first execution times out, or it may be triggered again (we've seen this historically when the db sync was super slow). Every other service's db sync can be run any number of times.
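The idempotency expectation in comment 3 can be illustrated with a version-tracked migration runner. This is a minimal sketch under stated assumptions, not how nova-manage is implemented: the `migrate_version` table, the two migrations, and the `db_sync` helper are all hypothetical, and SQLite stands in for the real database. Each step is applied only if it is newer than the recorded version, so a second run applies nothing.

```python
import sqlite3

# Hypothetical, ordered schema migrations (not nova's real schema).
MIGRATIONS = {
    1: "CREATE TABLE instances (id INTEGER PRIMARY KEY, uuid TEXT UNIQUE)",
    2: "ALTER TABLE instances ADD COLUMN host TEXT",
}


def db_sync(conn):
    """Apply migrations newer than the recorded version; return those applied.

    Re-running it applies nothing, so a retry after a timeout is harmless.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS migrate_version (version INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT MAX(version) FROM migrate_version").fetchone()
    current = row[0] or 0  # MAX() is NULL on an empty table
    applied = []
    for version in sorted(MIGRATIONS):
        if version <= current:
            continue  # already applied by an earlier run
        # Apply the step and record it atomically (SQLite allows DDL in a transaction).
        conn.execute("BEGIN")
        try:
            conn.execute(MIGRATIONS[version])
            conn.execute(
                "INSERT INTO migrate_version (version) VALUES (?)", (version,)
            )
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise
        applied.append(version)
    return applied


# isolation_level=None: no implicit transactions; BEGIN/COMMIT are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
first = db_sync(conn)
second = db_sync(conn)  # e.g. the tooling re-runs the sync after a timeout
print(first, second)  # [1, 2] []
```

Recording each step in the same transaction as the step itself means that, under these assumptions, a run interrupted part-way can be resumed simply by running the sync again.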

Comment 4 melanie witt 2019-09-20 03:37:31 UTC
(In reply to Alex Schultz from comment #3)
> JFYI, the expectation from deployment tooling is that the db sync command
> should be idempotent. It may run twice if the first execution times out or
> it may be triggered again (we've seen this historically when the db sync was
> super slow).  Every other service's db sync can be run any number of times.

Yes, sorry, I took for granted that we share the same assumption: 'nova-manage db sync' is intended and expected to be idempotent. My comment 2 referred specifically to the idea in comment 0 of a quicker, interim workaround while the nova change is being worked on upstream. Once we have the correct approach and test coverage on the nova change, 'nova-manage db sync' idempotency will be fixed and the interim workaround will be removed.

That said, so far we're not aware of anyone reproducing this issue with OSP, and we have observed that our deployment tooling does not run db sync more than once. As far as we know, there is nothing to do right now other than work on the nova change upstream to fix the idempotency of the db sync command.

Comment 5 melanie witt 2019-10-11 21:37:51 UTC
I have proposed a different patch for the root cause in nova after investigating the bug in a local test environment.

TL;DR: I found that our code for "get or create default security group" creates a duplicate default group for project_id=NULL. The duplicate insert succeeds despite a unique constraint on project_id because a MySQL UNIQUE index permits multiple NULL values [1]. The duplicate arises because the group create happens in a separate database transaction, while a later read happens in the same (current) database transaction and so may not see the group the other transaction just created. The fix I propose is to do the read in a separate transaction, similar to how the create is done.

[1] https://dev.mysql.com/doc/refman/8.0/en/create-index.html#create-index-unique

Comment 8 melanie witt 2019-10-18 20:48:58 UTC
Patch https://review.opendev.org/688206 has received a +2 and is favored over the other approach.

Comment 9 melanie witt 2019-11-04 23:22:22 UTC
https://review.opendev.org/688206 has merged upstream.