The nova db sync is failing during the Undercloud update step of an OSP 15 minor update. More information is in the related Launchpad bug: https://bugs.launchpad.net/nova/+bug/1824435

I submitted a fix in Nova: https://review.opendev.org/678776

However, I think a different fix that can be merged faster than the Nova one could also solve the issue.
Adding a comment here ahead of the next triage call.

This bug has come up on the triage call twice so far. Because it is only triggered by running 'nova-manage db sync' twice (instead of only once), we asked deployment specialists to investigate (1) whether the deploy routine runs 'nova-manage db sync' more than once and (2) if so, whether it could be changed to run 'nova-manage db sync' only once.

Martin Schuppert checked the deployment routines and found that 'nova-manage db sync' is run only once during normal deploys and is run twice only during FFUs. So it is still unclear how the bug was encountered in this report.

Reproduction steps are described in the upstream bug [1]:

"(2:44:55 PM) imacdonn: mriedem: FWIW, I can reproduce my original problem with this sequence: 1) create an instance 2) run migrations 3) archive 4) run migrations"

A paste of reproduction steps for devstack is also included on the upstream bug [2]: http://paste.openstack.org/show/749391

[1] https://bugs.launchpad.net/nova/+bug/1824435
[2] https://bugs.launchpad.net/nova/+bug/1824435/comments/8
JFYI, the expectation from deployment tooling is that the db sync command should be idempotent. It may run twice if the first execution times out, or it may be triggered again (we've seen this historically when the db sync was super slow). Every other service's db sync can be run any number of times.
(In reply to Alex Schultz from comment #3)
> JFYI, the expectation from deployment tooling is that the db sync command
> should be idempotent. It may run twice if the first execution times out or
> it may be triggered again (we've seen this historically when the db sync was
> super slow). Every other service's db sync can be run any number of times.

Yes, sorry, I took for granted that we maintain the same assumption: 'nova-manage db sync' is intended and expected to be idempotent.

My comment 2 referred specifically to the idea in comment 0 of a quicker, interim workaround while the nova change is being worked on upstream. Once we have the correct approach and test coverage on the nova change, the 'nova-manage db sync' idempotency will be fixed and the interim workaround would be removed.

That said, so far we're not aware that anyone has been able to reproduce this issue with OSP, and we have observed that our deployment tooling does not run db sync more than once. So, as far as we know, there is nothing to do right now other than work on the nova change upstream to fix the idempotency of the db sync command.
I have proposed a different patch for the root cause in nova after investigating the bug in a local test environment.

TL;DR: I found that our code for "get or create default security group" creates a duplicate default group for project_id=NULL. The duplicate is possible despite a unique constraint on project_id because unique constraints are only enforced on non-NULL values [1]. It happens because the group create runs in a separate database transaction while the later read runs in the same/current database transaction, so the read does not find the group that was just created. The fix I propose is to do the read in a separate transaction, similar to how the create is done.

[1] https://dev.mysql.com/doc/refman/8.0/en/create-index.html#create-index-unique
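
For anyone who wants to see the NULL behavior of unique constraints in isolation, here is a minimal sketch. It is not nova code: it uses Python's built-in sqlite3 module as a stand-in for MySQL (SQLite also treats NULLs as distinct in a UNIQUE constraint), and the table layout, the 'p1' project id, and the helper names are made up for illustration.

```python
import sqlite3


def count_default_groups(conn, project_id):
    """Count 'default' groups for a project; project_id may be None (NULL)."""
    # "IS" matches NULL as well as ordinary values; "=" never matches NULL.
    row = conn.execute(
        "SELECT COUNT(*) FROM security_groups "
        "WHERE name = 'default' AND project_id IS ?",
        (project_id,),
    ).fetchone()
    return row[0]


def demo():
    """Return (number of NULL-project default groups, whether a non-NULL dup was rejected)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified table: only the columns needed for the demo.
    conn.execute(
        "CREATE TABLE security_groups ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " project_id TEXT,"
        " UNIQUE (project_id))"
    )
    insert = "INSERT INTO security_groups (name, project_id) VALUES ('default', ?)"

    # Two rows with project_id NULL: the unique constraint does not object.
    conn.execute(insert, (None,))
    conn.execute(insert, (None,))
    null_count = count_default_groups(conn, None)

    # Two rows with the same non-NULL project_id: the second one is rejected.
    conn.execute(insert, ("p1",))
    try:
        conn.execute(insert, ("p1",))
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True

    conn.close()
    return null_count, rejected


if __name__ == "__main__":
    null_count, rejected = demo()
    print("default groups with project_id NULL:", null_count)
    print("duplicate non-NULL project_id rejected:", rejected)
```

The point of the sketch is only that the database will not stop a second project_id=NULL default group from being inserted, so the "get" half of get-or-create has to be able to see the row that the "create" half committed.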
Patch https://review.opendev.org/688206 has received a +2 and is favored over the other approach.
https://review.opendev.org/688206 has merged upstream.