Bug 2223997 - [OVN][Trunk][live-migration] Sometimes subport doesn't reach status ACTIVE after live-migration
Summary: [OVN][Trunk][live-migration] Sometimes subport doesn't reach status ACTIVE after live-migration
Keywords:
Status: CLOSED DUPLICATE of bug 2185897
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 17.1 (Wallaby)
Hardware: Unspecified
OS: Unspecified
high
medium
Target Milestone: z3
: ---
Assignee: Rodolfo Alonso
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-07-19 14:01 UTC by Rodolfo Alonso
Modified: 2023-10-10 09:10 UTC
4 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-10-10 09:10:58 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 2027605 | 0 | None | None | None | 2023-07-19 14:01:48 UTC
OpenStack gerrit | 892889 | 0 | None | NEW | [OVN] Skip the port status UP update during a live migration | 2023-08-31 16:35:15 UTC
OpenStack gerrit | 892890 | 0 | None | NEW | [OVN][Trunk] Set the subports correct host during live migration | 2023-08-31 16:35:16 UTC
Red Hat Issue Tracker | OSP-26749 | 0 | None | None | None | 2023-07-19 14:01:49 UTC

Description Rodolfo Alonso 2023-07-19 14:01:16 UTC
This is a duplicate of the upstream (U/S) bug [1].

Both the subport and the parent port successfully reach status ACTIVE when the VM is created. The VM is then live-migrated to another compute node, and the test verifies that the VM status returns to ACTIVE. Finally, the test fails while waiting for the subport status to change to ACTIVE: the change never happens and the subport remains in status DOWN.
The failure is intermittent, which suggests a race condition.

Tempest logs from a failed run (the job was later rechecked and the test passed):
https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_8ac/887220/4/check/nova-live-migration/8ace3a8/testr_results.html

How often does this fail?
https://zuul.opendev.org/t/openstack/builds?job_name=nova-live-migration&branch=master&skip=0
24 jobs ran on 11 and 12 July.
7 of them failed, all because of this bug,
i.e. ~29% (7/24).

neutron server logs:
https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_8ac/887220/4/check/nova-live-migration/8ace3a8/controller/logs/screen-q-svc.txt

subport: e40925a3-4e71-4fa4-80ea-cc3a464101d6
parent port: 51e121cc-4e64-411e-b16a-e605cf967332
live-migration src host: np0034645395
live-migration dst host: np0034645398

It seems that, due to a race condition, the following two operations are processed in the wrong order.
The first one sets the parent port status to ACTIVE and changes revision_number from 9 to 10: https://paste.opendev.org/show/bbOaHudNMMI3zudMPRhc/

The second operation wrongly sets the parent port status back to DOWN and changes revision_number from 8 to 9, so it apparently should have been processed before the first one: https://paste.opendev.org/show/bYUtKncFycR6qHbQrz3J/
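The ordering problem described above can be illustrated with a minimal, hypothetical Python sketch. Port, StatusUpdate, apply_unguarded and apply_guarded are illustrative names, not Neutron or OVN code. The revision check shown is a generic compare-and-swap technique for rejecting stale updates, not the fix proposed in the linked Gerrit changes. The sketch assumes each status update records the revision_number it was computed against.

```python
from dataclasses import dataclass


@dataclass
class Port:
    """Hypothetical in-memory stand-in for a port row (not Neutron's model)."""
    status: str
    revision_number: int


@dataclass
class StatusUpdate:
    """A status change computed against a given revision of the port."""
    new_status: str
    based_on_revision: int


def apply_unguarded(port: Port, update: StatusUpdate) -> bool:
    """Last writer wins: the update is applied whatever its base revision."""
    port.status = update.new_status
    port.revision_number += 1
    return True


def apply_guarded(port: Port, update: StatusUpdate) -> bool:
    """Compare-and-swap: reject updates computed against an older revision."""
    if update.based_on_revision != port.revision_number:
        return False  # stale: the port changed after this update was computed
    port.status = update.new_status
    port.revision_number += 1
    return True


def replay(apply, updates):
    """Apply the updates in the given order to a fresh port at revision 9."""
    port = Port(status="DOWN", revision_number=9)
    for update in updates:
        apply(port, update)
    return port


# Processing order seen in the logs: the ACTIVE update (9 -> 10) runs before
# the older DOWN update that was computed against revision 8.
observed_order = [StatusUpdate("ACTIVE", 9), StatusUpdate("DOWN", 8)]

unguarded = replay(apply_unguarded, observed_order)
guarded = replay(apply_guarded, observed_order)
print(unguarded.status, guarded.status)  # DOWN ACTIVE
```

Without the revision check, the late DOWN update overwrites the newer ACTIVE status, which matches the symptom in this report. With the check, the stale update is dropped because revision 8 no longer matches the port's current revision 10.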

[1] https://bugs.launchpad.net/neutron/+bug/2027605

Comment 5 Rodolfo Alonso 2023-10-10 09:10:58 UTC

*** This bug has been marked as a duplicate of bug 2185897 ***

