Bug 2223997

Summary: [OVN][Trunk][live-migration] Sometimes subport doesn't reach status ACTIVE after live-migration
Product: Red Hat OpenStack Reporter: Rodolfo Alonso <ralonsoh>
Component: openstack-neutron Assignee: Rodolfo Alonso <ralonsoh>
Status: CLOSED DUPLICATE QA Contact: Eran Kuris <ekuris>
Severity: medium Docs Contact:
Priority: high    
Version: 17.1 (Wallaby) CC: bcafarel, chrisw, ekuris, scohen
Target Milestone: z3 Keywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-10-10 09:10:58 UTC Type: Bug

Description Rodolfo Alonso 2023-07-19 14:01:16 UTC
This is a duplicate of the upstream (U/S) bug [1].

Both the subport and the parent port successfully reach status=ACTIVE when the VM is created. The VM is then live-migrated to another compute node, and the test checks that the VM status successfully changes to ACTIVE. Finally, the test fails while waiting for the subport status to change to ACTIVE: this never happens, and the subport remains in status DOWN.
The failure is intermittent, which suggests a race condition.
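The final step of the test amounts to polling the subport status until it becomes ACTIVE or a timeout expires. A minimal Python sketch of that wait, assuming a hypothetical `wait_for_status` helper and a `get_status` callable (this is not Tempest's actual waiter), shows why a subport stuck in DOWN surfaces as a timeout rather than as an explicit error:

```python
import time
from typing import Callable


def wait_for_status(get_status: Callable[[], str], expected: str,
                    timeout: float = 10.0, interval: float = 0.5) -> str:
    """Poll get_status() until it returns `expected` or `timeout` expires.

    Hypothetical helper: it only mimics the shape of the wait the test
    performs; it is not Tempest's real implementation.
    """
    deadline = time.monotonic() + timeout
    status = get_status()
    while status != expected:
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"status is {status!r}, expected {expected!r} after {timeout}s")
        time.sleep(interval)
        status = get_status()
    return status


# Simulated subport that becomes ACTIVE after a few polls (the passing case).
_states = iter(["DOWN", "DOWN", "ACTIVE"])
print(wait_for_status(lambda: next(_states), "ACTIVE", timeout=5, interval=0.01))

# Simulated subport stuck in DOWN (the failing case described in this bug).
try:
    wait_for_status(lambda: "DOWN", "ACTIVE", timeout=0.05, interval=0.01)
except TimeoutError as exc:
    print("timed out:", exc)
```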

Tempest logs from a failed run (the job was later rechecked and the test passed):
https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_8ac/887220/4/check/nova-live-migration/8ace3a8/testr_results.html

How often does this fail?
https://zuul.opendev.org/t/openstack/builds?job_name=nova-live-migration&branch=master&skip=0
24 jobs ran on 11 and 12 July.
7 of them failed, all due to this bug,
i.e. ~30%.
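As a quick check of the ~30% figure, the failure rate follows directly from the two counts above (7 failed out of 24 runs):

```python
# Failure rate of nova-live-migration on 11-12 July, using the counts above.
total_runs = 24
failed_runs = 7
failure_rate = failed_runs / total_runs
print(f"{failure_rate:.1%}")  # prints 29.2%
```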

neutron server logs:
https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_8ac/887220/4/check/nova-live-migration/8ace3a8/controller/logs/screen-q-svc.txt

subport: e40925a3-4e71-4fa4-80ea-cc3a464101d6
parent port: 51e121cc-4e64-411e-b16a-e605cf967332
live-migration src host: np0034645395
live-migration dst host: np0034645398

It seems that, due to a race condition, the following two operations are processed in the wrong order.
The first one sets the parent port status to ACTIVE and changes revision_number from 9 to 10: https://paste.opendev.org/show/bbOaHudNMMI3zudMPRhc/

The second operation wrongly sets the parent port status back to DOWN and changes revision_number from 8 to 9; judging by the revision numbers, it should have been processed before the first one: https://paste.opendev.org/show/bYUtKncFycR6qHbQrz3J/
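The ordering problem can be illustrated with a minimal Python sketch, assuming each update carries the revision_number it produces and the consumer remembers the last revision it applied. `PortState` and `apply_update` are hypothetical names; this is neither Neutron's code nor the fix for the upstream bug, only an illustration of how comparing revision numbers lets a late, older update be discarded instead of reverting the port to DOWN:

```python
from dataclasses import dataclass


@dataclass
class PortState:
    """Hypothetical record of the last update applied to a port."""
    status: str
    revision_number: int


def apply_update(state: PortState, status: str, revision_number: int) -> bool:
    """Apply an update only if it is newer than the last one applied.

    Hypothetical guard: without this check, an older update processed late
    would overwrite a newer status (e.g. ACTIVE -> DOWN).
    """
    if revision_number <= state.revision_number:
        return False  # stale: a newer revision was already applied
    state.status = status
    state.revision_number = revision_number
    return True


# Processing order seen in the failed run: revision 10 (ACTIVE) first,
# then the older revision 9 (DOWN).
port = PortState(status="DOWN", revision_number=8)
print(apply_update(port, "ACTIVE", 10))  # True
print(apply_update(port, "DOWN", 9))     # False
print(port)  # PortState(status='ACTIVE', revision_number=10)
```

Without the revision check, the second call would leave the port in DOWN, which matches the state observed in the failed run.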

[1]https://bugs.launchpad.net/neutron/+bug/2027605

Comment 5 Rodolfo Alonso 2023-10-10 09:10:58 UTC

*** This bug has been marked as a duplicate of bug 2185897 ***