Bug 2143972

Summary: [OSP16.1] Live-migration failure during post because of keystone unavailable
Product: Red Hat OpenStack
Reporter: ggrimaux
Component: openstack-nova
Assignee: OSP DFG:Compute <osp-dfg-compute>
Status: NEW
QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: low
Docs Contact:
Priority: low
Version: 16.1 (Train)
CC: alifshit, coldford, dasmith, eglynn, jhakimra, kchamart, rosingh, rribaud, sbauza, sgordon, vromanso
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Flags: rribaud: needinfo? (rosingh)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description ggrimaux 2022-11-18 14:56:30 UTC
Description of problem:

The client attempted a live migration, and it failed during the post-live-migration step.

The source of the problem seems to be related to keystone:
2022-11-17 15:49:15.296 7 INFO nova.compute.resource_tracker [req-1c837c0b-3ccd-4226-8340-af1b12c6fddb - - - - -] [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] Updating resource usage from migration 1976219d-a89d-48f2-893f-39d3e194a4f3
2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [req-ac4e513d-a39c-452c-8df7-319d2e764095 - - - - -] [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] Post live migration at destination icmlw-p1-r740-070.itpc.uk.pri.o2.com failed: oslo_messaging.rpc.client.RemoteError: Remote error: ServiceUnavailable The server is currently unavailable. Please try again at a later time.<br /><br />
2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] Traceback (most recent call last):
2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef]   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 7579, in _post_live_migration
2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef]     instance, block_migration, dest)
2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef]   File "/usr/lib/python3.6/site-packages/nova/compute/rpcapi.py", line 796, in post_live_migration_at_destination
2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef]     instance=instance, block_migration=block_migration)
2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef]   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/client.py", line 181, in call
2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef]     transport_options=self.transport_options)
2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef]   File "/usr/lib/python3.6/site-packages/oslo_messaging/transport.py", line 129, in _send
2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef]     transport_options=transport_options)
2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef]   File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 674, in send
2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef]     transport_options=transport_options)
2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef]   File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 664, in _send
2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef]     raise result
2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] oslo_messaging.rpc.client.RemoteError: Remote error: ServiceUnavailable The server is currently unavailable. Please try again at a later time.<br /><br />
2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] The Keystone service is temporarily unavailable.
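
For anyone who needs to pull the same kind of excerpt out of a larger nova-compute log, the following is a minimal Python sketch, not a tool shipped with OSP. It assumes the line layout seen in the excerpt above (timestamp, PID, level, logger, optional request context, message) and tolerates an optional `grep -n` style line-number prefix. The names `parse_line` and `errors_for_instance` are made up for illustration.

```python
import re

# Layout assumed from the excerpt above:
# "<date> <time> <pid> <LEVEL> <logger> [<request context>] <message>"
# An optional "NNNN:" prefix, as produced by `grep -n`, is skipped.
LOG_LINE = re.compile(
    r"^(?:\d+:)?"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) "
    r"(?:\[(?P<context>req-[^\]]*)\] )?"
    r"(?P<message>.*)$"
)
INSTANCE = re.compile(r"\[instance: (?P<uuid>[0-9a-f-]{36})\]")


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    instance = INSTANCE.search(record["message"])
    record["instance"] = instance.group("uuid") if instance else None
    return record


def errors_for_instance(lines, uuid):
    """Collect the ERROR records that mention the given instance UUID."""
    records = []
    for line in lines:
        record = parse_line(line)
        if record and record["level"] == "ERROR" and record["instance"] == uuid:
            records.append(record)
    return records


if __name__ == "__main__":
    sample = [
        "2022-11-17 15:49:15.296 7 INFO nova.compute.resource_tracker "
        "[req-1c837c0b-3ccd-4226-8340-af1b12c6fddb - - - - -] "
        "[instance: 67857f95-d07d-4568-b92f-b37c760c94ef] "
        "Updating resource usage from migration 1976219d-a89d-48f2-893f-39d3e194a4f3",
        "2022-11-17 15:49:34.143 7 ERROR nova.compute.manager "
        "[instance: 67857f95-d07d-4568-b92f-b37c760c94ef] "
        "The Keystone service is temporarily unavailable.",
    ]
    for rec in errors_for_instance(sample, "67857f95-d07d-4568-b92f-b37c760c94ef"):
        print(rec["ts"], rec["logger"], rec["message"])
```

Feeding it the lines of a real log file (for example via `open(path)`) in place of `sample` gives the timestamps needed to correlate with the keystone logs on the controllers.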


I looked at keystone logs around that time on all 3 controller nodes and couldn't find anything substantial.

The instance was stuck in the "MIGRATING" state and nova.instances was still pointing to the source compute node.
We had to modify the database to point it to the new compute node.
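
The exact statement that was run is not recorded in this report. As a hypothetical illustration only, the sketch below shows the general shape such a correction could take. It runs against an in-memory SQLite stand-in table, not a real Nova database; the column subset (`uuid`, `host`, `node`, `task_state`, `vm_state`, `deleted`), the host names, and the helper `repoint_instance` are assumptions made for the example. A MariaDB driver would also use a different placeholder style than SQLite's `?`.

```python
import sqlite3

# Instance UUID taken from the log excerpt; host names are placeholders.
INSTANCE_UUID = "67857f95-d07d-4568-b92f-b37c760c94ef"
SOURCE = "source-compute.example"
DEST = "dest-compute.example"


def repoint_instance(conn, instance_uuid, dest_host, dest_node):
    """Point one instance stuck in 'migrating' at the destination compute.

    Returns the number of rows changed, so the caller can confirm that
    exactly one instance was touched.
    """
    cur = conn.execute(
        "UPDATE instances "
        "SET host = ?, node = ?, task_state = NULL "
        "WHERE uuid = ? AND deleted = 0 AND task_state = 'migrating'",
        (dest_host, dest_node, instance_uuid),
    )
    conn.commit()
    return cur.rowcount


# Stand-in table so the sketch runs without a real Nova database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instances ("
    "uuid TEXT, host TEXT, node TEXT, "
    "task_state TEXT, vm_state TEXT, deleted INTEGER)"
)
conn.execute(
    "INSERT INTO instances VALUES (?, ?, ?, 'migrating', 'active', 0)",
    (INSTANCE_UUID, SOURCE, SOURCE),
)

changed = repoint_instance(conn, INSTANCE_UUID, DEST, DEST)
row = conn.execute(
    "SELECT host, node, task_state FROM instances WHERE uuid = ?",
    (INSTANCE_UUID,),
).fetchone()
print(changed, row)
```

Restricting the `WHERE` clause to a single UUID that is still in the `migrating` task state keeps the statement from touching any other row, and checking the returned row count is a cheap sanity test before trusting the result.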

I don't know whether you can find anything with the information available,
or whether we need to enable debug mode in keystone and wait for the next occurrence of this situation.

If you need anything please let me know.

Thank you.

Version-Release number of selected component (if applicable):
OSP16.1.7
Environment is integrated with Contrail.


How reproducible:
Happened once

Steps to Reproduce:
1. Live-migrate an instance.

Actual results:
Live-migration failed.

Expected results:
Live migration succeeds.

Additional info:
Sosreports from the compute nodes and controller nodes are available.