Description of problem:
A client attempted a live migration, and it failed during the post-live-migration phase.
The source of the problem seems to be related to Keystone:
1064:2022-11-17 15:49:15.296 7 INFO nova.compute.resource_tracker [req-1c837c0b-3ccd-4226-8340-af1b12c6fddb - - - - -] [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] Updating resource usage from migration 1976219d-a89d-48f2-893f-39d3e194a4f3
1067:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [req-ac4e513d-a39c-452c-8df7-319d2e764095 - - - - -] [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] Post live migration at destination icmlw-p1-r740-070.itpc.uk.pri.o2.com failed: oslo_messaging.rpc.client.RemoteError: Remote error: ServiceUnavailable The server is currently unavailable. Please try again at a later time.<br /><br />
1073:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] Traceback (most recent call last):
1074:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 7579, in _post_live_migration
1075:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] instance, block_migration, dest)
1076:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] File "/usr/lib/python3.6/site-packages/nova/compute/rpcapi.py", line 796, in post_live_migration_at_destination
1077:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] instance=instance, block_migration=block_migration)
1078:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/client.py", line 181, in call
1079:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] transport_options=self.transport_options)
1080:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] File "/usr/lib/python3.6/site-packages/oslo_messaging/transport.py", line 129, in _send
1081:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] transport_options=transport_options)
1082:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 674, in send
1083:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] transport_options=transport_options)
1084:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 664, in _send
1085:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] raise result
1086:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] oslo_messaging.rpc.client.RemoteError: Remote error: ServiceUnavailable The server is currently unavailable. Please try again at a later time.<br /><br />
1087:2022-11-17 15:49:34.143 7 ERROR nova.compute.manager [instance: 67857f95-d07d-4568-b92f-b37c760c94ef] The Keystone service is temporarily unavailable.
I looked at the Keystone logs around that time on all 3 controller nodes and couldn't find anything substantial.
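For anyone repeating that check against the sosreports, here is a minimal sketch of narrowing a log file to a window around the failure timestamp (2022-11-17 15:49:34.143). It assumes every log line starts with the same `YYYY-MM-DD HH:MM:SS.mmm` prefix seen in the nova-compute excerpt above and that all nodes log in the same timezone. The helper name `lines_near` and the 60-second default window are made up for illustration, and the log file paths are passed on the command line rather than assumed.

```python
import sys
from datetime import datetime, timedelta

# Timestamp layout at the start of each log line, e.g. "2022-11-17 15:49:34.143"
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TS_LENGTH = 23  # number of characters in the timestamp prefix


def lines_near(lines, center, window_seconds=60):
    """Yield the lines whose leading timestamp is within window_seconds of center."""
    window = timedelta(seconds=window_seconds)
    for line in lines:
        try:
            stamp = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
        except ValueError:
            # Line has no parsable timestamp prefix (assumption: safe to skip)
            continue
        if abs(stamp - center) <= window:
            yield line


if __name__ == "__main__":
    # Failure time taken from the nova-compute ERROR lines in this report
    failure_time = datetime.strptime("2022-11-17 15:49:34.143", TS_FORMAT)
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for line in lines_near(handle, failure_time):
                print(f"{path}: {line}", end="")
```

Running it with the Keystone log files from each controller as arguments prints only the entries within one minute of the failure, which makes it easier to compare the three controllers side by side.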
The instance was stuck in the "MIGRATING" state, and its record in nova.instances still pointed to the source compute node.
We had to modify the database so that it points to the new compute node.
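The exact statement we ran is not reproduced in this report. As an illustration only, the following sketch builds (but does not execute) a parameterized UPDATE of the kind described, assuming the `host`, `node`, `uuid` and `deleted` columns of the `instances` table and a MySQL-style `%s` parameter placeholder. The function name `build_host_update` is hypothetical, and using the destination hostname for `node` as well is an assumption.

```python
def build_host_update(instance_uuid, dest_host, dest_node=None):
    """Return (sql, params) for review; nothing is sent to the database."""
    sql = (
        "UPDATE instances "
        "SET host = %s, node = %s "
        "WHERE uuid = %s AND deleted = 0"
    )
    # Assumption: the hypervisor node name equals the host name unless given
    params = (dest_host, dest_node or dest_host, instance_uuid)
    return sql, params


if __name__ == "__main__":
    # Instance UUID and destination host taken from the log excerpt in this report
    sql, params = build_host_update(
        "67857f95-d07d-4568-b92f-b37c760c94ef",
        "icmlw-p1-r740-070.itpc.uk.pri.o2.com",
    )
    print(sql)
    print(params)
```

Keeping the values as parameters rather than formatting them into the string avoids quoting mistakes when the statement is eventually run by hand or through a database driver.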
I don't know whether you can find anything with the data available, or whether we need to enable debug mode in Keystone and wait for the next occurrence of this situation.
If you need anything please let me know.
Thank you.
Version-Release number of selected component (if applicable):
OSP16.1.7
Environment is integrated with Contrail.
How reproducible:
Happened once
Steps to Reproduce:
1. Live-migrate an instance.
Actual results:
Live-migration failed.
Expected results:
Live migration succeeds.
Additional info:
Sosreports from the compute nodes and controller nodes are available.