+++ This bug was initially created as a clone of Bug #1959627 +++
Description of problem:
Intermittently, cold migration fails in a TLS-E multi-cell environment. The specific error [1] comes from the cell conductor, which does not have auth_type set in the [neutron] section of its nova.conf. The primary controller and the cell computes do have auth_type correctly set. It is not clear whether this is an issue with TripleO not propagating the configuration to the cell controller, or whether nova should be referencing the config params on the primary controller instead. As noted, this is not 100% reproducible: so far it happens about 50% of the time running the same suite of tests on the same environment. When it does fail, it only fails during the cold migration revert test [2]; all other migration tests consistently pass. It's important to note that while the cold migration revert test is the test that always fails, the failure happens before the revert.
Relevant Config parameters:
[heat-admin@cell1-compute-0 ~]$ sudo crudini --get /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf neutron auth_type
v3password
[heat-admin@cell1-cellcontrol-0 ~]$ sudo crudini --get /var/lib/config-data/puppet-generated/nova/etc/nova/nova.conf neutron
[heat-admin@cell1-cellcontrol-0 ~]$
[heat-admin@controller-0 ~]$ sudo crudini --get /var/lib/config-data/puppet-generated/nova/etc/nova/nova.conf neutron auth_type
v3password
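For comparison, a populated [neutron] section on the cell controller would be expected to look roughly like the sketch below. The values are taken from the controller's nova.conf dumped later in this report; treat this as an illustration of the expected shape, not the verified fix.

```
# Hypothetical cell-controller nova.conf fragment. The actual cell controller
# in this report has an empty [neutron] section, which is the suspected bug.
[neutron]
auth_type = v3password
auth_url = https://overcloud.internalapi.redhat.local:5000/v3
project_name = service
project_domain_name = Default
username = neutron
user_domain_name = Default
password = <redacted>
```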
Example output below when migration fails:
2021-05-11 16:45:27.998 [nova-cell-conductor.log] 23 ERROR nova.network.neutronv2.api [req-c002f919-623f-4917-b3a4-e54fc60c063d 637886e2f55840e6a3381c535ba7ec4f 20dd17d014fd442fbac7fdd6b9c006b6 - default default] The [neutron] section of your nova configuration file must be configured for authentication with the networking service endpoint. See the networking service install guide for details: https://docs.openstack.org/neutron/latest/install/
2021-05-11 16:45:27.999 [nova-cell-conductor.log] 23 WARNING nova.scheduler.utils [req-c002f919-623f-4917-b3a4-e54fc60c063d 637886e2f55840e6a3381c535ba7ec4f 20dd17d014fd442fbac7fdd6b9c006b6 - default default] Failed to compute_task_migrate_server: Unknown auth type: None: neutronclient.common.exceptions.Unauthorized: Unknown auth type: None
2021-05-11 16:45:28.004 [nova-cell-conductor.log] 23 WARNING nova.scheduler.utils [req-c002f919-623f-4917-b3a4-e54fc60c063d 637886e2f55840e6a3381c535ba7ec4f 20dd17d014fd442fbac7fdd6b9c006b6 - default default] [instance: 709ed075-6419-4e0b-928f-27050b8910ba] Setting instance to ACTIVE state.: neutronclient.common.exceptions.Unauthorized: Unknown auth type: None
2021-05-11 16:45:28.062 [nova-cell-conductor.log] 23 DEBUG nova.objects.instance [req-c002f919-623f-4917-b3a4-e54fc60c063d 637886e2f55840e6a3381c535ba7ec4f 20dd17d014fd442fbac7fdd6b9c006b6 - default default] Lazy-loading 'flavor' on Instance uuid 709ed075-6419-4e0b-928f-27050b8910ba obj_load_attr /usr/lib/python3.6/site-packages/nova/objects/instance.py:1091
2021-05-11 16:45:28.098 [nova-cell-conductor.log] 23 DEBUG nova.objects.instance [req-c002f919-623f-4917-b3a4-e54fc60c063d 637886e2f55840e6a3381c535ba7ec4f 20dd17d014fd442fbac7fdd6b9c006b6 - default default] Lazy-loading 'metadata' on Instance uuid 709ed075-6419-4e0b-928f-27050b8910ba obj_load_attr /usr/lib/python3.6/site-packages/nova/objects/instance.py:1091
2021-05-11 16:45:28.136 [nova-cell-conductor.log] 23 DEBUG nova.objects.instance [req-c002f919-623f-4917-b3a4-e54fc60c063d 637886e2f55840e6a3381c535ba7ec4f 20dd17d014fd442fbac7fdd6b9c006b6 - default default] Lazy-loading 'info_cache' on Instance uuid 709ed075-6419-4e0b-928f-27050b8910ba obj_load_attr /usr/lib/python3.6/site-packages/nova/objects/instance.py:1091
2021-05-11 16:45:28.223 [nova-cell-conductor.log] 23 ERROR oslo_messaging.rpc.server [req-c002f919-623f-4917-b3a4-e54fc60c063d 637886e2f55840e6a3381c535ba7ec4f 20dd17d014fd442fbac7fdd6b9c006b6 - default default] Exception during message handling: neutronclient.common.exceptions.Unauthorized: Unknown auth type: None
2021-05-11 16:45:28.223 [nova-cell-conductor.log] 23 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2021-05-11 16:45:28.223 [nova-cell-conductor.log] 23 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
Version-Release number of selected component (if applicable):
[stack@undercloud-0 ~]$ cat core_puddle_version
RHOS-16.1-RHEL-8-20210506.n.1
[stack@undercloud-0 ~]$ cat /etc/rhosp-release
Red Hat OpenStack Platform release 16.1.6 GA (Train)
How reproducible:
~50% running the same suite of tests on the same environment
Steps to Reproduce:
1. Setup a multi-cell environment with TLS-E
2. Create an instance in the cell1 cluster of the deployment
3. Using the admin client, cold migrate the server and wait until it reaches VERIFY_RESIZE status
4. (The failure might be triggered by parallel migration tests running at the same time)
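The migrate-and-wait step can be scripted; the sketch below wraps the standard openstack CLI in a small polling helper (wait_for_status is a hypothetical name, not part of any OpenStack tooling). Exact subcommand spellings for the revert (e.g. `resize --revert`) vary by client release, so verify against your installed python-openstackclient.

```shell
#!/bin/sh
# wait_for_status: poll a status-reporting command until it prints the
# expected value, or give up after ~2 minutes. Hypothetical helper.
wait_for_status() {
  expected="$1"; shift
  i=0
  while [ "$i" -lt 60 ]; do
    status="$("$@")" || return 1
    [ "$status" = "$expected" ] && return 0
    i=$((i + 1))
    sleep 2
  done
  return 1
}

# Usage against the cell1 instance (assumes admin credentials are sourced):
#   openstack server migrate "$SERVER_ID"
#   wait_for_status VERIFY_RESIZE \
#       openstack server show -f value -c status "$SERVER_ID"
#   openstack server resize --revert "$SERVER_ID"   # spelling varies by release
```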
Actual results:
Instance fails to cold migrate to new host
Expected results:
Instance should migrate to a new host
Additional info:
Test logs can be found here [3]
Relevant nova logs are merged and attached. The instance id involved in the failure scenario is 709ed075-6419-4e0b-928f-27050b8910ba and the request uuid associated with the migration is req-c002f919-623f-4917-b3a4-e54fc60c063d
[1] https://github.com/openstack/nova/blob/stable/train/nova/network/neutronv2/api.py#L82
[2] https://github.com/openstack/tempest/blob/master/tempest/api/compute/admin/test_migrations.py#L171
[3] https://rhos-ci-staging-jenkins.lab.eng.tlv2.redhat.com/job/DFG-compute-nova-16.1_director-rhel-virthost-1cont_1comp_1cellcont_2cellcomp_1ipa-ipv4-geneve-multi-cell-tls-everywhere-phase3/5/testReport/tempest.api.compute.admin.test_migrations/MigrationsAdminTest/test_revert_cold_migration_id_caa1aa8b_f4ef_4374_be0d_95f001c2ac2d_/
--- Additional comment from on 2021-05-14 13:42:39 UTC ---
Looking at the merged logs, I can see the conductor does not have the neutron admin credentials:
2021-05-11 16:02:05.847 [nova-conductor.log] 7 DEBUG oslo_service.service [-] neutron.auth_section = None log_opt_values /usr/lib/python3.6/site-packages/oslo_config/cfg.py:2589
2021-05-11 16:02:05.847 [nova-conductor.log] 7 DEBUG oslo_service.service [-] neutron.auth_type = v3password log_opt_values /usr/lib/python3.6/site-packages/oslo_config/cfg.py:2589
so I suspect that this is why it's failing.
This is the relevant log: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/staging/DFG-compute-nova-16.1_director-rhel-virthost-1cont_1comp_1cellcont_2cellcomp_1ipa-ipv4-geneve-multi-cell-tls-everywhere-phase3/5/controller-0/var/lib/config-data/nova/etc/nova/nova.conf.gz
# Authentication type to load (string value)
# Deprecated group;name - [neutron]/auth_plugin
#auth_type=<None>
auth_type=v3password
# Config Section from which to load plugin specific options (string value)
#auth_section=<None>
# Authentication URL (string value)
#auth_url=<None>
auth_url=https://overcloud.internalapi.redhat.local:5000/v3
# Scope for system operations (string value)
#system_scope=<None>
# Domain ID to scope to (string value)
#domain_id=<None>
# Domain name to scope to (string value)
#domain_name=<None>
# Project ID to scope to (string value)
#project_id=<None>
# Project name to scope to (string value)
#project_name=<None>
project_name=service
# Domain ID containing project (string value)
#project_domain_id=<None>
# Domain name containing project (string value)
#project_domain_name=<None>
project_domain_name=Default
# Trust ID (string value)
#trust_id=<None>
# Optional domain ID to use with v3 and v2 parameters. It will be used for both
# the user and project domain in v3 and ignored in v2 authentication (string
# value)
#default_domain_id=<None>
# Optional domain name to use with v3 API and v2 parameters. It will be used for
# both the user and project domain in v3 and ignored in v2 authentication
# (string value)
#default_domain_name=<None>
# User ID (string value)
#user_id=<None>
# Username (string value)
# Deprecated group;name - [neutron]/user_name
#username=<None>
username=neutron
# User's domain id (string value)
#user_domain_id=<None>
# User's domain name (string value)
#user_domain_name=<None>
user_domain_name=Default
# User's password (string value)
#password=<None>
password=qPe6rrMXgW2sFlD314F0K91v1
So it should be able to connect; the scheduler shares the same nova.conf, I believe, but it seems to think the auth type is None:
2021-05-11 16:45:27.998 [nova-cell-conductor.log] 23 ERROR nova.network.neutronv2.api [req-c002f919-623f-4917-b3a4-e54fc60c063d 637886e2f55840e6a3381c535ba7ec4f 20dd17d014fd442fbac7fdd6b9c006b6 - default default] The [neutron] section of your nova configuration file must be configured for authentication with the networking service endpoint. See the networking service install guide for details: https://docs.openstack.org/neutron/latest/install/
2021-05-11 16:45:27.999 [nova-cell-conductor.log] 23 WARNING nova.scheduler.utils [req-c002f919-623f-4917-b3a4-e54fc60c063d 637886e2f55840e6a3381c535ba7ec4f 20dd17d014fd442fbac7fdd6b9c006b6 - default default] Failed to compute_task_migrate_server: Unknown auth type: None: neutronclient.common.exceptions.Unauthorized: Unknown auth type: None
Yet at startup it was detected correctly as v3password:
2021-05-11 16:02:08.801 [nova-scheduler.log] 7 DEBUG oslo_service.service [req-a383177d-fd3b-49e4-bfef-e844cf46dd83 - - - - -] neutron.auth_type = v3password
So something is clearly going wrong here that is causing the construction of the neutron client to fail in nova.network.neutronv2.api.
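The guard that trips here amounts to: if [neutron]/auth_type resolves to None, nova refuses to construct a neutron client. Nova actually resolves this through oslo.config and keystoneauth (and the auth_section indirection shown in the log above); the awk one-liner below is only a stand-in, with a hypothetical function name, that an operator could run against each node's nova.conf to spot the empty section.

```shell
#!/bin/sh
# Print the auth_type from a nova.conf-style file's [neutron] section,
# or "None" if it is unset. Hypothetical helper; does not follow
# auth_section indirection.
neutron_auth_type() {
  awk -F ' *= *' '
    /^\[/ { section = $0 }                                  # track current INI section
    section == "[neutron]" && $1 == "auth_type" { found = $2 }
    END   { print (found ? found : "None") }
  ' "$1"
}

# Example:
#   neutron_auth_type /var/lib/config-data/puppet-generated/nova/etc/nova/nova.conf
```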
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Red Hat OpenStack Platform (RHOSP) 16.2 enhancement advisory), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2021:3483