+++ This bug was initially created as a clone of Bug #1959627 +++

Description of problem:

Intermittently, cold migration will fail in a TLS-E multi-cell environment. The specific error [1] comes from the cell conductor, which does not have auth_type set in the [neutron] section of its nova.conf. The primary controller and the cell computes do have auth_type correctly set. It is not clear whether this is an issue with TripleO not propagating the configuration to the cell controller, or whether nova should be referencing the config params on the primary controller instead.

To reiterate, this is not 100% reproducible. So far it happens about 50% of the time running the same suite of tests on the same environment. When it does fail, it only fails in the cold migration revert test [2]; all other migration tests consistently pass. It is important to note that while the cold migration revert test is the one that always fails, the failure happens before the revert.

Relevant config parameters:

[heat-admin@cell1-compute-0 ~]$ sudo crudini --get /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf neutron auth_type
v3password

[heat-admin@cell1-cellcontrol-0 ~]$ sudo crudini --get /var/lib/config-data/puppet-generated/nova/etc/nova/nova.conf neutron
[heat-admin@cell1-cellcontrol-0 ~]$

[heat-admin@controller-0 ~]$ sudo crudini --get /var/lib/config-data/puppet-generated/nova/etc/nova/nova.conf neutron auth_type
v3password

Example output below when migration fails:

2021-05-11 16:45:27.998 [nova-cell-conductor.log] 23 ERROR nova.network.neutronv2.api [req-c002f919-623f-4917-b3a4-e54fc60c063d 637886e2f55840e6a3381c535ba7ec4f 20dd17d014fd442fbac7fdd6b9c006b6 - default default] The [neutron] section of your nova configuration file must be configured for authentication with the networking service endpoint.
See the networking service install guide for details: https://docs.openstack.org/neutron/latest/install/
2021-05-11 16:45:27.999 [nova-cell-conductor.log] 23 WARNING nova.scheduler.utils [req-c002f919-623f-4917-b3a4-e54fc60c063d 637886e2f55840e6a3381c535ba7ec4f 20dd17d014fd442fbac7fdd6b9c006b6 - default default] Failed to compute_task_migrate_server: Unknown auth type: None: neutronclient.common.exceptions.Unauthorized: Unknown auth type: None
2021-05-11 16:45:28.004 [nova-cell-conductor.log] 23 WARNING nova.scheduler.utils [req-c002f919-623f-4917-b3a4-e54fc60c063d 637886e2f55840e6a3381c535ba7ec4f 20dd17d014fd442fbac7fdd6b9c006b6 - default default] [instance: 709ed075-6419-4e0b-928f-27050b8910ba] Setting instance to ACTIVE state.: neutronclient.common.exceptions.Unauthorized: Unknown auth type: None
2021-05-11 16:45:28.062 [nova-cell-conductor.log] 23 DEBUG nova.objects.instance [req-c002f919-623f-4917-b3a4-e54fc60c063d 637886e2f55840e6a3381c535ba7ec4f 20dd17d014fd442fbac7fdd6b9c006b6 - default default] Lazy-loading 'flavor' on Instance uuid 709ed075-6419-4e0b-928f-27050b8910ba obj_load_attr /usr/lib/python3.6/site-packages/nova/objects/instance.py:1091
2021-05-11 16:45:28.098 [nova-cell-conductor.log] 23 DEBUG nova.objects.instance [req-c002f919-623f-4917-b3a4-e54fc60c063d 637886e2f55840e6a3381c535ba7ec4f 20dd17d014fd442fbac7fdd6b9c006b6 - default default] Lazy-loading 'metadata' on Instance uuid 709ed075-6419-4e0b-928f-27050b8910ba obj_load_attr /usr/lib/python3.6/site-packages/nova/objects/instance.py:1091
2021-05-11 16:45:28.136 [nova-cell-conductor.log] 23 DEBUG nova.objects.instance [req-c002f919-623f-4917-b3a4-e54fc60c063d 637886e2f55840e6a3381c535ba7ec4f 20dd17d014fd442fbac7fdd6b9c006b6 - default default] Lazy-loading 'info_cache' on Instance uuid 709ed075-6419-4e0b-928f-27050b8910ba obj_load_attr /usr/lib/python3.6/site-packages/nova/objects/instance.py:1091
2021-05-11 16:45:28.223 [nova-cell-conductor.log] 23 ERROR oslo_messaging.rpc.server
[req-c002f919-623f-4917-b3a4-e54fc60c063d 637886e2f55840e6a3381c535ba7ec4f 20dd17d014fd442fbac7fdd6b9c006b6 - default default] Exception during message handling: neutronclient.common.exceptions.Unauthorized: Unknown auth type: None
2021-05-11 16:45:28.223 [nova-cell-conductor.log] 23 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2021-05-11 16:45:28.223 [nova-cell-conductor.log] 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming

Version-Release number of selected component (if applicable):

[stack@undercloud-0 ~]$ cat core_puddle_version
RHOS-16.1-RHEL-8-20210506.n.1
[stack@undercloud-0 ~]$ cat /etc/rhosp-release
Red Hat OpenStack Platform release 16.1.6 GA (Train)

How reproducible:
~50% running the same suite of tests on the same environment

Steps to Reproduce:
1. Set up a multi-cell environment with TLS-E
2. Create an instance in the cell1 cluster of the deployment
3. Using the admin client, cold migrate the server and wait until status VERIFY_RESIZE is reached
4. (This might be triggered by parallel migration tests happening)

Actual results:
Instance fails to cold migrate to a new host

Expected results:
Instance should migrate to a new host

Additional info:
Test logs can be found here [3]. Relevant nova logs are merged and attached.
The instance id involved in the failure scenario is 709ed075-6419-4e0b-928f-27050b8910ba and the request-uuid associated with the migration is req-c002f919-623f-4917-b3a4-e54fc60c063d.

[1] https://github.com/openstack/nova/blob/stable/train/nova/network/neutronv2/api.py#L82
[2] https://github.com/openstack/tempest/blob/master/tempest/api/compute/admin/test_migrations.py#L171
[3] https://rhos-ci-staging-jenkins.lab.eng.tlv2.redhat.com/job/DFG-compute-nova-16.1_director-rhel-virthost-1cont_1comp_1cellcont_2cellcomp_1ipa-ipv4-geneve-multi-cell-tls-everywhere-phase3/5/testReport/tempest.api.compute.admin.test_migrations/MigrationsAdminTest/test_revert_cold_migration_id_caa1aa8b_f4ef_4374_be0d_95f001c2ac2d_/

--- Additional comment from on 2021-05-14 13:42:39 UTC ---

Looking at the merged logs I can see the conductor does not have the neutron admin credentials:

2021-05-11 16:02:05.847 [nova-conductor.log] 7 DEBUG oslo_service.service [-] neutron.auth_section = None log_opt_values /usr/lib/python3.6/site-packages/oslo_config/cfg.py:2589
2021-05-11 16:02:05.847 [nova-conductor.log] 7 DEBUG oslo_service.service [-] neutron.auth_type = v3password log_opt_values /usr/lib/python3.6/site-packages/oslo_config/cfg.py:2589

so I suspect that this is why it's failing. This is the relevant log:

http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/staging/DFG-compute-nova-16.1_director-rhel-virthost-1cont_1comp_1cellcont_2cellcomp_1ipa-ipv4-geneve-multi-cell-tls-everywhere-phase3/5/controller-0/var/lib/config-data/nova/etc/nova/nova.conf.gz

# Authentication type to load (string value)
# Deprecated group;name - [neutron]/auth_plugin
#auth_type=<None>
auth_type=v3password

# Config Section from which to load plugin specific options (string value)
#auth_section=<None>

# Authentication URL (string value)
#auth_url=<None>
auth_url=https://overcloud.internalapi.redhat.local:5000/v3

# Scope for system operations (string value)
#system_scope=<None>

# Domain ID to scope to (string value)
#domain_id=<None>
# Domain name to scope to (string value)
#domain_name=<None>

# Project ID to scope to (string value)
#project_id=<None>

# Project name to scope to (string value)
#project_name=<None>
project_name=service

# Domain ID containing project (string value)
#project_domain_id=<None>

# Domain name containing project (string value)
#project_domain_name=<None>
project_domain_name=Default

# Trust ID (string value)
#trust_id=<None>

# Optional domain ID to use with v3 and v2 parameters. It will be used for both
# the user and project domain in v3 and ignored in v2 authentication (string
# value)
#default_domain_id=<None>

# Optional domain name to use with v3 API and v2 parameters. It will be used for
# both the user and project domain in v3 and ignored in v2 authentication
# (string value)
#default_domain_name=<None>

# User ID (string value)
#user_id=<None>

# Username (string value)
# Deprecated group;name - [neutron]/user_name
#username=<None>
username=neutron

# User's domain id (string value)
#user_domain_id=<None>

# User's domain name (string value)
#user_domain_name=<None>
user_domain_name=Default

# User's password (string value)
#password=<None>
password=qPe6rrMXgW2sFlD314F0K91v1

so it should be able to connect. The scheduler shares the same nova.conf, I believe, but it seems to think the auth type is None:

2021-05-11 16:45:27.998 [nova-cell-conductor.log] 23 ERROR nova.network.neutronv2.api [req-c002f919-623f-4917-b3a4-e54fc60c063d 637886e2f55840e6a3381c535ba7ec4f 20dd17d014fd442fbac7fdd6b9c006b6 - default default] The [neutron] section of your nova configuration file must be configured for authentication with the networking service endpoint.
See the networking service install guide for details: https://docs.openstack.org/neutron/latest/install/
2021-05-11 16:45:27.999 [nova-cell-conductor.log] 23 WARNING nova.scheduler.utils [req-c002f919-623f-4917-b3a4-e54fc60c063d 637886e2f55840e6a3381c535ba7ec4f 20dd17d014fd442fbac7fdd6b9c006b6 - default default] Failed to compute_task_migrate_server: Unknown auth type: None: neutronclient.common.exceptions.Unauthorized: Unknown auth type: None

Yet at startup it was detected correctly as v3password:

2021-05-11 16:02:08.801 [nova-scheduler.log] 7 DEBUG oslo_service.service [req-a383177d-fd3b-49e4-bfef-e844cf46dd83 - - - - -] neutron.auth_type = v3password

So something is obviously wrong here that is causing the construction of the neutron client to fail in nova.network.neutronv2.api.
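The failure mode seen in the logs can be modeled in a few lines. The following is a simplified sketch, not nova's actual code: the `Unauthorized` class, `load_auth_type`, and `get_auth_plugin` here are illustrative stand-ins for what nova/keystoneauth do when [neutron]/auth_type is unset, producing the same "Unknown auth type: None" message.

```python
import configparser

class Unauthorized(Exception):
    """Stand-in for neutronclient.common.exceptions.Unauthorized."""

# nova.conf as seen on the cell controller: [neutron] section present but empty
cell_conf = "[neutron]\n"
# nova.conf as seen on controller-0: auth_type is set
ctrl_conf = "[neutron]\nauth_type = v3password\n"

def load_auth_type(conf_text):
    """Read [neutron]/auth_type, returning None when unset (as crudini
    and oslo.config effectively do for a missing option)."""
    cp = configparser.ConfigParser()
    cp.read_string(conf_text)
    return cp.get("neutron", "auth_type", fallback=None)

def get_auth_plugin(conf_text):
    """Simplified model of the neutron client construction: with no
    auth_type the loader cannot select an auth plugin, so it raises."""
    auth_type = load_auth_type(conf_text)
    if auth_type is None:
        raise Unauthorized("Unknown auth type: %s" % auth_type)
    return auth_type

print(get_auth_plugin(ctrl_conf))   # v3password
try:
    get_auth_plugin(cell_conf)
except Unauthorized as e:
    print(e)                        # Unknown auth type: None
```

This matches the observed behavior: services reading controller-0's nova.conf build the client fine, while anything reading the cell controller's empty [neutron] section fails exactly this way.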
Created attachment 1790014 [details]
Compute tempest tests with cold migration tests for verification
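Alongside the tempest run, the per-node config mismatch can be checked directly. A minimal stdlib-only sketch that mirrors the `crudini --get <file> neutron auth_type` checks quoted earlier (the paths are the ones from this report; on a missing or unreadable file it simply reports None):

```python
import configparser

# Paths quoted earlier in this report (cell computes use the nova_libvirt
# config directory; the controllers use nova/).
NOVA_CONFS = [
    "/var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf",
    "/var/lib/config-data/puppet-generated/nova/etc/nova/nova.conf",
]

def neutron_auth_type(path):
    """Return [neutron]/auth_type from a nova.conf, or None if the option
    (or the file) is missing -- the same answer crudini --get would give."""
    cp = configparser.ConfigParser(strict=False, interpolation=None)
    cp.read(path)  # a missing file just yields an empty parser
    return cp.get("neutron", "auth_type", fallback=None)

for path in NOVA_CONFS:
    print(path, "->", neutron_auth_type(path))
```

A node where this prints None for its nova.conf is a node whose conductor/scheduler will hit the "Unknown auth type: None" error above.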
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform (RHOSP) 16.2 enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:3483