Created attachment 1106387: log from journalctl showing the exception that heat got while trying to create a domain

Description of problem:
The update fails (after ~2 hours of work) on ControllerOvercloudServicesDeployment_Step6. In deployment-show I see:

"deploy_stderr": "Warning: nmcli (1.0.6) and NetworkManager (1.0.0) versions don't match. Use --nocheck to suppress the warning.\nWarning: nmcli (1.0.6) and NetworkManager (1.0.0) versions don't match. Use --nocheck to suppress the warning.\nWarning: nmcli (1.0.6) and NetworkManager (1.0.0) versions don't match. Use --nocheck to suppress the warning.\nWarning: nmcli (1.0.6) and NetworkManager (1.0.0) versions don't match. Use --nocheck to suppress the warning.\nWarning: nmcli (1.0.6) and NetworkManager (1.0.0) versions don't match. Use --nocheck to suppress the warning.\nWarning: nmcli (1.0.6) and NetworkManager (1.0.0) versions don't match. Use --nocheck to suppress the warning.\nWarning: nmcli (1.0.6) and NetworkManager (1.0.0) versions don't match. Use --nocheck to suppress the warning.\n\u001b[1;31mWarning: Scope(Class[Keystone]): Execution of db_sync does not depend on $enabled anymore. Please use sync_db instead.\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Glance::Registry]): Execution of db_sync does not depend on $manage_service or $enabled anymore. Please use sync_db instead.\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::compute::vncproxy_host'; class ::nova::compute has not been evaluated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::compute::vncproxy_protocol'; class ::nova::compute has not been evaluated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::compute::vncproxy_port'; class ::nova::compute has not been evaluated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::compute::vncproxy_path'; class ::nova::compute has not been evaluated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Concat::Setup]): concat::setup is deprecated as a public API of the concat module and should no longer be directly included in the manifest.\u001b[0m\n\u001b[1;31mWarning: Scope(Swift::Storage::Server[6002]): The default incoming_chmod set to 0644 may yield in error prone directories and will be changed in a later release.\u001b[0m\n\u001b[1;31mWarning: Scope(Swift::Storage::Server[6002]): The default outgoing_chmod set to 0644 may yield in error prone directories and will be changed in a later release.\u001b[0m\n\u001b[1;31mWarning: Scope(Swift::Storage::Server[6001]): The default incoming_chmod set to 0644 may yield in error prone directories and will be changed in a later release.\u001b[0m\n\u001b[1;31mWarning: Scope(Swift::Storage::Server[6001]): The default outgoing_chmod set to 0644 may yield in error prone directories and will be changed in a later release.\u001b[0m\n\u001b[1;31mWarning: Scope(Swift::Storage::Server[6000]): The default incoming_chmod set to 0644 may yield in error prone directories and will be changed in a later release.\u001b[0m\n\u001b[1;31mWarning: Scope(Swift::Storage::Server[6000]): The default outgoing_chmod set to 0644 may yield in error prone directories and will be changed in a later release.\u001b[0m\n\u001b[1;31mError: heat-keystone-setup-domain returned 1 instead of one of [0]\u001b[0m\n\u001b[1;31mError: /Stage[main]/Heat::Keystone::Domain/Exec[heat_domain_create]/returns: change from notrun to 0 failed: heat-keystone-setup-domain returned 1 instead of one of [0]\u001b[0m\n",

Note the error at the end: "heat-keystone-setup-domain returned 1 instead of one of [0]"

In the journalctl output of os-collect-config there is an exception showing that heat got an authorization failure while trying to create the domain. See the attachment for the relevant snippet from the log.

I ran the command manually on the failed controller, and it succeeded. However, I am not sure whether the command I ran is exactly the same as the one the update ran. Here is the command I ran and its output:

[root@overcloud-controller-0 ~]# heat-keystone-setup-domain --stack-domain-admin heat_admin --stack-domain-admin-password rh0s6 --stack-user-domain-name heat
Please update your heat.conf with the following in [DEFAULT]

stack_user_domain_id=1eb979aba0aa43ae8323a2a7c2135669
stack_domain_admin=heat_admin
stack_domain_admin_password=rh0s6

Version-Release number of selected component (if applicable):
python-rdomanager-oscplugin-0.0.10-22.el7ost.noarch

How reproducible:
100% of the time on my bare metal setup. It is impossible to update.

Steps to Reproduce:
1. Deploy 3 controllers and 1 compute on bare metal with osp-d 7.1.
2. Update to 7.2 according to the kbase article.

Actual results:
The update fails after ~2 hours.

Additional info:
There is an older bug describing something somewhat similar: https://bugzilla.redhat.com/show_bug.cgi?id=1204866
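With this many warnings in deploy_stderr, the actual failure at the end is easy to miss. The following is a minimal Python sketch for pulling out only the Puppet "Error:" lines. It assumes the deploy_stderr value has been copied verbatim from the deployment-show output as a quoted JSON string. The function name puppet_errors and the ANSI-stripping regex are hypothetical helpers for illustration, not part of any OpenStack tool.

```python
import json
import re
from typing import List

# Matches ANSI color sequences such as ESC[1;31m and ESC[0m.
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")


def puppet_errors(deploy_stderr_json: str) -> List[str]:
    """Return the Puppet "Error:" lines from a JSON-encoded deploy_stderr value.

    deploy_stderr_json is the quoted value exactly as deployment-show prints it,
    so json.loads() turns the \\n and \\u001b escapes into real characters.
    """
    text = json.loads(deploy_stderr_json)
    # Strip color codes from each line, then keep only the error lines.
    cleaned = (ANSI_ESCAPE.sub("", line).strip() for line in text.splitlines())
    return [line for line in cleaned if line.startswith("Error:")]


if __name__ == "__main__":
    # Shortened sample: one nmcli warning followed by the first error line.
    sample = (
        '"Warning: nmcli (1.0.6) and NetworkManager (1.0.0) versions don\'t match.\\n'
        '\\u001b[1;31mError: heat-keystone-setup-domain returned 1 '
        'instead of one of [0]\\u001b[0m\\n"'
    )
    for error in puppet_errors(sample):
        print(error)  # prints: Error: heat-keystone-setup-domain returned 1 instead of one of [0]
```

Applied to the full deploy_stderr value in this report, the sketch would return just the two Error lines about heat-keystone-setup-domain and Exec[heat_domain_create].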
What were the initial deployment command and the update command? Can you attach (or make available somewhere) all the custom environment YAML files being used?
This was due to the KeystoneAdmin network changing:

Notice: /Stage[main]/Keystone/Keystone_config[DEFAULT/admin_bind_host]/value: value changed '172.16.0.35' to '10.35.191.15'
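A change like this is buried among many Notice lines in a full Puppet run. The following is a minimal Python sketch for scanning Puppet output for Keystone bind-host changes. It assumes the output is available as plain text, for example from the os-collect-config journal. The function name bind_host_changes and its regex are hypothetical, and the regex is written only against the Notice line quoted above.

```python
import re
from typing import List, Tuple

# Matches Puppet notices of the form:
#   Keystone_config[DEFAULT/admin_bind_host]/value: value changed 'OLD' to 'NEW'
# Any DEFAULT/*bind_host option is captured (e.g. admin_bind_host).
BIND_HOST_CHANGE = re.compile(
    r"Keystone_config\[DEFAULT/(\w*bind_host)\]/value: "
    r"value changed '([^']*)' to '([^']*)'"
)


def bind_host_changes(puppet_output: str) -> List[Tuple[str, str, str]]:
    """Return (option, old_value, new_value) for each Keystone bind-host change."""
    return BIND_HOST_CHANGE.findall(puppet_output)


if __name__ == "__main__":
    notice = (
        "Notice: /Stage[main]/Keystone/Keystone_config[DEFAULT/admin_bind_host]"
        "/value: value changed '172.16.0.35' to '10.35.191.15'"
    )
    for option, old, new in bind_host_changes(notice):
        print(f"{option}: {old} -> {new}")  # prints: admin_bind_host: 172.16.0.35 -> 10.35.191.15
```

An empty result for an update run would suggest that Puppet did not touch the Keystone bind addresses, at least not in this log format.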