Description of problem:
Attempt to update the overcloud using custom passwords: 3 controllers + 1 compute.

TASK [Run puppet host configuration for step 3] ********************************
Wednesday 21 November 2018  07:41:34 -0500 (0:00:00.238)       0:15:53.747 ****
changed: [compute-0] => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": true}

Ansible failed, check log at /var/lib/mistral/overcloud/ansible.log.
Overcloud configuration failed.

cat overcloud_deploy.sh
#!/bin/bash
openstack overcloud deploy \
--timeout 100 \
--templates /usr/share/openstack-tripleo-heat-templates \
--stack overcloud \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
-e /home/stack/virt/config_lvm.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/enable-tls.yaml \
-e /home/stack/virt/inject-trust-anchor.yaml \
-e /home/stack/virt/public_vip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
-e /home/stack/virt/hostnames.yml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/nodes_data.yaml \
-e ~/containers-prepare-parameter.yaml \
-e ~/tripleo-overcloud-passwords.yaml \
--log-file overcloud_deployment_82.log

cat tripleo-overcloud-passwords.yaml
parameter_defaults:
  NeutronMetadataProxySharedSecret: apassword
  GlancePassword: apassword
  NovaPassword: apassword
  GnocchiPassword: apassword
  HeatPassword: apassword
  RedisPassword: apassword
  CinderPassword: apassword
  SwiftPassword: apassword
  AdminToken: apassword
  HaproxyStatsPassword: apassword
  NeutronPassword: apassword
  CeilometerPassword: apassword
  AdminPassword: apassword
  MysqlClustercheckPassword: apassword

[heat-admin@controller-0 ~]$ sudo docker ps -a | grep "Exited (1)"
64c5cfbc5c0a  192.168.24.1:8787/rhosp14/openstack-glance-api:2018-11-09.3  "/usr/bin/bootstra..."  2 hours ago  Exited (1) 2 hours ago  glance_api_db_sync

[heat-admin@controller-0 ~]$ sudo grep "apassword" /var/lib/config-data/puppet-generated/glance_api/etc/glance/glance-api.conf
connection=mysql+pymysql://glance:apassword.1.19/glance?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf
password=apassword

In the keystone_db_sync container:
()[root@controller-0 /]# grep mysql /etc/keystone/keystone.conf
connection=mysql+pymysql://keystone:apassword.1.19/keystone?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf

From /var/log/containers/keystone/keystone.log:
2018-11-21 12:43:48.328 26 DEBUG migrate.versioning.script.base [-] Script /usr/lib/python2.7/site-packages/keystone/common/sql/migrate_repo/versions/109_add_password_self_service_column.py loaded successfully __init__ /usr/lib/python2.7/site-packages/migrate/versioning/script/base.py:30
2018-11-21 12:43:48.328 26 DEBUG migrate.versioning.repository [-] Repository /usr/lib/python2.7/site-packages/keystone/common/sql/migrate_repo loaded successfully __init__ /usr/lib/python2.7/site-packages/migrate/versioning/repository.py:82
2018-11-21 12:43:48.328 26 DEBUG migrate.versioning.repository [-] Config: OrderedDict([('db_settings', OrderedDict([('__name__', 'db_settings'), ('repository_id', 'keystone'), ('version_table', 'migrate_version'), ('required_dbs', '[]'), ('use_timestamp_numbering', 'False')]))]) __init__ /usr/lib/python2.7/site-packages/migrate/versioning/repository.py:83
2018-11-21 12:43:48.342 26 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -1 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-11-21 12:43:58.353 26 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -2 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-11-21 12:44:08.364 26 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -3 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-11-21 12:44:18.374 26 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -4 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-11-21 12:44:28.385 26 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -5 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-9.0.1-0.20181013060879.el7ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy OSP14
2. Create a custom env file with the overcloud passwords and append it to the overcloud_deploy.sh script
3. Run overcloud_deploy.sh to perform the stack update

Actual results:
Update fails due to timeout; the keystone_db_sync container is stuck and glance_api_db_sync failed to start.

Expected results:
update_complete

Additional info:
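A quick way to confirm whether a regenerated service config actually picked up the new password is to extract the password embedded in its oslo.db connection URL and compare it with the value from the passwords environment file. This is a hedged sketch, not part of the reported procedure: the sample connection line below is hardcoded for illustration; on a real controller it would come from e.g. `sudo grep ^connection /var/lib/config-data/puppet-generated/glance_api/etc/glance/glance-api.conf`.

```shell
# Extract the DB password from an oslo.db connection URL (user:password@host).
# The conf_line value here is a representative sample, not live output.
conf_line='connection=mysql+pymysql://glance:apassword@172.17.1.19/glance?read_default_group=tripleo'
db_pass=$(printf '%s\n' "$conf_line" | sed -n 's|^connection=.*://[^:/]*:\([^@]*\)@.*|\1|p')
echo "$db_pass"
```

Comparing `db_pass` against the value in tripleo-overcloud-passwords.yaml tells you whether docker-puppet regenerated that service's config.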
OK, quick update. The MysqlClustercheckPassword cannot currently be updated, because we lack the orchestration mechanism to tell the galera resource agent to stop polling the galera database with the old credentials and start using the new ones. However, these credentials are only used to check whether mysql is running, so let's put that aside.

The stack redeploy succeeds in updating almost all the passwords mentioned in the bug report, except NovaPassword. This confirms that the general password update mechanism is working.

When trying to update the NovaPassword, the following happens in sequence:

. docker-puppet regenerates the configs for all the nova services and stores them in /var/lib/config-data/puppet-generated/nova*
. container mysql_init_bundle is restarted, and runs puppet code that updates the passwords in the mysql db for users nova and nova_api.
. all the nova containers are restarted due to the config change.

When logging on to the env after the update failure, I can see that the mysql password update was successful:

[root@controller-0 e]# mysql -unova -papassword -h'fd00:fd00:fd00:2000::14'
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 647353
Server version: 10.1.20-MariaDB MariaDB Server

I also see that the nova services got restarted and are running successfully, except nova_api_discover_hosts:

CONTAINER ID  IMAGE                                                                COMMAND                 CREATED       STATUS                   PORTS  NAMES
138619e78ec9  192.168.24.1:8787/rhosp14/openstack-nova-api:2018-11-29.2            "/usr/bin/bootstra..."  47 hours ago  Exited (1) 47 hours ago         nova_api_discover_hosts
ae84b307bbfa  192.168.24.1:8787/rhosp14/openstack-nova-api:2018-11-29.2            "kolla_start"           47 hours ago  Up 47 hours (healthy)           nova_metadata
f4cebe8bcda5  192.168.24.1:8787/rhosp14/openstack-nova-api:2018-11-29.2            "kolla_start"           47 hours ago  Up 47 hours (healthy)           nova_api
674833dabbf2  192.168.24.1:8787/rhosp14/openstack-nova-scheduler:2018-11-29.2      "kolla_start"           47 hours ago  Up 47 hours (healthy)           nova_scheduler
b81c2fce0a2c  192.168.24.1:8787/rhosp14/openstack-nova-novncproxy:2018-11-29.2     "kolla_start"           47 hours ago  Up 47 hours (unhealthy)         nova_vnc_proxy
3b7ae192e44a  192.168.24.1:8787/rhosp14/openstack-nova-consoleauth:2018-11-29.2    "kolla_start"           47 hours ago  Up 47 hours (healthy)           nova_consoleauth
e5436e6b1d5c  192.168.24.1:8787/rhosp14/openstack-nova-api:2018-11-29.2            "kolla_start"           47 hours ago  Up 47 hours                     nova_api_cron
1f4da6336b14  192.168.24.1:8787/rhosp14/openstack-nova-conductor:2018-11-29.2      "kolla_start"           47 hours ago  Up 47 hours (healthy)           nova_conductor
5cbd618f9d18  192.168.24.1:8787/rhosp14/openstack-nova-placement-api:2018-11-29.2  "kolla_start"           47 hours ago  Up 47 hours (healthy)           nova_placement

All those containers got restarted after mysql_init_bundle changed the nova passwords:

[root@controller-0 e]# docker inspect 2>&1 mysql_init_bundle | grep -i started
    "StartedAt": "2018-12-04T21:38:27.867555412Z",

And I know that the nova containers are successfully using the new credentials to connect to the db:

[root@controller-0 e]# docker cp nova_api:/etc/nova/nova.conf - | tar xO | grep ^connection=mysql
connection=mysql+pymysql://nova_api:apassword@[fd00:fd00:fd00:2000::14]/nova_api?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf
connection=mysql+pymysql://nova:apassword@[fd00:fd00:fd00:2000::14]/nova?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf
connection=mysql+pymysql://nova_api:apassword@[fd00:fd00:fd00:2000::14]/nova_api?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf

[root@controller-0 e]# docker cp nova_api_discover_hosts:/etc/nova/nova.conf - | tar xO | grep ^connection=mysql
connection=mysql+pymysql://nova_api:apassword@[fd00:fd00:fd00:2000::14]/nova_api?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf
connection=mysql+pymysql://nova:apassword@[fd00:fd00:fd00:2000::14]/nova?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf
connection=mysql+pymysql://nova_api:apassword@[fd00:fd00:fd00:2000::14]/nova_api?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf

All this points to the container nova_api_discover_hosts misbehaving, ultimately yielding a failure:

"stdout: (cellv2) Running cell_v2 host discovery",
"(cellv2) Waiting 600 seconds for hosts to register",
"(cellv2) compute node compute-0.localdomain has not registered",
"(cellv2) compute node compute-1.localdomain has not registered",
"(cellv2) Waiting 597 seconds for hosts to register",
"(cellv2) Waiting 565 seconds for hosts to register",
"(cellv2) Waiting 532 seconds for hosts to register",
"(cellv2) Waiting 500 seconds for hosts to register",
"(cellv2) Waiting 467 seconds for hosts to register",
"(cellv2) Waiting 435 seconds for hosts to register",
"(cellv2) Waiting 402 seconds for hosts to register",
"(cellv2) Waiting 370 seconds for hosts to register",
"(cellv2) Waiting 338 seconds for hosts to register",
"(cellv2) Waiting 305 seconds for hosts to register",
"(cellv2) Waiting 273 seconds for hosts to register",
"(cellv2) Waiting 240 seconds for hosts to register",
"(cellv2) Waiting 208 seconds for hosts to register",
"(cellv2) Waiting 176 seconds for hosts to register",
"(cellv2) Waiting 143 seconds for hosts to register",
"(cellv2) Waiting 111 seconds for hosts to register",
"(cellv2) Waiting 78 seconds for hosts to register",
"(cellv2) Waiting 46 seconds for hosts to register",
"(cellv2) Waiting 14 seconds for hosts to register",
"(cellv2) WARNING: timeout waiting for nodes to register, running host discovery regardless",
"(cellv2) Expected host list: compute-0.localdomain compute-1.localdomain",
"(cellv2) Detected host list:",
"(cellv2) Running host discovery...",
"Found 2 cell mappings.",
"Skipping cell0 since it does not contain hosts.",
"Getting computes from cell 'default': d4d1e1c1-bede-4ac8-834b-6bb53f1d4401",
"An error has occurred:",
"Traceback (most recent call last):",
"  File \"/usr/lib/python2.7/site-packages/nova/cmd/manage.py\", line 2310, in main",
"    ret = fn(*fn_args, **fn_kwargs)",
"  File \"/usr/lib/python2.7/site-packages/nova/cmd/manage.py\", line 1426, in discover_hosts",
"    by_service)",
"  File \"/usr/lib/python2.7/site-packages/nova/objects/host_mapping.py\", line 265, in discover_hosts",
"  File \"/usr/lib/python2.7/site-packages/nova/objects/host_mapping.py\", line 221, in _check_and_create_host_mappings",
"    ctxt, 'nova-compute', include_disabled=True)",
"  File \"/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py\", line 184, in wrapper",
"    result = fn(cls, context, *args, **kwargs)",
"  File \"/usr/lib/python2.7/site-packages/nova/objects/service.py\", line 586, in get_by_binary",
[...]
And with the end of the stack trace:

[...]
"    self.connect()",
"  File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 932, in connect",
"    self._request_authentication()",
"  File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 1152, in _request_authentication",
"    auth_packet = self._read_packet()",
"  File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 1014, in _read_packet",
"    packet.check_error()",
"  File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 393, in check_error",
"    err.raise_mysql_exception(self._data)",
"  File \"/usr/lib/python2.7/site-packages/pymysql/err.py\", line 107, in raise_mysql_exception",
"    raise errorclass(errno, errval)",
"OperationalError: (pymysql.err.OperationalError) (1045, u\"Access denied for user 'nova'@'fd00:fd00:fd00:2000::15' (using password: YES)\") (Background on this error at: http://sqlalche.me/e/e3q8)",
"stdout: 44ba425004cc4e79bac175f40873a21494d75f92b741426d2b4e344b02e82c67"

So nova_api_discover_hosts.sh extracted some invalid credentials that made it try to connect as user 'nova'@'fd00:fd00:fd00:2000::15', for which there is no user defined in the mysql database:

MariaDB [(none)]> select user,host from mysql.user where user like 'nova%';
+----------------+-------------------------+
| user           | host                    |
+----------------+-------------------------+
| nova           | %                       |
| nova_api       | %                       |
| nova_placement | %                       |
| nova           | fd00:fd00:fd00:2000::12 |
| nova_api       | fd00:fd00:fd00:2000::12 |
| nova_placement | fd00:fd00:fd00:2000::12 |
| nova           | fd00:fd00:fd00:2000::14 |
| nova_api       | fd00:fd00:fd00:2000::14 |
| nova_placement | fd00:fd00:fd00:2000::14 |
+----------------+-------------------------+

In fact, the IP fd00:fd00:fd00:2000::15 is that of controller-2, but it should not be used, as it may be deleted by galera at any time during an SST synchronization.
If the nova services want to access mysql via a controller-local NIC, then nova must create the three users (one per controller) in the DB at stack creation time, to make sure the DB will not hold controller-specific data.
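To make the point above concrete, here is a hedged sketch that only generates the GRANT statements such a scheme would need, one per controller address; it does not run them. The IPs are the ones observed in this report, and '...' deliberately stands in for the real password (elided here).

```shell
# Generate one GRANT per host a nova client may connect from.
# Output only; nothing is applied to any database.
for host in '%' 'fd00:fd00:fd00:2000::12' 'fd00:fd00:fd00:2000::14' 'fd00:fd00:fd00:2000::15'; do
  printf "GRANT ALL PRIVILEGES ON nova.* TO 'nova'@'%s' IDENTIFIED BY '...';\n" "$host"
done
```

The '%' wildcard entry already exists, but MariaDB matches the most specific host first, so the per-IP entries must all carry the current password too.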
Summary of what we got so far: in the deploy process, the database_connection for the cell is not being updated, and therefore the discover_hosts command fails to connect to the db, as we still have the old password in the database:

()[root@controller-0 /]# su nova -s /bin/bash -c "/usr/bin/nova-manage cell_v2 discover_hosts --by-service --verbose"

Here we still see the old nova password in the cell_mappings table:

MariaDB [nova_api]> select * from cell_mappings;
| created_at          | updated_at | id | uuid                                 | name    | transport_url | database_connection | disabled |
| 2018-12-04 17:30:54 | NULL       | 2  | 00000000-0000-0000-0000-000000000000 | cell0   | none:/// | mysql+pymysql://nova:LcJtm4cGAjgWgEFKuAEjPUp2l@[fd00:fd00:fd00:2000::14]/nova_cell0?read_default_file=/etc/my.cnf.d/tripleo.cnf&read_default_group=tripleo | 0 |
| 2018-12-04 17:31:02 | NULL       | 5  | d4d1e1c1-bede-4ac8-834b-6bb53f1d4401 | default | rabbit://guest:r9BJD5J4VybUBgZGIVOJe89uK.localdomain:5672,guest:r9BJD5J4VybUBgZGIVOJe89uK.localdomain:5672,guest:r9BJD5J4VybUBgZGIVOJe89uK.localdomain:5672/?ssl=0 | mysql+pymysql://nova:LcJtm4cGAjgWgEFKuAEjPUp2l@[fd00:fd00:fd00:2000::14]/nova?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf | 0 |
2 rows in set (0.00 sec)

When we update the database_connection, the discover_hosts command works:

()[root@controller-0 /]$ nova-manage cell_v2 update_cell --name='default' --database_connection='mysql+pymysql://nova:apassword@[fd00:fd00:fd00:2000::14]/nova?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf'

()[root@controller-0 /]$ su nova -s /bin/bash -c "/usr/bin/nova-manage cell_v2 list_cells"
| Name    | UUID                                 | Transport URL                                                        | Database Connection | Disabled |
| cell0   | 00000000-0000-0000-0000-000000000000 | none:/                                                               | mysql+pymysql://nova:****@[fd00:fd00:fd00:2000::14]/nova_cell0?read_default_file=/etc/my.cnf.d/tripleo.cnf&read_default_group=tripleo | False |
| default | d4d1e1c1-bede-4ac8-834b-6bb53f1d4401 | rabbit://guest:****@controller-2.internalapi.localdomain:5672/?ssl=0 | mysql+pymysql://nova:****@[fd00:fd00:fd00:2000::14]/nova?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf | False |

()[root@controller-0 /]$ su nova -s /bin/bash -c "/usr/bin/nova-manage --debug cell_v2 discover_hosts --by-service --verbose"
Found 2 cell mappings.
Skipping cell0 since it does not contain hosts.
Getting computes from cell 'default': d4d1e1c1-bede-4ac8-834b-6bb53f1d4401
Found 0 unmapped computes in cell: d4d1e1c1-bede-4ac8-834b-6bb53f1d4401
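The manual workaround above can be sketched as a small script: rebuild the cell's database_connection with the new password and apply it with nova-manage. The values are the ones from this report; the nova-manage call is left commented because it only makes sense inside a container on a live controller.

```shell
# Rebuild the 'default' cell's database_connection with the new password.
NEW_PASS='apassword'
DB_HOST='[fd00:fd00:fd00:2000::14]'
NEW_URL="mysql+pymysql://nova:${NEW_PASS}@${DB_HOST}/nova?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf"
echo "$NEW_URL"
# Apply on a live controller (run inside the nova_api container):
# nova-manage cell_v2 update_cell --name='default' --database_connection="$NEW_URL"
```

After the update_cell call, discover_hosts can authenticate again because the cell record and the running config finally agree on the password.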
Note: I was able to successfully change the nova password using the templated DB cell URLs from the WIP patch in BZ1613949.

parameter_defaults:
  NovaPassword: apassword

[root@controller-0 ~]# mysql -u nova -papassword -h 172.17.1.10 nova -e "select * from services;" | wc -l
18
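For readers unfamiliar with the templated-URL approach referenced above: the cell record stores placeholders instead of literal credentials, and nova resolves them from its own config at runtime, so a password change no longer requires rewriting the cell_mappings row. The shell substitution below only mimics that resolution for illustration; the placeholder names used here are an assumption, not taken from the patch.

```shell
# Mimic template resolution: placeholders in the stored cell URL are
# replaced with values that nova would read from nova.conf at runtime.
template='mysql+pymysql://{username}:{password}@{hostname}/nova'
username='nova'; password='apassword'; hostname='[fd00:fd00:fd00:2000::14]'
resolved=$(printf '%s\n' "$template" | sed -e "s/{username}/$username/" -e "s/{password}/$password/" -e "s/{hostname}/$hostname/")
echo "$resolved"
```

With this scheme, changing NovaPassword only touches nova.conf, which docker-puppet already regenerates correctly, so the stale cell_mappings problem disappears.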
Changing NovaPassword works in OSP14 with the following commit:

$ git branch -a --contains 9c4fcade65b12048c43ead134e47063a9facadb8
  remotes/gerrit/stable/rocky
  remotes/openstack/stable/rocky
  remotes/rhos/rhos-14.0-patches

$ git show 9c4fcade65b12048c43ead134e47063a9facadb8
commit 9c4fcade65b12048c43ead134e47063a9facadb8
Author: Rabi Mishra <ramishra>
Date:   Thu Nov 29 15:07:13 2018 +0530

    Mount config-data/puppet-generated/nova for nova_api_ensure_default_cell
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0878