Bug 1652105
| Summary: | [OSP14] cannot update overcloud using custom NovaPassword tripleo password | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Artem Hrechanychenko <ahrechan> |
| Component: | openstack-tripleo-heat-templates | Assignee: | Rajesh Tailor <ratailor> |
| Status: | CLOSED ERRATA | QA Contact: | Archit Modi <amodi> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 14.0 (Rocky) | CC: | ahrechan, aschultz, ccopello, chjones, dciabrin, hrybacki, lyarwood, mbooth, mburns, mschuppe, ratailor, rheslop, rmascena |
| Target Milestone: | z2 | Keywords: | TestOnly, Triaged, ZStream |
| Target Release: | 14.0 (Rocky) | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | openstack-tripleo-heat-templates-9.2.1-0.20190119154859.fe11ade.el7ost | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-04-30 17:51:14 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
OK quick update,
The MysqlClustercheckPassword cannot be updated currently because we lack the orchestration mechanism to tell the galera resource agent to stop polling the galera database with the old credentials and start using the new ones. However, these credentials are only used to check whether mysql is running, so let's set that aside.
The stack redeploy succeeds in updating almost all the passwords mentioned in the bug report, except NovaPassword. This confirms that the general password update mechanism is working.
When trying to update the NovaPassword, the following happens in sequence:
1. docker-puppet regenerates the configs for all the nova services and stores them in /var/lib/config-data/puppet-generated/nova*
2. container mysql_init_bundle is restarted and runs puppet code that updates the passwords in the mysql db for users nova and nova_api.
3. all the nova containers are restarted due to the config change.
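The ordering between steps 2 and 3 can be sanity-checked from the StartedAt timestamps that `docker inspect -f '{{.State.StartedAt}}'` reports for each container. A minimal sketch; the first timestamp is the mysql_init_bundle start time seen later in this report, the second stands in (hypothetically) for a nova container's start time:

```shell
# Placeholder timestamps; in practice both would come from
#   docker inspect -f '{{.State.StartedAt}}' <container>
init_started='2018-12-04T21:38:27.867555412Z'   # mysql_init_bundle (from this report)
nova_started='2018-12-04T21:40:02.000000000Z'   # hypothetical nova container start

# GNU date parses the ISO 8601 timestamps; compare as epoch seconds
if [ "$(date -d "$nova_started" +%s)" -ge "$(date -d "$init_started" +%s)" ]; then
  echo "nova container restarted after mysql_init_bundle"
fi
```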
When logging into the env after the update failure, I can see that the mysql password update was successful:
[root@controller-0 e]# mysql -unova -papassword -h'fd00:fd00:fd00:2000::14'
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 647353
Server version: 10.1.20-MariaDB MariaDB Server
I also see that the nova services got restarted and are running successfully, except nova_api_discover_hosts:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
138619e78ec9 192.168.24.1:8787/rhosp14/openstack-nova-api:2018-11-29.2 "/usr/bin/bootstra..." 47 hours ago Exited (1) 47 hours ago nova_api_discover_hosts
ae84b307bbfa 192.168.24.1:8787/rhosp14/openstack-nova-api:2018-11-29.2 "kolla_start" 47 hours ago Up 47 hours (healthy) nova_metadata
f4cebe8bcda5 192.168.24.1:8787/rhosp14/openstack-nova-api:2018-11-29.2 "kolla_start" 47 hours ago Up 47 hours (healthy) nova_api
674833dabbf2 192.168.24.1:8787/rhosp14/openstack-nova-scheduler:2018-11-29.2 "kolla_start" 47 hours ago Up 47 hours (healthy) nova_scheduler
b81c2fce0a2c 192.168.24.1:8787/rhosp14/openstack-nova-novncproxy:2018-11-29.2 "kolla_start" 47 hours ago Up 47 hours (unhealthy) nova_vnc_proxy
3b7ae192e44a 192.168.24.1:8787/rhosp14/openstack-nova-consoleauth:2018-11-29.2 "kolla_start" 47 hours ago Up 47 hours (healthy) nova_consoleauth
e5436e6b1d5c 192.168.24.1:8787/rhosp14/openstack-nova-api:2018-11-29.2 "kolla_start" 47 hours ago Up 47 hours nova_api_cron
1f4da6336b14 192.168.24.1:8787/rhosp14/openstack-nova-conductor:2018-11-29.2 "kolla_start" 47 hours ago Up 47 hours (healthy) nova_conductor
5cbd618f9d18 192.168.24.1:8787/rhosp14/openstack-nova-placement-api:2018-11-29.2 "kolla_start" 47 hours ago Up 47 hours (healthy) nova_placement
All those containers got restarted after the mysql_init_bundle changed the nova passwords:
[root@controller-0 e]# docker inspect 2>&1 mysql_init_bundle | grep -i started
"StartedAt": "2018-12-04T21:38:27.867555412Z",
And I know that the nova containers are using the new credentials successfully to connect to the db:
[root@controller-0 e]# docker cp nova_api:/etc/nova/nova.conf - | tar xO | grep ^connection=mysql
connection=mysql+pymysql://nova_api:apassword@[fd00:fd00:fd00:2000::14]/nova_api?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf
connection=mysql+pymysql://nova:apassword@[fd00:fd00:fd00:2000::14]/nova?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf
connection=mysql+pymysql://nova_api:apassword@[fd00:fd00:fd00:2000::14]/nova_api?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf
[root@controller-0 e]# docker cp nova_api_discover_hosts:/etc/nova/nova.conf - | tar xO | grep ^connection=mysql
connection=mysql+pymysql://nova_api:apassword@[fd00:fd00:fd00:2000::14]/nova_api?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf
connection=mysql+pymysql://nova:apassword@[fd00:fd00:fd00:2000::14]/nova?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf
connection=mysql+pymysql://nova_api:apassword@[fd00:fd00:fd00:2000::14]/nova_api?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf
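A quick way to compare what each extracted config actually carries is to parse the password out of the oslo.db connection URL. A self-contained sketch, where /tmp/sample-nova.conf is a stand-in for the nova.conf files pulled out of the containers above:

```shell
# Stand-in for a config extracted with: docker cp <container>:/etc/nova/nova.conf -
cat > /tmp/sample-nova.conf <<'EOF'
[database]
connection=mysql+pymysql://nova:apassword@[fd00:fd00:fd00:2000::14]/nova?read_default_group=tripleo
EOF

# The user:password pair sits between '://' and the first '@' in the URL;
# strip everything else and keep the part after the last ':'
grep '^connection=' /tmp/sample-nova.conf \
  | sed -e 's|.*://||' -e 's|@.*||' -e 's|.*:||'
# -> apassword
```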
All this points to container nova_api_discover_hosts misbehaving, ultimately yielding a failure:
"stdout: (cellv2) Running cell_v2 host discovery",
"(cellv2) Waiting 600 seconds for hosts to register",
"(cellv2) compute node compute-0.localdomain has not registered",
"(cellv2) compute node compute-1.localdomain has not registered",
"(cellv2) Waiting 597 seconds for hosts to register",
"(cellv2) Waiting 565 seconds for hosts to register",
"(cellv2) Waiting 532 seconds for hosts to register",
"(cellv2) Waiting 500 seconds for hosts to register",
"(cellv2) Waiting 467 seconds for hosts to register",
"(cellv2) Waiting 435 seconds for hosts to register",
"(cellv2) Waiting 402 seconds for hosts to register",
"(cellv2) Waiting 370 seconds for hosts to register",
"(cellv2) Waiting 338 seconds for hosts to register",
"(cellv2) Waiting 305 seconds for hosts to register",
"(cellv2) Waiting 273 seconds for hosts to register",
"(cellv2) Waiting 240 seconds for hosts to register",
"(cellv2) Waiting 208 seconds for hosts to register",
"(cellv2) Waiting 176 seconds for hosts to register",
"(cellv2) Waiting 143 seconds for hosts to register",
"(cellv2) Waiting 111 seconds for hosts to register",
"(cellv2) Waiting 78 seconds for hosts to register",
"(cellv2) Waiting 46 seconds for hosts to register",
"(cellv2) Waiting 14 seconds for hosts to register",
"(cellv2) WARNING: timeout waiting for nodes to register, running host discovery regardless",
"(cellv2) Expected host list: compute-0.localdomain compute-1.localdomain",
"(cellv2) Detected host list:",
"(cellv2) Running host discovery...",
"Found 2 cell mappings.",
"Skipping cell0 since it does not contain hosts.",
"Getting computes from cell 'default': d4d1e1c1-bede-4ac8-834b-6bb53f1d4401",
"An error has occurred:",
"Traceback (most recent call last):",
" File \"/usr/lib/python2.7/site-packages/nova/cmd/manage.py\", line 2310, in main",
" ret = fn(*fn_args, **fn_kwargs)",
" File \"/usr/lib/python2.7/site-packages/nova/cmd/manage.py\", line 1426, in discover_hosts",
" by_service)",
" File \"/usr/lib/python2.7/site-packages/nova/objects/host_mapping.py\", line 265, in discover_hosts",
" File \"/usr/lib/python2.7/site-packages/nova/objects/host_mapping.py\", line 221, in _check_and_create_host_mappings",
" ctxt, 'nova-compute', include_disabled=True)",
" File \"/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py\", line 184, in wrapper",
" result = fn(cls, context, *args, **kwargs)",
" File \"/usr/lib/python2.7/site-packages/nova/objects/service.py\", line 586, in get_by_binary",
[...]
and with the end of the stack trace:
[...]
" self.connect()",
" File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 932, in connect",
" self._request_authentication()",
" File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 1152, in _request_authentication",
" auth_packet = self._read_packet()",
" File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 1014, in _read_packet",
" packet.check_error()",
" File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 393, in check_error",
" err.raise_mysql_exception(self._data)",
" File \"/usr/lib/python2.7/site-packages/pymysql/err.py\", line 107, in raise_mysql_exception",
" raise errorclass(errno, errval)",
"OperationalError: (pymysql.err.OperationalError) (1045, u\"Access denied for user 'nova'@'fd00:fd00:fd00:2000::15' (using password: YES)\") (Background on this error at: http://sqlalche.me/e/e3q8)",
"stdout: 44ba425004cc4e79bac175f40873a21494d75f92b741426d2b4e344b02e82c67"
So nova_api_discover_hosts.sh extracted invalid credentials that made it try to connect as user 'nova'@'fd00:fd00:fd00:2000::15', for which no user is defined in the mysql database:
MariaDB [(none)]> select user,host from mysql.user where user like 'nova%';
+----------------+-------------------------+
| user | host |
+----------------+-------------------------+
| nova | % |
| nova_api | % |
| nova_placement | % |
| nova | fd00:fd00:fd00:2000::12 |
| nova_api | fd00:fd00:fd00:2000::12 |
| nova_placement | fd00:fd00:fd00:2000::12 |
| nova | fd00:fd00:fd00:2000::14 |
| nova_api | fd00:fd00:fd00:2000::14 |
| nova_placement | fd00:fd00:fd00:2000::14 |
+----------------+-------------------------+
In fact, this IP fd00:fd00:fd00:2000::15 is that of controller-2, but it should not be used, as galera may delete it at any time during an SST synchronization. If the nova services want to access mysql via a controller-local NIC, then nova must create three users in the DB at stack creation time, to make sure the DB does not hold controller-specific data.
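A rough sketch of that suggestion: generate the per-controller account statements that would have to be applied at stack-creation time. The three IPs are the controller internal-API addresses visible in the mysql.user listing above, and 'apassword' is this report's placeholder password; the script only prints the SQL so it can be reviewed before being fed to mysql:

```shell
# Generate one nova account per controller internal-API IP (placeholder
# password); print the SQL rather than executing it against the DB.
sql=""
for ip in 'fd00:fd00:fd00:2000::12' 'fd00:fd00:fd00:2000::14' 'fd00:fd00:fd00:2000::15'; do
  sql="${sql}CREATE USER IF NOT EXISTS 'nova'@'${ip}' IDENTIFIED BY 'apassword';
GRANT ALL PRIVILEGES ON nova.* TO 'nova'@'${ip}';
"
done
printf '%s' "$sql"
```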
Summary of what we got so far:
During the deploy process the database_connection for the cell is not updated, so the discover_hosts command fails to connect to the db because the old password is still stored in the database:
()[root@controller-0 /]# su nova -s /bin/bash -c "/usr/bin/nova-manage cell_v2 discover_hosts --by-service --verbose"
Here we still see the old nova password in the cell_mappings table:
MariaDB [nova_api]> select * from cell_mappings;
*************************** 1. row ***************************
         created_at: 2018-12-04 17:30:54
         updated_at: NULL
                 id: 2
               uuid: 00000000-0000-0000-0000-000000000000
               name: cell0
      transport_url: none:///
database_connection: mysql+pymysql://nova:LcJtm4cGAjgWgEFKuAEjPUp2l@[fd00:fd00:fd00:2000::14]/nova_cell0?read_default_file=/etc/my.cnf.d/tripleo.cnf&read_default_group=tripleo
           disabled: 0
*************************** 2. row ***************************
         created_at: 2018-12-04 17:31:02
         updated_at: NULL
                 id: 5
               uuid: d4d1e1c1-bede-4ac8-834b-6bb53f1d4401
               name: default
      transport_url: rabbit://guest:r9BJD5J4VybUBgZGIVOJe89uK.localdomain:5672,guest:r9BJD5J4VybUBgZGIVOJe89uK.localdomain:5672,guest:r9BJD5J4VybUBgZGIVOJe89uK.localdomain:5672/?ssl=0
database_connection: mysql+pymysql://nova:LcJtm4cGAjgWgEFKuAEjPUp2l@[fd00:fd00:fd00:2000::14]/nova?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf
           disabled: 0
2 rows in set (0.00 sec)
When we update the database_connection, the discover_hosts command works:
()[root@controller-0 /]$ nova-manage cell_v2 update_cell --name='default' --database_connection='mysql+pymysql://nova:apassword@[fd00:fd00:fd00:2000::14]/nova?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf'
()[root@controller-0 /]$ su nova -s /bin/bash -c "/usr/bin/nova-manage cell_v2 list_cells"
| Name    | UUID                                 | Transport URL                                                        | Database Connection                                                                                                                   | Disabled |
| cell0   | 00000000-0000-0000-0000-000000000000 | none:/                                                               | mysql+pymysql://nova:****@[fd00:fd00:fd00:2000::14]/nova_cell0?read_default_file=/etc/my.cnf.d/tripleo.cnf&read_default_group=tripleo | False    |
| default | d4d1e1c1-bede-4ac8-834b-6bb53f1d4401 | rabbit://guest:****@controller-2.internalapi.localdomain:5672/?ssl=0 | mysql+pymysql://nova:****@[fd00:fd00:fd00:2000::14]/nova?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf       | False    |
()[root@controller-0 /]$ su nova -s /bin/bash -c "/usr/bin/nova-manage --debug cell_v2 discover_hosts --by-service --verbose"
Found 2 cell mappings.
Skipping cell0 since it does not contain hosts.
Getting computes from cell 'default': d4d1e1c1-bede-4ac8-834b-6bb53f1d4401
Found 0 unmapped computes in cell: d4d1e1c1-bede-4ac8-834b-6bb53f1d4401
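The manual workaround above can be sketched as a small script that rebuilds the database_connection URL from the rotated password and prints the nova-manage invocation. The UUID, VIP and password are the values from this report; the command is echoed rather than executed so it can be reviewed first:

```shell
# Values taken from this report (placeholder password)
NEW_PASS='apassword'
DB_VIP='[fd00:fd00:fd00:2000::14]'
CELL_UUID='d4d1e1c1-bede-4ac8-834b-6bb53f1d4401'

# Rebuild the oslo.db URL the cell mapping should carry after rotation
DB_URL="mysql+pymysql://nova:${NEW_PASS}@${DB_VIP}/nova?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf"

# Echo (not run) the nova-manage call that would apply it
echo nova-manage cell_v2 update_cell --cell_uuid "${CELL_UUID}" \
     --database_connection "${DB_URL}"
```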
Note, I was able to successfully change the nova pwd using the templated DB cells url from the WIP patch in BZ1613949:
parameter_defaults:
  NovaPassword: apassword
[root@controller-0 ~]# mysql -u nova -papassword -h 172.17.1.10 nova -e "select * from services;" | wc -l
18
Changing NovaPassword works in OSP14 with the following commit:
$ git branch -a --contains 9c4fcade65b12048c43ead134e47063a9facadb8
remotes/gerrit/stable/rocky
remotes/openstack/stable/rocky
remotes/rhos/rhos-14.0-patches
$ git show 9c4fcade65b12048c43ead134e47063a9facadb8
commit 9c4fcade65b12048c43ead134e47063a9facadb8
Author: Rabi Mishra <ramishra>
Date: Thu Nov 29 15:07:13 2018 +0530
Mount config-data/puppet-generated/nova for nova_api_ensure_default_cell
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0878
Description of problem:
Attempt to update overcloud using custom passwords: 3 controller + 1 compute

TASK [Run puppet host configuration for step 3] ********************************
Wednesday 21 November 2018 07:41:34 -0500 (0:00:00.238) 0:15:53.747 ****
changed: [compute-0] => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": true}
Ansible failed, check log at /var/lib/mistral/overcloud/ansible.log.
Overcloud configuration failed.

cat overcloud_deploy.sh
#!/bin/bash
openstack overcloud deploy \
  --timeout 100 \
  --templates /usr/share/openstack-tripleo-heat-templates \
  --stack overcloud \
  --libvirt-type kvm \
  --ntp-server clock.redhat.com \
  -e /home/stack/virt/config_lvm.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /home/stack/virt/network/network-environment.yaml \
  -e /home/stack/virt/enable-tls.yaml \
  -e /home/stack/virt/inject-trust-anchor.yaml \
  -e /home/stack/virt/public_vip.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
  -e /home/stack/virt/hostnames.yml \
  -e /home/stack/virt/debug.yaml \
  -e /home/stack/virt/nodes_data.yaml \
  -e ~/containers-prepare-parameter.yaml \
  -e ~/tripleo-overcloud-passwords.yaml \
  --log-file overcloud_deployment_82.log

cat tripleo-overcloud-passwords.yaml
parameter_defaults:
  NeutronMetadataProxySharedSecret: apassword
  GlancePassword: apassword
  NovaPassword: apassword
  GnocchiPassword: apassword
  HeatPassword: apassword
  RedisPassword: apassword
  CinderPassword: apassword
  SwiftPassword: apassword
  AdminToken: apassword
  HaproxyStatsPassword: apassword
  NeutronPassword: apassword
  CeilometerPassword: apassword
  AdminPassword: apassword
  MysqlClustercheckPassword: apassword

[heat-admin@controller-0 ~]$ sudo docker ps -a | grep "Exited (1)"
64c5cfbc5c0a 192.168.24.1:8787/rhosp14/openstack-glance-api:2018-11-09.3 "/usr/bin/bootstra..." 2 hours ago Exited (1) 2 hours ago glance_api_db_sync

[heat-admin@controller-0 ~]$ sudo grep "apassword" /var/lib/config-data/puppet-generated/glance_api/etc/glance/glance-api.conf
connection=mysql+pymysql://glance:apassword.1.19/glance?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf
password=apassword

keystone_db_sync container:
()[root@controller-0 /]# grep mysql /etc/keystone/keystone.conf
connection=mysql+pymysql://keystone:apassword.1.19/keystone?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf

From /var/log/containers/keystone/keystone.log:
2018-11-21 12:43:48.328 26 DEBUG migrate.versioning.script.base [-] Script /usr/lib/python2.7/site-packages/keystone/common/sql/migrate_repo/versions/109_add_password_self_service_column.py loaded successfully __init__ /usr/lib/python2.7/site-packages/migrate/versioning/script/base.py:30
2018-11-21 12:43:48.328 26 DEBUG migrate.versioning.repository [-] Repository /usr/lib/python2.7/site-packages/keystone/common/sql/migrate_repo loaded successfully __init__ /usr/lib/python2.7/site-packages/migrate/versioning/repository.py:82
2018-11-21 12:43:48.328 26 DEBUG migrate.versioning.repository [-] Config: OrderedDict([('db_settings', OrderedDict([('__name__', 'db_settings'), ('repository_id', 'keystone'), ('version_table', 'migrate_version'), ('required_dbs', '[]'), ('use_timestamp_numbering', 'False')]))]) __init__ /usr/lib/python2.7/site-packages/migrate/versioning/repository.py:83
2018-11-21 12:43:48.342 26 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -1 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-11-21 12:43:58.353 26 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -2 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-11-21 12:44:08.364 26 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -3 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-11-21 12:44:18.374 26 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -4 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-11-21 12:44:28.385 26 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -5 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-9.0.1-0.20181013060879.el7ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy OSP14
2. Create a custom env file with overcloud passwords and append it to the overcloud_deploy.sh script
3. Run overcloud_deploy.sh to perform the stack update

Actual results:
Failed due to timeout; the keystone_db_sync container is stuck and glance_api_db_sync failed to start

Expected results:
update_complete

Additional info: