Bug 1652105 - [OSP14] cannot update overcloud using custom NovaPassword tripleo password
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 14.0 (Rocky)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: z2
Target Release: 14.0 (Rocky)
Assignee: Rajesh Tailor
QA Contact: Archit Modi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-11-21 15:34 UTC by Artem Hrechanychenko
Modified: 2019-04-30 17:51 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-9.2.1-0.20190119154859.fe11ade.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-04-30 17:51:14 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1805803 | 0 | None | None | None | 2019-02-21 12:13:30 UTC
OpenStack gerrit | 623138 | 0 | 'None' | MERGED | Mount config-data/puppet-generated/nova for nova_api_ensure_default_cell | 2020-12-29 10:32:30 UTC
Red Hat Product Errata | RHBA-2019:0878 | 0 | None | None | None | 2019-04-30 17:51:25 UTC

Description Artem Hrechanychenko 2018-11-21 15:34:06 UTC
Description of problem:
Attempting to update the overcloud using custom passwords fails:

3 controllers + 1 compute

TASK [Run puppet host configuration for step 3] ********************************
Wednesday 21 November 2018  07:41:34 -0500 (0:00:00.238)       0:15:53.747 **** 
changed: [compute-0] => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": true}

Ansible failed, check log at /var/lib/mistral/overcloud/ansible.log.
Overcloud configuration failed.


cat overcloud_deploy.sh 
#!/bin/bash

openstack overcloud deploy \
--timeout 100 \
--templates /usr/share/openstack-tripleo-heat-templates \
--stack overcloud \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
-e /home/stack/virt/config_lvm.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/enable-tls.yaml \
-e /home/stack/virt/inject-trust-anchor.yaml \
-e /home/stack/virt/public_vip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
-e /home/stack/virt/hostnames.yml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/nodes_data.yaml \
-e ~/containers-prepare-parameter.yaml \
-e ~/tripleo-overcloud-passwords.yaml \
--log-file overcloud_deployment_82.log

cat tripleo-overcloud-passwords.yaml 
parameter_defaults:
  NeutronMetadataProxySharedSecret: apassword
  GlancePassword: apassword
  NovaPassword: apassword
  GnocchiPassword: apassword
  HeatPassword: apassword
  RedisPassword: apassword
  CinderPassword: apassword
  SwiftPassword: apassword
  AdminToken: apassword
  HaproxyStatsPassword: apassword
  NeutronPassword: apassword
  CeilometerPassword: apassword
  AdminPassword: apassword
  MysqlClustercheckPassword: apassword



[heat-admin@controller-0 ~]$ sudo docker ps -a |grep "Exited (1)"
64c5cfbc5c0a        192.168.24.1:8787/rhosp14/openstack-glance-api:2018-11-09.3                  "/usr/bin/bootstra..."   2 hours ago         Exited (1) 2 hours ago                       glance_api_db_sync

[heat-admin@controller-0 ~]$ sudo grep "apassword" /var/lib/config-data/puppet-generated/glance_api/etc/glance/glance-api.conf
connection=mysql+pymysql://glance:apassword.1.19/glance?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf
password=apassword


In the keystone_db_sync container:
()[root@controller-0 /]# grep mysql /etc/keystone/keystone.conf 

connection=mysql+pymysql://keystone:apassword.1.19/keystone?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf



From /var/log/containers/keystone/keystone.log:
2018-11-21 12:43:48.328 26 DEBUG migrate.versioning.script.base [-] Script /usr/lib/python2.7/site-packages/keystone/common/sql/migrate_repo/versions/109_add_password_self_service_column.py loaded successfully __init__ /usr/lib/python2.7/site-packages/migrate/versioning/script/base.py:30
2018-11-21 12:43:48.328 26 DEBUG migrate.versioning.repository [-] Repository /usr/lib/python2.7/site-packages/keystone/common/sql/migrate_repo loaded successfully __init__ /usr/lib/python2.7/site-packages/migrate/versioning/repository.py:82
2018-11-21 12:43:48.328 26 DEBUG migrate.versioning.repository [-] Config: OrderedDict([('db_settings', OrderedDict([('__name__', 'db_settings'), ('repository_id', 'keystone'), ('version_table', 'migrate_version'), ('required_dbs', '[]'), ('use_timestamp_numbering', 'False')]))]) __init__ /usr/lib/python2.7/site-packages/migrate/versioning/repository.py:83
2018-11-21 12:43:48.342 26 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -1 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-11-21 12:43:58.353 26 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -2 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-11-21 12:44:08.364 26 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -3 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-11-21 12:44:18.374 26 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -4 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-11-21 12:44:28.385 26 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -5 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-9.0.1-0.20181013060879.el7ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy OSP14.
2. Create a custom environment file with overcloud passwords and append it to the overcloud_deploy.sh script.
3. Run overcloud_deploy.sh to perform a stack update.

Actual results:
The update fails due to a timeout: the keystone_db_sync container is stuck and glance_api_db_sync failed to start.

Expected results:
The stack update finishes with status UPDATE_COMPLETE.

Additional info:

Comment 11 Damien Ciabrini 2018-12-06 21:42:22 UTC
OK quick update,

The MysqlClustercheckPassword cannot currently be updated, as we lack the orchestration mechanism to tell the galera resource agent to stop polling the galera database with the old credentials and start using the new ones. However, these credentials are only used to check whether mysql is running, so let's put that aside.

The stack redeploy succeeds in updating almost all the passwords mentioned in the bug report, except NovaPassword. This confirms that the general password update mechanism is working.

When trying to update the NovaPassword, the following happens in sequence:
  . docker-puppet regenerates the configs for all the nova services and stores them in /var/lib/config-data/puppet-generated/nova*
  . the mysql_init_bundle container is restarted and runs puppet code that updates the passwords in the mysql db for users nova and nova_api.
  . all the nova containers are restarted due to the config change.

When logging in to the env after the update failure, I can see that the mysql password update was successful:

[root@controller-0 e]# mysql -unova -papassword -h'fd00:fd00:fd00:2000::14'                                                                                                                                                      
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 647353
Server version: 10.1.20-MariaDB MariaDB Server

I also see that nova services got restarted and are successfully running, except nova_api_discover_hosts:

CONTAINER ID        IMAGE                                                                 COMMAND                  CREATED             STATUS                    PORTS               NAMES
138619e78ec9        192.168.24.1:8787/rhosp14/openstack-nova-api:2018-11-29.2             "/usr/bin/bootstra..."   47 hours ago        Exited (1) 47 hours ago                       nova_api_discover_hosts
ae84b307bbfa        192.168.24.1:8787/rhosp14/openstack-nova-api:2018-11-29.2             "kolla_start"            47 hours ago        Up 47 hours (healthy)                         nova_metadata
f4cebe8bcda5        192.168.24.1:8787/rhosp14/openstack-nova-api:2018-11-29.2             "kolla_start"            47 hours ago        Up 47 hours (healthy)                         nova_api
674833dabbf2        192.168.24.1:8787/rhosp14/openstack-nova-scheduler:2018-11-29.2       "kolla_start"            47 hours ago        Up 47 hours (healthy)                         nova_scheduler
b81c2fce0a2c        192.168.24.1:8787/rhosp14/openstack-nova-novncproxy:2018-11-29.2      "kolla_start"            47 hours ago        Up 47 hours (unhealthy)                       nova_vnc_proxy
3b7ae192e44a        192.168.24.1:8787/rhosp14/openstack-nova-consoleauth:2018-11-29.2     "kolla_start"            47 hours ago        Up 47 hours (healthy)                         nova_consoleauth
e5436e6b1d5c        192.168.24.1:8787/rhosp14/openstack-nova-api:2018-11-29.2             "kolla_start"            47 hours ago        Up 47 hours                                   nova_api_cron
1f4da6336b14        192.168.24.1:8787/rhosp14/openstack-nova-conductor:2018-11-29.2       "kolla_start"            47 hours ago        Up 47 hours (healthy)                         nova_conductor
5cbd618f9d18        192.168.24.1:8787/rhosp14/openstack-nova-placement-api:2018-11-29.2   "kolla_start"            47 hours ago        Up 47 hours (healthy)                         nova_placement


All those containers got restarted after the mysql_init_bundle changed the nova passwords:

[root@controller-0 e]# docker inspect 2>&1 mysql_init_bundle | grep -i started                                                                                                                                                                                        
            "StartedAt": "2018-12-04T21:38:27.867555412Z",

And I know that the nova containers are using the new credentials successfully to connect to the db:

[root@controller-0 e]# docker cp nova_api:/etc/nova/nova.conf - | tar xO | grep ^connection=mysql
connection=mysql+pymysql://nova_api:apassword@[fd00:fd00:fd00:2000::14]/nova_api?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf
connection=mysql+pymysql://nova:apassword@[fd00:fd00:fd00:2000::14]/nova?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf
connection=mysql+pymysql://nova_api:apassword@[fd00:fd00:fd00:2000::14]/nova_api?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf

[root@controller-0 e]# docker cp nova_api_discover_hosts:/etc/nova/nova.conf - | tar xO | grep ^connection=mysql                                                                                                                                                      
connection=mysql+pymysql://nova_api:apassword@[fd00:fd00:fd00:2000::14]/nova_api?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf
connection=mysql+pymysql://nova:apassword@[fd00:fd00:fd00:2000::14]/nova?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf
connection=mysql+pymysql://nova_api:apassword@[fd00:fd00:fd00:2000::14]/nova_api?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf
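
The same check can be scripted once nova.conf has been copied out of a container. The following is a minimal Python sketch, not part of the original analysis. It assumes the three connection= lines above come from the [api_database], [database] and [placement_database] sections (the section names are an assumption, since the grep output does not show them). db_connections is a hypothetical helper name, and the sample text is a shortened stand-in for a real nova.conf.

```python
import configparser
from urllib.parse import urlsplit

# Section names are assumed; the grep output above does not show them.
DB_SECTIONS = ("api_database", "database", "placement_database")


def db_connections(conf_text):
    """Return {section: (user, password)} for each DB connection found."""
    # interpolation=None keeps '%' characters literal; strict=False
    # tolerates repeated options, which generated configs may contain.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    found = {}
    for section in DB_SECTIONS:
        url = parser.get(section, "connection", fallback=None)
        if url:
            parts = urlsplit(url)
            found[section] = (parts.username, parts.password)
    return found


# Shortened sample modelled on the connection strings shown above.
sample = """
[api_database]
connection=mysql+pymysql://nova_api:apassword@[fd00:fd00:fd00:2000::14]/nova_api?read_default_group=tripleo
[database]
connection=mysql+pymysql://nova:apassword@[fd00:fd00:fd00:2000::14]/nova?read_default_group=tripleo
"""
for section, (user, password) in db_connections(sample).items():
    print(section, user, password == "apassword")
```

Sections that are missing from the file are simply skipped, so the helper can be pointed at the config of any of the nova containers.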

All this points to the nova_api_discover_hosts container misbehaving, which ultimately yields a failure:

        "stdout: (cellv2) Running cell_v2 host discovery",
        "(cellv2) Waiting 600 seconds for hosts to register",
        "(cellv2) compute node compute-0.localdomain has not registered",
        "(cellv2) compute node compute-1.localdomain has not registered",
        "(cellv2) Waiting 597 seconds for hosts to register",
        "(cellv2) Waiting 565 seconds for hosts to register",
        "(cellv2) Waiting 532 seconds for hosts to register",
        "(cellv2) Waiting 500 seconds for hosts to register",
        "(cellv2) Waiting 467 seconds for hosts to register",
        "(cellv2) Waiting 435 seconds for hosts to register",
        "(cellv2) Waiting 402 seconds for hosts to register",
        "(cellv2) Waiting 370 seconds for hosts to register",
        "(cellv2) Waiting 338 seconds for hosts to register",
        "(cellv2) Waiting 305 seconds for hosts to register",
        "(cellv2) Waiting 273 seconds for hosts to register",
        "(cellv2) Waiting 240 seconds for hosts to register",
        "(cellv2) Waiting 208 seconds for hosts to register",
        "(cellv2) Waiting 176 seconds for hosts to register",
        "(cellv2) Waiting 143 seconds for hosts to register",
        "(cellv2) Waiting 111 seconds for hosts to register",
        "(cellv2) Waiting 78 seconds for hosts to register",
        "(cellv2) Waiting 46 seconds for hosts to register",
        "(cellv2) Waiting 14 seconds for hosts to register",
        "(cellv2) WARNING: timeout waiting for nodes to register, running host discovery regardless",
        "(cellv2) Expected host list: compute-0.localdomain compute-1.localdomain",
        "(cellv2) Detected host list:",
        "(cellv2) Running host discovery...",
        "Found 2 cell mappings.",
        "Skipping cell0 since it does not contain hosts.",
        "Getting computes from cell 'default': d4d1e1c1-bede-4ac8-834b-6bb53f1d4401",
        "An error has occurred:",
        "Traceback (most recent call last):",
        "  File \"/usr/lib/python2.7/site-packages/nova/cmd/manage.py\", line 2310, in main",
        "    ret = fn(*fn_args, **fn_kwargs)",
        "  File \"/usr/lib/python2.7/site-packages/nova/cmd/manage.py\", line 1426, in discover_hosts",
        "    by_service)",
        "  File \"/usr/lib/python2.7/site-packages/nova/objects/host_mapping.py\", line 265, in discover_hosts",
        "  File \"/usr/lib/python2.7/site-packages/nova/objects/host_mapping.py\", line 221, in _check_and_create_host_mappings",
        "    ctxt, 'nova-compute', include_disabled=True)",
        "  File \"/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py\", line 184, in wrapper",
        "    result = fn(cls, context, *args, **kwargs)",
        "  File \"/usr/lib/python2.7/site-packages/nova/objects/service.py\", line 586, in get_by_binary",
[...]

Comment 12 Damien Ciabrini 2018-12-06 22:05:08 UTC
And here is the end of the stack trace:

[...]
        "    self.connect()",
        "  File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 932, in connect",
        "    self._request_authentication()",
        "  File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 1152, in _request_authentication",
        "    auth_packet = self._read_packet()",
        "  File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 1014, in _read_packet",
        "    packet.check_error()",
        "  File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 393, in check_error",
        "    err.raise_mysql_exception(self._data)",
        "  File \"/usr/lib/python2.7/site-packages/pymysql/err.py\", line 107, in raise_mysql_exception",
        "    raise errorclass(errno, errval)",
        "OperationalError: (pymysql.err.OperationalError) (1045, u\"Access denied for user 'nova'@'fd00:fd00:fd00:2000::15' (using password: YES)\") (Background on this error at: http://sqlalche.me/e/e3q8)",
        "stdout: 44ba425004cc4e79bac175f40873a21494d75f92b741426d2b4e344b02e82c67"


So nova_api_discover_hosts.sh extracted some invalid credentials that made it try to connect as user 'nova'@'fd00:fd00:fd00:2000::15', for which there's no user defined in the mysql database:

MariaDB [(none)]> select user,host from mysql.user where user like 'nova%';
+----------------+-------------------------+
| user           | host                    |
+----------------+-------------------------+
| nova           | %                       |
| nova_api       | %                       |
| nova_placement | %                       |
| nova           | fd00:fd00:fd00:2000::12 |
| nova_api       | fd00:fd00:fd00:2000::12 |
| nova_placement | fd00:fd00:fd00:2000::12 |
| nova           | fd00:fd00:fd00:2000::14 |
| nova_api       | fd00:fd00:fd00:2000::14 |
| nova_placement | fd00:fd00:fd00:2000::14 |
+----------------+-------------------------+

In fact, this IP fd00:fd00:fd00:2000::15 is that of controller-2, but it should not be used, as it may be deleted by galera any time there is an SST synchronization. If the nova services want to access mysql via a controller-local NIC, then nova must create three users in the DB at stack creation time, to make sure the DB does not hold controller-specific data.

Comment 13 Martin Schuppert 2018-12-07 14:54:25 UTC
Summary of what we got so far:

In the deploy process, the database_connection for the cell is not updated, so the discover_hosts command fails to connect to the db because the old password is still stored in the database:

 ()[root@controller-0 /]# su nova -s /bin/bash -c "/usr/bin/nova-manage cell_v2 discover_hosts --by-service --verbose"

Here we still see the old nova password in the cell_mappings table:

MariaDB [nova_api]> select * from cell_mappings;
+---------------------+------------+----+--------------------------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| created_at          | updated_at | id | uuid                                 | name    | transport_url                                                                                                                                                      | database_connection                                                                                                                                        | disabled |
+---------------------+------------+----+--------------------------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 2018-12-04 17:30:54 | NULL       |  2 | 00000000-0000-0000-0000-000000000000 | cell0   | none:///                                                                                                                                                           | mysql+pymysql://nova:LcJtm4cGAjgWgEFKuAEjPUp2l@[fd00:fd00:fd00:2000::14]/nova_cell0?read_default_file=/etc/my.cnf.d/tripleo.cnf&read_default_group=tripleo |        0 |
| 2018-12-04 17:31:02 | NULL       |  5 | d4d1e1c1-bede-4ac8-834b-6bb53f1d4401 | default | rabbit://guest:r9BJD5J4VybUBgZGIVOJe89uK.localdomain:5672,guest:r9BJD5J4VybUBgZGIVOJe89uK.localdomain:5672,guest:r9BJD5J4VybUBgZGIVOJe89uK.localdomain:5672/?ssl=0 | mysql+pymysql://nova:LcJtm4cGAjgWgEFKuAEjPUp2l@[fd00:fd00:fd00:2000::14]/nova?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf       |        0 |
+---------------------+------------+----+--------------------------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
2 rows in set (0.00 sec)   

When we update the database_connection, the discover_hosts command works:
()[root@controller-0 /]$ nova-manage cell_v2 update_cell --name='default' --database_connection='mysql+pymysql://nova:apassword@[fd00:fd00:fd00:2000::14]/nova?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf'

()[root@controller-0 /]$ su nova -s /bin/bash -c "/usr/bin/nova-manage cell_v2 list_cells"
+---------+--------------------------------------+----------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+----------+
|   Name  |                 UUID                 |                            Transport URL                             |                                                          Database Connection                                                          | Disabled |
+---------+--------------------------------------+----------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+----------+
|  cell0  | 00000000-0000-0000-0000-000000000000 |                                none:/                                | mysql+pymysql://nova:****@[fd00:fd00:fd00:2000::14]/nova_cell0?read_default_file=/etc/my.cnf.d/tripleo.cnf&read_default_group=tripleo |  False   |
| default | d4d1e1c1-bede-4ac8-834b-6bb53f1d4401 | rabbit://guest:****@controller-2.internalapi.localdomain:5672/?ssl=0 |    mysql+pymysql://nova:****@[fd00:fd00:fd00:2000::14]/nova?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf    |  False   |
+---------+--------------------------------------+----------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+----------+


()[root@controller-0 /]$ su nova -s /bin/bash -c "/usr/bin/nova-manage --debug cell_v2 discover_hosts --by-service --verbose"                                                                                                                                
Found 2 cell mappings.                                                                                                                                                                                                                                       
Skipping cell0 since it does not contain hosts.                                                                                                                                                                                                              
Getting computes from cell 'default': d4d1e1c1-bede-4ac8-834b-6bb53f1d4401                                                                                                                                                                                   
Found 0 unmapped computes in cell: d4d1e1c1-bede-4ac8-834b-6bb53f1d4401
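
The mismatch described in this comment can be detected by comparing the credentials stored in each cell mapping with those in the regenerated nova.conf. The following is a minimal Python sketch, not part of the original investigation. It assumes the database_connection values have already been read from the cell_mappings table. credentials and stale_cells are hypothetical helper names, and "oldpassword" is a placeholder for the previously generated password.

```python
from urllib.parse import urlsplit


def credentials(url):
    """Extract (user, password) from a SQLAlchemy-style connection URL."""
    parts = urlsplit(url)
    return parts.username, parts.password


def stale_cells(cell_mappings, configured_url):
    """Return names of cells whose stored credentials differ from the config."""
    expected = credentials(configured_url)
    return sorted(
        name for name, url in cell_mappings.items()
        if credentials(url) != expected
    )


# Illustrative data modelled on the tables above; passwords are placeholders.
HOST = "[fd00:fd00:fd00:2000::14]"
cells = {
    "cell0": f"mysql+pymysql://nova:oldpassword@{HOST}/nova_cell0",
    "default": f"mysql+pymysql://nova:oldpassword@{HOST}/nova",
}
configured = f"mysql+pymysql://nova:apassword@{HOST}/nova"
print(stale_cells(cells, configured))
```

Only the default cell was updated by hand above, so a check like this would presumably still report cell0 in that environment.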

Comment 14 Martin Schuppert 2018-12-13 13:33:27 UTC
Note: I was able to successfully change the nova password using the templated DB cell URLs from the WIP patch in BZ1613949:

parameter_defaults:
  NovaPassword: apassword

[root@controller-0 ~]# mysql -u nova -papassword -h 172.17.1.10 nova -e "select * from services;" | wc -l
18

Comment 16 Martin Schuppert 2019-02-21 12:11:22 UTC
Changing NovaPassword works in OSP14 with the following commit:

$ git branch -a --contains 9c4fcade65b12048c43ead134e47063a9facadb8
remotes/gerrit/stable/rocky
remotes/openstack/stable/rocky
remotes/rhos/rhos-14.0-patches

$ git show 9c4fcade65b12048c43ead134e47063a9facadb8
commit 9c4fcade65b12048c43ead134e47063a9facadb8
Author: Rabi Mishra <ramishra>
Date:   Thu Nov 29 15:07:13 2018 +0530
     Mount config-data/puppet-generated/nova for nova_api_ensure_default_cell

Comment 28 errata-xmlrpc 2019-04-30 17:51:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0878

