
Bug 1652105

Summary: [OSP14] cannot update overcloud using custom NovaPassword tripleo password
Product: Red Hat OpenStack
Component: openstack-tripleo-heat-templates
Version: 14.0 (Rocky)
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Artem Hrechanychenko <ahrechan>
Assignee: Rajesh Tailor <ratailor>
QA Contact: Archit Modi <amodi>
CC: ahrechan, aschultz, ccopello, chjones, dciabrin, hrybacki, lyarwood, mbooth, mburns, mschuppe, ratailor, rheslop, rmascena
Target Milestone: z2
Target Release: 14.0 (Rocky)
Keywords: TestOnly, Triaged, ZStream
Hardware: x86_64
OS: Linux
Fixed In Version: openstack-tripleo-heat-templates-9.2.1-0.20190119154859.fe11ade.el7ost
Type: Bug
Last Closed: 2019-04-30 17:51:14 UTC

Description Artem Hrechanychenko 2018-11-21 15:34:06 UTC
Description of problem:
An attempt to update the overcloud using custom passwords fails:

3 controllers + 1 compute

TASK [Run puppet host configuration for step 3] ********************************
Wednesday 21 November 2018  07:41:34 -0500 (0:00:00.238)       0:15:53.747 **** 
changed: [compute-0] => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": true}

Ansible failed, check log at /var/lib/mistral/overcloud/ansible.log.
Overcloud configuration failed.


cat overcloud_deploy.sh 
#!/bin/bash

openstack overcloud deploy \
--timeout 100 \
--templates /usr/share/openstack-tripleo-heat-templates \
--stack overcloud \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
-e /home/stack/virt/config_lvm.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/enable-tls.yaml \
-e /home/stack/virt/inject-trust-anchor.yaml \
-e /home/stack/virt/public_vip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
-e /home/stack/virt/hostnames.yml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/nodes_data.yaml \
-e ~/containers-prepare-parameter.yaml \
-e ~/tripleo-overcloud-passwords.yaml \
--log-file overcloud_deployment_82.log

cat tripleo-overcloud-passwords.yaml 
parameter_defaults:
  NeutronMetadataProxySharedSecret: apassword
  GlancePassword: apassword
  NovaPassword: apassword
  GnocchiPassword: apassword
  HeatPassword: apassword
  RedisPassword: apassword
  CinderPassword: apassword
  SwiftPassword: apassword
  AdminToken: apassword
  HaproxyStatsPassword: apassword
  NeutronPassword: apassword
  CeilometerPassword: apassword
  AdminPassword: apassword
  MysqlClustercheckPassword: apassword



[heat-admin@controller-0 ~]$ sudo docker ps -a |grep "Exited (1)"
64c5cfbc5c0a        192.168.24.1:8787/rhosp14/openstack-glance-api:2018-11-09.3                  "/usr/bin/bootstra..."   2 hours ago         Exited (1) 2 hours ago                       glance_api_db_sync

[heat-admin@controller-0 ~]$ sudo grep "apassword" /var/lib/config-data/puppet-generated/glance_api/etc/glance/glance-api.conf
connection=mysql+pymysql://glance:apassword.1.19/glance?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf
password=apassword


keystone_db_sync container
()[root@controller-0 /]# grep mysql /etc/keystone/keystone.conf 

connection=mysql+pymysql://keystone:apassword.1.19/keystone?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf



from /var/log/containers/keystone/keystone.log 
2018-11-21 12:43:48.328 26 DEBUG migrate.versioning.script.base [-] Script /usr/lib/python2.7/site-packages/keystone/common/sql/migrate_repo/versions/109_add_password_self_service_column.py loaded successfully __init__ /usr/lib/python2.7/site-packages/migrate/versioning/script/base.py:30
2018-11-21 12:43:48.328 26 DEBUG migrate.versioning.repository [-] Repository /usr/lib/python2.7/site-packages/keystone/common/sql/migrate_repo loaded successfully __init__ /usr/lib/python2.7/site-packages/migrate/versioning/repository.py:82
2018-11-21 12:43:48.328 26 DEBUG migrate.versioning.repository [-] Config: OrderedDict([('db_settings', OrderedDict([('__name__', 'db_settings'), ('repository_id', 'keystone'), ('version_table', 'migrate_version'), ('required_dbs', '[]'), ('use_timestamp_numbering', 'False')]))]) __init__ /usr/lib/python2.7/site-packages/migrate/versioning/repository.py:83
2018-11-21 12:43:48.342 26 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -1 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-11-21 12:43:58.353 26 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -2 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-11-21 12:44:08.364 26 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -3 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-11-21 12:44:18.374 26 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -4 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-11-21 12:44:28.385 26 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -5 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
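
The negative "attempts left" values in the log above are consistent with a retry loop configured for unlimited retries, where the limit is expressed as -1 and the remaining count is computed as the limit minus the attempt index. The sketch below only illustrates that arithmetic. `connect_with_retries`, `fake_connect` and their parameters are hypothetical names, not the oslo.db API, and the -1 setting is an assumption inferred from the log.

```python
import itertools


def connect_with_retries(connect, max_retries, log,
                         sleep=lambda seconds: None, retry_interval=10):
    """Call connect() until it succeeds; max_retries == -1 means retry forever."""
    attempts = itertools.count() if max_retries == -1 else range(max_retries)
    for attempt in attempts:
        try:
            return connect()
        except ConnectionError:
            # With max_retries == -1 this reports -1, -2, -3, ... as in the log.
            log("SQL connection failed. %d attempts left." % (max_retries - attempt))
            sleep(retry_interval)
    raise ConnectionError("gave up after %d attempts" % max_retries)


# Demo: a fake connection that fails three times, then succeeds.
failures = iter([True, True, True, False])


def fake_connect():
    if next(failures):
        raise ConnectionError("Lost connection to MySQL server during query")
    return "connected"


messages = []
result = connect_with_retries(fake_connect, max_retries=-1, log=messages.append)
```

With such a loop, a db_sync container would keep retrying for as long as the connection keeps failing, which would match the keystone_db_sync container appearing stuck rather than exiting.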

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-9.0.1-0.20181013060879.el7ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy OSP14
2. Create a custom env file with overcloud passwords and append it to the overcloud_deploy.sh script
3. Run overcloud_deploy.sh to perform a stack update

Actual results:
The update failed due to a timeout: the keystone_db_sync container is stuck and glance_api_db_sync failed to start

Expected results:
update_complete

Additional info:

Comment 11 Damien Ciabrini 2018-12-06 21:42:22 UTC
OK, quick update:

The MysqlClustercheckPassword cannot be updated currently, as we lack the orchestration mechanism to tell the galera resource agent to stop polling the galera database with the old credentials and start using the new ones. However, these credentials are only used to check whether mysql is running, so let's put that aside.

The stack redeploy succeeds in updating almost all the passwords mentioned in the bug report, except NovaPassword. This confirms that the general password update mechanism is working.

When trying to update the NovaPassword, the following happens in sequence:
  . docker-puppet regenerates the configs for all the nova services and stores them in /var/lib/config-data/puppet-generated/nova*
  . container mysql_init_bundle is restarted and runs puppet code that updates the passwords in the mysql db for users nova and nova_api.
  . all the nova containers are restarted due to the config change.

When logging on to the env after the update failure, I can see that the mysql password update was successful:

[root@controller-0 e]# mysql -unova -papassword -h'fd00:fd00:fd00:2000::14'                                                                                                                                                      
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 647353
Server version: 10.1.20-MariaDB MariaDB Server

I also see that nova services got restarted and are successfully running, except nova_api_discover_hosts:

CONTAINER ID        IMAGE                                                                 COMMAND                  CREATED             STATUS                    PORTS               NAMES
138619e78ec9        192.168.24.1:8787/rhosp14/openstack-nova-api:2018-11-29.2             "/usr/bin/bootstra..."   47 hours ago        Exited (1) 47 hours ago                       nova_api_discover_hosts
ae84b307bbfa        192.168.24.1:8787/rhosp14/openstack-nova-api:2018-11-29.2             "kolla_start"            47 hours ago        Up 47 hours (healthy)                         nova_metadata
f4cebe8bcda5        192.168.24.1:8787/rhosp14/openstack-nova-api:2018-11-29.2             "kolla_start"            47 hours ago        Up 47 hours (healthy)                         nova_api
674833dabbf2        192.168.24.1:8787/rhosp14/openstack-nova-scheduler:2018-11-29.2       "kolla_start"            47 hours ago        Up 47 hours (healthy)                         nova_scheduler
b81c2fce0a2c        192.168.24.1:8787/rhosp14/openstack-nova-novncproxy:2018-11-29.2      "kolla_start"            47 hours ago        Up 47 hours (unhealthy)                       nova_vnc_proxy
3b7ae192e44a        192.168.24.1:8787/rhosp14/openstack-nova-consoleauth:2018-11-29.2     "kolla_start"            47 hours ago        Up 47 hours (healthy)                         nova_consoleauth
e5436e6b1d5c        192.168.24.1:8787/rhosp14/openstack-nova-api:2018-11-29.2             "kolla_start"            47 hours ago        Up 47 hours                                   nova_api_cron
1f4da6336b14        192.168.24.1:8787/rhosp14/openstack-nova-conductor:2018-11-29.2       "kolla_start"            47 hours ago        Up 47 hours (healthy)                         nova_conductor
5cbd618f9d18        192.168.24.1:8787/rhosp14/openstack-nova-placement-api:2018-11-29.2   "kolla_start"            47 hours ago        Up 47 hours (healthy)                         nova_placement


All those containers got restarted after the mysql_init_bundle changed the nova passwords:

[root@controller-0 e]# docker inspect 2>&1 mysql_init_bundle | grep -i started                                                                                                                                                                                        
            "StartedAt": "2018-12-04T21:38:27.867555412Z",

And I know that the nova containers are using the new credentials successfully to connect to the db:

[root@controller-0 e]# docker cp nova_api:/etc/nova/nova.conf - | tar xO | grep ^connection=mysql
connection=mysql+pymysql://nova_api:apassword@[fd00:fd00:fd00:2000::14]/nova_api?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf
connection=mysql+pymysql://nova:apassword@[fd00:fd00:fd00:2000::14]/nova?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf
connection=mysql+pymysql://nova_api:apassword@[fd00:fd00:fd00:2000::14]/nova_api?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf

[root@controller-0 e]# docker cp nova_api_discover_hosts:/etc/nova/nova.conf - | tar xO | grep ^connection=mysql                                                                                                                                                      
connection=mysql+pymysql://nova_api:apassword@[fd00:fd00:fd00:2000::14]/nova_api?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf
connection=mysql+pymysql://nova:apassword@[fd00:fd00:fd00:2000::14]/nova?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf
connection=mysql+pymysql://nova_api:apassword@[fd00:fd00:fd00:2000::14]/nova_api?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf
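
Comparing these lines by eye is error-prone, so here is a minimal sketch that splits each `connection=` line into its fields and compares the two containers. The input strings are copied from the output above. `db_credentials` is a hypothetical helper written for this illustration, not part of nova or tripleo.

```python
from urllib.parse import urlsplit


def db_credentials(line):
    """Return (user, password, host, database) from a 'connection=...' line."""
    url = urlsplit(line.split("=", 1)[1].strip())
    return url.username, url.password, url.hostname, url.path.lstrip("/")


suffix = "?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf"
nova_api_conf = [
    "connection=mysql+pymysql://nova_api:apassword@[fd00:fd00:fd00:2000::14]/nova_api" + suffix,
    "connection=mysql+pymysql://nova:apassword@[fd00:fd00:fd00:2000::14]/nova" + suffix,
    "connection=mysql+pymysql://nova_api:apassword@[fd00:fd00:fd00:2000::14]/nova_api" + suffix,
]
# nova_api_discover_hosts showed the same three lines.
discover_hosts_conf = list(nova_api_conf)

parsed_api = [db_credentials(line) for line in nova_api_conf]
parsed_discover = [db_credentials(line) for line in discover_hosts_conf]
same_credentials = parsed_api == parsed_discover
```

Both containers resolve to the same users, password and host, so the configuration files alone do not explain why only nova_api_discover_hosts fails.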

All this points to container nova_api_discover_hosts misbehaving, ultimately yielding a failure:

        "stdout: (cellv2) Running cell_v2 host discovery",
        "(cellv2) Waiting 600 seconds for hosts to register",
        "(cellv2) compute node compute-0.localdomain has not registered",
        "(cellv2) compute node compute-1.localdomain has not registered",
        "(cellv2) Waiting 597 seconds for hosts to register",
        "(cellv2) Waiting 565 seconds for hosts to register",
        "(cellv2) Waiting 532 seconds for hosts to register",
        "(cellv2) Waiting 500 seconds for hosts to register",
        "(cellv2) Waiting 467 seconds for hosts to register",
        "(cellv2) Waiting 435 seconds for hosts to register",
        "(cellv2) Waiting 402 seconds for hosts to register",
        "(cellv2) Waiting 370 seconds for hosts to register",
        "(cellv2) Waiting 338 seconds for hosts to register",
        "(cellv2) Waiting 305 seconds for hosts to register",
        "(cellv2) Waiting 273 seconds for hosts to register",
        "(cellv2) Waiting 240 seconds for hosts to register",
        "(cellv2) Waiting 208 seconds for hosts to register",
        "(cellv2) Waiting 176 seconds for hosts to register",
        "(cellv2) Waiting 143 seconds for hosts to register",
        "(cellv2) Waiting 111 seconds for hosts to register",
        "(cellv2) Waiting 78 seconds for hosts to register",
        "(cellv2) Waiting 46 seconds for hosts to register",
        "(cellv2) Waiting 14 seconds for hosts to register",
        "(cellv2) WARNING: timeout waiting for nodes to register, running host discovery regardless",
        "(cellv2) Expected host list: compute-0.localdomain compute-1.localdomain",
        "(cellv2) Detected host list:",
        "(cellv2) Running host discovery...",
        "Found 2 cell mappings.",
        "Skipping cell0 since it does not contain hosts.",
        "Getting computes from cell 'default': d4d1e1c1-bede-4ac8-834b-6bb53f1d4401",
        "An error has occurred:",
        "Traceback (most recent call last):",
        "  File \"/usr/lib/python2.7/site-packages/nova/cmd/manage.py\", line 2310, in main",
        "    ret = fn(*fn_args, **fn_kwargs)",
        "  File \"/usr/lib/python2.7/site-packages/nova/cmd/manage.py\", line 1426, in discover_hosts",
        "    by_service)",
        "  File \"/usr/lib/python2.7/site-packages/nova/objects/host_mapping.py\", line 265, in discover_hosts",
        "  File \"/usr/lib/python2.7/site-packages/nova/objects/host_mapping.py\", line 221, in _check_and_create_host_mappings",
        "    ctxt, 'nova-compute', include_disabled=True)",
        "  File \"/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py\", line 184, in wrapper",
        "    result = fn(cls, context, *args, **kwargs)",
        "  File \"/usr/lib/python2.7/site-packages/nova/objects/service.py\", line 586, in get_by_binary",
[...]
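
The countdown at the start of the log above follows a common pattern: poll for the expected hosts until they have all registered or a deadline passes, then carry on regardless. The following is a rough sketch of that pattern with an injectable clock so it can be run instantly. `wait_for_hosts`, `FakeClock` and the 30-second interval are assumptions made for illustration and are not taken from the actual discovery script.

```python
import time


def wait_for_hosts(expected, list_registered, timeout=600, interval=30,
                   now=time.monotonic, sleep=time.sleep, log=print):
    """Poll until every expected host is registered or the timeout expires.

    Returns the set of hosts still missing (empty on success).
    """
    deadline = now() + timeout
    while True:
        missing = set(expected) - set(list_registered())
        remaining = deadline - now()
        if not missing or remaining <= 0:
            return missing
        log("Waiting %d seconds for hosts to register" % remaining)
        sleep(min(interval, remaining))


class FakeClock:
    """Simulated clock: sleeping just advances the time."""

    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


clock = FakeClock()
lines = []
missing = wait_for_hosts(
    ["compute-0.localdomain", "compute-1.localdomain"],
    list_registered=lambda: [],  # nothing ever registers, as in the log
    now=clock.now, sleep=clock.sleep, log=lines.append,
)
```

When the returned set is non-empty, the caller decides what to do next. In the log above, discovery is run anyway after a warning.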

Comment 12 Damien Ciabrini 2018-12-06 22:05:08 UTC
And here is the end of the stack trace:

[...]
        "    self.connect()",
        "  File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 932, in connect",
        "    self._request_authentication()",
        "  File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 1152, in _request_authentication",
        "    auth_packet = self._read_packet()",
        "  File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 1014, in _read_packet",
        "    packet.check_error()",
        "  File \"/usr/lib/python2.7/site-packages/pymysql/connections.py\", line 393, in check_error",
        "    err.raise_mysql_exception(self._data)",
        "  File \"/usr/lib/python2.7/site-packages/pymysql/err.py\", line 107, in raise_mysql_exception",
        "    raise errorclass(errno, errval)",
        "OperationalError: (pymysql.err.OperationalError) (1045, u\"Access denied for user 'nova'@'fd00:fd00:fd00:2000::15' (using password: YES)\") (Background on this error at: http://sqlalche.me/e/e3q8)",
        "stdout: 44ba425004cc4e79bac175f40873a21494d75f92b741426d2b4e344b02e82c67"


So nova_api_discover_hosts.sh extracted some invalid credentials that made it try to connect as user 'nova'@'fd00:fd00:fd00:2000::15', for which there's no user defined in the mysql database:

MariaDB [(none)]> select user,host from mysql.user where user like 'nova%';
+----------------+-------------------------+
| user           | host                    |
+----------------+-------------------------+
| nova           | %                       |
| nova_api       | %                       |
| nova_placement | %                       |
| nova           | fd00:fd00:fd00:2000::12 |
| nova_api       | fd00:fd00:fd00:2000::12 |
| nova_placement | fd00:fd00:fd00:2000::12 |
| nova           | fd00:fd00:fd00:2000::14 |
| nova_api       | fd00:fd00:fd00:2000::14 |
| nova_placement | fd00:fd00:fd00:2000::14 |
+----------------+-------------------------+

In fact, this IP fd00:fd00:fd00:2000::15 is that of controller-2, but it should not be used, as it may be deleted by galera any time there is an SST synchronization. If nova services want to access mysql via a controller-local NIC, then nova must create three users in the DB at stack creation time, to make sure the DB will not hold controller-specific data.

Comment 13 Martin Schuppert 2018-12-07 14:54:25 UTC
Summary of what we got so far:

In the deploy process, the database_connection for the cell is not updated, so the discover_hosts command fails to connect to the DB because the cell mapping still holds the old password:

 ()[root@controller-0 /]# su nova -s /bin/bash -c "/usr/bin/nova-manage cell_v2 discover_hosts --by-service --verbose"

Here we still see the old nova password in the cell_mappings table:

MariaDB [nova_api]> select * from cell_mappings;
created_at:          2018-12-04 17:30:54
updated_at:          NULL
id:                  2
uuid:                00000000-0000-0000-0000-000000000000
name:                cell0
transport_url:       none:///
database_connection: mysql+pymysql://nova:LcJtm4cGAjgWgEFKuAEjPUp2l@[fd00:fd00:fd00:2000::14]/nova_cell0?read_default_file=/etc/my.cnf.d/tripleo.cnf&read_default_group=tripleo
disabled:            0

created_at:          2018-12-04 17:31:02
updated_at:          NULL
id:                  5
uuid:                d4d1e1c1-bede-4ac8-834b-6bb53f1d4401
name:                default
transport_url:       rabbit://guest:r9BJD5J4VybUBgZGIVOJe89uK.localdomain:5672,guest:r9BJD5J4VybUBgZGIVOJe89uK.localdomain:5672,guest:r9BJD5J4VybUBgZGIVOJe89uK.localdomain:5672/?ssl=0
database_connection: mysql+pymysql://nova:LcJtm4cGAjgWgEFKuAEjPUp2l@[fd00:fd00:fd00:2000::14]/nova?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf
disabled:            0

2 rows in set (0.00 sec)   

When we update the database_connection, the discover_hosts command works:
()[root@controller-0 /]$ nova-manage cell_v2 update_cell --name='default' --database_connection='mysql+pymysql://nova:apassword@[fd00:fd00:fd00:2000::14]/nova?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf'

()[root@controller-0 /]$ su nova -s /bin/bash -c "/usr/bin/nova-manage cell_v2 list_cells"
Name:                cell0
UUID:                00000000-0000-0000-0000-000000000000
Transport URL:       none:/
Database Connection: mysql+pymysql://nova:****@[fd00:fd00:fd00:2000::14]/nova_cell0?read_default_file=/etc/my.cnf.d/tripleo.cnf&read_default_group=tripleo
Disabled:            False

Name:                default
UUID:                d4d1e1c1-bede-4ac8-834b-6bb53f1d4401
Transport URL:       rabbit://guest:****@controller-2.internalapi.localdomain:5672/?ssl=0
Database Connection: mysql+pymysql://nova:****@[fd00:fd00:fd00:2000::14]/nova?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf
Disabled:            False


()[root@controller-0 /]$ su nova -s /bin/bash -c "/usr/bin/nova-manage --debug cell_v2 discover_hosts --by-service --verbose"                                                                                                                                
Found 2 cell mappings.                                                                                                                                                                                                                                       
Skipping cell0 since it does not contain hosts.                                                                                                                                                                                                              
Getting computes from cell 'default': d4d1e1c1-bede-4ac8-834b-6bb53f1d4401                                                                                                                                                                                   
Found 0 unmapped computes in cell: d4d1e1c1-bede-4ac8-834b-6bb53f1d4401
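
The database_connection passed to update_cell in this comment differs from the stored value only in its password. As a minimal sketch of that check, the code below flags cell mappings whose stored password differs from the current one and builds the corrected URL. The two URLs are copied from the cell_mappings output in this comment. `with_password` and `stale_mappings` are hypothetical helpers, and the sketch assumes the password needs no URL escaping.

```python
from urllib.parse import urlsplit, urlunsplit


def with_password(url, new_password):
    """Return url with only the password component replaced."""
    parts = urlsplit(url)
    userinfo, _, hostport = parts.netloc.rpartition("@")
    user = userinfo.split(":", 1)[0]
    netloc = "%s:%s@%s" % (user, new_password, hostport)
    return urlunsplit(parts._replace(netloc=netloc))


def stale_mappings(cell_mappings, current_password):
    """Yield (cell name, corrected URL) for mappings using another password."""
    for name, url in cell_mappings.items():
        if urlsplit(url).password != current_password:
            yield name, with_password(url, current_password)


cell_mappings = {
    "cell0": "mysql+pymysql://nova:LcJtm4cGAjgWgEFKuAEjPUp2l@[fd00:fd00:fd00:2000::14]"
             "/nova_cell0?read_default_file=/etc/my.cnf.d/tripleo.cnf&read_default_group=tripleo",
    "default": "mysql+pymysql://nova:LcJtm4cGAjgWgEFKuAEjPUp2l@[fd00:fd00:fd00:2000::14]"
               "/nova?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf",
}

fixed = dict(stale_mappings(cell_mappings, "apassword"))
```

For the default cell this yields the same value that was passed to `--database_connection`. The cell0 row carries the same old password, so the sketch reports it as well.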

Comment 14 Martin Schuppert 2018-12-13 13:33:27 UTC
Note: I was able to successfully change the nova password using the templated DB cell URLs from the WIP patch in BZ1613949.

parameter_defaults:
  NovaPassword: apassword

[root@controller-0 ~]# mysql -u nova -papassword -h 172.17.1.10 nova -e "select * from services;" | wc -l
18
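
The idea behind a templated cell URL is that the stored mapping holds placeholders instead of literal credentials, and the placeholders are filled from the connection configured in nova.conf each time the mapping is used, so a password change needs no update of the mapping. The sketch below illustrates that idea only. `render_cell_url`, the placeholder names and the example URLs are assumptions for illustration and may not match the referenced patch; the host is the one used in the mysql command above.

```python
from urllib.parse import urlsplit


def render_cell_url(template, configured_url):
    """Fill {placeholders} in template from the URL configured in nova.conf."""
    parts = urlsplit(configured_url)
    return template.format(
        scheme=parts.scheme,
        username=parts.username,
        password=parts.password,
        hostname=parts.hostname,
        path=parts.path.lstrip("/"),
        query=parts.query,
    )


# Stored once in the cell mapping; contains no secret.
template = "{scheme}://{username}:{password}@{hostname}/{path}?{query}"

query = "?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf"
before = render_cell_url(template, "mysql+pymysql://nova:oldpassword@172.17.1.10/nova" + query)
after = render_cell_url(template, "mysql+pymysql://nova:apassword@172.17.1.10/nova" + query)
```

Only the configured URL changes between the two calls. The template stays the same, which is why the stale-password situation from the earlier comments cannot arise with this approach.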

Comment 16 Martin Schuppert 2019-02-21 12:11:22 UTC
Changing NovaPassword works in OSP14 with the following commit:

$ git branch -a --contains 9c4fcade65b12048c43ead134e47063a9facadb8
remotes/gerrit/stable/rocky
remotes/openstack/stable/rocky
remotes/rhos/rhos-14.0-patches

$ git show 9c4fcade65b12048c43ead134e47063a9facadb8
commit 9c4fcade65b12048c43ead134e47063a9facadb8
Author: Rabi Mishra <ramishra>
Date:   Thu Nov 29 15:07:13 2018 +0530
     Mount config-data/puppet-generated/nova for nova_api_ensure_default_cell

Comment 28 errata-xmlrpc 2019-04-30 17:51:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0878