Description of problem: OSP10 -> OSP11 upgrade: after running major-upgrade-composable-steps.yaml nova-api cannot connect to MySQL on 2/3 controllers. This results in timeouts and 500 ERROR (ClientException): Unknown Error (HTTP 504) when calling the nova api, making it very difficult to manage the nova instances prior to upgrading the compute nodes. Even after running the upgrade converge step 2/3 controllers cannot reach MySQL leaving the upgraded environment in a semi working state. Version-Release number of selected component (if applicable): openstack-tripleo-heat-templates-6.0.0-0.3.0rc2.el7ost.noarch How reproducible: 100% Steps to Reproduce: 1. Deploy OSP10 with 3 ctrl, 2 computes, 3 ceph nodes 2. Upgrde to OSP11 Actual results: controller-1 and controller-2 report in /var/log/nova/nova-api.log: 2017-03-22 17:25:43.189 377885 WARNING oslo_db.sqlalchemy.engines [req-41cd5904-8031-48d1-9e12-56e73c62f5cd - - - - -] SQL connection failed. -123 attempts left. Expected results: nova api on all controllers is in a functional state. Additional info: Attaching the sosreports.
Hi, cannot find anything obvious. VIP is on ctl-2. On ctl-2 and ctl-1 we have this kind of error: /var/log/nova/nova-api.log:2017-03-22 18:49:12.394 195971 DEBUG nova.api.openstack.wsgi [req-7681616d-f43d-490f-9333-b86a9bbf3dff - - - - -] Calling method '<bound method ServersController.detail of <nova.api.openstack.compute.servers.ServersController object at 0x79097d0>>' _process_stack /usr/lib/python2.7/site-packages/nova/api/openstack/wsgi.py:626 /var/log/nova/nova-api.log:2017-03-22 18:49:12.401 195971 DEBUG nova.compute.api [req-7681616d-f43d-490f-9333-b86a9bbf3dff - - - - -] Searching by: {'deleted': False} get_all /usr/lib/python2.7/site-packages/nova/compute/api.py:2344 /var/log/nova/nova-api.log:2017-03-22 18:49:12.414 195971 DEBUG oslo_db.sqlalchemy.engines [req-7681616d-f43d-490f-9333-b86a9bbf3dff - - - - -] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION _check_effective_sql_mode /usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py:261 .... /var/log/nova/nova-api.log:2017-03-22 18:55:12.787 195971 WARNING oslo_db.sqlalchemy.engines [req-7681616d-f43d-490f-9333-b86a9bbf3dff - - - - -] SQL connection failed. -37 attempts left. /var/log/nova/nova-api.log:2017-03-22 18:55:22.794 195971 WARNING oslo_db.sqlalchemy.engines [req-7681616d-f43d-490f-9333-b86a9bbf3dff - - - - -] SQL connection failed. -38 attempts left. /var/log/nova/nova-api.log:2017-03-22 18:55:32.913 195971 WARNING oslo_db.sqlalchemy.engines [req-7681616d-f43d-490f-9333-b86a9bbf3dff - - - - -] SQL connection failed. -39 attempts left. /var/log/nova/nova-api.log:2017-03-22 18:55:42.924 195971 WARNING oslo_db.sqlalchemy.engines [req-7681616d-f43d-490f-9333-b86a9bbf3dff - - - - -] SQL connection failed. -40 attempts left. and: /var/log/nova/nova-api.log:2017-03-22 18:59:12.404 195967 DEBUG nova.api.openstack.wsgi [req-dbc1f1d7-a700-409a-b0ea-6364636ac8a6 - - - - -] Calling method '<bound method ServersController.detail of <nova.api.openstack.compute.servers.ServersController object at 0x79097d0>>' _process_stack /usr/lib/python2.7/site-packages/nova/api/openstack/wsgi.py:626 /var/log/nova/nova-api.log:2017-03-22 18:59:12.411 195967 DEBUG nova.compute.api [req-dbc1f1d7-a700-409a-b0ea-6364636ac8a6 - - - - -] Searching by: {'deleted': False} get_all /usr/lib/python2.7/site-packages/nova/compute/api.py:2344 /var/log/nova/nova-api.log:2017-03-22 18:59:12.424 195967 DEBUG oslo_db.sqlalchemy.engines [req-dbc1f1d7-a700-409a-b0ea-6364636ac8a6 - - - - -] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION _check_effective_sql_mode /usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py:261 /var/log/nova/nova-api.log:2017-03-22 18:59:12.437 195967 WARNING oslo_db.sqlalchemy.engines [req-dbc1f1d7-a700-409a-b0ea-6364636ac8a6 - - - - -] SQL connection failed. -1 attempts left. /var/log/nova/nova-api.log:2017-03-22 18:59:22.446 195967 WARNING oslo_db.sqlalchemy.engines [req-dbc1f1d7-a700-409a-b0ea-6364636ac8a6 - - - - -] SQL connection failed. -2 attempts left. Using the mysql client we can connect from all the node to the db all right and the grants looks fine. What is strange is the negative counter for the attempt. Will ask for nova help if they can provide assistance.
So just adding that runnnig mysql -u nova_api -p -h <galera_vip> -e 'show grants;' works fine from every nodes in the cluster, so no network connectivity issue. Adding computes from visibility.
Hi, So after a discussion with Damien and Michele, the problem was found. It appears that when the nova cell is create it hardcode into the database the connection parameter present in /etc/nova/nova.conf of the node where it's run. Running it on controller0 for instance will give you this in the database: mysql+pymysql://nova:c2cdagE8PyAbnpers3AD88Hge.0.19nova?bind_address=10.0.0.20' This is later used to create a connection to the database for nova cell information. This obviously fails on 2 other node as they don't have the 10.0.0.20 address. To prevent this issue, this workaround have been done: https://review.openstack.org/#/c/436192/ removing the bind_address parameter from the configuration line. The sequence of event on the seems correct: 1. update hiera data; 2. create nova cell with database option; From the journalctl logs: Mar 22 19:23:44 overcloud-controller-0.localdomain os-collect-config[4197]: [2017-03-22 19:23:44,806] (heat-config) [DEBUG] Running /usr/libexec/heat-config/hooks/hiera < /var/lib/heat-config/deployed/e4e9bd8e-4b7b-41da-b040-d4f563f2fd48.json Mar 22 19:23:44 overcloud-controller-0.localdomain os-collect-config[4197]: [2017-03-22 19:23:44,852] (heat-config) [DEBUG] Running heat-config-notify /var/lib/heat-config/deployed/e4e9bd8e-4b7b-41da-b040-d4f563f2fd48.json < /var/lib/heat-config/deployed/e4e9bd8e-4b7b-41da-b040-d4f563f2fd48.notify.json $ grep nova::database_connection /var/lib/heat-config/deployed/e4e9bd8e-4b7b-41da-b040-d4f563f2fd48.json "nova::database_connection": "mysql+pymysql://nova:c2cdagE8PyAbnpers3AD88Hge.0.19/nova?read_default_file=/etc/my.cnf.d/tripleo.cnf&read_default_group=tripleo", [root@overcloud-controller-0 e]# journalctl | grep 'nova-manage cell_v2' Mar 22 19:39:52 overcloud-controller-0.localdomain ansible-command[440226]: Invoked with warn=True executable=None _uses_shell=False _raw_params=nova-manage cell_v2 map_cell0 removes=None creates=None chdir=None Mar 22 19:39:55 overcloud-controller-0.localdomain ansible-command[440632]: Invoked with warn=True executable=None _uses_shell=True _raw_params=nova-manage cell_v2 create_cell --name='default' --database_connection=$(hiera nova::database_connection) removes=None creates=None chdir=None Mar 22 19:40:09 overcloud-controller-0.localdomain ansible-command[443480]: Invoked with warn=True executable=None _uses_shell=False _raw_params=nova-manage cell_v2 map_cell_and_hosts removes=None creates=None chdir=None Mar 22 19:40:12 overcloud-controller-0.localdomain ansible-command[443950]: Invoked with warn=True executable=None _uses_shell=True _raw_params=nova-manage cell_v2 list_cells | sed -e '1,3d' -e '$d' | awk -F ' *| *' '$2 == "default" {print $4}' removes=None creates=None chdir=None Mar 22 19:40:15 overcloud-controller-0.localdomain ansible-command[444382]: Invoked with warn=True executable=None _uses_shell=False _raw_params=nova-manage cell_v2 map_instances --cell_uuid 7f04f00d-4b9d-478d-941f-d93a7be145e7 removes=None creates=None chdir=None INSERT INTO `cell_mappings` VALUES ('2017-03-22 19:39:54',NULL,2,'00000000-0000-0000-0000-000000000000','cell0','none:///','mysql+pymysql://nova:c2cdagE8PyAbnpers3AD88Hge.0.19/nova_cell0?bind_address=10.0.0.20'), ('2017-03-22 19:39:58',NULL,5,'7f04f00d-4b9d-478d-941f-d93a7be145e7','default','rabbit://guest:ZB6fp92XTbxGXzhDYnWrkN4Qq.localdomain:5672,guest:ZB6fp92XTbxGXzhDYnWrkN4Qq.localdomain:5672,guest:ZB6fp92XTbxGXzhDYnWrkN4Qq.localdomain:5672/?ssl=0','mysql+pymysql://nova:c2cdagE8PyAbnpers3AD88Hge.0.19/nova?read_default_file=/etc/my.cnf.d/tripleo.cnf&read_default_group=tripleo'), ('2017-03-22 19:40:12',NULL,8,'785eaa63-2ac1-4f2f-83e0-cb92e9045b9f',NULL,'rabbit://guest:ZB6fp92XTbxGXzhDYnWrkN4Qq.localdomain:5672,guest:ZB6fp92XTbxGXzhDYnWrkN4Qq.localdomain:5672,guest:ZB6fp92XTbxGXzhDYnWrkN4Qq.localdomain:5672/?ssl=0','mysql+pymysql://nova:c2cdagE8PyAbnpers3AD88Hge.0.19/nova?bind_address=10.0.0.20'); But in the end we can see that the database is not in the correct state: root@overcloud-controller-0 e]# mysql -e 'select name,database_connection from nova_api.cell_mappings' +---------+--------------------------------------------------------------------------------------------------------------------------------------+ | name | database_connection | +---------+--------------------------------------------------------------------------------------------------------------------------------------+ | cell0 | mysql+pymysql://nova:c2cdagE8PyAbnpers3AD88Hge.0.19/nova_cell0?bind_address=10.0.0.20 | | default | mysql+pymysql://nova:c2cdagE8PyAbnpers3AD88Hge.0.19/nova?read_default_file=/etc/my.cnf.d/tripleo.cnf&read_default_group=tripleo | | NULL | mysql+pymysql://nova:c2cdagE8PyAbnpers3AD88Hge.0.19/nova?bind_address=10.0.0.20 | +---------+--------------------------------------------------------------------------------------------------------------------------------------+ 1. cell0 is pointing to the wrong connection 2. cell NULL ??? what is that ? For 1. we're missing --database=$(hiera nova::database_connection) in this command: nova-manage cell_v2 map_cell0 For NULL cell creation I don't know yet. Making upstream bz, and review.
Put the gerrit patch to point to the ocata branch.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1245