Bug 1434955
Summary: | OSP10 -> OSP11 upgrade: after running major-upgrade-composable-steps.yaml nova-api cannot connect to Galera on 2/3 controllers | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Marius Cornea <mcornea> |
Component: | openstack-tripleo-heat-templates | Assignee: | Sofer Athlan-Guyot <sathlang> |
Status: | CLOSED ERRATA | QA Contact: | Marius Cornea <mcornea> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | ||
Version: | 11.0 (Ocata) | CC: | dbecker, jcoufal, jschluet, mburns, morazi, rhel-osp-director-maint, sasha, sathlang |
Target Milestone: | rc | Keywords: | Triaged |
Target Release: | 11.0 (Ocata) | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | openstack-tripleo-heat-templates-6.0.0-0.11.el7ost | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-05-17 20:11:55 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1394019 |
Description
Marius Cornea
2017-03-22 17:51:09 UTC
Hi, cannot find anything obvious. VIP is on ctl-2. On ctl-2 and ctl-1 we have this kind of error: /var/log/nova/nova-api.log:2017-03-22 18:49:12.394 195971 DEBUG nova.api.openstack.wsgi [req-7681616d-f43d-490f-9333-b86a9bbf3dff - - - - -] Calling method '<bound method ServersController.detail of <nova.api.openstack.compute.servers.ServersController object at 0x79097d0>>' _process_stack /usr/lib/python2.7/site-packages/nova/api/openstack/wsgi.py:626 /var/log/nova/nova-api.log:2017-03-22 18:49:12.401 195971 DEBUG nova.compute.api [req-7681616d-f43d-490f-9333-b86a9bbf3dff - - - - -] Searching by: {'deleted': False} get_all /usr/lib/python2.7/site-packages/nova/compute/api.py:2344 /var/log/nova/nova-api.log:2017-03-22 18:49:12.414 195971 DEBUG oslo_db.sqlalchemy.engines [req-7681616d-f43d-490f-9333-b86a9bbf3dff - - - - -] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION _check_effective_sql_mode /usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py:261 .... /var/log/nova/nova-api.log:2017-03-22 18:55:12.787 195971 WARNING oslo_db.sqlalchemy.engines [req-7681616d-f43d-490f-9333-b86a9bbf3dff - - - - -] SQL connection failed. -37 attempts left. /var/log/nova/nova-api.log:2017-03-22 18:55:22.794 195971 WARNING oslo_db.sqlalchemy.engines [req-7681616d-f43d-490f-9333-b86a9bbf3dff - - - - -] SQL connection failed. -38 attempts left. /var/log/nova/nova-api.log:2017-03-22 18:55:32.913 195971 WARNING oslo_db.sqlalchemy.engines [req-7681616d-f43d-490f-9333-b86a9bbf3dff - - - - -] SQL connection failed. -39 attempts left. /var/log/nova/nova-api.log:2017-03-22 18:55:42.924 195971 WARNING oslo_db.sqlalchemy.engines [req-7681616d-f43d-490f-9333-b86a9bbf3dff - - - - -] SQL connection failed. -40 attempts left. and: /var/log/nova/nova-api.log:2017-03-22 18:59:12.404 195967 DEBUG nova.api.openstack.wsgi [req-dbc1f1d7-a700-409a-b0ea-6364636ac8a6 - - - - -] Calling method '<bound method ServersController.detail of <nova.api.openstack.compute.servers.ServersController object at 0x79097d0>>' _process_stack /usr/lib/python2.7/site-packages/nova/api/openstack/wsgi.py:626 /var/log/nova/nova-api.log:2017-03-22 18:59:12.411 195967 DEBUG nova.compute.api [req-dbc1f1d7-a700-409a-b0ea-6364636ac8a6 - - - - -] Searching by: {'deleted': False} get_all /usr/lib/python2.7/site-packages/nova/compute/api.py:2344 /var/log/nova/nova-api.log:2017-03-22 18:59:12.424 195967 DEBUG oslo_db.sqlalchemy.engines [req-dbc1f1d7-a700-409a-b0ea-6364636ac8a6 - - - - -] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION _check_effective_sql_mode /usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py:261 /var/log/nova/nova-api.log:2017-03-22 18:59:12.437 195967 WARNING oslo_db.sqlalchemy.engines [req-dbc1f1d7-a700-409a-b0ea-6364636ac8a6 - - - - -] SQL connection failed. -1 attempts left. /var/log/nova/nova-api.log:2017-03-22 18:59:22.446 195967 WARNING oslo_db.sqlalchemy.engines [req-dbc1f1d7-a700-409a-b0ea-6364636ac8a6 - - - - -] SQL connection failed. -2 attempts left. Using the mysql client we can connect from all the node to the db all right and the grants looks fine. What is strange is the negative counter for the attempt. Will ask for nova help if they can provide assistance. So just adding that runnnig mysql -u nova_api -p -h <galera_vip> -e 'show grants;' works fine from every nodes in the cluster, so no network connectivity issue. Adding computes from visibility. Hi, So after a discussion with Damien and Michele, the problem was found. It appears that when the nova cell is create it hardcode into the database the connection parameter present in /etc/nova/nova.conf of the node where it's run. Running it on controller0 for instance will give you this in the database: mysql+pymysql://nova:c2cdagE8PyAbnpers3AD88Hge.0.19nova?bind_address=10.0.0.20' This is later used to create a connection to the database for nova cell information. This obviously fails on 2 other node as they don't have the 10.0.0.20 address. To prevent this issue, this workaround have been done: https://review.openstack.org/#/c/436192/ removing the bind_address parameter from the configuration line. The sequence of event on the seems correct: 1. update hiera data; 2. create nova cell with database option; From the journalctl logs: Mar 22 19:23:44 overcloud-controller-0.localdomain os-collect-config[4197]: [2017-03-22 19:23:44,806] (heat-config) [DEBUG] Running /usr/libexec/heat-config/hooks/hiera < /var/lib/heat-config/deployed/e4e9bd8e-4b7b-41da-b040-d4f563f2fd48.json Mar 22 19:23:44 overcloud-controller-0.localdomain os-collect-config[4197]: [2017-03-22 19:23:44,852] (heat-config) [DEBUG] Running heat-config-notify /var/lib/heat-config/deployed/e4e9bd8e-4b7b-41da-b040-d4f563f2fd48.json < /var/lib/heat-config/deployed/e4e9bd8e-4b7b-41da-b040-d4f563f2fd48.notify.json $ grep nova::database_connection /var/lib/heat-config/deployed/e4e9bd8e-4b7b-41da-b040-d4f563f2fd48.json "nova::database_connection": "mysql+pymysql://nova:c2cdagE8PyAbnpers3AD88Hge.0.19/nova?read_default_file=/etc/my.cnf.d/tripleo.cnf&read_default_group=tripleo", [root@overcloud-controller-0 e]# journalctl | grep 'nova-manage cell_v2' Mar 22 19:39:52 overcloud-controller-0.localdomain ansible-command[440226]: Invoked with warn=True executable=None _uses_shell=False _raw_params=nova-manage cell_v2 map_cell0 removes=None creates=None chdir=None Mar 22 19:39:55 overcloud-controller-0.localdomain ansible-command[440632]: Invoked with warn=True executable=None _uses_shell=True _raw_params=nova-manage cell_v2 create_cell --name='default' --database_connection=$(hiera nova::database_connection) removes=None creates=None chdir=None Mar 22 19:40:09 overcloud-controller-0.localdomain ansible-command[443480]: Invoked with warn=True executable=None _uses_shell=False _raw_params=nova-manage cell_v2 map_cell_and_hosts removes=None creates=None chdir=None Mar 22 19:40:12 overcloud-controller-0.localdomain ansible-command[443950]: Invoked with warn=True executable=None _uses_shell=True _raw_params=nova-manage cell_v2 list_cells | sed -e '1,3d' -e '$d' | awk -F ' *| *' '$2 == "default" {print $4}' removes=None creates=None chdir=None Mar 22 19:40:15 overcloud-controller-0.localdomain ansible-command[444382]: Invoked with warn=True executable=None _uses_shell=False _raw_params=nova-manage cell_v2 map_instances --cell_uuid 7f04f00d-4b9d-478d-941f-d93a7be145e7 removes=None creates=None chdir=None INSERT INTO `cell_mappings` VALUES ('2017-03-22 19:39:54',NULL,2,'00000000-0000-0000-0000-000000000000','cell0','none:///','mysql+pymysql://nova:c2cdagE8PyAbnpers3AD88Hge.0.19/nova_cell0?bind_address=10.0.0.20'), ('2017-03-22 19:39:58',NULL,5,'7f04f00d-4b9d-478d-941f-d93a7be145e7','default','rabbit://guest:ZB6fp92XTbxGXzhDYnWrkN4Qq.localdomain:5672,guest:ZB6fp92XTbxGXzhDYnWrkN4Qq.localdomain:5672,guest:ZB6fp92XTbxGXzhDYnWrkN4Qq.localdomain:5672/?ssl=0','mysql+pymysql://nova:c2cdagE8PyAbnpers3AD88Hge.0.19/nova?read_default_file=/etc/my.cnf.d/tripleo.cnf&read_default_group=tripleo'), ('2017-03-22 19:40:12',NULL,8,'785eaa63-2ac1-4f2f-83e0-cb92e9045b9f',NULL,'rabbit://guest:ZB6fp92XTbxGXzhDYnWrkN4Qq.localdomain:5672,guest:ZB6fp92XTbxGXzhDYnWrkN4Qq.localdomain:5672,guest:ZB6fp92XTbxGXzhDYnWrkN4Qq.localdomain:5672/?ssl=0','mysql+pymysql://nova:c2cdagE8PyAbnpers3AD88Hge.0.19/nova?bind_address=10.0.0.20'); But in the end we can see that the database is not in the correct state: root@overcloud-controller-0 e]# mysql -e 'select name,database_connection from nova_api.cell_mappings' +---------+--------------------------------------------------------------------------------------------------------------------------------------+ | name | database_connection | +---------+--------------------------------------------------------------------------------------------------------------------------------------+ | cell0 | mysql+pymysql://nova:c2cdagE8PyAbnpers3AD88Hge.0.19/nova_cell0?bind_address=10.0.0.20 | | default | mysql+pymysql://nova:c2cdagE8PyAbnpers3AD88Hge.0.19/nova?read_default_file=/etc/my.cnf.d/tripleo.cnf&read_default_group=tripleo | | NULL | mysql+pymysql://nova:c2cdagE8PyAbnpers3AD88Hge.0.19/nova?bind_address=10.0.0.20 | +---------+--------------------------------------------------------------------------------------------------------------------------------------+ 1. cell0 is pointing to the wrong connection 2. cell NULL ??? what is that ? For 1. we're missing --database=$(hiera nova::database_connection) in this command: nova-manage cell_v2 map_cell0 For NULL cell creation I don't know yet. Making upstream bz, and review. Put the gerrit patch to point to the ocata branch. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1245 |