Created attachment 1021784 [details]
Output from heat stack create and instack-deploy-overcloud script

Description of problem:
Keystone auth fails after a bare-metal deployment via instack-deploy-overcloud --tuskar. The heat stack creation completes successfully, but the automated overcloud customization fails with keystone auth errors.

Example:

Creating user-role assignment for user ec2, role admin, tenant service
/usr/lib/python2.7/site-packages/keystoneclient/shell.py:65: DeprecationWarning: The keystone CLI is deprecated in favor of python-openstackclient. For a Python library, continue using python-keystoneclient.
  'python-keystoneclient.', DeprecationWarning)
Authorization Failed: Unable to establish connection to http://172.16.2.6:5000/v2.0/tokens

[root@rhos0 ~]# heat stack-list
+--------------------------------------+------------+-----------------+----------------------+
| id                                   | stack_name | stack_status    | creation_time        |
+--------------------------------------+------------+-----------------+----------------------+
| 798160e3-db9b-4743-b53c-2967471e1f04 | overcloud  | CREATE_COMPLETE | 2015-05-04T14:49:31Z |
+--------------------------------------+------------+-----------------+----------------------+

Version-Release number of selected component (if applicable):

[root@rhos0 ~]# rpm -qa | grep openstack | grep -E 'tripleo|ironic'
openstack-ironic-common-2015.1-dev682.el7.centos.noarch
openstack-tripleo-heat-templates-0.8.4-post33.el7.centos.noarch
openstack-ironic-discoverd-1.1.0-0.99.20150429.1425git.el7.centos.noarch
openstack-tripleo-puppet-elements-0.0.0-post56.el7.centos.noarch
openstack-tripleo-image-elements-0.9.4-post7.el7.centos.noarch
openstack-ironic-conductor-2015.1-dev682.el7.centos.noarch
openstack-tripleo-0.0.6-dev1698.el7.centos.noarch
openstack-ironic-api-2015.1-dev682.el7.centos.noarch

How reproducible:
Every time

Steps to Reproduce:
1. Deploy undercloud
2. Configure deploy-overcloudrc
3. Discover bare metal servers
4. instack-deploy-overcloud --tuskar

Actual results:
Fails to customize environment

Expected results:
Customization completes successfully

Additional info:
If I ssh to the overcloud as heat-admin post deploy and restart all openstack services, it temporarily works in a read-only fashion. If I attempt to create anything (e.g. a neutron network or a glance image), the auth errors return and the services must be restarted.
(On the undercloud, after sourcing overcloudrc) I can't talk to the overcloud services (same connection issues as above). Restarting haproxy on the overcloud controller reliably fixes the service connectivity. Still investigating.
Discussion/investigation ongoing. After a tip from derekh we increased max_conn in both haproxy and mysqld (haproxy was logging more than 150 connections, which was the previous/default max_conn). Currently stable at ~185 connections for the last 25 minutes or so.
astapor seems to be setting the limits at 4000 for haproxy and 1024 for galera, Jason can you confirm so we port the same values to tripleo?
The key here was that connections to keystone through the VIP and haproxy were still working even while keystone commands were reporting a problem. This particular curl command was responding immediately:

$ curl http://10.8.147.22:5000/v2.0/tokens
{"error": {"message": "The resource could not be found.", "code": 404, "title": "Not Found"}}

The difference is that the curl call can't authenticate, so keystone responds before it attempts to connect to the database. This points us at a problem with db connections.

This probably won't happen in a virt env: the number of sql connections on this baremetal env is higher because a lot of our processes scale based on the number of CPUs. The baremetal host in question had 24 CPUs.
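To make the scaling argument above concrete, here is a rough back-of-the-envelope sketch. The function name, the count of 8 database-backed services and the one-connection-per-worker figure are illustrative assumptions, not values measured on this deployment; only the CPU count of 24 comes from the comment above.

```python
def estimate_db_connections(cpus, services, conns_per_worker=1):
    """Rough estimate of open DB connections on one controller.

    Assumes every service starts one worker per CPU and every worker
    keeps conns_per_worker connections open. Both are simplifying
    assumptions made for illustration only.
    """
    if cpus < 1 or services < 0 or conns_per_worker < 0:
        raise ValueError("cpus must be >= 1; other arguments must be >= 0")
    return cpus * services * conns_per_worker


# Hypothetical comparison: a small virt host versus the 24-CPU baremetal host.
for cpus in (4, 24):
    print(cpus, "CPUs ->", estimate_db_connections(cpus, services=8), "connections")
```

Under these assumptions the small host stays far below a limit of 150 connections while the 24-CPU host exceeds it, which would be consistent with the problem only showing up on baremetal.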
Haproxy should have maxconn = 10000

Galera needs:

$limit_no_file = "16384", (this in both config and passed into pcs RA)
$max_connections = "1024",
$open_files_limit = '-1',

These galera settings should also be configurable, as different hardware may have different needs.
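One way to picture "configurable, with sane defaults" is the sketch below. It is not how astapor or tripleo implement this; the class name, field names and the `problems` helper are hypothetical, and the rule that the file limit should exceed max_connections is an assumption (each client connection needs at least one file descriptor). The defaults are the values proposed above.

```python
from dataclasses import dataclass


@dataclass
class ConnectionLimits:
    # Defaults are the values proposed in this thread; override per hardware.
    haproxy_maxconn: int = 10000
    galera_max_connections: int = 1024
    galera_limit_no_file: int = 16384

    def problems(self, expected_connections):
        """Return a list of warnings for limits that look too low."""
        issues = []
        if expected_connections >= self.galera_max_connections:
            issues.append("galera max_connections too low")
        if expected_connections >= self.haproxy_maxconn:
            issues.append("haproxy maxconn too low")
        # Assumption: the open file limit must leave room for every connection.
        if self.galera_limit_no_file <= self.galera_max_connections:
            issues.append("limit_no_file does not cover max_connections")
        return issues


# ~185 connections were observed in this thread after raising the limits.
print(ConnectionLimits().problems(185))
print(ConnectionLimits(galera_max_connections=150).problems(185))
```

With the proposed defaults, the ~185 connections seen earlier raise no warning, while the old limit of 150 is flagged.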
half of the fix in https://review.openstack.org/#/c/183044/1
The other half is in https://review.openstack.org/#/c/183046/ (or rather a third, until we make this configurable and pass the needed options to the pacemaker resource agent as well).
I saw this merged on May 14. Are these fixes incorporated into the latest code base? That is, how can I test on baremetal?
The CI job for the Dell hardware is running green at the moment, so, thanks to gfidente, you should be able to test this out.
Verified. With --tuskar replaced by --plan, deployment succeeds for both HA and non-HA, on both virt-env and bare metal.

instack-undercloud-2.1.2-22.el7ost.noarch
openstack-tuskar-0.4.18-3.el7ost.noarch
python-tuskarclient-0.1.18-3.el7ost.noarch
openstack-tuskar-ui-extras-0.0.4-1.el7ost.noarch
openstack-tuskar-ui-0.3.0-13.el7ost.noarch
openstack-puppet-modules-2015.1.8-8.el7ost.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2015:1549