Bug 1218322 - Keystone auth fails after bare-metal deployment via instack-deploy-overcloud --tuskar
Summary: Keystone auth fails after bare-metal deployment via instack-deploy-overcloud ...
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Target Milestone: ga
Target Release: Director
Assignee: Jay Dobies
QA Contact: Amit Ugol
Depends On:
Reported: 2015-05-04 15:27 UTC by jliberma@redhat.com
Modified: 2015-08-05 13:51 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2015-08-05 13:51:20 UTC
Target Upstream Version:

Attachments
Output from heat stack create and instack-deploy-overcloud script (100.50 KB, text/plain)
2015-05-04 15:27 UTC, jliberma@redhat.com

| System                 | ID             | Priority | Status       | Summary                                                      | Last Updated            |
| Red Hat Product Errata | RHEA-2015:1549 | normal   | SHIPPED_LIVE | Red Hat Enterprise Linux OpenStack Platform director Release | 2015-08-05 17:49:10 UTC |

Description jliberma@redhat.com 2015-05-04 15:27:05 UTC
Created attachment 1021784
Output from heat stack create and instack-deploy-overcloud script

Description of problem: Keystone auth fails after bare-metal deployment via instack-deploy-overcloud --tuskar. Heat stack creation completes successfully, but the automated overcloud customization then fails with keystone auth errors.

Creating user-role assignment for user ec2, role admin, tenant service
/usr/lib/python2.7/site-packages/keystoneclient/shell.py:65: DeprecationWarning: The keystone CLI is deprecated in favor of python-openstackclient. For a Python library, continue using python-keystoneclient.
  'python-keystoneclient.', DeprecationWarning)
Authorization Failed: Unable to establish connection to
[root@rhos0 ~]# heat stack-list
| id                                   | stack_name | stack_status    | creation_time        |
| 798160e3-db9b-4743-b53c-2967471e1f04 | overcloud  | CREATE_COMPLETE | 2015-05-04T14:49:31Z |

Version-Release number of selected component (if applicable):
[root@rhos0 ~]# rpm -qa | grep openstack | grep -E 'tripleo|ironic'

How reproducible: Every time

Steps to Reproduce:
1. Deploy undercloud
2. Configure deploy-overcloudrc
3. Discover bare metal servers
4. instack-deploy-overcloud --tuskar

Actual results: Fails to customize environment

Expected results: Customization completes successfully

Additional info: If I ssh to the overcloud as heat-admin after the deploy and restart all openstack services, it temporarily works in a read-only fashion. If I attempt to create anything (e.g. a neutron network or a glance image), the auth errors return and the services must be restarted.

Comment 3 Marios Andreou 2015-05-14 09:56:18 UTC
On the undercloud, after sourcing overcloudrc, I can't talk to overcloud services (same connection issues as above). Restarting haproxy on the overcloud controller reliably fixes the service connectivity. Still investigating.

Comment 4 Marios Andreou 2015-05-14 11:33:01 UTC
Discussion/investigation ongoing. After a tip from derekh we increased the connection limits in both haproxy (maxconn) and mysqld (max_connections); haproxy was logging > 150, which was the previous/default limit. Currently stable at ~185 for the last 25 minutes or so.

Comment 5 Giulio Fidente 2015-05-14 11:45:41 UTC
astapor seems to be setting the limits at 4000 for haproxy and 1024 for galera. Jason, can you confirm, so we can port the same values to tripleo?

Comment 6 Derek Higgins 2015-05-14 11:49:33 UTC
The key here was that connections to keystone through the VIP and haproxy were still working even when keystone commands were displaying a problem. This particular curl command was responding immediately:

$ curl
{"error": {"message": "The resource could not be found.", "code": 404, "title": "Not Found"}}

The difference is that the curl call can't authenticate, so it gets a response before keystone attempts to connect to the database.
This points us at a problem with db connections.

This probably won't happen in a virt env, because the number of sql connections on this baremetal env is higher: a lot of our processes scale based on the number of CPUs. The baremetal host in question had 24 CPUs.
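
To illustrate the scaling argument above, here is a rough back-of-the-envelope sketch in Python. Only the 24 CPUs and the limit of about 150 connections come from this report; the number of services and the one-connection-per-worker figure are hypothetical placeholders, not values measured on this environment.

```python
def estimated_db_connections(cpus, services, conns_per_worker=1):
    """Rough estimate: each service starts one worker per CPU,
    and each worker holds `conns_per_worker` database connections."""
    return cpus * services * conns_per_worker


# Approximate limit mentioned in comment 4 (haproxy was logging > 150).
PREVIOUS_LIMIT = 150

# Hypothetical assumption: 8 database-backed services on the controller.
for cpus in (4, 24):
    needed = estimated_db_connections(cpus, services=8)
    print(cpus, needed, needed > PREVIOUS_LIMIT)
```

Under these assumed numbers a 4-CPU virt host stays well below the limit, while the 24-CPU baremetal host exceeds it, which would match the behaviour described in this bug.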

Comment 7 Jason Guiditta 2015-05-14 13:02:55 UTC
Haproxy should have maxconn = 10000
Galera needs:
$limit_no_file    = "16384",  (this both in the config and passed into the pcs RA)
$max_connections  = "1024",
$open_files_limit = '-1',

These galera settings should also be configurable, as different hardware may have different needs
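
A minimal sketch of how the values above could be checked against a deployment, assuming the current limits have already been collected into a dict. The key names and the `undersized` helper are hypothetical labels for illustration, not part of any tripleo or astapor tooling; the "before" values are made up for the example.

```python
# Minimum values taken from this comment; key names are hypothetical labels.
RECOMMENDED = {
    "haproxy_maxconn": 10000,
    "galera_max_connections": 1024,
    "galera_limit_no_file": 16384,
}


def undersized(current, recommended=RECOMMENDED):
    """Return {name: (current, recommended)} for every setting that is
    below the recommended value. Missing settings count as 0."""
    return {
        name: (current.get(name, 0), minimum)
        for name, minimum in recommended.items()
        if current.get(name, 0) < minimum
    }


# Hypothetical pre-fix values, for illustration only.
before = {
    "haproxy_maxconn": 150,
    "galera_max_connections": 150,
    "galera_limit_no_file": 16384,
}
print(undersized(before))
```

Keeping the recommended values in a plain dict makes them easy to override per environment, in line with the point that different hardware may have different needs.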

Comment 8 Marios Andreou 2015-05-14 13:43:12 UTC
half of the fix in https://review.openstack.org/#/c/183044/1

Comment 9 Giulio Fidente 2015-05-14 14:31:19 UTC
the other half https://review.openstack.org/#/c/183046/

Or rather a third, until we make this configurable and pass the needed options to the pacemaker resource agent as well.

Comment 10 jliberma@redhat.com 2015-05-21 16:23:43 UTC
I saw this merged on May 14. Are these fixes incorporated into the latest code base? I.e., how can I test on baremetal?

Comment 11 Ronelle Landy 2015-05-22 10:44:07 UTC
The CI job for Dell hw is running green atm.
So thanks to gfidente, you should be able to test this out.

Comment 14 Omri Hochman 2015-07-30 19:25:16 UTC
Verified. --tuskar has been replaced with --plan; deployment succeeds with HA/non-HA on virt-env/bare-metal.

Comment 16 errata-xmlrpc 2015-08-05 13:51:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

