Bug 1173970

Summary: keystone doesn't flush tokens when installing with tripleo
Product: Red Hat OpenStack Reporter: Udi Kalifon <ukalifon>
Component: rhosp-directorAssignee: Jiri Stransky <jstransk>
Status: CLOSED ERRATA QA Contact: Udi Kalifon <ukalifon>
Severity: urgent Docs Contact:
Priority: high    
Version: 7.0 (Kilo)CC: ayoung, calfonso, dmacpher, dmaley, dyocum, emilien.macchi, glambert, jslagle, jstransk, mburns, michele, mmagr, nauvray, nbarcet, nkinder, rhel-osp-director-maint, sgordon, vcojot, yeylon
Target Milestone: y2Keywords: ZStream
Target Release: 7.0 (Kilo)Flags: jkulina: needinfo+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-0.8.6-77.el7ost Doc Type: Bug Fix
Doc Text:
No regular database maintenance process existed in previous versions of the director. As a result, the director's database grew without limit. This fix adds a cronjob to flush expired tokens from the database. This cleans the database periodically and reduces its size.
Story Points: ---
Last Closed: 2015-12-21 16:53:38 UTC
Type: Bug

Description Udi Kalifon 2014-12-14 14:19:24 UTC
Description of problem:
The SQL database grows without limit. There should be a cron job that flushes the expired tokens every minute:
*/1 * * * * /usr/bin/keystone-manage token_flush >/dev/null 2>&1


How reproducible:
100%


Steps to Reproduce:
1. Install tripleo in a virtual environment according to the instructions in: https://openstack.redhat.com/Deploying_RDO_on_a_Virtual_Machine_Environment_using_Instack
2. Deploy an undercloud according to: https://openstack.redhat.com/Deploying_an_RDO_Undercloud_with_Instack
3. Deploy an overcloud with instack-deploy-overcloud according to: https://openstack.redhat.com/Deploying_an_RDO_Overcloud_with_Instack
4. ssh to the controller machine, for example: ssh heat-admin@192.0.2.9
5. Become root: sudo -i
6. Check the token table:
   mysql
   use keystone;
   select count(*) from token;
   select id, expires from token;
7. Now log out of sql, run the command "keystone-manage token_flush", and repeat step 6...


Actual results:
Expired tokens are never flushed automatically. Running "keystone-manage token_flush" manually reduces the size of the table by deleting all expired tokens.


Expected results:
There should be a cron job that flushes expired tokens every minute:
*/1 * * * * /usr/bin/keystone-manage token_flush >/dev/null 2>&1
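A minimal sketch of how the requested entry could be installed for the keystone user (the entry is printed here rather than installed; the actual install command, shown commented out, requires root):

```shell
# Build the requested crontab entry and print it; the "*/1" schedule
# field means "every minute".
entry='*/1 * * * * /usr/bin/keystone-manage token_flush >/dev/null 2>&1'
printf '%s\n' "${entry}"
# To install it for real (as root):
# printf '%s\n' "${entry}" | crontab -u keystone -
```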

Comment 2 Emilien Macchi 2014-12-18 13:20:51 UTC
We had the same problem in Puppet; it may be because the Keystone user does not have a shell. We fixed it by enforcing a shell when running the crontab.

Hope that helps, let me know.

Comment 5 Udi Kalifon 2015-06-01 09:28:23 UTC
See also bug 1212126

Comment 6 Udi Kalifon 2015-06-04 14:46:16 UTC
There is nothing in the cron tables of heat-admin, keystone, or root. I also waited for the token table to fill up and saw that expired tokens don't get flushed. Only when I ran the flush job manually did all the expired tokens finally get flushed. The bug is still valid.

Comment 9 Mike Burns 2015-07-07 20:39:03 UTC
After thinking about this, I really think this should be part of keystone packaging and not the responsibility of the installer/deployment tool.  Bug 1212126 referenced in comment 5 asks to add this to keystone packaging which strikes me as the right place.

Comment 10 Stephen Gordon 2015-07-08 12:26:55 UTC
Hi James,

Do you see this as a blocker?

Thanks,

Steve

Comment 11 Nathan Kinder 2015-07-08 17:16:16 UTC
(In reply to Mike Burns from comment #9)
> After thinking about this, I really think this should be part of keystone
> packaging and not the responsibility of the installer/deployment tool.  Bug
> 1212126 referenced in comment 5 asks to add this to keystone packaging which
> strikes me as the right place.

This has been brought up before.  The problem is that the openstack-keystone package cannot determine which token format someone is going to use/configure.  Multiple token formats are supported, and newer formats (like Fernet tokens) do not need a flush job, since tokens are not kept in a database table.  It's going to need to be dealt with by the deployment tool, since that is where the knowledge about keystone's configuration will exist.

Comment 13 Mike Burns 2015-07-15 10:14:54 UTC
*** Bug 1243332 has been marked as a duplicate of this bug. ***

Comment 17 Jiri Stransky 2015-09-17 11:58:11 UTC
Done upstream. Can be backported once we get acks.

Comment 18 James Slagle 2015-10-02 16:11:17 UTC
*** Bug 1267952 has been marked as a duplicate of this bug. ***

Comment 19 Vincent S. Cojot 2015-10-19 14:55:30 UTC
Hello,
We (RCIP delivery team) are deploying OSP7 with director for customers.
We just hit an issue on the OSP7 cloud we had deployed for RHQE (RedHat QE).
Everything had become very slow over the last few weeks.
I had to flush the tokens manually to solve their issue.
I am concerned we will hit this on -all- OSP7 deployments, which will most likely not look good for customers; this needs prioritization.

There seems to be some provision for doing this in:
/etc/puppet/modules/keystone/manifests/cron/token_flush.pp:


class keystone::cron::token_flush (
  $ensure   = present,
  $minute   = 1,
  $hour     = 0,
  $monthday = '*',
  $month    = '*',
  $weekday  = '*',
  $maxdelay = 0,
) {
  # ... (class body omitted in this report)
}
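For illustration, the parameter defaults above ($minute = 1, $hour = 0, the rest '*') would produce the following crontab schedule (assembled here as a sketch, with the flush command shortened):

```shell
# Assemble the cron schedule from the keystone::cron::token_flush
# defaults shown above: minute 1, hour 0, i.e. daily at 00:01.
minute=1; hour=0; monthday='*'; month='*'; weekday='*'
printf '%s %s %s %s %s keystone-manage token_flush\n' \
  "${minute}" "${hour}" "${monthday}" "${month}" "${weekday}"
# → 1 0 * * * keystone-manage token_flush
```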


But this file/class is never getting used in the base templates:

[stack@instack-prv ~]$ grep -r keystone::cron::token_flush /usr/share/openstack-tripleo-heat-templates

[stack@instack-prv ~]$ rpm -qf /usr/share/openstack-tripleo-heat-templates
openstack-tripleo-heat-templates-0.8.6-71.el7ost.noarch

Can we please get this prioritized?

Thank you

Comment 20 Martin Magr 2015-11-18 20:54:35 UTC
*** Bug 1283388 has been marked as a duplicate of this bug. ***

Comment 22 Udi Kalifon 2015-11-29 07:31:24 UTC
I ran "sudo crontab -l -u keystone" and got:
1 0 * * * sleep `expr ${RANDOM} \% 3600`; keystone-manage token_flush >>/var/log/keystone/keystone-tokenflush.log 2>&1

This means that the job will run only daily, at 00:01 (one minute past midnight), after waiting a random number of seconds up to 3600. We know from experience with customers that there can be a very large number of expired tokens to flush, and the job locks the database for a very long time (taking the cloud offline) if too many deletions are needed. Therefore, we recommend running the job every minute so the database never gets too big.
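The "sleep `expr ${RANDOM} \% 3600`" prefix in the deployed entry above only adds start-up jitter so all controllers don't hit the database at the same moment. A sketch of the delay computation (assuming bash, where $RANDOM is available):

```shell
# Compute the jitter the cron entry sleeps before flushing: a
# pseudo-random number of seconds in the range [0, 3600).
maxdelay=3600
delay=$((RANDOM % maxdelay))
echo "would sleep ${delay}s, then run keystone-manage token_flush"
```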

Has that recommendation changed?

Comment 25 Nicolas Auvray 2015-12-02 13:46:08 UTC
We hit the same problem with Keystone tokens never being purged on the undercloud node. Not sure whether I should open a new bug about this.

Comment 26 Jiri Stransky 2015-12-03 16:33:35 UTC
Yes please, if you can report a separate bug for the undercloud, that would be great. The overcloud fix is already backported and tested, and the undercloud fix would go into a different repo/package.

Comment 29 errata-xmlrpc 2015-12-21 16:53:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:2651