Bug 1321667 - Customer reporting growth of sessions table to an enormous size and postgresql logs don't indicate any auto-vacuum activity is happening
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Performance
Version: 5.3.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: GA
Target Release: 5.7.0
Assignee: Gregg Tanzillo
QA Contact: Pradeep Kumar Surisetty
URL:
Whiteboard: database:perf
Depends On:
Blocks: 1364501
 
Reported: 2016-03-28 20:33 UTC by Thomas Hennessy
Modified: 2020-04-15 14:26 UTC (History)
16 users

Fixed In Version: 5.7.0.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1364501 (view as bug list)
Environment:
Last Closed: 2017-03-12 02:18:20 UTC
Category: ---
Cloudforms Team: ---
Target Upstream Version:



Description Thomas Hennessy 2016-03-28 20:33:04 UTC
Description of problem: Sessions table grows to > 16GB and has to be handled manually to remove stale rows and vacuum the table.


Version-Release number of selected component (if applicable): 5.3.4.2, but there is no good reason to believe the handling has changed in later releases.


How reproducible:


Steps to Reproduce:
1. Create an environment where the UI appliance and other backend appliances are separate.
2. Configure the UI appliance (and the others) to store sessions in the database instead of appliance memory.
3. Log in to the appliance and wait for the session to time out.
4. Review the sessions table to determine whether the session information persists.


Actual results:


Expected results:


Additional info:
The customer has an environment that has been running for more than a year.  In the past two months the sessions table grew so large that it had to be manually pruned and vacuumed to bring its size back down to a reasonable value.

The customer is using load balancers with CFME, so use of the sessions object is expected.

What is not understood (or apparently not implemented) is any mechanism that removes rows for sessions that are no longer active from the sessions table.  Retaining this information indefinitely creates performance problems in the environment and represents a potential security risk if session data is kept for months on end.

Comment 5 CFME Bot 2016-08-01 21:35:49 UTC
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/1f83006aa6f850492ba3a36650e6e8f8080afa06

commit 1f83006aa6f850492ba3a36650e6e8f8080afa06
Author:     Gregg Tanzillo <gtanzill>
AuthorDate: Tue Apr 26 16:46:43 2016 -0400
Commit:     Gregg Tanzillo <gtanzill>
CommitDate: Tue Jul 26 11:19:09 2016 -0400

    Perform Session purge in batches.
    
    - Also, eliminated an n+1 query on users
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1321667

 app/models/session.rb       | 34 ++++++++++++++++++++++++----------
 spec/factories/session.rb   |  3 +++
 spec/models/session_spec.rb | 26 ++++++++++++++++++++++++++
 3 files changed, 53 insertions(+), 10 deletions(-)
 create mode 100644 spec/factories/session.rb
 create mode 100644 spec/models/session_spec.rb

Comment 12 Archit Sharma 2017-01-16 14:28:26 UTC
=======================================================================

vmdb_production=#  select id,session_id,updated_at from sessions ORDER BY updated_at DESC limit 10;
       id        |            session_id            |         updated_at         
-----------------+----------------------------------+----------------------------
 999000000179131 | 49ab119dff5af978d5d899be662814d1 | 2017-01-16 14:15:50.415532
 999000000179132 | 21276c2f8588cbd060a786b1e2ae482e | 2017-01-16 14:15:50.044644
 999000000179130 | f2d984034e1152e06f736e31a29b17ad | 2017-01-16 14:15:34.610368
 999000000179129 | 6f4b9459ee3c74af44493fa12caebdc8 | 2017-01-16 14:14:34.617708
 999000000179128 | e7a2c0b6a0c2ad98371c9b3969799622 | 2017-01-16 14:13:34.615359
 999000000179127 | 9d797d2418fa439ab73d6a39a7e06d4e | 2017-01-16 14:12:35.225762
 999000000179126 | 9b7eaee8662dc4c3e4d99c54bd772519 | 2017-01-16 14:11:34.613336
 999000000179125 | 7e4ff8a09a1767dd997cbe4d639da319 | 2017-01-16 14:10:34.615896
 999000000179124 | 6f0a4c762aabb783213c19e9eb73f4ce | 2017-01-16 14:09:34.618933
 999000000179123 | 49e6e5b9cb3461c8dfd378276989e825 | 2017-01-16 14:08:34.614581
(10 rows)

vmdb_production=#  select count(id) from sessions;
 count 
-------
   182
(1 row)

=======================================================================
##### Note: The session timed out (I waited more than 60 minutes), but the UI did not automatically redirect to the logout page. Only when I clicked a route (say, Configuration) did it redirect to the login page.

##### Anyway, the last count of unique session ids was 182 and increasing, until I clicked a route (which triggered the redirect to the login page); then it stopped at 182. I waited a couple of minutes and logged back in from the UI, which increased the count to 185 (see below). As you can spot from the 5-minute gap in the last 10 'updated_at' values, the previous information was retained. I don't know whether the purge happens after a while (some postgresql setting?), but I've been waiting for it and it hasn't happened yet. It's been more than 2 hours.

========================================================================
vmdb_production=#  select id,session_id,updated_at from sessions ORDER BY updated_at DESC limit 10;
       id        |            session_id            |         updated_at         
-----------------+----------------------------------+----------------------------
 999000000179135 | 16ad05064423068dd0c96fede8c8a6fa | 2017-01-16 14:19:57.287294
 999000000179131 | 49ab119dff5af978d5d899be662814d1 | 2017-01-16 14:19:56.118619
 999000000179134 | a9da5ffce1f893998cd3a73164cbcc76 | 2017-01-16 14:19:53.754537
 999000000179133 | f91ba8ade729cb1eaec263b63d62dc29 | 2017-01-16 14:19:53.427114
 999000000179132 | 21276c2f8588cbd060a786b1e2ae482e | 2017-01-16 14:15:50.044644
 999000000179130 | f2d984034e1152e06f736e31a29b17ad | 2017-01-16 14:15:34.610368
 999000000179129 | 6f4b9459ee3c74af44493fa12caebdc8 | 2017-01-16 14:14:34.617708
 999000000179128 | e7a2c0b6a0c2ad98371c9b3969799622 | 2017-01-16 14:13:34.615359
 999000000179127 | 9d797d2418fa439ab73d6a39a7e06d4e | 2017-01-16 14:12:35.225762
 999000000179126 | 9b7eaee8662dc4c3e4d99c54bd772519 | 2017-01-16 14:11:34.613336
(10 rows)

vmdb_production=#  select count(id) from sessions;
 count 
-------
   185
(1 row)

========================================================================

Comment 14 Archit Sharma 2017-01-16 14:56:13 UTC
Hi, 

I changed 'session_store: cache' to 'session_store: sql' in Advanced settings for each of the appliances, so the sessions table started populating after the first login from the UI. Thereafter, as you can see in comment #c12, I waited and signed back in, and the session information still persisted in the db.
--------------------

A couple of questions:

# 1.)

I see 'batch_size = 100' in the patch [1]. And a setting in Advanced tab of CFME UI as follows:
:session:
  :interval: 60
  :memcache_server: 127.0.0.1:11211
  :memcache_server_opts: "-l 127.0.0.1 -I 1M"
  :show_login_info: true
  :timeout: 3600

I assume memcache becomes irrelevant once session_store is changed to 'sql', although the rest of the settings, like interval: 60 and timeout: 3600, still apply.

And then, I also see a :database: section, but I don't see a 'session' specific 'purge' setting.

Do you think 'batch_size' is being passed a separate value that overrides the default 100? If not, does the fact that count(id) went well above 100 indicate that the problem is still not fixed?
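For reference on question 1: in a typical batched purge, batch_size only caps how many rows each delete pass removes; the purge loops until no stale rows remain, so it is not a retention limit. A count above 100 would then just mean more than 100 sessions were not yet stale. A hypothetical sketch of that arithmetic (`purge_in_batches` is illustrative, not the ManageIQ API):

```ruby
# batch_size limits how many rows each DELETE pass removes, not how many
# rows may accumulate. Repeated passes still clear every stale row.
# Returns [rows deleted, passes taken].
def purge_in_batches(stale_row_count, batch_size)
  passes = 0
  deleted = 0
  while deleted < stale_row_count
    deleted += [batch_size, stale_row_count - deleted].min
    passes += 1
  end
  [deleted, passes]
end

purge_in_batches(182, 100)  # => [182, 2]  (passes of 100 + 82)
```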

# 2.) 
Also, after the session timed out and I logged back in, according to step 4 of the "steps to reproduce" section in the description of this BZ, the previous info should have been discarded, which it wasn't. Again, do you think the problem persists? (I didn't set up a load balancer as in the customer env, but I pretty much tried to exercise the logic of the patch.)


Refs: 
[1] https://github.com/ManageIQ/manageiq/commit/1f83006aa6f850492ba3a36650e6e8f8080afa06?diff=split#diff-a5520e44c2e541c1da8dae178b1b8996R14

Comment 17 Gregg Tanzillo 2017-03-01 23:12:15 UTC
Hi Archit,

The :timeout: under :session: is the session purge duration. That means you have to wait at least an hour before an inactive session will get purged.
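In other words, the purge cutoff is derived from the :timeout: setting: a session is eligible for purging only once it has been idle longer than the timeout. A small sketch of that check (`stale?` and `TIMEOUT` are illustrative names, not ManageIQ code):

```ruby
# Purge-eligibility check driven by the :session: -> :timeout: setting.
TIMEOUT = 3600 # seconds

# A session is stale only when its last update is older than the cutoff.
def stale?(updated_at, now)
  updated_at < now - TIMEOUT
end

now = Time.now
stale?(now - 1800, now)  # => false (idle 30 min, survives the purge)
stale?(now - 7200, now)  # => true  (idle 2 h, purged on the next run)
```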

Comment 19 Archit Sharma 2017-03-07 10:37:03 UTC
I ran this again on the same appliance. This time the session count came back down to 0 when I checked after an hour. But my VPN got disconnected in between, so I'm not sure whether that's what triggered the 'disconnect' of the logged-in client.

I'm running tests again just to be sure.

Comment 20 Archit Sharma 2017-03-07 11:50:04 UTC
OK, so I think the fix works. I guess I had a UI worker enabled on the DB appliance as well, apart from the dedicated UI appliance, hence the confusion.

from evm.log on UI worker appliance:

[----] I, [2017-03-07T06:44:18.987386 #3040:b6d154]  INFO -- : MIQ(MiqGenericWorker::Runner#get_message_via_drb) Message id: [999000038168224], MiqWorker id: [999000000591761], Zone: [default], Role: [], Server: [5928c21c-d76c-11e6-9682-001a4a22390a], Ident: [generic], Target id: [], Instance id: [], Task id: [], Command: [Session.check_session_timeout], Timeout: [600], Priority: [90], State: [dequeue], Deliver On: [], Data: [], Args: [], Dequeued in: [5.592408564] seconds

[----] I, [2017-03-07T06:44:18.987513 #3040:b6d154]  INFO -- : MIQ(MiqQueue#deliver) Message id: [999000038168224], Delivering...

[----] I, [2017-03-07T06:44:18.996218 #3040:b6d154]  INFO -- : MIQ(Session.purge) purged stale session data, 1 entries deleted


--------

This is happening quite slowly, but it is happening. The session count came down from 63 to 55 in roughly the last 5-10 minutes.

