| Summary: | Customer reporting growth of sessions table to an enormous size and postgresql logs don't indicate any auto-vacuum activity is happening | |||
|---|---|---|---|---|
| Product: | Red Hat CloudForms Management Engine | Reporter: | Thomas Hennessy <thenness> | |
| Component: | Performance | Assignee: | Gregg Tanzillo <gtanzill> | |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Pradeep Kumar Surisetty <psuriset> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 5.3.0 | CC: | arcsharm, benglish, cpelland, dajohnso, gblomqui, gtanzill, jdeubel, jhardy, jocarter, jprause, jrafanie, kseifried, mfeifer, obarenbo, simaishi, thenness | |
| Target Milestone: | GA | Keywords: | Reopened, TestOnly, ZStream | |
| Target Release: | 5.7.0 | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | database:perf | |||
| Fixed In Version: | 5.7.0.0 | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1364501 (view as bug list) | Environment: | ||
| Last Closed: | 2017-03-12 02:18:20 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Bug Depends On: | ||||
| Bug Blocks: | 1364501 | |||
|
Description
Thomas Hennessy
2016-03-28 20:33:04 UTC
New commit detected on ManageIQ/manageiq/master: https://github.com/ManageIQ/manageiq/commit/1f83006aa6f850492ba3a36650e6e8f8080afa06 commit 1f83006aa6f850492ba3a36650e6e8f8080afa06 Author: Gregg Tanzillo <gtanzill> AuthorDate: Tue Apr 26 16:46:43 2016 -0400 Commit: Gregg Tanzillo <gtanzill> CommitDate: Tue Jul 26 11:19:09 2016 -0400 Perform Session purge in batches. - Also, eliminated an n+1 query on users https://bugzilla.redhat.com/show_bug.cgi?id=1321667 app/models/session.rb | 34 ++++++++++++++++++++++++---------- spec/factories/session.rb | 3 +++ spec/models/session_spec.rb | 26 ++++++++++++++++++++++++++ 3 files changed, 53 insertions(+), 10 deletions(-) create mode 100644 spec/factories/session.rb create mode 100644 spec/models/session_spec.rb =======================================================================
vmdb_production=# select id,session_id,updated_at from sessions ORDER BY updated_at DESC limit 10;
id | session_id | updated_at
-----------------+----------------------------------+----------------------------
999000000179131 | 49ab119dff5af978d5d899be662814d1 | 2017-01-16 14:15:50.415532
999000000179132 | 21276c2f8588cbd060a786b1e2ae482e | 2017-01-16 14:15:50.044644
999000000179130 | f2d984034e1152e06f736e31a29b17ad | 2017-01-16 14:15:34.610368
999000000179129 | 6f4b9459ee3c74af44493fa12caebdc8 | 2017-01-16 14:14:34.617708
999000000179128 | e7a2c0b6a0c2ad98371c9b3969799622 | 2017-01-16 14:13:34.615359
999000000179127 | 9d797d2418fa439ab73d6a39a7e06d4e | 2017-01-16 14:12:35.225762
999000000179126 | 9b7eaee8662dc4c3e4d99c54bd772519 | 2017-01-16 14:11:34.613336
999000000179125 | 7e4ff8a09a1767dd997cbe4d639da319 | 2017-01-16 14:10:34.615896
999000000179124 | 6f0a4c762aabb783213c19e9eb73f4ce | 2017-01-16 14:09:34.618933
999000000179123 | 49e6e5b9cb3461c8dfd378276989e825 | 2017-01-16 14:08:34.614581
(10 rows)
vmdb_production=# select count(id) from sessions;
count
-------
182
(1 row)
=======================================================================
##### Note: Session logged out (I waited more than 60 minutes), but I was waiting for UI to automatically redirect to logout page, which it doesn't. Only when I clicked a route (say, -> configuration), did it redirect to login page.
##### Anyway, So the last count of unique session ids was 182 and increasing, until I clicked a route (which triggered it to redirect to login page). Then it stopped at 182. I waiting for a couple of minutes, and logged back in from UI, this increased the counter to 185 (see below) and as you could spot, a difference of 5 minutes in last 10 'updated_at' items, meaning the previous information was retained. I don't know if the purge happens in a while (some postgresql setting?), but I'm waiting for it and it hasn't yet. Been more than 2 hours..
========================================================================
vmdb_production=# select id,session_id,updated_at from sessions ORDER BY updated_at DESC limit 10;
id | session_id | updated_at
-----------------+----------------------------------+----------------------------
999000000179135 | 16ad05064423068dd0c96fede8c8a6fa | 2017-01-16 14:19:57.287294
999000000179131 | 49ab119dff5af978d5d899be662814d1 | 2017-01-16 14:19:56.118619
999000000179134 | a9da5ffce1f893998cd3a73164cbcc76 | 2017-01-16 14:19:53.754537
999000000179133 | f91ba8ade729cb1eaec263b63d62dc29 | 2017-01-16 14:19:53.427114
999000000179132 | 21276c2f8588cbd060a786b1e2ae482e | 2017-01-16 14:15:50.044644
999000000179130 | f2d984034e1152e06f736e31a29b17ad | 2017-01-16 14:15:34.610368
999000000179129 | 6f4b9459ee3c74af44493fa12caebdc8 | 2017-01-16 14:14:34.617708
999000000179128 | e7a2c0b6a0c2ad98371c9b3969799622 | 2017-01-16 14:13:34.615359
999000000179127 | 9d797d2418fa439ab73d6a39a7e06d4e | 2017-01-16 14:12:35.225762
999000000179126 | 9b7eaee8662dc4c3e4d99c54bd772519 | 2017-01-16 14:11:34.613336
(10 rows)
vmdb_production=# select count(id) from sessions;
count
-------
185
(1 row)
========================================================================
Hi, I changed 'session_store: cache' to 'session_store: sql' in Advanced settings, for each of the appliances and so the sessions table started populating after 1st login from UI. Thereafter, as you could see in comment #c12, I waited and signed back in, and I still saw the sessions information persist in the db. -------------------- Couple of queries: # 1.) I see 'batch_size = 100' in the patch [1]. And a setting in Advanced tab of CFME UI as follows: :session: :interval: 60 :memcache_server: 127.0.0.1:11211 :memcache_server_opts: "-l 127.0.0.1 -I 1M" :show_login_info: true :timeout: 3600 I assume memcache becomes irrelevant, once session is changed to 'sql', although rest of the settings persist, like interval:60 and timeout:3600 etc.. And then, I also see a :database: section, but I don't see a 'session' specific 'purge' setting. Do you think 'batch_size' is being passed a separate value that trumps the default 100? If not, is the fact that count(id) went way up above 100 an indicator that the problem has still not been fixed? # 2.) Also, after it timeout out, after logging back in, according to #4 "steps to reproduce" section in the description of this BZ, it should've discarded previous info, which it didn't. Again, you think the problem persists? (I didn't setup a load balancer as per customer env, but pretty much tried to check the logic of the patch) Refs: [1] https://github.com/ManageIQ/manageiq/commit/1f83006aa6f850492ba3a36650e6e8f8080afa06?diff=split#diff-a5520e44c2e541c1da8dae178b1b8996R14 Hi Archit, The :timeout: under :session: is the session purge duration. That means you have to wait at least an hour before an inactive session will get purged. I ran this again on the same appliance. But this time session_id information came back to 0 when I checked the count after an hour. But my VPN got disconnected in between so I'm not sure if that's what triggered the 'disconnect' with logged in client. I'm running tests again just to be sure. Ok so yeah I think the fix works. I guess I had UI worker enabled on DB appliance as well apart from a dedicated UI appliance, hence the confusion. from evm.log on UI worker appliance: [----] I, [2017-03-07T06:44:18.987386 #3040:b6d154] INFO -- : MIQ(MiqGenericWorker::Runner#get_message_via_drb) Message id: [999000038168224], MiqWorker id: [999000000591761], Zone: [default], Role: [], Server: [5928c21c-d76c-11e6-9682-001a4a22390a], Ident: [generic], Target id: [], Instance id: [], Task id: [], Command: [Session.check_session_timeout], Timeout: [600], Priority: [90], State: [dequeue], Deliver On: [], Data: [], Args: [], Dequeued in: [5.592408564] seconds [----] I, [2017-03-07T06:44:18.987513 #3040:b6d154] INFO -- : MIQ(MiqQueue#deliver) Message id: [999000038168224], Delivering... [----] I, [2017-03-07T06:44:18.996218 #3040:b6d154] INFO -- : MIQ(Session.purge) purged stale session data, 1 entries deleted -------- This is happening really slow but is happening. Count on session_id came down from 63 to 55 in about last 5-10 minutes. |