Description of problem: We need to reset defaults and/or provide instructions to end users on how to set the cumin vacuum interval and the postgres max_fsm_pages parameter. In overnight testing with 100+ submissions per second and around 4000 slots, we found that free space in postgres was not being managed effectively. This caused the database to "leak": more space was freed per vacuum interval than postgres could track, so postgres allocated new disk space instead of reusing freed pages. Shortening the vacuum interval to 15 minutes and increasing max_fsm_pages to 256K appears to be effective, but we're not sure there is a useful heuristic at this point. These numbers will be relative to submission/completion rates, etc.
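For reference, a minimal sketch of the postgres side of this tuning (the vacuum interval itself is a cumin configuration change, see the plan below), assuming a PostgreSQL 8.x server (max_fsm_pages no longer exists in 8.4 and later) and the stock /var/lib/pgsql/data data directory; adjust both for your installation:

  # Check the current free space map limit.
  psql -U postgres -c "SHOW max_fsm_pages;"

  # Raise the limit to roughly 256K page slots. A later entry in
  # postgresql.conf overrides an earlier one, and max_fsm_pages only
  # takes effect after a server restart.
  echo "max_fsm_pages = 262144" >> /var/lib/pgsql/data/postgresql.conf
  service postgresql restart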
*** Bug 697640 has been marked as a duplicate of this bug. ***
The plan is to address this in two ways: 1) change the "out of the box" configuration, which includes multiple cumin-data instances for medium scale and up, so that vacuuming and sample expiration run from a single thread on a 15-minute interval; 2) include a Release Note that covers setting the max_fsm_pages postgres parameter, a suggested value, and how to run a SQL command that indicates whether or not the current value is appropriate. (BZ699859)
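Something along these lines could back the Release Note's SQL check; it is only a sketch, the exact INFO/HINT wording varies by PostgreSQL version, and the database name "cumin" is an assumption:

  # A database-wide VACUUM VERBOSE ends with a free space map summary
  # reporting how many page slots are in use / required versus the
  # current limits; pre-8.4 servers also print a HINT to raise
  # max_fsm_pages when the limit is too low.
  psql -U postgres -d cumin -c "VACUUM VERBOSE;" 2>&1 | tail -n 10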
Default config file fixed in revision 4741. To test, do something like:

1) Run cumin
2) grep -l "is enabled" data.*.log data.grid.log
3) grep -l "is disabled" data.*.log data.grid-slots.log data.grid-submissions.log data.sesame.log
4) Wait 15 minutes
5) grep -l "Starting vacuum" data.*.log data.grid.log
6) grep -l "Starting expire" data.*.log data.grid.log
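The same checks wrapped into a rough script; the log directory and the expectation that data.grid is the single instance doing vacuum/expire are assumptions based on the plan above:

  #!/bin/sh
  # Adjust to wherever the cumin-data instances write their logs.
  cd /var/log/cumin || exit 1

  # Exactly one instance should report vacuum/expire as enabled
  # (expected: data.grid.log); the rest should report it disabled.
  grep -l "is enabled" data.*.log
  grep -l "is disabled" data.*.log

  # Wait out one 15-minute interval, then confirm the enabled instance
  # actually started a vacuum and an expire pass.
  sleep 900
  grep -l "Starting vacuum" data.*.log
  grep -l "Starting expire" data.*.log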