Bug 699782

Summary: Default max_fsm_pages setting and cumin vacuum interval are not suitable for medium/large scale
Product: Red Hat Enterprise MRG
Reporter: Trevor McKay <tmckay>
Component: cumin
Assignee: Trevor McKay <tmckay>
Status: CLOSED CURRENTRELEASE
QA Contact: MRG Quality Engineering <mrgqe-bugs>
Severity: high
Priority: medium
Version: Development
CC: iboverma, jross, jsarenik, matt
Target Milestone: 2.0
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: cumin-0.1.4746-1.el5
Doc Type: Bug Fix
Last Closed: 2011-06-23 13:15:54 UTC

Description Trevor McKay 2011-04-26 15:42:27 UTC
Description of problem:

We need to reset defaults and/or provide instructions to end users on how to set the cumin vacuum interval and the postgres parameter max_fsm_pages.

In overnight testing with 100+ submissions per second and around 4000 slots, we found that free space in postgres was not being managed effectively. This caused the database to "leak": between vacuums, more pages acquired free space than postgres could track (the free space map holds at most max_fsm_pages pages), so the untracked space was never reused and postgres went to disk for more.

Shortening the vacuum interval to 15 minutes and increasing max_fsm_pages to 256K seems to be effective, but we're not sure yet whether there is a useful heuristic for choosing these values. The right numbers will depend on the rate of submissions, completions, etc.
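The 256K figure above is a page count: 256 * 1024 = 262144 page slots. A minimal sketch of applying it is below; the helper name `set_max_fsm_pages` is hypothetical and not part of cumin or PostgreSQL. It assumes a PostgreSQL release older than 8.4 (8.4 removed max_fsm_pages). In those releases each page slot costs about six bytes of shared memory, so 262144 slots need roughly 1.5 MB, and the setting only takes effect after a server restart.

```shell
#!/bin/sh
# Hypothetical helper (not part of cumin): set max_fsm_pages in the given
# postgresql.conf, replacing an existing (possibly commented-out) entry,
# or appending one if the file has no such entry.
set_max_fsm_pages() {
    conf="$1"
    pages="$2"
    pattern='^[[:space:]]*#*[[:space:]]*max_fsm_pages[[:space:]]*='
    if grep -q "$pattern" "$conf"; then
        # Rewrite the matching line; copy back with cat (not mv) so the
        # original file keeps its owner and permissions.
        sed "s/${pattern}.*/max_fsm_pages = ${pages}/" "$conf" > "$conf.tmp" &&
            cat "$conf.tmp" > "$conf" && rm -f "$conf.tmp"
    else
        # No entry yet: append one.
        printf 'max_fsm_pages = %s\n' "$pages" >> "$conf"
    fi
}

# 256K page slots, as used in the overnight test: 256 * 1024 = 262144.
FSM_PAGES=$((256 * 1024))
```

For example, `set_max_fsm_pages /var/lib/pgsql/data/postgresql.conf "$FSM_PAGES"` followed by `service postgresql restart`; the path and service name are assumptions based on the stock RHEL postgresql packages.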

Comment 1 Trevor McKay 2011-04-26 17:32:26 UTC
*** Bug 697640 has been marked as a duplicate of this bug. ***

Comment 2 Trevor McKay 2011-04-26 19:06:22 UTC
The plan is to address this in two ways:

1) Change the out-of-the-box configuration, which includes multiple cumin-data instances for medium scale and up, so that vacuuming and sample expiration run from a single thread at a 15-minute interval.

2) Include a release note covering how to set the max_fsm_pages postgres parameter, a suggested value, and a SQL command that indicates whether the current value is appropriate (BZ699859).
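For the check in item 2: in PostgreSQL 8.x, a database-wide `VACUUM VERBOSE` ends with a free space map summary reporting how many page slots are required to track all free space and the current page-slot limit. The sketch below parses that summary from text on stdin. The function name `fsm_check` is hypothetical, and the message wording it matches is an assumption based on 8.x output, so compare it against your own server's messages before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: read VACUUM VERBOSE output on stdin and compare the
# page slots required against the configured max_fsm_pages limit.
# Assumes PostgreSQL 8.x summary wording, for example:
#   DETAIL:  A total of 1184 page slots are in use (including overhead).
#   1184 page slots are required to track all free space.
#   Current limits are:  20000 page slots, 1000 relations, using 182 KB.
# Exit status: 0 = limit is sufficient, 1 = too small, 2 = no summary found.
fsm_check() {
    awk '
        # First number on the line is the required slot count.
        /page slots are required to track all free space/ {
            for (i = 1; i <= NF; i++) if ($i ~ /^[0-9]+$/) { required = $i; break }
        }
        # First number on the line is the max_fsm_pages limit.
        /Current limits are:/ {
            for (i = 1; i <= NF; i++) if ($i ~ /^[0-9]+$/) { limit = $i; break }
        }
        END {
            if (required == "" || limit == "") {
                print "no free space map summary found"
                exit 2
            }
            if (required + 0 > limit + 0) {
                printf "max_fsm_pages too small: %d required, %d configured\n", required, limit
                exit 1
            }
            printf "max_fsm_pages OK: %d required, %d configured\n", required, limit
        }
    '
}
```

A possible invocation is `psql -d cumin -c 'VACUUM VERBOSE' 2>&1 | fsm_check`, where the database name `cumin` is an assumption. psql prints the INFO/DETAIL messages on stderr, hence the `2>&1`, and the command should run as a database superuser so that every table is vacuumed.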

Comment 3 Trevor McKay 2011-04-26 20:00:40 UTC
Default config file fixed in revision 4741.

To test, do something like the following (the lines after each grep command are the expected output):

1) Run cumin

2) grep -l "is enabled" data.*.log
data.grid.log

3) grep -l "is disabled" data.*.log
data.grid-slots.log
data.grid-submissions.log
data.sesame.log

4) Wait 15 minutes

5) grep -l "Starting vacuum" data.*.log
data.grid.log

6) grep -l "Starting expire" data.*.log
data.grid.log
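Steps 2 through 6 can be folded into one check, sketched below as a hypothetical helper `check_cumin_logs`. It assumes the cumin-data logs sit in a single directory passed as the argument, that the log lines contain exactly the strings grepped for above, and that it is run after the 15-minute wait. Rather than comparing `grep -l` listings, it checks each file individually, so the result does not depend on how the shell orders the file names.

```shell
#!/bin/sh
# Hypothetical helper: verify that only data.grid.log runs vacuum and sample
# expiration, and that every other cumin-data log reports them disabled.
# Prints OK and returns 0 on success; prints FAIL and returns 1 otherwise.
check_cumin_logs() {
    dir="$1"
    [ -f "$dir/data.grid.log" ] || { echo "FAIL: data.grid.log not found"; return 1; }
    for log in "$dir"/data.*.log; do
        name=$(basename "$log")
        if [ "$name" = "data.grid.log" ]; then
            # The single instance expected to vacuum and expire samples.
            for pattern in "is enabled" "Starting vacuum" "Starting expire"; do
                grep -q "$pattern" "$log" || { echo "FAIL: $name lacks '$pattern'"; return 1; }
            done
        else
            # All other instances should have vacuum/expiration disabled.
            grep -q "is disabled" "$log" || { echo "FAIL: $name lacks 'is disabled'"; return 1; }
            for pattern in "is enabled" "Starting vacuum" "Starting expire"; do
                if grep -q "$pattern" "$log"; then
                    echo "FAIL: $name unexpectedly contains '$pattern'"
                    return 1
                fi
            done
        fi
    done
    echo "OK"
}
```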