705935 – Idle transactions kept open forever make VACUUM useless

Bug 705935 - Idle transactions kept open forever make VACUUM useless

Summary: Idle transactions kept open forever make VACUUM useless

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Spacewalk
Classification:	Community
Component:	Server
Sub Component:
Version:	1.4
Hardware:	All
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Jan Pazdziora (Red Hat)
QA Contact:	Red Hat Satellite QA List
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	space16
TreeView+	depends on / blocked

Reported:	2011-05-18 22:34 UTC by Andreas Rogge
Modified:	2011-09-16 10:08 UTC (History)
CC List:	2 users (show)
Fixed In Version:	osad-5.10.16-1
Clone Of:
Environment:
Last Closed:	2011-09-16 10:08:23 UTC
Embargoed:

Attachments	(Terms of Use)
Patch that adds rhnSQL.commit() into the main loop of osa-dispatcher (733 bytes, patch) 2011-06-02 23:00 UTC, Andreas Rogge	no flags	Details \| Diff
Add a rhnSQL.commit() just after self._req_processor.process() (419 bytes, patch) 2011-06-02 23:36 UTC, Andreas Rogge	no flags	Details \| Diff
add rhnSQL.commit() after process() to flush dangling transaction (879 bytes, patch) 2011-06-03 04:00 UTC, Andreas Rogge	no flags	Details \| Diff
introduce rhnSQL.commit() after the both SELECT statements that seem to be the main loop (1.27 KB, patch) 2011-06-03 04:01 UTC, Andreas Rogge	no flags	Details \| Diff
Show Obsolete (2) View All

Description Andreas Rogge 2011-05-18 22:34:03 UTC

Description of problem:
Seems like Spacewalk is keeping database transactions open which leads to unremovable dead rows in tables.

Version-Release number of selected component (if applicable):
spacewalk-postgresql-1.3.3-1.el5
postgresql84-server-8.4.5-1.el5_5.1

How reproducible:
always

Steps to Reproduce:
1. Restart spacewalk (spacewalk-service restart)

2. Run VACUUM VERBOSE rhnchannelnewestpackage
[...]
INFO: "rhnchannelnewestpackage": found 661960 removable, 11794 nonremovable row versions in 5616 out of 5616 pages
[...]

3. Run explain analyze select * from rhn_channel.refresh_newest_package(101, E'Test'); several times. You'll notice somewhat increasing query runtimes (hence the "explain analyze")

4. Run VACUUM VERBOSE rhnchannelnewestpackage again
will look like the following:
[...]
INFO: "rhnchannelnewestpackage": found 0 removable, 96818 nonremovable row versions in 838 out of 5616 pages
[...]

5. Restart spacewalk again

6. Run VACUUM VERBOSE rhnchannelnewestpackage once again
[...]
INFO: "rhnchannelnewestpackage": found 88504 removable, 8314 nonremovable row versions in 838 out of 5616 pages
[...]

Actual results:
Because of PostgreSQL's MVCC rows that might be in use by a running transaction will not be removed (i.e. garbage collectd) by VACCUM.
As spacewalk keeps transactions idle for a long time (see SELECT * FROM pg_stat_activity ODER BY xact_start) only rows that are older than the oldest running transaction will be collected.

Expected results:
Spacewalk should not keep idle transactions. A transaction should only ever be started when there's work to do and should be comitted or rolled back immediately after the work is done.

Additional info:
Especially the example given here (rhnchannelnewestpackage) is a problem. Before I noticed this bug my table grew to more than 1 GB. After a VACUUM FULL with no concurrent transactions it is now around 40 MB.

As rhn_channel.refresh_newest_packages() rewrites large parts of the table each time it is called the number of dead rows grows really quick.

Possible Workaround:
Schedule a spacewalk restart every hour or so

Comment 1 Andreas Rogge 2011-06-01 08:53:26 UTC

Setting to high priority as this is somewhat like a memory leak. 
The database leaks diskspace until you restart the software. That's at least as bad as a memory leak.

Comment 2 Andreas Rogge 2011-06-02 22:57:00 UTC

I've identified three sources of these dangling transactions.

1. osa-dispatcher
2. task-o-matic
3. the /XMLRPC handler

"fixing" osa-dispatcher is simple. Adding rhnSQL.commit() calls after the two queries appearing in the main-loop makes the transactions close.

As osa-dispatcher is the "bad guy" (i.e. the component that keeps transactions open for days and weeks) this already helps a lot.

Comment 3 Andreas Rogge 2011-06-02 23:00:03 UTC

Created attachment 502679 [details]
Patch that adds rhnSQL.commit() into the main loop of osa-dispatcher

Please have a look at the attached patch and tell me wether that is a feasible workaround.

I can also provide a patch for git master if that helps.

Comment 4 Andreas Rogge 2011-06-02 23:36:18 UTC

Created attachment 502691 [details]
Add a rhnSQL.commit() just after self._req_processor.process()

This adds a rhnSQL.commit() just after the request has been processed. 
It does not look like it is a great place for that, but it is where rhnSQL.rollback() is done in case the request threw an exception, so I guess it should be ok.

Comment 5 Andreas Rogge 2011-06-03 04:00:35 UTC

Created attachment 502711 [details]
add rhnSQL.commit() after process() to flush dangling transaction

New patch against git master

Comment 6 Andreas Rogge 2011-06-03 04:01:32 UTC

Created attachment 502712 [details]
introduce rhnSQL.commit() after the both SELECT statements that seem to be the main loop

New patch against git master

Comment 7 Jan Pazdziora (Red Hat) 2011-06-07 11:11:30 UTC

(In reply to comment #6)
> Created attachment 502712 [details]
> introduce rhnSQL.commit() after the both SELECT statements that seem to be the
> main loop
> 
> New patch against git master

Committed to Spacewalk master as 5ac15a79d42a4a334bf162957a3de5f8c0536f50.

Comment 8 Jan Pazdziora (Red Hat) 2011-07-20 11:48:35 UTC

Aligning under space16.

Comment 9 Jan Pazdziora (Red Hat) 2011-09-16 10:03:41 UTC

Taking.

Comment 10 Jan Pazdziora (Red Hat) 2011-09-16 10:08:23 UTC

I'm going to move this bugzilla to CLOSED CURRENTRELEASE as the osa-dispatcher patch was implemented and released.

As for the apacheHandler.py patch, I am not too fond of the commit. There was more discussion about it in thread

http://www.redhat.com/archives/spacewalk-devel/2011-June/msg00004.html

and we will need to find out what is it that leaves the transactions behind. If we have more information or patch which would help us to find that out, please either open new bugzilla or followup in that thread.

We did not have any patch for the taskomatic side -- again, if anyone comes with an investigation or patch, please file new bugzilla or post to spacewalk-devel mailing list.

Note You need to log in before you can comment on or make changes to this bug.