Description of problem: Seems like Spacewalk is keeping database transactions open which leads to unremovable dead rows in tables. Version-Release number of selected component (if applicable): spacewalk-postgresql-1.3.3-1.el5 postgresql84-server-8.4.5-1.el5_5.1 How reproducible: always Steps to Reproduce: 1. Restart spacewalk (spacewalk-service restart) 2. Run VACUUM VERBOSE rhnchannelnewestpackage [...] INFO: "rhnchannelnewestpackage": found 661960 removable, 11794 nonremovable row versions in 5616 out of 5616 pages [...] 3. Run explain analyze select * from rhn_channel.refresh_newest_package(101, E'Test'); several times. You'll notice somewhat increasing query runtimes (hence the "explain analyze") 4. Run VACUUM VERBOSE rhnchannelnewestpackage again will look like the following: [...] INFO: "rhnchannelnewestpackage": found 0 removable, 96818 nonremovable row versions in 838 out of 5616 pages [...] 5. Restart spacewalk again 6. Run VACUUM VERBOSE rhnchannelnewestpackage once again [...] INFO: "rhnchannelnewestpackage": found 88504 removable, 8314 nonremovable row versions in 838 out of 5616 pages [...] Actual results: Because of PostgreSQL's MVCC rows that might be in use by a running transaction will not be removed (i.e. garbage collectd) by VACCUM. As spacewalk keeps transactions idle for a long time (see SELECT * FROM pg_stat_activity ODER BY xact_start) only rows that are older than the oldest running transaction will be collected. Expected results: Spacewalk should not keep idle transactions. A transaction should only ever be started when there's work to do and should be comitted or rolled back immediately after the work is done. Additional info: Especially the example given here (rhnchannelnewestpackage) is a problem. Before I noticed this bug my table grew to more than 1 GB. After a VACUUM FULL with no concurrent transactions it is now around 40 MB. As rhn_channel.refresh_newest_packages() rewrites large parts of the table each time it is called the number of dead rows grows really quick. Possible Workaround: Schedule a spacewalk restart every hour or so
Setting to high priority as this is somewhat like a memory leak. The database leaks diskspace until you restart the software. That's at least as bad as a memory leak.
I've identified three sources of these dangling transactions. 1. osa-dispatcher 2. task-o-matic 3. the /XMLRPC handler "fixing" osa-dispatcher is simple. Adding rhnSQL.commit() calls after the two queries appearing in the main-loop makes the transactions close. As osa-dispatcher is the "bad guy" (i.e. the component that keeps transactions open for days and weeks) this already helps a lot.
Created attachment 502679 [details] Patch that adds rhnSQL.commit() into the main loop of osa-dispatcher Please have a look at the attached patch and tell me wether that is a feasible workaround. I can also provide a patch for git master if that helps.
Created attachment 502691 [details] Add a rhnSQL.commit() just after self._req_processor.process() This adds a rhnSQL.commit() just after the request has been processed. It does not look like it is a great place for that, but it is where rhnSQL.rollback() is done in case the request threw an exception, so I guess it should be ok.
Created attachment 502711 [details] add rhnSQL.commit() after process() to flush dangling transaction New patch against git master
Created attachment 502712 [details] introduce rhnSQL.commit() after the both SELECT statements that seem to be the main loop New patch against git master
(In reply to comment #6) > Created attachment 502712 [details] > introduce rhnSQL.commit() after the both SELECT statements that seem to be the > main loop > > New patch against git master Committed to Spacewalk master as 5ac15a79d42a4a334bf162957a3de5f8c0536f50.
Aligning under space16.
Taking.
I'm going to move this bugzilla to CLOSED CURRENTRELEASE as the osa-dispatcher patch was implemented and released. As for the apacheHandler.py patch, I am not too fond of the commit. There was more discussion about it in thread http://www.redhat.com/archives/spacewalk-devel/2011-June/msg00004.html and we will need to find out what is it that leaves the transactions behind. If we have more information or patch which would help us to find that out, please either open new bugzilla or followup in that thread. We did not have any patch for the taskomatic side -- again, if anyone comes with an investigation or patch, please file new bugzilla or post to spacewalk-devel mailing list.