Bug 752693 - DB performance (PostgreSQL)
Summary: DB performance (PostgreSQL)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Spacewalk
Classification: Community
Component: Server
Version: 1.5
Hardware: x86_64
OS: Linux
unspecified
low
Target Milestone: ---
Assignee: Michael Mráka
QA Contact: Red Hat Satellite QA List
URL:
Whiteboard:
Depends On:
Blocks: space17
TreeView+ depends on / blocked
 
Reported: 2011-11-10 06:42 UTC by Andrey Lyarskiy
Modified: 2012-03-07 09:55 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-03-07 09:55:51 UTC
Embargoed:


Attachments (Terms of Use)

Description Andrey Lyarskiy 2011-11-10 06:42:42 UTC
Description of problem:
Slow db performance during repos are syncing.

Version-Release number of selected component (if applicable):
Spacewalk 1.5, PostgreSQL 8.4.9, OS - SL 6.1 x86_64.

How reproducible:
Always

Steps to Reproduce:
1. Install Spacewalk
2. Add repo (for example EPEL6)
3. Start repo sync
  
Actual results:
PostgreSQL log:
LOG:  duration: 26505.178 ms  statement: SELECT rhn_channel.refresh_newest_package(102, E'server.app.yumreposync')
LOG:  duration: 26864.733 ms  statement: SELECT rhn_channel.refresh_newest_package(102, E'server.app.yumreposync')
LOG:  duration: 25160.499 ms  statement: SELECT rhn_channel.refresh_newest_package(102, E'server.app.yumreposync')


Expected results:
Faster execution of function.

Additional info:
It seems there is no index on channel_id column. Adding index improving performance a bit:
LOG:  duration: 40137.836 ms  statement: CREATE INDEX rhn_cnp_chid_idx
           ON rhnchannelnewestpackage (channel_id ASC NULLS LAST);
        
ERROR:  canceling autovacuum task
CONTEXT:  automatic vacuum of table "spaceschema.public.rhnchannelnewestpackage"
LOG:  duration: 58944.584 ms  statement: SELECT rhn_channel.refresh_newest_package(102, E'server.app.yumreposync')
LOG:  duration: 63254.151 ms  statement: ANALYZE VERBOSE rhnchannelnewestpackage
LOG:  duration: 47682.030 ms  statement: SELECT rhn_channel.refresh_newest_package(102, E'server.app.yumreposync')
LOG:  duration: 10813.587 ms  statement: SELECT rhn_channel.refresh_newest_package(102, E'server.app.yumreposync')
LOG:  duration: 1245.523 ms  statement: SELECT rhn_channel.refresh_newest_package(102, E'server.app.yumreposync')
LOG:  duration: 1477.135 ms  statement: SELECT rhn_channel.refresh_newest_package(102, E'server.app.yumreposync')
LOG:  duration: 2471.544 ms  statement: SELECT rhn_channel.refresh_newest_package(102, E'server.app.yumreposync')
LOG:  duration: 2346.093 ms  statement: SELECT rhn_channel.refresh_newest_package(102, E'server.app.yumreposync')
LOG:  duration: 1384.573 ms  statement: SELECT rhn_channel.refresh_newest_package(102, E'server.app.yumreposync')

Comment 1 Pavel Siderov 2012-02-08 13:20:53 UTC
The issue is related to function rhn_channel.refresh_newest_package. It is executed after a packet from a repo is synced. If you have 3000 thousand packages in a repo the function is executed 3000 times. The function deletes all records for this channel from rhnChannelNewestPackage every time it is executed. Then selecting all packages synced till this moment using rhnchannelnewestpackageview and inserting them again in rhnChannelNewestPackage. This way time needed increases linearly. 

Here goes the fix. I have removed the delete and insert only the latest package info in rhnChannelNewestPackage.



DROP FUNCTION rhn_channel.refresh_newest_package(numeric, character varying);

CREATE OR REPLACE FUNCTION rhn_channel.refresh_newest_package(channel_id_in numeric, caller_in character varying DEFAULT '(unknown)'::character varying)
  RETURNS void AS
$BODY$
        begin
            insert into rhnChannelNewestPackage
            ( channel_id, name_id, evr_id, package_id, package_arch_id ) 
        ( select channel_id_in, dve.name_id, dve.evr_id, edno.package_id, dve.package_arch_id from rhnChannelPackage edno, rhnpackage dve where (edno.package_id=dve.id) and edno.channel_id=channel_id_in order by edno.created desc limit 1);
                insert into rhnChannelNewestPackageAudit (channel_id, caller)
                    values (channel_id_in, caller_in);
                update rhnChannel 
                    set last_modified = greatest(current_timestamp, last_modified + interval '1 second')
                    where id = channel_id_in; 
        end$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;
ALTER FUNCTION rhn_channel.refresh_newest_package(numeric, character varying) OWNER TO spaceuser;



Tested and works great for me. The time needed for a repo sync with lots of packages decreased to less then 20 minutes. Actually now it depends on the bandwith available to the repo.

Please someone from RH to check it.

Cheers,
Pavel Siderov

Comment 2 Michael Mráka 2012-02-08 13:57:14 UTC
Hello Pavel,

this particular issue has been fixed in spacewalk-backend-1.7.6-1, commits

commit df4ab6bd0216725c2414915e03de10a2b5fd6c47
    use name_ids to speedup reposync
commit e6dff1684f1c4233541dfb1642ab0bc28c901401
    enhanced procedure refresh_newest_package

Comment 3 Jan Pazdziora 2012-03-07 09:55:51 UTC
Spacewalk 1.7 has been released:

https://fedorahosted.org/spacewalk/wiki/ReleaseNotes17


Note You need to log in before you can comment on or make changes to this bug.