Bug 1023557 - Satellite 5.6 postgres takes a long time to satellite-sync for initial sync
Satellite 5.6 postgres takes a long time to satellite-sync for initial sync
Status: CLOSED CURRENTRELEASE
Product: Red Hat Satellite 5
Classification: Red Hat
Component: Server (Show other bugs)
570
Unspecified Unspecified
high Severity high
: ---
: ---
Assigned To: Stephen Herr
Pavel Studeník
:
Depends On:
Blocks: sat570-blockers 1139391
  Show dependency treegraph
 
Reported: 2013-10-25 14:13 EDT by Stephen Herr
Modified: 2015-01-13 05:25 EST (History)
9 users (show)

See Also:
Fixed In Version: spacewalk-backend-2.3.3-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1139391 (view as bug list)
Environment:
Last Closed: 2015-01-13 05:25:32 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Text file of notes taken during my investigation, looking for slow Satellite-Sync of 5.5 vs 5.6 (27.88 KB, text/plain)
2013-11-07 10:13 EST, Clifford Perry
no flags Details

  None (edit)
Description Stephen Herr 2013-10-25 14:13:03 EDT
Description of problem:
satellite-syncing on a Sat 5.6 machine with the embedded postrgesql database takes a very long time. Specifically, importing the package metadata takes a long time.

Version-Release number of selected component (if applicable):
5.6 postgres

How reproducible:
unkown

Steps to Reproduce:
1. Install Sat 5.6 with embedded postgresql database, activate it
2. 'satellite-sync --no-rpms' a channel

Actual results:
It takes a very long time (many hours) to sync a single base channel's metadata 

Expected results:
Metadata importing should be about as fast as it used to be.

Additional info:
By setting:
log_min_duration_statement = 250
in /var/lib/pgsql/data/postgresql.conf
to log queries that take over a quarter of a second, we see many instances of:

2013-10-25 14:01:12.822 EDT LOG:  duration: 310.587 ms  statement: insert into rhnChecksum (id, checksum_type_id, checksum) values (42538, (select id from rhnChecksumType where label = 'sha256'), '8c22773f84750285b6cc6bd5cefdd2282dec1963aa6526563332f0bc04abeace')
2013-10-25 14:01:12.823 EDT LOG:  duration: 325.094 ms  statement: select lookup_checksum(E'sha256', E'8c22773f84750285b6cc6bd5cefdd2282dec1963aa6526563332f0bc04abeace') id from dual
Comment 18 Clifford Perry 2013-11-07 10:13:43 EST
Created attachment 821210 [details]
Text file of notes taken during my investigation, looking for slow Satellite-Sync of 5.5 vs 5.6
Comment 19 Stephen Herr 2014-07-11 08:45:52 EDT
The root problem is that Postgresql does not support autonomous transactions, which we require. We work around this problem by using dblink_exec to execute the sql that should be autonomous in a separate connection. This works fine for the most part, but satellite-sync can require millions of autonomous transactions for an inital base channel sync, and the overhead of creating all those connections to the database was making it very slow. Note that these connections are only required when inserting new package capabilities or checksum information (or other things as well, but less commonly), so only the initial sync is affected.

I have filed pull request https://github.com/spacewalkproject/spacewalk/pull/120 for review that should (mostly) solve this issue. Instead of creating millions of separate connections to the db, this change would create one additional connection and reuse it for the most commonly used batches of autonomous sql. This has the effect of reducing the time required to the initial metadata import from a base channel from 20 hours 35 minutes to 5 hours 21 minutes, a 4x speed improvement. Sync logs below.

Before changes:
satellite-sync -c rhel-x86_64-server-6 --no-rpms --no-kickstarts
00:04:35 Spacewalk - live synchronization
00:04:35    url: https://satellite.rhn.redhat.com
00:04:35    debug/output level: 1
00:04:35    db:  spaceuser/<password>@spaceschema
00:04:35
00:04:35 Retrieving / parsing channel-families data
00:04:51 channel-families data complete
00:04:51
00:04:51 Retrieving / parsing product names data
00:04:51 product names data complete
00:04:55
00:04:55 Retrieving / parsing arches data
00:04:56 arches data complete
00:04:56
00:04:56 Retrieving / parsing additional arches data
00:04:56 additional arches data complete
00:04:56
00:04:56 Retrieving / parsing channel data
00:07:22    p = previously imported/synced channel
00:07:22    . = channel not yet imported/synced
00:07:22    base-channels:
00:07:22       p rhel-x86_64-server-6                     12659
00:07:22
00:07:23 Channel data complete
00:07:23
00:07:23 Retrieving short package metadata (used for indexing)
00:07:23    Retrieving / parsing short package metadata: rhel-x86_64-server-6 (12659)
00:07:32 Diffing package metadata (what's missing locally?): rhel-x86_64-server-6
            ________________________________________
Diffing:    ######################################## - complete
00:09:47
00:09:47 Downloading package metadata
00:11:53    Retrieving / parsing *relevant* package metadata: rhel-x86_64-server-6 (NONE RELEVANT)
00:13:56
00:13:56 Importing package metadata
00:13:56    Importing *relevant* package metadata: rhel-x86_64-server-6 (12659)
            ________________________________________
Importing:  ######################################## - complete
20:18:21
20:18:21 Linking packages to channels
20:19:34
20:19:34 Downloading errata data
20:19:34    Retrieving / parsing errata data: rhel-x86_64-server-6 (NONE RELEVANT)
20:19:34 Downloading errata data complete
20:19:34
20:19:34 Importing channel errata
20:20:06    Importing *relevant* errata: rhel-x86_64-server-6 (2407)
            ________________________________________
Downloading:######################################## - complete
    Import complete:
        Begin time: Thu Jul 10 00:04:35 2014
        End time:   Thu Jul 10 20:40:20 2014
        Elapsed:    20 hours, 35 minutes, 44 seconds



After changes:
satellite-sync -c rhel-x86_64-server-6 --no-rpms --no-kickstarts
17:50:46 Spacewalk - live synchronization
17:50:46    url: https://satellite.rhn.redhat.com
17:50:46    debug/output level: 1
17:50:47    db:  spaceuser/<password>@spaceschema
17:50:47
17:50:47 Retrieving / parsing channel-families data
17:51:03 channel-families data complete
17:51:03
17:51:03 Retrieving / parsing product names data
17:51:04 product names data complete
17:51:08
17:51:08 Retrieving / parsing arches data
17:51:08 arches data complete
17:51:08
17:51:08 Retrieving / parsing additional arches data
17:51:09 additional arches data complete
17:51:09
17:51:09 Retrieving / parsing channel data
17:53:35    p = previously imported/synced channel
17:53:35    . = channel not yet imported/synced
17:53:35    base-channels:
17:53:35       p rhel-x86_64-server-6                     12659
17:53:35
17:53:35 Channel data complete
17:53:35
17:53:35 Retrieving short package metadata (used for indexing)
17:53:36    Retrieving / parsing short package metadata: rhel-x86_64-server-6 (12659)
17:53:44 Diffing package metadata (what's missing locally?): rhel-x86_64-server-6
            ________________________________________
Diffing:    ######################################## - complete
17:55:59
17:55:59 Downloading package metadata
17:58:05    Retrieving / parsing *relevant* package metadata: rhel-x86_64-server-6 (NONE RELEVANT)
18:00:11
18:00:11 Importing package metadata
18:00:11    Importing *relevant* package metadata: rhel-x86_64-server-6 (12659)
            ________________________________________
Importing:  ######################################## - complete
23:02:21
23:02:21 Linking packages to channels
23:03:37
23:03:37 Downloading errata data
23:03:37    Retrieving / parsing errata data: rhel-x86_64-server-6 (2407)
            ________________________________________
Downloading:######################################## - complete
23:05:48 Downloading errata data complete
23:05:48
23:05:48 Importing channel errata
23:06:20    Importing *relevant* errata: rhel-x86_64-server-6 (2407)
            ________________________________________
Downloading:######################################## - complete
    Import complete:
        Begin time: Wed Jul  9 17:50:46 2014
        End time:   Wed Jul  9 23:11:50 2014
        Elapsed:    5 hours, 21 minutes, 4 seconds
Comment 20 Stephen Herr 2014-07-14 08:39:19 EDT
There were some comments about how this might possibly be related to the Postgresql optimizer not being initialized well since if it has no data yet and you import a large channel, so I did an additional test to prove that is not the problem.

Test setup is:
1) Install clean Spacewalk-nightly server
2) satellite-sync rhel-86_64-server-6 to get the optimizer warmed up
3) create a db backup
4) satellite-sync rhel-86_64-server-5 (--no-rpms and --no-kickstarts, we're only concerned with the metadata importing)
5) restore db to previous state
6) clear satellite-sync metadata cache
7) sync again with updated code

Sync times in step 4 and 7 are compareble with the ones in comment 19. The major difference is that in this test we had to download the package metadata each time, whereas in comment 19 the metadata was already downloaded but not imported. Downloading the package metadata takes about 40 minutes, which when combined with the fact that we're syncing RHEL 5 now instead of RHEL 6 explains the difference between this test and the previous times.

Current sat-sync code:
    Import complete:
        Begin time: Sat Jul 12 05:26:35 2014
        End time:   Sun Jul 13 07:54:39 2014
        Elapsed:    26 hours, 28 minutes, 4 seconds

Updated code:
    Import complete:
        Begin time: Mon Jul 14 00:15:35 2014
        End time:   Mon Jul 14 07:00:28 2014
        Elapsed:    6 hours, 44 minutes, 53 seconds
Comment 21 Stephen Herr 2014-07-17 15:45:40 EDT
Some additional data points:

I do not observe the same initial-sync-slowness on stock satellite-sync running on an Oracle database. The same setup as the first test in comment 19 results in:
    Import complete:
        Begin time: Thu Jul 17 08:20:28 2014
        End time:   Thu Jul 17 11:17:35 2014
        Elapsed:    2 hours, 57 minutes, 6 seconds


There was a suggestion made on the pull request to try modifying the stored proceedures a different way to reuse the connection instead of creating a secondary permanent connection. This did speed up satellite-sync execution, but not nearly as much as the initial implementation did. The benefit this approach has is that it's much simpler, and therefore safer, however I think that 12 hours is still too long.

Implementation details can be seen here:
https://github.com/sdherr/spacewalk/commit/f1c84601a9be6436eeb3e9fef93b18aff4a36647

Satellite-sync results that are comparible with the tests in comment 20:
    Import complete:
        Begin time: Wed Jul 16 16:09:28 2014
        End time:   Thu Jul 17 04:21:44 2014
        Elapsed:    12 hours, 12 minutes, 15 seconds
Comment 22 Stephen Herr 2014-07-21 15:00:51 EDT
Committing to Spacewalk master:
3e1d60d08eca129e60607dcca460bcf201856cef
Comment 25 Clifford Perry 2015-01-13 05:25:32 EST
With the release of Red Hat Satellite 5.7 on January 12th 2015 this bug is
being moved to a Closed Current Release state. 

The Satellite 5.7 GA Errata:
 - https://rhn.redhat.com/errata/RHSA-2015-0033.html 

Satellite 5.7 Release Notes:
 -
https://access.redhat.com/documentation/en-US/Red_Hat_Satellite/5.7/html-single/Release_Notes/index.html

Satellite Customer Portal Blog announcement for release:
 - https://access.redhat.com/blogs/1169563/posts/1315743 

Cliff

NOTE: This bug has not been re-verified (moved to RELEASE_PENDING) prior to release. We assume that the bug has indeed been fixed and not regressed since we initially verified it. Please re-open in the future if needed.

Note You need to log in before you can comment on or make changes to this bug.