Bug 1023557
Summary: | Satellite 5.6 postgres takes a long time to satellite-sync for initial sync | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Satellite 5 | Reporter: | Stephen Herr <sherr> | ||||
Component: | Server | Assignee: | Stephen Herr <sherr> | ||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Pavel StudenÃk <pstudeni> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | high | ||||||
Version: | 570 | CC: | cperry, djuran, jfenal, jhutar, lpramuk, mzazrivec, pablo.iranzo, pstudeni, xdmoon | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | spacewalk-backend-2.3.3-1 | Doc Type: | Bug Fix | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | |||||||
: | 1139391 (view as bug list) | Environment: | |||||
Last Closed: | 2015-01-13 10:25:32 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 1127215, 1139391 | ||||||
Attachments: |
|
Description
Stephen Herr
2013-10-25 18:13:03 UTC
Created attachment 821210 [details]
Text file of notes taken during my investigation, looking for slow Satellite-Sync of 5.5 vs 5.6
The root problem is that Postgresql does not support autonomous transactions, which we require. We work around this problem by using dblink_exec to execute the sql that should be autonomous in a separate connection. This works fine for the most part, but satellite-sync can require millions of autonomous transactions for an inital base channel sync, and the overhead of creating all those connections to the database was making it very slow. Note that these connections are only required when inserting new package capabilities or checksum information (or other things as well, but less commonly), so only the initial sync is affected. I have filed pull request https://github.com/spacewalkproject/spacewalk/pull/120 for review that should (mostly) solve this issue. Instead of creating millions of separate connections to the db, this change would create one additional connection and reuse it for the most commonly used batches of autonomous sql. This has the effect of reducing the time required to the initial metadata import from a base channel from 20 hours 35 minutes to 5 hours 21 minutes, a 4x speed improvement. Sync logs below. Before changes: satellite-sync -c rhel-x86_64-server-6 --no-rpms --no-kickstarts 00:04:35 Spacewalk - live synchronization 00:04:35 url: https://satellite.rhn.redhat.com 00:04:35 debug/output level: 1 00:04:35 db: spaceuser/<password>@spaceschema 00:04:35 00:04:35 Retrieving / parsing channel-families data 00:04:51 channel-families data complete 00:04:51 00:04:51 Retrieving / parsing product names data 00:04:51 product names data complete 00:04:55 00:04:55 Retrieving / parsing arches data 00:04:56 arches data complete 00:04:56 00:04:56 Retrieving / parsing additional arches data 00:04:56 additional arches data complete 00:04:56 00:04:56 Retrieving / parsing channel data 00:07:22 p = previously imported/synced channel 00:07:22 . = channel not yet imported/synced 00:07:22 base-channels: 00:07:22 p rhel-x86_64-server-6 12659 00:07:22 00:07:23 Channel data complete 00:07:23 00:07:23 Retrieving short package metadata (used for indexing) 00:07:23 Retrieving / parsing short package metadata: rhel-x86_64-server-6 (12659) 00:07:32 Diffing package metadata (what's missing locally?): rhel-x86_64-server-6 ________________________________________ Diffing: ######################################## - complete 00:09:47 00:09:47 Downloading package metadata 00:11:53 Retrieving / parsing *relevant* package metadata: rhel-x86_64-server-6 (NONE RELEVANT) 00:13:56 00:13:56 Importing package metadata 00:13:56 Importing *relevant* package metadata: rhel-x86_64-server-6 (12659) ________________________________________ Importing: ######################################## - complete 20:18:21 20:18:21 Linking packages to channels 20:19:34 20:19:34 Downloading errata data 20:19:34 Retrieving / parsing errata data: rhel-x86_64-server-6 (NONE RELEVANT) 20:19:34 Downloading errata data complete 20:19:34 20:19:34 Importing channel errata 20:20:06 Importing *relevant* errata: rhel-x86_64-server-6 (2407) ________________________________________ Downloading:######################################## - complete Import complete: Begin time: Thu Jul 10 00:04:35 2014 End time: Thu Jul 10 20:40:20 2014 Elapsed: 20 hours, 35 minutes, 44 seconds After changes: satellite-sync -c rhel-x86_64-server-6 --no-rpms --no-kickstarts 17:50:46 Spacewalk - live synchronization 17:50:46 url: https://satellite.rhn.redhat.com 17:50:46 debug/output level: 1 17:50:47 db: spaceuser/<password>@spaceschema 17:50:47 17:50:47 Retrieving / parsing channel-families data 17:51:03 channel-families data complete 17:51:03 17:51:03 Retrieving / parsing product names data 17:51:04 product names data complete 17:51:08 17:51:08 Retrieving / parsing arches data 17:51:08 arches data complete 17:51:08 17:51:08 Retrieving / parsing additional arches data 17:51:09 additional arches data complete 17:51:09 17:51:09 Retrieving / parsing channel data 17:53:35 p = previously imported/synced channel 17:53:35 . = channel not yet imported/synced 17:53:35 base-channels: 17:53:35 p rhel-x86_64-server-6 12659 17:53:35 17:53:35 Channel data complete 17:53:35 17:53:35 Retrieving short package metadata (used for indexing) 17:53:36 Retrieving / parsing short package metadata: rhel-x86_64-server-6 (12659) 17:53:44 Diffing package metadata (what's missing locally?): rhel-x86_64-server-6 ________________________________________ Diffing: ######################################## - complete 17:55:59 17:55:59 Downloading package metadata 17:58:05 Retrieving / parsing *relevant* package metadata: rhel-x86_64-server-6 (NONE RELEVANT) 18:00:11 18:00:11 Importing package metadata 18:00:11 Importing *relevant* package metadata: rhel-x86_64-server-6 (12659) ________________________________________ Importing: ######################################## - complete 23:02:21 23:02:21 Linking packages to channels 23:03:37 23:03:37 Downloading errata data 23:03:37 Retrieving / parsing errata data: rhel-x86_64-server-6 (2407) ________________________________________ Downloading:######################################## - complete 23:05:48 Downloading errata data complete 23:05:48 23:05:48 Importing channel errata 23:06:20 Importing *relevant* errata: rhel-x86_64-server-6 (2407) ________________________________________ Downloading:######################################## - complete Import complete: Begin time: Wed Jul 9 17:50:46 2014 End time: Wed Jul 9 23:11:50 2014 Elapsed: 5 hours, 21 minutes, 4 seconds There were some comments about how this might possibly be related to the Postgresql optimizer not being initialized well since if it has no data yet and you import a large channel, so I did an additional test to prove that is not the problem. Test setup is: 1) Install clean Spacewalk-nightly server 2) satellite-sync rhel-86_64-server-6 to get the optimizer warmed up 3) create a db backup 4) satellite-sync rhel-86_64-server-5 (--no-rpms and --no-kickstarts, we're only concerned with the metadata importing) 5) restore db to previous state 6) clear satellite-sync metadata cache 7) sync again with updated code Sync times in step 4 and 7 are compareble with the ones in comment 19. The major difference is that in this test we had to download the package metadata each time, whereas in comment 19 the metadata was already downloaded but not imported. Downloading the package metadata takes about 40 minutes, which when combined with the fact that we're syncing RHEL 5 now instead of RHEL 6 explains the difference between this test and the previous times. Current sat-sync code: Import complete: Begin time: Sat Jul 12 05:26:35 2014 End time: Sun Jul 13 07:54:39 2014 Elapsed: 26 hours, 28 minutes, 4 seconds Updated code: Import complete: Begin time: Mon Jul 14 00:15:35 2014 End time: Mon Jul 14 07:00:28 2014 Elapsed: 6 hours, 44 minutes, 53 seconds Some additional data points: I do not observe the same initial-sync-slowness on stock satellite-sync running on an Oracle database. The same setup as the first test in comment 19 results in: Import complete: Begin time: Thu Jul 17 08:20:28 2014 End time: Thu Jul 17 11:17:35 2014 Elapsed: 2 hours, 57 minutes, 6 seconds There was a suggestion made on the pull request to try modifying the stored proceedures a different way to reuse the connection instead of creating a secondary permanent connection. This did speed up satellite-sync execution, but not nearly as much as the initial implementation did. The benefit this approach has is that it's much simpler, and therefore safer, however I think that 12 hours is still too long. Implementation details can be seen here: https://github.com/sdherr/spacewalk/commit/f1c84601a9be6436eeb3e9fef93b18aff4a36647 Satellite-sync results that are comparible with the tests in comment 20: Import complete: Begin time: Wed Jul 16 16:09:28 2014 End time: Thu Jul 17 04:21:44 2014 Elapsed: 12 hours, 12 minutes, 15 seconds Committing to Spacewalk master: 3e1d60d08eca129e60607dcca460bcf201856cef With the release of Red Hat Satellite 5.7 on January 12th 2015 this bug is being moved to a Closed Current Release state. The Satellite 5.7 GA Errata: - https://rhn.redhat.com/errata/RHSA-2015-0033.html Satellite 5.7 Release Notes: - https://access.redhat.com/documentation/en-US/Red_Hat_Satellite/5.7/html-single/Release_Notes/index.html Satellite Customer Portal Blog announcement for release: - https://access.redhat.com/blogs/1169563/posts/1315743 Cliff NOTE: This bug has not been re-verified (moved to RELEASE_PENDING) prior to release. We assume that the bug has indeed been fixed and not regressed since we initially verified it. Please re-open in the future if needed. |