Bug 818204

Summary: Sync silently "cancels" on some (very large?) repos
Product: Red Hat Satellite Reporter: Corey Welton <cwelton>
Component: WebUI Assignee: Ivan Necas <inecas>
Status: CLOSED ERRATA QA Contact: Corey Welton <cwelton>
Severity: high Docs Contact:
Priority: unspecified    
Version: 6.0.1 CC: achan, asettle, cpelland, dmacpher, inecas, lzap, mmccune, omaciel, pkilambi
Target Milestone: Unspecified Keywords: Triaged, ZStream
Target Release: Unused   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Syncing or promoting large repositories caused the operation to cancel without warning. The createrepo --update command loaded the entire XML metadata into memory, which exhausted the available memory and stopped the sync abruptly. This fix redirects the command to use the SQLite database instead of the XML files. Repository synchronization and promotion now work with reduced memory usage.
Story Points: ---
Clone Of:
: 828977 (view as bug list) Environment:
Last Closed: 2012-12-04 19:45:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 828977    
Attachments:
Description Flags
screenshot showing "Cancelled" sync none

Description Corey Welton 2012-05-02 12:47:15 UTC
Description of problem:
Trying to sync a large repo, I get about 98-99% complete; at some point afterwards looking at the status in the UI just shows "Cancelled." for the sync process. No errors are thrown in a notification bubble nor in the notifications pane. 

I suspect it is taking place during the distro portion of the sync.

Version-Release number of selected component (if applicable):


How reproducible:
I don't know how easily this is reproducible, other than the notion that I tried syncing three times and had the same result.

Steps to Reproduce:
1.  Create provider "Fedora_provider", Product "Fedora_product" and Repo "Fedora_repo_0", with a repourl of http://download.fedoraproject.org/pub/fedora/linux/releases/16/Everything/x86_64/os/
2. Begin syncing of repo.  Note that this is a large repo (25G+) so it might take a while.
3.  Watch the progress for as far as it goes.
  
Actual results:
Repo gets mostly synced, but will eventually just show "Cancelled." on the UI. No notification pops up, nor any errors thrown in the notification pane.

Expected results:
Repo can sync and/or a proper notification that it failed occurs.

Additional info:

The particular testcase I was running required a repo which contains distributions. After syncing this far and looking at a promotions or templates creation view, nothing shows as available under Distributions, so it's likely those did not yet get synced. Per Dev, distributions are the last things synced, so that could be part of the problem here.

I think this is actually two bugs:
* I suspect pulp is hitting some sort of timeout while attempting to download distributions.
* The UI is not catching this error and is just silently cancelling.

Comment 1 Corey Welton 2012-05-02 12:48:42 UTC
Created attachment 581612 [details]
screenshot showing "Cancelled" sync

Comment 2 Corey Welton 2012-05-02 12:50:53 UTC
Note that I was successfully able to pull in a RHEL product and associated Distributions, but not this Fedora third-party repo.

Comment 4 Lukas Zapletal 2012-05-31 10:05:30 UTC
Sidenote - a bug like this would be nice with katello-debug output, particularly grinder.log. Any chance you can attach it, @Corey?

Comment 7 Ivan Necas 2012-05-31 12:24:17 UTC
The core of the problem is Pulp can get very greedy on memory when computing metadata. It depends on the size of repo and available swap whether the task can be completed or not. Apparently for the whole fedora repo 2GB RAM and 4GB swap is just not big enough. I can experiment on how much swap is enough for the task to be completed.

This also might occur when promoting repositories (both custom and RH).
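
The Doc Text for this bug attributes the memory growth to loading the whole XML metadata at once and names the SQLite database as the replacement source. The sketch below is only an illustration of that general idea, not the code shipped in the fix: it reads rows from a SQLite table in fixed-size batches so that memory use stays bounded regardless of repository size. The table name `packages`, its columns, and both function names are assumptions made for this example, and the data is generated in memory.

```python
import sqlite3


def build_demo_db(package_count=5000):
    """Create an in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE packages (name TEXT, version TEXT, size_package INTEGER)"
    )
    rows = [("pkg%d" % i, "1.0", 1000 + i) for i in range(package_count)]
    conn.executemany("INSERT INTO packages VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def total_size_streaming(conn, batch=500):
    """Sum package sizes while holding at most `batch` rows in memory."""
    cursor = conn.execute("SELECT size_package FROM packages")
    count = 0
    total = 0
    while True:
        chunk = cursor.fetchmany(batch)
        if not chunk:
            break
        for (size,) in chunk:
            count += 1
            total += size
    return count, total


conn = build_demo_db()
count, total = total_size_streaming(conn)
print(count, total)
```

The batch size is the only knob here: a smaller value lowers peak memory at the cost of more round trips to the database.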

Comment 12 Corey Welton 2012-09-17 19:42:42 UTC
Now seeing a failure when syncing this repo.  Pulp log tells me:


2012-09-17 13:59:07,803 26471:140373334677248: pulp.server.tasking.task:ERROR: task:472 Task failed: Task 32282535-00de-11e2-85b4-525400a5f611: _sync(Test_Org_1347895089-Fedora_16-Fedora_16_-_x86_64, synchronizer=<pulp.server.api.synchronizers.YumSynchronizer object at 0x7fab40197590>, skip={}, max_speed=None, threads=4, progress_callback=<bound method RepoSyncTask.progress_callback of <pulp.server.api.repo_sync_task.RepoSyncTask object at 0x7fab40197710>>)
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/pulp/server/tasking/task.py", line 418, in run
    result = self.callable(*self.args, **self.kwargs)
  File "/usr/lib/python2.6/site-packages/pulp/server/api/repo_sync.py", line 283, in _sync
    progress_callback, synchronizer, max_speed, threads)
  File "/usr/lib/python2.6/site-packages/pulp/server/api/repo_sync.py", line 379, in fetch_content
    added_errataids = synchronizer.import_metadata(repo_dir, repo_id, skip_dict)
  File "/usr/lib/python2.6/site-packages/pulp/server/api/synchronizers.py", line 416, in import_metadata
    self.repo_api.collection.save(repo, safe=True)
  File "/usr/lib/python2.6/site-packages/pulp/server/db/connection.py", line 80, in retry
    return method(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/pymongo/collection.py", line 237, in save
    manipulate, safe, _check_keys=True, **kwargs)
  File "/usr/lib/python2.6/site-packages/pulp/server/db/connection.py", line 80, in retry
    return method(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/pymongo/collection.py", line 411, in update
    _check_keys, self.__uuid_subtype), safe)
InvalidDocument: key 'openoffice.org-voikko' must not contain '.'


Not sure if this is related or a completely different bug....?
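
For reference, the InvalidDocument error in the traceback above comes from pymongo's key check, which rejects document keys that contain '.' or start with '$'. The sketch below is a minimal stand-alone approximation of that rule, assuming string keys; it is not pymongo's or Pulp's code. The function name `find_invalid_keys` and the `package_groups` field in the sample document are hypothetical, since the log does not show where in the repo document the offending key lives.

```python
def find_invalid_keys(doc, path=""):
    """Return the paths of keys that a MongoDB key check would reject."""
    bad = []
    if isinstance(doc, dict):
        for key, value in doc.items():
            here = "%s/%s" % (path, key)
            # Keys containing '.' or starting with '$' are not allowed.
            if isinstance(key, str) and ("." in key or key.startswith("$")):
                bad.append(here)
            bad.extend(find_invalid_keys(value, here))
    elif isinstance(doc, list):
        for index, item in enumerate(doc):
            bad.extend(find_invalid_keys(item, "%s[%d]" % (path, index)))
    return bad


# Hypothetical document shape; only the key name is taken from the log.
repo = {
    "id": "Fedora_16",
    "package_groups": {"openoffice.org-voikko": {}, "gnome-desktop": {}},
}
print(find_invalid_keys(repo))
```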

Comment 13 Ivan Necas 2012-09-18 07:55:39 UTC
What is the version of pymongo on your system? This is a known bug for Katello on EPEL (with pymongo-2.0) but has not been seen so far with pymongo-1.9, which is shipped with CFSE. If it's pymongo-2.0, which repository did you get it from?
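
One way to answer the version question from Python itself is sketched below. It relies on the `pymongo.version` string attribute and returns None when pymongo is not importable; the helper name `installed_pymongo_version` is made up for this example. It does not tell you which repository the package came from; that has to be checked with the package manager.

```python
def installed_pymongo_version():
    """Return the pymongo version string, or None if pymongo is absent."""
    try:
        import pymongo
    except ImportError:
        return None
    return getattr(pymongo, "version", None)


print(installed_pymongo_version())
```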

Comment 14 Corey Welton 2012-09-18 13:05:11 UTC
Right, I resolved this yesterday. I had accidentally left an EPEL repo enabled, hence pymongo was grabbed from there. Ignore comment #12.

Comment 15 Corey Welton 2012-09-18 19:27:04 UTC
I suppose this is fixed.  Currently the sync ui is giving me an "Error syncing!" message but the notifications UI is telling me "Repository 'Fedora 16 - x86_64' finished syncing successfully.". There are some minor errors in the grinder logs (about packages not being the expected size), but overall I don't see a lot in the way of errors anywhere.

So maybe all that is a different issue. I will poke at this a bit more, but I'm inclined to consider this one passed -- we're no longer silently cancelling.

Comment 16 Corey Welton 2012-09-19 14:30:25 UTC
QE Verified.  Entered new bug #858398 for behavior referenced in comment #15

Comment 18 errata-xmlrpc 2012-12-04 19:45:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-1543.html

Comment 19 Mike McCune 2013-08-16 17:55:54 UTC
getting rid of 6.0.0 version since that doesn't exist