Bug 748812 - Many errors in a sync raises: InvalidDocument: document too large - BSON documents are limited to 4 M
Summary: Many errors in a sync raises: InvalidDocument: document too large - BSON docu...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Pulp
Classification: Retired
Component: user-experience
Version: unspecified
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
: ---
Assignee: John Matthews
QA Contact: Preethi Thomas
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2011-10-25 12:55 UTC by John Matthews
Modified: 2013-12-23 21:50 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-12-23 21:50:35 UTC
Embargoed:


Attachments (Terms of Use)
log file 1 (401.38 KB, application/x-bzip)
2011-10-25 15:11 UTC, Lukas Zapletal
no flags Details
log file 2 (149.16 KB, application/x-bzip)
2011-10-25 15:11 UTC, Lukas Zapletal
no flags Details

Description John Matthews 2011-10-25 12:55:42 UTC
Description of problem:
This bz represents a problem we have with a repo sync reporting the BSON object was too large.  My guess is this is triggered because we are capturing every error that occurs and for this sync, 2768 errors occurred. 


lzap saw an error from a sync.
He based the sync off of local content from an ISO.

Looking at his logs we see:
2011-10-25 12:15:31,412 15460:-1373635728: pulp.server.api.synchronizers:INFO: synchronizers:616 YumSynchronizer reported 5 successes, 0 downloads, 2768 errors


lzap's logs:
2011-10-25 12:13:43,399 15460:-1278223504: pulp.server.webservices.controllers.repositories:INFO: repositories:615 sync timeout passed : None
2011-10-25 12:13:44,116 15460:-1373635728: pulp.server.api.repo_sync:INFO: repo_sync:244 Sync invoked for repo <ACME_Corporation-isos-isos_pub_isos_fuse_Fedora-14-x86_64-DVD_iso>
2011-10-25 12:13:44,746 15460:-1373635728: pulp.server.api.repo_sync:INFO: repo_sync:258 Sync of ACME_Corporation-isos-isos_pub_isos_fuse_Fedora-14-x86_64-DVD_iso starting, skip_dict = {}
2011-10-25 12:13:45,544 15460:-1373635728: pulp.server.api.synchronizers:INFO: synchronizers:540 cacert = <None>, cert = <None>
2011-10-25 12:13:45,546 15460:-1373635728: pulp.server.api.synchronizers:INFO: synchronizers:588 Fetching repo to </var/lib/pulp//repos/ACME_Corporation/Locker/isos/isos_pub_isos_fuse_Fedora-14-x86_64-DVD_iso> with verify_options <{'checksum': True, 'size': True}>
2011-10-25 12:15:26,957 15460:-1373635728: pulp.server.api.synchronizers:INFO: synchronizers:608 Running createrepo, this may take a few minutes to complete.
2011-10-25 12:15:26,957 15460:-1373635728: pulp.server.util:INFO: util:462 metadata found; taking backup.
2011-10-25 12:15:26,988 15460:-1373635728: pulp.server.util:INFO: util:445 started repo metadata update: ['createrepo', '--database', '--checksum', 'sha256', '-g', '/var/lib/pulp//repos/ACME_Corporation/Locker/isos/isos_pub_isos_fuse_Fedora-14-x86_64-DVD_iso/repodata/1d46a2073097bdb6b7b6f6fba60e4525ca0bf2d1165026a3e1c87b0b2abaf07f-Fedora-14-comps.xml', '--update', '/var/lib/pulp//repos/ACME_Corporation/Locker/isos/isos_pub_isos_fuse_Fedora-14-x86_64-DVD_iso']
2011-10-25 12:15:31,297 15460:-1373635728: pulp.server.util:INFO: util:495 createrepo on /var/lib/pulp//repos/ACME_Corporation/Locker/isos/isos_pub_isos_fuse_Fedora-14-x86_64-DVD_iso finished
2011-10-25 12:15:31,411 15460:-1373635728: pulp.server.api.synchronizers:INFO: synchronizers:614 Createrepo finished in 4.45665597916 seconds
2011-10-25 12:15:31,412 15460:-1373635728: pulp.server.api.synchronizers:INFO: synchronizers:616 YumSynchronizer reported 5 successes, 0 downloads, 2768 errors
2011-10-25 12:15:32,822 15460:-1373635728: pulp.server.api.distribution:INFO: distribution:39 Distribution with id ks-ACME_Corporation-isos-isos_pub_isos_fuse_Fedora-14-x86_64-DVD_iso-noarch already exists
2011-10-25 12:15:33,809 15460:-1373635728: pulp.server.api.synchronizers:INFO: synchronizers:275 No metadata for 'File Sync' present; no files to import to repo..
2011-10-25 12:15:34,792 15460:-1373635728: pulp.server.api.synchronizers:INFO: synchronizers:246 Loading comps group info from: /var/lib/pulp//repos/ACME_Corporation/Locker/isos/isos_pub_isos_fuse_Fedora-14-x86_64-DVD_iso/repodata/1d46a2073097bdb6b7b6f6fba60e4525ca0bf2d1165026a3e1c87b0b2abaf07f-1d46a2073097bdb6b7b6f6fba60e4525ca0bf2d1165026a3e1c87b0b2abaf07f-Fedora-14-comps.xml
2011-10-25 12:15:35,015 15460:-1373635728: pulp.server.api.synchronizers:INFO: synchronizers:267 Skipping errata imports from sync process
2011-10-25 12:15:35,505 15460:-1373635728: pulp.server.api.repo_sync:INFO: repo_sync:265 Sync returned 0 packages, 0 errata in 110.758918047 seconds
2011-10-25 12:15:35,875 15460:-1373635728: pulp.server.api.repo_sync:INFO: repo_sync:276 0 old packages to process, 0 new packages to process
2011-10-25 12:15:37,533 15460:-1373635728: pulp.server.api.repo_sync:INFO: repo_sync:291 Examining 0 errata from repo ACME_Corporation-isos-isos_pub_isos_fuse_Fedora-14-x86_64-DVD_iso
2011-10-25 12:15:37,853 15460:-1373635728: pulp.server.api.repo_sync:INFO: repo_sync:295 Removing 0 old errata from repo ACME_Corporation-isos-isos_pub_isos_fuse_Fedora-14-x86_64-DVD_iso
2011-10-25 12:15:40,433 15460:-1373635728: pulp.server.api.repo_sync:INFO: repo_sync:304 Adding 0 new errata to repo ACME_Corporation-isos-isos_pub_isos_fuse_Fedora-14-x86_64-DVD_iso
2011-10-25 12:15:40,891 15460:-1373635728: pulp.server.tasking.task:INFO: task:376 Task succeeded: Task 04f7d6ae-fef2-11e0-83c0-f0def1032187: _sync(ACME_Corporation-isos-isos_pub_isos_fuse_Fedora-14-x86_64-DVD_iso, synchronizer=<pulp.server.api.synchronizers.YumSynchronizer object at 0xaf636f2c>, skip_dict={}, max_speed=None, threads=None, progress_callback=<bound method RepoSyncTask.progress_callback of <pulp.server.api.repo_sync_task.RepoSyncTask object at 0xa330c4ac>>)
2011-10-25 12:15:41,782 15460:-1373635728: pulp.server.tasking.task:ERROR: task:405 document too large - BSON documents are limited to 4 MB
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/pulp/server/tasking/task.py", line 403, in _complete
    self.complete_callback(self)
  File "/usr/lib/python2.7/site-packages/pulp/server/tasking/taskqueue/queue.py", line 301, in complete
    self.__storage.store_complete(task)
  File "/usr/lib/python2.7/site-packages/pulp/server/tasking/taskqueue/storage.py", line 244, in store_complete
    self.history_collection.insert(history)
  File "/usr/lib/python2.7/site-packages/pulp/server/db/connection.py", line 80, in retry
    return method(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/pymongo/collection.py", line 270, in insert
    check_keys, safe, kwargs), safe)
  File "/usr/lib/python2.7/site-packages/pymongo/message.py", line 67, in insert
    bson_data = "".join([bson.BSON.encode(doc, check_keys) for doc in docs])
  File "/usr/lib/python2.7/site-packages/bson/__init__.py", line 462, in encode
    return cls(_dict_to_bson(document, check_keys))
InvalidDocument: document too large - BSON documents are limited to 4 MB

Comment 1 John Matthews 2011-10-25 14:42:08 UTC
I created a custom repo with 6k packages.
chmod 700 *.rpm
To force errors for all rpm fetches.

Repo Feed can be seen below.

$ sudo pulp-admin repo create --id large_repo_with_errors --feed http://jmatthews.fedorapeople.org/large_repo_no_read_perms

$ sudo pulp-admin repo sync --id large_repo_with_errors -F
Sync for repository large_repo_with_errors started
Step: Running Createrepo
Waiting /
Task not found: 4650403a-ff15-11e0-9041-0022157359f6
Traceback (most recent call last):
  File "/usr/bin/pulp-admin", line 7, in <module>
    execfile(__file__)
  File "/shared/repo/pulp/bin/pulp-admin", line 23, in <module>
    admin.main()
  File "/shared/repo/pulp/src/pulp/client/lib/cli.py", line 152, in main
    command.main(self.args[1:])
  File "/shared/repo/pulp/src/pulp/client/pluginlib/command.py", line 129, in main
    action.main(args[1:])
  File "/shared/repo/pulp/src/pulp/client/pluginlib/command.py", line 211, in main
    self.run()
  File "/shared/repo/pulp/src/pulp/client/admin/plugins/repo.py", line 887, in run
    self.sync_foreground(task)
  File "/shared/repo/pulp/src/pulp/client/admin/plugins/repo.py", line 839, in sync_foreground
    while not task_end(task):
  File "/shared/repo/pulp/src/pulp/client/api/task.py", line 19, in task_end
    return task['state'] in ('finished', 'error', 'canceled', 'timed_out')
TypeError: 'NoneType' object is not subscriptable


Looking at /var/log/pulp/pulp.log


2011-10-25 10:38:56,866 21874:140718479759104: pulp.server.api.synchronizers:INFO: synchronizers:640 YumSynchronizer reported 0 successes, 0 downloads, 6000 errors
2011-10-25 10:38:56,880 21874:140718479759104: pulp.server.api.synchronizers:INFO: synchronizers:298 No image files to import to repo..
2011-10-25 10:38:56,883 21874:140718479759104: pulp.server.api.synchronizers:INFO: synchronizers:278 No metadata for 'File Sync' present; no files to import to repo..
2011-10-25 10:38:56,886 21874:140718479759104: pulp.server.api.synchronizers:INFO: synchronizers:270 Skipping errata imports from sync process
2011-10-25 10:38:56,888 21874:140718479759104: pulp.server.api.repo_sync:INFO: repo_sync:269 Sync returned 0 packages, 0 errata in 770.836686134 seconds
2011-10-25 10:38:56,889 21874:140718479759104: pulp.server.api.repo_sync:INFO: repo_sync:280 0 old packages to process, 0 new packages to process
2011-10-25 10:38:56,893 21874:140718479759104: pulp.server.api.repo_sync:INFO: repo_sync:295 Examining 0 errata from repo large_repo_with_errors
2011-10-25 10:38:56,894 21874:140718479759104: pulp.server.api.repo_sync:INFO: repo_sync:299 Removing 0 old errata from repo large_repo_with_errors
2011-10-25 10:38:56,899 21874:140718479759104: pulp.server.api.repo_sync:INFO: repo_sync:308 Adding 0 new errata to repo large_repo_with_errors
2011-10-25 10:38:56,917 21874:140718479759104: pulp.server.tasking.task:INFO: task:440 Task succeeded: Task 4650403a-ff15-11e0-9041-0022157359f6: _sync(large_repo_with_errors, synchronizer=<pulp.server.api.synchronizers.YumSynchronizer object at 0x7ffbcbf2dc90>, skip_dict={}, max_speed=None, threads=None, progress_callback=<bound method RepoSyncTask.progress_callback of <pulp.server.api.repo_sync_task.RepoSyncTask object at 0x7ffbcbf2d290>>)
2011-10-25 10:38:58,042 21874:140718479759104: pulp.server.tasking.task:ERROR: task:469 document too large - BSON documents are limited to 4 MB
Traceback (most recent call last):
  File "/shared/repo/pulp/src/pulp/server/tasking/task.py", line 467, in _complete
    self.complete_callback(self)
  File "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/queue.py", line 333, in complete
    self.__storage.store_complete(task)
  File "/shared/repo/pulp/src/pulp/server/tasking/taskqueue/storage.py", line 244, in store_complete
    self.history_collection.insert(history)
  File "/shared/repo/pulp/src/pulp/server/db/connection.py", line 80, in retry
    return method(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/pymongo/collection.py", line 270, in insert
    check_keys, safe, kwargs), safe)
  File "/usr/lib64/python2.7/site-packages/pymongo/message.py", line 67, in insert
    bson_data = "".join([bson.BSON.encode(doc, check_keys) for doc in docs])
  File "/usr/lib64/python2.7/site-packages/bson/__init__.py", line 462, in encode
    return cls(_dict_to_bson(document, check_keys))
InvalidDocument: document too large - BSON documents are limited to 4 MB

Comment 2 John Matthews 2011-10-25 14:48:08 UTC
We are storing every error in the progress dictionary.
For 6000 packages that each are receiving a 403, we have 6000 dicts with almost identical tracebacks.

We could store only one unique error type, and link each item name which saw that error.

Desire is to maintain the fine grained error reporting if possible, while shrinking the size of the returned data structure.

Comment 3 Lukas Zapletal 2011-10-25 14:54:16 UTC
FYI I have used Fedora 14 ISO: Fedora-14-x86_64-DVD.iso

Unpacked it (or use fuseiso or mount loop), and synchronized it via localhost.

Not sure if all the errors above were caused by the fact it is so fast or syncing to ISO content. Tried regular F15 repo now and I only have one error, works better.

Comment 4 Lukas Zapletal 2011-10-25 15:09:44 UTC
Attaching full debug output of

grinder yum --label test -U http://localhost/pub/isos/fuse/Fedora-14-x86_64-DVD.iso/

Sample:

grinder.ParallelFetch: ERROR    Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/grinder/activeobject.py", line 447, in process
    retval = method(*args, **kwargs)

  File "/usr/lib/python2.7/site-packages/grinder/RepoFetch.py", line 117, in fetchItem
    verify_options=self.verify_options)

  File "/usr/lib/python2.7/site-packages/grinder/BaseFetch.py", line 301, in fetch
    checksum, headers, retryTimes, packages_location)

  File "/usr/lib/python2.7/site-packages/grinder/BaseFetch.py", line 301, in fetch
    checksum, headers, retryTimes, packages_location)

  File "/usr/lib/python2.7/site-packages/grinder/BaseFetch.py", line 244, in fetch
    curl.perform()

error: (6, 'Could not resolve host: Packages; Cannot allocate memory')

Comment 5 Lukas Zapletal 2011-10-25 15:11:17 UTC
Created attachment 530117 [details]
log file 1

Comment 6 Lukas Zapletal 2011-10-25 15:11:53 UTC
Created attachment 530118 [details]
log file 2

Comment 7 John Matthews 2011-10-26 15:46:07 UTC
Here's the grinder bug that relates to the problem seen here from the ISO 
https://bugzilla.redhat.com/show_bug.cgi?id=749211

Comment 8 Randy Barlow 2013-12-23 21:50:35 UTC
I don't believe this issue exists anymore, since we rewrote the Yum Importer, and no longer use Grinder.


Note You need to log in before you can comment on or make changes to this bug.