Bug 1167950 - Cannot sync puppet-chef rpms from packagecloud.io. BSON document too large (16968021 bytes)
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Pulp
Classification: Retired
Component: async/tasks
Version: Master
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: pulp-bugs
QA Contact: pulp-qe-list
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-11-25 17:04 UTC by Brian Bouterse
Modified: 2014-11-25 22:10 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-11-25 22:10:53 UTC
Embargoed:



Description Brian Bouterse 2014-11-25 17:04:13 UTC
Originally reported by Paul Gonin (paul.gonin), but I was also able to reproduce it using:

1. pulp-admin -u admin -p admin rpm repo create --repo-id private-chef --relative-url private-chef --feed https://packagecloud.io/chef/stable/el/6/x86_64/

2. pulp-admin rpm repo sync run --repo-id private-chef

3. Observe the traceback in the log below.


Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808) Content unit association failed [Unit [key={'name': 'private-chef', 'checksum': '2d068c98c36c4eb548773efc616fec5c36bd2bdd', 'epoch': '0', 'version': '11.2.3', 'release': '1.el6', 'arch': 'x86_64', 'checksumtype': 'sha'}] [type=rpm] [id=None]]
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808) Traceback (most recent call last):
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)   File "/home/bmbouter/Documents/pulp/server/pulp/plugins/conduits/mixins.py", line 486, in save_unit
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)     unit.id = self._update_unit(unit, pulp_unit)
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)   File "/home/bmbouter/Documents/pulp/server/pulp/plugins/conduits/mixins.py", line 518, in _update_unit
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)     return self._add_unit(unit, pulp_unit)
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)   File "/home/bmbouter/Documents/pulp/server/pulp/plugins/conduits/mixins.py", line 540, in _add_unit
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)     unit_id = content_manager.add_content_unit(unit.type_id, None, pulp_unit)
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)   File "/home/bmbouter/Documents/pulp/server/pulp/server/managers/content/cud.py", line 35, in add_content_unit
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)     collection.insert(unit_doc, safe=True)
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)   File "/home/bmbouter/Documents/pulp/server/pulp/server/db/connection.py", line 154, in retry
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)     return method(*args, **kwargs)
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)   File "/usr/lib64/python2.7/site-packages/pymongo/collection.py", line 357, in insert
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)     continue_on_error, self.__uuid_subtype), safe)
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)   File "/usr/lib64/python2.7/site-packages/pymongo/mongo_client.py", line 910, in _send_message
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)     (request_id, data) = self.__check_bson_size(message)
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)   File "/usr/lib64/python2.7/site-packages/pymongo/mongo_client.py", line 881, in __check_bson_size
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)     (max_doc_size, self.__max_bson_size))
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808) InvalidDocument: BSON document too large (16967486 bytes) - the connected server supports BSON document sizes up to 16777216 bytes

Expected results: The sync completes and no traceback is logged.

Comment 1 Michael Hrivnak 2014-11-25 22:10:53 UTC
I downloaded private-chef-11.2.2-1.el6.x86_64.rpm from that repo. The RPM alone is 551MB. It contains 79,292 individual files. That is a LOT and is definitely not a normal use case for a single RPM.

The filelist alone is about 7.9MB of XML. Pulp stores the file list (and all other metadata) in two copies: XML and native BSON structures. So that's about 15.8MB of just file listings, which is already real close to the BSON document max size of 16777216 bytes (16MB) reported in the traceback.
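
To make the arithmetic concrete, here is a rough sketch in Python. The two byte counts are copied from the traceback in the description; the 7.9MB figure is the approximate filelist size quoted above. The names (MAX_BSON_BYTES, overshoot, and so on) are made up for illustration and are not part of Pulp or pymongo.

```python
# Rough size arithmetic for the failing unit document.
# Byte counts come from the traceback; 7.9 MB is the approximate
# filelist size quoted in the comment. All names are illustrative.

MAX_BSON_BYTES = 16777216      # server limit reported by pymongo (16 * 1024 * 1024)
REPORTED_DOC_BYTES = 16967486  # document size reported in the traceback

FILELIST_XML_MB = 7.9          # approximate size of one copy of the file list
COPIES = 2                     # file list is stored as XML and as BSON structures


def overshoot(doc_bytes, limit=MAX_BSON_BYTES):
    """Return how many bytes a document exceeds the limit by (negative if it fits)."""
    return doc_bytes - limit


filelist_total_mb = FILELIST_XML_MB * COPIES
excess_bytes = overshoot(REPORTED_DOC_BYTES)

print("file listings alone: ~%.1f MB" % filelist_total_mb)
print("document exceeds the limit by %d bytes" % excess_bytes)
```

So the document is only slightly over the limit, but the two copies of the file list account for nearly all of it.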

Here is mongo's reasoning for their limit: http://docs.mongodb.org/manual/reference/limits/#BSON-Document-Size

For now, the effort it would take for Pulp to accommodate storing such large metadata would be substantial, with little payoff. While we want to support your use case, and we recognize it is important to you, it currently falls in the small set of outliers that aren't worth investing in as a project.

