
Bug 1167950

Summary: Cannot sync puppet-chef rpms from packagecloud.io. BSON document too large (16968021 bytes)
Product: [Retired] Pulp Reporter: Brian Bouterse <bmbouter>
Component: async/tasks Assignee: pulp-bugs
Status: CLOSED WONTFIX QA Contact: pulp-qe-list
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: Master CC: mhrivnak
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-11-25 22:10:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Brian Bouterse 2014-11-25 17:04:13 UTC
Originally reported by Paul Gonin (paul.gonin), but I was also able to reproduce it using:

1. pulp-admin -u admin -p admin rpm repo create --repo-id private-chef --relative-url private-chef --feed https://packagecloud.io/chef/stable/el/6/x86_64/

2. pulp-admin rpm repo sync run --repo-id private-chef

3. Observe the traceback in the log below.


Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808) Content unit association failed [Unit [key={'name': 'private-chef', 'checksum': '2d068c98c36c4eb548773efc616fec5c36bd2bdd', 'epoch': '0', 'version': '11.2.3', 'release': '1.el6', 'arch': 'x86_64', 'checksumtype': 'sha'}] [type=rpm] [id=None]]
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808) Traceback (most recent call last):
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)   File "/home/bmbouter/Documents/pulp/server/pulp/plugins/conduits/mixins.py", line 486, in save_unit
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)     unit.id = self._update_unit(unit, pulp_unit)
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)   File "/home/bmbouter/Documents/pulp/server/pulp/plugins/conduits/mixins.py", line 518, in _update_unit
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)     return self._add_unit(unit, pulp_unit)
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)   File "/home/bmbouter/Documents/pulp/server/pulp/plugins/conduits/mixins.py", line 540, in _add_unit
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)     unit_id = content_manager.add_content_unit(unit.type_id, None, pulp_unit)
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)   File "/home/bmbouter/Documents/pulp/server/pulp/server/managers/content/cud.py", line 35, in add_content_unit
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)     collection.insert(unit_doc, safe=True)
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)   File "/home/bmbouter/Documents/pulp/server/pulp/server/db/connection.py", line 154, in retry
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)     return method(*args, **kwargs)
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)   File "/usr/lib64/python2.7/site-packages/pymongo/collection.py", line 357, in insert
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)     continue_on_error, self.__uuid_subtype), safe)
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)   File "/usr/lib64/python2.7/site-packages/pymongo/mongo_client.py", line 910, in _send_message
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)     (request_id, data) = self.__check_bson_size(message)
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)   File "/usr/lib64/python2.7/site-packages/pymongo/mongo_client.py", line 881, in __check_bson_size
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808)     (max_doc_size, self.__max_bson_size))
Nov 25 11:23:16 dhcp129-138.rdu.redhat.com pulp[9821]: pulp.plugins.conduits.mixins:ERROR: (9821-43808) InvalidDocument: BSON document too large (16967486 bytes) - the connected server supports BSON document sizes up to 16777216 bytes
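
To see how far over the limit the rejected document was, the two sizes can be pulled out of the final InvalidDocument line. The following is only a rough sketch: `parse_bson_overage` and `SIZE_PATTERN` are hypothetical names made up for illustration and are not part of Pulp or pymongo, and the regular expression assumes the message wording shown in the traceback above.

```python
import re

# Final line of the traceback above, with the log's hard line wrap rejoined.
ERROR_LINE = (
    "InvalidDocument: BSON document too large (16967486 bytes) - "
    "the connected server supports BSON document sizes up to 16777216 bytes"
)

# Assumed message format, taken from the log line above.
SIZE_PATTERN = re.compile(
    r"BSON document too large \((\d+) bytes\).*?up to (\d+) bytes"
)


def parse_bson_overage(line):
    """Return (document_size, max_size, overage) in bytes, or None if no match.

    Hypothetical helper for illustration; not a Pulp or pymongo function.
    """
    match = SIZE_PATTERN.search(line)
    if match is None:
        return None
    doc_size = int(match.group(1))
    max_size = int(match.group(2))
    return doc_size, max_size, doc_size - max_size


doc_size, max_size, overage = parse_bson_overage(ERROR_LINE)
print("document: %d bytes, limit: %d bytes, over by: %d bytes"
      % (doc_size, max_size, overage))
```

For the line quoted here, the document exceeds the 16777216-byte (16 MiB) limit by 190270 bytes.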

Expected results: The sync completes successfully with no traceback.

Comment 1 Michael Hrivnak 2014-11-25 22:10:53 UTC
I downloaded private-chef-11.2.2-1.el6.x86_64.rpm from that repo. The RPM alone is 551MB. It contains 79,292 individual files. That is a LOT and is definitely not a normal use case for a single RPM.

The filelist alone is about 7.9MB of XML. Pulp stores the file list (and all other metadata) in two copies: XML and native BSON structures. So that's about 15.8MB of just file listings, which is getting really close to the BSON document max size.
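
The estimate above can be checked with a few lines of arithmetic. This is a rough sketch under stated assumptions: "MB" is taken to mean 10^6 bytes, the BSON copy is assumed to be about the same size as the XML copy (as the comment approximates), and `estimate_filelist_bytes` is a hypothetical helper name. The 16777216-byte limit is the one reported in the traceback.

```python
# Maximum BSON document size reported by the server in the traceback (16 MiB).
BSON_MAX_BYTES = 16777216


def estimate_filelist_bytes(filelist_xml_mb, copies=2):
    """Estimate bytes used by the file list across all stored copies.

    Assumptions (not confirmed by the report): 1 MB = 10**6 bytes, and every
    copy is roughly the same size as the XML representation.
    """
    return int(round(filelist_xml_mb * 1000000 * copies))


estimated = estimate_filelist_bytes(7.9)
headroom = BSON_MAX_BYTES - estimated
share = 100.0 * estimated / BSON_MAX_BYTES

print("file listings: ~%d bytes (%.1f%% of the limit)" % (estimated, share))
print("left for all other metadata: ~%d bytes" % headroom)
```

Under these assumptions the file listings alone take roughly 94% of the limit and leave under 1MB for everything else in the unit document. The document in the traceback was 16967486 bytes, which is larger than this estimate, so the figure should be read as a lower bound rather than an exact accounting.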

Here is mongo's reasoning for their limit: http://docs.mongodb.org/manual/reference/limits/#BSON-Document-Size

For now, the effort it would take for Pulp to accommodate storing such large metadata would be substantial, with little payoff. While we want to support your use case, and we recognize it is important to you, it currently falls in the small set of outliers that aren't worth investing in as a project.