Bug 1410689

Summary: Overflow sort stage buffered data usage during pulp mongo db migrate
Product: Red Hat Satellite
Reporter: jnikolak
Component: Upgrades
Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED NOTABUG
QA Contact: Katello QA List <katello-qa-list>
Severity: medium
Priority: medium
Version: Unspecified
CC: bbuckingham, dkliban, inecas, jcallaha, jnikolak, mbacovsk, mhrivnak
Target Milestone: Unspecified
Keywords: Reopened, Triaged
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-02-08 00:41:23 UTC
Type: Bug
Bug Depends On:    
Bug Blocks: 1410795    

Description jnikolak 2017-01-06 05:43:29 UTC
Issue when upgrading from 6.1.11 to 6.2.5 or 6.2.6.

The upgrade fails with the following message:

Applying migration pulp.server.db.migrations.0016_remove_repo_content_unit_owner_type_and_id failed.

Halting migrations due to a migration failure.
database error: Runner error: Overflow sort stage buffered data usage of 33554584 bytes exceeds internal limit of 33554432 bytes
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/pulp/server/db/manage.py", line 193, in main
    return _auto_manage_db(options)
  File "/usr/lib/python2.7/site-packages/pulp/server/db/manage.py", line 256, in _auto_manage_db


Upon inspecting
--> pulp.server.db.migrations.0016_remove_repo_content_unit_owner_type_and_id failed

This points to the repo_content_units collection in the mongo/pulp database:

db.repo_content_units.stats()
{
	"ns" : "pulp_database.repo_content_units",
	"count" : 1227598,
	"size" : 491247552,
	"avgObjSize" : 400,
	"storageSize" : 1164914688,
	"numExtents" : 18,
	"nindexes" : 0,
	"lastExtentSize" : 307515392,
	"paddingFactor" : 1,
	"systemFlags" : 0,
	"userFlags" : 0,
	"totalIndexSize" : 0,
	"indexSizes" : {
		
	},
	"ok" : 1
}

> db.repo_content_units.storageSize()
1164914688
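
For completeness, the indexes on the collection can be listed directly in the mongo shell (nindexes above shows 0, which would leave a large sort with no index to use):

> db.repo_content_units.getIndexes()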

I also noticed this error in the db:

	"errmsg" : "BSONObj size: 50615673 (0x3045579) is invalid. Size must be between 0 and 16793600(16MB) First element: _?7\u0007W��V�G\r)x\u0003\u000f�\u0006updated: ObjectId('150000003230313a2d30383a')",
	"code" : 10334
}
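
If useful, a full collection validation should surface documents with an invalid BSON size like the one above (assuming repo_content_units is the affected collection):

> db.repo_content_units.validate(true)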

Comment 1 Dennis Kliban 2017-01-06 14:15:21 UTC
I have not seen such a failure before. Could you please provide a mongo dump?

Comment 3 Michael Hrivnak 2017-01-06 15:15:06 UTC
Hopefully there is more to the traceback also. If you can paste in a full python traceback, that would be helpful.

Comment 7 jnikolak 2017-01-08 23:57:58 UTC
I have provided the foreman-debug and the output of the db migrate that appears to be failing. Please let me know if this is enough information.

Comment 8 Ivan Necas 2017-01-09 07:08:03 UTC
I believe we also need the mongo dump generated by:

  mongodump --host localhost --out ~/mongo_dump

in order to be able to fully tell what the issue was.

Comment 9 Dennis Kliban 2017-01-11 15:51:13 UTC
I restored the mongo dump and ran the migrations on my development environment. Migration 16 completed successfully. I could not reproduce the problem.
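
For reference, these are roughly the commands involved (a sketch; the dump path and a standard Pulp 2 layout are assumptions):

  mongorestore --host localhost --drop ~/mongo_dump
  sudo -u apache pulp-manage-db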

I looked up the error and it seems like it usually occurs when trying to sort a large number of documents in memory. The solution is to use an index. The failing migration in the bug report creates an index with all the needed fields before performing the rest of the operations. 
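
To illustrate the index point, a compound index covering the sorted fields lets mongo avoid the in-memory sort; a minimal sketch (the field list here is an assumption, not copied from the migration code):

> db.repo_content_units.ensureIndex({ repo_id: 1, unit_type_id: 1, unit_id: 1 })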

The customer should try upgrading again.

Comment 10 jnikolak 2017-01-11 22:56:47 UTC
Hello, 

Can you provide the exact steps that you executed where you could not reproduce the problem?

I just need a set of steps that I can send over to the customer. The customer has already attempted the upgrade twice and hit errors both times. Are you suggesting they re-run the upgrade?

Comment 11 Dennis Kliban 2017-01-12 14:04:46 UTC
I restored the mongo dump that was uploaded. 
I ran migrations against the restored database.
Migrations finished without a problem.

The user needs to try upgrading again and capture Pulp logs during the upgrade. On RHEL 7, the logs for Pulp are stored in the journal and read with journalctl. The following command outputs the logs to the screen as they are generated:

sudo journalctl -f -l SYSLOG_IDENTIFIER=pulp | grep -v worker[\-,\.]heartbeat
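
Piping the same output through tee (the file path below is just an example) makes it easy to attach the captured logs afterwards:

sudo journalctl -f -l SYSLOG_IDENTIFIER=pulp | grep -v worker[\-,\.]heartbeat | tee /tmp/pulp-upgrade.log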

I cannot provide any more information without a full traceback that is printed when the failure occurs.

Comment 12 jnikolak 2017-02-08 00:41:23 UTC
Hello, thanks for your help. This bug can be closed.

The database is clean now after your assistance.