Bug 999557 - oo-admin-clear-pending-ops dies in PROD with Moped::Errors::CursorNotFound
Summary: oo-admin-clear-pending-ops dies in PROD with Moped::Errors::CursorNotFound
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Pod
Version: 2.x
Hardware: Unspecified
OS: Unspecified
unspecified
low
Target Milestone: ---
: ---
Assignee: Abhishek Gupta
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2013-08-21 14:41 UTC by Thomas Wiest
Modified: 2015-05-15 00:20 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-11-11 18:06:10 UTC
Target Upstream Version:


Attachments (Terms of Use)

Description Thomas Wiest 2013-08-21 14:41:39 UTC
Description of problem:
In PROD, oo-admin-clear-pending-ops runs for 78m34.811s, cleans up a bunch of stuff, but then dies with this error:

/opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/node.rb:210:in `get_more': The operation: "GET MORE" (Moped::Errors::CursorNotFound)
failed with error "cursor 8769743155734407903 not found"
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/cursor.rb:44:in `get_more'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/cursor.rb:29:in `each'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/query.rb:77:in `each'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/query.rb:77:in `each'
	from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual/mongo.rb:132:in `block in each'
	from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual/mongo.rb:556:in `selecting'
	from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual/mongo.rb:131:in `each'
	from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual.rb:18:in `each'
	from /usr/sbin/oo-admin-clear-pending-ops:177:in `<main>'

This MUST be fixed before we can go to PROD with 2.0.32 because the mongo migrations require clear pending ops to be 

Version-Release number of selected component (if applicable):
openshift-origin-broker-util-1.12.10-1.el6oso.noarch


How reproducible:
unknown, found in PROD


Steps to Reproduce:
1. Unknown, but I suspect you just have to create a situation where oo-admin-clear-pending-ops holds the cursor open for a long time


Actual results:
The error above


Expected results:
No error (should continue to clean up operations)

Comment 1 Rajat Chopra 2013-08-21 16:33:39 UTC
If it is a cursor timeout issue, some more runs of the script will take care of that. Sten helping with trying that out.

Second suspect would be a particular application that gets stuck and comes back after a long time (e.g. SIGABRT by mcoll) by when a cursor timeout follows.

Comment 2 Abhishek Gupta 2013-08-21 22:00:51 UTC
Repeated runs of the script ended up clearing the backlog and successive runs are successful. Pushing this out based on this.

Comment 3 Thomas Wiest 2013-09-13 14:14:33 UTC
We're still getting this error all the time in PROD.

I added a 'time' call to this so we can see how long it's running before we get the error. As you can see from below, it only ran for 11 minutes before we got this error, so it's not the cursor timeout stuff.

I think this should be a higher severity as we're not getting good runs of clear-pending-ops in PROD.


/opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/node.rb:210:in `get_more': The operation: "GET MORE" (Moped::Errors::CursorNotFound)
failed with error "cursor 8134835790363728193 not found"
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/cursor.rb:44:in `get_more'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/cursor.rb:29:in `each'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/query.rb:77:in `each'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/query.rb:77:in `each'
	from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual/mongo.rb:132:in `block in each'
	from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual/mongo.rb:556:in `selecting'
	from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual/mongo.rb:131:in `each'
	from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual.rb:18:in `each'
	from /usr/sbin/oo-admin-clear-pending-ops:177:in `<main>'

real	11m39.526s
user	0m33.960s
sys	0m13.129s

Comment 4 Dan McPherson 2013-11-11 16:39:32 UTC
Is this still an issue?

Comment 5 Thomas Wiest 2013-11-11 16:57:46 UTC
clear pending ops has been running really well in PROD recently.

I haven't seen this error in over a month now, so it's likely gone.


Note You need to log in before you can comment on or make changes to this bug.