Bug 999557

Summary: oo-admin-clear-pending-ops dies in PROD with Moped::Errors::CursorNotFound
Product: OpenShift Online Reporter: Thomas Wiest <twiest>
Component: PodAssignee: Abhishek Gupta <abhgupta>
Status: CLOSED CURRENTRELEASE QA Contact: libra bugs <libra-bugs>
Severity: low Docs Contact:
Priority: unspecified    
Version: 2.xCC: bhatiam, dmcphers, rchopra, sten
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-11-11 18:06:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Thomas Wiest 2013-08-21 14:41:39 UTC
Description of problem:
In PROD, oo-admin-clear-pending-ops runs for 78m34.811s, cleans up a bunch of stuff, but then dies with this error:

/opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/node.rb:210:in `get_more': The operation: "GET MORE" (Moped::Errors::CursorNotFound)
failed with error "cursor 8769743155734407903 not found"
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/cursor.rb:44:in `get_more'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/cursor.rb:29:in `each'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/query.rb:77:in `each'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/query.rb:77:in `each'
	from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual/mongo.rb:132:in `block in each'
	from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual/mongo.rb:556:in `selecting'
	from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual/mongo.rb:131:in `each'
	from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual.rb:18:in `each'
	from /usr/sbin/oo-admin-clear-pending-ops:177:in `<main>'

This MUST be fixed before we can go to PROD with 2.0.32 because the mongo migrations require clear pending ops to be 

Version-Release number of selected component (if applicable):
openshift-origin-broker-util-1.12.10-1.el6oso.noarch


How reproducible:
unknown, found in PROD


Steps to Reproduce:
1. Unknown, but I suspect you just have to create a situation where oo-admin-clear-pending-ops holds the cursor open for a long time


Actual results:
The error above


Expected results:
No error (should continue to clean up operations)

Comment 1 Rajat Chopra 2013-08-21 16:33:39 UTC
If it is a cursor timeout issue, some more runs of the script will take care of that. Sten helping with trying that out.

Second suspect would be a particular application that gets stuck and comes back after a long time (e.g. SIGABRT by mcoll) by when a cursor timeout follows.

Comment 2 Abhishek Gupta 2013-08-21 22:00:51 UTC
Repeated runs of the script ended up clearing the backlog and successive runs are successful. Pushing this out based on this.

Comment 3 Thomas Wiest 2013-09-13 14:14:33 UTC
We're still getting this error all the time in PROD.

I added a 'time' call to this so we can see how long it's running before we get the error. As you can see from below, it only ran for 11 minutes before we got this error, so it's not the cursor timeout stuff.

I think this should be a higher severity as we're not getting good runs of clear-pending-ops in PROD.


/opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/node.rb:210:in `get_more': The operation: "GET MORE" (Moped::Errors::CursorNotFound)
failed with error "cursor 8134835790363728193 not found"
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/cursor.rb:44:in `get_more'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/cursor.rb:29:in `each'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/query.rb:77:in `each'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/query.rb:77:in `each'
	from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual/mongo.rb:132:in `block in each'
	from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual/mongo.rb:556:in `selecting'
	from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual/mongo.rb:131:in `each'
	from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual.rb:18:in `each'
	from /usr/sbin/oo-admin-clear-pending-ops:177:in `<main>'

real	11m39.526s
user	0m33.960s
sys	0m13.129s

Comment 4 Dan McPherson 2013-11-11 16:39:32 UTC
Is this still an issue?

Comment 5 Thomas Wiest 2013-11-11 16:57:46 UTC
clear pending ops has been running really well in PROD recently.

I haven't seen this error in over a month now, so it's likely gone.