Hide Forgot
Description of problem: In PROD, oo-admin-clear-pending-ops runs for 78m34.811s, cleans up a bunch of stuff, but then dies with this error: /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/node.rb:210:in `get_more': The operation: "GET MORE" (Moped::Errors::CursorNotFound) failed with error "cursor 8769743155734407903 not found" from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/cursor.rb:44:in `get_more' from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/cursor.rb:29:in `each' from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/query.rb:77:in `each' from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/query.rb:77:in `each' from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual/mongo.rb:132:in `block in each' from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual/mongo.rb:556:in `selecting' from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual/mongo.rb:131:in `each' from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual.rb:18:in `each' from /usr/sbin/oo-admin-clear-pending-ops:177:in `<main>' This MUST be fixed before we can go to PROD with 2.0.32 because the mongo migrations require clear pending ops to be Version-Release number of selected component (if applicable): openshift-origin-broker-util-1.12.10-1.el6oso.noarch How reproducible: unknown, found in PROD Steps to Reproduce: 1. Unknown, but I suspect you just have to create a situation where oo-admin-clear-pending-ops holds the cursor open for a long time Actual results: The error above Expected results: No error (should continue to clean up operations)
If it is a cursor timeout issue, some more runs of the script will take care of that. Sten helping with trying that out. Second suspect would be a particular application that gets stuck and comes back after a long time (e.g. SIGABRT by mcoll) by when a cursor timeout follows.
Repeated runs of the script ended up clearing the backlog and successive runs are successful. Pushing this out based on this.
We're still getting this error all the time in PROD. I added a 'time' call to this so we can see how long it's running before we get the error. As you can see from below, it only ran for 11 minutes before we got this error, so it's not the cursor timeout stuff. I think this should be a higher severity as we're not getting good runs of clear-pending-ops in PROD. /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/node.rb:210:in `get_more': The operation: "GET MORE" (Moped::Errors::CursorNotFound) failed with error "cursor 8134835790363728193 not found" from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/cursor.rb:44:in `get_more' from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/cursor.rb:29:in `each' from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/query.rb:77:in `each' from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/query.rb:77:in `each' from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual/mongo.rb:132:in `block in each' from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual/mongo.rb:556:in `selecting' from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual/mongo.rb:131:in `each' from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual.rb:18:in `each' from /usr/sbin/oo-admin-clear-pending-ops:177:in `<main>' real 11m39.526s user 0m33.960s sys 0m13.129s
Is this still an issue?
clear pending ops has been running really well in PROD recently. I haven't seen this error in over a month now, so it's likely gone.