Bug 960148 - [oo-admin-move] Gear failed move because of a mongo timeout
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Pod
Version: 2.x
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Abhishek Gupta
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-05-06 15:01 UTC by Kenny Woodson
Modified: 2015-05-15 00:16 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-05-20 18:05:21 UTC
Target Upstream Version:
Embargoed:


Description Kenny Woodson 2013-05-06 15:01:04 UTC
Description of problem:

A gear failed to move because of a MongoDB timeout. This left the gear in a bad state, because DNS still pointed at the new server after the move had been rolled back.

Fri May  3 15:30:41 EDT 2013
URL: http://project-liveC91f6895f9ef.rhcloud.com
Login: liveC91f6895f9ef
App UUID: 515792ec4382ec6f120000f3
Gear UUID: 515792ec4382ec6f120000f3
DEBUG: Source district uuid: cc37a161477b4ca2a68b331ac138c4ba
DEBUG: Destination district uuid: cc37a161477b4ca2a68b331ac138c4ba
DEBUG: Getting existing app 'project' status before moving
DEBUG: Gear component 'diy-0.1' was stopped
DEBUG: Creating new account for gear 'project' on ex-c9-node24.prod.rhcloud.com
DEBUG: Moving content for app 'project', gear 'project' to ex-c9-node24.prod.rhcloud.com
Identity added: /var/www/openshift/broker/config/keys/rsync_id_rsa (/var/www/openshift/broker/config/keys/rsync_id_rsa)
Write failed: Broken pipe
Agent pid 14357
unset SSH_AUTH_SOCK;
unset SSH_AGENT_PID;
echo Agent pid 14357 killed;
DEBUG: Moving system components for app 'project', gear 'project' to ex-c9-node24.prod.rhcloud.com
Identity added: /var/www/openshift/broker/config/keys/rsync_id_rsa (/var/www/openshift/broker/config/keys/rsync_id_rsa)
Agent pid 30361
unset SSH_AUTH_SOCK;
unset SSH_AGENT_PID;
echo Agent pid 30361 killed;
DEBUG: Fixing DNS and mongo for gear 'project' after move
DEBUG: Changing server identity of 'project' from 'ex-c9-node23.prod.rhcloud.com' to 'ex-c9-node24.prod.rhcloud.com'
DEBUG: Moving failed.  Rolling back gear 'project' 'project' with remove-httpd-proxy on 'ex-c9-node24.prod.rhcloud.com'
DEBUG: Moving failed.  Rolling back gear 'project' in 'project' with destroy on 'ex-c9-node24.prod.rhcloud.com'
/opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/sockets/connectable.rb:45:in `read': Connection timed out (Errno::ETIMEDOUT)
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/sockets/connectable.rb:45:in `block in read'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/sockets/connectable.rb:78:in `handle_socket_errors'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/sockets/connectable.rb:45:in `read'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/connection.rb:177:in `read_data'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/connection.rb:99:in `block in read'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/connection.rb:202:in `with_connection'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/connection.rb:97:in `read'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/protocol/query.rb:148:in `receive_replies'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/connection.rb:135:in `block in receive_replies'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/connection.rb:134:in `map'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/connection.rb:134:in `receive_replies'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/node.rb:561:in `block (2 levels) in flush'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/node.rb:129:in `ensure_connected'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/node.rb:559:in `block in flush'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/node.rb:574:in `logging'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/node.rb:558:in `flush'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/node.rb:547:in `process'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/node.rb:71:in `command'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/node.rb:400:in `refresh'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/cluster.rb:168:in `block in refresh'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/cluster.rb:181:in `each'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/cluster.rb:181:in `refresh'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/cluster.rb:134:in `nodes'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/cluster.rb:202:in `with_primary'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/session/context.rb:108:in `with_node'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/session/context.rb:50:in `command'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/database.rb:76:in `command'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/session.rb:78:in `command'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/query.rb:239:in `block in modify'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/session.rb:312:in `with'
	from /opt/rh/ruby193/root/usr/share/gems/gems/moped-1.3.2/lib/moped/query.rb:238:in `modify'
	from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual/find_and_modify.rb:44:in `result'
	from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual/mongo.rb:185:in `find_and_modify'
	from /opt/rh/ruby193/root/usr/share/gems/gems/mongoid-3.0.21/lib/mongoid/contextual.rb:18:in `find_and_modify'
	from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.7.11/app/models/lock.rb:127:in `unlock_application'
	from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.7.11/app/models/application.rb:1148:in `ensure in run_in_application_lock'
	from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-controller-1.7.11/app/models/application.rb:1148:in `run_in_application_lock'
	from /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-msg-broker-mcollective-1.7.5/lib/openshift/mcollective_application_container_proxy.rb:1696:in `move_gear_secure'
	from /usr/sbin/oo-admin-move:112:in `<main>'


Version-Release number of selected component (if applicable):

Current.

How reproducible:

Getting the exact timing right would be difficult, but I believe it is reproducible.

Steps to Reproduce:
1.  Create an application.
2.  Move the application.
3.  During the move, stop the mongo database.
4.  Check whether the application cleans itself up properly or flags itself as being in a bad state.
  
Actual results:

The application is left in a bad state and cannot recover without manual intervention.

Expected results:

The move should retry, sleep, or use some sort of backoff algorithm so that the application is left in a working state.
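
To illustrate the kind of retry meant here, below is a minimal sketch of an exponential backoff wrapper. It is not taken from the broker code: `with_backoff`, its option names, and the choice of which exceptions to retry are all assumptions made for the example. The usage comment at the end assumes the unlock call seen in the stack trace could be wrapped this way, which has not been verified.

```ruby
# Hypothetical helper, not part of openshift-origin-controller.
# Runs the block and, if it raises one of the listed transient errors,
# sleeps and tries again, doubling the delay after each failed attempt.
# Uses an options hash rather than keyword arguments so it also runs on
# Ruby 1.9.3, the version shown in the stack trace.
def with_backoff(opts = {})
  attempts   = opts.fetch(:attempts, 4)       # total tries, including the first
  base_delay = opts.fetch(:base_delay, 0.5)   # seconds before the first retry
  retry_on   = opts.fetch(:retry_on, [Errno::ETIMEDOUT, Errno::ECONNREFUSED])
  sleeper    = opts.fetch(:sleeper, Kernel.method(:sleep))

  tries = 0
  begin
    tries += 1
    yield tries
  rescue *retry_on
    # Give up and re-raise the original error once the budget is spent.
    raise if tries >= attempts
    sleeper.call(base_delay * (2**(tries - 1)))
    retry
  end
end

# Possible use (assumed call shape, for illustration only):
#   with_backoff(:attempts => 5) { Lock.unlock_application(app) }
```

A wrapper like this only helps with short outages. If mongo stays down for longer than the retry budget, the error is still raised and the gear would still need to be flagged or fixed by hand.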

Additional info:

Rerunning the move fixed this application immediately.

Comment 1 Abhishek Gupta 2013-05-06 19:06:01 UTC
Lowering severity since re-running oo-admin-move fixed the issue.

Comment 2 Abhishek Gupta 2013-05-15 19:05:02 UTC
This happened because the connection to mongo failed. Not sure what can be done in such cases. Marking this as not-reproducible for now. If this happens more often, will dig deeper at that time.

Comment 3 zhaozhanqi 2013-05-17 11:28:35 UTC
If MongoDB is shut down during the move, this issue can be reproduced.
This situation is very unlikely to occur. Could a developer please check whether this issue needs to be fixed or should be closed? Thanks.

Comment 4 Abhishek Gupta 2013-05-20 18:05:21 UTC
I don't think there is anything that can be done about this outside of manual intervention. The gear move does throw an error and Admin/Ops will be required to investigate and fix.

