Bug 1007085 - oo-admin-move does not fail and rollback on rsync errors
Summary: oo-admin-move does not fail and rollback on rsync errors
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Pod
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Dan McPherson
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-09-11 21:36 UTC by Matt Woodson
Modified: 2015-05-15 00:20 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-09-19 16:50:59 UTC
Target Upstream Version:



Description Matt Woodson 2013-09-11 21:36:31 UTC
Description of problem:

During oo-admin-move operations we are seeing rsync connection errors. The script does NOT catch these errors and continues with the move, so apps are not being moved correctly or completely.

This is impacting live customer data.

Version-Release number of selected component (if applicable):

openshift-origin-broker-util-1.13.11-1.el6oso.noarch


How reproducible:

Not Sure

Steps to Reproduce:
1. Not sure


Additional info:

I have seen cases where an rsync connection error causes a rollback, but there are many cases where it does not.

Here are examples of the move logs. Note the "rsync error" lines, after which the script continues processing and reports a successful move.

==============================================================================
Tue Sep 10 20:35:48 EDT 2013
URL: XXXXXX
Login: XXXXX
App UUID: 511a77bff2cb835c28001123
Gear UUID: 511a77bff2cb835c28001123
DEBUG: Source district uuid: d5cdcbf8c1af482594451573783958f5
DEBUG: Destination district uuid: 522e193ae0b8cd6380000001
DEBUG: Getting existing app 'sekharapps' status before moving
DEBUG: Gear component 'php-5.3' was stopped
DEBUG: Reserved uid '2701' on district: '522e193ae0b8cd6380000001'
DEBUG: Creating new account for gear 'XXXXXXXX' on ex-std-node256.prod.rhcloud.com
DEBUG: Moving content for app 'XXXXXXXX', gear 'XXXXXXXX' to ex-std-node256.prod.rhcloud.com
Identity added: /var/www/openshift/broker/config/keys/rsync_id_rsa (/var/www/openshift/broker/config/keys/rsync_id_rsa)
ssh: connect to host 10.77.1.19 port 22: Connection timed out
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
Agent pid 23646
unset SSH_AUTH_SOCK;
unset SSH_AGENT_PID;
echo Agent pid 23646 killed;
DEBUG: Moving system components for app 'XXXXXXXX', gear 'XXXXXXXX' to ex-std-node256.prod.rhcloud.com
Identity added: /var/www/openshift/broker/config/keys/rsync_id_rsa (/var/www/openshift/broker/config/keys/rsync_id_rsa)
Agent pid 24733
unset SSH_AUTH_SOCK;
unset SSH_AGENT_PID;
echo Agent pid 24733 killed;
DEBUG: Fixing DNS and mongo for gear 'XXXXX' after move
DEBUG: Changing server identity of 'XXXXXX' from 'ex-std-node20.prod.rhcloud.com' to 'ex-std-node256.prod.rhcloud.com'
DEBUG: Deconfiguring old app 'XXXXXX' on ex-std-node20.prod.rhcloud.com after move
Successfully moved gear with uuid 'ed2039e4b1024195a3b3e44f7b362016' of app 'XXXXX' from 'ex-std-node20.prod.rhcloud.com' to 'ex-std-node256.prod.rhcloud.com'
==============================================================================

Tue Sep 10 19:51:59 EDT 2013
URL: http://XXXXXX.rhcloud.com
Login: XXXXXX
App UUID: 511a7957f2cb831848004c80
Gear UUID: 511a7957f2cb831848004c80
DEBUG: Source district uuid: d5cdcbf8c1af482594451573783958f5
DEBUG: Destination district uuid: 522e193ae0b8cd6380000001
DEBUG: Getting existing app 'XXXXXX' status before moving
DEBUG: Gear component 'jbossas-7' was stopped
DEBUG: Reserved uid '6285' on district: '522e193ae0b8cd6380000001'
DEBUG: Creating new account for gear 'XXXXXX' on ex-std-node256.prod.rhcloud.com
DEBUG: Moving content for app 'XXXXXX', gear 'XXXXXX' to ex-std-node256.prod.rhcloud.com
Identity added: /var/www/openshift/broker/config/keys/rsync_id_rsa (/var/www/openshift/broker/config/keys/rsync_id_rsa)
ssh: connect to host 10.77.1.19 port 22: Connection timed out
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
Agent pid 27318
unset SSH_AUTH_SOCK;
unset SSH_AGENT_PID;
echo Agent pid 27318 killed;
DEBUG: Moving system components for app 'XXXXXX', gear 'XXXXXX' to ex-std-node256.prod.rhcloud.com
Identity added: /var/www/openshift/broker/config/keys/rsync_id_rsa (/var/www/openshift/broker/config/keys/rsync_id_rsa)
Agent pid 28597
unset SSH_AUTH_SOCK;
unset SSH_AGENT_PID;
echo Agent pid 28597 killed;
DEBUG: Fixing DNS and mongo for gear 'XXXXXX' after move
DEBUG: Changing server identity of 'XXXXXX' from 'ex-std-node20.prod.rhcloud.com' to 'ex-std-node256.prod.rhcloud.com'
DEBUG: Deconfiguring old app 'XXXXXX' on ex-std-node20.prod.rhcloud.com after move
Successfully moved gear with uuid 'bf4c984494454e53bef5783e52075579' of app 'XXXXXX' from 'ex-std-node20.prod.rhcloud.com' to 'ex-std-node256.prod.rhcloud.com'
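
The pattern in the logs above (an "rsync error:" line followed later by "Successfully moved gear") can be searched for mechanically to find gears that may have been moved incompletely. The following is a minimal Python sketch, not part of oo-admin-move. It assumes each move entry begins with a "URL:" line as in the excerpts above, and the name find_suspect_moves is hypothetical.

```python
import re

# Matches the final success line printed by the move and captures the gear UUID.
SUCCESS_RE = re.compile(r"^Successfully moved gear with uuid '([0-9a-f]+)'")


def find_suspect_moves(log_text):
    """Return gear UUIDs whose move logged an rsync error but still reported success."""
    suspects = []
    saw_rsync_error = False
    for line in log_text.splitlines():
        if line.startswith("URL:"):
            # A new move entry starts here; forget errors from earlier entries.
            saw_rsync_error = False
        elif line.startswith("rsync error:"):
            saw_rsync_error = True
        else:
            match = SUCCESS_RE.match(line)
            if match:
                if saw_rsync_error:
                    suspects.append(match.group(1))
                saw_rsync_error = False
    return suspects


# Abbreviated copy of the first log entry above.
sample = """\
URL: XXXXXX
DEBUG: Moving content for app 'XXXXXXXX', gear 'XXXXXXXX' to ex-std-node256.prod.rhcloud.com
ssh: connect to host 10.77.1.19 port 22: Connection timed out
rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
Successfully moved gear with uuid 'ed2039e4b1024195a3b3e44f7b362016' of app 'XXXXX' from 'ex-std-node20.prod.rhcloud.com' to 'ex-std-node256.prod.rhcloud.com'
"""
print(find_suspect_moves(sample))
```

Any UUID the sketch reports still needs a manual check on the source and destination nodes; a clean log does not prove the copy was complete.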

Comment 1 Dan McPherson 2013-09-12 13:41:19 UTC
https://github.com/openshift/origin-server/pull/3626

Comment 3 zhaozhanqi 2013-09-13 08:37:15 UTC
Tested this bug on devenv_3780; it has been fixed.

1) Remove the directory on the destination node
2) Move the gear

[root@ip-10-154-184-93 lib]# oo-admin-move --gear_uuid 523286b7bef23b6cea000007 -i ip-10-184-6-242
URL: http://zqphp-zqd.dev.rhcloud.com
Login: zzhao
App UUID: 523286b7bef23b6cea000007
Gear UUID: 523286b7bef23b6cea000007
DEBUG: Source district uuid: c0a525681c2411e3aad322000a9ab85d
DEBUG: Destination district uuid: NONE
DEBUG: Getting existing app 'zqphp' status before moving
DEBUG: Gear component 'php-5.3' was running
DEBUG: Stopping existing app cartridge 'php-5.3' before moving
DEBUG: Force stopping existing app cartridge 'php-5.3' before moving
DEBUG: Reserved uid '' on district: 'NONE'
DEBUG: Creating new account for gear 'zqphp' on ip-10-184-6-242
DEBUG: Moving failed.  Rolling back gear 'zqphp' in 'zqphp' with delete on 'ip-10-184-6-242'
Node execution failure (invalid exit code from node).

The move quits immediately on the rsync error and never reaches the log line "DEBUG: Moving system components for app".
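
The verified behaviour (abort on the first copy error, then roll back the destination) can be sketched in general terms as follows. This is an illustrative Python sketch only, not the code from the pull request in Comment 1. The names copy_or_rollback and MoveError are hypothetical, and a child process that exits with status 255 stands in for the failing rsync so the example runs without a remote host.

```python
import subprocess
import sys


class MoveError(Exception):
    """Raised when a copy step fails and the move has been rolled back."""


def copy_or_rollback(copy_cmd, rollback):
    """Run copy_cmd; on a non-zero exit status call rollback() and raise MoveError."""
    result = subprocess.run(copy_cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Undo the partially created destination before reporting the failure,
        # so later steps (DNS update, removing the source) never run.
        rollback()
        raise MoveError(
            "copy failed with exit code %d: %s"
            % (result.returncode, result.stderr.strip())
        )
    return result


# Stand-in for a failing rsync: a child process that exits with status 255.
failing_copy = [sys.executable, "-c", "import sys; sys.exit(255)"]
rolled_back = []

try:
    copy_or_rollback(failing_copy, lambda: rolled_back.append("destination gear"))
except MoveError as err:
    print("move aborted:", err)
```

The key design point is that the exit status is checked before any later step runs, rather than relying on error text appearing in the output.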

