Description of problem:
During oo-admin-move operations we are seeing rsync connection issues. The script does NOT catch these errors and continues with the move. As a result, apps are not moved correctly or completely. This is impacting live customer data.

Version-Release number of selected component (if applicable):
openshift-origin-broker-util-1.13.11-1.el6oso.noarch

How reproducible:
Not Sure

Steps to Reproduce:
1. Not Sure
2.
3.

Additional info:
I have seen examples where the rsync connection error causes a rollback, but there are many cases where it does not. Here are examples from the move logs. Notice the "rsync error" lines and that the script continues processing.

==============================================================================
Tue Sep 10 20:35:48 EDT 2013
URL: XXXXXX
Login: XXXXX
App UUID: 511a77bff2cb835c28001123
Gear UUID: 511a77bff2cb835c28001123
DEBUG: Source district uuid: d5cdcbf8c1af482594451573783958f5
DEBUG: Destination district uuid: 522e193ae0b8cd6380000001
DEBUG: Getting existing app 'sekharapps' status before moving
DEBUG: Gear component 'php-5.3' was stopped
DEBUG: Reserved uid '2701' on district: '522e193ae0b8cd6380000001'
DEBUG: Creating new account for gear 'XXXXXXXX' on ex-std-node256.prod.rhcloud.com
DEBUG: Moving content for app 'XXXXXXXX', gear 'XXXXXXXX' to ex-std-node256.prod.rhcloud.com
Identity added: /var/www/openshift/broker/config/keys/rsync_id_rsa (/var/www/openshift/broker/config/keys/rsync_id_rsa)
ssh: connect to host 10.77.1.19 port 22: Connection timed out
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
Agent pid 23646
unset SSH_AUTH_SOCK; unset SSH_AGENT_PID; echo Agent pid 23646 killed;
DEBUG: Moving system components for app 'XXXXXXXX', gear 'XXXXXXXX' to ex-std-node256.prod.rhcloud.com
Identity added: /var/www/openshift/broker/config/keys/rsync_id_rsa (/var/www/openshift/broker/config/keys/rsync_id_rsa)
Agent pid 24733
unset SSH_AUTH_SOCK; unset SSH_AGENT_PID; echo Agent pid 24733 killed;
DEBUG: Fixing DNS and mongo for gear 'XXXXX' after move
DEBUG: Changing server identity of 'XXXXXX' from 'ex-std-node20.prod.rhcloud.com' to 'ex-std-node256.prod.rhcloud.com'
DEBUG: Deconfiguring old app 'XXXXXX' on ex-std-node20.prod.rhcloud.com after move
Successfully moved gear with uuid 'ed2039e4b1024195a3b3e44f7b362016' of app 'XXXXX' from 'ex-std-node20.prod.rhcloud.com' to 'ex-std-node256.prod.rhcloud.com'
==============================================================================
Tue Sep 10 19:51:59 EDT 2013
URL: http://XXXXXX.rhcloud.com
Login: XXXXXX
App UUID: 511a7957f2cb831848004c80
Gear UUID: 511a7957f2cb831848004c80
DEBUG: Source district uuid: d5cdcbf8c1af482594451573783958f5
DEBUG: Destination district uuid: 522e193ae0b8cd6380000001
DEBUG: Getting existing app 'XXXXXX' status before moving
DEBUG: Gear component 'jbossas-7' was stopped
DEBUG: Reserved uid '6285' on district: '522e193ae0b8cd6380000001'
DEBUG: Creating new account for gear 'XXXXXX' on ex-std-node256.prod.rhcloud.com
DEBUG: Moving content for app 'XXXXXX', gear 'XXXXXX' to ex-std-node256.prod.rhcloud.com
Identity added: /var/www/openshift/broker/config/keys/rsync_id_rsa (/var/www/openshift/broker/config/keys/rsync_id_rsa)
ssh: connect to host 10.77.1.19 port 22: Connection timed out
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
Agent pid 27318
unset SSH_AUTH_SOCK; unset SSH_AGENT_PID; echo Agent pid 27318 killed;
DEBUG: Moving system components for app 'XXXXXX', gear 'XXXXXX' to ex-std-node256.prod.rhcloud.com
Identity added: /var/www/openshift/broker/config/keys/rsync_id_rsa (/var/www/openshift/broker/config/keys/rsync_id_rsa)
Agent pid 28597
unset SSH_AUTH_SOCK; unset SSH_AGENT_PID; echo Agent pid 28597 killed;
DEBUG: Fixing DNS and mongo for gear 'XXXXXX' after move
DEBUG: Changing server identity of 'XXXXXX' from 'ex-std-node20.prod.rhcloud.com' to 'ex-std-node256.prod.rhcloud.com'
DEBUG: Deconfiguring old app 'XXXXXX' on ex-std-node20.prod.rhcloud.com after move
Successfully moved gear with uuid 'bf4c984494454e53bef5783e52075579' of app 'XXXXXX' from 'ex-std-node20.prod.rhcloud.com' to 'ex-std-node256.prod.rhcloud.com'
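Because these moves report success despite the rsync failure, it may help to scan existing move logs for entries that contain both an "rsync error:" line and a "Successfully moved gear" line. The following is only a rough sketch, assuming each move is logged as one block delimited by a line of "=" characters, as in the excerpts above. The function name and the sample text (including the second, clean entry with a placeholder uuid) are hypothetical and not part of oo-admin-move.

```python
import re

# Assumption: every move entry starts with a line made up only of '=' characters.
SEPARATOR = re.compile(r"^=+[ \t]*$", re.MULTILINE)


def find_suspect_moves(log_text):
    """Return log entries that contain an rsync error yet still report success."""
    suspects = []
    for entry in SEPARATOR.split(log_text):
        if "rsync error:" in entry and "Successfully moved gear" in entry:
            suspects.append(entry.strip())
    return suspects


# Hypothetical sample: the first entry is abridged from the log above,
# the second is an invented clean move used only for contrast.
sample = """\
==============================
Tue Sep 10 20:35:48 EDT 2013
rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
Successfully moved gear with uuid 'ed2039e4b1024195a3b3e44f7b362016'
==============================
Tue Sep 10 21:00:00 EDT 2013
Successfully moved gear with uuid 'placeholder-clean-move'
"""

if __name__ == "__main__":
    for entry in find_suspect_moves(sample):
        print(entry.splitlines()[0])  # timestamp line of each suspect move
```

Any entry this flags would still need to be checked by hand on the source and destination nodes; the scan only narrows down which moves to look at.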
https://github.com/openshift/origin-server/pull/3626
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/3532e6b8b5b78af582e5046ba14711301188b532
Bug 1007085
Tested this bug on devenv_3780; it has been fixed.

1) Remove the directory on the destination node
2) Move the gear

[root@ip-10-154-184-93 lib]# oo-admin-move --gear_uuid 523286b7bef23b6cea000007 -i ip-10-184-6-242
URL: http://zqphp-zqd.dev.rhcloud.com
Login: zzhao
App UUID: 523286b7bef23b6cea000007
Gear UUID: 523286b7bef23b6cea000007
DEBUG: Source district uuid: c0a525681c2411e3aad322000a9ab85d
DEBUG: Destination district uuid: NONE
DEBUG: Getting existing app 'zqphp' status before moving
DEBUG: Gear component 'php-5.3' was running
DEBUG: Stopping existing app cartridge 'php-5.3' before moving
DEBUG: Force stopping existing app cartridge 'php-5.3' before moving
DEBUG: Reserved uid '' on district: 'NONE'
DEBUG: Creating new account for gear 'zqphp' on ip-10-184-6-242
DEBUG: Moving failed. Rolling back gear 'zqphp' in 'zqphp' with delete on 'ip-10-184-6-242'
Node execution failure (invalid exit code from node).

The move now quits immediately on the rsync error and no longer shows the log line "DEBUG: Moving system components for app".
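The behaviour verified above (stop at the first failed step and roll back, rather than continuing to the next step) can be illustrated generically. This is not the code from the commit; it is a minimal sketch in Python, assuming each move step is an external command whose non-zero exit status signals failure. The names `MoveError`, `run_step` and `move_gear` are hypothetical.

```python
import subprocess
import sys


class MoveError(Exception):
    """Raised when a move step exits with a non-zero status (hypothetical name)."""


def run_step(description, argv):
    """Run one external command and fail loudly on a non-zero exit status."""
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        raise MoveError(
            f"{description} failed with exit code {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout


def move_gear(steps, rollback):
    """Run steps in order; on the first failure call rollback() and re-raise."""
    completed = []
    try:
        for description, argv in steps:
            run_step(description, argv)
            completed.append(description)
    except MoveError:
        rollback()
        raise
    return completed


if __name__ == "__main__":
    # Stand-in commands: the second one exits with 255, like the rsync error in this bug.
    demo_steps = [
        ("create account", [sys.executable, "-c", "pass"]),
        ("move content", [sys.executable, "-c", "import sys; sys.exit(255)"]),
        ("move system components", [sys.executable, "-c", "pass"]),
    ]
    try:
        move_gear(demo_steps, rollback=lambda: print("rolling back"))
    except MoveError as err:
        print(err)
```

With this shape, a failure in "move content" means "move system components" is never started, which matches the log in the verification run above.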