Description of problem:
After 'move cartridge', a snapshot taken with 'snapshot save' can't be restored; the restore fails with 'error: 422 Unprocessable Entity.' It seems a temporary deployment timestamp directory is created during the cartridge move, and 'current' doesn't link to the last deployment. After deleting the last deployment or updating the deployment with 'git push', snapshot save/restore works fine.

Version-Release number of selected component (if applicable):
puddle-2-0-2-2014-01-16

How reproducible:
Always

Steps to Reproduce:
1. Create an app, e.g. php1.
2. Move the cartridge to another node.
3. Don't 'git push' after the move.
4. Check the app-deployments directory.
5. rhc snapshot save php1
6. rhc snapshot restore php1

Actual results:
Step 4: The 'current' deployment is not the last one.

    drwxr-x---. 5 52df298a4945dc0dbd0003ea 52df298a4945dc0dbd0003ea 4.0K Jan 21 21:41 2014-01-21_21-33-18.136
    drwxr-x---. 5 52df298a4945dc0dbd0003ea 52df298a4945dc0dbd0003ea 4.0K Jan 21 21:41 2014-01-21_21-35-34.236
    drwxr-x---. 5 52df298a4945dc0dbd0003ea 52df298a4945dc0dbd0003ea 4.0K Jan 21 21:48 2014-01-21_21-48-32.227
    drwxr-xr-x. 2 52df298a4945dc0dbd0003ea 52df298a4945dc0dbd0003ea 4.0K Jan 21 21:41 by-id
    lrwxrwxrwx. 1 52df298a4945dc0dbd0003ea 52df298a4945dc0dbd0003ea 23 Jan 21 21:41 current -> 2014-01-21_21-35-34.236

Step 6: The snapshot restore failed with the error message below:

    [ose215@dhcp-9-237 ~]$ rhc snapshot restore t1
    Restoring from snapshot t1.tar.gz...
    Removing old git repo: ~/git/t1.git/
    Removing old data dir: ~/app-root/data/*
    Restoring ~/git/t1.git and ~/app-root/data
    httpd: Could not reliably determine the server's fully qualified domain name, using nd216.oseanli.cn for ServerName
    error: 422 Unprocessable Entity. Use --trace to view backtrace
    Error in trying to restore snapshot. You can try to restore manually by running:
    cat 't1.tar.gz' | ssh 52df298a4945dc0dbd0003ea.cn 'restore INCLUDE_GIT

Expected results:
1. The temporary deployment timestamp directory is handled correctly.
2. The snapshot can be saved and restored.
Additional info: The bug can be reproduced in Online.
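The check in step 4 can be automated. Below is a minimal sketch, assuming the app-deployments layout shown in the listing above (timestamp-named directories plus a 'current' symlink). The function name check_current is hypothetical and is not part of any OpenShift tooling.

```python
import os
import re

# Deployment directories in the listing are named like 2014-01-21_21-48-32.227,
# so a plain lexicographic sort also orders them chronologically.
TIMESTAMP_DIR = re.compile(r"^\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2}\.\d+$")


def check_current(deployments_dir):
    """Return (current_target, newest_dir, matches) for an app-deployments dir.

    Hypothetical helper: it only compares the 'current' symlink target with
    the newest timestamp-named directory.
    """
    names = sorted(
        name
        for name in os.listdir(deployments_dir)
        if TIMESTAMP_DIR.match(name)
        and os.path.isdir(os.path.join(deployments_dir, name))
    )
    newest = names[-1] if names else None

    link = os.path.join(deployments_dir, "current")
    target = None
    if os.path.islink(link):
        # The link in the listing is relative; keep only the directory name.
        target = os.path.basename(os.readlink(link).rstrip("/"))

    return target, newest, target is not None and target == newest
```

On the gear from this report the call would return the 21-35-34.236 directory as the target and the 21-48-32.227 directory as the newest, so the third value would be False.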
The bug can't be reproduced in Online devenv-stage_655. It seems another bug fix also affects this problem.
Although the bug can't be reproduced in Online devenv-stage_655, these reproduction steps expose another, low-severity bug in Online: https://bugzilla.redhat.com/show_bug.cgi?id=1057976.
From what I see, the error is occurring during deployment verification. My broken gear has two deployments after following the steps in the description:

    2014-01-28_13-41-48.285
    2014-01-28_13-43-32.100

The first one has the following in the metadata.json:

    {"git_ref":"master","git_sha1":"b15ec05","id":"b70979a6","hot_deploy":null,"force_clean_build":null,"activations":[1390934519.0726473,1390934722.984616],"checksum":"82eb881b7b17e68e57168111d3bb6369ee1a04e3"}

The second has:

    {"git_ref":"master","git_sha1":null,"id":null,"hot_deploy":null,"force_clean_build":null,"activations":[],"checksum":null}

I haven't been able to reproduce this upstream. The interesting thing is that when I try to reproduce this upstream, the second deployment doesn't even have a metadata.json.
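The difference between the two metadata.json files can be checked mechanically. This is a minimal sketch, assuming a "blank" deployment is one whose id and checksum are null and whose activations list is empty, as in the second file above. The name is_blank_deployment is hypothetical, and treating an empty file as blank is an assumption rather than documented node behavior.

```python
import json


def is_blank_deployment(metadata_text):
    """Return True if metadata.json content looks like a never-activated deployment.

    Hypothetical helper based only on the two examples in this report.
    """
    # Assumption: an empty metadata.json counts as blank too.
    if not metadata_text.strip():
        return True

    meta = json.loads(metadata_text)
    return (
        meta.get("id") is None
        and meta.get("checksum") is None
        and not meta.get("activations")
    )
```

Passing the first file's content returns False; passing the second file's content returns True.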
This is a mess. Upstream had a number of changes that masked this problem. I backported the following commits and now it's working like Online/Origin:

commit 2a7ca5491b59bbcbbaa7504cd0c383215b28465a
Author: Paul Morie <pmorie>
Date:   Mon Jan 27 10:26:16 2014 -0500

    Fix bug 1055653 for cases when httpd is down

commit 19e2995306bff7bea037823675f5cf279bafe880
Author: Paul Morie <pmorie>
Date:   Tue Jan 21 16:05:29 2014 -0500

    Fix bug 1055653 and improve post-receive output readability

commit 836bb408aa7fff6a7605fedf55fc0294a771e9b6
Author: Ben Parees <bparees>
Date:   Tue Dec 17 00:11:52 2013 -0500

    Bug 1033523 - The hot_deploy marker/--hot-deploy option can not take effect when deploying app with binary deployment

commit a568bd63147b15e71ba41734421340eeee8b2b99
Author: jhadvig <jhadvig>
Date:   Wed Dec 4 17:11:30 2013 +0100

    Bug 1038129 - Gear is not started after restore when hot_deploy marker is present

commit a96ef04aa5a69db4e3d92c4bd6a6f4324bf50bcc
Author: Jhon Honce <jhonce>
Date:   Thu Jan 16 12:19:29 2014 -0700

    Bug 1054403 - Reset empty metadata.json file

    * Use defaults if file is empty

However, there is still a problem. Even in Online, a blank deployment is being created on application move. When the node tries to register it with the Broker, the result is a 422 Unprocessable Entity. Upstream recently added logic that ignores this error and continues. I suspect the root cause is that this blank deployment should never be created.
You can disregard the commits in Comment #6. Those fixes were cloned as separate bugs. The blank deployment dir bug was solved upstream in the following commits:

commit b0add52171ce19bb66f1f644940656e511355cc8
Author: Brenton Leanhardt <bleanhar>
Date:   Mon Feb 3 13:30:07 2014 -0500

    Insure --with-initial-deployment-dir defaults to true in case the args isn't supplied.

    This is the handle the case of an out of date broker that doesn't pass the argument. We can't have the default behavior change.

commit 5b6c8f1e177d98c0d1c52e6a76d57aaf9d2021b0
Author: Brenton Leanhardt <bleanhar>
Date:   Mon Feb 3 13:05:18 2014 -0500

    --with-initial-deployment-dir only applies to gear creation

commit 9f1e9e744a236befe66ab01bf69c1527c61d5dd0
Author: Brenton Leanhardt <bleanhar>
Date:   Wed Jan 29 16:08:18 2014 -0500

    Fixing libvirt_container to match the new create semantics

commit 26e5ad2e66670ac77d0621975562119911a0a120
Author: Brenton Leanhardt <bleanhar>
Date:   Wed Jan 29 15:22:55 2014 -0500

    Adding a unit test

commit d350fb02e3323fdf10e28db8f5c29dd8b90a6747
Author: Brenton Leanhardt <bleanhar>
Date:   Wed Jan 29 10:17:15 2014 -0500

    First pass at avoiding deployment dir create on app moves
In addition to Comment #7, the following upstream commit was required for the backport:

commit baeec29c1a7db0b07bf77354f5b02e35790f6156
Author: Jhon Honce <jhonce>
Date:   Wed Jan 22 15:08:19 2014 -0700

    Node Platform - Optionally generate application key
https://github.com/openshift/enterprise-server/pull/228
Looks like the "fixed in version" field has a limit:

    rubygem-openshift-origin-container-selinux-0.4.1.2-1.el6op
    rubygem-openshift-origin-msg-broker-mcollective-1.17.6-1.el6op
    openshift-origin-msg-node-mcollective-1.17.6-1.el6op
    rubygem-openshift-origin-node-1.17.5.10-1.el6op
    rubygem-openshift-origin-controller-1.17.12.3-1.el6op
Verified; passes on puddle 2014-02-10. The results are now as follows:

1. The snapshot can be saved and restored after the move:

    [ose215@dhcp-9-237 ~]$ rhc snapshot save sruby1
    Pulling down a snapshot to sruby1.tar.gz...
    Creating and sending tar.gz
    RESULT: Success

    [ose215@dhcp-9-237 ~]$ rhc snapshot restore sruby1
    Restoring from snapshot sruby1.tar.gz...
    Removing old git repo: ~/git/sruby1.git/
    Removing old data dir: ~/app-root/data/*
    Restoring ~/git/sruby1.git and ~/app-root/data
    httpd: Could not reliably determine the server's fully qualified domain name, using nd216.oseanli.cn for ServerName
    Activation status: success
    RESULT: Success

2. No temporary deployment directory after restore:

    [sruby1-hanli1dom.oseanli.cn app-deployments]\> ls -lah
    total 16K
    drwxr-xr-x. 4 52f9ea824945dc480e000187 52f9ea824945dc480e000187 4.0K Feb 11 04:16 .
    drwxr-x---. 14 root 52f9ea824945dc480e000187 4.0K Feb 11 2014 ..
    drwxr-x---. 5 52f9ea824945dc480e000187 52f9ea824945dc480e000187 4.0K Feb 11 04:16 2014-02-11_04-16-58.916
    drwxr-xr-x. 2 52f9ea824945dc480e000187 52f9ea824945dc480e000187 4.0K Feb 11 04:16 by-id
    lrwxrwxrwx. 1 52f9ea824945dc480e000187 52f9ea824945dc480e000187 23 Feb 11 04:16 current -> 2014-02-11_04-16-58.916
Extended test failures highlighted that the gear ssh configs were being created with the wrong permissions after the previous commits. This was causing issues when scaling up (the actual failure occurred when copying the user environment variables to the newly created gear). The following commits are also needed:

commit e4065bf88ed8a8798129f94cd02e36365aa467d4
Author: Jhon Honce <jhonce>
Date:   Thu Jan 23 14:04:57 2014 -0700

    Bug 1049044 - Create more of .openshift_ssh environment

commit 64369335f74aaf4cbdbfb9e163b526e4304079c5
Author: Jhon Honce <jhonce>
Date:   Thu Jan 23 11:04:06 2014 -0700

    Bug 1049044 - Restore setting ssh config settings for gear

    * Setting the file permissions was incorrectly removed
    * https://bugzilla.redhat.com/show_bug.cgi?id=1049044#c8

The following PR has been submitted to fix this: https://github.com/openshift/enterprise-server/pull/229
rubygem-openshift-origin-node-1.17.5.11-1.el6op has been built.
Verified on puddle-2-0-3-2014-02-12 with the following steps:

1. Create a scaled app, sry19.

2. Check .openshift_ssh after the move:

    [sry19-hanli1dom.oseanli.cn .openshift_ssh]\> ls -lah
    total 16K
    drwxr-x---. 2 52fc53684945dcb34300006b 52fc53684945dcb34300006b 4.0K Feb 13 00:06 .
    drwxr-x---. 16 root 52fc53684945dcb34300006b 4.0K Feb 13 00:06 ..
    -rw-rw----. 1 52fc53684945dcb34300006b 52fc53684945dcb34300006b 0 Feb 13 00:06 config
    -rw-------. 1 52fc53684945dcb34300006b 52fc53684945dcb34300006b 1.7K Feb 13 00:06 id_rsa
    -rw-------. 1 52fc53684945dcb34300006b 52fc53684945dcb34300006b 423 Feb 13 00:06 id_rsa.pub
    -rw-rw----. 1 52fc53684945dcb34300006b 52fc53684945dcb34300006b 0 Feb 13 00:06 known_hosts

3. Snapshot save and restore:

    [ose215@dhcp-9-237 ~]$ rhc snapshot save sry19
    Pulling down a snapshot to sry19.tar.gz...
    Creating and sending tar.gz
    RESULT: Success

    [ose215@dhcp-9-237 ~]$ rhc snapshot restore sry19
    Restoring from snapshot sry19.tar.gz...
    Removing old git repo: ~/git/sry19.git/
    Removing old data dir: ~/app-root/data/*
    Restoring ~/git/sry19.git and ~/app-root/data
    httpd: Could not reliably determine the server's fully qualified domain name, using nd217.oseanli.cn for ServerName
    Activation status: success
    RESULT: Success

4. Check .openshift_ssh after the restore:

    [sry19-hanli1dom.oseanli.cn .openshift_ssh]\> ls -lah
    total 16K
    -rw-rw----. 52fc53684945dcb34300006b 52fc53684945dcb34300006b system_u:object_r:openshift_var_lib_t:s0:c1,c163 config
    -rw-------. 52fc53684945dcb34300006b 52fc53684945dcb34300006b system_u:object_r:openshift_var_lib_t:s0:c1,c163 id_rsa
    -rw-------. 52fc53684945dcb34300006b 52fc53684945dcb34300006b system_u:object_r:openshift_var_lib_t:s0:c1,c163 id_rsa.pub
    -rw-rw----. 52fc53684945dcb34300006b 52fc53684945dcb34300006b system_u:object_r:openshift_var_lib_t:s0:c1,c163 known_hosts

5. Scale up the app:

    [ose215@dhcp-9-237 ~]$ rhc app show sry19 --gears
    ID                       State   Cartridges           Size  SSH URL
    ------------------------ ------- -------------------- ----- ----------------------------------------------------------------------
    52fc53684945dcb34300006b started ruby-1.9 haproxy-1.4 small 52fc53684945dcb34300006b.cn
    52fc55944945dcb343000095 started ruby-1.9 haproxy-1.4 small 52fc55944945dcb343000095.cn

6. ssh to the second gear:

    [ose215@dhcp-9-237 ~]$ ssh 52fc55944945dcb343000095.cn
    The authenticity of host '52fc55944945dcb343000095-hanli1dom.oseanli.cn (10.66.78.216)' can't be established.
    RSA key fingerprint is a3:ba:c0:09:5f:0f:13:50:8e:1e:2e:95:7f:66:ae:7c.
    Are you sure you want to continue connecting (yes/no)? yes
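The permission checks on .openshift_ssh in this verification can be scripted. The sketch below assumes the expected modes are the ones visible in the listings in this comment (0660 for config and known_hosts, 0600 for id_rsa and id_rsa.pub); they are read off this output, not taken from an OpenShift specification. The names EXPECTED_MODES and check_ssh_dir are hypothetical.

```python
import os
import stat

# Assumption: modes copied from the ls output in this verification comment.
EXPECTED_MODES = {
    "config": 0o660,
    "id_rsa": 0o600,
    "id_rsa.pub": 0o600,
    "known_hosts": 0o660,
}


def check_ssh_dir(path, expected=EXPECTED_MODES):
    """Return a dict of file name -> problem description; empty if all match."""
    problems = {}
    for name, want in expected.items():
        full = os.path.join(path, name)
        if not os.path.exists(full):
            problems[name] = "missing"
            continue
        # S_IMODE strips the file-type bits and keeps only the permission bits.
        got = stat.S_IMODE(os.stat(full).st_mode)
        if got != want:
            problems[name] = "mode %o, expected %o" % (got, want)
    return problems
```

An empty result means every listed file exists with the expected mode. It does not check ownership or the SELinux context shown in step 4.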
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-0209.html