Bug 962432 - Deployment of scaled Rails app is broken
Summary: Deployment of scaled Rails app is broken
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Templates
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
: ---
Assignee: Hiro Asari
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-05-13 13:33 UTC by Boris Mironov
Modified: 2015-05-15 02:22 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-06-11 04:03:16 UTC
Target Upstream Version:
Embargoed:


Attachments
"git push" log from Apr 12, 2013 (52.69 KB, text/plain)
2013-05-22 19:04 UTC, Boris Mironov
"git push" log from May 22, 2013 (44.83 KB, text/plain)
2013-05-22 19:04 UTC, Boris Mironov
pre_build (1.06 KB, text/plain)
2013-05-24 15:33 UTC, Hiro Asari

Description Boris Mironov 2013-05-13 13:33:21 UTC
Description of problem:
Deployment of a scaled Rails app is broken. By default, none of the leaf gears work.

Version-Release number of selected component (if applicable):
Bug spotted and resolved on the free OpenShift.com environment on May 11, 2013.

How reproducible:
Always, after each "git push origin".

Steps to Reproduce:
1. Commit change to app repository on local system
2. git push origin
  
Actual results:
None of the scaled gears work. The HAProxy status report always shows red for all leaf gears.

Expected results:
"Leaf" gears should work and HAProxy should show green status for all of them. 

Additional info:
I ran a test during the OpenShift Beta with a Rails application configured to have a minimum of 3 gears. None of the "leaf" gears work after application deployment. The application returns HTTP 500 instead of the proper pages.

Comment 1 Boris Mironov 2013-05-13 13:46:21 UTC
It turns out the deployment process of a scaled Rails 3 app is broken in a couple of places:
- the symbolic link to the precompiled assets gets propagated to the leaf gears
- the precompiled assets themselves do not get propagated to the leaf gears

My application is configured (as by default) to run the production environment with precompiled assets, and not to compile them if they are missing. I have a number of assets in my application (JavaScript, CSS, images), and they all go through precompilation during deployment. I can clearly see "rake assets:precompile" being executed during "git push".

To preserve static assets from being recompiled on each "git push", the Rails cartridge is configured to drop and recreate a symbolic link, "public/assets", pointing to $OPENSHIFT_DATA_DIR/assets. rsync takes care of propagating changes in the application directory from the head gear to the leaf gears, and it picks up this soft link as well. Unfortunately, the real path of $OPENSHIFT_DATA_DIR is different on each gear, because the gear's username is embedded in it. So this is the first place where the process breaks.
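The failure mode described above can be reproduced locally with a small sketch. It fakes a head gear and a leaf gear as two directories under a temporary root (`head/` and `leaf/` are hypothetical stand-ins for the per-user /var/lib/openshift/<uuid> homes), creates an absolute link the way the original pre_build does, and copies the repo with `cp -a`, which, like `rsync -a`, copies a symbolic link as a link instead of following it. `demo_absolute_link` is an illustrative name, not part of any OpenShift tooling.

```bash
# Sketch: an absolute symlink copied from one "gear" to another still points
# at the first gear's home. head/ and leaf/ are hypothetical stand-ins.
demo_absolute_link() {
  local root head leaf
  root="$(mktemp -d)"
  head="$root/head"
  leaf="$root/leaf"

  # Each gear has its own data dir; only the head has a repo checkout so far.
  mkdir -p "$head/app-root/data/assets" "$head/app-root/repo/public"
  mkdir -p "$leaf/app-root/data/assets"

  # Head gear: link public/assets to its *absolute* data dir path.
  ln -s "$head/app-root/data/assets" "$head/app-root/repo/public/assets"

  # Propagate the repo; cp -a, like rsync -a, copies the link itself.
  cp -a "$head/app-root/repo" "$leaf/app-root/repo"

  # On the leaf, the copied link still names the head gear's home directory.
  if [ "$(readlink "$leaf/app-root/repo/public/assets")" = "$head/app-root/data/assets" ]; then
    echo "leaf link points at head gear"
  else
    echo "leaf link points at leaf gear"
  fi
  rm -rf "$root"
}

demo_absolute_link   # prints: leaf link points at head gear
```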

The second place is the absence of an rsync step that would distribute the contents of $OPENSHIFT_DATA_DIR from the head gear to the leaf gears. Our "git push" can generate new static assets in "public/assets", and this content should be redistributed to each leaf gear as part of the application upgrade.
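One way to fill that gap is sketched below, under assumptions. It reads the leaf gears from haproxy-1.4/conf/gear-registry.db using the `<ssh target>:<cartridge>;<gear dns>` line format visible in comment 9, and assumes, as in that excerpt, that the file lists only leaf gears. The helper name `plan_data_sync` is hypothetical, and it only prints the rsync commands instead of running them. A remote rsync path without a leading slash is resolved relative to the remote user's home directory, so each leaf gear would receive the data in its own app-root/data/assets regardless of its username.

```bash
# Hypothetical helper: print (not run) one rsync command per leaf gear.
# Each registry line is assumed to look like
#   <ssh target>:<cartridge>;<gear dns>
# and everything before the first ":" is used as the ssh target.
# Intended to be run from the head gear's home directory.
plan_data_sync() {
  local registry="$1" line target
  while IFS= read -r line || [ -n "$line" ]; do
    [ -n "$line" ] || continue
    target="${line%%:*}"
    # No leading "/" on the remote path: rsync resolves it relative to the
    # leaf gear's own home directory.
    echo "rsync -a app-root/data/assets/ ${target}:app-root/data/assets/"
  done < "$registry"
}

# Usage on the head gear:
#   plan_data_sync ~/haproxy-1.4/conf/gear-registry.db
```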

Some extra steps should be taken to clean up the static assets in "public/assets" before precompilation, to remove old, unneeded assets. I believe manifest.yml could be used for this, with extra care for assets generated by the application (e.g., images uploaded via the paperclip gem).
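The manifest.yml idea could look roughly like the sketch below. It assumes the Rails 3.x manifest format of one `logical path: digested path` pair per line, keeps every file named on either side of a pair plus its `.gz` companion, and never touches a protected subdirectory (`uploads` is a hypothetical name standing in for something like paperclip's storage). `prune_assets` is illustrative only and would need checking against a real manifest before being used in a hook.

```bash
# Illustrative only: delete files under an assets directory that manifest.yml
# does not mention. Assumes a flat "logical: digested" line format.
prune_assets() {
  local dir="$1" protected="${2:-uploads}" keep
  keep="$(mktemp)"
  # Record both sides of every "key: value" line; the "---" header has no ": ".
  awk -F': ' 'NF == 2 { print $1; print $2 }' "$dir/manifest.yml" > "$keep"

  # -H: follow $dir itself if it is the public/assets symlink.
  find -H "$dir" -type f | while IFS= read -r f; do
    rel="${f#"$dir"/}"
    case "$rel" in
      manifest.yml|"$protected"/*) continue ;;  # never delete these
    esac
    # Keep files named in the manifest and their gzip companions.
    if ! grep -qxF -e "$rel" -e "${rel%.gz}" "$keep"; then
      rm -f "$f"
    fi
  done
  rm -f "$keep"
}

# Usage (hypothetical), before running rake assets:precompile:
#   prune_assets "${OPENSHIFT_REPO_DIR}/public/assets"
```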

Comment 2 Boris Mironov 2013-05-13 13:51:45 UTC
I successfully tested a simple solution:

- SSH to the application gear
- rsync "ruby-1.9/repo" to all leaf gears
- rsync ~/app-root/data/assets to all leaf gears
- SSH to each leaf gear and change the soft link "ruby-1.9/repo/public/assets" to point to ~/app-root/data/assets on that gear

The moment I change the link, HAProxy turns the status of that gear from red to green.
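The last step of that workaround could be scripted per gear roughly as follows. `fix_assets_link` is a hypothetical helper: it takes the gear's home directory as an optional argument (defaulting to `$HOME`) so it can be exercised outside a gear, uses the `ruby-1.9/repo` path from the steps above, and refuses to replace a real directory so that no assets are deleted by accident.

```bash
# Hypothetical helper: point public/assets at THIS gear's data dir.
fix_assets_link() {
  local home="${1:-$HOME}"
  local link="$home/ruby-1.9/repo/public/assets"
  local target="$home/app-root/data/assets"
  # Do not clobber a real directory that may hold the only copy of assets.
  if [ -d "$link" ] && [ ! -L "$link" ]; then
    echo "refusing: $link is a real directory" >&2
    return 1
  fi
  mkdir -p "$target"
  # -f replaces an existing (possibly dangling) link; -n stops ln from
  # creating the new link inside a directory the old link points to.
  ln -sfn "$target" "$link"
}

# Usage, in an SSH session on each leaf gear:
#   fix_assets_link
```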

Comment 3 Hiro Asari 2013-05-15 19:40:11 UTC
Is this a new application that you created on May 11, 2013? Or is it an existing application that started exhibiting this problem on that day?

Comment 4 Boris Mironov 2013-05-15 19:48:33 UTC
No, it is not a brand new app. I started this app last year, and on May 11 I just deployed it for the "OpenShift Beta" test on medium gears. At that point I applied all applicable changes published on GitHub (https://github.com/openshift/rails-example/commits/master), including the one from March 25.

Yesterday I tested the commit from May 08, "Move asset logic from pre_build to deploy". It showed different behavior:
- the application restarted successfully
- instead of the public/assets symlink, I now have a regular directory with its own copy of the assets. In turn, this breaks the "static assets preserved during deployment" logic.

Thanks,
Boris

Comment 6 Yujie Zhang 2013-05-17 11:39:34 UTC
(In reply to comment #5)
Tested on STG and PROD: the scalable rubyonrails application can be created and works correctly on STG, but I got a "technical difficulty" error when creating a scalable rubyonrails application on PROD. The Gemfile issue of rubyonrails should be resolved now; for details see bug 963548.
So could you please try it again? Thanks~

Comment 7 Hiro Asari 2013-05-21 20:16:49 UTC
I think I have something like a reproduction now, described in https://gist.github.com/BanzaiMan/8b637cc4b1bb069fe231.

It seems to me that 'git push' is not pushing code to the secondary gears, as Boris has been describing. It is not clear yet if this is the sole issue at play, but resolving this problem should help greatly.

Comment 8 Hiro Asari 2013-05-21 20:55:49 UTC
There is a report that the problem does not manifest if the application has only 2 gears, and that we may be hitting a timeout that causes this problem.

https://github.com/openshift/origin-server/pull/2578 increases the timeout; we will test it again tomorrow.

Comment 9 Boris Mironov 2013-05-22 02:01:08 UTC
Hi Hiro,

I believe you are mixing these reports (comment #7, #8) with another bug (#962807).

This bug (#962432) is about the broken symbolic link public/assets, which is:
- created on the head gear (ln -s ~/app-root/data/assets public/assets)
- copied over to all leaf gears.

The actual issue is that ~/app-root/data/assets expands to a complete path that is specific to the head gear, because it includes the head gear's unique username. When rsync carries this symbolic link (public/assets) over to ANY leaf gear (whether there is one or several), it produces the wrong result there, because my username on ANY leaf gear is different from the one on the head gear.

To make a long story short (watch the value of $HOME):

[swimming-bsmgroup.rhcloud.com ~]\> echo $HOME
/var/lib/openshift/4fcf6f211aeb4279b098ba74635a1e3b
[swimming-bsmgroup.rhcloud.com ~]\> cat haproxy-1.4/conf/gear-registry.db
79311893ae2544f68e6d9830121782cc.182.117:ruby-1.9;79311893ae-bsmgroup.rhcloud.com
[swimming-bsmgroup.rhcloud.com ~]\> ssh 79311893ae2544f68e6d9830121782cc.182.117
---- snip ---------
[79311893ae-bsmgroup.rhcloud.com ~]\> echo $HOME
/var/lib/openshift/79311893ae2544f68e6d9830121782cc


Therefore, my public/assets on head gear should look like:
assets -> /var/lib/openshift/4fcf6f211aeb4279b098ba74635a1e3b/app-root/data/assets/

And the same public/assets on my first leaf gear should be:
assets -> /var/lib/openshift/79311893ae2544f68e6d9830121782cc/app-root/data/assets/


So, that is the first issue. The second issue is that my ~/app-root/data/assets from the head gear should be replicated to all leaf gears to propagate all of my static assets.

Best regards,
Boris

Comment 10 Hiro Asari 2013-05-22 16:18:24 UTC
Boris,

Thank you for pointing that out. Indeed, this is a separate issue. I will investigate the symlink problem.

Comment 11 Hiro Asari 2013-05-22 18:07:36 UTC
Boris,

Could you attach the entire log you see when you run 'git push' (and have the problem)?

Comment 12 Boris Mironov 2013-05-22 18:38:54 UTC
Hi Hiro,

There is nothing specific in the session log, because it is all hidden behind a single line, "SSH_CMD .....".

All I see is that HAProxy turns the gear RED, and the gear stays that way until I fix the symlink and the static assets.

Please note that the recent commit cf6040b9e4c53008c9ac43883532371c2d52ffab to GitHub (https://github.com/openshift/rails-example), "Move asset logic from pre_build to deploy", changed the behavior of my existing app:

- it now creates a regular directory "public/assets"
- ~/app-root/data/assets is no longer used, because no symlink references it
- all gears come up GREEN after "git push"
- nothing in public/assets is preserved between code updates, because the whole public/assets directory is dropped

Hope this helps with resolution.

Best regards,
Boris

Comment 13 Boris Mironov 2013-05-22 19:03:36 UTC
Hi Hiro,

Please find requested logs.

I have one from Apr 12, 2013 (before applying change from GitHub) and one from today (May 22, 2013)

Best regards,
Boris

Comment 14 Boris Mironov 2013-05-22 19:04:04 UTC
Created attachment 751805
"git push" log from Apr 12, 2013

Comment 15 Boris Mironov 2013-05-22 19:04:33 UTC
Created attachment 751806
"git push" log from May 22, 2013

Comment 16 Hiro Asari 2013-05-22 19:25:43 UTC
Boris,

Thank you for the logs. Could you revert the change corresponding to cf6040b for the time being? I am revisiting the motivation for the switch now.

Comment 17 Boris Mironov 2013-05-22 19:38:02 UTC
Sure,

I will do it later today. I will try to provide update before tomorrow morning.

Comment 18 Hiro Asari 2013-05-22 20:21:42 UTC
Thanks. I undid the commit on GitHub as well, since I could not find the reasoning behind it.

https://github.com/openshift/rails-example/commit/0a3a777835bc537e95663714c7faf8b1b57951ba

Comment 19 Boris Mironov 2013-05-23 03:14:57 UTC
Hi Hiro,

I repeated the same change in my repository and tested it via "git push". Unfortunately, it did not help. Here is what I did and what I got:

- checked that all gears had public/assets as a directory
- applied the change to the repo and ran "git push origin"
- checked public/assets on each gear again and saw that they remained as directories
- SSH'ed to all my gears and:
  a) deleted public/assets dir
  b) created sym link public/assets pointing to ~/app-root/data/assets
- added tag to local repo and executed "git push --tags origin"
- checked each gear and noticed that sym links were deleted and recreated as dirs again

Please note that my repo tracks neither the public/assets directory nor anything in it:

$ git ls-tree --full-tree -r HEAD | grep public
100644 blob 9a48320a5f1c025b6cc9819ab539a6d17fcbaf81	public/404.html
100644 blob 83660ab1878ba9adc6477ed910333e32bb6b46ce	public/422.html
100644 blob f3648a0dbc9f021131677c88383f1cc9b15ea22e	public/500.html
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	public/favicon.ico
100644 blob 085187fa58b1631e955f5d514d895a7721466797	public/robots.txt
100644 blob c28fd93dc444813ec73e705241216b887d4b3c9a	test/integration/public_browsing_test.rb

$ grep public .gitignore
public/assets

Comment 20 Yujie Zhang 2013-05-23 10:16:31 UTC
I still cannot reproduce this issue. Could you please check again? Thanks!

Comment 21 Boris Mironov 2013-05-23 13:58:29 UTC
Hi Yujie,

What are you getting?

Thanks,
Boris

Comment 22 Hiro Asari 2013-05-23 17:59:07 UTC
Hi, Boris,

Thank you for your continued patience.

Could you verify what you have in your pre_build and deploy hooks?

'rake assets:precompile' is run during the 'build' phase of the application lifecycle, so that if 'public/assets' doesn't exist at that point, the rake task will create it. This means that manipulating the 'public/assets' directory in the 'deploy' hook is wrong; the task rightfully belongs to the 'pre_build' hook.
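The ordering argument can be checked with a small simulation. `fake_precompile` is a hypothetical stand-in for 'rake assets:precompile' that only creates public/assets when it is missing and writes one file into it; `make_link` plays the role of the symlink logic. When the link is made first (pre_build), the compiled file lands in the data directory. When it is made afterwards (deploy), precompilation has already created a real directory, and `ln -s` quietly creates a nested link inside it instead.

```bash
# Hypothetical stand-ins, not the real OpenShift hooks.
# fake_precompile mimics precompilation creating public/assets if missing.
fake_precompile() {
  mkdir -p "$1/repo/public/assets"
  echo "compiled" > "$1/repo/public/assets/application.js"
}

make_link() {
  ln -s "$1/data/assets" "$1/repo/public/assets"
}

# Prints where the compiled file ended up for a given hook order.
run_order() {
  local app
  app="$(mktemp -d)"
  mkdir -p "$app/data/assets" "$app/repo/public"
  if [ "$1" = "link-first" ]; then
    make_link "$app"; fake_precompile "$app"   # pre_build, then build
  else
    fake_precompile "$app"; make_link "$app"   # build, then deploy
  fi
  if [ -f "$app/data/assets/application.js" ]; then
    echo "data dir"
  else
    echo "repo only"
  fi
  rm -rf "$app"
}

run_order link-first    # prints: data dir
run_order build-first   # prints: repo only
```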

I tried the bare 'rails-example' right now, and it seems to work for me. If this is not working for you right now, could you perhaps come up with a minimal example application that exhibits this problem, so that I can investigate further?

Comment 23 Hiro Asari 2013-05-23 18:19:04 UTC
Boris,

I made a few changes to the example application for further testing, and I believe I have reproduced the problem you were describing.

I will investigate further.

Thank you.

Comment 24 Boris Mironov 2013-05-23 18:32:03 UTC
Hi Hiro,

Feel free to use application rails-ag47.rhcloud.com

This account is owned by Nam Duong of Red Hat, and I deployed my app there some time ago for OpenShift Beta.

At the moment that application is dormant and can be used to track down this and some other bugs.

Thanks,
Boris

Comment 25 Boris Mironov 2013-05-23 18:43:21 UTC
As per your request:

$ cat deploy
#!/bin/bash
# This deploy hook gets executed after dependencies are resolved and the
# build hook has been run but before the application has been started back
# up again.  This script gets executed directly, so it could be python, php,
# ruby, etc.

set -e

if [ -z "$OPENSHIFT_MYSQL_DB_HOST" ]
then
    echo 1>&2
    echo "Could not find mysql database.  Please run:" 1>&2
    echo "rhc cartridge add -a $OPENSHIFT_APP_NAME -c mysql-5.1" 1>&2
    echo "then make a sample commit (add whitespace somewhere) and re-push" 1>&2
    echo 1>&2
fi

if [ -z "$OPENSHIFT_MYSQL_DB_HOST" ]
then
    exit 5
fi

pushd ${OPENSHIFT_REPO_DIR} > /dev/null
bundle exec rake db:migrate RAILS_ENV="production"
popd > /dev/null







$ cat pre_build
#!/bin/bash
# This is a simple script and will be executed on your CI system if
# available.  Otherwise it will execute while your application is stopped
# before the build step.  This script gets executed directly, so it
# could be python, php, ruby, etc.

STORED_ASSETS="${OPENSHIFT_DATA_DIR}/assets"
LIVE_ASSETS="${OPENSHIFT_REPO_DIR}/public/assets"

# Ensure our stored assets directory exists
if [ ! -d "${STORED_ASSETS}" ]; then
  echo "  Creating permanent assets directory"
  mkdir "${STORED_ASSETS}"
fi

# Create symlink to stored assets unless we're uploading our own assets
if [ -d "${LIVE_ASSETS}" ]; then
  echo "  WARNING: Assets included in git repository, not using stored assets"
else
  echo "  Restoring stored assets"
  ln -sf "${STORED_ASSETS}" "${LIVE_ASSETS}"
fi

Comment 26 Hiro Asari 2013-05-24 15:33:07 UTC
Created attachment 752731
pre_build

Boris,

Here's an attempt to create the symbolic links with relative paths, so that the value of $HOME does not affect the path to the assets.

Could you test it? It may also be necessary for you to clean up public assets directories on the gears beforehand.

Thank you.
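Why a relative link helps can be shown with the same kind of two-directory simulation. This sketch uses a simplified layout, which is an assumption rather than the exact gear layout (the attached hook uses four `../` components for the real one, see the log in the next comment): public/assets points to `../../data/assets`, which resolves inside whichever home directory the repo is copied into. `demo_relative_link` is a hypothetical name.

```bash
# Sketch: a *relative* public/assets link survives being copied to another
# gear. Simplified layout (assumption): <home>/app-root/repo/public/assets
# -> ../../data/assets, i.e. <home>/app-root/data/assets.
demo_relative_link() {
  local root head leaf resolved expected
  root="$(mktemp -d)"
  head="$root/head"
  leaf="$root/leaf"
  mkdir -p "$head/app-root/data/assets" "$head/app-root/repo/public" \
           "$leaf/app-root/data/assets"

  # Relative target, interpreted from the directory containing the link.
  ln -s ../../data/assets "$head/app-root/repo/public/assets"
  cp -a "$head/app-root/repo" "$leaf/app-root/repo"

  # Resolve the copied link on the leaf and compare with its own data dir.
  resolved="$(cd "$leaf/app-root/repo/public/assets" && pwd -P)"
  expected="$(cd "$leaf/app-root/data/assets" && pwd -P)"
  [ "$resolved" = "$expected" ] && echo "leaf link points at leaf gear"
  rm -rf "$root"
}

demo_relative_link   # prints: leaf link points at leaf gear
```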

Comment 27 Boris Mironov 2013-05-24 17:43:19 UTC
Hi Hiro,

I deleted the public/assets directory on both gears, applied the pre_build change, and ran "git push origin".

1) Deployment log showed:
remote: Running .openshift/action_hooks/pre_build
remote:   Restoring stored assets with 'ln -sf "../../../../app-root/data/assets" "../../app-root/repo/public/assets"' in /var/lib/openshift/4fcf6f211aeb4279b098ba74635a1e3b/git/swimming.git

2) Deployment process died after showing "remote: Precompiling with 'bundle exec rake assets:precompile'"

3) I opened another SSH session after waiting about 10 minutes and found that nothing was running on the head gear:

ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
3148     16427 21041  0 13:30 pts/0    00:00:00 /bin/ps -ef
3148     21031 19666  0 13:21 ?        00:00:00 sshd: 4fcf6f211aeb4279b098ba74635a1e3b@pts/0
3148     21041 21031  0 13:21 pts/0    00:00:00 /bin/bash --init-file /usr/bin/rhcsh -i


4) I verified that all assets had been recreated in ~/app-root/data/assets just a few minutes earlier.

I do not think the change in pre_build broke the deployment. Perhaps the network is the culprit here. I will re-check with a different network configuration.



Do you need to put extra logic into pre_build to help people migrate back to symbolic links? In other words, if my gear has public/assets as a directory and I apply your change to pre_build, would it automatically be rebuilt as a symbolic link? What about assets that my application may have put into public/assets and that are not tracked via git (e.g., images uploaded via the paperclip gem)?

Thanks,
Boris
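The migration asked about above (a gear whose public/assets is still a real directory, possibly holding untracked files such as paperclip uploads) might be handled along these lines. `migrate_assets_dir` is a hypothetical helper, not part of the quickstart: it copies everything from the directory into the stored location (overwriting same-named files there), removes the directory only if the copy succeeded, and then creates the link. Both arguments are assumed to be absolute paths on the current gear.

```bash
# Hypothetical migration helper: turn a real public/assets directory into a
# symlink to the stored assets dir without losing untracked files.
migrate_assets_dir() {
  local live="$1" stored="$2"
  mkdir -p "$stored" || return 1
  if [ -d "$live" ] && [ ! -L "$live" ]; then
    # Copy everything, including dotfiles; stop if the copy fails.
    cp -R "$live"/. "$stored"/ || return 1
    rm -rf "$live"
  fi
  # -n keeps ln from descending into an existing link to a directory.
  ln -sfn "$stored" "$live"
}

# Usage (hypothetical), early in pre_build:
#   migrate_assets_dir "${OPENSHIFT_REPO_DIR}/public/assets" "${OPENSHIFT_DATA_DIR}/assets"
```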

Comment 28 Boris Mironov 2013-05-24 17:53:06 UTC
Hi Hiro,

It is not directly related to this bug...

Do you think it makes sense to add an extra environment variable for the Rails gear that sets RAILS_ENV=production?

The README could then mention, in a few words, how to change it when deploying a new application that is not going to run in "production".

Currently, the "deploy" hook effectively hardcodes "production" for every application...

Best regards,
Boris
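A minimal sketch of this suggestion, assuming a RAILS_ENV environment variable were made available on the gear (this thread does not confirm that the cartridge sets one): the deploy hook would fall back to "production" when the variable is unset or empty. `rails_env` is a hypothetical helper.

```bash
# Sketch: prefer an (assumed) RAILS_ENV variable, defaulting to production.
rails_env() {
  echo "${RAILS_ENV:-production}"
}

# In the deploy hook this would replace the hard-coded value:
#   bundle exec rake db:migrate RAILS_ENV="$(rails_env)"
```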

Comment 29 Boris Mironov 2013-05-24 18:21:20 UTC
Hi Hiro,

I retested pre_build with a different network configuration.

1) It was a bit more successful; this time it died after showing:

remote: SSH_CMD: ssh 79311893ae2544f68e6d9830121782cc.182.117
Read from remote host swimming-bsmgroup.rhcloud.com: Connection reset by peer
fatal: The remote end hung up unexpectedly
error: error in sideband demultiplexer

2) I checked both of my gears, and both had public/assets as a symlink produced by the new version of pre_build.

3) The application came up on both gears (HAProxy shows 2 green lines).

This networking issue between gears is very annoying.

I cannot see the contents of ~/.ssh/config on my head gear because it is owned by root. Could you please confirm that the "rails" cartridge comes with a configuration that sends keepalive pings between gears? Something like:

 ServerAliveInterval 180
 ServerAliveCountMax 5

Thanks,
Boris

Comment 30 Hiro Asari 2013-05-28 14:07:43 UTC
Boris,

Thanks for the updates. Am I correct to understand that the problem may be that of network timeouts?

At this time, there is no ~/.ssh/config on the gears, so the values are default: 0 for ServerAliveInterval (meaning that no signals are sent) and 3 for ServerAliveCountMax.
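For comparison, the ssh_config(5) semantics give a simple bound: with keepalives enabled, the client disconnects after roughly ServerAliveInterval × ServerAliveCountMax seconds without a reply from the server, and a ServerAliveInterval of 0 sends no keepalives at all. The helper below (a hypothetical name) computes that for the values proposed in comment 29 and for the defaults described above.

```bash
# Approximate seconds an ssh client tolerates a silent server, per the
# ServerAliveInterval / ServerAliveCountMax semantics in ssh_config(5).
keepalive_timeout() {
  local interval="$1" count="$2"
  if [ "$interval" -eq 0 ]; then
    echo "disabled"   # no keepalive messages are sent
  else
    echo $(( interval * count ))
  fi
}

keepalive_timeout 180 5   # prints: 900
keepalive_timeout 0 3     # prints: disabled
```

The same options can also be passed to a single connection with `ssh -o ServerAliveInterval=180 -o ServerAliveCountMax=5 ...`, without a ~/.ssh/config file.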

Comment 31 Hiro Asari 2013-05-28 14:31:05 UTC
Based on item 2 in comment 29, I pushed the new pre_build hook to the quickstart (https://github.com/openshift/rails-example/commit/3f11463).

Comment 32 Boris Mironov 2013-05-28 14:39:09 UTC
Hi Hiro,

I really think that SSH should be configured to send keepalive pings between gears:

- See our progress in bug #962801 (it is about communication between my desktop and OpenShift)
- I'm using the 'yui-compressor' gem to minify my CSS and JS. It is the official, and only, solution for minifying CSS. Unfortunately, behind the scenes this gem calls the Java JDK. Needless to say, it takes a long time, especially on small gears that are "short" on RAM. For example, "git push origin" takes about 7-10 minutes to complete for my scaled app with 2 small gears.

Yes, you are correct that there are some SSH timeouts in play. ;) But this bug is about symbolic links.

Do you also want to check a "transition" solution for people who have public/assets as a directory and will now have to switch to a symbolic link?

Thanks,
Boris

Comment 33 Hiro Asari 2013-05-28 18:29:33 UTC
Boris,

Thank you for the feedback. Do you mind opening a separate bugzilla ticket to address the SSH configuration? I would like to have a clear separation of concerns here.

I committed the new hook to the quick start repo; I would love to have it tested in more applications.

Comment 34 Boris Mironov 2013-05-29 16:15:46 UTC
Hi Hiro,

I opened bug #962801 a while ago. It concerns the SSH configuration not being robust during "git push origin". I added a new comment there today suggesting that the gear SSH configuration could be improved.

Thanks,
Boris

Comment 35 Hiro Asari 2013-05-29 20:24:17 UTC
Boris,

Thanks for the ticket. Let us then concentrate our discussion on symbolic links.

The issue was that we were using turbolinks to speed up the deployment process, but this interfered with the rsync semantics by which we copy files from the primary gear to the secondary gears. (Namely, the logic was to generate assets in $OPENSHIFT_DATA_DIR, but $OPENSHIFT_DATA_DIR is not copied to the secondary gears. This is a design decision, and it is not advisable to work around it.)

It might be possible to make turbolinks work on scaled applications, but that seems to involve additional fixes that go beyond the Rails quickstart we are discussing here.

So, I reverted the turbolinks logic on the rails-example app (https://github.com/openshift/rails-example/commit/922e6a9). Please test this, and let us know if this works for you. (You might need to remove the existing symbolic links.)

For QA, please create a scalable Rails app, and check that the assets are created and that the app works.

Comment 36 Boris Mironov 2013-05-29 20:58:42 UTC
Hi Hiro,

We still need some way to reach $OPENSHIFT_DATA_DIR from the application.

I liked the idea of having a symbolic link in the public dir. I also do not like hardcoding OpenShift environment variables into my application.

So, how about making public/static_assets as a symbolic link to $OPENSHIFT_DATA_DIR the same way old link was done?

We still have to provide a way for people to store assets such as images uploaded via the paperclip gem.

I think this solution would satisfy both requirements. The only missing part would be syncing data_dir between the gears of a scaled app. But that would be a different story, unless you add 2 rsync commands to post_deploy to sync data_dir back and forth.

Thanks,
Boris

Comment 37 Yujie Zhang 2013-05-30 10:33:05 UTC
Tested on devenv_3290: the scalable Ruby on Rails application can be created and works, but I hit the "We are having technical difficulty...." error from bug 967504. Will test again when that bug is fixed.

Comment 38 Yujie Zhang 2013-05-30 10:54:26 UTC
Tried to create a scalable Rails app using rhc; it was created successfully. The scalable Rails issue has been resolved, so marking this bug as verified. Thanks.

