Bug 1250910 - Always met Internal Server Error when access non-scalable aerogear app
Status: CLOSED CURRENTRELEASE
Product: OpenShift Online
Classification: Red Hat
Component: Image
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assigned To: Jacob Lucky
QA Contact: Wenjing Zheng
Keywords: Reopened
Duplicates: 1264724 1268717
Depends On:
Blocks:
 
Reported: 2015-08-06 05:37 EDT by Yan Du
Modified: 2016-01-20 12:03 EST
CC List: 8 users

Doc Type: Bug Fix
Last Closed: 2015-10-21 16:11:58 EDT
Type: Bug


Attachments
gear requirement in openshift.redhat.com (70.65 KB, image/png)
2015-10-19 23:43 EDT, Kenjiro Nakayama
Screenshot comparison of aerogear info before/after changes (26.07 KB, image/png)
2015-10-21 09:22 EDT, John W. Lamb


External Trackers
JBoss Issue Tracker AGPUSH-1519 (Priority: Major, Status: Closed): Aerogear Push Server Cartridge no longer works with "small" OpenShift gears (last updated 2016-12-14 04:34 EST)

Description Yan Du 2015-08-06 05:37:43 EDT
Description of problem:
An Internal Server Error is always returned when accessing a non-scalable aerogear app.



Version-Release number of selected component (if applicable):
devenv_5597


How reproducible:
always


Steps to Reproduce:
1. Log in to the web console
2. Input "aerogear " in Search by keyword box
3. Create a non-scalable aerogear app
4. Access the app via its URL


Actual results:
An Internal Server Error is returned.


Expected results:
The app can be accessed normally.


Additional info:
1. Restarting the app fails with the following error:
# rhc app restart push1
Failed to execute: 'control restart' for /var/lib/openshift/55c30c295520e2d30f000003/mysql

# rhc app show push1
push1 @ http://push1-111.dev.rhcloud.com/ (uuid: 55c30c295520e2d30f000003)
--------------------------------------------------------------------------
  Domain:     111
  Created:    3:26 PM
  Gears:      1 (defaults to small)
  Git URL:    ssh://55c30c295520e2d30f000003@push1-111.dev.rhcloud.com/~/git/push1.git/
  SSH:        55c30c295520e2d30f000003@push1-111.dev.rhcloud.com
  Deployment: auto (on git push)

  aerogear-aerogear-push-1.1.0 (AeroGear UnifiedPush Server 1.1.0 and WildFly)
  ----------------------------------------------------------------------------
    From:    https://cartreflect-claytondev.rhcloud.com/reflect?github=aerogear/openshift-origin-cartridge-aerogear-push#AeroGear
    Website: http://www.aerogear.org
    Gears:   Located with mysql-5.5

  mysql-5.5 (MySQL 5.5)
  ---------------------
    Gears:          Located with aerogear-aerogear-push-1.1.0
    Connection URL: mysql://$OPENSHIFT_MYSQL_DB_HOST:$OPENSHIFT_MYSQL_DB_PORT/
    Database Name:  push1
    Password:       WAdDEfFffiKn
    Username:       adminaRi6q5w


2. A scalable aerogear app works normally.
Comment 1 John W. Lamb 2015-08-18 17:06:46 EDT
I haven't been able to reproduce this in Online or a devenv. QE, can you please verify?
Comment 2 Yan Du 2015-08-20 22:48:22 EDT
Hi, John W. Lamb

I can still reproduce the issue on stg.openshift.com (devenv-stage_1175).

You can access my app on STG: http://aero-111.stg.rhcloud.com/
Comment 3 Florian Traverse 2015-08-21 11:16:48 EDT
I am facing the same issue on an aerogear app that had been running for months. A few days ago it became unreliable, though not as badly as today (we could still ssh in).

At the time we thought the gear was full, but in fact we had OutOfMemory errors, so we set the JVM_HEAP_RATIO=0.6 env var to give it a bit more memory. It crashed again, so we set JVM_HEAP_RATIO=0.8; it then worked for 2 hours before crashing, and it never recovered.

I cannot even ssh into it or snapshot it to move it to a bigger gear. I've tried several things, including setting JVM_HEAP_RATIO=0.1 and JVM_HEAP_RATIO=0.0 so that the app would crash and leave me able to ssh in and snapshot it, but it still did not work.
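To see why raising JVM_HEAP_RATIO buys little headroom on a small gear, here is a minimal sketch of the arithmetic. It assumes (not stated in this report) that the cartridge sizes the JVM max heap as gear memory × JVM_HEAP_RATIO and that a small gear has a 512 MB memory limit; `heap_mb` is a hypothetical helper written only for this illustration. On a non-scalable app the MySQL cartridge is located on the same gear, so it competes for whatever the heap leaves over.

```shell
#!/bin/sh
# Sketch only. ASSUMPTIONS (not taken from this bug report):
#   - max JVM heap = gear memory (MB) * JVM_HEAP_RATIO
#   - a small gear has a 512 MB memory limit
# heap_mb is a hypothetical helper, not part of rhc or the cartridge.

# Print the estimated max heap in MB, truncated to an integer.
heap_mb() {
  awk -v mem="$1" -v ratio="$2" 'BEGIN { printf "%d\n", mem * ratio }'
}

SMALL_GEAR_MB=512

for ratio in 0.5 0.6 0.8; do
  heap=$(heap_mb "$SMALL_GEAR_MB" "$ratio")
  # Memory not given to the heap is shared by JVM non-heap memory,
  # the co-located MySQL server and every other process on the gear.
  rest=$((SMALL_GEAR_MB - heap))
  echo "JVM_HEAP_RATIO=$ratio: heap ${heap} MB, ${rest} MB for everything else"
done
```

Under these assumptions, JVM_HEAP_RATIO=0.8 leaves about 103 MB of a small gear for MySQL and the JVM's non-heap memory, so a larger heap ratio trades one out-of-memory condition for another rather than adding memory to the gear.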

This aerogear is in production.


When I created a new aerogear app on another (free) account, I could ssh into it during the afternoon, but it had several issues: simple commands like "env | grep MYSQL" returned 6 errors saying it couldn't fork. Later I could neither snapshot this one nor ssh into it:

```
$> rhc snapshot save
Pulling down a snapshot of application 'aerogear' to aerogear.tar.gz ...
Error in trying to save snapshot. You can try to save manually by running:
ssh 55d7083a0c1e667f64000181@aerogear-alloci.rhcloud.com 'snapshot' > aerogear.tar.gz

$> rhc ssh
Connecting to 55d7083a0c1e667f64000181@aerogear-alloci.rhcloud.com ...
Write failed: Broken pipe
```
Comment 5 Timothy Williams 2015-08-21 14:12:20 EDT
Florian,

Thank you for the useful information. I can confirm that this bug is reproducible in production. You should be able to run the command below to force-stop your application, and then snapshot it to a larger gear:

  # rhc force-stop-app <app_name>

The issue is being investigated.
Comment 7 Florian Traverse 2015-08-24 03:44:22 EDT
Timothy,

Thank you for your help, but I have done this several times (maybe a hundred in a day), and I still couldn't snapshot the application to move it to a larger gear:

  # rhc force-stop-app -a aerogear -n allocab
  RESULT:
  aerogear force stopped

  # time rhc snapshot save
  Pulling down a snapshot of application 'aerogear' to aerogear.tar.gz ...
  Error in trying to save snapshot. You can try to save manually by running:
  ssh 55141216e0b8cd9b7a00002a@aerogear-allocab.rhcloud.com 'snapshot' > aerogear.tar.gz
         76.86 real         0.58 user         0.09 sys

  # time ssh  -vvv -o ConnectTimeout=240 55141216e0b8cd9b7a00002a@aerogear-allocab.rhcloud.com 'snapshot' > aerogear.tar.gz
  OpenSSH_6.2p2, OSSLShim 0.9.8r 8 Dec 2011
  debug1: Reading configuration data /Users/XXXXX/.ssh/config
  debug1: Reading configuration data /etc/ssh_config
  debug1: /etc/ssh_config line 20: Applying options for *
  debug1: /etc/ssh_config line 102: Applying options for *
  debug2: ssh_connect: needpriv 0
  debug1: Connecting to aerogear-allocab.rhcloud.com [54.172.190.223] port 22.
  debug2: fd 3 setting O_NONBLOCK
  debug1: connect to address 54.172.190.223 port 22: Operation timed out
  ssh: connect to host aerogear-allocab.rhcloud.com port 22: Operation timed out
         75.98 real         0.00 user         0.00 sys


I've added "time" to show the timeouts. I've tried setting larger timeout values in ssh, but it still times out at 76 s.

This looks like a severe bug to me. Should I file a new one, or can you answer on this one, since it may be related?
Comment 8 Timothy Williams 2015-08-24 11:37:57 EDT
Florian,

I apologize for the inconvenience. I've reached out to the openshift.com operations team about your gear. It appears that the multiple OOMs caused your application to be locked to keep the gear from restarting constantly. You should be able to snapshot your application without issue now.

Could you please try this and let us know?
Comment 9 Florian Traverse 2015-08-25 04:17:59 EDT
Timothy,

Sorry, I missed the message yesterday; I tried it right away:

  # time rhc snapshot save
  Pulling down a snapshot of application 'aerogear' to aerogear.tar.gz ...
  Error in trying to save snapshot. You can try to save manually by running:
  ssh 55141216e0b8cd9b7a00002a@aerogear-allocab.rhcloud.com 'snapshot' > aerogear.tar.gz
         79.60 real         0.61 user         0.34 sys


Still the same issue :(
Comment 10 Florian Traverse 2015-08-25 06:16:23 EDT
Tried again:

  # rhc force-stop-app -a aerogear -n allocab
  RESULT:
  aerogear force stopped

  # time rhc snapshot save
  Pulling down a snapshot of application 'aerogear' to aerogear.tar.gz ...
  Error in trying to save snapshot. You can try to save manually by running:
  ssh 55141216e0b8cd9b7a00002a@aerogear-allocab.rhcloud.com 'snapshot' > aerogear.tar.gz
         77.35 real         0.51 user         0.14 sys
Comment 11 Timothy Williams 2015-08-25 13:31:20 EDT
Florian,

It looks like the gear was locked again after it was started. The lock has been removed again and you should be able to snapshot. Please do not start the application before snapshotting.

Is simply re-creating the application in a larger gear size an option for you? Or is there data on the aerogear application that you require?

Additionally, did you make any changes to your application before you started seeing OutOfMemory errors? The most recent aerogear quickstart now requires a medium gear. Did you update the aerogear components in the gear?
Comment 12 Florian Traverse 2015-08-25 18:02:45 EDT
Timothy,

I've just seen your answer and tried to snapshot it. It failed again.

I have an uptime monitor that may hit the application URL, although the monitor was "paused". It may have woken the gear anyway (perhaps "paused" only means that alerts are not sent).

I've now deleted it entirely rather than pausing it, which may help.

We would already have simply recreated the app if we did not need the database content for our current keys.
My problem with dumping the database is that I cannot connect to it, even from another gear (I only lack the IP/port needed to do this once the mysql cartridge is running). A snapshot would be even better, so that we can move to a bigger gear.

We did not make any changes to the application before we started seeing OutOfMemory errors; we had simply used it in production for some months (push is relatively secondary for us: it is useful to our users but not mandatory). Then it started to crash: one crash, after which we added an UptimeRobot monitor; another crash some days later; then tidy plus the env var to give it more memory; then a crash after 1 hour; then endless crashes.

We did not update the aerogear components in the gear.
Comment 13 Florian Traverse 2015-08-25 18:08:17 EDT
BTW, we probably have some services trying to send pushes regularly (I'd estimate 100 to 500 pushes a day), which probably wakes the app.

I'll deploy a hotfix in production so that our servers stop calling the app. I hope our users' phones do not attempt to connect to it (the apps are Android/iOS; I assume they do not need to connect directly, so changing the server configuration should be enough).
Comment 14 Florian Traverse 2015-08-25 18:40:18 EDT
OK, the production hotfix is done; we no longer hit the service regularly.
Comment 15 Florian Traverse 2015-08-26 04:54:18 EDT
Could you stop the gear? Then I'll run my "rhc force-stop-app -a aerogear -n allocab ; time rhc snapshot save -a aerogear -n allocab".
Comment 16 Timothy Williams 2015-08-26 14:20:31 EDT
Florian,

We've stopped the gear and removed the lock again. We have also temporarily increased the memory limit a bit so you can get the snapshot. Assuming you are not starting the gear any more, you should be able to obtain the snapshot now.
Comment 17 Florian Traverse 2015-08-27 03:05:56 EDT
Timothy,

I tried again and still could not snapshot the gear; ssh still times out. Maybe I should not run "rhc force-stop-app" just before running "rhc snapshot save"?

Or maybe something is still trying to use the gear, although I've removed everything I know of. Would it be possible to rename the gear temporarily to rule that out?
Comment 18 Florian Traverse 2015-08-31 04:43:03 EDT
Note: my issues (ssh, etc.) were somehow linked to a DNS issue in OpenShift Online, where my gear was referenced by IP rather than by name. I guess that after some crashes (and/or assistance from Red Hat) the gear was moved to another server, so that address simply stopped working.
Comment 19 Timothy Williams 2015-09-01 10:21:26 EDT
Florian,

I apologize for the delay in my replies.

Is your application working now? Were you at least able to snapshot it to move it to another application/gear size?
Comment 20 John W. Lamb 2015-09-28 14:44:03 EDT
The aerogear cartridge requires a medium-sized gear, as of this change:
https://github.com/aerogear/openshift-origin-cartridge-aerogear-push/commit/9dc0b2595f7b58beaf538947c2670c673d38f1f5

This means that aerogear carts won't run properly under free OpenShift Online accounts, since they are limited to Small gears. As the change above makes clear, this wasn't always the case.

The README for this cart needs to be clear about what the expected behavior for this cartridge will be when deployed onto a small gear. It also needs to note the change more prominently and possibly describe how a user could roll back their deployment to a commit id that will work with small gears.
Comment 21 John W. Lamb 2015-10-16 16:44:22 EDT
*** Bug 1268717 has been marked as a duplicate of this bug. ***
Comment 22 John W. Lamb 2015-10-16 16:46:48 EDT
Another customer has run into this issue. Are there any plans to make the Aerogear quick start work with small gears? Evidently the medium gear requirement isn't obvious from the way the instructions are written.

Relevant customer case attached.
Comment 23 matzew 2015-10-19 02:30:38 EDT
Yes, we need the medium gear size for this.
Comment 24 matzew 2015-10-19 03:10:16 EDT
Let me update the README to state that we require the medium size.
Comment 25 John W. Lamb 2015-10-19 16:20:49 EDT
The medium gear size requirement is now prominently displayed on the hub.openshift.com gear listing in the short and long descriptions, and the requirement will be more clearly indicated in the README, so closing this bug. Marking "WONTFIX" since the behavior reported is still going to happen for users who try to deploy this on small gears.
Comment 26 Kenjiro Nakayama 2015-10-19 23:42:34 EDT
Can you highlight the requirement note in openshift.redhat.com? Generally users start creating the application from this page.

It is a little difficult to notice the note now. (I attach the screenshot.)
Comment 27 Kenjiro Nakayama 2015-10-19 23:43 EDT
Created attachment 1084581 [details]
gear requirement in openshift.redhat.com
Comment 28 John W. Lamb 2015-10-20 11:47:13 EDT
(In reply to Kenjiro Nakayama from comment #26)
> Can you highlight the requirement note in openshift.redhat.com? Generally
> users start creating the application from this page.
> 
> It is a little difficult to notice the note now. (I attach the screenshot.)

Jake, is this possible? If you can't bold the "Note:" via markup, maybe put it at the top of the short description?
Comment 29 Jacob Lucky 2015-10-20 11:51:05 EDT
I can't pass through styling information, but I did remove some of the line breaks to bring this up higher on the page.
Comment 30 John W. Lamb 2015-10-20 13:23:43 EDT
Kenjiro, does this work for you? I'm not sure we can make this more clear.
Comment 31 Kenjiro Nakayama 2015-10-20 21:20:18 EDT
Hmm... I can't see any update on my web console (I guess the sync has not been done yet.) I will check it and update tomorrow again.
Comment 32 John W. Lamb 2015-10-21 09:22 EDT
Created attachment 1085129 [details]
Screenshot comparison of aerogear info before/after changes
Comment 33 John W. Lamb 2015-10-21 09:23:51 EDT
(In reply to Kenjiro Nakayama from comment #31)
> Hmm... I can't see any update on my web console (I guess the sync has not
> been done yet.) I will check it and update tomorrow again.

I've uploaded a screenshot comparison showing the changes I see.
Comment 34 Kenjiro Nakayama 2015-10-21 09:39:53 EDT
Thank you. If you can't modify the styling, I think this is fine.
Comment 35 Rory Thrasher 2016-01-20 12:03:44 EST
*** Bug 1264724 has been marked as a duplicate of this bug. ***
