Description of problem:
I occasionally (usually about once a month, sometimes more) start getting 503 errors when accessing my app. A restart solves the problem. I noticed that the cause of the 503 is the app's inability to access the database (I'm using Python and Postgres). I also noticed that the problem usually appears right after a maintenance event.

Version-Release number of selected component (if applicable):
The app is located at: http://wabistory-saharagray.rhcloud.com/
It uses cartridges:
Python 2.6
PostgreSQL 8.4

How reproducible:
Since I noticed the problem, I set up a cron job on another server to check the status of the app every 10 minutes. When I get connectivity issues, it is usually following a maintenance event. Here are recent outage times (US/Chicago time):
19 Aug - 13:00
24 Jul - 05:40
19 Jul - 11:30
16 Jun - 16:40
The OpenShift status page doesn't show exact dates and times, but I believe you will find these correspond to maintenance events.

Steps to Reproduce:
Wait for maintenance.
Check this URL: https://wabistory-saharagray.rhcloud.com/1/status
The method attempts a very simple database query.

Actual results:
When it is working, it returns a 200 status with JSON success: true. When it fails, it returns a 500 status and the default error page.

Expected results:
It should always return a 200 status with JSON success: true in the body, except *during* planned outages.

Additional info:
Even when the above is failing, methods that do not attempt to connect to the database return a 200 status. E.g. https://wabistory-saharagray.rhcloud.com/
Also discussed in this thread: https://www.openshift.com/forums/openshift/app-requires-restart-after-maintenance
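The monitoring described above (a cron job that polls the status URL every 10 minutes) could look roughly like the following sketch. The reporter's actual script is not shown in this report: the function names is_healthy and check_status are hypothetical, and the JSON shape {"success": true} is assumed from the "Actual results" description.

```python
import json

try:  # Python 3
    from urllib.request import urlopen
    from urllib.error import URLError
except ImportError:  # Python 2
    from urllib2 import urlopen, URLError

# URL taken from the report; everything else here is an assumption.
STATUS_URL = "https://wabistory-saharagray.rhcloud.com/1/status"


def is_healthy(status_code, body):
    """Return True only for a 200 response whose JSON body has success == true."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        # A default error page (HTML) is not valid JSON.
        return False
    return isinstance(payload, dict) and payload.get("success") is True


def check_status(url=STATUS_URL, timeout=10):
    """Fetch the status URL once and report whether the app looks healthy."""
    try:
        response = urlopen(url, timeout=timeout)
        body = response.read().decode("utf-8")
        return is_healthy(response.getcode(), body)
    except (URLError, IOError, ValueError):
        # HTTP error statuses (e.g. 500/503), timeouts and connection
        # failures all count as "not healthy" for this check.
        return False
```

A cron entry would call check_status() and send an alert when it returns False; how the alert is delivered is left out here.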
Posted by @bjudson on 8/29: This happened again on 27 Aug at 22:00, and then on 29 Aug at 00:20 (I'm on US Central time). The first time I restarted almost immediately; the second time I had just gone to sleep, so the app was down for about 8 hours. Updated the bug's severity to high.
Several questions:
#1 Are you using a connection pool to access the database? Or, does your connection configuration have a retry timer for when the database is unavailable?
#2 Is this a scalable application?
#3 What process do you use to restart the application?
*** Bug 1004521 has been marked as a duplicate of this bug. ***
I don't know if this matters at this point, but the answers are:
1. I'm using a standard Flask-SQLAlchemy configuration, which if I'm not mistaken uses a connection pool by default.
2. It is not a scalable app.
3. rhc app restart <appname>
I can't access the bug you have marked this a duplicate of, but am I correct in assuming the issue is resolved?
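For reference, the "retry timer" asked about above can be approximated at the application level with a small wrapper like the one below. This is a generic sketch, not part of Flask-SQLAlchemy or of the reporter's app; the name call_with_retry and all of its parameters are hypothetical.

```python
import time


def call_with_retry(operation, retries=3, delay=0.5, backoff=2.0,
                    exceptions=(Exception,), sleep=time.sleep):
    """Call operation(); on a listed exception, wait and try again.

    After `retries` failed retries the last exception is re-raised.
    The wait starts at `delay` seconds and is multiplied by `backoff`
    after every failed attempt.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except exceptions:
            attempt += 1
            if attempt > retries:
                raise  # out of retries: surface the original error
            sleep(delay)
            delay *= backoff
```

With SQLAlchemy one would typically pass exceptions=(sqlalchemy.exc.OperationalError,) and wrap the query function. Whether a retry alone would have helped in this case is not established by the report.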
Fix will go out in next production release. If you experience issues after that, please reopen bug. Sorry for any inconvenience.
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/59d0a4b73ab67585d7f69cdf5f37b13bea15d2f1
Bug 1000764 - Enforce cartridge start order

* Start secondary cartridges before primary cartridge
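The ordering rule named in the commit message (secondary cartridges before the primary one) can be illustrated with the short sketch below. It is not the origin-server implementation: the function name start_order and the (name, is_primary) tuple shape are hypothetical, chosen only to show the idea that the database cartridge is started before the web cartridge that depends on it.

```python
def start_order(cartridges):
    """Return cartridge names with secondary cartridges first, primary last.

    `cartridges` is a list of (name, is_primary) tuples (hypothetical shape).
    """
    # sorted() is stable and False sorts before True, so secondary
    # cartridges keep their relative order and precede the primary one.
    ordered = sorted(cartridges, key=lambda cart: cart[1])
    return [name for name, _is_primary in ordered]


# Example: the web framework is primary, the database is secondary.
print(start_order([("python", True), ("postgresql", False)]))
```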
Tested on devenv_3772, with jbosseap + postgresql 8.4 and JDBC configured. During app restart, no such error appears in the JBoss server log. The start sequence is also visible in the output:

[jbeap1-bmengdev.dev.rhcloud.com 523009b9c6aa501c16000001]> gear start
Starting gear...
Starting Postgres cartridge
server starting
Postgres started
Starting jbosseap cartridge

Moving bug to verified.