Bug 996609 - broker reports 'started' status for the application even when all cartridges for the app are not running.
Status: CLOSED NOTABUG
Product: OpenShift Online
Classification: Red Hat
Component: Containers
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Assigned To: Hiro Asari
QA Contact: libra bugs
Reported: 2013-08-13 10:19 EDT by Chris Ryan
Modified: 2015-05-14 19:26 EDT (History)
Doc Type: Bug Fix
Last Closed: 2013-08-14 14:05:32 EDT
Type: Bug
Attachments:
nodejs error log (4.68 KB, text/plain), 2013-08-13 10:19 EDT, Chris Ryan

Description Chris Ryan 2013-08-13 10:19:22 EDT
Created attachment 786176: nodejs error log

Description of problem:
After creating a nodejs application (scalable or non-scalable) with a postgresql cartridge (8.4 or 9.2) attached, the postgresql cartridge does not restart properly after a 'stop'. Specifically, after the cartridge is restarted, accessing the app URL returns a 503 error, yet the rhc status of the app reports 'running'. I have attached the error logs from the app.

Version-Release number of selected component (if applicable):
devenv_3641 (ami-fee7a197)

How reproducible:
Always

Steps to Reproduce:
1. Create a nodejs app and attach a postgresql cartridge
2. Stop the postgresql cartridge
3. Restart the postgresql cartridge

Actual results:
The cartridge shows "running", but the URL returns a 503 error, and the error '`sh "-c" "node server.js"` failed with 1' appears in the logs.


Expected results:
The cartridge is restarted without issue. 

Additional info:
Comment 1 Hiro Asari 2013-08-13 11:24:29 EDT
The bare application doesn't seem to have this problem. Does your application try to make a connection to the PostgreSQL server? I can't tell from the logs what is refusing the connection and causing the failure.

$ bx bin/rhc app create foo nodejs-0.6 postgresql-9.2
Application Options
-------------------
  Namespace:  fooooooooooo
  Cartridges: nodejs-0.6, postgresql-9.2
  Gear Size:  default
  Scaling:    no

Creating application 'foo' ... done

⋮
⋮

Your application 'foo' is now available.

  URL:        http://foo-fooooooooooo.dev.rhcloud.com/
  SSH to:     621273939073518031339520@foo-fooooooooooo.dev.rhcloud.com
  Git remote: ssh://621273939073518031339520@foo-fooooooooooo.dev.rhcloud.com/~/git/foo.git/
  Cloned to:  /Users/asari/Development/src/rht/rhc/foo

Run 'rhc show-app foo' for more details about your app.

$ bx bin/rhc cartridge stop postgresql-9.2 -a foo
Stopping postgresql-9.2 ... done

$ bx bin/rhc app restart foo
RESULT:
foo restarted

$ curl -IL http://foo-fooooooooooo.dev.rhcloud.com/
HTTP/1.0 200 OK
Date: Tue, 13 Aug 2013 15:12:31 GMT
X-Powered-By: Express
Content-Type: text/html; charset=UTF-8
Content-Length: 5235
Vary: Accept-Encoding,User-Agent
ProxyTime: D=4707
X-Cache: MISS from file01.intranet.prod.int.rdu2.redhat.com
X-Cache-Lookup: MISS from file01.intranet.prod.int.rdu2.redhat.com:8080
Via: 1.0 file01.intranet.prod.int.rdu2.redhat.com (squid/3.1.10)
Connection: keep-alive
Comment 2 Chris Ryan 2013-08-13 11:54:03 EDT
Yes, the application makes a connection to the postgresql server. Please see the code at https://github.com/cjryan/nodejs-bughunting, particularly the postgresql_factory.js, server.js, and package.json files.
Comment 3 Hiro Asari 2013-08-13 13:09:55 EDT
To reproduce this error using the application, do:

rhc app create foo nodejs-0.6 postgresql-9.2 --from-code https://github.com/cjryan/nodejs-bughunting.git
rhc cartridge stop postgresql-9.2 -a foo
rhc cartridge restart nodejs-0.6 -a foo
curl -IL http://foo-*/

After this, I see that individual cartridges report 'down' status, but 'rhc app show --state' reports 'up'.

$ bx bin/rhc cartridge status nodejs-0.6 -a foo    
RESULT:
Application is not running
$ bx bin/rhc cartridge status postgresql-9.2 -a foo
RESULT:
Postgres is stopped
$ bx bin/rhc app show foo --state                  
Cartridge nodejs-0.6, postgresql-9.2 is started
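
The comparison above can be collected in one pass. The following is a minimal sketch that only wraps the two rhc invocations already shown in this comment; the function name `compare_state_and_status` and the `RHC` variable are hypothetical conveniences for illustration, not part of rhc itself.

```shell
#!/bin/sh
# Hypothetical wrapper: RHC names the client command to run (defaults to
# "rhc"); it can be pointed at a stub for dry runs.
RHC=${RHC:-rhc}

# Print the application-level state followed by each cartridge's status.
# Usage: compare_state_and_status APP CARTRIDGE [CARTRIDGE...]
compare_state_and_status() {
  app=$1
  shift
  # Application state as reported through the broker.
  echo "state: $($RHC app show "$app" --state)"
  # Per-cartridge status as reported by each cartridge.
  for cart in "$@"; do
    echo "$cart: $($RHC cartridge status "$cart" -a "$app")"
  done
}
```

For the application in this report, the call would be `compare_state_and_status foo nodejs-0.6 postgresql-9.2`. Nothing in the sketch interprets the output; it only places the two views side by side so the discrepancy is visible.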
Comment 4 Hiro Asari 2013-08-13 14:56:10 EDT
After stopping the postgresql cartridge, you need to access '/postgresql' to trigger the error and status 503.

Thus the more accurate procedure is:

rhc app create foo nodejs-0.6 postgresql-9.2 --from-code \
https://github.com/cjryan/nodejs-bughunting.git
rhc cartridge stop postgresql-9.2 -a foo
rhc cartridge restart nodejs-0.6 -a foo
curl -IL http://foo-*/postgresql
curl -IL http://foo-*/


Then:

$ bx bin/rhc cartridge status nodejs-0.6 -a foo
RESULT:
Application is not running
$ bx bin/rhc cartridge status postgresql-9.2 -a foo
RESULT:
Postgres is not running
$ bx bin/rhc app show foo --state
Cartridge nodejs-0.6, postgresql-9.2 is started
Comment 5 Hiro Asari 2013-08-13 15:02:05 EDT
The CLI ticket is Bug 996713.
Comment 6 Hiro Asari 2013-08-13 15:04:18 EDT
'app show --state' uses broker's response from the REST URL, e.g.,

https://ec2-23-22-236-121.compute-1.amazonaws.com/broker/rest/domains/fooooooooooo/applications/foo/gear_groups

If the primary web framework cartridge dies, as it does here, the broker fails to pick up the correct status of the gear group. (By contrast, if the primary web framework cartridge is stopped correctly, say via 'cartridge stop', the broker updates the status correctly.)

On the other hand, it may be more desirable for 'rhc' to report each cartridge's status for 'app show --state'. (This is Bug 996713.)
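
Per-cartridge reporting of this kind could be derived from the status messages themselves. The sketch below is only an illustration: `classify_status` and `summarize_statuses` are hypothetical helpers, the "down" patterns come solely from messages quoted in this thread, and the "is running" pattern is an assumption about how a healthy cartridge words its status.

```shell
#!/bin/sh
# Hypothetical helper: map one status message to up, down, or unknown.
# "not running" is tested before "is running" so that
# "Application is not running" is never classified as up.
classify_status() {
  case "$1" in
    *"not running"*|*"is stopped"*) echo "down" ;;
    *"is running"*) echo "up" ;;      # assumed wording, not seen in the thread
    *) echo "unknown" ;;
  esac
}

# Hypothetical helper: "degraded" if any message is down, "unknown" if any
# message is unrecognized (and none is down), otherwise "ok".
summarize_statuses() {
  result="ok"
  for msg in "$@"; do
    case "$(classify_status "$msg")" in
      down) echo "degraded"; return 0 ;;
      unknown) result="unknown" ;;
    esac
  done
  echo "$result"
}

# The two messages reported in comment 3:
summarize_statuses "Application is not running" "Postgres is stopped"
# prints: degraded
```

Matching on message text is fragile, since each cartridge words its status differently, so anything unrecognized is reported as unknown rather than guessed.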

I'm sending this to the broker team for further review.
Comment 7 Jhon Honce 2013-08-13 22:19:14 EDT
State is what is intended for the application, while status is per cartridge.

It is expected that a failing application can have a state of 'started' while the web framework cartridge has a status of 'not running'.
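
The distinction can be pictured with a toy model. This is a sketch under stated assumptions, not the broker's implementation: the variable and function names are hypothetical, and the only rule encoded is the one described above, namely that state records what was requested while status records what is observed.

```shell
#!/bin/sh
# Toy model only; all names are hypothetical.
state="stopped"        # intended state: changed only by explicit requests
status="not running"   # observed status: whether the process is alive

request_start() { state="started"; status="running"; }
request_stop()  { state="stopped"; status="not running"; }

# An unexpected crash changes what is observed, not what was requested.
process_crash() { status="not running"; }

request_start
process_crash
echo "state=$state status=$status"
# prints: state=started status=not running
```

In this model a crash leaves the state at 'started' because nobody asked for the application to stop, which matches the behavior reported in this bug.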

Hiro, on restart is the node.js code pooling or waiting on the database connection? As a test, stop the application, start the postgres cartridge, then start the node.js cartridge.  Is that successful?
Comment 8 Hiro Asari 2013-08-13 23:12:25 EDT
I'm going to confine the discussion to how the example application that Chris created behaves.

When the application is restarted, the node.js process is running; 'cartridge state' returns running. I guess 'waiting' is the closer of the two alternatives given, though nothing really waits: the connection attempt is made only when '/postgresql' is accessed, and then the process dies. Yay for callbacks.

The stop-the-app, start-the-db, start-the-nodejs flow results in:

$ bx bin/rhc app stop foo
RESULT:
foo stopped
$ bx bin/rhc app show foo --state
Cartridge nodejs-0.6, postgresql-9.2 is stopped
$ bx bin/rhc cartridge start postgresql-9.2 -a foo
Starting postgresql-9.2 ... done
$ bx bin/rhc app show foo --state                 
Cartridge nodejs-0.6, postgresql-9.2 is stopped
$ bx bin/rhc cartridge start nodejs-0.6 -a foo    
Starting nodejs-0.6 ... done
$ bx bin/rhc app show foo --state             
Cartridge nodejs-0.6, postgresql-9.2 is started

So, at some point, the broker consults the cartridges to figure out that the application's state should be 'started'. It is just not happening when the process dies unexpectedly.
Comment 9 Chris Ryan 2013-08-14 04:12:58 EDT
Not sure if this will change much, but here is a slight addendum to the procedure. This is how we were originally testing to reproduce this bug:

rhc app create foo nodejs-0.6 postgresql-9.2 --from-code https://github.com/cjryan/nodejs-bughunting.git
rhc cartridge stop postgresql-9.2 -a foo
rhc cartridge restart postgresql-9.2 -a foo
curl -IL http://foo-*/

That is, only the database cartridge is stopped/restarted, not the app. Thanks!
Comment 10 Hiro Asari 2013-08-14 14:05:32 EDT
As this discrepancy is by design and documented, I'm closing this as NOTABUG.
