Bug 893348 - [STG]Caught "504 Gateway Time-out" error or "Node Execution failure" when creating scalable app or JBoss app sometimes
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OKD
Classification: Red Hat
Component: Containers
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Dan McPherson
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-01-09 07:54 UTC by Mengjiao Gao
Modified: 2015-05-14 23:04 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-02-13 23:03:50 UTC
Target Upstream Version:
Embargoed:


Attachments
Caught "504 Gateway Time-out" error when creating scalable app (76.25 KB, image/png)
2013-01-09 07:54 UTC, Mengjiao Gao
jenkins server log (29.53 KB, text/x-log)
2013-01-25 10:47 UTC, Jianwei Hou

Description Mengjiao Gao 2013-01-09 07:54:17 UTC
Created attachment 675296
Caught "504 Gateway Time-out" error when creating scalable app

Description of problem:
A "504 Gateway Time-out" error is returned when creating a scalable app, although the scalable app is in fact created successfully.

Version-Release number of selected component (if applicable):
STG

How reproducible:
always

Steps to Reproduce:
1. Go to https://stg.openshift.redhat.com and log in.
2. Try to create a scalable non-template or template app.
3. Check the list of created apps.
4. Check the created scalable app.
  
Actual results:
2. The user gets a page that says "504 Gateway Time-out. The server didn't respond in time." and cannot reach the app's get_started page.
3. The scalable app created at step 2 appears in the app list.
4. The URL of the scalable app created at step 2 can be accessed successfully.

Expected results:
2. The user should reach the app's get_started page without error.

Additional info:
This seems to be a server problem rather than a functional problem.

Comment 1 Zhe Wang 2013-01-11 04:50:22 UTC
Error 504 was returned when creating scaling apps via both rhc and the REST API.

However, for a scaling jbossas-7 app, although its state was "started" after creation (with error 504 being returned), its URL was inaccessible. Moreover, for all other types of scaling apps, their states were "started" and they could be accessed via their URLs.

Comment 2 Wei Sun 2013-01-11 06:37:39 UTC
Tested on STG today: the issue was reproduced as described in the Description. Adding a db cartridge to the app also returned error 504, but the db cartridge was embedded successfully.

Comment 3 Zhe Wang 2013-01-11 09:49:46 UTC
In today's automated and manual testing, the time-out problem occurred whenever I created a scaling app (of any type that supports scaling), but the scaling app was indeed created (the gear groups' states were "started", and the URLs were accessible). On the other hand, there was no problem creating non-scaling apps.
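
The behaviour reported so far (the request returns 504, yet the app exists afterwards) suggests that a client could treat a 504 as "outcome unknown" rather than as a failure. A minimal Python sketch of that idea follows. The callables `create_app` and `list_apps` are hypothetical stand-ins for whatever client is in use (rhc or the REST API); they are not real OpenShift APIs.

```python
import time


def create_and_confirm(create_app, list_apps, name, retries=5, delay=0.0):
    """Create an app; on HTTP 504, check whether it was created anyway.

    create_app(name) -> int   hypothetical: returns the HTTP status code
    list_apps() -> list[str]  hypothetical: returns existing app names
    """
    status = create_app(name)
    if status != 504:
        # Any non-504 answer is taken at face value.
        return 200 <= status < 300
    # A 504 only says the proxy gave up waiting; the work may still finish.
    for _ in range(retries):
        if name in list_apps():
            return True
        time.sleep(delay)
    return False


# Stand-ins that mimic the reported behaviour: 504 returned, app created.
apps = []


def fake_create(name):
    apps.append(name)
    return 504


print(create_and_confirm(fake_create, lambda: list(apps), "myscalableapp"))
```

With the stand-ins above, the call reports success even though the create step answered 504, which matches what the testers observed by checking the app list manually.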

Comment 4 chunchen 2013-01-11 09:57:37 UTC
After altering the domain name, all the old apps were broken, but new apps could be created in the renamed domain.


Version-Release number of selected component (if applicable):
STG

How reproducible:
always

Steps to Reproduce:
1. Create a domain
  rhc domain create dccy2
2. Create 7 apps, including scalable and non-scalable apps
3. Alter the domain name
  rhc domain update dccy2 dccy3
4. Show the domain
  rhc domain show
5. All operations on the old apps fail, for example:
  rhc app show <appname>
  rhc app delete <appname>
  rhc cartridge remove <DB_cartridge> -a <appname>
  ...and so on
6. Access all the old apps' URLs via a browser
  
Actual results:
Step 3. Server returned an unexpected error code: 504
Step 4. Domain dccy2 not found
Step 5. Domain dccy2 not found
Step 6. The apps' URLs are changed from "http://<app-url>" to "https://<app-url>/app", and the content of non-jbossas apps becomes that of jbossas.

Expected results:
The domain name should be altered successfully without returning any error.

Additional info:

Comment 5 Matt Woodson 2013-01-11 14:23:57 UTC
I believe I have fixed this issue.

I have had a project to implement haproxy at our proxy node layer.  In doing so, haproxy has a 1 minute server timeout.  This was kicking in after 1 minute, and giving back the 504 error.

I have extended the timeout to 5 minutes.  I have done some initial testing.  Before the change, app creation would time out.  After the change, creation runs to completion.

Comment 6 Peter Ruan 2013-01-11 17:54:11 UTC
I'm still experiencing the following error when I tried to create a scalable app in STG.

OP status: Gateway Time-out

Comment 7 Peter Ruan 2013-01-11 23:33:12 UTC
twiest fixed the issue and I was able to create scalable apps of different types in STG. Putting status as VERIFIED.

Comment 8 Zhe Wang 2013-01-25 06:59:44 UTC
"Node execution failure" is reported when creating apps in STG.

Comment 9 Xiaoli Tian 2013-01-25 07:02:59 UTC
It happened again today on the current stage (devenv-stage_278): creating a scaling app, or a jbossas7- or jbossews-related application, fails with "Server returned an unexpected error code: 504" or "Node Execution failure".

Some logs from mcollective (captured by whearn):

Couldn't determine IP for cartridge haproxy-1.4
        Cart namespace: HAPROXY
        Lookup order: [:OPENSHIFT_HAPROXY_IP, :OPENSHIFT_HAPROXY_DB_HOST]
        Env: {:OPENSHIFT_GEAR_UUID=>"4742c367289a44f292312d17f2a426c1", :OPENSHIFT_DATA_DIR=>"/var/lib/openshift/4742c367289a44f292312d17f2a426c1/app-root/data/", :OPENSHIFT_GEAR_NAME=>"jbosseap23423", :OPENSHIFT_REPO_DIR=>"/var/lib/openshift/4742c367289a44f292312d17f2a426c1/app-root/runtime/repo/", :OPENSHIFT_HAPROXY_INTERNAL_IP=>"127.13.76.1", :OPENSHIFT_APP_DNS=>"jbosseap23423-42gx9o0fl6.stg.rhcloud.com", :OPENSHIFT_GEAR_DNS=>"jbosseap23423-42gx9o0fl6.stg.rhcloud.com", :OPENSHIFT_APP_UUID=>"4742c367289a44f292312d17f2a426c1", :OPENSHIFT_HOMEDIR=>"/var/lib/openshift/4742c367289a44f292312d17f2a426c1/", :OPENSHIFT_APP_NAME=>"jbosseap23423", :HISTFILE=>"/var/lib/openshift/4742c367289a44f292312d17f2a426c1/app-root/data/.bash_history", :PATH=>"/usr/libexec/openshift/cartridges/embedded/haproxy-1.4/info/bin/:'/usr/libexec/openshift/cartridgesabstract-httpd/info/bin/:/usr/libexec/openshift/cartridgesabstract/info/bin/:$PATH'", :OPENSHIFT_HAPROXY_STATUS_IP=>"127.13.76.2", :OPENSHIFT_TMP_DIR=>"/tmp/"}
      
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.3.6/lib/openshift-origin-node/model/application_container.rb:426:in `get_cart_ip'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.3.6/lib/openshift-origin-node/model/application_container.rb:351:in `delete_endpoints'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.3.6/bin/oo-delete-endpoints:71:in `<top (required)>'
/usr/bin/oo-delete-endpoints:23:in `load'
/usr/bin/oo-delete-endpoints:23:in `<main>'
/usr/libexec/openshift/cartridges/abstract/info/lib/util: line 950: cat: command not found
/usr/libexec/openshift/cartridges/abstract/info/lib/util: line 966: rm: command not found
/usr/libexec/openshift/cartridges/abstract/info/lib/util: line 968: chown: command not found
/usr/libexec/openshift/cartridges/abstract/info/lib/util: line 969: chcon: command not found
/usr/libexec/openshift/cartridges/abstract/info/lib/util: line 970: chmod: command not found
/usr/libexec/openshift/cartridges/abstract/info/lib/util: line 196: tac: command not found
/usr/libexec/openshift/cartridges/abstract/info/lib/util: line 234: tr: command not found
/usr/libexec/openshift/cartridges/abstract/info/lib/util: line 234: sed: command not found
/usr/libexec/openshift/cartridges/abstract/info/lib/util: line 234: sort: command not found
/usr/libexec/openshift/cartridges/abstract/info/lib/util: line 734: find: command not found
/usr/libexec/openshift/cartridges/embedded/haproxy-1.4/info/hooks/deconfigure: line 54: sed: command not found
cat: /var/lib/openshift/4742c367289a44f292312d17f2a426c1//haproxy-1.4/run/haproxy.pid: No such file or directory
kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
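
One possible reading of the "command not found" lines, not confirmed anywhere in this report: the PATH value in the Env dump above ends in a quoted, unexpanded `$PATH`, so the standard system directories might not have been on the search path when the hook scripts ran. The Python sketch below splits that PATH string (copied from the log) and checks for the usual directories. The set of "usual" directories is an assumption about the node's layout.

```python
# PATH value copied verbatim from the Env dump in the mcollective log.
LOGGED_PATH = (
    "/usr/libexec/openshift/cartridges/embedded/haproxy-1.4/info/bin/:"
    "'/usr/libexec/openshift/cartridgesabstract-httpd/info/bin/:"
    "/usr/libexec/openshift/cartridgesabstract/info/bin/:$PATH'"
)

# Assumption: where cat, rm, sed and friends normally live on the node.
USUAL_DIRS = {"/bin", "/usr/bin"}


def path_entries(path_value):
    """Split a PATH string on ':' only, as command lookup does."""
    return path_value.split(":")


entries = path_entries(LOGGED_PATH)
# True only if some entry is literally one of the usual directories.
has_system_dir = any(e.rstrip("/") in USUAL_DIRS for e in entries)
# Entries that still carry a quote or a '$', i.e. were never expanded.
unexpanded = [e for e in entries if "$" in e or "'" in e]

print(has_system_dir)  # False
print(unexpanded)
```

Note that the later `cat: ... No such file or directory` line shows that `cat` was found at that point, so at most some of the scripts ran with this PATH; the sketch only illustrates why the dumped value would not resolve basic commands.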

Comment 10 Xiaoli Tian 2013-01-25 09:53:51 UTC
More errors:

Running Command - rhc app create jbossewsaeyonhv jbossews-1.0 -l bmeng+1 -p changeme --timeout 360 -s 
Application Options
-------------------
Namespace: zfspq5qemy
Cartridges: jbossews-1.0
Gear Size: default
Scaling: yes

Creating application 'jbossewsaeyonhv' ... An error occurred while communicating with the server. This problem may only be temporary. Check that you have correctly specified your OpenShift server 'https://stg.openshift.redhat.com/broker/rest/domains/zfspq5qemy/applications'.
Command Return: 1
ERROR!

Comment 11 Jianwei Hou 2013-01-25 10:47:16 UTC
Created attachment 687398
jenkins server log

I had a hard time doing Jenkins builds after the upgrade on stage.
The Jenkins server was very slow: I waited around 40 minutes for the Jenkins web console to become available. Moreover, I was unable to build my old apps with Jenkins, because the slave apps could not be created successfully.

It seems more like a problem with the node itself than a functional problem. I'm not sure what's going on with the nodes on stage, and I can't provide logs for broker and mcollective.

I have attached the Jenkins server log.

Comment 12 Peter Ruan 2013-01-25 20:49:44 UTC
After the hotfix, I can now create scalable jboss apps in STG.
[peter@unused-32-138 junk]$ rhc app create jbossewsaeyonhv jbossews-1.0 -s 
Password: ********

Application Options
===================
  Gear Size: default
  Namespace: migration
  Cartridge: jbossews-1.0
  Scaling:   yes

Creating application 'jbossewsaeyonhv' ... done

Waiting for your DNS name to be available ... done

Downloading the application Git repository ...
Cloning into 'jbossewsaeyonhv'...
Warning: Permanently added 'jbossewsaeyonhv-migration.stg.rhcloud.com' (RSA) to the list of known hosts.

Your application code is now in 'jbossewsaeyonhv'

jbossewsaeyonhv @ http://jbossewsaeyonhv-migration.stg.rhcloud.com/ (uuid: 464527a7a83744fda5416252ce5f7151)
============================================================================================================
  Created:   3:46 PM
  Gear Size: small
  Git URL:   ssh://464527a7a83744fda5416252ce5f7151.rhcloud.com/~/git/jbossewsaeyonhv.git/
  SSH:       464527a7a83744fda5416252ce5f7151.rhcloud.com

  jbossews-1.0 (Tomcat 6 (JBoss EWS 1.0))
  =======================================
    Scaling: x2 (minimum: 2, maximum: available) on small gears

  haproxy-1.4 (OpenShift Web Balancer)
  ====================================

RESULT:
Application jbossewsaeyonhv was created.

Comment 13 Xiaoli Tian 2013-01-29 13:41:26 UTC
Have to re-open this bug to track the issue in STG:

Tested in STG (devenv-stage_281). Right after STG finished upgrading, creating a scaling application or a JBoss application failed with high frequency (90%), and other operations, such as adding db cartridges, sometimes failed as well.

After some work by the OPS team, the success rate is increasing, but operations such as creating a JBoss application, creating a scaling application, or adding a postgresql cartridge may still fail sometimes:

Running Command - rhc cartridge add mongodb-2.2 -a mongoueyyfxu1 -l bmeng+1 -p '123123' --timeout 360
Adding mongodb-2.2 to application 'mongoueyyfxu1' ... Node execution failure (invalid exit code from node). If the problem persists please contact Red Hat support.
Reference ID: fda2fa6d053f4d939f8913cc091cba31
Command Return: 1
ERROR [06:14:25] Adding mongodb-2.2 to application 'mongoueyyfxu1' ... Node execution failure (invalid exit code from node). If the problem persists please contact Red Hat support.
Reference ID: fda2fa6d053f4d939f8913cc091cba31

Scale-up:

INFO [04:38:54]: Action: SCALE_UP
DIAGNOSTIC: URL: https://stg.openshift.redhat.com/broker/rest/domains/xbna1th8sk/applications/7k68qry5er/events
DIAGNOSTIC: Response of non 'OK' [status/data]: Internal Server Error/None
ERROR [04:39:27] {"data":null,"messages":[{"exit_code":120,"field":null,"severity":"error","text":"Node execution failure (invalid exit code from node). If the problem persists please contact Red Hat support.\nReference ID: 34ab6ea1e5c2422d99d5cc20da07637c"}],"status":"internal_server_error","supported_api_versions":[1.0,1.1,1.2,1.3],"type":null,"version":"1.3"}
FAILED: Caught Fail exception: Expected return: 0, got 1
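
When triaging many such failures, it can help to pull the exit code and Reference ID out of the broker's JSON error body. The following Python sketch does that for the response logged above (copied verbatim). The helper name `summarize_errors` is made up for illustration, and the field names are taken only from this one logged response, not from any API documentation.

```python
import json
import re

# Response body copied from the SCALE_UP log above.
# The raw string keeps the JSON "\n" escape intact.
BODY = r'''{"data":null,"messages":[{"exit_code":120,"field":null,"severity":"error","text":"Node execution failure (invalid exit code from node). If the problem persists please contact Red Hat support.\nReference ID: 34ab6ea1e5c2422d99d5cc20da07637c"}],"status":"internal_server_error","supported_api_versions":[1.0,1.1,1.2,1.3],"type":null,"version":"1.3"}'''


def summarize_errors(body):
    """Return (exit_code, reference_id) for each error-severity message."""
    doc = json.loads(body)
    out = []
    for msg in doc.get("messages") or []:
        if msg.get("severity") != "error":
            continue
        # The Reference ID is embedded in the free-text message.
        match = re.search(r"Reference ID: ([0-9a-f]+)", msg.get("text") or "")
        out.append((msg.get("exit_code"), match.group(1) if match else None))
    return out


print(summarize_errors(BODY))
```

For the body above this yields one pair: exit code 120 with Reference ID 34ab6ea1e5c2422d99d5cc20da07637c.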



Running Command - rhc app create c87vxn2sod jbossas-7 -l wsun+2 -p redhat -r ./c87vxn2sod --timeout 360 -s 
Application Options
-------------------
Namespace: 3z0ixz89t9
Cartridges: jbossas-7
Gear Size: default
Scaling: yes

Creating application 'c87vxn2sod' ... Server returned an unexpected error code: 504

Comment 14 Dan Mace 2013-01-29 17:59:04 UTC
The haproxy deconfigure exception noted in comment #11 is unrelated to the configure timeouts originally reported in the issue and doesn't cause the deconfigure operation to fail, regardless. I filed a new bug for the deconfigure issue:

https://bugzilla.redhat.com/show_bug.cgi?id=905568

Comment 15 Dan McPherson 2013-01-29 23:44:07 UTC
We have lowered the number of apps in stage to be closer to prod.  I haven't been able to create a failure all day.  But it is still painfully slow.  I also tested the previous stage release for a regression and I find the times to be very similar.  At this point I believe we have a general performance issue that is pushed over the edge by the number of apps on the system.  Please retest with the new state of stage and let me know if your experience is similar.

Comment 16 Xiaoli Tian 2013-01-30 02:39:17 UTC
Quick update on the current results on STG (devenv-stage_281):

The success rate is high: many scaling apps, such as JBoss-related scaling apps, could be created successfully. It takes about 3 minutes to create a scaling ruby app. More information about the other cartridges will be added after today's testing is finished.

Comment 17 Meng Bo 2013-01-30 03:03:18 UTC
Checked on the latest stage (devenv-stage_281): all the scalable apps can be created successfully.

There was only one failure (Node execution failure) during my testing, in which I created every kind of scalable app.

The timings are listed below.

RESULT:
Application ruby19s was created.
real    2m59.792s
user    0m1.009s
sys     0m0.108s

RESULT:
Application jbossas1s was created.
real    4m35.146s
user    0m1.004s
sys     0m0.100s

RESULT:
Application php1s was created.
real    2m14.045s
user    0m1.031s
sys     0m0.080s

RESULT:
Application perl1s was created.
real    2m22.067s
user    0m1.254s
sys     0m0.106s

RESULT:
Application python1s was created.
real    2m26.713s
user    0m1.027s
sys     0m0.090s

RESULT:
Application nodejs1s was created.
real    3m58.618s
user    0m1.000s
sys     0m0.097s

RESULT:
Application ruby18s was created.
real    3m37.229s
user    0m1.243s
sys     0m0.104s

RESULT:
Application jbosseap1s was created.
real    5m22.615s
user    0m1.061s
sys     0m0.106s

RESULT:
Application jbossews1s was created.
real    3m52.851s
user    0m1.017s
sys     0m0.097s

RESULT:
Application jbossews2s was created.
real    3m40.123s
user    0m1.160s
sys     0m0.093s

Mark this bug as fixed.
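
To relate the timings above to the haproxy server timeouts mentioned in comment 5 (originally 1 minute, later extended to 5 minutes), the "real" values can be converted to seconds and compared against both limits. This Python sketch is only a rough comparison: the measured time covers the whole `rhc app create` run, which, judging by the transcript in comment 12, also includes waiting for DNS and cloning the Git repository, and this report does not say which timeout was in effect on devenv-stage_281.

```python
import re

# "real" times copied from the listing above.
REAL_TIMES = {
    "ruby19s": "2m59.792s",
    "jbossas1s": "4m35.146s",
    "php1s": "2m14.045s",
    "perl1s": "2m22.067s",
    "python1s": "2m26.713s",
    "nodejs1s": "3m58.618s",
    "ruby18s": "3m37.229s",
    "jbosseap1s": "5m22.615s",
    "jbossews1s": "3m52.851s",
    "jbossews2s": "3m40.123s",
}


def to_seconds(real):
    """Convert the 'XmY.ZZZs' format used in the listing to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", real)
    if match is None:
        raise ValueError(f"unrecognized time: {real!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def over(limit_seconds):
    """Names of apps whose creation took longer than limit_seconds."""
    return sorted(n for n, t in REAL_TIMES.items() if to_seconds(t) > limit_seconds)


print(over(60))   # compared with the original 1 minute timeout
print(over(300))  # compared with the extended 5 minute timeout
```

Every run in the listing took longer than 1 minute end to end, and only jbosseap1s took longer than 5 minutes, which is consistent with comment 15's observation that creation is still slow even when it succeeds.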

