Created attachment 675296 [details]
Caught "504 Gateway Time-out" error when creating scalable app

Description of problem:
Creating a scalable app returns a "504 Gateway Time-out" error, even though the scalable app is in fact created successfully.

Version-Release number of selected component (if applicable):
STG

How reproducible:
always

Steps to Reproduce:
1. Go to https://stg.openshift.redhat.com and log in.
2. Try to create a scalable non-template or template app.
3. Check the created app list.
4. Check the created scalable app.

Actual results:
2. The user gets a page saying "504 Gateway Time-out. The server didn't respond in time." and does not reach the app get_started page.
3. The scalable app created at step 2 is listed.
4. The app URL of the scalable app created at step 2 can be accessed successfully.

Expected results:
2. The user should reach the app get_started page without error.

Additional info:
This seems to be a server problem rather than a functional problem.
returned error 504 when creating scaling apps via both rhc and the REST API. For a scaling jbossas-7 app, the state was "started" after creation (with error 504 being returned), but its URL was inaccessible. By contrast, all other types of scaling apps were in the "started" state and could be accessed via their URLs.
Tested on STG today: the same issue was reproduced as in the Description. Adding a db cartridge to the app also returned error 504, but the db cartridge was embedded successfully.
In today's automated and manual testing, the time-out occurred whenever I created a scaling app (of any type that supports scaling), but the scaling app was in fact created (the gear groups' states were "started", and the URLs were accessible). On the other hand, there was no problem creating non-scaling apps.
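The behaviour reported so far (a 504 reply while the app is created anyway) suggests that a test client should treat 504 as "outcome unknown" and check the app list before declaring a failure. The following is only a minimal sketch of that idea: `create_and_verify`, `create`, `list_apps` and `fake_create` are hypothetical names for illustration and are not part of rhc or the broker REST API.

```python
from typing import Callable, Iterable, List

GATEWAY_TIMEOUT = 504


def create_and_verify(
    name: str,
    create: Callable[[str], int],
    list_apps: Callable[[], Iterable[str]],
) -> str:
    """Return 'created', 'created-after-timeout', or 'failed'."""
    status = create(name)
    if 200 <= status < 300:
        return "created"
    if status == GATEWAY_TIMEOUT and name in set(list_apps()):
        # The proxy stopped waiting, but the server side finished the work.
        return "created-after-timeout"
    return "failed"


# Stand-ins (assumptions) that mimic the behaviour described in this report.
apps: List[str] = []


def fake_create(name: str) -> int:
    apps.append(name)  # the app is created on the server side ...
    return 504         # ... but the client only sees a gateway timeout


result = create_and_verify("myscalable", fake_create, lambda: apps)
print(result)
```

In a real test harness the two callables would wrap whatever client is in use (rhc or REST calls); that wiring is left out here on purpose.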
After altering the domain name, all the old apps were broken, but new apps could be created in the renamed domain.

Version-Release number of selected component (if applicable):
STG

How reproducible:
always

Steps to Reproduce:
1. Create a domain:
   rhc domain create dccy2
2. Create 7 apps, including scalable and non-scalable apps.
3. Alter the domain name:
   rhc domain update dccy2 dccy3
4. Show the domain:
   rhc domain show
5. All operations on the old apps fail, for example:
   rhc app show <appname>
   rhc app delete <appname>
   rhc cartridge remove <DB_cartridge> -a <appname>
   ...and so on
6. Access all the old apps' URLs via a browser.

Actual results:
Step 3. Server returned an unexpected error code: 504
Step 4. Domain dccy2 not found
Step 5. Domain dccy2 not found
Step 6. The app URLs change from "http://<app-url>" to "https://<app-url>/app", and the content of non-jbossas apps becomes that of jbossas.

Expected results:
The domain name is altered successfully, without any error.

Additional info:
I believe I have fixed this issue. I have been working on a project to implement haproxy at our proxy node layer. As part of that change, haproxy was running with a 1 minute server timeout, which kicked in after 1 minute and returned the 504 error. I have extended the timeout to 5 minutes and done some initial testing. Before the change, app creation timed out. After the change, app creation completes.
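To make the effect of the timeout change concrete, here is a small model of a proxy that answers 504 once the backend takes longer than its server timeout. It is a simplification for illustration, not haproxy's actual logic; `proxy_status` is a hypothetical helper and the 180-second creation time is an assumed example value.

```python
def proxy_status(backend_seconds: float, server_timeout_seconds: float) -> int:
    """HTTP status a client would see from a proxy with a server-side timeout."""
    return 504 if backend_seconds > server_timeout_seconds else 200


OLD_TIMEOUT = 60      # 1 minute, as described in the comment above
NEW_TIMEOUT = 5 * 60  # 5 minutes, as described in the comment above

creation_seconds = 180  # assumption: an app creation that takes 3 minutes

before = proxy_status(creation_seconds, OLD_TIMEOUT)
after = proxy_status(creation_seconds, NEW_TIMEOUT)
print(before, after)
```

The model also shows why the app can exist despite the error: the timeout only ends the proxy's wait and says nothing about whether the backend finished.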
I'm still experiencing the following error when I tried to create a scalable app in STG. OP status: Gateway Time-out
twiest fixed the issue and I was able to create scalable apps of different types in STG. Setting status to VERIFIED.
"Node execution failure" is reported when creating apps in STG.
It happened again today on the current stage (devenv-stage_278): creating a scaling app, or a jbossas7 or jbossews related application, fails with "Server returned an unexpected error code: 504" or "Node Execution failure".
Some logs from mcollective (captured by whearn):
Couldn't determine IP for cartridge haproxy-1.4
Cart namespace: HAPROXY
Lookup order: [:OPENSHIFT_HAPROXY_IP, :OPENSHIFT_HAPROXY_DB_HOST]
Env: {:OPENSHIFT_GEAR_UUID=>"4742c367289a44f292312d17f2a426c1", :OPENSHIFT_DATA_DIR=>"/var/lib/openshift/4742c367289a44f292312d17f2a426c1/app-root/data/", :OPENSHIFT_GEAR_NAME=>"jbosseap23423", :OPENSHIFT_REPO_DIR=>"/var/lib/openshift/4742c367289a44f292312d17f2a426c1/app-root/runtime/repo/", :OPENSHIFT_HAPROXY_INTERNAL_IP=>"127.13.76.1", :OPENSHIFT_APP_DNS=>"jbosseap23423-42gx9o0fl6.stg.rhcloud.com", :OPENSHIFT_GEAR_DNS=>"jbosseap23423-42gx9o0fl6.stg.rhcloud.com", :OPENSHIFT_APP_UUID=>"4742c367289a44f292312d17f2a426c1", :OPENSHIFT_HOMEDIR=>"/var/lib/openshift/4742c367289a44f292312d17f2a426c1/", :OPENSHIFT_APP_NAME=>"jbosseap23423", :HISTFILE=>"/var/lib/openshift/4742c367289a44f292312d17f2a426c1/app-root/data/.bash_history", :PATH=>"/usr/libexec/openshift/cartridges/embedded/haproxy-1.4/info/bin/:'/usr/libexec/openshift/cartridgesabstract-httpd/info/bin/:/usr/libexec/openshift/cartridgesabstract/info/bin/:$PATH'", :OPENSHIFT_HAPROXY_STATUS_IP=>"127.13.76.2", :OPENSHIFT_TMP_DIR=>"/tmp/"}
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.3.6/lib/openshift-origin-node/model/application_container.rb:426:in `get_cart_ip'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.3.6/lib/openshift-origin-node/model/application_container.rb:351:in `delete_endpoints'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.3.6/bin/oo-delete-endpoints:71:in `<top (required)>'
/usr/bin/oo-delete-endpoints:23:in `load'
/usr/bin/oo-delete-endpoints:23:in `<main>'
/usr/libexec/openshift/cartridges/abstract/info/lib/util: line 950: cat: command not found
/usr/libexec/openshift/cartridges/abstract/info/lib/util: line 966: rm: command not found
/usr/libexec/openshift/cartridges/abstract/info/lib/util: line 968: chown: command not found
/usr/libexec/openshift/cartridges/abstract/info/lib/util: line 969: chcon: command not found
/usr/libexec/openshift/cartridges/abstract/info/lib/util: line 970: chmod: command not found
/usr/libexec/openshift/cartridges/abstract/info/lib/util: line 196: tac: command not found
/usr/libexec/openshift/cartridges/abstract/info/lib/util: line 234: tr: command not found
/usr/libexec/openshift/cartridges/abstract/info/lib/util: line 234: sed: command not found
/usr/libexec/openshift/cartridges/abstract/info/lib/util: line 234: sort: command not found
/usr/libexec/openshift/cartridges/abstract/info/lib/util: line 734: find: command not found
/usr/libexec/openshift/cartridges/embedded/haproxy-1.4/info/hooks/deconfigure: line 54: sed: command not found
cat: /var/lib/openshift/4742c367289a44f292312d17f2a426c1//haproxy-1.4/run/haproxy.pid: No such file or directory
kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
More errors:
Running Command - rhc app create jbossewsaeyonhv jbossews-1.0 -l bmeng+1 -p changeme --timeout 360 -s
Application Options
-------------------
Namespace: zfspq5qemy
Cartridges: jbossews-1.0
Gear Size: default
Scaling: yes
Creating application 'jbossewsaeyonhv' ...
An error occurred while communicating with the server. This problem may only be temporary. Check that you have correctly specified your OpenShift server 'https://stg.openshift.redhat.com/broker/rest/domains/zfspq5qemy/applications'.
Command Return: 1
ERROR!
Created attachment 687398 [details]
jenkins server log
I had a hard time doing Jenkins builds after the upgrade on stage. The Jenkins server is very slow: I waited around 40 minutes for the Jenkins web console to become available. Moreover, I was unable to build my old apps with Jenkins because the slave apps could not be created. It seems more like a problem with the nodes themselves than a functional problem. I'm not sure what's going on with the nodes on stage, and I can't provide logs for the broker or mcollective. I have attached the Jenkins server log.
After the hotfix, I can now create scalable jboss apps in STG.
[peter@unused-32-138 junk]$ rhc app create jbossewsaeyonhv jbossews-1.0 -s
Password: ********
Application Options
===================
Gear Size: default
Namespace: migration
Cartridge: jbossews-1.0
Scaling: yes
Creating application 'jbossewsaeyonhv' ... done
Waiting for your DNS name to be available ... done
Downloading the application Git repository ...
Cloning into 'jbossewsaeyonhv'...
Warning: Permanently added 'jbossewsaeyonhv-migration.stg.rhcloud.com' (RSA) to the list of known hosts.
Your application code is now in 'jbossewsaeyonhv'
jbossewsaeyonhv @ http://jbossewsaeyonhv-migration.stg.rhcloud.com/ (uuid: 464527a7a83744fda5416252ce5f7151)
============================================================================================================
Created: 3:46 PM
Gear Size: small
Git URL: ssh://464527a7a83744fda5416252ce5f7151.rhcloud.com/~/git/jbossewsaeyonhv.git/
SSH: 464527a7a83744fda5416252ce5f7151.rhcloud.com
jbossews-1.0 (Tomcat 6 (JBoss EWS 1.0))
=======================================
Scaling: x2 (minimum: 2, maximum: available) on small gears
haproxy-1.4 (OpenShift Web Balancer)
====================================
RESULT: Application jbossewsaeyonhv was created.
Re-opening this bug to track the issue in STG.
Tested in STG (devenv-stage_281). Right after STG finished upgrading, creating a scaling application or a jboss application failed at a high rate (90%), and other operations such as adding db cartridges sometimes failed as well. After some operations by the OPS team the success rate has increased, but any of these operations (creating a jboss application, creating a scaling application, adding a postgresql cartridge, etc.) may still fail at times:
Running Command - rhc cartridge add mongodb-2.2 -a mongoueyyfxu1 -l bmeng+1 -p '123123' --timeout 360
Adding mongodb-2.2 to application 'mongoueyyfxu1' ...
Node execution failure (invalid exit code from node). If the problem persists please contact Red Hat support.
Reference ID: fda2fa6d053f4d939f8913cc091cba31
Command Return: 1
ERROR [06:14:25] Adding mongodb-2.2 to application 'mongoueyyfxu1' ... Node execution failure (invalid exit code from node). If the problem persists please contact Red Hat support. Reference ID: fda2fa6d053f4d939f8913cc091cba31
Scaleup:
INFO [04:38:54]: Action: SCALE_UP
DIAGNOSTIC: URL: https://stg.openshift.redhat.com/broker/rest/domains/xbna1th8sk/applications/7k68qry5er/events
DIAGNOSTIC: Response of non 'OK' [status/data]: Internal Server Error/None
ERROR [04:39:27] {"data":null,"messages":[{"exit_code":120,"field":null,"severity":"error","text":"Node execution failure (invalid exit code from node). If the problem persists please contact Red Hat support.\nReference ID: 34ab6ea1e5c2422d99d5cc20da07637c"}],"status":"internal_server_error","supported_api_versions":[1.0,1.1,1.2,1.3],"type":null,"version":"1.3"}
FAILED: Caught Fail exception: Expected return: 0, got 1
Running Command - rhc app create c87vxn2sod jbossas-7 -l wsun+2 -p redhat -r ./c87vxn2sod --timeout 360 -s
Application Options
-------------------
Namespace: 3z0ixz89t9
Cartridges: jbossas-7
Gear Size: default
Scaling: yes
Creating application 'c87vxn2sod' ...
Server returned an unexpected error code: 504
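For anyone triaging these failures from test logs, the exit code and Reference ID can be pulled out of the broker's JSON error body. The sketch below uses the response body quoted above verbatim; `summarize_errors` is a hypothetical helper, and the assumption that the Reference ID always appears as hex text after "Reference ID:" is based only on the samples in this report.

```python
import json
import re

# Response body copied from the scale-up failure quoted above.
body = r'{"data":null,"messages":[{"exit_code":120,"field":null,"severity":"error","text":"Node execution failure (invalid exit code from node). If the problem persists please contact Red Hat support.\nReference ID: 34ab6ea1e5c2422d99d5cc20da07637c"}],"status":"internal_server_error","supported_api_versions":[1.0,1.1,1.2,1.3],"type":null,"version":"1.3"}'


def summarize_errors(raw: str):
    """Return (exit_code, reference_id) pairs for error-severity messages."""
    payload = json.loads(raw)
    found = []
    for message in payload.get("messages") or []:
        if message.get("severity") != "error":
            continue
        # Assumption: the ID is a hex string following "Reference ID:".
        match = re.search(r"Reference ID:\s*([0-9a-f]+)", message.get("text") or "")
        found.append((message.get("exit_code"), match.group(1) if match else None))
    return found


errors = summarize_errors(body)
print(errors)
```

Collecting these pairs per run makes it easier to hand specific Reference IDs to whoever has access to the broker and mcollective logs.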
The haproxy deconfigure exception noted in comment #11 is unrelated to the configure timeouts originally reported in the issue and doesn't cause the deconfigure operation to fail, regardless. I filed a new bug for the deconfigure issue: https://bugzilla.redhat.com/show_bug.cgi?id=905568
We have lowered the number of apps in stage to be closer to prod. I haven't been able to reproduce a failure all day, but it is still painfully slow. I also tested the previous stage release for a regression and found the times to be very similar. At this point I believe we have a general performance issue that is pushed over the edge by the number of apps on the system. Please retest with the new state of stage and let me know if your experience is similar.
Quick update on the current results on STG (devenv-stage_281): the success rate is high, and many scaling apps, such as JBoss related scaling apps, can be created successfully. It takes about 3 minutes to create a scaling ruby app. More information about the other cartridges will be added once today's testing is finished.
Checked on the latest stage (devenv-stage_281): all the scalable apps can be created successfully. There was only one failure (Node execution failure) during my testing, which created every kind of scalable app. The timings are listed below.
RESULT: Application ruby19s was created.
real 2m59.792s
user 0m1.009s
sys 0m0.108s
RESULT: Application jbossas1s was created.
real 4m35.146s
user 0m1.004s
sys 0m0.100s
RESULT: Application php1s was created.
real 2m14.045s
user 0m1.031s
sys 0m0.080s
RESULT: Application perl1s was created.
real 2m22.067s
user 0m1.254s
sys 0m0.106s
RESULT: Application python1s was created.
real 2m26.713s
user 0m1.027s
sys 0m0.090s
RESULT: Application nodejs1s was created.
real 3m58.618s
user 0m1.000s
sys 0m0.097s
RESULT: Application ruby18s was created.
real 3m37.229s
user 0m1.243s
sys 0m0.104s
RESULT: Application jbosseap1s was created.
real 5m22.615s
user 0m1.061s
sys 0m0.106s
RESULT: Application jbossews1s was created.
real 3m52.851s
user 0m1.017s
sys 0m0.097s
RESULT: Application jbossews2s was created.
real 3m40.123s
user 0m1.160s
sys 0m0.093s
Marking this bug as fixed.
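The "real" times above are easier to compare once converted to seconds. The sketch below copies the ten values from this comment and reports the fastest and slowest runs, plus how many exceed the 1 minute server timeout mentioned earlier in this bug; `to_seconds` is a hypothetical helper written for this illustration.

```python
import re

# "real" wall-clock times copied from the timing list above.
REAL_TIMES = {
    "ruby19s": "2m59.792s",
    "jbossas1s": "4m35.146s",
    "php1s": "2m14.045s",
    "perl1s": "2m22.067s",
    "python1s": "2m26.713s",
    "nodejs1s": "3m58.618s",
    "ruby18s": "3m37.229s",
    "jbosseap1s": "5m22.615s",
    "jbossews1s": "3m52.851s",
    "jbossews2s": "3m40.123s",
}


def to_seconds(stamp: str) -> float:
    """Convert a `time`-style value such as '2m59.792s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


seconds = {app: to_seconds(stamp) for app, stamp in REAL_TIMES.items()}
fastest = min(seconds, key=seconds.get)
slowest = max(seconds, key=seconds.get)
over_one_minute = [app for app, value in seconds.items() if value > 60]

print(f"fastest: {fastest} ({seconds[fastest]:.1f}s)")
print(f"slowest: {slowest} ({seconds[slowest]:.1f}s)")
print(f"{len(over_one_minute)} of {len(seconds)} runs took longer than 60s")
```

Every run in this list takes longer than 60 seconds, which is consistent with scalable app creation having hit the original 1 minute timeout.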