Bug 1124416
Summary: | Could not scale a rails application | ||
---|---|---|---|
Product: | OpenShift Online | Reporter: | Hugo Amorim <hugoamorimlyra> |
Component: | Image | Assignee: | Maciej Szulik <maszulik> |
Status: | CLOSED CANTFIX | QA Contact: | libra bugs <libra-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 2.x | CC: | bparees, decarr, dmcphers, hugoamorimlyra, jokerman, mmccomas, ringrum |
Target Milestone: | --- | Keywords: | UpcomingRelease |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2015-06-12 01:38:40 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Hugo Amorim
2014-07-29 12:48:35 UTC
I tried to reproduce the described scenario as follows on devenv_5022: Steps to Reproduce: 1. rhc setup --server https://localhost/broker/rest/api 2. rhc create-app myapp ruby-2.0 -s 3. rhc app scale-up -a myapp -d Actual results: [root@ip-10-63-41-92 ~]# rhc app scale-up -a myapp -d DEBUG: Using config file /root/.openshift/express.conf DEBUG: Git config 'git config --get rhc.app-id' returned '' DEBUG: Git config 'git config --get rhc.app-name' returned '' DEBUG: Git config 'git config --get rhc.domain-name' returned '' DEBUG: Authenticating with RHC::Auth::Token DEBUG: Connecting to https://localhost/broker/rest/api DEBUG: Getting all domains DEBUG: Client supports API versions 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7 DEBUG: Using token authentication DEBUG: Created new httpclient DEBUG: Request GET https://localhost/broker/rest/api DEBUG: SSL Verification failed -- Using self signed cert DEBUG: code 200 24 ms DEBUG: Server supports API versions 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7 DEBUG: Using API version 1.7 DEBUG: Client API version 1.7 is not current. Refetching API DEBUG: Using token authentication DEBUG: Request GET https://localhost/broker/rest/api DEBUG: code 200 13 ms DEBUG: Using token authentication DEBUG: Request GET https://localhost/broker/rest/domains DEBUG: code 200 18 ms DEBUG: Using token authentication DEBUG: Request GET https://localhost/broker/rest/domain/temp/application/myapp DEBUG: code 200 85 ms DEBUG: Using token authentication DEBUG: Request POST https://localhost/broker/rest/application/53d7f975676b7cd98e000029/events DEBUG: code 200 20627 ms RESULT: myapp scaled up Derek, i removed the compiled assets process in deployment and this steps works like you show. But when I testing the autoscale with load test i get this error: I, [2014-07-29T11:22:25.726508 #173588] INFO -- : Starting haproxy_ctld I, [2014-07-29T11:22:25.748485 #173588] INFO -- : add-gear - capacity: 462.5% gear_count: 1 sessions: 74 up_thresh: 90.0% I, [2014-07-29T11:27:10.972963 #173588] INFO -- : add-gear - exit_code: 1 output: The add-gear request failed with http_code: 500 I'm try with jenkins the same error occurs with auto-scale: I, [2014-07-29T12:42:29.305580 #262170] INFO -- : add-gear - capacity: 112.5% gear_count: 1 sessions: 18 up_thresh: 90.0% I, [2014-07-29T12:48:05.932454 #262170] INFO -- : add-gear - exit_code: 1 output: The add-gear request failed with http_code: 504 Manual scale works! Autoscaling with traffic keep generate this error: add-gear - capacity: 100.0% gear_count: 1 sessions: 16 up_thresh: 90.0% I, [2014-07-29T13:58:54.944096 #426382] INFO -- : add-gear - exit_code: 1 output: The add-gear request failed with http_code: 504 Can you provide the following information so we can debug further: 1) user login (not password) 2) domain 3) application name attempting to scale Thanks, It looks like you have set both max and min on this app/cartridge to 2. That means you cannot scale up or down. What is your intention behind this setting? To change the settings you can run rhc cartridge scale ruby-1.9 --min <integer> --max <integer> -a saro Lili, I really set to 2! Until I dont figure out what happens with auto scale. (In reply to Lili Nader from comment #6) > It looks like you have set both max and min on this app/cartridge to 2. > That means you cannot scale up or down. What is your intention behind this > setting? > > To change the settings you can run > > rhc cartridge scale ruby-1.9 --min <integer> --max <integer> -a saro hugoamorimlyra www.sistemasaro.com.br sistemasaro (In reply to Derek Carr from comment #5) > Can you provide the following information so we can debug further: > > 1) user login (not password) > 2) domain > 3) application name attempting to scale > > Thanks, Hi Hugo, by setting min and max to the same number, you have essentially disabled autoscaling. Because autoscaling cannot scale-up above 2 gears and it cannot scale-down below 2 gears. Please change max to be greater than min value. Lili, I understand that. But when i changed to 1-16 a example and started a load test the autoscale see the traffic and called a "action" to start another Gear, but that gear don't succeed in this action, keep return this errors: add-gear - capacity: 100.0% gear_count: 1 sessions: 16 up_thresh: 90.0% I, [2014-07-29T13:58:54.944096 #426382] INFO -- : add-gear - exit_code: 1 output: The add-gear request failed with http_code: 504 I, [2014-07-29T11:22:25.726508 #173588] INFO -- : Starting haproxy_ctld I, [2014-07-29T11:22:25.748485 #173588] INFO -- : add-gear - capacity: 462.5% gear_count: 1 sessions: 74 up_thresh: 90.0% I, [2014-07-29T11:27:10.972963 #173588] INFO -- : add-gear - exit_code: 1 output: The add-gear request failed with http_code: 500 Hugo, where are you getting these log entries? When I look at our system logs I see A success message on scale up INFO src=useraction RESULT=SUCCESS STATUS=ok TIMESTAMP=1406654982 DATE=2014-07-29 TIME=13:29:42 ACTION=SCALE_UP_APPLICATION REQ_ID=15fce493774a8b2ea9758d6444c4b74f USER_ID=53d401b65973cac8de00018d LOGIN=hugoamorimlyra APP_UUID=53d7d1dae0b8cd1aed0003f9 DOMAIN=manguebits Application saro has scaled to 3 A failure of scale up due to time out which is ok because it will try again INFO src=useraction RESULT=FAILURE STATUS=internal_server_error TIMESTAMP=1406656774 DATE=2014-07-29 TIME=13:59:34 ACTION=SCALE_UP_APPLICATION REQ_ID=9080f46c18a8ea1c6e4b515b1c3fb433 USER_ID=53d401b65973cac8de00018d LOGIN=hugoamorimlyra APP_UUID=53d7d1dae0b8cd1aed0003f9 DOMAIN=manguebits Unable to complete the requested operation due to: The server ex-std-node19.prod.rhcloud.com that your application is running on failed to respond in time. This may be due to a system restart. And a failure to scale down due to min=2 INFO src=useraction RESULT=SUCCESS STATUS=unprocessable_entity TIMESTAMP=1406657441 DATE=2014-07-29 TIME=14:10:41 ACTION=SCALE_DOWN_APPLICATION REQ_ID=32f12cece39302888ef1c98f1486e3a1 USER_ID=53d401b65973cac8de00018d LOGIN=hugoamorimlyra APP_UUID=53d7d1dae0b8cd1aed0003f9 DOMAIN=manguebits Cannot scale down below gear limit of 2. I'm see this logs on haproxy_ctld.log! I configure the application to scale to 1-3, now the application has one gear, and started a load test. Nothing again the scale see the traffic but nothing to successfully scale up. Again the log in haproxy_ctld.log: I, [2014-07-31T07:21:13.706729 #254681] INFO -- : Starting haproxy_ctld I, [2014-07-31T07:24:59.022322 #254681] INFO -- : add-gear - capacity: 112.5% gear_count: 1 sessions: 18 up_thresh: 90.0% I, [2014-07-31T07:25:38.681584 #254681] INFO -- : add-gear - exit_code: 1 output: The add-gear request failed with http_code: 502 The status of my application for almost 30 minutes was http://cubeupload.com/im/lwwirh.png and nothing to scale. I will keep this settings 1-3, please try to make some request and see if you successfully scale the application. I appreciate your help and attention. (In reply to Lili Nader from comment #11) > Hugo, where are you getting these log entries? When I look at our system > logs I see > > A success message on scale up > > INFO src=useraction RESULT=SUCCESS STATUS=ok TIMESTAMP=1406654982 > DATE=2014-07-29 TIME=13:29:42 ACTION=SCALE_UP_APPLICATION > REQ_ID=15fce493774a8b2ea9758d6444c4b74f USER_ID=53d401b65973cac8de00018d > LOGIN=hugoamorimlyra APP_UUID=53d7d1dae0b8cd1aed0003f9 > DOMAIN=manguebits Application saro has scaled to 3 > > A failure of scale up due to time out which is ok because it will try again > > INFO src=useraction RESULT=FAILURE STATUS=internal_server_error > TIMESTAMP=1406656774 DATE=2014-07-29 TIME=13:59:34 > ACTION=SCALE_UP_APPLICATION REQ_ID=9080f46c18a8ea1c6e4b515b1c3fb433 > USER_ID=53d401b65973cac8de00018d LOGIN=hugoamorimlyra > APP_UUID=53d7d1dae0b8cd1aed0003f9 DOMAIN=manguebits Unable to complete the > requested operation due to: The server ex-std-node19.prod.rhcloud.com that > your application is running on failed to respond in time. This may be due > to a system restart. > > > And a failure to scale down due to min=2 > > INFO src=useraction RESULT=SUCCESS STATUS=unprocessable_entity > TIMESTAMP=1406657441 DATE=2014-07-29 TIME=14:10:41 > ACTION=SCALE_DOWN_APPLICATION REQ_ID=32f12cece39302888ef1c98f1486e3a1 > USER_ID=53d401b65973cac8de00018d LOGIN=hugoamorimlyra > APP_UUID=53d7d1dae0b8cd1aed0003f9 DOMAIN=manguebits Cannot scale down below > gear limit of 2. I keep the load test for a little while: I, [2014-07-31T07:24:59.022322 #254681] INFO -- : add-gear - capacity: 112.5% gear_count: 1 sessions: 18 up_thresh: 90.0% I, [2014-07-31T07:25:38.681584 #254681] INFO -- : add-gear - exit_code: 1 output: The add-gear request failed with http_code: 502 I, [2014-07-31T07:35:39.770020 #254681] INFO -- : add-gear - capacity: 518.75% gear_count: 1 sessions: 83 up_thresh: 90.0% I, [2014-07-31T07:37:22.708368 #254681] INFO -- : add-gear - exit_code: 1 output: The add-gear request failed with http_code: 500 I, [2014-07-31T07:47:24.147964 #254681] INFO -- : add-gear - capacity: 506.25% gear_count: 1 sessions: 81 up_thresh: 90.0% I, [2014-07-31T07:48:54.480351 #254681] INFO -- : add-gear - exit_code: 1 output: The add-gear request failed with http_code: 500 He keep trying to scale! In the last try: I, [2014-07-31T07:24:59.022322 #254681] INFO -- : add-gear - capacity: 112.5% gear_count: 1 sessions: 18 up_thresh: 90.0% I, [2014-07-31T07:25:38.681584 #254681] INFO -- : add-gear - exit_code: 1 output: The add-gear request failed with http_code: 502 I, [2014-07-31T07:35:39.770020 #254681] INFO -- : add-gear - capacity: 518.75% gear_count: 1 sessions: 83 up_thresh: 90.0% I, [2014-07-31T07:37:22.708368 #254681] INFO -- : add-gear - exit_code: 1 output: The add-gear request failed with http_code: 500 I, [2014-07-31T07:47:24.147964 #254681] INFO -- : add-gear - capacity: 506.25% gear_count: 1 sessions: 81 up_thresh: 90.0% I, [2014-07-31T07:48:54.480351 #254681] INFO -- : add-gear - exit_code: 1 output: The add-gear request failed with http_code: 500 I, [2014-07-31T07:58:55.849197 #254681] INFO -- : add-gear - capacity: 506.25% gear_count: 1 sessions: 81 up_thresh: 90.0% I, [2014-07-31T08:04:29.522432 #254681] INFO -- : add-gear - exit_code: 1 output: The add-gear request failed with http_code: 504 He change the code and a worst thing happens he make me available gears down for a little time! Hugo, thanks this very helpful! I think it indicates that the haproxy cartridge itself is having problems adding a new gear. Anyway, you can share with us you code and load test so we can recreate this issue is a dev environment? If not that is ok too Lili for load test i'm using ApacheBench with this simple command: ab -n 10000 -c 40 https://www.sistemasaro.com.br/ The code I dont have permission to share. Sorry. Some info about the stack may help: Rails 4 Delayed Job Postgres -The only thing i change on openshift config was: -Add marker disable_asset_compilation -Add on post_start_ruby cd $OPENSHIFT_REPO_DIR RAILS_ENV=production script/delayed_job start & -Add on pre_stop_ruby cd $OPENSHIFT_REPO_DIR RAILS_ENV=production script/delayed_job stop & It appears that the haproxy cartridge itself is having the problem adding the gear. Maciej, please take a look and see if this is a cartridge issue (problem creating the additional gear). (See if you can recreate it first) Hugo, can you please provide us with more information regarding your setup? From previous comments I remember you can't provide us with the code, but can you give more or less information what kind of calls are being made on the main page? Which cartridges exactly you're using, as well as what's inside your job? Did you try to test your application without that job? The reason for me asking all this is that I've spent last two days testing various combination of ruby carts, db calls and scaling and unfortunately wasn't able to reproduce the problem. Thanks, Cartridges: Ruby 1.9 Web Load Balancer PostgreSQL 9.2 The jobs is only a delaydjob work to send email with mandrill api. The application is a simple crud application only http request. Has there been a resolution to this? I'm getting the same issue on a Django application. Robert, if fails when autoscaling, or manually scaling? Can you tell us more about your app? any other cartridges besides python? Of course. It fails both when autoscaling, and when manually scaling. One gear has Python 2.7, Web Load Balancer, and Cron 1.4. The other has PostgreSQL 9.2. I'm trying to scale up the Python gear. Here is a copy of the console log when I try to scale up using rhc: C:\Users\Robert>rhc cartridge-scale python --min 2 -a production -d DEBUG: Using config file C:/Users/Robert/.openshift/express.conf DEBUG: Git config 'git config --get rhc.app-id' returned '' DEBUG: Git config 'git config --get rhc.app-name' returned '' DEBUG: Git config 'git config --get rhc.domain-name' returned '' DEBUG: Authenticating with RHC::Auth::Token DEBUG: Connecting to https://openshift.redhat.com/broker/rest/api DEBUG: Getting all domains DEBUG: Client supports API versions 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7 DEBUG: Created new httpclient DEBUG: Request GET https://openshift.redhat.com/broker/rest/api DEBUG: code 200 1880 ms DEBUG: Server supports API versions 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7 DEBUG: Using API version 1.7 DEBUG: Client API version 1.7 is not current. Refetching API DEBUG: Request GET https://openshift.redhat.com/broker/rest/api DEBUG: code 200 112 ms DEBUG: Using token authentication DEBUG: Request GET https://openshift.redhat.com/broker/rest/domains DEBUG: code 200 220 ms DEBUG: Using token authentication DEBUG: Request GET https://openshift.redhat.com/broker/rest/domain/purplewall/application/production?include=cartridges DEBUG: code 200 352 ms Using python-2.7 (Python 2.7) for 'python' This operation will run until the application is at the minimum scale and may take several minutes. Setting scale range for python-2.7 ... DEBUG: Setting scales = scales_from: 2 DEBUG: Using token authentication DEBUG: Request PUT https://openshift.redhat.com/broker/rest/application/52f2c6d7e0b8cd42a200001a/cartridge/python-2.7 DEBUG: code 500 189132 ms Unable to complete the requested operation due to: An invalid exit code (1) was returned from the server ex-std-node506.prod.rhcloud.com. This indicates an unexpected problem during the execution of your request. Reference ID: 7264cb26da5a89ce552d6ce6fa334687 That looks a bit different than the original issue here. That sounds like something our operations team may need to look into. Can you open a new bug and paste in your last comment? You can assign it to me initially and i'll get the right team to look at it. thanks. Will do, thank you for the quick response. |