994993 – Already at the maximum number of gears

Summary: Already at the maximum number of gears

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Containers
Sub Component:
Version:	1.2.0
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Brenton Leanhardt
QA Contact:	libra bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2013-08-08 11:14 UTC by John Wiebalk
Modified:	2017-03-08 17:35 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2013-09-30 17:12:22 UTC
Target Upstream Version:
Embargoed:

Attachments
access_log (298.11 KB, text/plain) 2013-08-08 18:19 UTC, John Wiebalk	no flags
error_log (3.33 MB, text/plain) 2013-08-08 18:20 UTC, John Wiebalk	no flags
mcollective.log (1001.48 KB, text/plain) 2013-08-08 18:20 UTC, John Wiebalk	no flags
production.log (1.43 MB, text/plain) 2013-08-08 18:20 UTC, John Wiebalk	no flags

Description John Wiebalk 2013-08-08 11:14:16 UTC

Description of problem:
When trying to scale an app, I receive an "Already at max number of gears allowed for either the app or your account" error. The app's max is set to -1 and the account is set to a maximum of 100 gears. I tried with multiple apps, both new and older ones. I verified with oo-admin-chk-user that the account is set to 100 max.

Version-Release number of selected component (if applicable): 1.2.0


How reproducible: Reproduces with any app under my account. I have not tried with a new account (we use LDAP authentication, so I cannot create a new account for myself).


Steps to Reproduce:
1. Have the app attempt to auto-scale. The error appears in scale_log.
2. Try to manually set the app to more than 1 gear in the web console. This returns "/bin/sh: /var/lib/openshift/520260403443a9fd3a00005f/jbosseap/bin/setup: Permission denied".

Actual results: Received "Already at max number of gears allowed for either the app or your account" or "/bin/sh: /var/lib/openshift/520260403443a9fd3a00005f/jbosseap/bin/setup: Permission denied" errors. If I watch the /var/lib/openshift directory on my node, I can see the new gear get created and then immediately deleted.


Expected results: Should create new gear and start that gear.


Additional info:

Comment 2 Miciah Dashiel Butler Masters 2013-08-08 15:48:55 UTC

Have you run oo-diagnostics on your broker and node hosts? If not, please do so and report any output.

Some background: When you use the Web console to scale the app up, the console sends a REST request to the broker API with your scale-up command, the broker passes that command on to the node, and then on the node the haproxy cartridge's control script sends a REST call to the broker API to initiate the scale-up.

From the error message you are seeing, it looks like when the node runtime sends the REST call to the broker to scale up, the broker responds with 422 Unprocessable Entity, which the node runtime interprets as your hitting the limit on gears:

https://github.com/openshift/origin-server/blob/master/cartridges/openshift-origin-cartridge-haproxy/usr/bin/gear-scale-ctl.rb#L53
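
To make the failure mode concrete, here is a minimal sketch in Python of the status handling described above. It is not the code from gear-scale-ctl.rb (which is Ruby); the function name interpret_scale_response and the wording of the non-422 messages are hypothetical and used only for illustration.

```python
def interpret_scale_response(status_code):
    """Map a broker HTTP status to a user-facing message (hypothetical sketch)."""
    if 200 <= status_code < 300:
        return "scale-up accepted"
    if status_code == 422:
        # As described above, every 422 is read as a gear-limit hit,
        # even if the broker rejected the request for some other reason.
        return ("Already at max number of gears allowed "
                "for either the app or your account")
    return "scale-up failed with HTTP %d" % status_code
```

Because the mapping looks only at the status code, any other cause of a 422 would surface with the same gear-limit wording.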

It will help to diagnose the issue if you can provide some log files:

On the broker host,

• /var/log/openshift/broker/httpd/access_log
  (so we can see what REST calls the console is making to the broker and what HTTP codes the broker is sending back)

• /var/log/openshift/broker/httpd/error_log
  (so we can look for hints as to why the broker is sending an error response to the node host; maybe we'll luck out and get a backtrace)

• /var/log/openshift/broker/production.log
  (ditto)


On the node host,

• /var/log/mcollective.log
  (so we can look for any hints in case the node runtime is confused)

Would you mind attaching these log files to this bug report?
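
For anyone triaging the same logs, a rough Python sketch of how the broker's access_log could be scanned for error responses follows. It assumes the log uses Apache's common or combined format; the helper name failed_requests and the sample lines are made up for illustration and are not taken from the attached files.

```python
import re

# Matches the request and status fields of an Apache common/combined log line.
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3})\b')

def failed_requests(lines):
    """Return (method, path, status) for every request answered with 4xx or 5xx."""
    failures = []
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and int(match.group("status")) >= 400:
            failures.append((match.group("method"),
                             match.group("path"),
                             int(match.group("status"))))
    return failures

# Made-up sample lines (hypothetical hosts and paths).
sample = [
    '10.0.0.5 - - [08/Aug/2013:11:10:02 +0000] "GET /broker/rest/api HTTP/1.1" 200 512',
    '10.0.0.7 - - [08/Aug/2013:11:10:09 +0000] "POST /broker/rest/domains/demo/applications/app1/events HTTP/1.1" 422 310',
]
print(failed_requests(sample))
```

To run it against a real file, pass an open file handle instead of the sample list, for example failed_requests(open(path)).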

Comment 3 John Wiebalk 2013-08-08 18:19:46 UTC

Created attachment 784543
access_log

Comment 4 John Wiebalk 2013-08-08 18:20:08 UTC

Created attachment 784544
error_log

Comment 5 John Wiebalk 2013-08-08 18:20:28 UTC

Created attachment 784545
mcollective.log

Comment 6 John Wiebalk 2013-08-08 18:20:49 UTC

Created attachment 784546
production.log

Comment 7 John Wiebalk 2013-08-08 18:22:10 UTC

I have uploaded the requested log files.


Running oo-diagnostics reports only warnings about SELinux being in permissive mode.

Comment 8 Brenton Leanhardt 2013-08-08 18:41:50 UTC

I'm betting this is related to JBoss timing out.

As a sanity test, could you verify that you can create other types of scaled gears? You could try something lightweight like a scaled PHP application. If that works, we can try working through what could be causing the problem with JBoss on your system.

Comment 9 John Wiebalk 2013-08-08 18:44:04 UTC

I have/had a PHP application that I also attempted to scale, and I got the same issues. I was able to scale previously, but after a few demos of the Scaleapp it is no longer working. I also tried removing and recreating the app and still got the same issue.

Thanks

Comment 10 John Wiebalk 2013-08-08 19:32:27 UTC

I just had a co-worker attempt to create a PHP app with scaling, and it gave him the permission denied error. I then tried and was able to create a new PHP app, but I was not able to modify the minimum to scale it without getting the permission denied error.

Comment 11 John Wiebalk 2013-08-08 19:38:52 UTC

I had him retry and it failed again. Trying without haproxy worked. Trying a new one with haproxy again now worked. Manually scaling it inside the web console failed.

Hope this helps.

Comment 12 Brenton Leanhardt 2013-08-09 17:28:15 UTC

Can you tail the production.log while triggering the error?  There are a number of stack traces in the logs, so we could be dealing with multiple problems.  I'd like to focus on one at a time.

Comment 13 John Wiebalk 2013-08-13 11:15:27 UTC

Sorry for the delayed response. In a desperate attempt to get everything working for a presentation on Friday, I rebooted the broker and nodes. Everything appears to be working now; I was able to do my scaling presentation without getting these errors.


Thanks

Comment 14 Brenton Leanhardt 2013-08-13 12:00:46 UTC

Interesting.  There's definitely still something going wrong.  Let's leave this bug open for now.  Please update us if the problem persists.

Comment 16 Brenton Leanhardt 2013-09-30 17:12:22 UTC

I'm going to close this bug for now since we aren't able to reproduce the exact problem as originally reported.

The current plan is to spend time improving the haproxy cartridge's interpretation of the broker's response to scaling events.  We believe that is what led to the unhelpful error message.
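
As a rough illustration of what a more informative interpretation might look like, the Python sketch below builds the error text from the broker's reply body rather than from the status code alone. It is not the planned fix. It assumes the reply is JSON carrying a "messages" list whose entries have a "text" field; that shape and the function name describe_scale_failure are assumptions made for this example.

```python
import json

def describe_scale_failure(status_code, body):
    """Build an error message from the broker's reply (hypothetical sketch)."""
    texts = []
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        payload = None  # body missing or not valid JSON
    if isinstance(payload, dict):
        messages = payload.get("messages")
        # Assumed shape: a list of objects, each with a "text" field.
        if isinstance(messages, list):
            for message in messages:
                if isinstance(message, dict) and message.get("text"):
                    texts.append(str(message["text"]))
    if texts:
        return "Broker rejected scale-up (HTTP %d): %s" % (status_code, "; ".join(texts))
    return "Broker rejected scale-up (HTTP %d); no details in response" % status_code
```

With this approach a 422 caused by something other than the gear limit would be reported with whatever explanation the broker supplied, falling back to a neutral message when none is available.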
