Bug 1299198

Summary:

Can't stop, start, restart, delete, or SSH into the application

Product:

OpenShift Online

Reporter:

nutrilord0 <nutrilord0>

Component:

Image

Assignee:

Rory Thrasher <rthrashe>

Status:

CLOSED DUPLICATE

QA Contact:

Wang Haoran <haowang>

Severity:

urgent

Docs Contact:

Priority:

unspecified

Version:

2.x

CC:

abhgupta, agrimm, aos-bugs, cdaley, jokerman, mmccomas, nutrilord0, wzheng

Target Milestone:

---

Target Release:

---

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2016-04-04 20:36:33 UTC

Type:

Bug

Attachments:

Description	Flags
Scripted remediation runnable outside watchman	none

Description nutrilord0@gmail.com 2016-01-17 09:21:25 UTC

Hello, we have an application called keycloak-nutrilord.rhcloud.com on OpenShift and it is completely blocked. We can't do anything with it. Can anybody help us, please?

rhc force-stop-app keycloak -d
DEBUG: Using config file /home/martin/.openshift/express.conf
DEBUG: Git config 'git config --get rhc.app-id' returned ''
DEBUG: Git config 'git config --get rhc.app-name' returned ''
DEBUG: Git config 'git config --get rhc.domain-name' returned ''
DEBUG: Authenticating with RHC::Auth::Token
DEBUG: Connecting to https://openshift.redhat.com/broker/rest/api
DEBUG: Getting all domains
DEBUG: Client supports API versions 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7
DEBUG: Created new httpclient
DEBUG: Request GET https://openshift.redhat.com/broker/rest/api
RSA 1024 bit CA certificates are loaded due to old openssl compatibility
DEBUG:    code 200  844 ms
DEBUG: Server supports API versions 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7
DEBUG:    Using API version 1.7
DEBUG: Client API version 1.7 is not current. Refetching API
DEBUG: Request GET https://openshift.redhat.com/broker/rest/api
DEBUG:    code 200  186 ms
DEBUG: Using token authentication
DEBUG: Request GET https://openshift.redhat.com/broker/rest/domains
DEBUG:    code 200  187 ms
DEBUG: Using token authentication
DEBUG: Request GET https://openshift.redhat.com/broker/rest/domain/nutrilord/application/keycloak
DEBUG:    code 200  252 ms
DEBUG: Stopping application keycloak force-true
DEBUG: Using token authentication
DEBUG: Request POST https://openshift.redhat.com/broker/rest/application/5698c1ec0c1e66dd79000186/events
DEBUG:    code 422 5972 ms
Resources unavailable for operation. You may need to run 'rhc force-stop-app -a keycloak' and retry.
/sbin/runuser: cannot set user id: Resource temporarily unavailable


rhc ssh -a keycloak
RSA 1024 bit CA certificates are loaded due to old openssl compatibility
Connecting to 5698c1ec0c1e66dd79000186.com ...
The authenticity of host 'keycloak-nutrilord.rhcloud.com (54.84.57.164)' can't be established.
RSA key fingerprint is cf:ee:77:cb:0e:fc:02:d7:72:7e:ae:80:c0:90:88:a7.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'keycloak-nutrilord.rhcloud.com,54.84.57.164' (RSA) to the list of known hosts.
Write failed: Broken pipe

Comment 1 Dan McPherson 2016-01-18 13:06:13 UTC

Have you tried force-stop and then restarting as suggested?

Comment 2 nutrilord0@gmail.com 2016-01-18 14:02:57 UTC

Hello, thank you for your response. We have already tried force-stop; the failing output is in the original post. We can't use git, we can't create, delete, or stop the application, and we can't delete SSH keys or push new ones. Long story short: we can't access the application keycloak-nutrilord.rhcloud.com with rhc, git, or ssh. We also can't delete or restart the application through the user console, and we can't manage SSH keys on the account. We can connect to our other two applications using rhc, git, or ssh, but only with an SSH key that was already created and working. We can't create new ones.

Comment 3 nutrilord0@gmail.com 2016-01-18 15:08:53 UTC

(In reply to Dan McPherson from comment #1)
> Have you tried force-stop and then restarting as suggested?

Hello, thank you for your response. We have already tried force-stop; the failing output is in the original post. We can't use git, we can't create, delete, or stop the application, and we can't delete SSH keys or push new ones. Long story short: we can't access the application keycloak-nutrilord.rhcloud.com with rhc, git, or ssh. We also can't delete or restart the application through the user console, and we can't manage SSH keys on the account. We can connect to our other two applications using rhc, git, or ssh, but only with an SSH key that was already created and working. We can't create new ones.

Comment 4 Andy Grimm 2016-01-18 16:54:56 UTC

The reason that force-stop does not work in this case is that it simply inserts a force-stop operation into the "pending operations" queue.  As long as the application continues to exceed its "nproc" (number of processes / threads) limit, most pending operations in the queue will fail, and when one operation fails, the operations behind it in the queue are not processed.

I have manually killed processes in your gear, which should allow you to do other things now, but it's very likely that you will hit this same problem again.  We are still considering further increases to the nproc limit for small gears, but for now, the workaround is to use a larger gear size (medium or large).
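
To illustrate why a queued force-stop cannot help here, the following is a minimal sketch of the head-of-line blocking described in this comment. It is not the broker's actual code: `drain_pending_ops` and `can_spawn` are hypothetical names, and `can_spawn` stands in for "the node can still start a process as the gear user", which fails while the gear is at its nproc limit.

```python
from collections import deque


def drain_pending_ops(pending_ops, can_spawn):
    """Process queued operations in order and stop at the first failure.

    pending_ops: operation names, oldest first (hypothetical representation).
    can_spawn:   callable returning False while the gear is at its nproc
                 limit, so no new process can be started for it.
    Returns (completed, still_queued).
    """
    queue = deque(pending_ops)
    completed = []
    while queue:
        if not can_spawn():
            # The head operation fails, so everything behind it stays
            # queued, including the force-stop that was meant to help.
            break
        completed.append(queue.popleft())
    return completed, list(queue)


# Gear is at its nproc limit: nothing runs, force-stop never gets its turn.
stuck_done, stuck_left = drain_pending_ops(["add-ssh-key", "force-stop"], lambda: False)
print(stuck_done, stuck_left)  # [] ['add-ssh-key', 'force-stop']
```

In this model the queue only drains once something outside it (here, manually killing processes in the gear) makes `can_spawn` return True again.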

Comment 5 nutrilord0@gmail.com 2016-01-18 17:18:22 UTC

Thank you very much for your help. I think the problem occurred when we tried to install MongoDB and did not have enough space on our gear.

Thank you once again for the quick help.

Comment 6 John W. Lamb 2016-01-18 18:52:43 UTC

Just a note to add that JBoss-based carts tend to be resource heavy, both in terms of RAM and number of processes. If you continue to run into this issue, you might consider switching to a medium gear (via our bronze or silver plans), which provides 2x the RAM and more than 3x the maximum running processes.

Comment 7 Corey Daley 2016-01-19 12:05:57 UTC

If you are going to run a database in addition to a JBoss-based cart, I would suggest that you look into running a scaled application so that the database gets its own gear and does not use up the resources that you want allocated to your Java web application.

Comment 8 Andy Grimm 2016-01-19 16:49:27 UTC

John, maybe the nproc limit is something we should address in watchman?  We are already getting a process list in the gear state plugin, but we currently aren't getting threads there.  If we got threads, we could compare the thread count against ::OpenShift::Runtime::Node.get_pam_limits(<uuid>)['nproc']

If the gear is at the limit, then we do a pkill and restart.

That might be a little expensive, because we'd have to read the limits config file for every started gear on a node.  We could read and cache the default nproc limit and only check the individual limit for gears whose thread count exceeds that, since that should be correct in the vast majority of cases.

I'm going to do a little PoC in a cron job to see if this helps us out with these cases, and we can integrate later if it works.
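
The check proposed in this comment could look roughly like the sketch below. It is written in Python for illustration only. It is neither the watchman plugin nor the attached script, and it does not call the Ruby method `get_pam_limits`. The names `threads_by_uid`, `gears_at_limit` and `lookup_limit` are hypothetical, and the limit values in any usage are made up. The only system detail relied on is that Linux exposes `Uid:` and `Threads:` lines in `/proc/<pid>/status`.

```python
import os


def threads_by_uid(proc_root="/proc"):
    """Sum the 'Threads:' count of every process, grouped by real UID."""
    totals = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                fields = dict(line.split(":", 1) for line in fh if ":" in line)
            uid = int(fields["Uid"].split()[0])  # first value is the real UID
            threads = int(fields["Threads"])
        except (OSError, KeyError, ValueError, IndexError):
            continue  # process exited mid-read or status was unreadable
        totals[uid] = totals.get(uid, 0) + threads
    return totals


def gears_at_limit(thread_counts, default_limit, lookup_limit):
    """Return the gears whose thread count has reached their nproc limit.

    thread_counts: mapping of gear identifier to current thread count.
    default_limit: cached default nproc limit, read once per run.
    lookup_limit:  hypothetical callback that returns one gear's individual
                   limit; it is the expensive step, so it is only called
                   for gears already at or above the default.
    """
    at_limit = []
    for gear, count in thread_counts.items():
        if count < default_limit:
            continue  # cheap path: well under the default, skip the lookup
        if count >= lookup_limit(gear):
            at_limit.append(gear)
    return at_limit
```

A caller would pass the result of `threads_by_uid()` (keyed by the gear's UID) to `gears_at_limit` and then kill and restart the gears it returns. Note the trade-off already mentioned above: a gear whose individual limit is lower than the default would be missed by the cheap path.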

Comment 9 Andy Grimm 2016-01-20 14:33:07 UTC

Created attachment 1116665 [details]
Scripted remediation runnable outside watchman

Attached a script that I've started running in our environment.  So far, I have not seen any tracebacks in oo-admin-clear-pending-ops since this script was deployed.

I think integrating this functionality into watchman's gearstate plugin would be trivial.

Comment 11 Rory Thrasher 2016-04-04 20:36:33 UTC

Closing as a duplicate to focus nproc efforts onto a single bug.

*** This bug has been marked as a duplicate of bug 1265183 ***

Comment 12 Red Hat Bugzilla 2023-09-14 03:16:18 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days