Hello, we have an application called keycloak-nutrilord.rhcloud.com on OpenShift and it is completely blocked. We can't do anything with it. Can anybody help us, please?

rhc force-stop-app keycloak -d
DEBUG: Using config file /home/martin/.openshift/express.conf
DEBUG: Git config 'git config --get rhc.app-id' returned ''
DEBUG: Git config 'git config --get rhc.app-name' returned ''
DEBUG: Git config 'git config --get rhc.domain-name' returned ''
DEBUG: Authenticating with RHC::Auth::Token
DEBUG: Connecting to https://openshift.redhat.com/broker/rest/api
DEBUG: Getting all domains
DEBUG: Client supports API versions 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7
DEBUG: Created new httpclient
DEBUG: Request GET https://openshift.redhat.com/broker/rest/api
RSA 1024 bit CA certificates are loaded due to old openssl compatibility
DEBUG: code 200 844 ms
DEBUG: Server supports API versions 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7
DEBUG: Using API version 1.7
DEBUG: Client API version 1.7 is not current. Refetching API
DEBUG: Request GET https://openshift.redhat.com/broker/rest/api
DEBUG: code 200 186 ms
DEBUG: Using token authentication
DEBUG: Request GET https://openshift.redhat.com/broker/rest/domains
DEBUG: code 200 187 ms
DEBUG: Using token authentication
DEBUG: Request GET https://openshift.redhat.com/broker/rest/domain/nutrilord/application/keycloak
DEBUG: code 200 252 ms
DEBUG: Stopping application keycloak force-true
DEBUG: Using token authentication
DEBUG: Request POST https://openshift.redhat.com/broker/rest/application/5698c1ec0c1e66dd79000186/events
DEBUG: code 422 5972 ms
Resources unavailable for operation. You may need to run 'rhc force-stop-app -a keycloak' and retry.
/sbin/runuser: cannot set user id: Resource temporarily unavailable

rhc ssh -a keycloak
RSA 1024 bit CA certificates are loaded due to old openssl compatibility
Connecting to 5698c1ec0c1e66dd79000186.com ...
The authenticity of host 'keycloak-nutrilord.rhcloud.com (54.84.57.164)' can't be established.
RSA key fingerprint is cf:ee:77:cb:0e:fc:02:d7:72:7e:ae:80:c0:90:88:a7.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'keycloak-nutrilord.rhcloud.com,54.84.57.164' (RSA) to the list of known hosts.
Write failed: Broken pipe
Have you tried force-stop and then restarting as suggested?
(In reply to Dan McPherson from comment #1)
> Have you tried force-stop and then restarting as suggested?

Hello, thank you for your response. We already tried force-stop; the failing output is in the original post. We can't use git, we can't create, delete, or stop applications, and we can't delete SSH keys or push new ones. Long story short: we can't access the application keycloak-nutrilord.rhcloud.com with rhc, git, or ssh. We also can't delete or restart the application through the user console, and we can't manage SSH keys on the account. We can connect to our other two applications using rhc, git, or ssh, but only with an SSH key that was already created and working. We can't create new ones.
The reason that force-stop does not work in this case is that it simply inserts a force-stop operation into the "pending operations" queue. As long as the application continues to exceed its "nproc" (number of processes / threads) limit, most pending operations in the queue will fail, and when one operation fails, the operations behind it in the queue are not processed. I have manually killed processes in your gear, which should allow you to do other things now, but it's very likely that you will hit this same problem again. We are still considering further increases to the nproc limit for small gears, but for now, the workaround is to use a larger gear size (medium or large).
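To illustrate why a queued force-stop cannot help while the gear is over its limit, here is a minimal sketch of a queue that stops at the first failing operation. It is only a toy model of the behaviour described above, not the broker's actual code; the function name, the operation names, and the `can_run` callback are all hypothetical.

```python
from collections import deque


def process_pending_ops(ops, can_run):
    """Run queued operations in order, stopping at the first failure.

    Hypothetical model: `ops` is a list of operation names and `can_run`
    is a callback that says whether an operation can succeed right now.
    Returns (completed, still_pending).
    """
    queue = deque(ops)
    completed = []
    while queue:
        if not can_run(queue[0]):
            # The failed operation stays at the head of the queue,
            # so everything behind it is blocked as well.
            break
        completed.append(queue.popleft())
    return completed, list(queue)


pending = ["add-ssh-key", "force-stop", "restart"]

# Gear over its nproc limit: nothing can run, so even force-stop stays queued.
done_blocked, stuck_blocked = process_pending_ops(pending, lambda op: False)

# After processes are killed by hand, the queue drains in order.
done_ok, stuck_ok = process_pending_ops(pending, lambda op: True)
```

In this model the force-stop never runs because it sits behind (or is itself) a failing operation, which matches why killing processes in the gear by hand was needed first.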
Thank you very much for your help. I think the problem was caused when we tried to install MongoDB and did not have enough space on our gear. Thank you once again for the quick help.
Just a note to add that JBoss-based carts tend to be resource heavy, both in terms of RAM and number of processes. If you continue to run into this issue, you might consider switching to a medium gear (via our bronze or silver plans), which provides 2x the RAM and more than 3x the maximum running processes.
If you are going to run a database in addition to a JBoss-based cart, I would suggest that you look into running a scaled application so that the database gets its own gear and does not use up the resources that you want allocated to your Java web application.
John, maybe the nproc limit is something we should address in watchman? We are already getting a process list in the gear state plugin, but we currently aren't getting threads there. If we got threads, we could compare the thread count against ::OpenShift::Runtime::Node.get_pam_limits(<uuid>)['nproc']. If the gear is at the limit, then we do a pkill and restart.

That might be a little expensive, because we'd have to read the limits config file for every started gear on a node. Instead, we could read and cache the default nproc limit and only check the individual limit for gears whose thread count exceeds the default, since the default should be the correct limit in the vast majority of cases.

I'm going to do a little PoC in a cron job to see if this helps us out with these cases, and we can integrate later if it works.
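The caching idea could be sketched roughly as follows. This is not the attached script or watchman code; it is a small Python illustration in which `gears_at_nproc_limit`, `thread_counts`, and `lookup_limit` are hypothetical names, with `lookup_limit` standing in for the per-gear get_pam_limits(<uuid>)['nproc'] read.

```python
def gears_at_nproc_limit(thread_counts, default_limit, lookup_limit):
    """Return the gear UUIDs whose thread count has reached their nproc limit.

    thread_counts: dict mapping gear UUID -> current thread count
    default_limit: the default nproc limit, read once and cached
    lookup_limit:  callable(uuid) -> int, the expensive per-gear lookup
    """
    at_limit = []
    for uuid, threads in thread_counts.items():
        if threads < default_limit:
            # Cheap path: below the cached default, skip the per-gear read.
            # Assumption: gears rarely have a limit lower than the default.
            continue
        if threads >= lookup_limit(uuid):
            at_limit.append(uuid)
    return at_limit


# Made-up example data: gear "c" has a raised individual limit.
individual_limits = {"a": 250, "b": 250, "c": 500}
lookups = []


def lookup_limit(uuid):
    lookups.append(uuid)  # record which gears needed the expensive read
    return individual_limits[uuid]


flagged = gears_at_nproc_limit({"a": 100, "b": 250, "c": 250}, 250, lookup_limit)
```

Only gears "b" and "c" trigger the per-gear lookup here, and only "b" is flagged for remediation. The trade-off is that a gear configured with a limit below the default would be missed by the cheap path.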
Created attachment 1116665
Scripted remediation runnable outside watchman

Attached a script that I've started running in our environment. So far, I have not seen any tracebacks in oo-admin-clear-pending-ops since this script was deployed. I think integrating this functionality into watchman's gearstate plugin would be trivial.
Closing as a duplicate to focus nproc efforts onto a single bug. *** This bug has been marked as a duplicate of bug 1265183 ***
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days