| Summary: | Cant stop/start/restart/delete or ssh on the application | ||||||
|---|---|---|---|---|---|---|---|
| Product: | OpenShift Online | Reporter: | nutrilord0 <nutrilord0> | ||||
| Component: | Image | Assignee: | Rory Thrasher <rthrashe> | ||||
| Status: | CLOSED DUPLICATE | QA Contact: | Wang Haoran <haowang> | ||||
| Severity: | urgent | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 2.x | CC: | abhgupta, agrimm, aos-bugs, cdaley, jokerman, mmccomas, nutrilord0, wzheng | ||||
| Target Milestone: | --- | Flags: | nutrilord0: needinfo? (nutrilord0) | ||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2016-04-04 20:36:33 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Attachments: | ||||||
Description
nutrilord0@gmail.com
2016-01-17 09:21:25 UTC
Have you tried force-stop and then restarting as suggested?

(In reply to Dan McPherson from comment #1)
> Have you tried force-stop and then restarting as suggested?

Hello, thank you for your response. We already tried force-stop; the failure output is in the post. We can't use git, we can't create, delete, or stop, and we also can't delete SSH keys or push new ones. Long story short: we can't access the application keycloak-nutrilord.rhcloud.com with rhc, git, or ssh. We also can't delete or restart the application through the user console, and we can't manage SSH keys on the account. We can connect to our other two applications using rhc, git, or ssh, but only with an existing, working SSH key. We can't create new ones.

The reason that force-stop does not work in this case is that it simply inserts a force-stop operation into the "pending operations" queue. As long as the application continues to exceed its "nproc" (number of processes / threads) limit, most pending operations in the queue will fail, and when one operation fails, the operations behind it in the queue are not processed. I have manually killed processes in your gear, which should allow you to do other things now, but it's very likely that you will hit this same problem again. We are still considering further increases to the nproc limit for small gears, but for now, the workaround is to use a larger gear size (medium or large).

Thank you very much for your help. I think the problem was caused when we tried to install MongoDB and did not have enough space on our gear. Thank you once again for the quick help.

Just a note to add that JBoss-based carts tend to be resource heavy, both in terms of RAM and number of processes. If you continue to run into this issue, you might consider switching to a medium gear (via our bronze or silver plans), which provides 2x the RAM and more than 3x the maximum running processes. If you are going to run a database in addition to a JBoss-based cart, I would suggest that you look into running a scaled application so that the database gets its own gear and does not use up the resources that you want allocated to your Java web application.

John, maybe the nproc limit is something we should address in watchman? We are already getting a process list in the gear state plugin, but we currently aren't getting threads there. If we got threads, we could compare the thread count against ::OpenShift::Runtime::Node.get_pam_limits(<uuid>)['nproc']. If the gear is at the limit, then we do a pkill and restart. That might be a little expensive, because we'd have to read the limits config file for every started gear on a node. We could read and cache the default nproc limit and only check the individual limit for gears whose thread count exceeds that, since that should be correct in the vast majority of cases. I'm going to do a little PoC in a cron job to see if this helps us out with these cases, and we can integrate later if it works.

Created attachment 1116665 [details]
Scripted remediation runnable outside watchman
Attached a script that I've started running in our environment. So far, I have not seen any tracebacks in oo-admin-clear-pending-ops since this script was deployed.
I think integrating this functionality into watchman's gearstate plugin would be trivial.
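
For readers following the thread, the check proposed earlier (cache the default nproc limit, and only look up a gear's individual limit once its thread count reaches that default) might look roughly like the sketch below. This is not the attached script and not watchman code: the class name NprocChecker, the injected limit_lookup callable, and all numbers are hypothetical. On a real node the lookup would presumably wrap the get_pam_limits(<uuid>)['nproc'] call mentioned above, and the pkill-and-restart remediation is deliberately left out.

```ruby
# Hypothetical sketch of the proposed nproc check; names are illustrative only.
class NprocChecker
  # default_limit: node-wide default nproc limit, read once and cached by the caller.
  # limit_lookup:  callable taking a gear UUID and returning that gear's individual
  #                nproc limit, or nil when the gear has no override (assumption).
  def initialize(default_limit, limit_lookup)
    @default_limit = default_limit
    @limit_lookup = limit_lookup
  end

  # thread_counts: Hash of gear UUID => current thread count.
  # Returns the UUIDs of gears at or above their effective limit.
  def gears_at_limit(thread_counts)
    thread_counts.select do |uuid, threads|
      # Cheap path: below the cached default, skip the per-gear lookup.
      next false if threads < @default_limit

      # Expensive path: only now consult the gear's individual limit.
      limit = @limit_lookup.call(uuid) || @default_limit
      threads >= limit
    end.keys
  end
end

# Example with made-up values: gear-b has a raised limit, gear-c sits at the default.
overrides = { 'gear-b' => 500 }
checker = NprocChecker.new(250, ->(uuid) { overrides[uuid] })
flagged = checker.gears_at_limit({ 'gear-a' => 40, 'gear-b' => 300, 'gear-c' => 250 })
p flagged # only gear-c is reported
```

One trade-off of the cheap path: a gear whose individual limit is lower than the default would not be flagged until it reaches the default, which matches the "correct in the vast majority of cases" caveat in the proposal.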
Closing as a duplicate to focus nproc efforts onto a single bug.
*** This bug has been marked as a duplicate of bug 1265183 ***