Bug 1061926
| Field | Value |
|---|---|
| Summary | Gears idle but not in idlerdb |
| Product | OpenShift Online |
| Component | Containers |
| Status | CLOSED CURRENTRELEASE |
| Severity | medium |
| Priority | medium |
| Version | 2.x |
| Reporter | Andy Grimm <agrimm> |
| Assignee | Jhon Honce <jhonce> |
| QA Contact | libra bugs <libra-bugs> |
| CC | bmeng, corinthianmonthly, jgoulding, jhonce, lzhang |
| Keywords | Reopened |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2014-05-15 15:28:09 UTC |
Description
Andy Grimm
2014-02-05 21:38:38 UTC
Please update with your research.

Research shows that over 99% of our idle apps currently have a .stop_lock file. Approximately 1135 of them are also not in their node's idler.txt, which means they will not properly unidle.

Tested on devenv_4352. After idling one gear with the 'oo-admin-ctl-gears idlegear $gearid' command, the output shows 'Idling gear $gearid... [OK]', but visiting the app in a browser (http://myjbossews10-chunchen.dev.rhcloud.com/) shows "The page isn't redirecting properly" and the gear remains idle. Unidling the gear with the 'oo-admin-ctl-gears unidlegear $gearid' command then fails with an error message:

[root@ip-10-225-9-204 ~]# oo-admin-ctl-gears idlegear 52f87f331de5448356000258
Idling gear 52f87f331de5448356000258 ... [ OK ]
[root@ip-10-225-9-204 ~]# oo-admin-ctl-gears unidlegear 52f87f331de5448356000258
error: Gear is locked: 52f87f331de5448356000258. Use --trace to view backtrace

(In reply to Andy Grimm from comment #2)
> Research shows that over 99% of our idle apps currently have a .stop_lock
> file.
> Approximately 1135 of them are also not in their node's idler.txt, which
> means they will not properly unidle.

Idled gears _should_ have a stop_lock and be in idler.txt.

The problem appears to be that gear moves do not adjust the idler database on the target system.

Never mind my previous comment. The pattern just looked suspicious because I picked a newer node to look at, where the majority of the idle gears not in the idler db happened to have been moved from elsewhere.

Created attachment 874003 [details]
Error Log When Trying To Restart An Idled Application
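
For anyone wanting to repeat the check described above on a node, a rough sketch follows. It lists gears that have a .stop_lock under app-root/runtime/ but do not appear in idler.txt. The function name, the gear base directory argument, and the assumption that idler.txt holds one gear ID per line are all hypothetical; the real file format may differ, so treat this as an illustration only and not as the tooling used for the numbers in this bug.

```python
from pathlib import Path


def gears_missing_from_idler(gear_base, idler_file):
    """Return sorted gear IDs that have a .stop_lock but are not in idler_file.

    Hypothetical helper: gear_base is the directory holding one
    subdirectory per gear, idler_file is the node's idler.txt.
    """
    idler_path = Path(idler_file)
    listed = set()
    if idler_path.exists():
        # Assumption: one gear ID per line; the real format may differ.
        listed = {
            line.strip()
            for line in idler_path.read_text().splitlines()
            if line.strip()
        }

    missing = []
    for gear_dir in Path(gear_base).iterdir():
        stop_lock = gear_dir / "app-root" / "runtime" / ".stop_lock"
        # A gear counts as affected when it looks idle (stop lock present)
        # but the idler file has no record of it.
        if stop_lock.is_file() and gear_dir.name not in listed:
            missing.append(gear_dir.name)
    return sorted(missing)
```

Calling it with a node's gear directory and idler.txt path would return the IDs of gears that look idle but would not be picked up for unidling.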
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/7e08d4d4fc40ce39566b870a06f5eb22c5d1cf3f
Bug 1061926 - Ensure frontend unidled if backend unidled

Checked on devenv_4540. After the gear was idled automatically by the auto-idler, it has .stop_lock under app-root/runtime/ and its info can be found in idler.txt and idler.db. After accessing the app's DNS name, the gear info was cleared from idler.txt and idler.db, and .stop_lock was also removed from app-root/runtime/. Moving bug to verified.

I apologize that my mention of the stop_lock file may have confused the issue here somewhat, but the real problem that we need to address here is not the stop_lock file. It is that, on idle, a gear sometimes does not end up in the idlerdb. If a gear is not in the idlerdb, then the openshift-idler rewrite map will not redirect to the restorer script.

Andy, the code change commit addresses your issue and has nothing to do with stop lock. If the issue resurfaces after this code is deployed, please reopen.

Reopening, as we are back to having 91 gears in this state in OpenShift Online. I suspect that the most reliable way to reproduce this is to create an app with the following properties:
1) the app pings itself via the frontend apache server every second
2) the app's control script sleeps a few seconds in its "stop" function before it actually shuts down the web server process

Lock file fix introduced in https://github.com/openshift/origin-server/pull/5274

Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/3b94f0e7085940f283a86a0eb0683947a934a9cc
Bug 1061926 - Use lock file to prevent race between idle/unidle

Checked on devenv_4692. The gear.{uuid} lock was introduced for gear idle/unidle. Ran idle/unidle in parallel for 4 gears: a gear.{uuid} lock was generated per gear, and all the gear info could be found in idler.txt and was removed after unidle. Moving bug to verified.
# ls /var/lock/
gear.534fe0e7fd382ff534000085 gear.534fe10bfd382ff534000099 gear.534fe128fd382ff5340000ad gear.534fe14cfd382ff5340000c1
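
To illustrate the idea behind the per-gear lock file, here is a minimal sketch of serializing idle and unidle on a gear.{uuid} lock. It is not the code from the commit above (origin-server is written in Ruby). The names gear_lock, idle_gear, and unidle_gear are hypothetical, and the idler_db set is a stand-in for the real idler database.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def gear_lock(uuid, lock_dir="/var/lock"):
    """Hold an exclusive lock on <lock_dir>/gear.<uuid> while the block runs."""
    path = os.path.join(lock_dir, f"gear.{uuid}")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Blocks until any other holder of this gear's lock releases it.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield path
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def idle_gear(uuid, idler_db, lock_dir="/var/lock"):
    """Hypothetical idle step: stop the gear and record it, under the lock."""
    with gear_lock(uuid, lock_dir):
        # Stand-in for stopping the gear and writing the idler entry.
        idler_db.add(uuid)


def unidle_gear(uuid, idler_db, lock_dir="/var/lock"):
    """Hypothetical unidle step: start the gear and clear it, under the lock."""
    with gear_lock(uuid, lock_dir):
        # Stand-in for restoring the gear and removing the idler entry.
        idler_db.discard(uuid)
```

Because both operations take the same per-gear lock, an unidle triggered by an incoming request cannot interleave with an idle that is still shutting the gear down, which is the situation the reproduction steps above describe. Gears with different UUIDs use different lock files, so they do not block each other.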