Description of problem:
We had a situation today where two gears on a node had "idle" in their state files and also had .stop_lock files, but they were not in the idlerdb. As a result, the app will not unidle when accessed. I worked around the problem by removing the .stop_lock file and running "oo-admin-ctl-gears idle <uuid>", which then added the gear to the idlerdb. I'm still researching how widespread this problem is in our environment.

Version-Release number of selected component (if applicable):
openshift-origin-node-util-1.18.8-1.el6oso.noarch
Please update this bug with your research.
Research shows that over 99% of our idle apps currently have a .stop_lock file. Approximately 1135 of them are also not in their node's idler.txt, which means they will not properly unidle.
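The cross-check behind numbers like these can be sketched as a comparison between each gear's state file and the node's idler entries. This is a minimal Python illustration, not the script that produced the figures above. It assumes gear homes live under /var/lib/openshift/<uuid> with the state file at app-root/runtime/.state, and that a gear's UUID appears as a whitespace-separated token on its idler.txt line; both the paths and the file format are assumptions to confirm on a real node.

```python
from __future__ import annotations

from pathlib import Path

# ASSUMPTION: typical OpenShift v2 node layout; confirm on a real node.
GEAR_BASE = Path("/var/lib/openshift")
STATE_REL = Path("app-root/runtime/.state")


def read_gear_states(base: Path = GEAR_BASE) -> dict[str, str]:
    """Map each gear UUID (its home directory name) to its state file contents."""
    states: dict[str, str] = {}
    for home in base.iterdir():
        state_file = home / STATE_REL
        if state_file.is_file():
            states[home.name] = state_file.read_text().strip()
    return states


def parse_idler_uuids(idler_txt: str) -> set[str]:
    """Collect every whitespace-separated token in idler.txt.

    ASSUMPTION: a gear counts as present if its UUID appears as a token
    anywhere on a line, so the exact column layout does not matter.
    """
    tokens: set[str] = set()
    for line in idler_txt.splitlines():
        tokens.update(line.split())
    return tokens


def find_unrestorable(states: dict[str, str], idler_uuids: set[str]) -> list[str]:
    """Return, sorted, the gears whose state is 'idle' but have no idler entry."""
    return sorted(
        uuid
        for uuid, state in states.items()
        if state.strip() == "idle" and uuid not in idler_uuids
    )
```

On a node, find_unrestorable(read_gear_states(), parse_idler_uuids(text)), where text is the contents of that node's idler.txt, lists the gears that would not unidle when accessed.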
Tested on devenv_4352. After idling a gear with the 'oo-admin-ctl-gears idlegear $gearid' command, the output is 'Idling gear $gearid ... [ OK ]', but visiting the app in a browser (http://myjbossews10-chunchen.dev.rhcloud.com/) shows "The page isn't redirecting properly" and the gear stays idle. Trying to unidle the gear with 'oo-admin-ctl-gears unidlegear $gearid' fails with an error:
[root@ip-10-225-9-204 ~]# oo-admin-ctl-gears idlegear 52f87f331de5448356000258
Idling gear 52f87f331de5448356000258 ... [ OK ]
[root@ip-10-225-9-204 ~]# oo-admin-ctl-gears unidlegear 52f87f331de5448356000258
error: Gear is locked: 52f87f331de5448356000258. Use --trace to view backtrace
(In reply to Andy Grimm from comment #2)
> Research shows that over 99% of our idle apps currently have a .stop_lock file.
> Approximately 1135 of them are also not in their node's idler.txt, which means they will not properly unidle.

Idled gears _should_ have a stop_lock and be in idler.txt.
The problem appears to be that gear moves do not adjust the idler database on the target system.
Never mind my previous comment. The pattern only looked suspicious because I happened to pick a newer node, where most of the idle gears missing from the idler db had been moved there from elsewhere.
Created attachment 874003
Error Log When Trying To Restart An Idled Application
Fix in https://github.com/openshift/origin-server/pull/4996
Commit pushed to master at https://github.com/openshift/origin-server
https://github.com/openshift/origin-server/commit/7e08d4d4fc40ce39566b870a06f5eb22c5d1cf3f
Bug 1061926 - Ensure frontend unidled if backend unidled
Checked on devenv_4540. After the gear was idled by the auto-idler, it had a .stop_lock file under app-root/runtime/, and its entry could be found in idler.txt and idler.db. After accessing the app's DNS name, the gear's entry was cleared from idler.txt and idler.db, and .stop_lock was removed from app-root/runtime/. Moving bug to VERIFIED.
I apologize that my mention of the stop_lock file may have confused the issue somewhat, but the real problem we need to address here is not the stop_lock file. It is that, on idle, a gear sometimes does not end up in the idlerdb. If a gear is not in the idlerdb, the openshift-idler rewrite map will not redirect requests to the restorer script.
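The lookup described here can be modeled in a few lines to show why a missing entry leaves a gear stuck. This Python sketch assumes idler.txt follows Apache's RewriteMap txt format (one whitespace-separated key/value pair per line, '#' comment lines); the lookup key and the 'restorer'/'backend' labels are hypothetical, and the real openshift-idler rule may differ.

```python
from __future__ import annotations


def parse_txt_map(text: str) -> dict[str, str]:
    """Parse RewriteMap 'txt' contents: 'key value' per line, '#' comments skipped."""
    entries: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            entries[fields[0]] = fields[1]
    return entries


def route(key: str, idler_map: dict[str, str]) -> str:
    """Decide where one request goes.

    HYPOTHETICAL labels: 'restorer' stands for the restorer script that
    unidles the gear; 'backend' stands for the gear's usual proxy target.
    A gear that is idle but missing from the map always gets 'backend',
    so nothing ever starts it back up.
    """
    return "restorer" if key in idler_map else "backend"
```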
Andy, the committed code change addresses your issue and has nothing to do with the stop lock. If the issue resurfaces after this code is deployed, please reopen this bug.
Reopening, as we are back to having 91 gears in this state in OpenShift Online.
I suspect that the most reliable way to reproduce this is to create an app with the following properties:
1) the app pings itself via the frontend Apache server every second
2) the app's control script sleeps a few seconds in its "stop" function before it actually shuts down the web server process
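Property 1 can be approximated with a small loop run inside the gear. This Python sketch is an illustrative stand-in, not a tested reproducer: the URL must be the app's public hostname so that each request passes through the frontend Apache, and the fetch/sleep hooks exist only to make the loop testable. Property 2 would be a separate change to the app's control script.

```python
from __future__ import annotations

import time
import urllib.error
import urllib.request
from typing import Callable, Iterator


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status for `url`, including 4xx/5xx error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def self_ping(
    url: str,
    interval: float = 1.0,
    count: int | None = None,
    fetch: Callable[[str], int] = fetch_status,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[int | str]:
    """Request `url` every `interval` seconds, yielding each result.

    Runs forever when `count` is None. Network failures (refused connection,
    timeout, DNS error) are yielded as their message instead of raising.
    """
    sent = 0
    while count is None or sent < count:
        try:
            result: int | str = fetch(url)
        except OSError as err:
            result = str(err)
        yield result
        sent += 1
        sleep(interval)
```

Running `for status in self_ping("http://<app-hostname>/"): print(status, flush=True)` in the gear keeps traffic arriving while the idler stops the app, which widens the window for an idle and an unidle to overlap.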
Lock file fix introduced in https://github.com/openshift/origin-server/pull/5274
Commit pushed to master at https://github.com/openshift/origin-server
https://github.com/openshift/origin-server/commit/3b94f0e7085940f283a86a0eb0683947a934a9cc
Bug 1061926 - Use lock file to prevent race between idle/unidle
Checked on devenv_4692. A gear.{uuid} lock has been introduced for gear idle/unidle. After running idle/unidle in parallel on 4 gears, a gear.{uuid} lock was created for each gear, every gear's entry could be found in idler.txt, and the entries were removed after unidle. Moving bug to VERIFIED.
# ls /var/lock/
gear.534fe0e7fd382ff534000085  gear.534fe10bfd382ff534000099  gear.534fe128fd382ff5340000ad  gear.534fe14cfd382ff5340000c1
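The per-gear lock in the /var/lock listing above can be sketched with flock(2). This is a hedged Python illustration of the idea in the commit title, not the origin-server implementation, which this thread does not show: the gear.<uuid> path matches the listing, while the flock-based locking and the states/idler stand-ins are assumptions. Holding one exclusive lock per gear across the whole idle or unidle operation means the state change and the idler-database change cannot interleave with the opposite operation on the same gear.

```python
from __future__ import annotations

import fcntl
import os
from contextlib import contextmanager
from typing import Iterator

# Lock files follow the gear.<uuid> naming seen under /var/lock.
LOCK_DIR = "/var/lock"


@contextmanager
def gear_lock(uuid: str, lock_dir: str = LOCK_DIR) -> Iterator[str]:
    """Hold an exclusive flock on <lock_dir>/gear.<uuid> for the with-block."""
    path = os.path.join(lock_dir, f"gear.{uuid}")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Blocks until no other descriptor holds the lock, including ones
        # opened by other threads in this process (flock locks belong to
        # the open file description, and each call opens its own).
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield path
    finally:
        os.close(fd)  # closing the only descriptor releases the flock


# HYPOTHETICAL stand-ins for the gear state file and the idler database.
def idle(uuid: str, states: dict[str, str], idler: set[str], lock_dir: str = LOCK_DIR) -> None:
    with gear_lock(uuid, lock_dir):
        states[uuid] = "idle"
        idler.add(uuid)


def unidle(uuid: str, states: dict[str, str], idler: set[str], lock_dir: str = LOCK_DIR) -> None:
    with gear_lock(uuid, lock_dir):
        idler.discard(uuid)
        states[uuid] = "started"
```

The lock file is left in place after release, which is consistent with the gear.* files remaining in the listing.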