Bug 1061926 - Gears idle but not in idlerdb
Summary: Gears idle but not in idlerdb
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Containers
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
: ---
Assignee: Jhon Honce
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-02-05 21:38 UTC by Andy Grimm
Modified: 2016-11-08 03:47 UTC
5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-05-15 15:28:09 UTC
Target Upstream Version:
Embargoed:


Attachments
Error Log When Trying To Restart An Idled Application (1.93 KB, text/plain)
2014-03-13 13:45 UTC, corinthianmonthly

Description Andy Grimm 2014-02-05 21:38:38 UTC
Description of problem:

We had a situation today where two gears on a node had "idle" in their state file, but they also had a .stop_lock file, and they were not in the idlerdb.  The net effect of this is that the app will not unidle when accessed.

I worked around the problem by removing the .stop_lock file and running "oo-admin-ctl-gears idle <uuid>", which then added the gear to the idlerdb.

I'm still researching how widespread this problem is in our environment.

Version-Release number of selected component (if applicable):

openshift-origin-node-util-1.18.8-1.el6oso.noarch

Comment 1 Jhon Honce 2014-02-07 18:25:05 UTC
Please update with your research

Comment 2 Andy Grimm 2014-02-07 19:46:12 UTC
Research shows that over 99% of our idle apps currently have a .stop_lock file.
Approximately 1135 of them are also not in their node's idler.txt, which means they will not properly unidle.
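The survey described here can be approximated with a short shell function that checks the invariant stated in Comment 4 (an idled gear should have a .stop_lock and appear in idler.txt). This is a hypothetical sketch, not the script that produced the numbers above. The function name `survey_idle_gears`, the one-directory-per-UUID gear layout, the state file location (`app-root/runtime/.state`), and the assumption that idler.txt lists each idled gear's UUID as a whitespace-separated token are all assumptions; only the `app-root/runtime/.stop_lock` path is confirmed elsewhere in this bug.

```bash
# Hypothetical survey: count idle gears, how many have a .stop_lock,
# and how many are missing from idler.txt.
# ASSUMPTIONS (not confirmed by this bug):
#   - each gear is a directory named by its UUID under $1 (the gear base);
#   - the gear state lives in <gear>/app-root/runtime/.state and reads "idle";
#   - idler.txt ($2) contains each idled gear's UUID as a whitespace-separated token.
survey_idle_gears() {
  local gear_base=$1 idler_txt=$2
  local idle=0 locked=0 missing=0 dir uuid runtime
  for dir in "$gear_base"/*/; do
    uuid=$(basename "$dir")
    runtime="$dir/app-root/runtime"
    # Skip anything that is not a gear or is not in the idle state.
    [ -f "$runtime/.state" ] || continue
    [ "$(cat "$runtime/.state")" = "idle" ] || continue
    idle=$((idle + 1))
    [ -e "$runtime/.stop_lock" ] && locked=$((locked + 1))
    # An idle gear absent from the idler map will not be redirected to the restorer.
    if ! grep -qw -- "$uuid" "$idler_txt" 2>/dev/null; then
      missing=$((missing + 1))
      echo "not in idler.txt: $uuid" >&2
    fi
  done
  echo "idle=$idle stop_lock=$locked missing_from_idler=$missing"
}
```

Usage: `survey_idle_gears /var/lib/openshift /path/to/idler.txt` (both paths assumed); the offending UUIDs are printed on stderr so the summary line stays machine-readable.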

Comment 3 Lei Zhang 2014-02-10 08:03:12 UTC
Tested on devenv_4352. After idling a gear with the 'oo-admin-ctl-gears idlegear $gearid' command, the output is 'Idling gear $gearid... [OK]', but visiting the app in a browser (http://myjbossews10-chunchen.dev.rhcloud.com/) shows "The page isn't redirecting properly" and the gear stays idle. Trying to unidle the gear with 'oo-admin-ctl-gears unidlegear $gearid' then fails with an error:

[root@ip-10-225-9-204 ~]# oo-admin-ctl-gears idlegear 52f87f331de5448356000258
Idling gear 52f87f331de5448356000258 ... [ OK ]
[root@ip-10-225-9-204 ~]# oo-admin-ctl-gears unidlegear 52f87f331de5448356000258
error: Gear is locked: 52f87f331de5448356000258. Use --trace to view backtrace

Comment 4 Jhon Honce 2014-02-10 18:22:38 UTC
(In reply to Andy Grimm from comment #2)
> Research shows that over 99% of our idle apps currently have a .stop_lock
> file.
> Approximately 1135 of them are also not in their node's idler.txt, which
> means they will not properly unidle.

Idled gears _should_ have a stop_lock and be in idler.txt.

Comment 5 Andy Grimm 2014-02-13 00:59:23 UTC
The problem appears to be that gear moves do not adjust the idler database on the target system.

Comment 6 Andy Grimm 2014-02-13 01:28:42 UTC
Never mind my previous comment.  The pattern just looked suspicious because I picked a newer node to look at, where the majority of the idle gears not in the idler db happened to have been moved from elsewhere.

Comment 8 corinthianmonthly 2014-03-13 13:45:11 UTC
Created attachment 874003 [details]
Error Log When Trying To Restart An Idled Application

Comment 9 Jhon Honce 2014-03-18 23:25:00 UTC
Fix in https://github.com/openshift/origin-server/pull/4996

Comment 10 openshift-github-bot 2014-03-19 00:53:56 UTC
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/7e08d4d4fc40ce39566b870a06f5eb22c5d1cf3f
Bug 1061926 - Ensure frontend unidled if backend unidled

Comment 11 Meng Bo 2014-03-20 12:11:37 UTC
Checked on devenv_4540: after the gear was idled by the auto-idler, it had a .stop_lock under app-root/runtime/, and its info could be found in idler.txt and idler.db.

After accessing the app's DNS name, the gear info was cleared from idler.txt and idler.db, and the .stop_lock was also removed from app-root/runtime/.

Moving bug to VERIFIED.

Comment 12 Andy Grimm 2014-03-20 13:25:54 UTC
I apologize that my mention of the stop_lock file may have confused the issue here somewhat, but the real problem that we need to address here is not the stop_lock file.  It is that, on idle, a gear sometimes does not end up in the idlerdb.  If they are not in the idlerdb, then the openshift-idler rewrite map will not redirect to the restorer script.
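The point here is that routing, not the gear's own state file, decides whether a request reaches the restorer. A toy model of that lookup makes the failure mode concrete: a gear whose state says idle but which is absent from the map falls through to the stopped backend. This is a hypothetical sketch; it assumes idler.txt uses the `key value` line format of an Apache `txt:` RewriteMap keyed by gear UUID, which may not match the real openshift-idler map, and `route_request` is an invented name.

```bash
# Toy model of the idler rewrite-map decision (hypothetical, not the real map logic).
# ASSUMPTION: each idler.txt line is "<gear-uuid> <value>", like an Apache txt: RewriteMap.
route_request() {
  local idler_txt=$1 uuid=$2
  # Succeeds only if some line's first field equals the gear UUID.
  if awk -v k="$uuid" '$1 == k { found = 1 } END { exit !found }' "$idler_txt" 2>/dev/null; then
    echo "restorer"   # listed as idle: redirect to the restorer, which unidles the gear
  else
    echo "backend"    # not listed: proxy straight to the gear, even if it is stopped
  fi
}
```

With an idle gear missing from the file, `route_request` answers `backend`, which is the "will not unidle when accessed" symptom from the original description.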

Comment 13 Jhon Honce 2014-03-20 16:45:38 UTC
Andy, the code change commit addresses your issue and has nothing to do with stop lock.  If the issue resurfaces after this code is deployed, please reopen.

Comment 14 Andy Grimm 2014-04-14 18:58:09 UTC
Reopening, as we are back to having 91 gears in this state in OpenShift Online.

Comment 16 Andy Grimm 2014-04-14 19:27:52 UTC
I suspect that the most reliable way to reproduce this is to create an app with the following properties:

1) the app pings itself via the frontend apache server every second
2) the app's control script sleeps a few seconds in its "stop" function before it actually shuts down the web server process
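The two properties above can be sketched as helpers for a throwaway test cartridge. Everything here is hypothetical: `self_ping`, `slow_stop`, the pid-file argument, and the `STOP_DELAY` variable (default 5 seconds) are invented for illustration, and the app URL is assumed to route through the frontend Apache. The delay widens the window in which a self-ping can trigger an unidle while the idle is still in progress.

```bash
# 1) Ping the app through the frontend once per second (runs until killed).
#    ASSUMPTION: $1 is the app's public URL, served via the frontend Apache.
self_ping() {
  local url=$1
  while true; do
    curl -s -o /dev/null "$url"
    sleep 1
  done
}

# 2) A "stop" that waits before shutting the web server down.
#    ASSUMPTION: the web server's PID is stored in the file named by $1;
#    STOP_DELAY (seconds, default 5) controls how long stop lingers.
slow_stop() {
  local pid_file=$1 delay=${STOP_DELAY:-5}
  sleep "$delay"
  if [ -f "$pid_file" ]; then
    kill "$(cat "$pid_file")" 2>/dev/null
    rm -f "$pid_file"
  fi
}
```

In a reproduction, the cartridge's stop action would call `slow_stop` while a background process runs `self_ping "$APP_URL"`; how those are wired into a real cartridge is left as an assumption.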

Comment 17 Jhon Honce 2014-04-15 19:08:31 UTC
Lock file fix introduced in https://github.com/openshift/origin-server/pull/5274

Comment 18 openshift-github-bot 2014-04-16 18:26:00 UTC
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/3b94f0e7085940f283a86a0eb0683947a934a9cc
Bug 1061926 - Use lock file to prevent race between idle/unidle

Comment 19 Meng Bo 2014-04-17 11:44:34 UTC
Checked on devenv_4692: a gear.{uuid} lock has been introduced for gear idle/unidle.

Ran idle/unidle in parallel for 4 gears: a gear.{uuid} lock was created per gear, all the gear info could be found in idler.txt, and it was removed after unidle.

Moving bug to VERIFIED.


# ls /var/lock/
gear.534fe0e7fd382ff534000085
gear.534fe10bfd382ff534000099
gear.534fe128fd382ff5340000ad
gear.534fe14cfd382ff5340000c1
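The fix serializes idle and unidle on a per-gear lock file, which is what the gear.{uuid} entries in /var/lock reflect. The shell sketch below illustrates the same idea with util-linux `flock(1)`; it is not the code from PR 5274, and `with_gear_lock`, `LOCK_DIR`, and the step functions in the usage comment are hypothetical.

```bash
# Serialize work on one gear behind /var/lock/gear.<uuid> (hypothetical helper).
# ASSUMPTION: util-linux flock(1) is available; LOCK_DIR can be overridden for testing.
LOCK_DIR=${LOCK_DIR:-/var/lock}

with_gear_lock() {
  local uuid=$1; shift
  (
    # Take an exclusive lock on fd 9; a concurrent idle/unidle of the same
    # gear blocks here until this subshell exits and the fd is closed.
    flock -x 9 || exit 1
    "$@"
  ) 9>"$LOCK_DIR/gear.$uuid"
}

# Usage with hypothetical step functions:
#   with_gear_lock "$uuid" idle_steps "$uuid"
#   with_gear_lock "$uuid" unidle_steps "$uuid"
```

Because the lock is tied to an open file descriptor, it is released automatically when the subshell exits, even if the wrapped command fails. Avoid wrapping the real oo-admin-ctl-gears commands this way on a node that has the fix: if they take the same lock internally, the inner lock request would wait on the outer one indefinitely.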

