Bug 1006557 - stop_lock gears have status of "started"
Status: CLOSED CURRENTRELEASE
Product: OpenShift Online
Classification: Red Hat
Component: Image
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Assigned To: Paul Morie
QA Contact: libra bugs
Depends On:
Blocks: 1207486
Reported: 2013-09-10 15:54 EDT by Matt Woodson
Modified: 2015-05-14 20:34 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1207486
Environment:
Last Closed: 2014-01-29 19:48:39 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Matt Woodson 2013-09-10 15:54:26 EDT
Description of problem:

We have found many gears that have a .stop_lock file but whose .state file shows "started". These gears also have no running processes.

# ll /var/lib/openshift/4cd359f0852e4146ba8a6600de5f213b/app-root/runtime/.stop_lock
-rw-r--r--. 1 4cd359f0852e4146ba8a6600de5f213b 4cd359f0852e4146ba8a6600de5f213b 0 Jun  5 16:49 /var/lib/openshift/4cd359f0852e4146ba8a6600de5f213b/app-root/runtime/.stop_lock

# cat /var/lib/openshift/4cd359f0852e4146ba8a6600de5f213b/app-root/runtime/.state 
started

# ps -u 4cd359f0852e4146ba8a6600de5f213b
  PID TTY          TIME CMD
# 

This is causing alerts to go off because we show a high number of gears in the "started" state, but without any processes.


NOTE: As part of the fix for this bug, please add a check for this to oo-accept-node so that bugs like this are found with the OpenShift unit tests.



Version-Release number of selected component (if applicable):
rhc-node-1.13.6-1.el6oso.x86_64


How reproducible:
unknown, found in PROD


Steps to Reproduce:
1. unknown, found in PROD


Actual results:
Gears with a .stop_lock file and "started" in the .state file.


Expected results:
Gears with .stop_lock file should always be in a "stopped" state.
Comment 1 Jhon Honce 2013-09-10 16:47:17 EDT
Any change of state, i.e. starting the application, will reset the values correctly.

Are these V1 gears upgraded to V2?
Comment 2 Matt Woodson 2013-09-10 17:03:09 EDT
(In reply to Jhon Honce from comment #1)
> Any change of state, ie starting the application will reset the values
> correctly.
> 

The .state file is updated when stopping or starting one of these gears.



> Are these V1 gears upgraded to V2?

On the host I looked at, all of the gears in this state have been migrated from V1 to V2.  To tell that it was migrated to V2 I checked for the existence of .env/CARTRIDGE_VERSION_2
Comment 3 Jhon Honce 2013-09-10 17:18:46 EDT
This has only been found on V1 created applications.

Resolving the issue would require writing a script that traverses all gears on a node and ensures that .stop_lock and .state are consistent.
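
A minimal sketch of the detection half of such a script, in Python. It assumes the layout shown in the description (/var/lib/openshift/<uuid>/app-root/runtime/ containing .stop_lock and .state). The function name find_inconsistent_gears is hypothetical and is not part of any OpenShift tool. It only reports gears and does not modify them.

```python
from pathlib import Path
from typing import Iterator

# Base directory taken from the paths shown in the bug description.
GEAR_BASE = Path("/var/lib/openshift")


def find_inconsistent_gears(base: Path = GEAR_BASE) -> Iterator[str]:
    """Yield the directory name (gear UUID) of every gear under `base`
    that has a .stop_lock file while its .state file reads "started"."""
    if not base.is_dir():
        return
    for gear_dir in sorted(p for p in base.iterdir() if p.is_dir()):
        runtime = gear_dir / "app-root" / "runtime"
        stop_lock = runtime / ".stop_lock"
        state_file = runtime / ".state"
        # Skip entries that are not gears or that have no stop lock.
        if not stop_lock.exists() or not state_file.is_file():
            continue
        if state_file.read_text().strip() == "started":
            yield gear_dir.name
```

Run as root on a node, list(find_inconsistent_gears()) would give the gear UUIDs to inspect. Whether to fix a gear by removing .stop_lock or by rewriting .state is a policy decision left out of this sketch.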
Comment 4 Paul Morie 2013-11-05 14:03:21 EST
I have modified watchman to detect and correct this when it happens.
Comment 5 openshift-github-bot 2013-11-05 16:09:09 EST
Commit pushed to master at https://github.com/openshift/li

https://github.com/openshift/li/commit/300f8aaa02a1d3631f2c181e633c276a526d9c17
Fix bug 1006557: make watchman check for hanging stop_lock files
Comment 7 Meng Bo 2013-11-08 02:04:54 EST
Checked on devenv_4003,

1. Touch .stop_lock for a started gear,
[root@ip-10-239-15-120 runtime]# ls -al
total 28
drwxr-x---. 5 527c7e73aef9e993f300000c 527c7e73aef9e993f300000c 4096 Nov  8 01:59 .
drwxr-xr-x. 4 root                     527c7e73aef9e993f300000c 4096 Nov  8 01:02 ..
drwxr-x---. 2 527c7e73aef9e993f300000c 527c7e73aef9e993f300000c 4096 Nov  8 01:02 build-dependencies
lrwxrwxrwx. 1 527c7e73aef9e993f300000c 527c7e73aef9e993f300000c    7 Nov  8 01:02 data -> ../data
drwxr-x---. 3 527c7e73aef9e993f300000c 527c7e73aef9e993f300000c 4096 Nov  8 01:02 dependencies
drwxr-x---. 6 527c7e73aef9e993f300000c 527c7e73aef9e993f300000c 4096 Nov  8 01:04 repo
-rw-r-----. 1 527c7e73aef9e993f300000c 527c7e73aef9e993f300000c    8 Nov  8 01:28 .state
-rw-r-----. 1 527c7e73aef9e993f300000c 527c7e73aef9e993f300000c    8 Nov  8 01:28 .stop_lock


2. Check the /var/log/messages for the rhc-watchman log
Found:
Nov  8 02:00:10 ip-10-239-15-120 rhc-watchman[2051]: watchman deleted stop lock for user 527c7e73aef9e993f300000c because the state of the gear was STARTED

3. Check the gear dir again: the .stop_lock has been removed automatically.
[root@ip-10-239-15-120 runtime]# ls -al
total 24
drwxr-x---. 5 527c7e73aef9e993f300000c 527c7e73aef9e993f300000c 4096 Nov  8 02:00 .
drwxr-xr-x. 4 root                     527c7e73aef9e993f300000c 4096 Nov  8 01:02 ..
drwxr-x---. 2 527c7e73aef9e993f300000c 527c7e73aef9e993f300000c 4096 Nov  8 01:02 build-dependencies
lrwxrwxrwx. 1 527c7e73aef9e993f300000c 527c7e73aef9e993f300000c    7 Nov  8 01:02 data -> ../data
drwxr-x---. 3 527c7e73aef9e993f300000c 527c7e73aef9e993f300000c 4096 Nov  8 01:02 dependencies
drwxr-x---. 6 527c7e73aef9e993f300000c 527c7e73aef9e993f300000c 4096 Nov  8 01:04 repo
-rw-r-----. 1 527c7e73aef9e993f300000c 527c7e73aef9e993f300000c    8 Nov  8 01:28 .state


Stopped gears are not impacted.

Move bug to verified.
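
For reference, the behavior verified above (stop lock removed when the gear state is started, stopped gears left alone) can be sketched in Python as follows. This is not the watchman code from the commit. The function name remove_hanging_stop_lock is hypothetical, and reading the state from the .state file is an assumption about how the check could be done.

```python
from pathlib import Path


def remove_hanging_stop_lock(runtime_dir: Path) -> bool:
    """Delete runtime_dir/.stop_lock if runtime_dir/.state says "started".

    Returns True if a stop lock was deleted, False otherwise.
    """
    stop_lock = runtime_dir / ".stop_lock"
    state_file = runtime_dir / ".state"
    # Nothing to do without both files.
    if not stop_lock.exists() or not state_file.is_file():
        return False
    # Leave stopped (or otherwise non-started) gears untouched.
    if state_file.read_text().strip().lower() != "started":
        return False
    stop_lock.unlink()
    return True
```

Pointed at the gear's app-root/runtime directory from step 1, this would remove the touched .stop_lock, which matches the listing in step 3.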
