Bug 1038809

Summary: rhc-watchman exiting on missing state file
Product: OpenShift Online    Reporter: Sten Turpin <sten>
Component: Containers    Assignee: Jhon Honce <jhonce>
Status: CLOSED CURRENTRELEASE    QA Contact: libra bugs <libra-bugs>
Severity: high    Docs Contact:
Priority: high
Version: 2.x    CC: bmeng, xtian
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:    Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:    Environment:
Last Closed: 2014-01-30 00:52:25 UTC    Type: Bug

Description Sten Turpin 2013-12-05 20:54:40 UTC
Description of problem: rhc-watchman exits when an application's .state file is missing.


Version-Release number of selected component (if applicable): 
rhc-node-1.17.2-1.el6oso.x86_64


How reproducible:
always

Steps to Reproduce:
1. Remove an application's .state file (the gear's app-root/runtime/.state).

Actual results:
rhc-watchman logs the following and then exits (the last line is a new rhc-watchman process, PID 186946, starting afterward):

Dec  5 15:32:15 ex-std-node35 rhc-watchman[172943]: Running rhc-watchman => delay: 60s, exception threshold: 2, throttler: running
Dec  5 15:32:16 ex-std-node35 rhc-watchman[172943]: watchman caught #<Errno::ENOENT: No such file or directory - /var/lib/openshift/6b0436f911ca4a1ebde51ed4bcab26b0/app-root/runtime/.state>: No such file or directory - /var/lib/openshift/6b0436f911ca4a1ebde51ed4bcab26b0/app-root/runtime/.state. Retries left: 1
Dec  5 15:33:17 ex-std-node35 rhc-watchman[172943]: Running rhc-watchman => delay: 60s, exception threshold: 1, throttler: running
Dec  5 15:33:17 ex-std-node35 rhc-watchman[172943]: watchman caught #<Errno::ENOENT: No such file or directory - /var/lib/openshift/6b0436f911ca4a1ebde51ed4bcab26b0/app-root/runtime/.state>: No such file or directory - /var/lib/openshift/6b0436f911ca4a1ebde51ed4bcab26b0/app-root/runtime/.state. Retries left: 0
Dec  5 15:35:42 ex-std-node35 rhc-watchman[186946]: Starting rhc-watchman => delay: 20s, exception threshold: 10
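The log suggests how the exit happens: the same missing file raises Errno::ENOENT on every pass, and each caught exception lowers the exception threshold (2, then 1) until no retries are left. The Ruby sketch below is one hypothetical model consistent with these lines; `run_with_threshold` and its parameters are illustrative assumptions, not the rhc-watchman source.

```ruby
# Hypothetical model of an exception-threshold loop; names and structure are
# illustrative assumptions, not the rhc-watchman implementation.
# Each pass runs the checks; an escaped exception consumes one retry, and the
# loop gives up when none remain. A fault that recurs on every pass, such as a
# permanently missing .state file, therefore exhausts the threshold.
def run_with_threshold(threshold, max_passes: 100)
  log = []
  remaining = threshold
  max_passes.times do
    log << "Running => exception threshold: #{remaining}"
    begin
      yield
    rescue StandardError => e
      remaining -= 1
      log << "caught #{e.class}. Retries left: #{remaining}"
      break if remaining <= 0
    end
  end
  log
end

# Usage: a check that always hits ENOENT stops after `threshold` failed passes.
log = run_with_threshold(2) do
  raise Errno::ENOENT, '/var/lib/openshift/6b0436f911ca4a1ebde51ed4bcab26b0/app-root/runtime/.state'
end
puts log
```

In this model the retry budget is shared by the whole daemon, so a single gear with a persistent fault is enough to stop it.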


Expected results:
rhc-watchman should not exit because a single application's .state file is missing.

Additional info:

Comment 1 openshift-github-bot 2013-12-06 17:07:18 UTC
Commit pushed to master at https://github.com/openshift/li

https://github.com/openshift/li/commit/7bbb3d1d24cdf3cb1522260f0b5ba852f22f3745
Bug 1038809 - Added guard to check_hanging_stoplock when .state file was missing

* .state files are missing either when a user deletes them, or after partial
  gear creates or destroys
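The commit adds a guard so that a missing .state file no longer raises out of check_hanging_stoplock. The patch itself is not reproduced here; the Ruby sketch below only illustrates the general pattern, under stated assumptions: `gear_state` and `check_gears` are hypothetical helpers, and the path layout is taken from the error message in the description. Rescuing Errno::ENOENT around the read, rather than checking File.exist? first, also covers a file that is removed between a check and the read.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper (not the openshift/li code): return the gear's recorded
# state, or nil when app-root/runtime/.state does not exist.
def gear_state(gear_home)
  state_file = File.join(gear_home, 'app-root', 'runtime', '.state')
  File.read(state_file).chomp
rescue Errno::ENOENT
  # Missing .state: deleted by the user, or absent after a partial
  # gear create/destroy. Report "no state" instead of raising.
  nil
end

# Hypothetical per-gear pass: a gear without a .state file is marked
# :skipped, so the error never reaches the daemon's retry loop.
def check_gears(gear_homes)
  gear_homes.map { |home| [home, gear_state(home) || :skipped] }.to_h
end

# Usage: one gear with a .state file, one without.
Dir.mktmpdir do |root|
  good = File.join(root, 'gear-a')
  bad  = File.join(root, 'gear-b')
  FileUtils.mkdir_p(File.join(good, 'app-root', 'runtime'))
  FileUtils.mkdir_p(File.join(bad, 'app-root', 'runtime'))
  File.write(File.join(good, 'app-root', 'runtime', '.state'), "started\n")
  p check_gears([good, bad]) # gear-a maps to "started", gear-b to :skipped
end
```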

Comment 2 Meng Bo 2013-12-09 09:35:41 UTC
Checked on devenv-stage_604: after deleting the .state files for all gears, rhc-watchman keeps running.

[root@ip-10-168-39-108 openshift]# tailf /var/log/messages 
Dec  9 03:59:46 ip-10-168-39-108 rhc-watchman[1962]: Running rhc-watchman => delay: 20s, exception threshold: 10, throttler: running
Dec  9 04:00:06 ip-10-168-39-108 rhc-watchman[1962]: Running rhc-watchman => delay: 20s, exception threshold: 10, throttler: running
Dec  9 04:00:26 ip-10-168-39-108 rhc-watchman[1962]: Running rhc-watchman => delay: 20s, exception threshold: 10, throttler: running
Dec  9 04:00:46 ip-10-168-39-108 rhc-watchman[1962]: Running rhc-watchman => delay: 20s, exception threshold: 10, throttler: running
Dec  9 04:01:01 ip-10-168-39-108 charlie: People are logged in: 2
Dec  9 04:01:06 ip-10-168-39-108 rhc-watchman[1962]: Running rhc-watchman => delay: 20s, exception threshold: 10, throttler: running
Dec  9 04:01:26 ip-10-168-39-108 rhc-watchman[1962]: Running rhc-watchman => delay: 20s, exception threshold: 10, throttler: running
Dec  9 04:01:46 ip-10-168-39-108 rhc-watchman[1962]: Running rhc-watchman => delay: 20s, exception threshold: 10, throttler: running
Dec  9 04:02:06 ip-10-168-39-108 rhc-watchman[1962]: Running rhc-watchman => delay: 20s, exception threshold: 10, throttler: running
Dec  9 04:02:26 ip-10-168-39-108 rhc-watchman[1962]: Running rhc-watchman => delay: 20s, exception threshold: 10, throttler: running
Dec  9 04:02:46 ip-10-168-39-108 rhc-watchman[1962]: Running rhc-watchman => delay: 20s, exception threshold: 10, throttler: running
Dec  9 04:03:06 ip-10-168-39-108 rhc-watchman[1962]: Running rhc-watchman => delay: 20s, exception threshold: 10, throttler: running
Dec  9 04:03:26 ip-10-168-39-108 rhc-watchman[1962]: Running rhc-watchman => delay: 20s, exception threshold: 10, throttler: running


Moving bug to VERIFIED.
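The verification rests on reading the log: in Comment 2 every "Running rhc-watchman" line comes from the same PID with the exception threshold still at 10 and no "watchman caught" lines, whereas in the original description the threshold drops from 2 to 1 with a caught ENOENT each time. A small Ruby helper can make that comparison explicit; `summarize_watchman` and its regular expressions are hypothetical, written only against the log format quoted in this bug.

```ruby
# Hypothetical log check (not part of rhc-watchman): collect the PIDs and
# exception thresholds from "Running rhc-watchman" lines and count
# "watchman caught" lines in syslog output.
RUNNING = /rhc-watchman\[(\d+)\]: Running rhc-watchman => .*exception threshold: (\d+)/
CAUGHT  = /rhc-watchman\[\d+\]: watchman caught/

def summarize_watchman(lines)
  pids = []
  thresholds = []
  caught = 0
  lines.each do |line|
    if (m = RUNNING.match(line))
      pids << m[1]
      thresholds << m[2].to_i
    elsif CAUGHT.match?(line)
      caught += 1
    end
  end
  # A single PID, one unchanging threshold and zero caught exceptions is
  # consistent with the daemon surviving the missing .state files.
  { pids: pids.uniq, thresholds: thresholds.uniq, caught: caught }
end
```

Applied to the lines quoted in Comment 2, it yields PID 1962, thresholds `[10]` and no caught exceptions; applied to the lines in the original description, it yields thresholds `[2, 1]` and two caught exceptions. On a live node the input could be `File.foreach('/var/log/messages')`.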