Bug 998704 - [Watchman] Exception: NoMethodError
Status: CLOSED CURRENTRELEASE
Product: OpenShift Online
Classification: Red Hat
Component: Containers
Version: 1.x
Hardware: x86_64 Linux
Priority: high
Severity: high
Assigned To: Fotios Lindiakos
QA Contact: libra bugs
Reported: 2013-08-19 16:04 EDT by Kenny Woodson
Modified: 2014-01-12 20:43 EST

Doc Type: Bug Fix
Last Closed: 2013-08-29 08:53:07 EDT
Type: Bug

Attachments: None

Description Kenny Woodson 2013-08-19 16:04:00 EDT
Description of problem:

While debugging a few issues, I noticed these messages in rsyslog:

Aug 19 15:51:21 ex-std-node43 rhc-watchman[19981]: watchman caught #<NoMethodError: undefined method `>=' for nil:NilClass>: undefined method `>=' for nil:NilClass. Retries left: 1
Aug 19 15:53:13 ex-std-node43 rhc-watchman[19981]: watchman caught #<NoMethodError: undefined method `>=' for nil:NilClass>: undefined method `>=' for nil:NilClass. Retries left: 0


Version-Release number of selected component (if applicable):
Current

How reproducible:
Quite a few of our nodes are seeing this issue. I'd say it is reproducible, but I'm not sure which value is nil when >= is called.
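
For context on the message itself: in Ruby, "nil:NilClass" means that >= was called on a value that was nil at that moment, typically a lookup that returned nothing. The following is a minimal sketch of that failure and of a nil-safe guard. The helper name `over_threshold?` and the sample values are hypothetical and are not taken from the watchman source.

```ruby
# Hypothetical helper, not watchman code: compare a utilization
# reading that may be missing against a threshold.
def over_threshold?(utilization, threshold)
  # Treat a missing (nil) reading as "unknown" instead of calling >= on nil.
  return false if utilization.nil?
  utilization >= threshold
end

# Reproduce the class of error seen in the log: nil does not respond to >=.
error = begin
  nil >= 90.0
rescue NoMethodError => e
  e
end

puts error.class                  # NoMethodError
puts over_threshold?(nil, 90.0)   # false
puts over_threshold?(95.0, 90.0)  # true
```

Whether an unknown reading should count as "not over the threshold" is a policy choice; the point of the sketch is only that the nil case is handled explicitly.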

Steps to Reproduce:
1.
2.
3.

Actual results:
Watchman throws an exception.

Expected results:

Watchman should be hardened and should not throw an exception for this specific issue.

Additional info:

We depend on watchman to manage resources and currently see that it fails often.  We have even written a restart handler for the service.  Let's harden it.
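
As an illustration of the "Retries left" behaviour visible in the log, the sketch below shows a loop that counts each caught exception against a fixed retry budget and stops once the budget is used up. It is an assumption about the general shape of such a loop, not the actual watchman implementation; `run_with_retries`, its parameters, and the log wording are hypothetical.

```ruby
# Hypothetical supervised loop: run each task and count failures
# against a retry budget, recording one log line per failure.
def run_with_retries(tasks, retries: 2, log: [])
  tasks.each do |task|
    begin
      task.call
    rescue StandardError => e
      retries -= 1
      log << "caught #{e.class}: #{e.message}. Retries left: #{retries}"
      # Stop only once the budget is exhausted; otherwise move on to the next task.
      break if retries <= 0
    end
  end
  retries
end

log = []
tasks = [
  -> { 1 + 1 },
  -> { raise ArgumentError, "bad reading" },
  -> { 2 + 2 }
]
left = run_with_retries(tasks, retries: 2, log: log)
puts left       # 1
puts log.first  # caught ArgumentError: bad reading. Retries left: 1
```

With a budget of 2, as in the log above, two bad readings are enough to stop the loop, which is why guarding the comparison itself matters more than raising the budget.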
Comment 1 Fotios Lindiakos 2013-08-20 17:59:33 EDT
Fix in this PR, undergoing review and testing.

https://github.com/openshift/origin-server/pull/3443
Comment 2 Meng Bo 2013-08-22 06:22:30 EDT
Aug 22 01:54:40 ip-10-196-51-239 rhc-watchman[1928]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 01:54:40 ip-10-196-51-239 rhc-watchman[1928]: Throttler: REFUSED restore => 342483957415281973788672 (unknown utilization)
Aug 22 01:55:00 ip-10-196-51-239 rhc-watchman[1928]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 01:55:00 ip-10-196-51-239 rhc-watchman[1928]: Throttler: REFUSED restore => 342483957415281973788672 (unknown utilization)
Aug 22 01:55:20 ip-10-196-51-239 rhc-watchman[1928]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 01:55:20 ip-10-196-51-239 rhc-watchman[1928]: Throttler: REFUSED restore => 342483957415281973788672 (unknown utilization)
Aug 22 01:55:40 ip-10-196-51-239 rhc-watchman[1928]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 01:55:40 ip-10-196-51-239 rhc-watchman[1928]: Throttler: REFUSED restore => 342483957415281973788672 (unknown utilization)
Aug 22 01:56:00 ip-10-196-51-239 rhc-watchman[1928]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 01:56:00 ip-10-196-51-239 rhc-watchman[1928]: Throttler: REFUSED restore => 342483957415281973788672 (unknown utilization)
Aug 22 01:56:20 ip-10-196-51-239 rhc-watchman[1928]: Running rhc-watchman => delay: 20s, exception threshold: 10
Aug 22 01:56:20 ip-10-196-51-239 rhc-watchman[1928]: watchman caught #<ArgumentError: comparison of String with Float failed>: comparison of String with Float failed. Retries left: 9
Aug 22 01:56:40 ip-10-196-51-239 rhc-watchman[1928]: Running rhc-watchman => delay: 20s, exception threshold: 9
Aug 22 01:56:40 ip-10-196-51-239 rhc-watchman[1928]: watchman caught #<ArgumentError: comparison of String with Float failed>: comparison of String with Float failed. Retries left: 8
Aug 22 01:57:00 ip-10-196-51-239 rhc-watchman[1928]: Running rhc-watchman => delay: 20s, exception threshold: 8
Aug 22 01:57:00 ip-10-196-51-239 rhc-watchman[1928]: watchman caught #<ArgumentError: comparison of String with Float failed>: comparison of String with Float failed. Retries left: 7
Aug 22 01:57:20 ip-10-196-51-239 rhc-watchman[1928]: Running rhc-watchman => delay: 20s, exception threshold: 7
Aug 22 01:57:20 ip-10-196-51-239 rhc-watchman[1928]: watchman caught #<ArgumentError: comparison of String with Float failed>: comparison of String with Float failed. Retries left: 6

I hit the above ArgumentError during my testing, and I am not sure how to reproduce it.
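
For reference, Ruby raises this ArgumentError when the two operands of a comparison cannot be ordered, for example when a value arrives as a String and is compared with a Float threshold. The sketch below reproduces that and shows one possible normalization step. `to_utilization` is a hypothetical name, and the assumption that the offending value was a non-numeric String like the one used here is not confirmed by the log. The exact message text differs between Ruby versions, so only the exception class is shown.

```ruby
# Hypothetical normalizer: convert a raw reading to Float,
# or return nil if the reading is missing or not numeric.
def to_utilization(raw)
  Float(raw)
rescue ArgumentError, TypeError
  nil
end

# Reproduce the class of error from the log: String#>= cannot order
# a String against a Float.
error = begin
  "unknown" >= 90.0
rescue ArgumentError => e
  e
end

puts error.class             # ArgumentError
p to_utilization("95.5")     # 95.5
p to_utilization("unknown")  # nil
p to_utilization(nil)        # nil
```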
Comment 3 Fotios Lindiakos 2013-08-22 13:36:23 EDT
Tried another fix: https://github.com/openshift/origin-server/pull/3470

I have not been able to reproduce this, but I added some additional logging information. This should no longer fail, but please check /var/log/messages for "Throttler: problem in find for ..." and attach the log if it's found. I will watch the logs from some Jenkins runs as well. Hopefully with this information I can find the root cause of the problem.
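
The approach described above (catch the failure, log enough context to diagnose it, and carry on) might look roughly like the following. This is a hedged illustration only: `compare_logged`, the gear names, and the log wording are hypothetical and do not reproduce the code in the pull request.

```ruby
# Hypothetical wrapper: attempt the comparison and, on failure, record
# the offending value and its class instead of letting the error propagate.
def compare_logged(uuid, value, threshold, log)
  value >= threshold
rescue NoMethodError, ArgumentError => e
  log << "comparison failed for #{uuid}: #{e.class}, value=#{value.inspect} (#{value.class})"
  nil
end

log = []
p compare_logged("gear-a", 97.2, 90.0, log)    # true
p compare_logged("gear-b", nil, 90.0, log)     # nil
p compare_logged("gear-c", "97.2", 90.0, log)  # nil
puts log
```

Logging the value's class alongside the value is what distinguishes the nil case from the String case after the fact, which is the information needed to trace where the bad reading came from.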
Comment 4 openshift-github-bot 2013-08-22 19:17:50 EDT
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/8c575e912dd486a5b04bfcd99b126ba6f9db1547
Merge pull request #3470 from fotioslindiakos/Bug998704

Merged by openshift-bot
Comment 5 Meng Bo 2013-08-23 05:46:59 EDT
Checked on devenv-stage_452; no such error appeared in /var/log/messages.

Moving bug to VERIFIED.
