999183 – [watchman] ArgumentError when parsing date

Bug 999183 - [watchman] ArgumentError when parsing date

Summary: [watchman] ArgumentError when parsing date

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	OpenShift Online
Classification:	Red Hat
Component:	Containers
Sub Component:
Version:	2.x
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Michal Fojtik
QA Contact:	libra bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2013-08-20 21:44 UTC by Kenny Woodson
Modified:	2015-05-14 23:26 UTC (History)
CC List:	2 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2013-09-19 16:47:32 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Description Kenny Woodson 2013-08-20 21:44:30 UTC

Description of problem:

Lately the importance of watchman has increased due to its handling of throttling.  Ops has added monitoring for this service and therefore we have seen, in one week, 750 restarts for the watchman service.

This is what we see in the logs:
Aug 20 16:14:40 ex-std-node148 rhc-watchman[31800]: Running rhc-watchman => delay: 60s, exception threshold: 5
Aug 20 16:14:41 ex-std-node148 rhc-watchman[31800]: watchman caught #<ArgumentError: invalid date>: invalid date. Retries left: 4

It appears there is an invalid date that is being parsed over and over which causes watchman to fail after 10 exceptions.

Version-Release number of selected component (if applicable):
Current

How reproducible:
Very.

Steps to Reproduce:
1. Create a date that is unreadable for rhc-watchman
2. View the exception/retry loop in /var/log/messages
3.

Actual results:
rhc-watchman dies after 10 exceptions are caught.

Expected results:
rhc-watchman should attempt to parse the incorrect date.  If it fails after a number of attempts it should correct the date.  If it is unable to do so, print an exception to syslog and continue processing.



Additional info:

This is happening on a high number of our servers.  This needs to be repaired/fixed/patched.

Comment 1 Kenny Woodson 2013-08-21 14:04:39 UTC

Here is the specific line that is killing the parser:
---------
2013/02013/05/13 12:24:45,313 ERROR [stderr] (Thread-0 (HornetQ-scheduled-threads-33041257)) Exception in thread "Thread-0 (HornetQ-scheduled-threads-33041257)" java.lang.OutOfMemoryError: unable to create new native thread
---------

This happens on the strptime call:
------------
File.open(server_log).grep(/ java.lang.OutOfMemoryError/) {|msg|
      Syslog.info("MSG => #{msg}") #DEBUG
      # timezones are just a PITA. server.log message doesn't include timezone or date so inject both from today
      ts = DateTime.strptime(msg, '%Y/%m/%d %T')
      timestamp = DateTime.civil(ts.year, ts.month, ts.day, ts.hour, ts.min, ts.sec, epoch.zone)
      next if epoch > timestamp.

      cache_incident(uuid, timestamp, app_env)
}
-------------

Comment 2 Kenny Woodson 2013-08-21 14:27:22 UTC

The file that is causing the problem:

/var/lib/openshift/518809f74382ec8dfb000015/jbossas//logs/server.log

Comment 3 Michal Fojtik 2013-08-21 14:39:49 UTC

I have fix which make watchman ignore the invalid timestamps with a warning dumped to syslog:

https://github.com/openshift/li/pull/1843

Comment 4 Kenny Woodson 2013-08-30 15:03:47 UTC

Aug 30 10:11:44 ex-std-node74 rhc-watchman[15274]: watchman caught #<ArgumentError: invalid date>: invalid date. Retries left: 0

We are still seeing this issue in production on a few of our nodes.  I added some debugging inside of watchman to see where it was failing.  I see its failing to parse the java server.log

Aug 30 10:51:54 ex-std-node148 rhc-watchman[25453]: PROBLEM PARSING: /var/lib/openshift/518809f74382ec8dfb000015/jbossas//logs/server.log 

Here is the actual line it is failing on.

2013/02013/05/13 12:24:45,313 ERROR [stderr] (Thread-0 (HornetQ-scheduled-threads-33041257)) Exception in thread "Thread-0 (HornetQ-scheduled-threads-33041257)" java.lang.OutOfMemoryError: unable to create new native thread

./rhc-watchman:156:in `strptime'

Comment 5 openshift-github-bot 2013-08-31 01:42:14 UTC

Commit pushed to master at https://github.com/openshift/li

https://github.com/openshift/li/commit/bac7a1dfb4af4765815aadaf51c9a134c8ab3575
Bug 999183 - Make rhc-watchman ignore messages with invalid timestamp

Comment 6 Meng Bo 2013-09-05 10:14:54 UTC

Checked on devenv_3745,

With about 50 jbossas apps on node.
Modify the timestamp for the server.log to some invalid value. Like the comment#4.
Restart the gear.

Watch the /var/log/messages
There is no date related error, and the rhc-watchman did not restart.

Move bug to verified.

Note You need to log in before you can comment on or make changes to this bug.