Bug 1006619 - Log events sources are unstable and events are lost due to Log4j log parsing not thread safe
Summary: Log events sources are unstable and events are lost due to Log4j log parsing ...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Operations Network
Classification: JBoss
Component: Monitoring - Events
Version: JON 3.1.2
Hardware: Unspecified
OS: Unspecified
high
urgent
Target Milestone: ER01
: JON 3.2.0
Assignee: John Sanda
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On: 846082
Blocks: 1015715
TreeView+ depends on / blocked
 
Reported: 2013-09-10 23:16 UTC by Larry O'Leary
Modified: 2018-12-03 19:54 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1015715 (view as bug list)
Environment:
Last Closed:
Type: Bug
Embargoed:


Attachments (Terms of Use)
All events from the log files are reported on the web. (183.31 KB, image/png)
2013-10-14 17:45 UTC, jvlasak
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 912547 0 unspecified CLOSED Events - server doesn't seem to recognize log lines in newly created log file 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 1009666 0 unspecified CLOSED Not all event sources are being polled by the agent 2021-02-22 00:41:40 UTC
Red Hat Knowledge Base (Solution) 446653 0 None None None Never

Internal Links: 912547 1009666

Description Larry O'Leary 2013-09-10 23:16:14 UTC
Description of problem:
Even though log messages are logged to a resource's log file and events had originally been reported for that source, from time to time events are simply missed and can results in loss of event history and alert failures. This requires log monitoring to be performed outside of JBoss ON to verify JBoss ON is working.

Version-Release number of selected component (if applicable):
4.4.0.JON312GA

It is not clear how or why this issue occurs. In the reported case, a log message was written to the log file and after several minutes, the log event was never triggered. This is not always the case. Sometimes the log event generates the expected result but not always.

The original working theory was related to disk caching and log file rolling. However, if appears that even after a log file has been written (verified by tailing the log file at the command-line) the agent just doesn't detect the log being updated.

Further research revealed that bug 846082 had been logged sometime ago that may explain this sporadic behavior.

Comment 1 John Sanda 2013-09-13 18:09:44 UTC
I created a test to reproduce the issues described in bug 846082. I reverted the changes in Log4JLogEntryProcessor, making the DateFormat fields static again. The test consistently fails. If you make them instance fields, the test consistently passes.

I did my work in the branch bug/1006619 which has been pushed to origin,

https://git.fedorahosted.org/cgit/rhq/rhq.git/log/?h=bug/1006619

For reference, here is what some of the exceptions look like:

java.lang.NumberFormatException: For input string: "E.423021313E4"
	at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1241)
	at java.lang.Double.parseDouble(Double.java:540)
	at java.text.DigitList.getDouble(DigitList.java:168)
	at java.text.DecimalFormat.parse(DecimalFormat.java:1321)
	at java.text.SimpleDateFormat.subParse(SimpleDateFormat.java:2088)
	at java.text.SimpleDateFormat.parse(SimpleDateFormat.java:1455)
	at java.text.DateFormat.parse(DateFormat.java:355)
	at org.rhq.core.pluginapi.event.log.Log4JLogEntryProcessor.parseDateString(Log4JLogEntryProcessor.java:117)
	at org.rhq.core.pluginapi.event.log.Log4JLogEntryProcessor.processPrimaryLine(Log4JLogEntryProcessor.java:94)
	at org.rhq.core.pluginapi.event.log.MultiLineLogEntryProcessor.processLine(MultiLineLogEntryProcessor.java:98)
	at org.rhq.core.pluginapi.event.log.MultiLineLogEntryProcessor.processLines(MultiLineLogEntryProcessor.java:70)
	at org.rhq.core.pluginapi.event.log.Log4JEventsTest$Producer.run(Log4JEventsTest.java:46)


Keep in mind that due to the lack of exception handling in 3.1.2, these errors would go completely unreported. The changes for bug 846082 adds exception handling that captures any RuntimeExceptions.

Comment 2 Heiko W. Rupp 2013-09-24 20:46:42 UTC
Setting ON_QA

[22:44:25] <loleary>	 Well, 6619 is actually fixed by 846082... 6619 can go to ON_QA as it was already fixed (jsanda can confirm) in ER01.

[22:45:58] <loleary>	 The only reason 9666 related to 6619 was because it is preventing one from actually testing whether 6619 is fixed or not.

Comment 3 jvlasak 2013-10-09 13:00:43 UTC
I found bug 1017214. What should be done with this bug? Thanks

Comment 4 Larry O'Leary 2013-10-09 13:28:20 UTC
I do not think bug 1017214 is related so I will remove from the dependency list.

Comment 5 jvlasak 2013-10-14 17:45:48 UTC
Created attachment 812135 [details]
All events from the log files are reported on the web.

Verified. 
Tested with 15 log files when messages were simultaneously generated into files using a script. After processes generating the messages were stopped. Then number of events (considered levels: INFO, WARN, ERROR, FATAL) on the web and in all the log files was same.
See the attached screenshot.


Note You need to log in before you can comment on or make changes to this bug.