Bug 912547 - Events - server doesn't seem to recognize log lines in newly created log file
Summary: Events - server doesn't seem to recognize log lines in newly created log file
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: JBoss Operations Network
Classification: JBoss
Component: Monitoring - Events
Version: JON 3.1.2
Hardware: All
OS: All
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: JON 3.2.0
Assignee: RHQ Project Maintainer
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On: RHQ-2069
Blocks:
 
Reported: 2013-02-19 00:06 UTC by Larry O'Leary
Modified: 2018-12-01 17:21 UTC
CC List: 1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: RHQ-2069
Environment:
Last Closed: 2013-02-25 20:38:09 UTC
Type: Bug
Embargoed:




Links
Red Hat Bugzilla 1006619 (high, CLOSED): Log events sources are unstable and events are lost due to Log4j log parsing not thread safe. Last updated: 2021-02-22 00:41:40 UTC.
Red Hat Knowledge Base (Solution) 315413. Last updated: never.

Internal Links: 1006619

Description Larry O'Leary 2013-02-19 00:06:06 UTC
This issue appears to be very long-standing and essentially results in the loss of event data.

I am not sure whether the creation of the log file itself is to blame, but I do know that this issue occurs any time we are monitoring a source that rolls its log files.

For example, if monitoring an EAP log4j source that uses a DailyRollingFileAppender that rolls at midnight, log messages will be missed from the beginning of the new log file and potentially from the end of the old one. This is because we do not pick up the file descriptor change right away, and by the time we do, the rolled log has already been renamed to a name that no longer matches our configured one. Furthermore, we miss the messages written at the beginning of the new log file if we do not detect the inode change until after those messages have already been written.
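The rename-and-recreate behavior described above can be demonstrated with a minimal shell sketch (file names and the dated suffix are illustrative): rolling swaps the inode behind the configured name, so a watcher keyed on the name alone only notices on its next check, and anything written to either file in the meantime falls into the gap.

```shell
#!/bin/sh
# Minimal demonstration (illustrative names) that a roll replaces the
# inode behind the configured log name.
dir=$(mktemp -d)
log="$dir/server.log"

echo "last line of old log" > "$log"
ino_before=$(ls -i "$log" | awk '{print $1}')

# DailyRollingFileAppender-style roll: the old file is renamed and a
# fresh file is created under the original, configured name.
mv "$log" "$log.2009-05-08"
echo "first line of new log" > "$log"
ino_after=$(ls -i "$log" | awk '{print $1}')

# A watcher keyed on the configured name alone only sees this change on
# its next poll; lines appended to either file in the meantime are lost.
[ "$ino_before" != "$ino_after" ] && echo "inode changed: roll detected"
```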

The impact of this issue is greatly reduced by using a rolling appender that rolls as infrequently as possible. The problem is magnified with shorter time-based intervals (for example, hourly) or with size limits when a lot of logging is happening.

1.  Configure EAP 5.1's logger to roll every 100KB.

    Edit <JBOSS_HOME>/server/all/conf/jboss-log4j.xml and replace the existing FILE appender definition with the following size based appender:

           <!-- A size based file rolling appender -->
           <appender name="FILE" class="org.jboss.logging.appender.RollingFileAppender">
              <errorHandler class="org.jboss.logging.util.OnlyOnceErrorHandler"/>
              <param name="File" value="${jboss.server.log.dir}/server.log"/>
              <param name="Append" value="true"/>
              <param name="MaxFileSize" value="100KB"/>
              <param name="MaxBackupIndex" value="100000"/>

              <layout class="org.apache.log4j.PatternLayout">
                 <!-- The default pattern: Date Priority [Category] (Thread) Message\n -->
                 <param name="ConversionPattern" value="%d %-5p [%c] (%t) %m%n"/>
              </layout>
           </appender>

2.  Increase JBoss ON agent's rhq.agent.plugins.event-report.max-per-source and rhq.agent.plugins.event-report.max-total to 10000 and 50000 respectively.

        <entry key="rhq.agent.plugins.event-report.max-per-source" value="10000"/>
        <entry key="rhq.agent.plugins.event-report.max-total" value="50000"/>

3.  Start EAP 5.1 server with logging threshold set to DEBUG.

        ./run.sh -c all -b 0.0.0.0 -Djboss.server.log.threshold=DEBUG &
        
4.  Start ON system
5.  Import EAP server into ON inventory
6.  Enable the event source for EAP server
7.  Generate debug logging by accessing various resources from EAP's admin-console WAR - http://localhost:8080/admin-console

After running for a few minutes and generating 100 or so log files, you can verify that the captured events do not match the events logged. Many of them are missing.
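One way to check the mismatch is to count the records actually written across the current and rolled logs and compare that with the event count the ON UI reports. A sketch; the helper name is made up here, and the log directory path is an assumption based on the 'all' configuration used in step 1:

```shell
#!/bin/sh
# count_logged_records LOGDIR: count log4j records across server.log and
# all of its rolled copies, for comparison with the captured event count.
count_logged_records() {
    cat "$1"/server.log* 2>/dev/null |
        grep -c -E ' (TRACE|DEBUG|INFO|WARN|ERROR|FATAL) '
}

# Example, assuming the EAP 'all' configuration from step 1:
# count_logged_records "$JBOSS_HOME/server/all/log"
```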


+++ This bug was initially created as a clone of upstream Bug #535365 +++

How to repeat:

Create a new event source on the RHQ server.  'touch' the file to create it.  Wait 2 minutes.  Now echo lines like the following into the file (change the datestamps if necessary):

2009-05-08 15:57:41,170 INFO [my.class1] AutomationMsg First info level message
2009-05-08 15:57:41,420 DEBUG [my.class2] AutomationMsg First debug level message
2009-05-08 15:57:41,671 WARN [my.class3] AutomationMsg First warn level message
2009-05-08 15:57:41,921 ERROR [my.class4] AutomationMsg First error level message
2009-05-08 15:57:42,171 FATAL [my.class5] AutomationMsg First fatal level message
2009-05-08 15:57:42,421 INFO [your.class1] AutomationMsg Second info level message
2009-05-08 15:57:42,672 DEBUG [your.class2] AutomationMsg Second debug level message
2009-05-08 15:57:42,922 WARN [your.class3] AutomationMsg Second warn level message
2009-05-08 15:57:43,172 ERROR [your.class4] AutomationMsg Second error level message
2009-05-08 15:57:43,422 FATAL [your.class5] AutomationMsg Second fatal level message
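Since the datestamps need adjusting, the ten records above can be generated with current timestamps in one step. A sketch; /tmp/event-test.log is a placeholder for whatever file the event source is configured against:

```shell
#!/bin/sh
# Append ten sample log4j records with current timestamps.
# /tmp/event-test.log is a placeholder for the configured log path.
log=/tmp/event-test.log
ts=$(date '+%Y-%m-%d %H:%M:%S')
for lvl in INFO DEBUG WARN ERROR FATAL; do
    echo "$ts,170 $lvl [my.class] AutomationMsg First $lvl message" >> "$log"
done
for lvl in INFO DEBUG WARN ERROR FATAL; do
    echo "$ts,420 $lvl [your.class] AutomationMsg Second $lvl message" >> "$log"
done
```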

Wait 2 minutes.  Now check the Events tab on the RHQ server to see if the events are there.  They do not show up.  Now wait 2 minutes, then run 'touch' again on the log file to update the modification date.  Wait 2 minutes.  Refresh the events page.  Still no events.

Now run "tail -10 [logfile] >> [logfile]", which appends another 10 lines to the end of the file matching what is already there (since the file is 10 lines long).  Wait 2 minutes.  Refresh.  Now there are 8 events.  The 2 missing are the DEBUG-level events, even though DEBUG is checked in the filter.

'touch' the file again.  Wait 2 minutes, refresh.  Still no new entries.

--- Additional comment from Jeff Weiss on 2009-05-08 16:40:41 EDT ---

I now realize the lack of DEBUG messages is expected, I was creating the source with the default level of INFO.  However the other problem remains unexplained.

--- Additional comment from Red Hat Bugzilla on 2009-11-10 15:57:08 EST ---

This bug was previously known as http://jira.rhq-project.org/browse/RHQ-2069

Comment 1 Stefan Negrea 2013-02-25 14:54:38 UTC
The current implementation is very generic in the way it monitors file changes, so the user can use any logging framework or internally developed code. Because almost every framework has its own way of rotating logs, it is impossible to cover all rotation scenarios. The current code is designed to capture as many log events as possible by detecting a file change in several ways.

For this reason, it is impossible to read log entries missed from the file that got rotated out, because there is no way to know where, or what, the rotated-out file is. The same applies to log entries missed from the beginning of new files due to disk caching and the timing of inode-change detection. Here, not only can the rotation mechanism differ between frameworks, but so can the way logs are flushed and written to disk.
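To make the loss concrete, here is a minimal sketch of a generic name-keyed poller of the kind described (this is NOT the RHQ implementation; the function name and state-file scheme are made up). It rewinds to offset 0 when the inode changes or the file shrinks, so it does eventually pick up the new file, but bytes appended to the rotated-out file after the previous poll are unreachable by construction: the poller only ever opens the configured name.

```shell
#!/bin/sh
# Sketch of a generic name-keyed log poller (NOT the RHQ implementation).
# poll_once LOG STATE: print bytes appended since the previous call,
# restarting from offset 0 when the inode changes or the file shrinks.
poll_once() {
    log=$1 state=$2
    offset=0 ino=""
    [ -f "$state" ] && read -r offset ino < "$state"
    new_ino=$(ls -i "$log" | awk '{print $1}')
    size=$(wc -c < "$log")
    if [ "$new_ino" != "$ino" ] || [ "$size" -lt "$offset" ]; then
        offset=0    # rotation (or truncation) detected: start over
    fi
    tail -c +"$((offset + 1))" "$log"
    echo "$size $new_ino" > "$state"
}
```

Anything written to the renamed file between the last poll and the rotation is never printed, which mirrors the data loss this bug describes.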

This KB explains the situation in detail and provides several solutions:
https://access.redhat.com/knowledge/solutions/315413

A very complicated solution would be to rewrite this entire module in a pluggable fashion, where the user could choose a provided implementation (for well-established logging frameworks) or write their own (for internally developed logging solutions). The limitation is that the logging framework itself must be pluggable enough to allow redirection of logs to RHQ. So even with this complex implementation, the default/fallback would remain the current implementation whenever the logging framework is not pluggable.


There is no way to fix the current generic mechanism in a way that is independent of the logging framework.

