Bug 1009666 - Not all event sources are being polled by the agent
Summary: Not all event sources are being polled by the agent
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Operations Network
Classification: JBoss
Component: Monitoring - Events
Version: JON 3.1.2,JON 3.2
Hardware: Unspecified
OS: Unspecified
unspecified
urgent
Target Milestone: ER02
: JON 3.2.0
Assignee: John Sanda
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks: 1009699
TreeView+ depends on / blocked
 
Reported: 2013-09-18 20:02 UTC by Larry O'Leary
Modified: 2018-12-03 19:59 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1009699 (view as bug list)
Environment:
Last Closed: 2014-01-02 20:35:13 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
Displayed events from different sources. (220.76 KB, image/png)
2013-10-02 09:43 UTC, jvlasak
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1006619 0 high CLOSED Log events sources are unstable and events are lost due to Log4j log parsing not thread safe 2021-02-22 00:41:40 UTC
Red Hat Knowledge Base (Solution) 488173 0 None None None Never

Internal Links: 1006619

Description Larry O'Leary 2013-09-18 20:02:04 UTC
Description of problem:
If multiple event sources are configured on an agent, not all of them are polled. It is not clear what actually causes this issue but it happens every time. 

Version-Release number of selected component (if applicable):
4.4.0.JON312GA
4.7.0.JON [adad71f]

How reproducible:
Always

Steps to Reproduce:
1.  Start JBoss ON 3.2 system.
2.  Import RHQ Agent and RHQ Server into inventory.
3.  Create /tmp/generateLog using the script below.
4.  Then run /tmp/generateLog to generate our random log data
        
        chmod +x /tmp/generateLog
        for i in $(seq -w 0 10); do /tmp/generateLog /tmp/file-${i}.log 2>/dev/null & done

5.  Modify RHQ Agent resource configuration Event Report Max Per Source and Event Report Max Total to 1000 and 10000.
6.  Add 11 event logging sources to RHQ Server resource using generated log files /tmp/file-00.log, /tmp/file-01.log, /tmp/file-02.log, ..., /tmp/file-10.log
7.  Restart agent with DEBUG logging enabled.

File: /tmp/generateLog
------------------------------------------------------------------------
#!/bin/sh
#

maxSleepTimePerSleep=2
maxEntriesPerSleep=5
maxEntries=1000000
maxSleepTime=3600





logLevels=("FATAL"
"ERROR"
"WARN "
"INFO "
"DEBUG"
"TRACE")

logThreads=(main
$(echo Thread-{0..99})
$(echo Fake-{0..9}))

logCategories=(apache.cassandra.db.commitlog.CommitLogReplayer
org.jboss.as.server.deployment 
org.jboss.as.messaging 
org.some.other.category)

logMessages=("A log message"
"Another log message"
"A log message with some meaning"
"A log message without any meaning"
"To be a log message or not to be"
"To infinity and beyond"
"Every rose has its throne"
"Come out and pay"
"That thing wasn't supposed to do that."
"Now that's what I'm talking about"'!'
"Friends don't let friends eat grapefruit")

function printRandomLogLevel() {
    numLogLevels=${#logLevels[*]}
    echo -n ${logLevels[$((RANDOM%numLogLevels))]}
}

function printRandomLogThread() {
    numLogThreads=${#logThreads[*]}
    echo -n ${logThreads[$((RANDOM%numLogThreads))]}
}

function printRandomLogCategory() {
    numLogCategories=${#logCategories[*]}
    echo -n ${logCategories[$((RANDOM%numLogCategories))]}
}

function printRandomLogMessage() {
    numLogMessages=${#logMessages[*]}
    echo -n ${logMessages[$((RANDOM%numLogMessages))]}
}


function printLogTimestamp() {
    timestamp=$(date +"%F %T|%N")
    ms="$(printf "%03d" $(( 10#${timestamp#*|}/1000000 )))"
    echo -n "${timestamp%|*},${ms}"
}

function printRandomLogEntry() {
    echo "$(printLogTimestamp) $(printRandomLogLevel) [$(printRandomLogThread)] ($(printRandomLogCategory)) $(printRandomLogMessage)"
}

function printRandomLogEntries() {
    totalSleepTime=0
    totalEntries=0

    while [ true ]; do
        randomSleep=$((RANDOM%${maxSleepTimePerSleep}))
        [ $((${maxSleepTime}-(${totalSleepTime}+${randomSleep}))) -lt 0 ] && randomSleep=$((${randomSleep} +(${maxSleepTime}-(${totalSleepTime}+${randomSleep}))))
        echo >&2 Sleeping for ${randomSleep} seconds...
        sleep ${randomSleep}s
        totalSleepTime=$((totalSleepTime+$randomSleep))
        randomNumEntries=$((RANDOM%${maxEntriesPerSleep}))
        [ $((${maxEntries}-(${totalEntries}+${randomNumEntries}))) -lt 0 ] && randomNumEntries=$((${randomNumEntries} +(${maxEntries}-(${totalEntries}+${randomNumEntries}))))
        echo >&2 Printing ${randomNumEntries} log entries...
        for ((i=0; i<randomNumEntries; i++)); do
            [ $# -gt 0 ] && {
                echo $(printRandomLogEntry) >>"$1"
            } || {
                echo $(printRandomLogEntry)
            }
        done
        totalEntries=$((totalEntries+${randomNumEntries}))
        [ ${totalEntries} -ge ${maxEntries} -o ${totalSleepTime} -ge ${maxSleepTime} ] && break
    done
    echo >&2 Total entries is ${totalEntries} and total sleep time was ${totalSleepTime} seconds.
}

printRandomLogEntries $@
------------------------------------------------------------------------

Actual results:
Agent log (after restart and debug is enabled) will only contain log messages for a single log file source instead of all 11:

2013-09-18 14:54:59,669 DEBUG [EventManager.poller-1] (core.pluginapi.event.log.LogFileEventPoller)- /tmp/file-10.log: 24405 new bytes
2013-09-18 14:55:59,668 DEBUG [EventManager.poller-3] (core.pluginapi.event.log.LogFileEventPoller)- /tmp/file-10.log: 20126 new bytes
2013-09-18 14:56:59,668 DEBUG [EventManager.poller-1] (core.pluginapi.event.log.LogFileEventPoller)- /tmp/file-10.log: 23275 new bytes
2013-09-18 14:57:59,669 DEBUG [EventManager.poller-1] (core.pluginapi.event.log.LogFileEventPoller)- /tmp/file-10.log: 25691 new bytes


Expected results:
Agent log (after restart and debug is enabled) will contain log messages indicated events are being collected from all 11 files.

Additional Information:
If I disable an event source in the JON UI and then re-enable it, events will be collected for that source. This is true for all the sources. But, as soon as the agent is restarted this problem comes back.

Comment 1 Larry O'Leary 2013-09-18 21:53:49 UTC
jsanda did some digging into this and this appears to be the result of a Map being used instead of a List when the callback listeners are registered on agent startup:

org.rhq.core.pc.PluginContainer.addInitializationListener is invoked by  org.rhq.core.pc.event.EventContextImpl.registerEventPollerInternal using the EventContextImpl class name as the listener name. Because a Map is used in the underlying implementation only one event poller can be registered.

Originally, org.rhq.core.pc.PluginContainer.initListeners was of type List but in commit https://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?id=93c0e5f42426e4955b07cc66be8feab7e0e39c73 this was changed to Map.

This is a pretty nasty bug that essentially means that only one log source can be monitored per agent once the agent is restarted.

Comment 2 John Sanda 2013-09-19 00:42:06 UTC
I have pushed the fix to the bug/1009666 branch. As soon as I get the go ahead, I will merge this into master.

https://git.fedorahosted.org/cgit/rhq/rhq.git/log/?h=bug/1009666

Comment 3 John Sanda 2013-09-19 14:00:39 UTC
Changes have been merged to master.

master commit hash: 3843704e61a4

Comment 4 jvlasak 2013-10-02 09:42:57 UTC
Bug verified.
For all 11 log sources were displayed events (see screenshot - contains only a few different sources). Events were created the same way also after restarting the agent.

Comment 5 jvlasak 2013-10-02 09:43:55 UTC
Created attachment 806345 [details]
Displayed events from different sources.


Note You need to log in before you can comment on or make changes to this bug.