Bug 1017214 - Higher number of events is reported than it is in the log files.
Summary: Higher number of events is reported than it is in the log files.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Operations Network
Classification: JBoss
Component: Monitoring - Events
Version: JON 3.2
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: ER04
: JON 3.3.0
Assignee: RHQ Project Maintainer
QA Contact: Sunil Kondkar
URL:
Whiteboard:
Depends On: 1019677
Blocks:
TreeView+ depends on / blocked
 
Reported: 2013-10-09 12:52 UTC by jvlasak
Modified: 2014-12-11 14:00 UTC (History)
8 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2014-12-11 14:00:35 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
Screenshot showing bigger number of events on the web than in the log files. It was gained byt the same way as it is in the description. (264.56 KB, image/png)
2013-10-09 12:52 UTC, jvlasak
no flags Details

Description jvlasak 2013-10-09 12:52:31 UTC
Created attachment 809896 [details]
Screenshot showing bigger number of events on the web than in the log files. It was gained byt the same way as it is in the description.

Description of problem:
On the web is reported more events from the source event log files than they really contain.

Version-Release number of selected component (if applicable):
Version: 3.2.0.ER2
Build Number: 9bf6f76:371eac0

How reproducible:
always

Steps to Reproduce:
1. Start JBoss ON 3.2 system.
2. Import RHQ Agent and RHQ Server into inventory.
3. Modify RHQ Agent resource configuration Event Report Max Per Source and Event Report Max Total to 1000 and 10000.
4. Add 11 event logging sources to RHQ Server resource using generated log files /tmp/file-00.log, /tmp/file-01.log, /tmp/file-02.log, ..., /tmp/file-10.log
5. Restart agent with DEBUG logging enabled.
6. Create /tmp/generateLog using the script below.
7. Then run /tmp/generateLog to generate our random log data:
    
    a) chmod +x /tmp/generateLog
    b) for i in $(seq -w 0 10); do /tmp/generateLog /tmp/file-${i}.log 2>/dev/null & done

8. Then click on Inventory, choose Servers - Top Level Imports under Resources. Then click on EAP, choose its tab Events and have this page opened.
9. Go to /tmp folder of your machine and after some time can be found out that number of events on the web is bigger than in the log files.
For checking it use in /tmp folder command:

egrep -c "(DEBUG|INFO|WARN|ERROR|FATAL)" file-* | cut -d ":" -f2 | awk '{ sum += $1 } END { print sum }'

and immediately after that click on Refresh button on displayed web page (from step 8). This can be seen in the attached screenshot.


File: /tmp/generateLog
------------------------------------------------------------------------
#!/bin/sh
#

maxSleepTimePerSleep=2
maxEntriesPerSleep=5
maxEntries=1000000
maxSleepTime=3600





logLevels=("FATAL"
"ERROR"
"WARN "
"INFO "
"DEBUG"
"TRACE")

logThreads=(main
$(echo Thread-{0..99})
$(echo Fake-{0..9}))

logCategories=(apache.cassandra.db.commitlog.CommitLogReplayer
org.jboss.as.server.deployment 
org.jboss.as.messaging 
org.some.other.category)

logMessages=("A log message"
"Another log message"
"A log message with some meaning"
"A log message without any meaning"
"To be a log message or not to be"
"To infinity and beyond"
"Every rose has its throne"
"Come out and pay"
"That thing wasn't supposed to do that."
"Now that's what I'm talking about"'!'
"Friends don't let friends eat grapefruit")

function printRandomLogLevel() {
    numLogLevels=${#logLevels[*]}
    echo -n ${logLevels[$((RANDOM%numLogLevels))]}
}

function printRandomLogThread() {
    numLogThreads=${#logThreads[*]}
    echo -n ${logThreads[$((RANDOM%numLogThreads))]}
}

function printRandomLogCategory() {
    numLogCategories=${#logCategories[*]}
    echo -n ${logCategories[$((RANDOM%numLogCategories))]}
}

function printRandomLogMessage() {
    numLogMessages=${#logMessages[*]}
    echo -n ${logMessages[$((RANDOM%numLogMessages))]}
}


function printLogTimestamp() {
    timestamp=$(date +"%F %T|%N")
    ms="$(printf "%03d" $(( 10#${timestamp#*|}/1000000 )))"
    echo -n "${timestamp%|*},${ms}"
}

function printRandomLogEntry() {
    echo "$(printLogTimestamp) $(printRandomLogLevel) [$(printRandomLogThread)] ($(printRandomLogCategory)) $(printRandomLogMessage)"
}

function printRandomLogEntries() {
    totalSleepTime=0
    totalEntries=0

    while [ true ]; do
        randomSleep=$((RANDOM%${maxSleepTimePerSleep}))
        [ $((${maxSleepTime}-(${totalSleepTime}+${randomSleep}))) -lt 0 ] && randomSleep=$((${randomSleep} +(${maxSleepTime}-(${totalSleepTime}+${randomSleep}))))
        echo >&2 Sleeping for ${randomSleep} seconds...
        sleep ${randomSleep}s
        totalSleepTime=$((totalSleepTime+$randomSleep))
        randomNumEntries=$((RANDOM%${maxEntriesPerSleep}))
        [ $((${maxEntries}-(${totalEntries}+${randomNumEntries}))) -lt 0 ] && randomNumEntries=$((${randomNumEntries} +(${maxEntries}-(${totalEntries}+${randomNumEntries}))))
        echo >&2 Printing ${randomNumEntries} log entries...
        for ((i=0; i<randomNumEntries; i++)); do
            [ $# -gt 0 ] && {
                echo $(printRandomLogEntry) >>"$1"
            } || {
                echo $(printRandomLogEntry)
            }
        done
        totalEntries=$((totalEntries+${randomNumEntries}))
        [ ${totalEntries} -ge ${maxEntries} -o ${totalSleepTime} -ge ${maxSleepTime} ] && break
    done
    echo >&2 Total entries is ${totalEntries} and total sleep time was ${totalSleepTime} seconds.
}

printRandomLogEntries $@
------------------------------------------------------------------------

Actual results:
From certain time there is more events on the web than in the log files.

Expected results:
Number of events on the web should be always <= than in the log files.

Additional info:
For imagine the number of events on the web in my case was 157 370 and in all the the log files 131 074. This was found out when all the processes generating events into the log files finished by themselves.

Comment 1 Larry O'Leary 2013-10-09 13:37:23 UTC
Not sure why these would be different unless the same messages are being picked up multiple times.

I do know that my script uses a log level of TRACE and the UI may have trouble with retrieving/rendering these messages due to TRACE not being a defined type in the JON event UI. However, I would expect the event page to list less events in that case.

If duplicates are being read in there is a good chance that this issue has existed for some time. There doesn't seem to be any code changes in how the log files are being parsed.

We are nearing 3.2 code freeze and the payload is too large at the moment. As this is does not appear to be a regression, I will target it for post 3.2 triage.

Comment 6 Jay Shaughnessy 2014-09-04 18:19:23 UTC
I think we pick up TRACE messages and classify them as DEBUG.  My guess is that the number of DEBUG Events = Sum(DEBUG + TRACE) in the log files.

Can you run this test again and eliminate TRACE messages from the generated files?

Comment 8 Mike Foley 2014-09-25 19:18:53 UTC
based ib cinnebt #6 ...setting to ON_QA so QE will have this on their radar screen

Comment 9 Sunil Kondkar 2014-10-17 15:01:46 UTC
Verified on JON 3.3 ER04 

Eliminating the TRACE messages from generated log files and running the test again shows that number of events in the 'Events' tab of resource (44228) are less than number of events in the log files(44944)


Note You need to log in before you can comment on or make changes to this bug.