Bug 1091134

Summary: Platform process service CPU Percentage metric returns values that are inconsistent with actual CPU load measurements
Product: [JBoss] JBoss Operations Network
Reporter: Larry O'Leary <loleary>
Component: Plugin -- Other
Assignee: Thomas Segismont <tsegismo>
Status: CLOSED CURRENTRELEASE
QA Contact: Jeeva Kandasamy <jkandasa>
Severity: high
Docs Contact:
Priority: unspecified
Version: JON 3.2
CC: jkandasa, lzoubek, mfoley, myarboro, tsegismo
Target Milestone: ER01
Keywords: Triaged
Target Release: JON 3.2.3
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Clones: 1127876
Last Closed: 2014-09-05 15:40:38 UTC
Type: Bug
Bug Depends On: 1109439, 1127875    
Bug Blocks: 1127876    
Attachments:
 - Screenshot showing 6699.8% CPU usage (flags: none)
 - Screenshot showing 17051.8% CPU usage less than 2 hours later (flags: none)
 - Byteman rule and helper jar (flags: none)
 - Byteman trace output showing single resource over 1 day (flags: none)
 - CPU-percentage greater than 200% with two cpu (flags: none)

Description Larry O'Leary 2014-04-25 01:20:35 UTC
Created attachment 889497 [details]
Screenshot showing 6699.8% CPU usage

Description of problem:
After importing a platform process service resource into inventory, metric values reported for its *CPU Percentage* measurement are invalid.

In the reported case, the maximum CPU Percentage value was 6699.8%, as seen in the metric table on the monitoring page, while the average was 352.5%. These values were observed over an 8-hour period that also included a minimum of 0.4% and a live value of 1.3%.

Not even two hours later, the same metric for the same resource reported a value of 17051.8%. This is for a Java process and the process service's plug-in configuration property *Full Process Tree* is set to *Yes*.

The target machine only has 8 CPU cores.
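
The core count gives a simple plausibility bound for the reported values. The following is a minimal sketch, assuming the metric uses the convention that 100% means one fully used core (if the metric were instead normalized to the whole machine, the bound would be 100%). The function names are hypothetical and are not part of JON.

```python
def max_cpu_percent(cores: int) -> float:
    """Upper bound for a process's CPU percentage.

    Assumption: 100% corresponds to one fully used core, so a process
    cannot exceed 100% times the number of cores.
    """
    return 100.0 * cores


def is_plausible(cpu_percent: float, cores: int) -> bool:
    """Return True if the value lies within [0, cores * 100]."""
    return 0.0 <= cpu_percent <= max_cpu_percent(cores)


# Values quoted in this report, checked against the 8-core target machine.
for value in (6699.8, 17051.8, 352.5, 1.3, 0.4):
    verdict = "plausible" if is_plausible(value, cores=8) else "impossible"
    print(f"{value:>9.1f}% -> {verdict}")
```

Under this assumption the ceiling is 800%, so the 6699.8% and 17051.8% readings cannot be real, while the 352.5% average is within range only because it averages in the invalid peaks.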



Version-Release number of selected component (if applicable):
3.2.0.GA

How reproducible:
Every few minutes

Additional info:
It is not clear at this time how this issue occurs. It is only clear that the reported values are impossible given the machine's CPU count and the process's actual load. CPU Percentage appears to be a metric returned by the native libraries and seems to relate to the actual CPU the process is running on. However, even that is not clear.

Comment 1 Larry O'Leary 2014-04-25 01:22:23 UTC
Created attachment 889498 [details]
Screenshot showing 17051.8% CPU usage less than 2 hours later

Comment 2 Thomas Segismont 2014-05-21 13:19:21 UTC
Created attachment 897969 [details]
Byteman rule and helper jar

Comment 5 Larry O'Leary 2014-05-29 19:43:57 UTC
Created attachment 900502 [details]
Byteman trace output showing single resource over 1 day

Byteman trace log excerpt showing a process identified as domain5server1 over a 24-hour period. The invalid, high CPU % was reported during this period:

 - At 2014-05-23 22:57:01.926-0500 the CPU percent was returned as 7730.6% -- cpuPercent=77.30594216184681;
   + This continued every minute until 2014-05-23 23:08:03.007-0500, at which time its value was cpuPercent=0.010536456138238304;
 - At 2014-05-23 23:08:03.005-0500 the process ID changed from pid=[23236] to pid=[32765];

The important takeaway here is that the invalid value seems to coincide with what appears to be a restart of the process. Perhaps the process with ID 23236 was already gone during the period in which the invalid values were returned?
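
The relationship between the traced values and the displayed percentages, and the point at which the PID changes, can be sketched as follows. This is only an illustration: the `Sample` type and function names are hypothetical, the real trace format is not reproduced here, and the pairing of each reading with a PID is an assumption based on the timestamps above. It also assumes the raw `cpuPercent` value is a fraction in which 1.0 means 100%, which matches the 77.30... to 7730.6% pairing quoted in the trace.

```python
from typing import Iterator, List, NamedTuple, Tuple


class Sample(NamedTuple):
    timestamp: str
    pid: int
    cpu_fraction: float  # raw cpuPercent value; assumed 1.0 == 100%


def as_percent(cpu_fraction: float) -> float:
    """Convert the raw fraction to the percentage shown in the UI."""
    return cpu_fraction * 100.0


def pid_changes(samples: List[Sample]) -> Iterator[Tuple[Sample, Sample]]:
    """Yield consecutive sample pairs whose PID differs."""
    for previous, current in zip(samples, samples[1:]):
        if previous.pid != current.pid:
            yield previous, current


# The two readings quoted above; the PID assignment is an assumption.
samples = [
    Sample("2014-05-23 22:57:01.926-0500", 23236, 77.30594216184681),
    Sample("2014-05-23 23:08:03.007-0500", 32765, 0.010536456138238304),
]

for previous, current in pid_changes(samples):
    print(
        f"pid {previous.pid} -> {current.pid}: "
        f"{as_percent(previous.cpu_fraction):.1f}% -> "
        f"{as_percent(current.cpu_fraction):.1f}%"
    )
```

Scanning a full trace this way would show whether every run of invalid values ends at a PID change, which is the correlation suggested by this excerpt.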

Comment 13 Larry O'Leary 2014-06-13 23:21:22 UTC
The fix for bug 1109439 also resolves this bug.

Comment 14 Libor Zoubek 2014-08-08 11:00:45 UTC
Setting to MODIFIED, as the fix for this BZ was implemented via Bug 1100609, which is already in the 3.3 branch.

Comment 15 Simeon Pinder 2014-08-15 03:19:10 UTC
Moving to ON_QA as this is available for test in JON 3.2.3 ER01 build:

http://jon01.mw.lab.eng.bos.redhat.com:8042/dist/release/jon/3.2.3.GA/8-14-14/

Comment 16 Thomas Segismont 2014-08-19 10:19:25 UTC
(In reply to Libor Zoubek from comment #14)
> Setting to MODIFIED, as fix of this BZ was implemented via Bug 1100609 -
> which is already in 3.3 branch

Wrong reference:

The fix has been applied via Bug 1109439 in 3.2.x branch.

See https://bugzilla.redhat.com/show_bug.cgi?id=1109439#c2

Comment 17 Jeeva Kandasamy 2014-08-22 11:57:57 UTC
Created attachment 929562 [details]
CPU-percentage greater than 200% with two cpu

I imported a Java process, and on the first check the reported usage was 206.5%, although the machine has only two CPUs. Hence I'm reopening this issue.

Steps I followed:
1. Imported a process resource that was down.
2. After the import succeeded, started the (Java) process so that it was up and running.
3. The very first CPU usage value was 206.5%, although the machine has only 2 CPUs.
4. I kept it running for an hour; only the very first value was wrong.
5. I restarted the Java process service; again, the very first CPU usage value after the restart was wrong (570.2%).
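
Because only the first reading after each start or restart was wrong in these steps, one possible collection-side guard is to discard the first sample seen for each new PID. The sketch below is a hypothetical mitigation written for illustration, not the fix that was applied for this bug. The 206.5% and 570.2% readings come from the steps above; the PIDs and the remaining readings are made up.

```python
from typing import Iterable, List, Optional, Tuple

Reading = Tuple[int, float]  # (pid, cpu_percent)


def drop_first_sample_per_pid(readings: Iterable[Reading]) -> List[Reading]:
    """Skip the first reading observed after each PID change.

    Assumption: the first value for a newly started process is unreliable,
    as observed in the reproduction steps.
    """
    kept: List[Reading] = []
    last_pid: Optional[int] = None
    for pid, cpu_percent in readings:
        if pid != last_pid:
            # New process instance: remember it and discard this reading.
            last_pid = pid
            continue
        kept.append((pid, cpu_percent))
    return kept


readings = [
    (4100, 206.5),  # first value after start (from step 3)
    (4100, 3.1),    # hypothetical steady-state values
    (4100, 2.8),
    (5200, 570.2),  # first value after restart (from step 5)
    (5200, 3.0),
]

print(drop_first_sample_per_pid(readings))
```

The trade-off is that one collection interval per restart is lost, which may be acceptable when the alternative is an impossible value skewing the maximum and average.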

Version: 
JBoss Operations Network
Version : 3.2.0.GA Update 03
Build Number : bca1bc8:e19c43d
GWT Version : 2.5.0
SmartGWT Version : 3.0p


A screenshot is attached.

Comment 22 Jeeva Kandasamy 2014-08-27 13:54:43 UTC
To track this edge case I opened another BZ https://bugzilla.redhat.com/show_bug.cgi?id=1134437