Bug 681006

Summary: Abnormally high cpu usage in taskomatic after upgrade to Satellite 5.4
Product: Red Hat Satellite 5
Reporter: Karl Abbott <kabbott>
Component: Other
Assignee: Jan Pazdziora <jpazdziora>
Status: CLOSED ERRATA
QA Contact: Pavel Novotny <pnovotny>
Severity: high
Docs Contact:
Priority: urgent
Version: 540
CC: cperry, mmello, pnovotny, xdmoon
Target Milestone: ---
Keywords: Regression
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: quartz-1.8.4-1.el5sat
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-03-21 15:25:44 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
Bug Depends On:    
Bug Blocks: 646488    

Description Karl Abbott 2011-02-28 18:46:13 UTC
Description of problem:

Since upgrading to Satellite 5.4, the customer has repeatedly seen very high CPU usage from taskomatic, most notably from its Java process.

CPU usage has not come down in the 3 weeks since the upgrade to Satellite 5.4.

Version-Release number of selected component (if applicable):

Satellite 5.4 -- latest packages.

Problem packages:

quartz-1.8.1
quartz-oracle-1.8.1

How reproducible:

100% at the customer's site. Three to four other customers have this or a similar problem. It is not yet certain whether theirs is the same problem; I am working with fellow TAMs to find out.

Steps to Reproduce:
1. Upgrade to Satellite 5.4.
2. Run with quartz and quartz-oracle at version 1.8.1.
  
Actual results:

An unexplained CPU spike that does not subside.

Expected results:

Normal CPU usage across the board.

Additional info:

In this case, we started by gathering thread dumps from the taskomatic Java process and found that CPU time was dominated by the Quartz Scheduler, around lines 280-300 of its code.
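The report does not say how the thread dumps were matched against CPU time. One common approach from outside the JVM is to find the busiest thread with `top -H -p <pid>` and look up its id, converted to hex, in the `nid=` field of a `jstack` dump. The sketch below illustrates the same idea in-process using the standard `java.lang.management.ThreadMXBean` API: it samples per-thread CPU time twice and reports the thread that consumed the most in between. The class and method names are hypothetical; this is not part of taskomatic or Quartz.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical diagnostic helper: finds the thread using the most CPU over an interval. */
public final class HotThreadSampler {
    private HotThreadSampler() {}

    /** Returns the name of the hottest thread, or null if CPU time is unavailable. */
    public static String hottestThread(long intervalMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) return null;
        if (!mx.isThreadCpuTimeEnabled()) mx.setThreadCpuTimeEnabled(true);

        // First sample: CPU nanoseconds per live thread (-1 if the thread has died).
        Map<Long, Long> before = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            before.put(id, mx.getThreadCpuTime(id));
        }
        try {
            Thread.sleep(intervalMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return null;
        }

        // Second sample: keep the thread with the largest CPU-time delta.
        String hottest = null;
        long maxDelta = -1;
        for (long id : mx.getAllThreadIds()) {
            Long start = before.get(id);
            long end = mx.getThreadCpuTime(id);
            if (start == null || start < 0 || end < 0) continue;
            long delta = end - start;
            if (delta > maxDelta) {
                ThreadInfo info = mx.getThreadInfo(id);
                if (info != null) {
                    maxDelta = delta;
                    hottest = info.getThreadName();
                }
            }
        }
        return hottest;
    }

    public static void main(String[] args) {
        // Simulate a runaway loop like the one seen in the scheduler thread.
        Thread spinner = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) { /* busy-wait */ }
        }, "busy-spinner");
        spinner.setDaemon(true);
        spinner.start();
        System.out.println("hottest thread: " + hottestThread(500));
        spinner.interrupt();
    }
}
```

Because this only sees threads of the JVM it runs in, it is an illustration of the technique rather than a tool for inspecting an already-running taskomatic process; for that, the `top -H` plus `jstack` route above applies.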

While researching the Quartz Scheduler online, I found this upstream bug:

https://jira.terracotta.org/jira/browse/QTZ-50

The regression described in that bug causes 100% CPU utilization in Quartz and occurs at line 287, which is exactly the part of the Quartz Scheduler that is misbehaving on the customer's system.

The bug was introduced in 1.8.1 and fixed in 1.8.3. Since the latest Quartz release is 1.8.4, I built unsupported 1.8.4 RPMs of quartz and quartz-oracle from the spec file found in Brew. Those packages can be found at:

http://people.redhat.com/kabbott/sat-quartz
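The affected range stated above (introduced in 1.8.1, fixed in 1.8.3) can be expressed as a simple version check, for example when auditing which installed quartz packages need updating. The sketch below is a hypothetical helper, not part of Satellite or Quartz. It assumes plain numeric `major.minor.patch` upstream versions, optionally followed by an RPM release suffix such as `-3.el5sat`, and it relies only on the version range given in this report.

```java
/**
 * Hypothetical helper: decides whether a Quartz version falls in the range
 * reported as affected by QTZ-50 (introduced in 1.8.1, fixed in 1.8.3).
 */
public final class QuartzVersionCheck {
    private static final int[] FIRST_AFFECTED = {1, 8, 1};
    private static final int[] FIRST_FIXED = {1, 8, 3};

    private QuartzVersionCheck() {}

    /** Parses "1.8.1" or "1.8.1-3.el5sat" into {1, 8, 1}; the RPM release suffix is ignored. */
    static int[] parse(String version) {
        String upstream = version.split("-", 2)[0];
        String[] parts = upstream.split("\\.");
        int[] out = new int[3]; // missing components default to 0
        for (int i = 0; i < out.length && i < parts.length; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    /** Compares {major, minor, patch} triples component by component. */
    static int compare(int[] a, int[] b) {
        for (int i = 0; i < 3; i++) {
            if (a[i] != b[i]) return Integer.compare(a[i], b[i]);
        }
        return 0;
    }

    /** True if 1.8.1 <= version < 1.8.3. */
    public static boolean isAffected(String version) {
        int[] v = parse(version);
        return compare(v, FIRST_AFFECTED) >= 0 && compare(v, FIRST_FIXED) < 0;
    }

    public static void main(String[] args) {
        String[] versions = {"1.8.0", "1.8.1-3.el5sat", "1.8.3", "1.8.4-1.el5sat"};
        for (String v : versions) {
            System.out.println(v + " affected=" + isAffected(v));
        }
    }
}
```

Running `main` reports only `1.8.1-3.el5sat` as affected; both the upstream fix release (1.8.3) and the rebased package (1.8.4-1.el5sat) fall outside the range.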

My customer agreed to test these packages, understanding that they were unsupported test builds. After upgrading to quartz-1.8.4, the abnormally high CPU usage went away immediately.

Please rebase quartz to 1.8.4 as soon as possible so that I can move my customer to a supported configuration without the abnormally high CPU usage.

Comment 2 Jan Pazdziora 2011-03-04 11:05:46 UTC
Rebased to quartz-1.8.4, tagged and built.

Comment 4 Xixi 2011-03-08 21:45:10 UTC
This bug may also affect rhn-search, since it uses the same Quartz scheduler as taskomatic.

Comment 6 Pavel Novotny 2011-03-14 18:22:45 UTC
Verified.

Old package(s) (quartz-1.8.1-3.el5sat):
  Observed regular 100% CPU usage peaks caused by taskomatic Java processes. 

New package(s) (quartz-1.8.4-1.el5sat):
  The problem is gone; no unusually high CPU usage observed.

Comment 7 errata-xmlrpc 2011-03-21 15:25:44 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0367.html