Bug 522827

Summary: Hourly cron job causes high system load under Xen kernel (dom0) without XenStoreD running
Product: Red Hat Enterprise Linux 5
Reporter: Adam Stokes <astokes>
Component: mcelog
Assignee: Prarit Bhargava <prarit>
Status: CLOSED ERRATA
QA Contact: BaseOS QE <qe-baseos-auto>
Severity: medium
Priority: low
Version: 5.5
CC: brian.dudek, cward, emcnabb, jaswang, jscotka, ralph, riek, sdodson, tao
Target Milestone: rc
Keywords: OtherQA, Regression
Target Release: ---
Flags: prarit: needinfo? (astokes), cward: needinfo? (astokes)
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-03-30 08:26:47 UTC
Bug Blocks: 525703
Attachments: updated RHEL5 mcelog.cron

Description Adam Stokes 2009-09-11 17:08:45 UTC
Description of problem:
Hourly cron job causes high system load under Xen kernel (dom0) without XenStoreD running

Version-Release number of selected component (if applicable):
mcelog-0.9pre-1.27.el5


How reproducible:
100%

Steps to Reproduce:
1. Run the mcelog cron job on a Xen PV system.
  
Actual results:
System load increases and the mcelog process blocks.

Expected results:
No process block or load increase

Additional info:
I believe this should be a simple fix. The patch in rhbz#511126 is causing the regression.

In my opinion, we should test for domU as follows:

If /proc/xen/capabilities is readable, the exit status of `grep -q control_d /proc/xen/capabilities` tells us more reliably whether we are on a guest or not.

Please note that this is how we do it in sosreport. You can have a look at the xen plugin there for further info:

https://fedorahosted.org/sos/browser/trunk/src/lib/sos/plugins/xen.py

See the determineXenHost method.
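
As a rough illustration of the check described above, the following sketch classifies the running system from the capabilities file. The function name `xen_domain_type` and its optional path argument are hypothetical and exist only to make the example testable; this is not the sosreport code.

```shell
#!/bin/sh
# Hypothetical helper: report "none", "dom0", or "domU" based on the
# Xen capabilities file. The optional argument overrides the path
# (an assumption added so the sketch can be exercised without Xen).
xen_domain_type() {
    caps="${1:-/proc/xen/capabilities}"
    if [ ! -e "$caps" ]; then
        echo "none"     # file absent: not running under a Xen kernel
    elif grep -q control_d "$caps" 2>/dev/null; then
        echo "dom0"     # control_d present: control domain
    else
        echo "domU"     # file exists without control_d: guest
    fi
}
```

Calling `xen_domain_type` with no argument inspects /proc/xen/capabilities on the live system.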

Thanks

Comment 2 Prarit Bhargava 2009-09-14 12:40:48 UTC
Adam, are there some cases where the check implemented in 511126 will fail?  If so, what are they?

P.

Comment 4 Prarit Bhargava 2009-09-14 17:19:31 UTC
So, the code *should* be:

if [ -e /proc/xen/capabilities ]
     # xen
     grep control_d /proc/xen/capabilities >& /dev/null
     if [ $? -ne 0 ]
          # domU -- do not run on xen PV guest
          return 1; 
     fi
fi

Correct?

P.

Comment 5 Adam Stokes 2009-09-14 17:30:35 UTC
Looks right, but I don't have a Xen box to test on at the moment.

Thanks,
Adam

Comment 7 Issue Tracker 2009-09-14 19:10:01 UTC
Event posted on 09-14-2009 03:02pm EDT by Glen Johnson

------- Comment From hellerda@us.ibm.com 2009-09-14 14:58 EDT-------
Sorry gentlemen, I would offer to test immediately but I am out of town
this week.  I'll be back on Friday if no one gets to it before then.
Thanks - Dave H.

Ticket type changed from 'Problem' to ''

This event sent from IssueTracker by jkachuck 
 issue 339358

Comment 8 Prarit Bhargava 2009-09-15 12:20:10 UTC
I'm attaching a new mcelog.cron for the customer to test (so they don't have to manually edit the file).

P.

Comment 9 Prarit Bhargava 2009-09-15 12:22:25 UTC
Created attachment 361075 [details]
updated RHEL5 mcelog.cron 

Please have the customer test this mcelog.cron.

Once the testing results are positive, the package will be updated.

Thank you,

P.

Comment 10 Prarit Bhargava 2009-09-15 12:23:14 UTC
Please leave this BZ in NEEDINFO until customer testing results are reported.

Thanks,

P.

Comment 11 Issue Tracker 2009-09-21 18:06:37 UTC
Event posted on 09-21-2009 11:55am EDT by Glen Johnson

------- Comment From hellerda@us.ibm.com 2009-09-21 11:49 EDT-------
The above code works fine once patched as shown below.  It allows mcelog
to run under dom0 and does _not_ produce hung processes as the old code did.

-if [ $? -ne 0 ]
+if [ $? -ne 0 ]; then

I also tested the code under a PV domU guest and confirmed it does
_prevent_ mcelog from running there.

If this is the intended logic, I would say the patch works.
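
For reference, the guard logic from comment 4 can be sketched as a small function with the `then` keywords in place. The function name `mcelog_allowed` and its path argument are hypothetical, added only so the logic can be tried against a test file; the contents of the attached mcelog.cron are not reproduced here.

```shell
#!/bin/sh
# Sketch only: returns 0 when mcelog may run, 1 on a Xen PV guest.
# The optional argument overrides the capabilities path (an assumption
# for testing; a cron job would rely on the default).
mcelog_allowed() {
    caps="${1:-/proc/xen/capabilities}"
    if [ -e "$caps" ]; then
        # Xen kernel: dom0 lists control_d, a domU does not
        if ! grep -q control_d "$caps" 2>/dev/null; then
            # domU -- do not run on a Xen PV guest
            return 1
        fi
    fi
    return 0
}
```

A cron script could then begin with `mcelog_allowed || exit 0` before invoking mcelog.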

FWIW: I should point out again that "/dev/mcelog" exists in neither the
dom0 nor the domU environment, so I'm not clear on the logic of allowing
mcelog to run under dom0 but not domU.  True, mcelog does _not_ print
the additional error message seen in
https://bugzilla.redhat.com/show_bug.cgi?id=511126 when running under
dom0.  But it just fails silently on each invocation (due to the
presence of the --ignorenodev parameter).

If there is no useful info to collect under dom0, why not just prevent
mcelog from running under Xen, period?  Perhaps the behavior is different
in a non-PV environment?  (I only tested on a PV system).
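
One way to express the alternative raised above is to skip the run whenever the device node is missing, regardless of the Xen role. The sketch below is an illustration of that idea only, not the fix that shipped; the function name `has_mcelog_device` and its path argument are hypothetical.

```shell
#!/bin/sh
# Sketch only: succeed when the mcelog character device exists.
# The optional argument overrides the device path (an assumption
# added so the check can be tried on any machine).
has_mcelog_device() {
    dev="${1:-/dev/mcelog}"
    # -c is true only for an existing character device node
    [ -c "$dev" ]
}
```

A cron script could start with `has_mcelog_device || exit 0`, which would skip both dom0 and domU on the systems described here.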

Thanks - Dave H.


Comment 13 Chris Ward 2009-10-23 14:09:22 UTC
@IBM

We need to confirm that there is commitment to test 
for the resolution of this request during the RHEL 5.5 Test
Phase, if it is accepted into the release.

RHEL 5.5 Beta Test Phase is expected to begin around February
2010.

Please post a confirmation before Oct 30th, 2009, 
including the contact information for testing engineers.

Comment 14 Scott Dodson 2009-11-05 20:05:40 UTC
*** Bug 525386 has been marked as a duplicate of this bug. ***

Comment 16 Prarit Bhargava 2009-11-09 19:49:15 UTC
Fixed in 0.9pre-1.29.el5.

P.

Comment 18 Chris Ward 2010-02-11 10:25:40 UTC
~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~

RHEL 5.5 Beta has been released! There should be a fix present in this 
release that addresses your request. Please test and report back results 
here, by March 3rd 2010 (2010-03-03) or sooner.

Upon successful verification of this request, post your results and update 
the Verified field in Bugzilla with the appropriate value.

If you encounter any issues while testing, please describe them and set 
this bug into NEED_INFO. If you encounter new defects or have additional 
patch(es) to request for inclusion, please clone this bug per each request
and escalate through your support representative.

Comment 19 Chris Ward 2010-02-24 15:16:17 UTC
This issue should be fixed in the latest RHEL 5.5 Beta bits. 

Please test to confirm that the fix works as expected.

RHEL 5.5 test phase is ending very soon, so results by the end
of this week would be greatly appreciated.

Thanks!

Comment 21 Issue Tracker 2010-03-08 17:04:53 UTC
Event posted on 03-01-2010 08:22pm EST by Glen Johnson

------- Comment From hellerda@us.ibm.com 2010-03-01 20:12 EDT-------
Confirmed, the mcelog.cron patch works as described in comments 18 & 19
(comment #3352711 RIT). It allows mcelog to run under dom0 and does _not_
produce hung processes.

I did not test under a domU guest, but since the patch is identical to
that previously tested I assume the domU behavior would be the same as
described in comment 19.

Tested on RHEL 5.5 snap2

[root@ss2 ~]# uname -a
Linux ss2 2.6.18-189.el5xen #1 SMP Tue Feb 16 11:29:10 EST 2010 x86_64 x86_64 x86_64 GNU/Linux



Comment 22 errata-xmlrpc 2010-03-30 08:26:47 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0247.html