Bug 614874

Summary: mcelogd service does not honour already running service
Product: Red Hat Enterprise Linux 6 Reporter: Jan Tluka <jtluka>
Component: mcelog Assignee: Prarit Bhargava <prarit>
Status: CLOSED ERRATA QA Contact: Jan Tluka <jtluka>
Severity: medium Docs Contact:
Priority: low    
Version: 6.0 CC: dkovalsk, emcnabb, peterm, snagar, syeghiay
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: mcelog-1.0pre3_20101112-0.3.el6 Doc Type: Bug Fix
Doc Text:
The mcelog service did not check whether another instance of mcelog was running, which could result in multiple mcelog service instances on a single system. This could result in lost or over-reported Machine Check Exceptions. mcelog now detects whether another instance is already running, preventing multiple instances from being launched on a single system simultaneously.
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-05-19 11:51:42 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
mcelog lockfile patch none

Description Jan Tluka 2010-07-15 12:56:52 UTC
Description of problem:

I'm doing an LSB compliance review of the mcelog initscript and found an issue. According to the LSB SysVInit specification, starting an already started service should return exit code 0. Instead, exit code 1 is returned and mcelogd tries to start again. This might be the result of mcelogd not honouring an already running instance.

I have inspected this a bit further. It seems there is no /var/run/mcelogd.pid file. Therefore initscripts does not detect that mcelogd is already running and attempts to start another instance, which fails.

Version-Release number of selected component (if applicable):
RHEL6.0-20100707.4
mcelog-1.0pre3-0.2.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
0. service mcelogd stop; service mcelogd status
1. service mcelogd start; service mcelogd status
2. service mcelogd start
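
The steps above can be wrapped in a small check, sketched here for illustration only. check_second_start is a hypothetical helper name, not part of mcelog or the Beaker test; it takes the start command as its arguments so it can be pointed at any service.

```shell
#!/bin/sh
# Sketch only: check_second_start is a hypothetical helper, not part of
# mcelog or the Beaker test mentioned in this report.
# It runs the given start command twice and reports the second exit status,
# which the LSB init script rules expect to be 0.
check_second_start() {
    # First start: the service must come up for the check to be meaningful.
    # Returns 2 without printing anything if that first start fails.
    "$@" >/dev/null 2>&1 || return 2

    # Second start: capture the exit status without aborting.
    rc=0
    "$@" >/dev/null 2>&1 || rc=$?

    if [ "$rc" -eq 0 ]; then
        echo "PASS: second start returned 0"
    else
        echo "FAIL: second start returned $rc"
    fi
    return "$rc"
}
```

On a test machine this would be called as `check_second_start service mcelogd start`, after stopping the service first (step 0).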

  
Actual results:

# service mcelogd start
Starting mcelog daemon
# service mcelogd status
Checking for mcelog
mcelog (pid 6270) is running...
# service mcelogd start
Starting mcelog daemon
# echo $?
1
# service mcelogd status
Checking for mcelog
mcelog (pid 6270) is running...
# tail /var/log/messages | grep mcelog
Jul 15 08:43:28 intel-s3ea2-04 mcelog: mcelog server already running

# ls -l /var/run/mcelog*
srwxr-xr-x. 1 root root 0 Jul 15 08:43 /var/run/mcelog-client
# find /var/run/ -iname mcelog*
/var/run/mcelog-client


Expected results:
mcelog correctly detects that it's already running and does not start. 

Additional info:
The issue has been found while running test in Beaker:
test name: /CoreOS/mcelog/sanity/lsb-compliance
full log: https://beaker.engineering.redhat.com/recipes/13279#task160088

Comment 1 RHEL Program Management 2010-07-15 14:24:24 UTC
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release. It has
been denied for the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 3 Jan Tluka 2010-07-19 20:02:26 UTC
> Description of problem:
> 
> I'm doing LSB compliance  review for mcelog initscript and I found an issue.
> According to the LSB SysVInit specification starting an already started service
> should return 0 exit code. Exit code 1 is returned instead and mcelogd tries to
> start again. This might be the result of mcelogd not honouring already running
> instance.
> 
> I have inspected a bit further. Seems that there's no /var/run/mcelogd.pid
> file. Therefore initscripts does not detect the mcelogd is already running and
> attempts to start another instance which fails.
> 

Update:

I've talked with Yulia Kopkova about the importance of the pid file. She said that it is more important to have the /var/lock/subsys/mcelog file, and that not having it is considered a bug.

Another thing is that the actual cause of the initial bug report is not the absence of the pid file (or the mcelog subsys lock file). The problem is that the mcelogd script should check the status of the service (e.g. with a line containing 'status mcelogd') and, depending on the result, either start the service (if not running) or return 0 (if running). Currently the script does not check this, and any attempt to start the service while it is already running returns 1.
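
A minimal sketch of the behaviour described here, for illustration only. It is not the shipped /etc/init.d/mcelogd script: the is_running and start_daemon helpers are hypothetical names, and the lock file check merely stands in for a real status check such as the 'status' helper from /etc/init.d/functions.

```shell
#!/bin/sh
# Sketch only, not the shipped /etc/init.d/mcelogd script.
# Assumptions: the lock file path below, and the is_running/start_daemon
# helpers, which are hypothetical names used for illustration.
LOCKFILE="${LOCKFILE:-/var/lock/subsys/mcelogd}"

# Stand-in for a real status check (e.g. the 'status' helper from
# /etc/init.d/functions); here it only looks at the subsys lock file.
is_running() {
    [ -e "$LOCKFILE" ]
}

# Stand-in for launching the daemon.
start_daemon() {
    mcelog --daemon
}

start() {
    # Starting an already running service is a success under LSB rules.
    if is_running; then
        return 0
    fi
    start_daemon || return 1
    # Record the running service so later start calls return early.
    touch "$LOCKFILE"
}
```

With this shape, a second start returns 0 without launching another instance. A lock file alone can go stale if the daemon dies, which is why a real script would more likely ask for the process status.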

Comment 6 Prarit Bhargava 2010-11-15 18:33:34 UTC
Created attachment 460602 [details]
mcelog lockfile patch

Committed to RHEL6 mcelog.

P.

Comment 11 Prarit Bhargava 2010-12-15 14:43:24 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: The mcelogd service does not check to see if other instances are already running.  This may cause multiple mcelogd service instances on a single system.
Consequence: This may cause MCE events to be lost or over-reported.
Fix: Modify the mcelogd script to check for an existing instance of mcelogd before starting
Result: The mcelog service checks to see if other instances are already running.  This prevents multiple instances of mcelogd from running on a single system.

Comment 14 Jan Tluka 2011-02-21 14:04:24 UTC
Hi, the recent version of mcelog contains a typo in the init script.

# rpm -qa mcelog
mcelog-1.0pre3_20101112-0.4.el6.x86_64

# grep subsys /etc/init.d/mcelogd 
LOCKFILE="var/lock/subsys/mcelogd"

The path is ambiguous because the leading slash is missing. I've checked other initscripts (iptables, rsyslog) and they contain an absolute path (with a leading slash). Although running service mcelogd start/stop works OK even with the typo, I think this should be fixed.
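
To illustrate the ambiguity: a path without a leading slash is resolved against whatever directory the script happens to run from. The sketch below uses a hypothetical helper, is_absolute_path, to tell the two forms apart. One plausible reason the typo went unnoticed is that the service wrapper changes to / before running the script, but that is an assumption, not something confirmed in this report.

```shell
#!/bin/sh
# Sketch only: is_absolute_path is a hypothetical helper name used for
# illustration; it is not part of the mcelogd init script.
is_absolute_path() {
    case "$1" in
        /*) return 0 ;;   # starts with a slash: absolute
        *)  return 1 ;;   # anything else is resolved against $PWD
    esac
}

# Compare the value from the init script with the corrected one.
for candidate in "var/lock/subsys/mcelogd" "/var/lock/subsys/mcelogd"; do
    if is_absolute_path "$candidate"; then
        echo "absolute: $candidate"
    else
        echo "relative to $PWD: $candidate"
    fi
done
```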

Comment 15 Prarit Bhargava 2011-02-21 14:05:42 UTC
(In reply to comment #14)
> Hi, the recent version of mcelog contains a typo in init script. 
> 
> # rpm -qa mcelog
> mcelog-1.0pre3_20101112-0.4.el6.x86_64
> 
> # grep subsys /etc/init.d/mcelogd 
> LOCKFILE="var/lock/subsys/mcelogd"
> 
> The path is ambiguous because of missing slash at the beginning. I've checked
> other initscripts (iptables, rsyslog) and they contain absolute path (with
> slash). However running service mcelogd start/stop works ok even with the typo
> I think this should be fixed.

Will fix ASAP.

P.

Comment 18 Jan Tluka 2011-02-23 13:45:15 UTC
# rpm -qa mcelog
mcelog-1.0pre3_20101112-0.5.el6.x86_64

# tail -f /var/log/messages &
# service mcelogd start
Starting mcelog daemon
Feb 23 14:36:14 dhcp-26-223 mcelog: failed to prefill DIMM database from DMI data                [  OK  ]
# echo $?
0
# ls -l /var/lock/subsys/mcelogd 
-rw-r--r-- 1 root root 0 Feb 23 14:41 /var/lock/subsys/mcelogd
# service mcelogd start
# echo $?
0
# service mcelogd stop
Stopping mcelog                        [  OK  ]
# ls -l /var/lock/subsys/mcelogd 
ls: cannot access /var/lock/subsys/mcelogd: No such file or directory

Setting to verified.

Comment 19 Laura Bailey 2011-05-05 03:17:43 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,4 +1 @@
-Cause: The mcelogd service does not check to see if other instances are already running.  This may cause multiple mcelogd service instances on a single system.
+The mcelog service did not check whether another instance of mcelog was running, which could result in multiple mcelog service instances on a single system. This could result in lost or over-reported Machine Check Exceptions. mcelog now detects whether another instance is already running, preventing multiple instances from being launched on a single system simultaneously.
-Consequence: This may cause MCE events to be lost or over-reported.
-Fix: Modify the mcelogd script to check for an existing instance of mcelogd before starting
-Result: The mcelog service checks to see if other instances are already running.  This prevents multiple instances of mcelogd from running on a single system.

Comment 20 errata-xmlrpc 2011-05-19 11:51:42 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0519.html