615530 – Getting /etc/cron.hourly/mcelog.cron: read: No such device messages after reboot

Keywords:
Status:	CLOSED DUPLICATE of bug 595930
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	14
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	628243
Depends On:
Blocks:

Reported:	2010-07-16 22:37 UTC by Orion Poplawski
Modified:	2011-08-25 20:06 UTC
CC List:	19 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2010-11-10 04:15:55 UTC
Type:	---
Embargoed:
Dependent Products:

Description Orion Poplawski 2010-07-16 22:37:44 UTC

Description of problem:

Getting:

/etc/cron.hourly/mcelog.cron:

read: No such device

messages from my kvm vm.

Version-Release number of selected component (if applicable):
mcelog-0.9pre1-0.1.fc13.x86_64

Comment 1 Jerry James 2010-07-22 17:13:39 UTC

Me too, but if I run /etc/cron.hourly/mcelog.cron as root, it works as expected.  Could it be some kind of permission problem?

Comment 2 Orion Poplawski 2010-07-22 17:29:50 UTC

I also seem to get it only once, possibly just after boot.

Comment 3 Jerry James 2010-07-27 19:39:23 UTC

If I run it from the command line just after booting, and before the hourly cron has a chance to run, I get this:

[root@localhost ~]# /usr/sbin/mcelog --ignorenodev --filter
read: No such device
[root@localhost ~]# /usr/sbin/mcelog --ignorenodev --filter
[root@localhost ~]# 

So the error message is only generated the first time.

Comment 4 Jerry James 2010-07-27 19:55:57 UTC

I straced /usr/sbin/mcelog just after booting a VM.  The relevant part is this:

open("/dev/mcelog", O_RDONLY)              = 4
ioctl(4, MCE_GET_RECORD_LEN or MTRRIOC_SET_ENTRY, 0x7fff38693bcc) = 0
ioctl(4, MCE_GET_LOG_LEN or MTRRIOC_DEL_ENTRY, 0x7fff38693bc8) = 0
read(4, 0x26511d0, 2816)                = -1 ENODEV (No such device)
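
The trace above shows open() and both ioctls succeeding, with only read() failing. A minimal Python sketch for checking which call fails on a given machine might look like the following. The name `probe_mcelog` and its injectable `opener`/`reader`/`closer` parameters are hypothetical, introduced only for illustration. The 2816-byte buffer size is copied from the trace and is an assumption, since mcelog appears to derive it from the two ioctls.

```python
import errno
import os


def probe_mcelog(path="/dev/mcelog", bufsize=2816,
                 opener=os.open, reader=os.read, closer=os.close):
    """Report which step fails when reading the MCE log device.

    Returns ("open", ERRNAME) or ("read", ERRNAME) on failure,
    or ("ok", number_of_bytes_read) on success.
    """
    try:
        fd = opener(path, os.O_RDONLY)
    except OSError as exc:
        # Failure at open(): e.g. ENOENT or ENODEV when there is no device.
        return ("open", errno.errorcode.get(exc.errno, str(exc.errno)))
    try:
        data = reader(fd, bufsize)
        return ("ok", len(data))
    except OSError as exc:
        # Failure at read(): the case shown in the strace output.
        return ("read", errno.errorcode.get(exc.errno, str(exc.errno)))
    finally:
        closer(fd)
```

Called as `probe_mcelog()` by root just after boot, an affected system would be expected to report a failure at the read step rather than at open, matching the trace.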

Comment 5 Bug Zapper 2010-07-30 12:38:05 UTC

This bug appears to have been reported against 'rawhide' during the Fedora 14 development cycle.
Changing version to '14'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 6 Jon Masters 2010-08-09 23:04:57 UTC

Thanks.

Comment 7 Jeremy Sanders 2010-10-01 09:49:57 UTC

*** Bug 628243 has been marked as a duplicate of this bug. ***

Comment 8 Orion Poplawski 2010-10-14 15:27:46 UTC

Looks like a kernel issue.  On first access, reading /dev/mcelog returns ENODEV.

1281  open("/dev/mcelog", O_RDONLY)     = 4
1281  ioctl(4, MCE_GET_RECORD_LEN or MTRRIOC_SET_ENTRY, 0x7fffb141787c) = 0
1281  ioctl(4, MCE_GET_LOG_LEN or MTRRIOC_DEL_ENTRY, 0x7fffb1417878) = 0
1281  read(4, 0xf17190, 2816)           = -1 ENODEV (No such device)

Comment 9 Jerry James 2010-10-14 15:40:29 UTC

Isn't that what the --ignorenodev argument to mcelog is supposed to handle?

Comment 10 Orion Poplawski 2010-10-14 15:48:31 UTC

Only on open(), not read().
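
The distinction in this comment can be modelled in a few lines of Python. This is only a rough model of the behaviour described here, not mcelog's actual source. `run_once`, `open_dev`, and `read_dev` are hypothetical names, and the two callables stand in for the real open() and read() calls.

```python
import errno


def run_once(open_dev, read_dev, ignore_nodev):
    """Model of a flag that tolerates ENODEV at open() but not at read()."""
    try:
        fd = open_dev()
    except OSError as exc:
        if ignore_nodev and exc.errno == errno.ENODEV:
            # No device at all: exit quietly, as the flag intends.
            return None
        raise
    # An ENODEV raised here is NOT covered by the flag and reaches the caller,
    # which is how the "read: No such device" message can still appear.
    return read_dev(fd)
```

Under this model, a machine whose device opens successfully but fails on the first read still produces the error even with the flag set.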

Comment 11 Jon Masters 2010-11-10 04:15:55 UTC


*** This bug has been marked as a duplicate of bug 595930 ***

Comment 12 Jon Masters 2010-11-10 04:17:45 UTC

A previous build of mcelog 1.0pre2 was available but had not been pushed. I cleaned up this package this evening in rawhide, F14, and F13, and updated it to follow the official packaging guidelines. Once you see the 1.0pre3 build land in your updates, please let me know if you have any further problems. The update contains a modified cron script that, as mentioned in #9, does use --ignorenodev.

Links to the updated packages are in the bug referenced in comment 11.

Jon.

Comment 13 Jon Masters 2010-11-10 09:05:13 UTC

OK. There are two bugs here. Some systems don't have an MCE device and some do. In either case you might have a /dev/mcelog device, but the first read from it after booting will generally fail. As a horrible hack, I changed the version of mcelog that I just built (which also reworks it to use systemd and run as a daemon) so that it tries twice, delaying between opens, until Andi fixes whatever is broken in the kernel. I will ping him shortly, as I have quite a list of things I had to fix in mcelog to make it work as a daemon; that mode had obviously never been tested in the upstream version.
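
The "try twice, delaying between opens" idea can be sketched generically in Python. This is an illustration of the retry pattern only, not the patch that went into the package. `read_with_retry`, `read_once`, and the default one-second delay are assumptions made for the example.

```python
import errno
import time


def read_with_retry(read_once, attempts=2, delay=1.0, sleep=time.sleep):
    """Call read_once(), retrying after a pause if it fails with ENODEV.

    read_once is expected to open the device, read it, and return the data.
    Any error other than ENODEV, or ENODEV on the last attempt, is re-raised.
    """
    for attempt in range(attempts):
        try:
            return read_once()
        except OSError as exc:
            last_attempt = attempt == attempts - 1
            if exc.errno != errno.ENODEV or last_attempt:
                raise
            # First read after boot failed: wait, then reopen and try again.
            sleep(delay)
```

Restricting the retry to ENODEV keeps genuine failures, such as permission errors, visible instead of hiding them behind the workaround.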

Comment 14 Jon Masters 2010-11-10 09:59:03 UTC

Actually, it's easier (for now only) to have the systemd service start the mcelog process twice. I'm going to ping Andi and find out WTF is wrong upstream.

Comment 15 Göran Uddeborg 2010-11-13 19:10:56 UTC

> Ok. There are two bugs here.

And one of them is obviously not fixed.  I still get this error message on first run after a reboot.

So we have three bugzillas (at least) and two bugs.  Currently two of the bugzillas are closed as duplicates, and the third has a fix in testing.  But it only fixes one of the problems.

The remaining problem is a kernel bug, if I understand the comments in these bugzillas correctly.  Does it make sense to reopen this one for the remaining issue?

Comment 16 Bill Gianopoulos 2010-11-13 19:24:08 UTC

I would just wait until the existing open bug is closed, and if that does not fix your issue, please file a new bug.  Until then, this issue can be avoided by adding:

/usr/sbin/mcelog --ignorenodev --filter >> /var/log/mcelog 2>/dev/null

to your rc.local script.

Comment 17 Jon Masters 2010-11-13 20:39:00 UTC

Yeah. The once-per-boot issue is an upstream kernel bug. I have changed the rawhide package version to always start the process (now a daemon) twice with some special hack logic, but I need Andi Kleen to fix it upstream. For F14, you'll have to live with the first cron job after boot failing until we get the kernel fixed.

Comment 18 Göran Uddeborg 2010-11-13 21:06:21 UTC

> The once per boot issue is an upstream kernel bug.

Is there any bugzilla tracking that issue?  I didn't find anything obvious (except for this one) in either the RH bugzilla or the kernel bugzilla.  But maybe I'm not searching for the right terms?

Comment 19 Jon Masters 2010-11-18 10:28:05 UTC

The kernel bugzilla is about to get an mcelog userspace component (because I requested it), and the kernel-side bug will be reported now that I've given Andi a chance to comment privately.