Description of problem:
Getting "/etc/cron.hourly/mcelog.cron: read: No such device" messages from my KVM VM.

Version-Release number of selected component (if applicable):
mcelog-0.9pre1-0.1.fc13.x86_64
Me too, but if I run /etc/cron.hourly/mcelog.cron as root, it works as expected. Could it be some kind of permission problem?
I also only seem to get it once, maybe after boot.
If I run it from the command line just after booting, and before the hourly cron has a chance to run, I get this:

[root@localhost ~]# /usr/sbin/mcelog --ignorenodev --filter
read: No such device
[root@localhost ~]# /usr/sbin/mcelog --ignorenodev --filter
[root@localhost ~]#

So the error message is only generated the first time.
I straced /usr/sbin/mcelog just after booting a VM. The relevant part is this:

open("/dev/mcelog", O_RDONLY) = 4
ioctl(4, MCE_GET_RECORD_LEN or MTRRIOC_SET_ENTRY, 0x7fff38693bcc) = 0
ioctl(4, MCE_GET_LOG_LEN or MTRRIOC_DEL_ENTRY, 0x7fff38693bc8) = 0
read(4, 0x26511d0, 2816) = -1 ENODEV (No such device)
This bug appears to have been reported against 'rawhide' during the Fedora 14 development cycle. Changing version to '14'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Thanks.
*** Bug 628243 has been marked as a duplicate of this bug. ***
Looks like a kernel issue. On first access, reading /dev/mcelog returns ENODEV.

1281 open("/dev/mcelog", O_RDONLY) = 4
1281 ioctl(4, MCE_GET_RECORD_LEN or MTRRIOC_SET_ENTRY, 0x7fffb141787c) = 0
1281 ioctl(4, MCE_GET_LOG_LEN or MTRRIOC_DEL_ENTRY, 0x7fffb1417878) = 0
1281 read(4, 0xf17190, 2816) = -1 ENODEV (No such device)
Isn't that what the --ignorenodev argument to mcelog is supposed to handle?
Only on open(), not read().
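To illustrate the distinction, here is a minimal Python sketch (not mcelog's actual C code) of a reader that tolerates ENODEV from both open() and read(). The function name read_mce_device and its opener/reader/closer parameters are hypothetical, introduced only so the behaviour can be exercised without a real /dev/mcelog; the 2816-byte buffer size is simply the value seen in the strace output.

```python
import errno
import os


def read_mce_device(path="/dev/mcelog", bufsize=2816,
                    opener=os.open, reader=os.read, closer=os.close):
    """Return raw bytes from the device, or b"" if it reports ENODEV.

    Hypothetical helper: unlike a check made only at open(), this also
    treats ENODEV from read() as "nothing to report".
    """
    try:
        fd = opener(path, os.O_RDONLY)
    except OSError as exc:
        if exc.errno == errno.ENODEV:
            return b""  # open() itself reported ENODEV
        raise
    try:
        return reader(fd, bufsize)
    except OSError as exc:
        if exc.errno == errno.ENODEV:
            return b""  # the case in the strace: open() succeeds, read() fails
        raise
    finally:
        closer(fd)  # always release the descriptor once it was opened
```

Any other error (for example EACCES) is still raised, so only the "no such device" case is silenced.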
*** This bug has been marked as a duplicate of bug 595930 ***
A previous build of mcelog 1.0pre2 was available but had not been pushed. I cleaned up this package this evening in rawhide, F14, and F13, and updated it to follow the official packaging guidelines. Once you see the 1.0pre3 build land in your updates, please let me know if you have any further problems. The update contains a modified cron script that, as mentioned in #9, does use --ignorenodev. Links to updated packages in bug referenced in #11. Jon.
Ok. There are two bugs here. Some systems don't have an MCE device. Some do, and in either case you might have a /dev/mcelog device but the first read from it generally will fail after booting. I added a hack to the version of mcelog that I just built (which also reworks it to use systemd and run as a daemon) such that it will try twice - delaying between opens - as a horrible hack until Andi fixes whatever is broken in the kernel. I will ping him shortly as I have quite a list of things I had to fix in mcelog to make it work as a daemon that obviously had never been tested in the upstream version.
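The "try twice, with a delay" idea can be sketched roughly as follows. This is an illustrative Python sketch, not the patch that went into the package: the name read_with_retry, the injected read_once callable, and the one-second default delay are all assumptions made for the example.

```python
import errno
import time


def read_with_retry(read_once, attempts=2, delay=1.0, sleep=time.sleep):
    """Call read_once() up to `attempts` times, retrying only on ENODEV.

    Hypothetical helper; `delay` (seconds) is an assumed value.
    """
    for attempt in range(1, attempts + 1):
        try:
            return read_once()
        except OSError as exc:
            # Give up on any other error, or once the last attempt has failed.
            if exc.errno != errno.ENODEV or attempt == attempts:
                raise
            sleep(delay)  # pause before trying again
```

Because the first read after boot is the one that fails, a second attempt is normally enough; a persistent ENODEV is still raised to the caller.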
Actually, it's easier (for now only) to have the systemd service start the mcelog process twice. I'm going to ping Andi and find out WTF is wrong upstream.
> Ok. There are two bugs here.

And one of them is obviously not fixed. I still get this error message on the first run after a reboot.

So we have three bugzillas (at least) and two bugs. Currently two of the bugzillas are closed as duplicates, and the third has a fix in testing. But it only fixes one of the problems. The remaining problem is a kernel bug, if I understand the comments in these bugzillas correctly.

Does it make sense to reopen this one for the remaining issue?
I would just wait until the existing open bug is closed, and if that does not fix your issue, please file a new bug. Until then, this issue can be avoided by adding:

/usr/sbin/mcelog --ignorenodev --filter >> /var/log/mcelog 2>/dev/null

to your rc.local script.
Yea. The once per boot issue is an upstream kernel bug. I have changed the rawhide package version to always start the (now a daemon) process twice with some special hack logic, but I need Andi Kleen to fix upstream. For F14, you'll have to live with the first cron job after boot failing until we get the kernel fixed.
> The once per boot issue is an upstream kernel bug.

Is there any bugzilla tracking that issue? I didn't find anything obvious (except for this one) in either the RH bugzilla or the kernel bugzilla. But maybe I'm not searching for the right terms?
The kernel bugzilla is about to get an mcelog userspace component (because I requested it), and the kernel-side bug will be reported now that I've given Andi a chance to comment privately.