Bug 1337582 - Fails to start with "Cannot read MSR_ERROR_CONTROL from /dev/cpu/0/msr" when running in VM on Xeon E5-2450 host
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: mcelog
Version: 24
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Prarit Bhargava
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: RejectedBlocker RejectedFreezeException
Depends On:
Blocks:
Reported: 2016-05-19 14:19 UTC by Adam Williamson
Modified: 2017-01-22 12:17 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-12-09 19:19:46 UTC
Type: Bug



Description Adam Williamson 2016-05-19 14:19:47 UTC
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Adam Williamson 2016-05-19 14:23:04 UTC
Sorry, somehow hit 'submit' too early.

We have an openQA test which checks whether all services started successfully after a default install. Recently, mcelog seems to fail to start quite often (though not always; sometimes it works fine), e.g.:

https://openqa.fedoraproject.org/tests/18081

I downloaded the logs from that test (you can find them on the 'Logs & Assets' tab), and this seems to be the error:

May 18 12:42:01 localhost.localdomain mcelog[1001]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/0/msr
May 18 12:42:01 localhost.localdomain mcelog[1001]: : Input/output error

The tests run in qemu VMs with '-cpu host'. The host machine for that test has an 'Intel(R) Xeon(R) CPU E5-2450 0 @ 2.10GHz' CPU. I haven't checked yet whether the failure is related to the host machine used; I'll look into that.
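For context on the error message: the Linux msr driver exposes each CPU's model-specific registers as /dev/cpu/N/msr, using the file offset as the MSR address, and an RDMSR that faults is reported to userspace as EIO, which is the "Input/output error" in the log above. One plausible (unconfirmed) explanation is that the guest CPU model passed through by '-cpu host' makes mcelog expect this MSR, while the hypervisor does not actually expose it. The Python sketch below shows a per-CPU MSR read that treats EIO as "not available" instead of fatal. The helper name read_msr and the MSR address 0x17F for MSR_ERROR_CONTROL are assumptions for illustration; this is not the content of the upstream mcelog commits.

```python
import errno
import os
import struct
from typing import Optional

# ASSUMPTION: MSR_ERROR_CONTROL is MSR 0x17F, the address believed to be used
# by mcelog on Sandy Bridge-era Intel CPUs. Verify against mcelog's source.
MSR_ERROR_CONTROL = 0x17F


def read_msr(cpu: int, msr: int, dev_root: str = "/dev/cpu") -> Optional[int]:
    """Hypothetical helper: read one 64-bit MSR via /dev/cpu/N/msr.

    The msr driver uses the file offset as the MSR address and returns the
    register as 8 little-endian bytes (x86 byte order). If RDMSR faults, for
    example because a hypervisor does not expose that MSR to the guest, the
    read fails with EIO; this sketch returns None in that case.
    """
    path = os.path.join(dev_root, str(cpu), "msr")
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, 8, msr)
    except OSError as exc:
        if exc.errno == errno.EIO:
            return None  # MSR not readable on this (virtual) CPU
        raise
    finally:
        os.close(fd)
    if len(data) != 8:
        # A short read should not happen with the real driver; report it as I/O error.
        raise OSError(errno.EIO, f"short read of MSR {msr:#x} on CPU {cpu}")
    return struct.unpack("<Q", data)[0]
```

On a real system this needs root and the msr module loaded (`modprobe msr`); calling `read_msr(0, MSR_ERROR_CONTROL)` would then return the register value, or `None` where the journal shows "Input/output error".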

Comment 2 Adam Williamson 2016-05-19 14:24:53 UTC
Proposing as a Final blocker: "All system services present after installation with one of the release-blocking package sets must start properly, unless they require hardware which is not present." https://fedoraproject.org/wiki/Fedora_24_Final_Release_Criteria#System_services

Comment 3 Adam Williamson 2016-05-19 14:28:49 UTC
OK, now that I look at it, there seems to be quite a strong correlation with the 'worker host' used. We have two types of 'worker host' machine: one with that Xeon E5-2450 and three with Xeon E5540 CPUs. It looks a lot like the test always fails when run on the E5-2450 host and always passes when run on the E5540 hosts.

Comment 4 Geoffrey Marr 2016-05-23 18:16:13 UTC
Discussed during the 2016-05-23 blocker review meeting: [1]

Classification of this bug as a blocker was deferred so that the mcelog packager and developers can be consulted, since this could be a hardware-specific issue. It will be discussed again at next week's blocker review meeting.

[1] https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2016-05-23/f24-blocker-review.2016-05-23-16.00.txt

Comment 5 Adam Williamson 2016-05-26 23:25:44 UTC
Prarit, I believe this is resolved upstream by these two commits:

https://github.com/andikleen/mcelog/commit/89d3888aada31103d433e14f7c8a9c45ea1c3011
https://github.com/andikleen/mcelog/commit/b2d6f0c4e762e28d877a8e0e744a6946cd1cae4f

I've sent the latest tag (v137) to Rawhide, since that seemed pretty obvious. For F24, though, we're quite late (it's nearly Final freeze time), so I'm not sure whether it's better to do the same thing or just to backport those two patches. WDYT? Could you please make a decision? Thanks!

Comment 6 Geoffrey Marr 2016-05-30 18:15:32 UTC
Discussed during the 2016-05-30 blocker review meeting: [1]

The decision was made not to classify this as either a Blocker or a FreezeException, because the bug is very hardware-specific and its impact is not broad enough to violate the release criteria.

[1] https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2016-05-30/f24-blocker-review.2016-05-30-16.01.txt

Comment 7 Adam Williamson 2016-12-09 19:19:46 UTC
This should be OK on F25 and Rawhide now, and I really don't care much about F24 any more.

Comment 8 Sergei LITVINENKO 2017-01-22 12:17:08 UTC
Fedora-25

[root@homedesk sysctl.d]# uname -r
4.9.4-201.fc25.x86_64 

The issue is not fixed:

Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/0/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/1/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/2/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/3/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/4/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/5/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/6/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/7/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/8/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/9/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/10/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/11/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/12/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/13/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/14/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/15/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: warning: 16 bytes ignored in each record
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: consider an update

