Bug 1337582
Summary: | Fails to start with "Cannot read MSR_ERROR_CONTROL from /dev/cpu/0/msr" when running in VM on Xeon E5-2450 host | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Adam Williamson <awilliam> |
Component: | mcelog | Assignee: | Prarit Bhargava <prarit> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 24 | CC: | gmarr, jsedlak, prarit, robatino, sergei.litvinenko |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | RejectedBlocker RejectedFreezeException | ||
Last Closed: | 2016-12-09 19:19:46 UTC | Type: | Bug |
Description
Adam Williamson
2016-05-19 14:19:47 UTC
Sorry, somehow hit 'submit' too early.

We have an openQA test which checks whether all services started successfully after a default install. It seems recently mcelog quite often fails to start up (though not always - sometimes it works fine). e.g.:

https://openqa.fedoraproject.org/tests/18081

I downloaded the logs from that test - you can find them on the 'Logs & Assets' tab - and this seems to be the error:

May 18 12:42:01 localhost.localdomain mcelog[1001]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/0/msr
May 18 12:42:01 localhost.localdomain mcelog[1001]: : Input/output error

The tests run in qemu VMs with '-cpu host'. The host machine for that test has an 'Intel(R) Xeon(R) CPU E5-2450 0 @ 2.10GHz' CPU. I haven't checked yet if the failure is related to the host machine used, I'll look into that.

Proposing as a Final blocker: "All system services present after installation with one of the release-blocking package sets must start properly, unless they require hardware which is not present." https://fedoraproject.org/wiki/Fedora_24_Final_Release_Criteria#System_services

OK, now I look at it, there seems to be quite a strong correlation with the 'worker host' used. We have two types of 'worker host' machine: one with that Xeon E5-2450 and three with Xeon E5540 CPUs. It looks a lot like the test always fails when run on the E5-2450 host, and always passes when run on the E5540 hosts.

Discussed during the 2016-05-23 blocker review meeting: [1]

The decision to delay the classification of this bug as a blocker has been made so that the mcelog packager and developers can be consulted, as this could be a hardware-specific issue. This will be discussed at next week's blocker-review meeting.
[1] https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2016-05-23/f24-blocker-review.2016-05-23-16.00.txt

Prarit, I believe this is resolved upstream by these two commits:

https://github.com/andikleen/mcelog/commit/89d3888aada31103d433e14f7c8a9c45ea1c3011
https://github.com/andikleen/mcelog/commit/b2d6f0c4e762e28d877a8e0e744a6946cd1cae4f

I've sent the latest tag (v137) to Rawhide, that seemed pretty obvious. But I'm not sure whether it's better for F24 - since we're quite late, it's nearly Final freeze time - to do the same thing, or just to backport those two patches. WDYT?

Could you please make a decision? Thanks!

Discussed during the 2016-05-30 blocker review meeting: [1]

Decision was made to not classify this as either a Blocker or a FreezeException due to the fact that the bug is very hardware-specific and the impact is not broad enough to violate the release criteria.

[1] https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2016-05-30/f24-blocker-review.2016-05-30-16.01.txt

This should be OK on F25 and Rawhide now, and I really don't care much about F24 any more.

Fedora-25

[root@homedesk sysctl.d]# uname -r
4.9.4-201.fc25.x86_64

Issue is not fixed...
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/0/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/1/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/2/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/3/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/4/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/5/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/6/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/7/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/8/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/9/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/10/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/11/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/12/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/13/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/14/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: Cannot read MSR_ERROR_CONTROL from /dev/cpu/15/msr
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: : Input/output error
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: warning: 16 bytes ignored in each record
Jan 22 13:51:42 homedesk.homedesk.org.ua mcelog[1032]: mcelog: consider an update
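
For anyone trying to tell whether a given machine or guest is affected, the failing read can probably be reproduced without mcelog. Below is a minimal Python sketch, not mcelog's actual code. It assumes the msr kernel module is loaded and the script runs as root, and it assumes MSR_ERROR_CONTROL is register 0x17f (my reading of the Intel SDM; please verify before relying on it). The helper name read_msr is made up for illustration. The msr device uses the file offset as the register number and returns 8 bytes per read, so the whole check is a single positioned read.

```python
import errno
import os
import struct

# Assumption: MSR_ERROR_CONTROL is register 0x17f. Check the Intel SDM
# (or the mcelog source) for your CPU model before trusting this value.
MSR_ERROR_CONTROL = 0x17F


def read_msr(path, reg=MSR_ERROR_CONTROL):
    """Read one 64-bit MSR through an msr device node.

    Returns the register value as an int, or None when the kernel
    reports EIO, which is the "Input/output error" seen in the logs above.
    Any other error (missing device, no permission) is raised unchanged.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # The file offset selects the register; each read yields 8 bytes.
        data = os.pread(fd, 8, reg)
    except OSError as exc:
        if exc.errno == errno.EIO:
            return None
        raise
    finally:
        os.close(fd)
    if len(data) != 8:
        raise ValueError("short read from %s: got %d bytes" % (path, len(data)))
    # MSR values are returned in little-endian byte order on x86.
    return struct.unpack("<Q", data)[0]
```

Calling read_msr("/dev/cpu/0/msr") as root should return None where the register cannot be read (which I would expect on the systems in this bug), and an integer otherwise. Looping over /dev/cpu/*/msr would cover every CPU, matching the per-CPU messages in the log.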