Bug 1474786 - F26 server - system freeze on MCE - cpu J3455
Status: CLOSED CANTFIX
Product: Fedora
Classification: Fedora
Component: kernel
Version: 26
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2017-07-25 07:49 EDT by rej
Modified: 2017-07-25 10:31 EDT
CC List: 8 users

Last Closed: 2017-07-25 10:31:47 EDT
Type: Bug


Attachments
var/spool/abrt directory zip (415.14 KB, application/zip)
2017-07-25 07:49 EDT, rej

Description rej 2017-07-25 07:49:08 EDT
Created attachment 1304177
var/spool/abrt directory zip

Fedora Server 26 freezes on MCE (machine check exception).

kernel 4.11.10-300.fc26.x86_64


How reproducible:


Steps to Reproduce:
The system freezes at random: sometimes after a few minutes, sometimes after a few hours, sometimes under load, sometimes when idle.


Additional info:

This seems to be a problem with the new Intel CPU.

The kernel log indicates that hardware errors were detected.
System log may have more information.
The last 20 mcelog lines of system log are:
==========================================
Jul 22 14:27:52 backup mcelog: mcelog: warning: 24 bytes ignored in each record
Jul 22 14:27:52 backup mcelog: mcelog: consider an update
Jul 22 14:57:56 backup mcelog: mcelog: Family 6 Model 5c CPU: only decoding architectural errors
Jul 22 14:57:56 backup mcelog: mcelog: Family 6 Model 5c CPU: only decoding architectural errors
Jul 22 14:57:56 backup mcelog: Hardware event. This is not a software error.
Jul 22 14:57:56 backup mcelog: MCE 0
Jul 22 14:57:56 backup mcelog: CPU 0 BANK 4
Jul 22 14:57:56 backup mcelog: ADDR fef135c0
Jul 22 14:57:56 backup mcelog: TIME 1500728265 Sat Jul 22 14:57:45 2017
Jul 22 14:57:56 backup mcelog: MCG status:
Jul 22 14:57:56 backup mcelog: MCi status:
Jul 22 14:57:56 backup mcelog: Uncorrected error
Jul 22 14:57:56 backup mcelog: MCi_ADDR register valid
Jul 22 14:57:56 backup mcelog: Processor context corrupt
Jul 22 14:57:56 backup mcelog: MCA: Internal unclassified error: 408
Jul 22 14:57:56 backup mcelog: STATUS a600000000020408 MCGSTATUS 0
Jul 22 14:57:56 backup mcelog: MCGCAP c07 APICID 0 SOCKETID 0
Jul 22 14:57:56 backup mcelog: CPUID Vendor Intel Family 6 Model 92
Jul 22 14:57:56 backup mcelog: mcelog: warning: 24 bytes ignored in each record
Jul 22 14:57:56 backup mcelog: mcelog: consider an update
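The STATUS and MCGCAP values in the log are raw machine-check register contents, and mcelog's human-readable lines ("Uncorrected error", "MCi_ADDR register valid", "Processor context corrupt") are derived from their bits. The following is a minimal Python sketch showing how 0xa600000000020408 and 0xc07 break down. It assumes the architectural bit layouts of IA32_MCi_STATUS and IA32_MCG_CAP described in the Intel SDM (Vol. 3, Machine-Check Architecture). The helper names (decode_mci_status, decode_mcg_cap, classify_mca_code) are hypothetical and not part of mcelog, and only a subset of the architectural error codes is classified.

```python
# Hypothetical helpers (not part of mcelog) for decoding raw MCE register values.
# Bit layouts assume the architectural definitions of IA32_MCi_STATUS and
# IA32_MCG_CAP from the Intel SDM, Vol. 3; model-specific bits are not decoded.

# Architectural flag bits of IA32_MCi_STATUS.
MCI_STATUS_FLAGS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # error overflow: an earlier error was lost
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled for this error
    59: "MISCV",  # IA32_MCi_MISC holds extra information
    58: "ADDRV",  # IA32_MCi_ADDR holds the error address
    57: "PCC",    # processor context corrupt: execution cannot safely continue
}

# A subset of the architectural "simple" MCA error codes.
SIMPLE_MCA_CODES = {
    0x0000: "no error",
    0x0001: "unclassified",
    0x0002: "microcode ROM parity error",
    0x0003: "external error",
    0x0004: "FRC error",
    0x0005: "internal parity error",
    0x0400: "internal timer error",
}


def classify_mca_code(code: int) -> str:
    """Classify the 16-bit architectural MCA error code (status bits 15:0)."""
    if code in SIMPLE_MCA_CODES:
        return SIMPLE_MCA_CODES[code]
    # Bit pattern 0000 01xx xxxx xxxx (other than 0x0400): internal unclassified.
    if (code & 0xFC00) == 0x0400:
        return "internal unclassified error"
    return "compound or other error code (not decoded in this sketch)"


def decode_mci_status(status: int) -> dict:
    """Split a raw IA32_MCi_STATUS value into flag names and error-code fields."""
    flags = [name for bit, name in MCI_STATUS_FLAGS.items() if (status >> bit) & 1]
    mca_code = status & 0xFFFF            # bits 15:0, architectural
    model_code = (status >> 16) & 0xFFFF  # bits 31:16, model-specific
    return {
        "flags": flags,
        "mca_code": mca_code,
        "mca_class": classify_mca_code(mca_code),
        "model_specific_code": model_code,
    }


def decode_mcg_cap(cap: int) -> dict:
    """Decode the low capability bits of IA32_MCG_CAP."""
    return {
        "banks": cap & 0xFF,                  # bits 7:0, number of MCA banks
        "MCG_CTL_P": bool((cap >> 8) & 1),    # IA32_MCG_CTL register present
        "MCG_CMCI_P": bool((cap >> 10) & 1),  # corrected machine-check interrupt supported
        "MCG_TES_P": bool((cap >> 11) & 1),   # threshold-based error status present
    }


if __name__ == "__main__":
    # Raw values copied from the STATUS and MCGCAP lines of the mcelog output.
    print(decode_mci_status(0xA600000000020408))
    print(decode_mcg_cap(0xC07))
```

For the logged values this reports the VAL, UC, ADDRV and PCC flags and MCA code 0x408 classified as an internal unclassified error. It also reports 7 banks with CMCI and threshold-based status support. This lines up with mcelog's output for bank 4. The model-specific field (0x0002) is left undecoded, much as mcelog only decodes architectural errors for this Family 6 Model 5c CPU.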

$ cat cpuinfo
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               92
Model name:          Intel(R) Celeron(R) CPU J3455 @ 1.50GHz
Stepping:            9
CPU MHz:             797.790
CPU max MHz:         2300.0000
CPU min MHz:         800.0000
BogoMIPS:            2995.20
Virtualization:      VT-x
L1d cache:           24K
L1i cache:           32K
L2 cache:            1024K
NUMA node0 CPU(s):   0-3
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave rdrand lahf_lm 3dnowprefetch cat_l2 intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust smep erms mpx rdt_a rdseed smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts
Comment 1 Laura Abbott 2017-07-25 10:31:47 EDT
As the message indicates, MCEs are not a software issue. This is something your hardware is reporting, so you probably have hardware issues.
