Created attachment 1344559 [details]
kernel panic, captured via serial port

Description of problem:
My new Ryzen 7 virtualization lab is randomly crashing with a kernel panic. It took several weeks until I was able to get a serial port and capture the panic. Without the serial port it was impossible to see it: the screen is always black when I plug in the DVI connector. The panic happens after about a day, sometimes sooner, so I am not using the machine much. Yesterday I got the serial port. I do not know the nature of the "list_add corruption" class of issues. Thanks in advance.

Version-Release number of selected component (if applicable):
F26 Server

How reproducible:
always (but random)

Steps to Reproduce:
1. boot
2. use system (libvirt)
3. wait
4. panic

Actual results:
kernel panic

Expected results:
no panic

Additional info #1:
CPU : Ryzen7 1700X
RAM : 64 GB [Corsair LXP CMK16GX4M1B30000C15 x]
MOBO : Asus PRIME X370-PRO [BIOS = 0902]

28:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Redwood XT [Radeon HD 5670/5690/5730]
    Subsystem: XFX Pine Group Inc. Device 3061
    Kernel driver in use: radeon
    Kernel modules: radeon

New: CPU, MOBO, RAM, some HDDs.
Old: VGA, one HDD, 1 SSD.
Let's call this "system-2". I have 1 monitor and 2 systems (system-1 is my primary, system-2 is the virtualization lab).

Additional info #2:
Getting the panic without configuring the serial console (grub2/os) was almost impossible.
There is another overlapping issue (IMHO unrelated to the panic). The monitor button to switch displays does not work very well, so the DVI cable is normally connected only to system-1. Before replacing the system-2 hardware (old mobo, old CPU, etc.), I noticed some months ago (but could not report it) that when I connected the DVI cable there was no display. There was display only after a boot; the DVI port was not being sensed. That had worked for ages and broke some unknown number of kernels ago.
So I don't think this is related, but it made getting the panic very difficult.

Additional info #3:
Without configuring the serial port, most boots end up in a black screen. Now (with the serial port configured) it always boots.
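For context on the "list_add corruption" message asked about above: with list debugging enabled (CONFIG_DEBUG_LIST), the kernel checks the neighbours of a doubly linked list before inserting a node and complains when their pointers no longer agree with each other. The following is only a toy Python model of that kind of check, not the kernel code; the class and function names are made up for illustration.

```python
class Node:
    """Toy stand-in for a doubly linked list node (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        # An empty list head points at itself in both directions.
        self.prev = self
        self.next = self


def list_add_valid(new, prev, next_):
    """Return None if inserting `new` between `prev` and `next_` looks sane,
    otherwise a short description of the inconsistency found."""
    if next_.prev is not prev:
        return "next->prev should be prev"
    if prev.next is not next_:
        return "prev->next should be next"
    if new is prev or new is next_:
        return "double add"
    return None


def list_add(new, head):
    """Insert `new` right after `head`, refusing if the list looks corrupt."""
    problem = list_add_valid(new, head, head.next)
    if problem:
        raise RuntimeError("list_add corruption: " + problem)
    nxt = head.next
    nxt.prev = new
    new.next = nxt
    new.prev = head
    head.next = new
```

A failed check of this kind only says that the pointers were already inconsistent when the insert was attempted. It does not say what damaged them, which is why the message on its own does not point at a specific driver or subsystem.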
F26 Server Kernel is: 4.13.9-200.fc26.x86_64
Created attachment 1344560 [details] lspci
Created attachment 1344573 [details] messages logfile
This seems similar to what I'm seeing.

Symptoms:
About a day or two after boot the computer appears to lock up. So far it has always happened while I was away and the console was locked. When I return and turn on the monitor the system does not respond to keypresses or mouse movements. The screen remains black. In some cases the system has responded to ping but not to SSH. In other cases it hasn't responded even to ping.

To get more data I've configured a serial console and attached a null-modem cable. The messages I get there vary. In two cases so far there has been list corruption like in this bug report.

This may or may not be related to bug 1450769, which seems similar to other cases I've seen.

Linux 4.13.11-300.fc27.x86_64
Motherboard: Asus Prime X370-pro
Processor: AMD Ryzen
Memory: Kingston KVR24E17D8/16MA with ECC support
Graphics card: AMD/XFX Radeon RX 460
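For anyone else trying to capture these panics: a serial console is normally enabled by adding `console=` parameters to the kernel command line, for example `console=tty0 console=ttyS0,115200n8`. The port name and speed in that example are assumptions, since neither reporter states which settings were used. A minimal Python sketch (helper names are hypothetical) to check whether a command line already has a serial console configured:

```python
def console_args(cmdline):
    """Return the values of all console= parameters, in order of appearance."""
    return [
        token.split("=", 1)[1]
        for token in cmdline.split()
        if token.startswith("console=")
    ]


def has_serial_console(cmdline):
    """True if any console= parameter names a ttyS* serial port."""
    return any(value.startswith("ttyS") for value in console_args(cmdline))
```

On a running Linux system the current command line can be read from /proc/cmdline and passed to `has_serial_console`.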
Created attachment 1355999 [details] kernel log from boot to panic, showing list_del corruption
Created attachment 1356001 [details] kernel log from boot to hang, showing list_add corruption In this case no kernel panic message was printed.
Created attachment 1366582 [details] kernel log from boot to panic, showing list_del corruption another kernel panic, this time in Linux 4.13.16-302.fc27.x86_64
The rcu_nocbs workaround that is discussed at https://bugzilla.kernel.org/show_bug.cgi?id=196683 seems to prevent both these kernel panics and the soft lockups of bug 1450769.
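The comment above does not give the exact value used. As a sketch only, assuming the workaround is applied to every logical CPU (for example `rcu_nocbs=0-15` on a 16-thread Ryzen 7 1700X) and that the kernel was built with support for this parameter, the argument can be derived and checked like this in Python; the helper names are hypothetical:

```python
def rcu_nocbs_arg(n_cpus):
    """Build an rcu_nocbs= argument covering logical CPUs 0..n_cpus-1."""
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    if n_cpus == 1:
        return "rcu_nocbs=0"
    return f"rcu_nocbs=0-{n_cpus - 1}"


def has_rcu_nocbs(cmdline):
    """True if the kernel command line already carries an rcu_nocbs= parameter."""
    return any(token.startswith("rcu_nocbs=") for token in cmdline.split())
```

The resulting string still has to be added to the kernel command line through the boot loader configuration; `has_rcu_nocbs` can then be run against the contents of /proc/cmdline after a reboot to confirm that it took effect.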
We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. The kernel moves very fast so bugs may get fixed as part of a kernel update. Due to this, we are doing a mass bug update across all of the Fedora 26 kernel bugs. Fedora 26 has now been rebased to 4.15.4-200.fc26. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel. If you have moved on to Fedora 27, and are still experiencing this issue, please change the version to Fedora 27. If you experience different issues, please open a new bug report for those.
It still happens in Linux 4.15.9-300.fc27.x86_64 without rcu_nocbs.
*********** MASS BUG UPDATE ************** We apologize for the inconvenience. There are a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 27 kernel bugs. Fedora 27 has now been rebased to 4.17.7-100.fc27. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel. If you have moved on to Fedora 28, and are still experiencing this issue, please change the version to Fedora 28. If you experience different issues, please open a new bug report for those.
*********** MASS BUG UPDATE ************** This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 5 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.