Bug 1578846 - unstable production kernels 4.16.x under Xenserver
Summary: unstable production kernels 4.16.x under Xenserver
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1588395
Reported: 2018-05-16 13:31 UTC by customercare
Modified: 2022-07-13 16:39 UTC
CC List: 17 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2022-07-13 16:39:29 UTC
Type: Bug
Embargoed:



Description customercare 2018-05-16 13:31:09 UTC
Description of problem:

4.16.x kernels crash after varying uptimes inside a XenServer VM.

#####
##### 4.16.5+ are HIGHLY UNSTABLE ###
#####

We run a variety of VMs and Xen versions; the problem is not bound to a specific one.

See this boot log:

reboot   system boot  4.15.17-200.fc26 Mon May  7 00:24   still running
reboot   system boot  4.16.5-200.fc27. Sun May  6 19:29 - 23:49  (04:19)
reboot   system boot  4.16.5-200.fc27. Sun May  6 16:57 - 23:49  (06:52)
reboot   system boot  4.16.5-200.fc27. Sun May  6 15:17 - 23:49  (08:32)
reboot   system boot  4.16.5-200.fc27. Sun May  6 13:48 - 23:49  (10:00)
reboot   system boot  4.16.5-200.fc27. Sat May  5 21:53 - 23:49 (1+01:55)
reboot   system boot  4.16.5-200.fc27. Sat May  5 11:59 - 23:49 (1+11:50)
reboot   system boot  4.16.5-200.fc27. Fri May  4 12:24 - 23:49 (2+11:25)
reboot   system boot  4.16.5-200.fc27. Fri May  4 11:54 - 23:49 (2+11:55)
reboot   system boot  4.15.17-200.fc26 Fri May  4 11:04 - 11:53  (00:48)
reboot   system boot  4.15.12-201.fc26 Thu Mar 29 12:12 - 11:03 (35+22:51)
reboot   system boot  4.15.9-200.fc26. Wed Mar 14 23:40 - 12:10 (14+11:30)
reboot   system boot  4.14.14-200.fc26 Sat Jan 27 04:33 - 23:39 (46+19:05)
reboot   system boot  4.14.13-200.fc26 Mon Jan 15 17:10 - 04:33 (11+11:22)
reboot   system boot  4.13.13-200.fc26 Thu Nov 23 20:47 - 17:10 (52+20:22)
reboot   system boot  4.13.13-100.fc25 Thu Nov 23 20:13 - 20:47  (00:33)
reboot   system boot  4.11.12-200.fc25 Thu Jul 27 14:46 - 20:13 (119+06:27)

next server:

reboot   system boot  4.15.9-200.fc26. Wed May 16 14:05   still running
reboot   system boot  4.16.7-100.fc26. Wed May 16 13:36 - 14:06  (00:30)
reboot   system boot  4.16.7-100.fc26. Wed May 16 10:33 - 14:06  (03:33)
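In the listings above, several 4.16.5 entries share the same end time (23:49), so the durations in parentheses do not look like per-boot uptimes. A rough way to estimate how long each boot survived is to take the gap between consecutive boot timestamps. The following is a minimal sketch in Python under some assumptions: the lines have no year, so the year is passed in by hand; the names `parse_boots` and `gaps_between_boots` are hypothetical helpers, not part of any tool; and a gap is only an upper bound on uptime, since it also includes any time the VM was down.

```python
import re
from datetime import datetime

# Matches the "reboot   system boot  <kernel> <dow> <mon> <day> <hh:mm>" lines
# as they appear in the report.
BOOT_LINE = re.compile(
    r"^reboot\s+system boot\s+(?P<kernel>\S+)\s+"
    r"\w{3}\s+(?P<mon>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2})"
)

# A few lines copied from the first boot log in this report.
SAMPLE = """\
reboot   system boot  4.15.17-200.fc26 Mon May  7 00:24   still running
reboot   system boot  4.16.5-200.fc27. Sun May  6 19:29 - 23:49  (04:19)
reboot   system boot  4.16.5-200.fc27. Sun May  6 16:57 - 23:49  (06:52)
reboot   system boot  4.16.5-200.fc27. Sun May  6 15:17 - 23:49  (08:32)
reboot   system boot  4.16.5-200.fc27. Sun May  6 13:48 - 23:49  (10:00)
"""


def parse_boots(text, year):
    """Return (boot time, kernel) tuples sorted oldest first.

    The year is an assumption supplied by the caller, because the
    listing itself does not contain one.
    """
    boots = []
    for line in text.splitlines():
        m = BOOT_LINE.match(line.strip())
        if not m:
            continue
        when = datetime.strptime(
            f"{year} {m['mon']} {m['day']} {m['time']}", "%Y %b %d %H:%M"
        )
        # The kernel column is truncated in the listing; drop the stray dot.
        boots.append((when, m["kernel"].rstrip(".")))
    return sorted(boots)


def gaps_between_boots(boots):
    """Return (kernel, minutes until the next boot) for every boot but the last."""
    return [
        (kernel, int((nxt - when).total_seconds() // 60))
        for (when, kernel), (nxt, _) in zip(boots, boots[1:])
    ]


if __name__ == "__main__":
    for kernel, minutes in gaps_between_boots(parse_boots(SAMPLE, 2018)):
        print(f"{kernel}: next boot after {minutes} min")
```

This only works for entries that fall within one calendar year; the full first listing spans 2017 and 2018 and would need to be split first.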


Version-Release number of selected component (if applicable):

4.16.x

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:

Crash roughly 30-90 minutes after boot.

Expected results:

Stable operation, as with 4.15.17.

Additional info:


For more information on how we discovered the bug, see:

https://bugzilla.redhat.com/show_bug.cgi?id=1575403

Comment 1 Jeremy Cline 2018-05-30 19:54:27 UTC
Hi,

Please attach the complete kernel log from a boot that crashes. You can get the log with "journalctl -k" and use the "-b" flag to select a previous boot log.

Thanks.
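For anyone following along, the request above could be scripted roughly as follows. This is a minimal sketch, not an official procedure: `kernel_log_command` and `save_kernel_log` are hypothetical helper names, and it assumes the journal is stored persistently (otherwise logs from earlier boots are not kept). Only the `-k`, `-b` and `--no-pager` options of `journalctl` are used.

```python
import subprocess


def kernel_log_command(boot_offset=-1):
    """Build the journalctl call for the kernel log of one boot.

    0 selects the current boot, -1 the previous one, -2 the one before, etc.
    """
    return ["journalctl", "-k", "-b", str(boot_offset), "--no-pager"]


def save_kernel_log(path, boot_offset=-1):
    """Write the kernel log of the selected boot to `path` for attaching."""
    with open(path, "w") as out:
        subprocess.run(kernel_log_command(boot_offset), stdout=out, check=True)


if __name__ == "__main__":
    # Hypothetical output file name; pick the boot that crashed.
    save_kernel_log("kernel-previous-boot.log", boot_offset=-1)
```

`journalctl --list-boots` shows which offsets are available before picking one.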

Comment 2 customercare 2018-10-02 16:34:49 UTC
Kernel 4.18.10-100 crashed after 8 1/2 hours on a low-traffic production server.

reboot   system boot  4.15.17-200.fc26 Tue Oct  2 18:30   still running
reboot   system boot  4.18.10-100.fc27 Tue Oct  2 18:19 - 18:30  (00:11)
reboot   system boot  4.18.10-100.fc27 Tue Oct  2 09:51 - 18:30  (08:39)

Comment 3 customercare 2019-04-08 15:11:49 UTC
Currently testing the stability of kernel 5.0.5 on XenServer 7.6; result pending.

Comment 4 customercare 2019-04-19 15:20:17 UTC
Preliminary result: 5.0.6 kernels seem to be stable again under Xen.

Comment 5 customercare 2019-04-19 15:27:47 UTC
@Jeremy:

Due to the nature of the crashes, all logs are lost in the ensuing filesystem corruption. It is a spontaneous reset of the entire VM. As a result, you do not even see an oops message on the connected display; it gets overwritten when the server restarts.

I upgraded several VMs to 5.0.x kernels last week, and all seem to run stable again. Whatever fixed the problem must have landed in the last few 4.20.x releases or in 5.0.x, as I tried several 4.20.x kernels and every one I tested failed.

Comment 6 customercare 2020-06-06 06:59:44 UTC
Request to close

