| Summary: | system reset with 2.6.18-274.* | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Kapetanakis Giannis <bilias> | ||||
| Component: | kernel-xen | Assignee: | Xen Maintainance List <xen-maint> | ||||
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Red Hat Kernel QE team <kernel-qe> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 5.7 | CC: | drjones, lersek, xen-maint | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2011-11-01 11:29:27 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
Try running the latest kernel (-286) on it to see whether the resets still happen. Also set the host up for crash dump capture so that you get a core the next time it fails (see the instructions below). You can also look through /var/log/messages now for clues: look for similar messages logged at or shortly before the reboots.
You can set up the host to capture a dump as follows:
1) set crashkernel=128M@32M on the xen.gz line
(256M instead of 128M may be necessary)
2) Make sure kexec-tools is installed
3) Make sure the bare-metal kernel is installed
(it will kexec into the bare-metal kernel)
4) 'service kdump start' and/or turn it on for runlevels needed
You can test that it works by triggering the crash in some way, such as
"echo c > /proc/sysrq-trigger"
or by using ctrl-a ctrl-a ctrl-a on the console for Xen's debug prompt, then
the 'C' command to generate the dump
The dump will be in /var/crash/<date>/vmcore
(check beforehand that there is enough disk space for it)
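Two of the steps above (the crashkernel= option on the xen.gz line, and free disk space for the vmcore) can be checked before starting kdump. The following is only a minimal sketch: the function names are hypothetical helpers, not part of any Red Hat tooling, and the grub config path and free-space threshold are left to the caller because they depend on the host.

```shell
#!/bin/sh
# Pre-flight checks for the dump setup described above.
# Function names are illustrative assumptions, not existing tools.

# Succeeds if the grub config in $1 has a Xen hypervisor line
# ("kernel ... xen.gz ...") that carries a crashkernel= option.
xen_line_has_crashkernel() {
    grep -Eq '^[[:space:]]*kernel[[:space:]].*xen\.gz.*[[:space:]]crashkernel=' "$1"
}

# Succeeds if the filesystem holding directory $1 has at least $2 KiB free.
dir_has_free_kb() {
    avail=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}
```

On a typical RHEL 5 host the first check would be run against /boot/grub/grub.conf (an assumption about where the boot config lives), and the second against /var/crash with a threshold chosen from the amount of RAM in the machine. Steps 2 and 4 can be checked with `rpm -q kexec-tools` and `chkconfig --list kdump`.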
Hello Giannis, did you have any luck with capturing a dump (comment 1)? Thanks.

Hi Giannis, if you have the kernel dump, please reopen, and also state whether you have a RHEL subscription. Thank you very much.
Created attachment 526044 [details]
lspci -vvv

Hi,

Since I upgraded to kernel-xen 2.6.18-274 and 2.6.18-274.3.1, I have been experiencing hard system resets on this server. It is a Dell PowerEdge 1950 running a fully updated 5.7, with Xen and 5 VMs.

The logs do not show anything: no hang, no reboot. It is like a reset every one or two days without any reason. There is no special load on the host or the VMs. With the previous kernel, 2.6.18-238.19.1, I did not have this kind of problem.

# last | grep reboot
reboot system boot 2.6.18-274.3.1.e Mon Oct 3 12:26 (02:10)
reboot system boot 2.6.18-274.3.1.e Sat Oct 1 22:49 (1+15:47)
reboot system boot 2.6.18-274.3.1.e Sat Oct 1 19:47 (1+18:49)
reboot system boot 2.6.18-274.3.1.e Sat Oct 1 15:30 (1+23:06)
reboot system boot 2.6.18-274.3.1.e Sat Oct 1 01:30 (2+13:07)
reboot system boot 2.6.18-274.3.1.e Fri Sep 30 19:48 (2+18:48)
...
reboot system boot 2.6.18-274.el5xe Mon Sep 26 12:45 (00:13)
reboot system boot 2.6.18-274.el5xe Sun Sep 25 19:58 (17:00)
reboot system boot 2.6.18-274.el5xe Sun Sep 25 15:40 (21:17)
reboot system boot 2.6.18-274.el5xe Sat Sep 24 14:33 (1+22:25)
...
reboot system boot 2.6.18-238.19.1. Sun Aug 21 15:20 (24+00:24)
reboot system boot 2.6.18-238.12.1. Wed Jul 20 14:20 (32+00:57)

Any help on debugging this? The system's hardware seems OK, at least according to the Dell management software.

best regards,
Giannis
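To compare reset frequency across kernels, the `last` records can be tallied per kernel string. This is a rough sketch only: `reboots_per_kernel` is a hypothetical helper name, and it groups on the kernel field exactly as `last` prints it, including the truncation visible in the listing above.

```shell
#!/bin/sh
# Count "reboot" records per kernel string in `last` output read from stdin.
# Prints one "<count> <kernel>" line per kernel, sorted by kernel string.
reboots_per_kernel() {
    awk '$1 == "reboot" && $2 == "system" && $3 == "boot" { n[$4]++ }
         END { for (k in n) print n[k], k }' | LC_ALL=C sort -k2,2
}
```

It would be used as `last | reboots_per_kernel`. Note that the count includes planned reboots as well as resets, so it only gives a rough comparison.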