Bug 742928 - system reset with 2.6.18-274.*
Summary: system reset with 2.6.18-274.*
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.7
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Xen Maintainance List
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-10-03 12:13 UTC by Kapetanakis Giannis
Modified: 2011-11-01 11:29 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-11-01 11:29:27 UTC
Target Upstream Version:


Attachments
lspci -vvv (39.17 KB, text/plain)
2011-10-03 12:13 UTC, Kapetanakis Giannis

Description Kapetanakis Giannis 2011-10-03 12:13:55 UTC
Created attachment 526044
lspci -vvv

Hi,

Since upgrading to kernel-xen 2.6.18-274 and 2.6.18-274.3.1, I have been experiencing hard system resets on this server.

This is a Dell PowerEdge 1950 running a fully updated RHEL 5.7, and it runs Xen with 5 VMs.

The logs show nothing: no hang, no reboot message. The machine simply resets every one or two days for no apparent reason. There is no unusual load on the host or the VMs.

With the previous kernel, 2.6.18-238.19.1, I did not have this problem.

# last | grep reboot

reboot   system boot  2.6.18-274.3.1.e Mon Oct  3 12:26          (02:10)    
reboot   system boot  2.6.18-274.3.1.e Sat Oct  1 22:49         (1+15:47)   
reboot   system boot  2.6.18-274.3.1.e Sat Oct  1 19:47         (1+18:49)   
reboot   system boot  2.6.18-274.3.1.e Sat Oct  1 15:30         (1+23:06)   
reboot   system boot  2.6.18-274.3.1.e Sat Oct  1 01:30         (2+13:07)   
reboot   system boot  2.6.18-274.3.1.e Fri Sep 30 19:48         (2+18:48)   
...
reboot   system boot  2.6.18-274.el5xe Mon Sep 26 12:45          (00:13)    
reboot   system boot  2.6.18-274.el5xe Sun Sep 25 19:58          (17:00)    
reboot   system boot  2.6.18-274.el5xe Sun Sep 25 15:40          (21:17)    
reboot   system boot  2.6.18-274.el5xe Sat Sep 24 14:33         (1+22:25)   
...
reboot   system boot  2.6.18-238.19.1. Sun Aug 21 15:20         (24+00:24)  
reboot   system boot  2.6.18-238.12.1. Wed Jul 20 14:20         (32+00:57)  

Any help on debugging this? The system's hardware seems OK, at least according to the Dell management software.

best regards,

Giannis

Comment 1 Andrew Jones 2011-10-03 12:23:36 UTC
Try running the latest kernel (-286) on it to see if it still happens. Also set it up with crashdump to get a core the next time it fails (see the instructions below on how to do that). You can also look through your /var/log/messages now to see if there are any clues. Look for similar messages that appeared at or before the reboots.
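
As a rough sketch of the log check suggested above, a small helper can print the lines logged just before each boot. The function name lines_before_boot is made up for illustration, and the search pattern assumes the syslog daemon writes a "syslogd ...: restart." line at startup, which should be verified against the actual log first.

```shell
# Hypothetical helper: print the log lines that precede each boot marker.
# Assumption: syslogd logs a line such as "syslogd 1.4.1: restart." at startup.
lines_before_boot() {
    logfile="${1:-/var/log/messages}"   # log file to search
    context="${2:-20}"                  # number of lines to show before each marker
    grep -B "$context" 'syslogd .*restart' "$logfile"
}
```

Calling "lines_before_boot /var/log/messages 50" as root would show up to 50 lines before each boot. Comparing the timestamps with the output of "last | grep reboot" may help to match each block to a reset.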


You can set up the host to capture a dump as follows:

1)  set crashkernel=128M@32M on the xen.gz line
        (256M instead of 128M may be necessary)
2)  Make sure kexec-tools is installed
3)  Make sure the bare-metal kernel is installed
        (it will kexec into the bare-metal kernel)
4)  'service kdump start' and/or turn it on for runlevels needed

You can test that it works by triggering the crash in some way, such as
        "echo c > /proc/sysrq-trigger"
or by using ctrl-a ctrl-a ctrl-a on the console for Xen's debug prompt, then
        the 'C' command to generate the dump

The dump will be in /var/crash/<date>/vmcore
        (check beforehand that there's enough disk space for it)
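
The setup steps above could be sanity-checked with something like the following sketch. The function name check_crashkernel and the default path /boot/grub/grub.conf are assumptions for illustration; the commented commands are the usual RHEL 5 way of doing steps 2 to 4 and have to be run as root.

```shell
# Hypothetical helper: verify that every active xen.gz line has crashkernel=.
# Assumption: the boot loader config lives in /boot/grub/grub.conf.
check_crashkernel() {
    grubconf="${1:-/boot/grub/grub.conf}"
    # Collect non-comment lines that load the hypervisor.
    xen_lines=$(grep 'xen\.gz' "$grubconf" | grep -v '^[[:space:]]*#')
    if [ -z "$xen_lines" ]; then
        echo "no xen.gz line found in $grubconf"
        return 2
    fi
    # Any hypervisor line without crashkernel= means step 1 is incomplete.
    if echo "$xen_lines" | grep -qv 'crashkernel='; then
        echo "missing crashkernel= on a xen.gz line in $grubconf"
        return 1
    fi
    echo "crashkernel= present on all xen.gz lines"
}

# Steps 2-4, run as root (shown as comments, not executed here):
#   yum install kexec-tools kernel   # kexec-tools plus the bare-metal kernel
#   chkconfig kdump on               # enable kdump for the default runlevels
#   service kdump start
```

The crashkernel= setting only takes effect after the host has been rebooted with the edited xen.gz line, so the check is most useful right after editing the boot loader config and before the reboot.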

Comment 2 Laszlo Ersek 2011-10-12 12:47:10 UTC
Hello Giannis,

Did you have any luck with capturing a dump (comment 1)?

Thanks.

Comment 3 Laszlo Ersek 2011-11-01 11:29:27 UTC
Hi Giannis,

If you have the kernel dump, please reopen this bug, and also state whether you have a RHEL subscription. Thank you very much.

