Bug 561370 - KVM guest crashed during a multi guest database run
Summary: KVM guest crashed during a multi guest database run
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kvm
Version: 5.6
Hardware: x86_64
OS: Linux
Priority: low
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Marcelo Tosatti
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: Rhel5KvmTier2
Reported: 2010-02-03 14:45 UTC by Sanjay Rao
Modified: 2013-01-09 22:16 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-05-27 18:22:17 UTC



Description Sanjay Rao 2010-02-03 14:45:00 UTC
Description of problem:

A KVM guest crashed during a multi-guest run of a database workload. The host is AMD (Six-Core AMD Opteron(tm) Processor 8431).


Version-Release number of selected component (if applicable):

Host and guests are running kernel 2.6.18-186.el5.
File system used for the testing: ext4 (e4fsprogs-1.41.9-3.el5)


How reproducible:

Happened just once. Do not know if this can be reproduced.


Steps to Reproduce:
1. Started 4 KVM guests (each with 6 CPUs and 14G of memory)
2. Ran database workload
3. One of the guests crashed.
  
Actual results:

Messages in /var/log/messages in the guest at the time of the crash:


Feb  2 18:02:48 dhcp47-99 kernel: list_add corruption. prev->next should be ffff81038f7d3e28, but was 0000000000497000
Feb  2 18:02:48 dhcp47-99 kernel: ----------- [cut here ] --------- [please bite here ] ---------
Feb  2 18:02:48 dhcp47-99 kernel: Kernel BUG at lib/list_debug.c:31
Feb  3 08:37:19 dhcp47-99 syslogd 1.4.1: restart.
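
For readers unfamiliar with this message: it comes from the kernel's debug variant of list insertion (the lib/list_debug.c file named in the log), which verifies that the two neighbouring entries still point at each other before linking a new one. The following is a minimal userspace sketch of that kind of check, not the RHEL kernel source. The names checked_list_add and demo_detects_corruption are hypothetical, and the demo assumes, purely for illustration, that the last entry's next pointer was overwritten before a tail insertion; the actual cause of the corruption in this report is unknown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Minimal stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* An empty circular list points at itself in both directions. */
void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/*
 * Hypothetical helper: link new_node between prev and next, but only
 * after checking that the neighbours are still consistent. Returns 0 on
 * success and -1 on corruption (the kernel would call BUG() instead).
 */
int checked_list_add(struct list_head *new_node,
                     struct list_head *prev,
                     struct list_head *next)
{
    assert(new_node && prev && next);

    if (next->prev != prev) {
        fprintf(stderr,
                "list_add corruption. next->prev should be %p, but was %p\n",
                (void *)prev, (void *)next->prev);
        return -1;
    }
    if (prev->next != next) {
        fprintf(stderr,
                "list_add corruption. prev->next should be %p, but was %p\n",
                (void *)next, (void *)prev->next);
        return -1;
    }

    next->prev = new_node;
    new_node->next = next;
    new_node->prev = prev;
    prev->next = new_node;
    return 0;
}

/* Returns 1 if a simulated stray write is caught by the check. */
int demo_detects_corruption(void)
{
    struct list_head head, a, b;

    list_init(&head);
    if (checked_list_add(&a, &head, head.next) != 0)
        return 0;

    /* Assumption for illustration: something overwrote a.next with a
     * non-list value like the 0000000000497000 seen in the log. The
     * bogus pointer is only compared, never dereferenced. */
    a.next = (struct list_head *)(uintptr_t)0x497000;

    /* Tail insertion: prev is the last entry, next is the list head. */
    return checked_list_add(&b, head.prev, &head) == -1;
}
```

Calling demo_detects_corruption() from a small main() prints a "prev->next should be ..., but was 0x497000" line to stderr and returns 1. The check only reports that a list pointer no longer holds the expected value; it does not identify what wrote the bad value.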



Expected results:

The guest should continue to run.

Additional info:

The screen shot of the console is attached.

Comment 1 Marcelo Tosatti 2010-02-19 04:54:08 UTC
Sanjay,

Can you please attempt to reproduce the bug and save the entire oops message? Also, there is no screenshot attached.

I will look for possible candidates in the meantime. Sorry for the late reply.

Thanks

Comment 2 Marcelo Tosatti 2010-02-19 04:56:53 UTC
Also, were hugepages being used?

Comment 3 Sanjay Rao 2010-02-19 12:59:31 UTC
I will try to reproduce the problem when I get a chance, but I am not sure that this issue is reproducible. That's why I captured everything that was reported, hoping that it might give some clues.

Also, there is an issue with ext4 when running Oracle databases (BZ 562219). I am not sure if the two are related.

This test was not using huge pages.

Comment 6 Marcelo Tosatti 2010-02-24 20:41:30 UTC
Postponing to RHEL 5.6.

Comment 8 Marcelo Tosatti 2010-05-27 18:22:17 UTC
Closing the bug on the grounds that it's a one-time memory corruption report; there's not much that can be done without a reproducible case.

Please reopen if necessary.

