Bug 561370 - KVM guest crashed during a multi guest database run
Status: CLOSED CANTFIX
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kvm
Version: 5.6
Hardware: x86_64 Linux
Priority: low
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Marcelo Tosatti
QA Contact: Virtualization Bugs
Depends On:
Blocks: Rhel5KvmTier2
Reported: 2010-02-03 09:45 EST by Sanjay Rao
Modified: 2013-01-09 17:16 EST (History)

Doc Type: Bug Fix
Last Closed: 2010-05-27 14:22:17 EDT

Attachments: None
Description Sanjay Rao 2010-02-03 09:45:00 EST
Description of problem:

A KVM guest crashed during a multi-guest run (database workload). The host is AMD (Six-Core AMD Opteron(tm) Processor 8431).


Version-Release number of selected component (if applicable):

Host and guests were running kernel 2.6.18-186.el5.
File system used for testing: ext4 (e4fsprogs-1.41.9-3.el5)


How reproducible:

Happened just once. Do not know if this can be reproduced.


Steps to Reproduce:
1. Started 4 KVM guests (6 CPUs and 14G of memory each).
2. Ran the database workload.
3. One of the guests crashed.
  
Actual results:

Messages in /var/log/messages in the guest at the time of the crash:


Feb  2 18:02:48 dhcp47-99 kernel: list_add corruption. prev->next should be ffff81038f7d3e28, but was 0000000000497000
Feb  2 18:02:48 dhcp47-99 kernel: ----------- [cut here ] --------- [please bite here ] ---------
Feb  2 18:02:48 dhcp47-99 kernel: Kernel BUG at lib/list_debug.c:31
Feb  3 08:37:19 dhcp47-99 syslogd 1.4.1: restart.
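
For context, the "list_add corruption" line has the form printed by the kernel's linked-list debugging check, which verifies that the two neighbouring nodes still point at each other before inserting a new one. Here the expected value (ffff81038f7d3e28) is a kernel address while the value found (0000000000497000) is not, which is consistent with the pointer having been overwritten. The following is only a minimal userspace sketch of that kind of check, written under the assumption that lib/list_debug.c in this kernel behaves this way; the names list_init and checked_list_add are illustrative and are not the kernel's, and the sketch returns an error where the kernel would call BUG().

```c
#include <stdio.h>

/* Minimal doubly linked list node, modeled on the kernel's struct list_head. */
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

/* Make an empty circular list: the head points at itself in both directions. */
static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/*
 * Insert new_node between prev and next, but first verify that prev and
 * next still point at each other. Returns 0 on success, -1 if the links
 * are inconsistent. (Hypothetical helper; the kernel stops with BUG()
 * at this point instead of returning.)
 */
static int checked_list_add(struct list_head *new_node,
                            struct list_head *prev,
                            struct list_head *next)
{
    if (next->prev != prev) {
        fprintf(stderr,
                "list_add corruption. next->prev should be %p, but was %p\n",
                (void *)prev, (void *)next->prev);
        return -1;
    }
    if (prev->next != next) {
        fprintf(stderr,
                "list_add corruption. prev->next should be %p, but was %p\n",
                (void *)next, (void *)prev->next);
        return -1;
    }

    /* Links are consistent, so splice the new node in. */
    next->prev = new_node;
    new_node->next = next;
    new_node->prev = prev;
    prev->next = new_node;
    return 0;
}
```

In this sketch, overwriting a node's next pointer with an unrelated value and then attempting an insert after that node triggers the second check, which corresponds to the "prev->next should be ..., but was ..." wording in the log above. The check only detects the damage at insert time; it does not identify what overwrote the pointer.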



Expected results:

The guest should continue to run.

Additional info:

A screenshot of the console is attached.
Comment 1 Marcelo Tosatti 2010-02-18 23:54:08 EST
Sanjay,

Can you please attempt to reproduce the bug and save the entire oops message? Also, there is no screenshot attached.

I will look for possible candidates in the meantime. Sorry for the late reply.

Thanks
Comment 2 Marcelo Tosatti 2010-02-18 23:56:53 EST
Also, were hugepages being used?
Comment 3 Sanjay Rao 2010-02-19 07:59:31 EST
I will try to reproduce the problem when I get a chance, but I am not sure that this issue is reproducible. That's why I captured everything that was reported, hoping that it might give some clues.

Also, there is an issue with ext4 when running Oracle databases (BZ 562219). I am not sure if the two are related.

This test was not using huge pages.
Comment 6 Marcelo Tosatti 2010-02-24 15:41:30 EST
Postponing to RHEL 5.6.
Comment 8 Marcelo Tosatti 2010-05-27 14:22:17 EDT
Closing the bug on the grounds that it's a one-time memory corruption report; there's not much that can be done without a reproducible case.

Please reopen if necessary.
