Bug 753698 - RHEV-H host kernel panic after 4 days running with 128 VMs in the cluster
Summary: RHEV-H host kernel panic after 4 days running with 128 VMs in the cluster
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: rhev-hypervisor6
Version: 6.2
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Mike Burns
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-11-14 08:06 UTC by Guohua Ouyang
Modified: 2016-04-26 14:14 UTC (History)
CC List: 14 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-06-29 12:01:43 UTC
Target Upstream Version:


Attachments
logs (2.66 MB, application/x-gzip)
2011-11-14 08:07 UTC, Guohua Ouyang
kernel_panic screenshot (60.18 KB, image/png)
2011-11-14 09:36 UTC, Guohua Ouyang
kernel panic messages (2.32 MB, application/octet-stream)
2011-11-29 02:30 UTC, Guohua Ouyang

Description Guohua Ouyang 2011-11-14 08:06:54 UTC
Description of problem:
A cluster has two RHEV-H hosts and 128 VMs, with each RHEV-H host running 60+ VMs. After 4 days of running, one host hit a kernel panic.

Version-Release number of selected component (if applicable):
6.2-20111103.1
vdsm-4.9-110.el6.x86_64
RHEVM IC148

How reproducible:
5%.

Steps to Reproduce:
1. Create a cluster with two RHEV-H hosts.
2. Create 128 VMs in the cluster, so that each RHEV-H host runs 60+ VMs.
3. Keep it running.

Actual results:
After 4 days of running, one RHEV-H host hits a kernel panic.

Expected results:
The RHEV-H host should keep running for a long time without a kernel panic.

Additional info:
No vmcore file is available.

Comment 1 Guohua Ouyang 2011-11-14 08:07:38 UTC
Created attachment 533452 [details]
logs

Comment 3 Guohua Ouyang 2011-11-14 09:36:47 UTC
Created attachment 533470 [details]
kernel_panic screenshot

Comment 4 Guohua Ouyang 2011-11-29 02:30:28 UTC
Created attachment 537734 [details]
kernel panic messages

The kernel panicked again after 18 days of running; the full message log file is attached.

Comment 6 Perry Myers 2011-11-29 20:53:59 UTC
ycui: Is this bug specific to RHEV-H?  The stacktrace has qemu-kvm in it, so somehow I doubt this is RHEV-H specific.  

dor, can you have the kvm guys take a look at this?

Comment 8 Ying Cui 2011-12-02 02:45:33 UTC
(In reply to comment #6)
> ycui: Is this bug specific to RHEV-H?  The stacktrace has qemu-kvm in it, so
> somehow I doubt this is RHEV-H specific.  
> 

We are trying it in a RHEL 6.2 environment now. I will add a comment to the bug when there is any update.

Thanks
Ying

Comment 11 Perry Myers 2012-02-01 16:31:06 UTC
(In reply to comment #8)
> We are trying it on RHEL 6.2 env now, any update info, I will add comments into
> the bug. 

Were you able to reproduce this on RHEV-H 6.2 GA?  What about RHEL 6.2?  If we can't reproduce it, I think we should just close this bug.

Comment 12 Guohua Ouyang 2012-02-02 06:16:15 UTC
(In reply to comment #11)
> (In reply to comment #8)
> > We are trying it on RHEL 6.2 env now, any update info, I will add comments into
> > the bug. 
> 
> Were you able to reproduce this on RHEV-H 6.2 GA?  What about RHEL 6.2?  If we
> can't reproduce it, I think we should just close this bug.


1. Also tested on RHEL 6.2, running 10 VMs for about 20 days; the issue was not seen.

2. Installed RHEV-H 6.2 GA (20111117.0) on the same machine and installed 17 VMs. It has been running for about 37 days now, and the issue has not been seen. (This environment will be kept running and more VMs will be added to it; it holds several of our RHEVM VMs.)

Lowering the priority and severity for now; we will keep watching this issue.

Comment 14 Dave Allan 2012-06-28 16:29:12 UTC
(In reply to comment #12)
> lower the priority and severity firstly and we will keep watching this issue.

It seems like we haven't seen this panic in months where it was reproducing in days before.  Do you think the BZ can be closed now?

Comment 15 Guohua Ouyang 2012-06-29 01:51:41 UTC
(In reply to comment #14)
> (In reply to comment #12)
> > lower the priority and severity firstly and we will keep watching this issue.
> 
> It seems like we haven't seen this panic in months where it was reproducing
> in days before.  Do you think the BZ can be closed now?

I agree to close this for now; I have not reproduced it so far.

