Bug 753698

Summary: RHEV-H host kernel panic after 4 days running with 128 VMs in the cluster
Product: Red Hat Enterprise Linux 6 Reporter: Guohua Ouyang <gouyang>
Component: rhev-hypervisor6 Assignee: Mike Burns <mburns>
Status: CLOSED WORKSFORME QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.2 CC: acathrow, bsarathy, cshao, dallan, gouyang, jboggs, juzhang, leiwang, mburns, michen, moli, ovirt-maint, ycui, yeylon
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-06-29 12:01:43 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
logs none
kernel_panic screenshot none
kernel panic messages none

Description Guohua Ouyang 2011-11-14 08:06:54 UTC
Description of problem:
A cluster was set up with two RHEV-H hosts and 128 VMs, so each RHEV-H host runs 60+ VMs. After 4 days of running, one host hit a kernel panic.

Version-Release number of selected component (if applicable):
6.2-20111103.1
vdsm-4.9-110.el6.x86_64
RHEVM IC148

How reproducible:
5%.

Steps to Reproduce:
1. Create a cluster with two RHEV-H hosts.
2. Create 128 VMs in the cluster, so that each RHEV-H host runs 60+ VMs.
3. Keep it running.

Actual results:
After 4 days of running, one RHEV-H host hits a kernel panic.

Expected results:
The RHEV-H host should keep running over a long period without a kernel panic.

Additional info:
No vmcore file is available.

Comment 1 Guohua Ouyang 2011-11-14 08:07:38 UTC
Created attachment 533452
logs

Comment 3 Guohua Ouyang 2011-11-14 09:36:47 UTC
Created attachment 533470
kernel_panic screenshot

Comment 4 Guohua Ouyang 2011-11-29 02:30:28 UTC
Created attachment 537734
kernel panic messages

The kernel panic occurred again after 18 days of running. Attaching the full messages log file.

Comment 6 Perry Myers 2011-11-29 20:53:59 UTC
ycui: Is this bug specific to RHEV-H?  The stacktrace has qemu-kvm in it, so somehow I doubt this is RHEV-H specific.  

dor, can you have the kvm guys take a look at this?

Comment 8 Ying Cui 2011-12-02 02:45:33 UTC
(In reply to comment #6)
> ycui: Is this bug specific to RHEV-H?  The stacktrace has qemu-kvm in it, so
> somehow I doubt this is RHEV-H specific.  
> 

We are trying it on RHEL 6.2 env now, any update info, I will add comments into the bug. 

Thanks
Ying

Comment 11 Perry Myers 2012-02-01 16:31:06 UTC
(In reply to comment #8)
> We are trying it on RHEL 6.2 env now, any update info, I will add comments into
> the bug. 

Were you able to reproduce this on RHEV-H 6.2 GA?  What about RHEL 6.2?  If we can't reproduce it, I think we should just close this bug.

Comment 12 Guohua Ouyang 2012-02-02 06:16:15 UTC
(In reply to comment #11)
> (In reply to comment #8)
> > We are trying it on RHEL 6.2 env now, any update info, I will add comments into
> > the bug. 
> 
> Were you able to reproduce this on RHEV-H 6.2 GA?  What about RHEL 6.2?  If we
> can't reproduce it, I think we should just close this bug.


1. Also tested on RHEL 6.2, running 10 VMs for about 20 days; the issue was not seen.

2. Installed RHEV-H 6.2 GA (20111117.0) on the same machine and installed 17 VMs. It has been running for about 37 days now, and the issue has not been seen. (This environment will be kept running and more VMs will be added to it; it hosts several of our RHEVM VMs.)

lower the priority and severity firstly and we will keep watching this issue.

Comment 14 Dave Allan 2012-06-28 16:29:12 UTC
(In reply to comment #12)
> lower the priority and severity firstly and we will keep watching this issue.

It seems like we haven't seen this panic in months where it was reproducing in days before.  Do you think the BZ can be closed now?

Comment 15 Guohua Ouyang 2012-06-29 01:51:41 UTC
(In reply to comment #14)
> (In reply to comment #12)
> > lower the priority and severity firstly and we will keep watching this issue.
> 
> It seems like we haven't seen this panic in months where it was reproducing
> in days before.  Do you think the BZ can be closed now?

I agree to close this for now; I have not reproduced it so far.