Bug 553249 - hypercall device - VM becomes non-responsive on Sysmark benchmark (when more than 7 VMs are running simultaneously)
Summary: hypercall device - VM becomes non-responsive on Sysmark benchmark (when more ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kvm
Version: 5.4
Hardware: All
OS: Linux
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Gleb Natapov
QA Contact: Oded Ramraz
URL:
Whiteboard:
Depends On: 503759
Blocks: 553441
Reported: 2010-01-07 14:27 UTC by RHEL Program Management
Modified: 2013-12-09 00:45 UTC

Fixed In Version: kvm-83-105.el5_4.18
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-02-09 10:02:40 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2010:0088 | 0 | normal | SHIPPED_LIVE | Important: kvm security and bug fix update | 2010-02-09 10:01:51 UTC

Description RHEL Program Management 2010-01-07 14:27:08 UTC
This bug has been copied from bug #503759 and has been proposed
to be backported to 5.4 z-stream (EUS).

Comment 9 Yaniv Kaul 2010-01-19 16:44:05 UTC
I hope you have the agent + driver on the guest.

Comment 10 lihuang 2010-01-19 16:50:37 UTC
(In reply to comment #9)
> I hope you have the agent + driver on the guest.    

Yes, both are installed and running.
They were installed from rhevm-guest-tools-2.1-39917.iso.

Comment 17 Yaniv Kaul 2010-01-20 15:45:23 UTC
1. "-uuid `uuidgen`" means you are giving the VM a different UUID every run. That's strange.
2. "-usbdevice tablet" is not needed if you have the drivers and agents of Spice.
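
To illustrate point 1: one way to keep the same UUID across restarts is to generate it once and store it next to the VM's other files. This is only a minimal sketch. The state file name and both helper functions are hypothetical and not part of any KVM tooling; only the "-uuid" option itself comes from the command line under discussion.

```python
import uuid
from pathlib import Path


def stable_vm_uuid(state_file: Path) -> str:
    """Return the UUID saved in state_file, creating and saving one on first use."""
    if state_file.exists():
        # Reuse the UUID generated on an earlier run.
        return state_file.read_text().strip()
    new_id = str(uuid.uuid4())
    state_file.write_text(new_id + "\n")
    return new_id


def uuid_args(state_file: Path) -> list:
    """Build the '-uuid <value>' argument pair for the VM command line."""
    return ["-uuid", stable_vm_uuid(state_file)]
```

Calling `uuid_args(Path("vm1.uuid"))` on every start would then pass the same value each time instead of a fresh one from `uuidgen`.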

Comment 18 lihuang 2010-01-20 16:01:37 UTC
(In reply to comment #17)
> 1. "-uuid `uuidgen`" means you are giving the VM a different UUID every run.
> That's strange.
Is the UUID saved and reused after I quit and restart the VM (no migration is involved)? It is difficult for me to remember every UUID I have used when starting VMs directly from the command line.
  

> 2. "-usbdevice tablet" is not needed if you have the drivers and agents of
> Spice.    

If this affects the current test, I will restart it; otherwise, I will keep the VM running for 12 hours.

Comment 24 Oded Ramraz 2010-01-26 07:33:43 UTC
I don't really understand why you tried to reproduce this bug without the "Sysmark" utility.
I managed to reproduce this issue with 20 1GB VMs running the "Sysmark" utility simultaneously.
I am reopening this bug.
Please contact me if you need help reproducing this issue.

Comment 25 Yaniv Kaul 2010-01-26 07:44:19 UTC
(In reply to comment #24)
> I don't really understand why you tried to reproduce this bug without the
> "Sysmark" utility.
> I managed to reproduce this issue with 20 1GB VMs running the "Sysmark"
> utility simultaneously.
> I am reopening this bug.
> Please contact me if you need help reproducing this issue.

Oded - which KVM version did you use? Also, it's a good idea to try the newer hypercall drivers (which fix the issue on their side). They will only be available in the next build, or you can take them from the nightly build (talk to Barak first; I have already WHQL'ed them).

Comment 26 Oded Ramraz 2010-01-26 07:53:10 UTC
## Host CPU information:

$h.GetCpuStatistics()

Cores  : 4
Model  : Intel(R) Xeon(R) CPU           E5310  @ 1.60GHz
Speed  : 1596

## KVM version:

kvm-83-147.el5

I will talk to Barak about the new hypercall drivers.

Comment 27 Oded Ramraz 2010-01-26 12:34:01 UTC
I managed to reproduce this issue with the new hypercall driver.
I'm not sure it's the same bug, because the VM is stuck but does not consume 100 percent CPU.

gnatapov is investigating the stuck process.

Comment 28 Eduardo Habkost 2010-01-26 14:36:30 UTC
(In reply to comment #27)
> gnatapov is investigating the stuck process.    

Reassigning to Gleb, then.

Comment 29 Dor Laor 2010-01-27 11:55:06 UTC
Please check whether the driver calls the device reset when it unloads (hopefully it does try to unload).

Comment 30 Oded Ramraz 2010-01-28 08:58:48 UTC
I think there are two different issues here (both occur during "Sysmark" benchmark execution):

In both cases the VM is stuck, but in the first case it consumes 100 percent CPU (probably the hypercall issue) and in the second it does not.

The second case is quite common, and I'm able to reproduce it (also with the new hypercall drivers).
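
A rough way to tell the two cases apart on the host is to sample the CPU time of the stuck VM's process twice and compare. The sketch below assumes a Linux host where utime and stime are fields 14 and 15 of /proc/<pid>/stat, a clock tick rate of 100 per second, and a 90 percent cut-off; the function names and the threshold are illustrative assumptions, not something used in this investigation.

```python
def cpu_ticks_from_stat(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name in parentheses may contain spaces, so split after
    # the last ')' and count fields from the process state onwards.
    fields = stat_line.rsplit(")", 1)[1].split()
    utime, stime = int(fields[11]), int(fields[12])
    return utime + stime


def cpu_percent(ticks_before: int, ticks_after: int,
                interval_s: float, ticks_per_s: int = 100) -> float:
    """CPU usage over the sampling interval, as a percentage of one core."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_s * interval_s)


def classify_stuck_vm(pct: float, busy_threshold: float = 90.0) -> str:
    """Label a stuck VM as spinning ('busy-stuck') or idle ('idle-stuck')."""
    return "busy-stuck" if pct >= busy_threshold else "idle-stuck"
```

Reading the stat file for the VM's process, waiting a few seconds and reading it again gives the two tick counts. A multi-vCPU guest can exceed 100 percent, so the threshold would need adjusting for such guests.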

I have a few questions:

1. Do you want me to open a separate bug for the second case?
2. Can you assign someone to check this issue immediately? Gleb is quite busy with other things, and we have a RHEVM release in a few weeks.

I'll try to add memory dumps for my stuck guests.

Comment 32 Oded Ramraz 2010-02-07 16:41:23 UTC
When the hypercall issue occurs, the VM consumes 100 percent CPU.
The phenomenon I see is a little different: the VM is stuck after boot but does not consume 100 percent CPU. I'm moving this bug back to ON_QA and opening a new one.

Comment 34 Oded Ramraz 2010-02-08 10:50:25 UTC
I ran the test with 15 1GB guests (on a 5.5 host; see comment 7).
I didn't manage to reproduce the 100 percent CPU issue.
You can move this bug's status to VERIFIED.

Comment 35 lihuang 2010-02-08 11:34:00 UTC
5.5 and 5.4-z share the same patch, so setting to VERIFIED.

Comment 38 errata-xmlrpc 2010-02-09 10:02:40 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0088.html

