Bug 1165336 - FC20 qemu needs kvmclock bugfixes
Summary: FC20 qemu needs kvmclock bugfixes
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: oVirt
Classification: Retired
Component: vdsm
Version: 3.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.5.1
Assignee: Francesco Romani
QA Contact: Gil Klein
URL:
Whiteboard: virt
Depends On:
Blocks:
 
Reported: 2014-11-18 20:22 UTC by Markus Stockhausen
Modified: 2016-02-10 19:48 UTC
CC List: 12 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2014-11-25 09:13:14 UTC
oVirt Team: Virt
Embargoed:


Attachments


Links
oVirt gerrit 35326 (master, ABANDONED): vm: re-enable HPET clock for non-windows guests (last updated: never)

Description Markus Stockhausen 2014-11-18 20:22:48 UTC
Description of problem:

Environment: qemu 1.6.2 / Fedora 20 hypervisor host

Since the upgrade to oVirt 3.5, VDSM disables the HPET timer for VMs:
Details here: https://bugzilla.redhat.com/show_bug.cgi?id=1053846

The decision to disable HPET seems strange to me. It is the only reliable timer for online migrations. Especially with SLES 11 SP3 guests we see 100% CPU after roughly one in five live migrations if kvm-clock is in use. This leaves the VMs unusable and a hard restart is required.

After a lot of testing we switched all of our Linux VMs to HPET. But since the last VDSM upgrade we again see VM lockups after live migration, and we traced them to the above change.

Please either give a hint why HPET must be disabled, or explain what to do to get VM online migration stable with the kvm-clock clocksource.
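
For reference, this is how we check and switch the clocksource inside the guests (a rough sketch using the standard Linux sysfs interface; which timers are actually available depends on the guest kernel and on what the VM exposes):

# inside the guest: list available clocksources and show the active one
cat /sys/devices/system/clocksource/clocksource0/available_clocksource
cat /sys/devices/system/clocksource/clocksource0/current_clocksource

# switch to hpet at runtime (only possible if hpet is exposed to the guest)
echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource

# or make it persistent via the kernel command line: clocksource=hpet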

Comment 1 Francesco Romani 2014-11-19 12:10:15 UTC
(In reply to Markus Stockhausen from comment #0)
> Description of problem:
> 
> Environment: qemu 1.6.2 / Fedora 20 hypervisor host
> 
> Since the upgrade to oVirt 3.5, VDSM disables the HPET timer for VMs:
> Details here: https://bugzilla.redhat.com/show_bug.cgi?id=1053846
> 
> The decision to disable HPET seems strange to me. It is the only reliable
> timer for online migrations. Especially with SLES 11 SP3 guests we see 100%
> CPU after roughly one in five live migrations if kvm-clock is in use. This
> leaves the VMs unusable and a hard restart is required.
> 
> After a lot of testing we switched all of our Linux VMs to HPET. But since
> the last VDSM upgrade we again see VM lockups after live migration, and we
> traced them to the above change.
> 
> Please either give a hint why HPET must be disabled, or explain what to do
> to get VM online migration stable with the kvm-clock clocksource.

Hi,

As you can read in the linked BZ, we were following the KVM recommendations.
Now we have evidence that these recommendations may need some review, at least in the case of live migrations, so I need to check with the source.

In the meantime, live migrations are very important to us, so I posted this draft patch:
http://gerrit.ovirt.org/#/c/35326/

Please note this is a draft patch because I don't know yet whether this is the right direction, but it may help you nevertheless.
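
To check whether HPET is currently disabled for a given VM, you can inspect the generated libvirt domain XML on the host, for example (a rough sketch; replace <vm-name> with the actual VM name as known to libvirt):

# read-only dump of the domain XML on the hypervisor host
virsh -r dumpxml <vm-name> | grep -A 5 '<clock'
# a disabled HPET shows up roughly as: <timer name="hpet" present="no"/>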

Comment 2 Francesco Romani 2014-11-19 13:23:09 UTC
I checked as promised.

The core issue is not HPET unavailability: HPET support can be broken and is generally not recommended, hence the recommendation to disable it, which is still valid today.

The problem is that kvm-clock performs worse than HPET on that qemu version.
So the actual fix is in kvm-clock, but to get it you'll need a very recent version of QEMU (>= 2.2.0rc0), which is not available yet in F20 AFAIK.

The virt-preview repository includes more up-to-date virt-related packages, but it doesn't include that version yet.

Another option is to switch to qemu-kvm-rhev, as found in the oVirt repo or on CentOS or RHEL, which includes patches to improve stability.

Summary:
- no need to re-enable HPET in VDSM
- HPET usage is not recommended anyway
- the root issue is kvm-clock being buggy in that QEMU version
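
To see which QEMU build a host is actually running before choosing between these options, something like the following can help (a rough sketch; package and binary names differ between Fedora's qemu-kvm and the qemu-kvm-rhev build):

# on the hypervisor host: installed QEMU packages and binary version
rpm -qa | grep -E '^qemu-kvm'
/usr/bin/qemu-kvm -version    # binary path may differ, e.g. /usr/libexec/qemu-kvm on CentOS/RHEL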

Comment 3 Markus Stockhausen 2014-11-19 15:20:36 UTC
Thanks for taking a look.

For me it is no problem to modify vm.py to re-enable HPET. But that will not fix the root cause. So from an end-user perspective the argument might look like this:

- oVirt supports FC20 hypervisors
- so it should run stably on top of them
- if you install a Linux VM, kvm-clock will be the default clocksource
- but in this setup live migration might fail

For the inexperienced admin the answer should be neither "go over to virt-preview" nor "go over to qemu-kvm-rhev". Maybe with this BZ we can do a little bit to improve the FC20 out-of-the-box experience.

Can you confirm that those patches fix the issue that we observed? Maybe you can have a look at the qemu-kvm-rhev-1.5.3 git history:

http://git.qemu.org/?p=qemu.git;a=commit;h=9a48bcd1b82494671c111109b0eefdb882581499
http://git.qemu.org/?p=qemu.git;a=commit;h=317b0a6d8ba44e9bf8f9c3dbd776c4536843d82c

I will hand them over to the Fedora qemu maintainer (Cole Robinson, IIRC).
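
For anyone who wants to try those two fixes locally before a fixed package is available, a rough backport sketch could look like this (hypothetical workflow; the clone URL is assumed from the gitweb links above, and the cherry-picks may need manual conflict resolution on the 1.6.2 base):

# clone qemu and apply the two kvmclock fixes on top of the 1.6.2 tag
git clone git://git.qemu.org/qemu.git && cd qemu
git checkout -b kvmclock-fixes v1.6.2
git cherry-pick 9a48bcd1b82494671c111109b0eefdb882581499
git cherry-pick 317b0a6d8ba44e9bf8f9c3dbd776c4536843d82c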

Comment 4 Francesco Romani 2014-11-19 17:04:01 UTC
(In reply to Markus Stockhausen from comment #3)
> Thanks for taking a look.
> 
> For me it is no problem to modify vm.py to re-enable HPET. But that will
> not fix the root cause. So from an end-user perspective the argument might
> look like this:
> 
> - oVirt supports FC20 hypervisors

This is correct, even though CentOS is usually recommended. But that is another story and does not invalidate your point.

> - so it should run stably on top of them
> - if you install a Linux VM, kvm-clock will be the default clocksource
> - but in this setup live migration might fail
> 
> For the inexperienced admin the answer should be neither "go over to
> virt-preview" nor "go over to qemu-kvm-rhev".

Although I agree with your main point, and although I have mixed feelings about the use of virt-preview, I don't see a problem with recommending qemu-kvm-rhev, since it is provided in the oVirt repo - unless I'm missing something.

Can you elaborate a bit on what you find wrong with the recommendation of qemu-kvm-rhev?

> Can you confirm that those patches fix the issue that we observed? Maybe
> you can have a look at the qemu-kvm-rhev-1.5.3 git history:
> 
> http://git.qemu.org/?p=qemu.git;a=commit;h=9a48bcd1b82494671c111109b0eefdb882581499
> http://git.qemu.org/?p=qemu.git;a=commit;h=317b0a6d8ba44e9bf8f9c3dbd776c4536843d82c
> 
> I will hand them over to the Fedora qemu maintainer (Cole Robinson, IIRC).

Sure, checking.

Comment 5 Markus Stockhausen 2014-11-19 17:19:38 UTC
Of course you are right. There is no problem with switching over to distro-foreign packages, either in a stable direction (CentOS) or an unstable one (virt-preview). Both are valid choices depending on your needs.

But both of them require manual syncing, at least during the initial setup, and in the worst case there may be kernel/library dependencies that interfere. So the right way should be to fix the distro's own package.

Thanks for spending the time to look at the required bugfixes.

Comment 6 Dan Kenigsberg 2014-11-19 23:13:12 UTC
Markus, regardless of the discussion above, you can re-enable HPET immediately in your guests by placing a before_vm_start hook on each of your hosts that does something like this:

#!/bin/sh
# VDSM passes the path of the libvirt domain XML via $_hook_domxml; edit it in place
sed -i 's:name="hpet" present="no":name="hpet" present="yes":' "$_hook_domxml"

(untested, typos possible)
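
If you go that route, the script just needs to be dropped into the VDSM before_vm_start hook directory on each host and made executable, roughly like this (a sketch; the file name 50_enable_hpet is arbitrary and the path assumes the usual VDSM hook layout):

# on each hypervisor host
install -m 755 50_enable_hpet /usr/libexec/vdsm/hooks/before_vm_start/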

Comment 7 Francesco Romani 2014-11-24 11:39:25 UTC
(In reply to Francesco Romani from comment #4)
> > For the inexperienced admin the answer should be neither "go over to
> > virt-preview" nor "go over to qemu-kvm-rhev".
> 
> Although I agree with your main point, and although I have mixed feelings
> about the use of virt-preview, I don't see a problem with recommending
> qemu-kvm-rhev, since it is provided in the oVirt repo - unless I'm missing
> something.
> 
> Can you elaborate a bit on what you find wrong with the recommendation of
> qemu-kvm-rhev?
> 
> > Can you confirm that those patches fix the issue that we observed? Maybe
> > you can have a look at the qemu-kvm-rhev-1.5.3 git history:
> > 
> > http://git.qemu.org/?p=qemu.git;a=commit;h=9a48bcd1b82494671c111109b0eefdb882581499
> > http://git.qemu.org/?p=qemu.git;a=commit;h=317b0a6d8ba44e9bf8f9c3dbd776c4536843d82c
> > 
> > I will hand them over to the Fedora qemu maintainer (Cole Robinson, IIRC).
> 
> Sure, checking.

The above two seem the most important; I'm checking other patches in the background.

BTW, maybe it is better and simpler to just ask for QEMU 2.2.0 to be packaged for FC21 and FC20 once it is out.

Comment 8 Nikolai Sednev 2014-11-24 14:21:32 UTC
Hi Markus,
Could you provide the exact SLES version, for retest purposes?
Thanks in advance.

Comment 9 Markus Stockhausen 2014-11-24 18:38:28 UTC
Cluster compatibility: Intel Nehalem Family

VM infos:

cat /etc/SuSE-release
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 3

Linux testvm 3.0.76-0.11-default #1 SMP Fri Jun 14 08:21:43 UTC 2013 (ccab990) x86_64 x86_64 x86_64 GNU/Linux


cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 2
model name      : Intel Core i7 9xx (Nehalem Class Core i7)
stepping        : 3
microcode       : 1
cpu MHz         : 2666.666
cache size      : 2048 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 ht syscall nx lm rep_good nopl pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips        : 5333.33
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 2
model name      : Intel Core i7 9xx (Nehalem Class Core i7)
stepping        : 3
microcode       : 1
cpu MHz         : 2666.666
cache size      : 2048 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 ht syscall nx lm rep_good nopl pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips        : 5333.33
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

Comment 10 Francesco Romani 2014-11-25 09:13:14 UTC
(In reply to Francesco Romani from comment #7)

> > > Can you confirm that those patches fix the issue that we observed? Maybe
> > > you can have a look at the qemu-kvm-rhev-1.5.3 git history:
> > > 
> > > http://git.qemu.org/?p=qemu.git;a=commit;h=9a48bcd1b82494671c111109b0eefdb882581499
> > > http://git.qemu.org/?p=qemu.git;a=commit;h=317b0a6d8ba44e9bf8f9c3dbd776c4536843d82c
> > > 
> > > I will hand them over to the Fedora qemu maintainer (Cole Robinson, IIRC).
> > 
> > Sure, checking.
> 
> The above two seem the most important; I'm checking other patches in the
> background.
> 
> BTW, maybe it is better and simpler to just ask for QEMU 2.2.0 to be
> packaged for FC21 and FC20 once it is out.

Yes, the above two are the relevant patches. Feel free to propose them to the Fedora maintainer if you like, but I recommend "just" packaging qemu 2.2.0, because that will bring more fixes and a better infrastructure.

Closing the BZ: not a VDSM bug, general improvements discussed.

