
Bug 524702

Summary: kvm_clock patches are slowing guests' shutdown to unusable levels
Product: Red Hat Enterprise Linux 5
Reporter: Gurhan Ozen <gozen>
Component: kernel
Assignee: Glauber Costa <gcosta>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Docs Contact:
Priority: high
Version: 5.5
CC: dhoward, dzickus, gcosta, jburke, jpirko, riel, tburke
Target Milestone: beta
Keywords: TestBlocker
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-03-30 07:15:36 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 533192

Description Gurhan Ozen 2009-09-21 19:43:18 UTC
Description of problem:
  This was brought to my attention when KVM guests' reboot operations failed in the RHTS environment. A closer look revealed that the guests' shutdown process was taking an unusually long time on the -165 and -166 kernels. On the -164 kernel, shutdown worked just fine. Furthermore, the -164.2.1 kernel had the very same issue of a long shutdown process.

We also tried the clock=pmtimer argument on the -166 kernel: when booted with clock=pmtimer, the -166 kernel was able to shut down in a reasonably short amount of time.
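
When comparing runs with and without the workaround, it can help to confirm which clock= value a guest actually booted with. The following is only a minimal sketch: the helper name kernel_args is hypothetical, and the sample command line in the usage note is made up for illustration.

```python
def kernel_args(cmdline):
    """Parse a kernel command line into {name: value}.

    Bare flags such as "quiet" map to None. If a name is repeated,
    the last occurrence wins in this simple sketch.
    """
    args = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        args[name] = value if sep else None
    return args
```

On a running Linux guest the input would typically come from /proc/cmdline, e.g. kernel_args(open("/proc/cmdline").read()).get("clock"); a result of "pmtimer" would indicate the workaround is active.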

It looks like the difference between -164 and the rest of the kernels is the kvmclock patches.


Version-Release number of selected component (if applicable):
kernel-2.6.18-164.2.1.el5 and above. 

How reproducible:
Every time. Not sure how much the host hardware matters for this, but this is what the host was: 
http://lab.rhts.bos.redhat.com/cgi-bin/rhts/system.cgi?id=1113

Steps to Reproduce:
1. Install a KVM guest from a tree that has kernel-2.6.18-164.2.1.el5 or a newer kernel, or just upgrade the kernel in the guest after installing with an older kernel.
2. Boot the guest with that kernel.
3. Try to reboot the guest.
  
Actual results:


Expected results:


Additional info:

Comment 1 Rik van Riel 2009-09-21 21:36:09 UTC
I see that the cpuflags in that system have constant_tsc, but not tsc_reliable or nonstop_tsc (both of which my system has).  I don't know whether that's the cause, but it could be related.
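
For anyone who wants to repeat this comparison on another host, here is a rough sketch that reports which of the three TSC-related flags appear on every CPU in /proc/cpuinfo-style text. The helper name tsc_flags is hypothetical, and the sketch assumes the x86 layout in which each CPU has a "flags" line.

```python
# Flags discussed in this bug, spelled as Linux prints them in /proc/cpuinfo.
TSC_FLAGS = ("constant_tsc", "nonstop_tsc", "tsc_reliable")


def tsc_flags(cpuinfo_text):
    """Return {flag: True/False}; True only if every CPU lists the flag."""
    per_cpu = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            per_cpu.append(set(value.split()))
    # With no "flags" lines at all, report every flag as absent.
    return {flag: bool(per_cpu) and all(flag in cpu for cpu in per_cpu)
            for flag in TSC_FLAGS}
```

On a Linux host the input would be open("/proc/cpuinfo").read(). Requiring the flag on every CPU is a deliberate choice for this sketch, since a flag missing on a single CPU is the interesting case.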

Comment 2 Glauber Costa 2009-09-22 13:48:39 UTC
They do have nonstop_tsc, I've just checked the host.

But not tsc_reliable. To be quite honest, I don't have the slightest idea what that means...

Situation is as follows:

 * As expected, it works fine if the guest is UP (uniprocessor).
 * Time drifts in the guest. So this is probably the cause.
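
One way to put a number on the drift is to take paired readings of a reference clock (for example the host's) and the guest clock at the same instants, then compare how far each advanced. The sketch below assumes such pairs have already been collected; the function name drift_ppm and the sampling method are illustrative assumptions, not something this bug specifies.

```python
def drift_ppm(samples):
    """Estimate guest clock drift in parts per million.

    samples: list of (reference_seconds, guest_seconds) pairs taken at the
    same instants, oldest first. A positive result means the guest clock
    runs fast relative to the reference; negative means it runs slow.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (ref0, guest0), (ref1, guest1) = samples[0], samples[-1]
    elapsed_ref = ref1 - ref0
    if elapsed_ref <= 0:
        raise ValueError("reference clock did not advance")
    elapsed_guest = guest1 - guest0
    return (elapsed_guest - elapsed_ref) / elapsed_ref * 1e6
```

Only the first and last samples are used, which keeps the sketch simple but makes it sensitive to a single bad reading; a least-squares fit over all samples would be more robust.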

Comment 3 Glauber Costa 2009-09-22 14:00:59 UTC
More to add:

 * Disabling cpufreq on the host does not help at all. Drift is still present.
 * Turning off ntp on the guest does not help either.
 * Offlining all host CPUs but 2 (guest is smp=2) seems to help.

Given the last result, I believe it might be related to vcpus bouncing between pcpus. I will try pinning cpus and report back.
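
As a rough illustration of what one-to-one pinning could look like, the sketch below maps each vcpu thread to its own pcpu and applies the mapping with os.sched_setaffinity, which is Linux-only. The function names and the thread IDs are hypothetical; how the vcpu thread IDs are obtained is not shown, and this is not necessarily the method used for the test mentioned above.

```python
import os


def plan_pinning(vcpu_tids, pcpus):
    """Map each vCPU thread ID to exactly one physical CPU, in order."""
    if len(vcpu_tids) > len(pcpus):
        raise ValueError("not enough physical CPUs for one-to-one pinning")
    return dict(zip(vcpu_tids, pcpus))


def apply_pinning(plan):
    """Restrict each thread to its assigned CPU (Linux only, needs permission)."""
    for tid, cpu in plan.items():
        os.sched_setaffinity(tid, {cpu})
```

For a guest with smp=2, plan_pinning([tid0, tid1], [0, 1]) followed by apply_pinning(...) would keep each vcpu on a single pcpu, which removes the bouncing without having to offline the remaining host CPUs.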

Comment 5 RHEL Program Management 2009-10-15 14:49:38 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 6 Bill Burns 2009-12-22 02:00:09 UTC
We think this is a duplicate. Can you re-test with -182? It should be OK there.

Comment 7 Bill Burns 2009-12-22 14:20:49 UTC
Possible dup of bug 542612.

Comment 8 Gurhan Ozen 2009-12-23 20:41:20 UTC
So I tried this both with 5.5 nightly trees that already had -182 and by installing 5.4 hosts/guests and upgrading their kernels to -183. In both cases the issue is fixed.

Comment 10 errata-xmlrpc 2010-03-30 07:15:36 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html