Bug 710609

Summary: Kernel trace on m2.4xlarge or m2.2xlarge instances in EC2
Product: Red Hat Enterprise Linux 6
Reporter: Ken Reilly <kreilly>
Component: kernel
Assignee: Frantisek Hrbata <fhrbata>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 6.1
CC: behoward, bsarathy, clalance, cmorgan, dhoward, drjones, dtian, fhrbata, jgregusk, kzhang, leiwang, lersek, mhideo, mmcallis, mzywusko, pbonzini, pm-eus, qwan, sforsber, sghosh, tburke, tcapek, whayutin, yugzhang
Target Milestone: rc
Keywords: EC2, ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Fixed In Version: kernel-2.6.32-131.3.1.el6
Doc Type: Bug Fix
Doc Text:
Xen guests cannot make use of all CPU features, and advertising some of them to a guest can even be risky. One such feature is CONSTANT_TSC. It prevents the TSC (Time Stamp Counter) from being marked as unstable, which allows the sched_clock_stable option to be enabled. Enabling sched_clock_stable is problematic for Xen PV guests because the sched_clock() function is overridden with the xen_sched_clock() function, which is not synchronized between virtual CPUs. This update fixes the issue with a patch that sets all x86_power features to 0, which also guards against other potentially dangerous assumptions the kernel could make based on those features.
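To see which TSC-related flags a given guest currently advertises, one option is to inspect the flags line in /proc/cpuinfo. The following is a minimal Python sketch and is not part of the fix. The helper names tsc_flags and read_tsc_flags are hypothetical, and the sketch assumes the usual x86 Linux /proc/cpuinfo layout, in which constant_tsc and nonstop_tsc appear as flag names. It only reports what is advertised; it makes no claim about what a patched kernel will show.

```python
# Hypothetical helpers for illustration; not taken from the bug report or the patch.
TSC_FLAGS = ("constant_tsc", "nonstop_tsc")


def tsc_flags(cpuinfo_text):
    """Return one dict per 'flags' line, mapping each TSC flag name to True/False."""
    results = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only lines of the form "flags : a b c" are of interest.
        if sep and key.strip() == "flags":
            present = set(value.split())
            results.append({name: name in present for name in TSC_FLAGS})
    return results


def read_tsc_flags(path="/proc/cpuinfo"):
    """Read the given cpuinfo file (assumed x86 Linux layout) and parse it."""
    with open(path) as f:
        return tsc_flags(f.read())
```

Calling read_tsc_flags() inside the guest returns one entry per logical CPU, which makes it easy to compare the flags seen on different instance types or kernels.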
Last Closed: 2011-06-15 12:09:39 EDT
Bug Depends On: 709856
Attachment: tier1 & tier2 kernel qe tests (Flags: none)

Description Ken Reilly 2011-06-03 16:32:51 EDT
This bug has been copied from bug #709856 and has been proposed
to be backported to 6.1 z-stream (EUS).
Comment 5 Chris Morgan 2011-06-09 09:18:45 EDT
So is this an ack from kernel QE?
Comment 6 Qixiang Wan 2011-06-09 09:29:42 EDT
(In reply to comment #5)
> So is this an ack from kernel QE?

No, this is from Virt QE. We ran these tests from Xen userspace with a RHEL6.1 guest to guard against regressions. Kernel QE is updating the kernel Tier1/2 test results in https://errata.devel.redhat.com/errata/show/11253 . Both sides should pass before this bug is verified.
Comment 10 wes hayutin 2011-06-13 22:34:49 EDT
Have any tier1, tier2 tests been run against the new kernel?  Are there any beaker tests that may test the specific issue of the guest crashing?

If there are any I will run them in the ec2 env.  Thanks
Comment 11 Dayong Tian 2011-06-13 22:49:18 EDT
Kernel Tier1 tests passed; Tier2 tests are still running.
We chose some specific tests for this bug from the Tier2 tests. The following tests were included:
Comment 12 Igor Zhang 2011-06-13 23:36:30 EDT
I have reproduced the bug in-house twice.
RHEL5.3 and kernel 2.6.18-128.1.10.el5

RHEL6.1 and kernel 2.6.32-131.3.1.el6

See the log rhel6u1_x86_64_pv_install.log in

And rhel6u1_i386_pv_install.log in

At the same time, we found user-space packages xen and xen-libs in RHEL5.3 don't support RHEL6.1 installation as a guest. Then I retested under another configuration:
RHEL5.6 and kernel 2.6.18-238.12.1.el5

RHEL6.1 and kernel 2.6.32-131.4.1.el6

The jobs on Intel Nehalem systems and on Intel systems without the nonstop_tsc flag are still queued. The finished ones have passed our regression tests. For instance:
Comment 13 Andrew Jones 2011-06-14 05:07:18 EDT
(In reply to comment #12)
> At the same time, we found user-space packages xen and xen-libs in RHEL5.3
> don't support RHEL6.1 installation as a guest.

The best config for testing would be 5.3 kernel-xen and 5.6/7 xen userspace, and 2.6.32-131.4.1.el6 for the guest kernel.
Comment 16 errata-xmlrpc 2011-06-15 12:09:39 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

Comment 17 wes hayutin 2011-06-15 12:46:13 EDT
Created attachment 504907
tier1 & tier2 kernel qe tests

All tier1 & tier2 kernel QE tests pass.

Executed in EC2 us-east-1 on m2.2xlarge.