663755 – RHEL6 Xen domU freeze after migrate to lower (MHz) CPU

Summary: RHEL6 Xen domU freeze after migrate to lower (MHz) CPU

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	6.0
Hardware:	x86_64
OS:	Linux
Priority:	low
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Andrew Jones
QA Contact:	Virtualization Bugs
Docs Contact:
URL:
Whiteboard:
Duplicates (3):	613513 658720 663881
Depends On:
Blocks:	523117

Reported:	2010-12-16 18:50 UTC by Josh West
Modified:	2018-11-14 16:22 UTC
CC List:	16 users
Fixed In Version:	kernel-2.6.32-112.el6
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2011-05-23 20:32:11 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2011:0542	0	normal	SHIPPED_LIVE	Important: Red Hat Enterprise Linux 6.1 kernel security, bug fix and enhancement update	2011-05-19 11:58:07 UTC

Description Josh West 2010-12-16 18:50:44 UTC

Description of problem:

When migrating a RHEL6 domU under Xen (3.4.3) between servers with different CPU clock frequencies -- from one with a higher MHz to one with a lower MHz -- the virtual machine migrates and then becomes unresponsive. The VM's console, network traffic to the VM, etc. yield no response.

However, eventually (perhaps 5-10 minutes later), the VM finally wakes up and resumes responding. A kernel message is printed like so:

Clocksource tsc unstable (delta = -214281714 ns)
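
For anyone collecting this message from several guests, a minimal Python sketch for extracting the delta from such a line could look like the following. The function name `parse_tsc_delta` and the regular expression are illustrative assumptions based only on the line quoted above; they are not part of any kernel tooling, and other kernel versions may word the message differently.

```python
import re

# Pattern modelled on the single message quoted in this report (an assumption).
_UNSTABLE_RE = re.compile(r"Clocksource (\S+) unstable \(delta = (-?\d+) ns\)")

def parse_tsc_delta(line):
    """Return (clocksource_name, delta_ns), or None if the line does not match."""
    match = _UNSTABLE_RE.search(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))

if __name__ == "__main__":
    name, delta_ns = parse_tsc_delta("Clocksource tsc unstable (delta = -214281714 ns)")
    # Convert nanoseconds to seconds for readability.
    print(name, delta_ns / 1e9, "seconds")
```

The delta in the quoted message works out to roughly -0.21 seconds.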

What's interesting is that migrating the virtual machine from a server with a *lower* CPU MHz speed to one with a higher speed does not encounter this problem. It's only when moving from high to low. It does not matter if the source/destination CPU is an older or newer model, as I can replicate the problem with the following:

* Migrating from X5450 @ 3.00GHz to X5355 @ 2.66GHz fails, but the
opposite (increasing in CPU frequency) succeeds.
* Migrating from Xeon(TM) CPU 2.80GHz to E5310 @ 1.60GHz fails, but
the opposite (increasing in CPU frequency) succeeds.

This is using the latest native/stock RHEL6 kernel (vmlinuz-2.6.32-71.7.1.el6.x86_64) that makes use of paravirt_ops for virtualization under Xen 3.4.3. I have heard reports of this also being the case under Xen Cloud Platform (XCP) 1.0, and thus I'm assuming the problem will occur with XenServer as well.

I have tested features of Xen like cpuid masking to make the virtual machine believe it has a generic i686 CPU, but the problem still persisted.

Version-Release number of selected component (if applicable):

* Xen 3.4.3
* Red Hat Enterprise Linux 6.0
* Kernel 2.6.32-71.7.1.el6.x86_64

How reproducible:

Easily reproducible. Have confirmed reports of this occurring when running RHEL6 and its native kernel as a virtual machine under Xen Cloud Platform (XCP) 1.0 too.

Steps to Reproduce:
1. Boot RHEL6 virtual machine, using RHEL6's kernel, under Xen 3.4.x.
2. Migrate (via xm migrate) the virtual machine to a server with a lower CPU clock speed.
3. Connect to the VM's console (xm console) and see it does not respond.
4. Wait 5-10 minutes for the virtual machine to finally wake up; console begins to respond.

Actual results:
* Freeze for 5-10 minutes.

Expected results:
* Continue working...

Additional info:

The console message about the tsc clocksource is interesting, as the domU is making use of the 'xen' clocksource. This can be seen in /sys/devices/system/clocksource/clocksource0/current_clocksource and the Xen clocksource appears as loaded in 'dmesg' output and in /proc/timer_list.

For testing, I tried switching the clocksource to options like 'tsc' and 'jiffies' but the problem persisted.
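
As a convenience for checking which clocksource a guest is using, here is a small Python sketch that reads the sysfs files. The `current_clocksource` path is the one named above; `available_clocksource` in the same directory and the helper name `read_clocksources` are additions for illustration, so treat them as assumptions to verify on the guest in question.

```python
from pathlib import Path

# sysfs directory named in the report; a parameter so it can be overridden.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")

def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksources as read from sysfs."""
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    # available_clocksource is a space-separated list (assumed to be present).
    available = (base / "available_clocksource").read_text().split()
    return current, available

if __name__ == "__main__":
    current, available = read_clocksources()
    print("current:", current)
    print("available:", ", ".join(available))
```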

Comment 2 Andrew Jones 2010-12-16 20:43:02 UTC

It would be interesting to see if you can scale down the faster processor using
/sys/devices/system/cpu/cpu0/cpufreq on dom0, and then succeed in a migration
from the normally faster (but now slower) machine to the normally slower (but now relatively faster) one. That would confirm we're really looking at a cpu frequency difference issue.

Comment 3 Josh West 2010-12-16 20:50:40 UTC

(In reply to comment #2)
> It would be interesting to see if you can scale down the faster processor using
> /sys/devices/system/cpu/cpu0/cpufreq on dom0, and then succeed in a migration
> from the normally faster (but now slower) machine to the normally slower (but
> now relatively faster) one. That would confirm we're really looking at a cpu
> frequency difference issue.

Hmm I don't believe I have CPU frequency scaling capabilities built into my dom0 kernel (XenLinux 2.6.18.8).  Is there anything else I can try?

Comment 4 Josh West 2010-12-16 21:05:59 UTC

Actually it looks like I can't even compile in / enable CPU frequency scaling in 2.6.18.8 dom0 kernels :-/

Comment 5 Stephen Gelman 2010-12-16 21:08:41 UTC

The option doesn't seem to be there in XCP 1.0 Beta either...

Comment 6 Andrew Jones 2010-12-16 23:27:11 UTC

Drat, too bad we can't do that quick experiment. I'll try and find some machines to test with.

Comment 7 Josh West 2010-12-17 21:13:01 UTC

Ok.  If you want to try on my machines too, you're more than welcome.

Just a reminder, this is using Xen 3.4.3, and there are also reports (Stephen Gelman above) of this issue on XCP 1.0 beta.  I've heard there are other problems when doing live migration on the stock RHEL5 Xen, but in that case I don't know if that's migration in general or only when going from faster to slower CPUs like my issue.

Thanks for your help Andrew.

Comment 8 Paras Pradhan 2010-12-20 20:14:05 UTC

Live migration doesn't work between identical CPUs either. Same symptoms as Josh has explained.


Thanks
Paras.

Comment 9 Josh West 2010-12-20 20:43:30 UTC

(In reply to comment #8)
> Live migration doesn't work with the same CPU as well. Same symptoms as Josh
> has explained.
> 
> 
> Thanks
> Paras.

Hey Paras,

What version of Xen are you using?  XCP (1.0 or 0.5?), Xen 3.4.x, Xen 4.0.x, or Xen that comes with RHEL5?

Thanks!

Comment 10 Paras Pradhan 2010-12-22 15:21:32 UTC

Xen version: xen-3.0.3-105.el5_5.5

Dom0 kernel: 
2.6.18-194.26.1.el5xen  

OS: RHEL 5.5 64 bit 

Redhat6 DomU  kernel: 2.6.32-71.7.1.el6.x86_64

Symptoms:

* Start the domU on node1. No problem.
* Live migrate to node2 (same CPU, 100% identical hardware). The domU is unresponsive: no output in xm console; it can be pinged and shows the ssh banner, but is dead slow.
* Migrate back to node1. No problem.


Thanks
Paras.

Comment 11 Rich Graves 2010-12-30 17:11:17 UTC

My bug# 658720 goes the other way -- I can migrate from newer/faster CPUs to older/slower, but not from slower to faster.

However, since I am running RHEL5's Xen and DomU, I have the option of frequency scaling. I'll try that.

I'll also see about building kernel 2.6.32.2 from source. I don't think I've done that since 1999...

Comment 12 Rich Graves 2010-12-30 20:06:26 UTC

Both workaround/diagnostic attempts failed.

CPU-scaling both of my RHEL 5.5 Xen hosts to 2.261GHz does not change the situation. I can live-migrate from the X5560 to the L5520 but not vice versa.

kernel 2.6.32.2 crashes (make oldconfig && make binrpm based on /usr/src/kernels/2.6.32-71.7.1.el6.x86_64/.config, hitting Enter to accept default answer for all new options). I can provide my kernel and/or the full backtrace of the "BUG: unable to handle kernel NULL pointer dereference at (null)" if anyone would find them interesting, but I wouldn't think you would.

I noticed this unanswered problem reported with RHEL6 guest on an Ubuntu Dom0: http://forums.citrix.com/thread.jspa?threadID=277696&tstart=105&messageID=1514226

Comment 13 Paolo Bonzini 2011-01-04 14:22:20 UTC

This bug and the similar ones suggest to me that the MHz differences are a red herring.  I would try building the latest 2.6.37 rc.  The comments at the Citrix forum suggest the 2.6.32 kernels don't work; however, maybe you could be more lucky using Jeremy Fitzhardinge's repository at git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git ("git checkout origin/xen/stable-2.6.32.x" will put you on the correct branch).

Once we have found one kernel that passes, we can start bisecting to find the point where the failure was fixed.

---

In the meantime, I suggest attaching the output of /proc/cpuinfo and "x86info -r" for both hosts.

Comment 14 Pasi Karkkainen 2011-01-05 10:44:47 UTC

Some comments from Ian Campbell:

"Looking at kernel/time/clocksource.c it seems as if the clocksource watchdog checks every possible clocksource, not just the active one, for stability.

Hence the tsc message is probably a red-herring.

(not all clocksources, but all those with CLOCK_SOURCE_MUST_VERIFY set, which == TSC AFAICT)."
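
To make the point concrete, here is a simplified Python model of that behaviour. It is only a sketch, not the kernel's code: the class and function names are invented, the threshold of 1/16 second is an assumption to be checked against kernel/time/clocksource.c, and the interval values are made up so that the result matches the delta from the original report.

```python
from dataclasses import dataclass

NSEC_PER_SEC = 1_000_000_000
# Assumed threshold for this model (1/16 s); verify against the kernel source.
WATCHDOG_THRESHOLD_NS = NSEC_PER_SEC >> 4

@dataclass
class Clocksource:
    name: str
    must_verify: bool      # stands in for the CLOCK_SOURCE_MUST_VERIFY flag
    active: bool = False   # whether the system actually keeps time with it
    unstable: bool = False

def run_watchdog(sources, elapsed_ns, wd_elapsed_ns):
    """Compare each verifiable clocksource against the watchdog interval.

    elapsed_ns maps a clocksource name to the nanoseconds it reports for
    the interval. Returns the messages this model would log.
    """
    messages = []
    for cs in sources:
        if not cs.must_verify or cs.unstable:
            continue  # only flagged clocksources are checked, active or not
        delta = elapsed_ns[cs.name] - wd_elapsed_ns
        if abs(delta) > WATCHDOG_THRESHOLD_NS:
            cs.unstable = True
            messages.append(f"Clocksource {cs.name} unstable (delta = {delta} ns)")
    return messages

if __name__ == "__main__":
    sources = [Clocksource("xen", must_verify=False, active=True),
               Clocksource("tsc", must_verify=True)]
    # Hypothetical interval: tsc reports 214281714 ns less than the watchdog.
    print(run_watchdog(sources, {"xen": 500_000_000, "tsc": 285_718_286}, 500_000_000))
```

In this model the tsc message appears even though "xen" is the active clocksource, which is why the message alone says little about the freeze.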

Comment 15 RHEL Program Management 2011-01-07 04:32:35 UTC

This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 16 Suzanne Logcher 2011-01-07 16:25:19 UTC

This request was erroneously denied for the current release of Red Hat
Enterprise Linux.  The error has been fixed and this request has been
re-proposed for the current release.

Comment 17 Andrew Jones 2011-01-10 15:11:48 UTC

I was able to find a couple of machines where I am able to migrate in one direction, but not the other. save/restore works on both. After trying to build kernels with select patches and actually getting worse results (i.e. not able to migrate in any direction), I have just tried 2.6.37-2.fc15.x86_64 from Fedora rawhide. This kernel migrates in both directions. Now we need to figure out which patches allow it to work.

Comment 18 Pasi Karkkainen 2011-01-10 15:30:03 UTC

Ok.

It might be helpful to also try for example kernel.org 2.6.32.28 and see if that works or not..

Comment 19 Rich Graves 2011-01-10 20:01:02 UTC

kernel-2.6.37-2.fc15.x86_64.rpm appears to support live-migrate among my Nehalemv2 X5680 and Nehalem L5520, in both directions. This is an improvement over stock RHEL6, which in my experience can only migrate "up."

It also supports booting from an old Core Duo 5030 and live-migrating "up" to the newer CPUs.

HOWEVER, I am unable to migrate "down" from the X5680 or L5520 to the very old 5030. RHEL4 and RHEL5 guests are fine.

In a limited number of unscientific trials, I got multiple user-mode crashes, but a usable system, if the VM was first booted under the 5030; and a tight CPU loop responding only to xm destroy if the VM was first booted on the X5680.

user-mode crash of a logged-on-console:

[root@xen0 ~]# xm console rhel6
[  229.781014] PM: early resume of devices complete after 0.169 msecs
[  229.805827] PM: resume of devices complete after 22.665 msecs
[  229.811528] Setting capacity to 104857600
[  229.811756] Setting capacity to 104857600
x[  245.394533] login[1006] trap invalid opcode ip:3d92721490 sp:7fff2ef4f9f8 error:0 in libc-2.12.so[3d92600000+175000]
[  245.523438] console-kit-dae[1029] trap invalid opcode ip:3d927211cb sp:7fffc9c98c38 error:0 in libc-2.12.so[3d92600000+175000]

system very unhappy, unresponsive:

[root@xen0 ~]# xm console rhel6
[12603581.791889] PM: early resume of devices complete after 0.094 msecs
[12603581.792165] rsyslogd[792] trap invalid opcode ip:3d9272052f sp:7f1e67702d38 error:0 in libc-2.12.so[3d92600000+175000]
[12603581.792489] auditd[772] trap invalid opcode ip:7f40beb7f578 sp:7fffffaa4ce8 error:0 in libc-2.12.so[7f40bea5f000+175000]
[12603581.792807] invalid opcode: 0000 [#1] SMP 
[12603581.792857] last sysfs file: /sys/devices/vbd-51712/block/xvda/dev
[12603581.792904] CPU 0 
[12603581.792924] Modules linked in: ip6_tables ipv6 joydev xen_netfront xen_blkfront [last unloaded: scsi_wait_scan]
[12603581.793042] 
[12603581.793063] Pid: 1114, comm: kworker/u:1 Not tainted 2.6.37-2.fc15.x86_64 #1 /
[12603581.793117] RIP: 0010:[<ffffffff81234a81>]  [<ffffffff81234a81>] hweight_long+0x1/0xb
[12603581.793187] RSP: 001b:ffff88001f5d3e80  EFLAGS: 00010003
[12603581.793227] RAX: 0000000000000004 RBX: ffffffff8160ba50 RCX: 0000000000000040
[12603581.793282] RDX: ffffffff8160ba50 RSI: 0000000000000100 RDI: ffffffffffffffff
[12603581.793335] RBP: ffff88001f5d3ec0 R08: 0000000000000008 R09: 0000000000000001
[12603581.793389] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[12603581.794455] R13: 0000000000000100 R14: 0000000000000004 R15: 0000000000000001
[12603581.795466] FS:  00007f40bf89e700(0000) GS:ffff88003e799000(0000) knlGS:0000000000000000
[12603581.796647] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[12603581.797796] CR2: 0000000000000000 CR3: 0000000003b87000 CR4: 0000000000002660
[12603581.798918] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[12603581.800014] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000000
[12603581.801189] Process kworker/u:1 (pid: 1114, threadinfo ffff88001f5d2000, task ffff880037f75c80)
[12603581.803197] Stack:
[12603581.804203]  ffff88001f5d3ec0 ffffffff81234ac9 ffffffff8160ba50 ffff880037f75c80
[12603581.805271]  ffffffff8160ba50 ffff88001f5d3ef8 ffff88003e7ac780 0000000000000000
[12603581.806332]  ffff88001f5d3ed0 ffffffff8103efed ffff88001f5d3f20 ffffffff81043027
[12603581.807410] Call Trace:
[12603581.808543]  [<ffffffff81234ac9>] ? __bitmap_weight+0x3e/0x8a
[12603581.809645]  [<ffffffff8103efed>] cpumask_weight+0x13/0x15
[12603581.810689]  [<ffffffff81043027>] set_cpus_allowed_ptr+0xcc/0x17c
[12603581.811753]  [<ffffffff81042e23>] ? mmdrop+0x1a/0x2a
[12603581.812925]  [<ffffffff810641d2>] ____call_usermodehelper+0x5c/0x93
[12603581.814040]  [<ffffffff8100bae4>] kernel_thread_helper+0x4/0x10
[12603581.815108]  [<ffffffff8100aee3>] ? int_ret_from_sys_call+0x7/0x1b
[12603581.816158]  [<ffffffff81477e5d>] ? retint_restore_args+0x5/0x6
[12603581.817179]  [<ffffffff8100bae0>] ? kernel_thread_helper+0x0/0x10
[12603581.818167] Code: 63 d0 ff cb e8 61 4f ff ff 44 39 e0 0f 9c c2 85 db 7e 04 84 d2 75 e0 85 db 75 04 84 d2 75 02 31 c0 5b 41 5c 41 5d 41 5e c9 c3 55 <f3> 48 0f b8 c7 48 89 e5 c9 c3 55 89 f0 b9 40 00 00 00 99 48 89 
[12603581.821555] RIP  [<ffffffff81234a81>] hweight_long+0x1/0xb
[12603581.929363]  RSP <ffff88001f5d3e80>
[12603581.930389] ---[ end trace 453a07e62e647743 ]---

Comment 20 Andrew Jones 2011-01-11 19:35:32 UTC

This appears to be the patch that fixes the issue. I've applied it to a clean rhel6 build and it fixed the migration issues I saw with the machines I found. We'll get it pulled into RHEL.

commit e7a3481c0246c8e45e79c629efd63b168e91fcda
Author: Jeremy Fitzhardinge <jeremy.fitzhardinge>
Date:   Mon Oct 25 16:53:46 2010 -0700

    x86/pvclock: Zero last_value on resume
    
    If the guest domain has been suspend/resumed or migrated, then the
    system clock backing the pvclock clocksource may revert to a smaller
    value (ie, can be non-monotonic across the migration/save-restore).
    
    Make sure we zero last_value in that case so that the domain
    continues to see clock updates.
    
    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge>
    Signed-off-by: Ingo Molnar <mingo>
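
The effect described in the commit message can be illustrated with a toy Python model. This is a sketch under assumptions, not the kernel implementation: it assumes the read path refuses to return a value below the highest one seen so far (last_value), as the commit message implies, and the class name `PvClockModel` and all clock values are hypothetical.

```python
class PvClockModel:
    """Toy model of a clocksource that never lets its readings go backwards."""

    def __init__(self):
        self.last_value = 0  # highest reading handed out so far

    def read(self, raw_ns):
        """Return raw_ns, clamped so it is never below an earlier reading."""
        if raw_ns < self.last_value:
            return self.last_value  # stale value: the clock looks frozen
        self.last_value = raw_ns
        return raw_ns

    def resume(self):
        """What the patch title describes: zero last_value on resume."""
        self.last_value = 0

if __name__ == "__main__":
    # Hypothetical: source host clock at 900 s, destination host clock at 300 s.
    unpatched = PvClockModel()
    unpatched.read(900 * 10**9)
    print([unpatched.read(t * 10**9) // 10**9 for t in (300, 301, 302)])  # [900, 900, 900]

    patched = PvClockModel()
    patched.read(900 * 10**9)
    patched.resume()
    print([patched.read(t * 10**9) // 10**9 for t in (300, 301, 302)])    # [300, 301, 302]
```

In the unpatched case the model reports no clock progress until the destination's clock passes the old maximum, which would be consistent with a guest that stalls for minutes and then recovers on its own.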

Rich,

Maybe you're seeing a different issue with the F15 kernel on the 5030. Were you able to run a rhel6 kernel on there? If so, then testing a rhel6 kernel with the above patch would be a good test to see if all problems go away. If you need me to supply an rpm, then I'll make one tomorrow.

Drew

Comment 21 Rich Graves 2011-01-11 20:03:26 UTC

> Maybe you're seeing a different issue with the F15 kernel on the 5030.
> Were you able to run a rhel6 kernel on there? 

Yes. With the stock rhel6 kernel, I was able to migrate from a X5680, L5520, or 5160 to the 5030, but not from the 5030 to anything newer. Quite likely, this breakage is something else introduced in F15. Quite certainly, the number of sites sharing production workloads among 4.5-month-old and 4.5-year-old servers is small.

> If you need me to supply an rpm, then I'll make one tomorrow.

"Need" is an exaggeration, but given the nontrivial compile time, I gladly accept.

Comment 22 Andrew Jones 2011-01-12 15:46:58 UTC

s/need/would like/

http://people.redhat.com/drjones/663755/

You're welcome,
Drew

Comment 24 RHEL Program Management 2011-01-12 16:11:49 UTC

This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 25 Andrew Jones 2011-01-14 07:21:04 UTC

*** Bug 663881 has been marked as a duplicate of this bug. ***

Comment 26 Andrew Jones 2011-01-14 14:28:49 UTC

*** Bug 613513 has been marked as a duplicate of this bug. ***

Comment 27 Rich Graves 2011-01-15 03:36:07 UTC

(In reply to comment #20)

If 2.6.32-95.el6fixmigrate.x86_64 is originally booted on the 5030, then I can live-migrate at will among the various CPU generations. I made 9 circuits of 3 hosts without incident. Opening xm console often pops up messages like "Clocksource tsc unstable (delta = -160004294 ns)," but I think that's just a notice, right?

If 2.6.32-95.el6fixmigrate.x86_64 is originally booted on the L5520 or X5680, then I can ping-pong between the L5520 and X5680, but got a crash when migrating to the 5030. I noticed "abrtd: Crash is in database already (dup of /var/spool/abrt/ccpp-1294688849-1378)" -- the first crash coming under the fc15 kernel. Is that directory of interest? It probably contains nothing sensitive, but I'd rather send it privately.

Comment 28 Andrew Jones 2011-01-21 15:35:16 UTC

(In reply to comment #27)
> "Clocksource tsc unstable (delta = -160004294 ns)," but I think that's just a
> notice, right?

Hmm... I'd rather the message wasn't there (I don't see it on my machines), but it's true that it shouldn't matter, as xen guests aren't using the tsc clock. I don't believe there is anything that can be done about it, other than manipulating the kernel to not watchdog the tsc, but it's not worth it to just eliminate the message.

> 
> If  2.6.32-95.el6fixmigrate.x86_64 is originally booted on the L5520 or X5680,
> then I can ping-pong between the L5520 and X5680, but got a crash when
> migrating to the 5030. I noticed "abrtd: Crash is in database already (dup of
> /var/spool/abrt/ccpp-1294688849-1378)" -- the first crash coming under the fc15
> kernel. Is that directory of interest?

The crash would certainly be of interest. It's also interesting that abrt thinks it already exists even though the first time was from the fc15 kernel and now it's from a rhel6 kernel. The heuristics must not consider the addresses, only the symbols? Anyway, can you get a backtrace after the crash, either from a core or from the xenctx util? We should open a new bug with that backtrace though, as the patch posted for this bug seems to generally fix the migration issues.

To use xenctx to get the backtrace for each vcpu do the following. First change the guest config to "preserve" for 'on_crash', then run

/usr/lib64/xen/bin/xenctx -s System.map-2.6.32-95.el6fixmigrate.x86_64 <domid> 0
/usr/lib64/xen/bin/xenctx -s System.map-2.6.32-95.el6fixmigrate.x86_64 <domid> 1
/usr/lib64/xen/bin/xenctx -s System.map-2.6.32-95.el6fixmigrate.x86_64 <domid> <vcpu-number...>
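
For guests with many vcpus, the command lines above can be generated rather than typed. The following Python sketch only builds the argument lists and does not run anything; the helper name `xenctx_commands` and the example domid and vcpu count are placeholders, and the only xenctx option used is the `-s` shown above.

```python
XENCTX = "/usr/lib64/xen/bin/xenctx"  # path as given in the commands above

def xenctx_commands(system_map, domid, vcpu_count):
    """Return one argv list per vcpu, mirroring the command lines above."""
    return [[XENCTX, "-s", system_map, str(domid), str(vcpu)]
            for vcpu in range(vcpu_count)]

if __name__ == "__main__":
    # domid 5 and 2 vcpus are placeholder values for illustration.
    for argv in xenctx_commands("System.map-2.6.32-95.el6fixmigrate.x86_64", 5, 2):
        print(" ".join(argv))
```

Each list could be passed to subprocess.run on dom0 once the real domid is known.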

Comment 29 Rich Graves 2011-01-21 16:11:05 UTC

I'll create another bug on the down-migrate-to-5030 case. Not very important to me since that machine is slated for retirement anyway.

So this can be closed as dupe, or whatever state is most appropriate until engineering & product & QA agree to release as ERRATA.

Aside: I believe Josh is going to create another bug noting that rhel6 guests lose time. Mine was 3-4 seconds behind at boot, and after 3 days the lag has grown to 8 seconds. This cannot be worked around with ntpd because /proc/sys/xen/independent_wallclock is not available.

Comment 30 Josh West 2011-01-21 17:23:36 UTC

Hi Andrew,

The patched kernel you provided works great for Xen domU migration now!  I'm looking forward to it being included as an errata update.

But yes, as Rich stated, the domUs are losing time.  One of my VMs has been up for 3 days and is 10 seconds behind dom0.  Any thoughts on what would cause domUs to lose time, or ideas for a fix?  Does this need a separate bug report?

Thanks.
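
To put the reported lag in perspective, a quick Python sketch converts it into an average drift rate. The function name `drift_ppm` is illustrative, and the calculation assumes the lag accumulated evenly over the uptime, which the reports above do not confirm.

```python
SECONDS_PER_DAY = 86_400

def drift_ppm(lag_seconds, uptime_days):
    """Average clock drift in parts per million over the given uptime."""
    return lag_seconds / (uptime_days * SECONDS_PER_DAY) * 1_000_000

if __name__ == "__main__":
    # 10 seconds behind after 3 days, as reported above.
    print(round(drift_ppm(10, 3), 1), "ppm")
```

Ten seconds over three days comes to roughly 39 ppm.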

Comment 31 Andrew Jones 2011-01-24 07:42:54 UTC

Hi Josh,

Yes, please open a new bug for the time loss. When you open it, please give as many details as you know, such as whether it only started showing up with the pvclock patch that fixed migration, or whether you've also seen it on older kernels.

Thanks,
Drew

Comment 32 Andrew Jones 2011-01-24 07:46:50 UTC

*** Bug 658720 has been marked as a duplicate of this bug. ***

Comment 33 Aristeu Rozanski 2011-02-03 16:17:54 UTC

Patch(es) available on kernel-2.6.32-112.el6

Comment 36 Pål-Kristian Hamre 2011-03-10 10:15:58 UTC

(In reply to comment #33)
> Patch(es) available on kernel-2.6.32-112.el6

When will this patch be available?

Comment 37 Andrew Jones 2011-03-10 11:07:25 UTC

(In reply to comment #36)
> (In reply to comment #33)
> > Patch(es) available on kernel-2.6.32-112.el6
> 
> When will this patch be available?

I pointed to the upstream patch in comment 20, which has been available since Oct 25th. It will certainly be in the RHEL-6.1 kernel when it's released. If you feel you need it in 6.0.z, then you can make arrangements with your Red Hat representative.

Comment 39 Jinxin Zheng 2011-04-08 03:19:50 UTC

I have reproduced this on -71 kernel migrating from Intel W3520 @ 2.67GHz to E5504 @ 2.00GHz.

Host config is: kernel-xen-2.6.18-255.el5, xen-3.0.3-127.el5.

With kernel -71 the RHEL 6 domU stopped responding after the migration was complete. I could not ping it, open xm console, or connect via vnc.

With kernel -128 the domU prints one message to the serial console:
Clocksource tsc unstable (delta = -125940234 ns)
It continued to work, though, so this can be put in VERIFIED.

Comment 41 errata-xmlrpc 2011-05-23 20:32:11 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html
