Description of problem:
When migrating a RHEL6 domU under Xen (3.4.3) between servers with different CPU clock frequencies, from one with a higher MHz to one with a lower MHz, the virtual machine migrates and then becomes unresponsive. The VM's console, network traffic to the VM, etc. yield no response.
However, eventually (perhaps 5-10 minutes later), the VM finally wakes up and resumes responding. A kernel message is printed like so:
Clocksource tsc unstable (delta = -214281714 ns)
What's interesting is that migrating the virtual machine from a server with a *lower* CPU MHz speed to one with a higher speed does not encounter this problem. It's only when moving from high to low. It does not matter if the source/destination CPU is an older or newer model, as I can replicate the problem with the following:
* Migrating from X5450 @ 3.00GHz to X5355 @ 2.66GHz fails, but the
opposite (increasing in CPU frequency) succeeds.
* Migrating from Xeon(TM) CPU 2.80GHz to E5310 @ 1.60GHz fails, but
the opposite (increasing in CPU frequency) succeeds.
This is using the latest native/stock RHEL6 kernel (vmlinuz-2.6.32-71.7.1.el6.x86_64) that makes use of paravirt_ops for virtualization under Xen 3.4.3. I have heard reports of this also being the case under Xen Cloud Platform (XCP) 1.0, and thus I'm assuming the problem will occur with XenServer as well.
I have tested features of Xen like cpuid masking to make the virtual machine believe it has a generic i686 CPU, but the problem still persisted.
Version-Release number of selected component (if applicable):
* Xen 3.4.3
* Red Hat Enterprise Linux 6.0
* Kernel 2.6.32-71.7.1.el6.x86_64
Easily reproducible. Have confirmed reports of this occurring when running RHEL6 and its native kernel as a virtual machine under Xen Cloud Platform (XCP) 1.0 too.
Steps to Reproduce:
1. Boot RHEL6 virtual machine, using RHEL6's kernel, under Xen 3.4.x.
2. Migrate (via xm migrate) the virtual machine to a server with a lower CPU clock speed.
3. Connect to the VM's console (xm console) and see it does not respond.
4. Wait 5-10 minutes for the virtual machine to finally wake up; console begins to respond.
Actual results:
* Freeze for 5-10 minutes.
Expected results:
* Continue working...
The console message about the tsc clocksource is interesting, as the domU is making use of the 'xen' clocksource. This can be seen in /sys/devices/system/clocksource/clocksource0/current_clocksource and the Xen clocksource appears as loaded in 'dmesg' output and in /proc/timer_list.
For testing, I tried switching the clocksource to options like 'tsc' and 'jiffies' but the problem persisted.
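For anyone who wants to script the clocksource check described above, here is a minimal sketch in Python. It assumes the standard sysfs layout and that an `available_clocksource` file sits next to `current_clocksource`; the function name `read_clocksources` is made up for illustration.

```python
from pathlib import Path

# sysfs directory named in the comment above.
CLOCKSOURCE_DIR = "/sys/devices/system/clocksource/clocksource0"


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksources as reported by sysfs.

    `base` is a parameter only so the function can be pointed at a
    different directory (for example when testing outside a guest).
    """
    base = Path(base)
    # current_clocksource holds a single name, e.g. "xen".
    current = (base / "current_clocksource").read_text().strip()
    # available_clocksource holds a space-separated list of names.
    available = (base / "available_clocksource").read_text().split()
    return current, available
```

Calling `read_clocksources()` inside the domU before and after a migration would show whether the active clocksource stays on 'xen'.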
It would be interesting to see if you can scale down the faster processor using
/sys/devices/system/cpu/cpu0/cpufreq on dom0, and then succeed in a migration
from the normally faster (but now slower) machine to the normally slower (but now relatively faster) one. That would confirm we're really looking at a cpu frequency difference issue.
(In reply to comment #2)
> It would be interesting to see if you can scale down the faster processor using
> /sys/devices/system/cpu/cpu0/cpufreq on dom0, and then succeed in a migration
> from the normally faster (but now slower) machine to the normally slower (but
> now relatively faster) one. That would confirm we're really looking at a cpu
> frequency difference issue.
Hmm I don't believe I have CPU frequency scaling capabilities built into my dom0 kernel (XenLinux 18.104.22.168). Is there anything else I can try?
Actually it looks like I can't even compile in / enable CPU frequency scaling in 22.214.171.124 dom0 kernels :-/
The option doesn't seem to be there in XCP 1.0 Beta either...
Drat, too bad we can't do that quick experiment. I'll try and find some machines to test with.
Ok. If you want to try on my machines too, you're more than welcome.
Just a reminder, this is using Xen 3.4.3, and there are also reports (Stephen Gelman above) of this issue on XCP 1.0 beta. I've heard there are other problems when doing live migration on the stock RHEL5 Xen, but in that case I don't know if that's just migration in general or only when going from faster to slower CPUs like my issue.
Thanks for your help Andrew.
Live migration doesn't work with the same CPU as well. Same symptoms as Josh has explained.
(In reply to comment #8)
> Live migration doesn't work with the same CPU as well. Same symptoms as Josh
> has explained.
What version of Xen are you using? XCP (1.0 or 0.5?), Xen 3.4.x, Xen 4.0.x, or Xen that comes with RHEL5?
Xen version: xen-3.0.3-105.el5_5.5
OS: RHEL 5.5 64 bit
Redhat6 DomU kernel: 2.6.32-71.7.1.el6.x86_64
* Start the domU node in node1. No problem
* Live migrate to Node2 (same CPU, 100% same hardware). domU unresponsive, no output in xm console, can ping, can see ssh banner but dead slow
* Migrate back in node1, No problem
My bug# 658720 goes the other way -- I can migrate from newer/faster CPUs to older/slower, but not from slower to faster.
However, since I am running RHEL5's Xen and DomU, I have the option of frequency scaling. I'll try that.
I'll also see about building kernel 126.96.36.199 from source. I don't think I've done that since 1999...
Both workaround/diagnostic attempts failed.
CPU-scaling both of my RHEL 5.5 Xen hosts to 2.261GHz does not change the situation. I can live-migrate from the X5560 to the L5520 but not vice versa.
kernel 188.8.131.52 crashes (make oldconfig && make binrpm based on /usr/src/kernels/2.6.32-71.7.1.el6.x86_64/.config, hitting Enter to accept default answer for all new options). I can provide my kernel and/or the full backtrace of the "BUG: unable to handle kernel NULL pointer dereference at (null)" if anyone would find them interesting, but I wouldn't think you would.
I noticed this unanswered problem reported with RHEL6 guest on an Ubuntu Dom0: http://forums.citrix.com/thread.jspa?threadID=277696&tstart=105&messageID=1514226
This bug and the similar ones suggest to me that the MHz differences are a red herring. I would try building the latest 2.6.37 rc. The comments at the Citrix forum suggest the 2.6.32 kernels don't work; however, you might have better luck using Jeremy Fitzhardinge's repository at git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git ("git checkout origin/xen/stable-2.6.32.x" will put you on the correct branch).
Once we find a kernel that passes, we can start bisecting to find the point where the failure was fixed.
In the meantime, I suggest attaching the output of /proc/cpuinfo and "x86info -r" for both hosts.
Some comments from Ian Campbell:
"Looking at kernel/time/clocksource.c it seems as if the clocksource watchdog checks every possible clocksource, not just the active one, for stability.
Hence the tsc message is probably a red-herring.
(not all clocksources, but all those with CLOCK_SOURCE_MUST_VERIFY set, which == TSC AFAICT)."
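To make Ian's point concrete, here is a rough Python model of the behaviour he describes, not the kernel code itself. The `Clocksource` class, the `watchdog_check` function and its arguments are invented for illustration. The 62.5 ms threshold (`NSEC_PER_SEC >> 4`) is my understanding of what kernel/time/clocksource.c used in that era and should be treated as an assumption.

```python
from dataclasses import dataclass

NSEC_PER_SEC = 1_000_000_000
# Assumed watchdog threshold (NSEC_PER_SEC >> 4 = 62.5 ms).
WATCHDOG_THRESHOLD_NS = NSEC_PER_SEC >> 4


@dataclass
class Clocksource:
    name: str
    must_verify: bool       # models the CLOCK_SOURCE_MUST_VERIFY flag
    unstable: bool = False


def watchdog_check(sources, elapsed_ns, watchdog_elapsed_ns):
    """Verify every must-verify clocksource, active or not.

    elapsed_ns maps a clocksource name to the interval it measured;
    watchdog_elapsed_ns is the interval the reference clock measured.
    Returns the warning messages that would be logged.
    """
    messages = []
    for cs in sources:
        # Sources without the flag (e.g. 'xen') are never checked.
        if not cs.must_verify or cs.unstable:
            continue
        delta = elapsed_ns[cs.name] - watchdog_elapsed_ns
        if abs(delta) > WATCHDOG_THRESHOLD_NS:
            cs.unstable = True
            messages.append(
                f"Clocksource {cs.name} unstable (delta = {delta} ns)"
            )
    return messages
```

In this model a guest running on the 'xen' clocksource still logs the tsc warning if the TSC jumps across a migration, because the tsc entry is checked even though it is not the active clocksource.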
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support representative.
This request was erroneously denied for the current release of Red Hat
Enterprise Linux. The error has been fixed and this request has been
re-proposed for the current release.
I was able to find a couple of machines where I am able to migrate in one direction, but not the other. save/restore works on both. After trying to build kernels with select patches and actually getting worse results (i.e. not able to migrate in any direction), I have just tried 2.6.37-2.fc15.x86_64 from Fedora rawhide. This kernel migrates in both directions. Now we need to figure out which patches allow it to work.
It might be helpful to also try, for example, kernel.org 184.108.40.206 and see if that works or not.
kernel-2.6.37-2.fc15.x86_64.rpm appears to support live-migrate among my Nehalemv2 X5680 and Nehalem L5520, in both directions. This is an improvement over stock RHEL6, which in my experience can only migrate "up."
It also supports booting from an old Core Duo 5030 and live-migrating "up" to the newer CPUs.
HOWEVER, I am unable to migrate "down" from the X5680 or L5520 to the very old 5030. RHEL4 and RHEL5 guests are fine.
In a limited number of unscientific trials, I got multiple user-mode crashes, but a usable system, if the VM was first booted under the 5030; and a tight CPU loop responding only to xm destroy if the VM was first booted on the X5680.
user-mode crash of a logged-on-console:
[root@xen0 ~]# xm console rhel6
[ 229.781014] PM: early resume of devices complete after 0.169 msecs
[ 229.805827] PM: resume of devices complete after 22.665 msecs
[ 229.811528] Setting capacity to 104857600
[ 229.811756] Setting capacity to 104857600
x[ 245.394533] login trap invalid opcode ip:3d92721490 sp:7fff2ef4f9f8 error:0 in libc-2.12.so[3d92600000+175000]
[ 245.523438] console-kit-dae trap invalid opcode ip:3d927211cb sp:7fffc9c98c38 error:0 in libc-2.12.so[3d92600000+175000]
system very unhappy, unresponsive:
[root@xen0 ~]# xm console rhel6
[12603581.791889] PM: early resume of devices complete after 0.094 msecs
[12603581.792165] rsyslogd trap invalid opcode ip:3d9272052f sp:7f1e67702d38 error:0 in libc-2.12.so[3d92600000+175000]
[12603581.792489] auditd trap invalid opcode ip:7f40beb7f578 sp:7fffffaa4ce8 error:0 in libc-2.12.so[7f40bea5f000+175000]
[12603581.792807] invalid opcode: 0000 [#1] SMP
[12603581.792857] last sysfs file: /sys/devices/vbd-51712/block/xvda/dev
[12603581.792904] CPU 0
[12603581.792924] Modules linked in: ip6_tables ipv6 joydev xen_netfront xen_blkfront [last unloaded: scsi_wait_scan]
[12603581.793063] Pid: 1114, comm: kworker/u:1 Not tainted 2.6.37-2.fc15.x86_64 #1 /
[12603581.793117] RIP: 0010:[<ffffffff81234a81>] [<ffffffff81234a81>] hweight_long+0x1/0xb
[12603581.793187] RSP: 001b:ffff88001f5d3e80 EFLAGS: 00010003
[12603581.793227] RAX: 0000000000000004 RBX: ffffffff8160ba50 RCX: 0000000000000040
[12603581.793282] RDX: ffffffff8160ba50 RSI: 0000000000000100 RDI: ffffffffffffffff
[12603581.793335] RBP: ffff88001f5d3ec0 R08: 0000000000000008 R09: 0000000000000001
[12603581.793389] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[12603581.794455] R13: 0000000000000100 R14: 0000000000000004 R15: 0000000000000001
[12603581.795466] FS: 00007f40bf89e700(0000) GS:ffff88003e799000(0000) knlGS:0000000000000000
[12603581.796647] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[12603581.797796] CR2: 0000000000000000 CR3: 0000000003b87000 CR4: 0000000000002660
[12603581.798918] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[12603581.800014] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000000
[12603581.801189] Process kworker/u:1 (pid: 1114, threadinfo ffff88001f5d2000, task ffff880037f75c80)
[12603581.804203] ffff88001f5d3ec0 ffffffff81234ac9 ffffffff8160ba50 ffff880037f75c80
[12603581.805271] ffffffff8160ba50 ffff88001f5d3ef8 ffff88003e7ac780 0000000000000000
[12603581.806332] ffff88001f5d3ed0 ffffffff8103efed ffff88001f5d3f20 ffffffff81043027
[12603581.807410] Call Trace:
[12603581.808543] [<ffffffff81234ac9>] ? __bitmap_weight+0x3e/0x8a
[12603581.809645] [<ffffffff8103efed>] cpumask_weight+0x13/0x15
[12603581.810689] [<ffffffff81043027>] set_cpus_allowed_ptr+0xcc/0x17c
[12603581.811753] [<ffffffff81042e23>] ? mmdrop+0x1a/0x2a
[12603581.812925] [<ffffffff810641d2>] ____call_usermodehelper+0x5c/0x93
[12603581.814040] [<ffffffff8100bae4>] kernel_thread_helper+0x4/0x10
[12603581.815108] [<ffffffff8100aee3>] ? int_ret_from_sys_call+0x7/0x1b
[12603581.816158] [<ffffffff81477e5d>] ? retint_restore_args+0x5/0x6
[12603581.817179] [<ffffffff8100bae0>] ? kernel_thread_helper+0x0/0x10
[12603581.818167] Code: 63 d0 ff cb e8 61 4f ff ff 44 39 e0 0f 9c c2 85 db 7e 04 84 d2 75 e0 85 db 75 04 84 d2 75 02 31 c0 5b 41 5c 41 5d 41 5e c9 c3 55 <f3> 48 0f b8 c7 48 89 e5 c9 c3 55 89 f0 b9 40 00 00 00 99 48 89
[12603581.821555] RIP [<ffffffff81234a81>] hweight_long+0x1/0xb
[12603581.929363] RSP <ffff88001f5d3e80>
[12603581.930389] ---[ end trace 453a07e62e647743 ]---
This appears to be the patch that fixes the issue. I've applied it to a clean rhel6 build and it fixed the migration issues I saw with the machines I found. We'll get it pulled into RHEL.
Author: Jeremy Fitzhardinge <firstname.lastname@example.org>
Date: Mon Oct 25 16:53:46 2010 -0700
x86/pvclock: Zero last_value on resume
If the guest domain has been suspend/resumed or migrated, then the
system clock backing the pvclock clocksource may revert to a smaller
value (ie, can be non-monotonic across the migration/save-restore).
Make sure we zero last_value in that case so that the domain
continues to see clock updates.
Signed-off-by: Jeremy Fitzhardinge <email@example.com>
Signed-off-by: Ingo Molnar <firstname.lastname@example.org>
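To illustrate the failure mode the commit message describes, here is a simplified Python model (not the kernel code). Only the name `last_value` comes from the patch; the `PvClock` class and its methods are hypothetical. The idea is that the clock read path never returns less than the last value it handed out, so if the host's system time is smaller after migration, the guest sees a frozen clock until the new host catches up, unless `last_value` is zeroed on resume.

```python
class PvClock:
    """Toy model of a monotonic guard around a host-provided clock."""

    def __init__(self):
        self.last_value = 0

    def read(self, raw_ns):
        # Never return a value smaller than one already handed out.
        if raw_ns < self.last_value:
            return self.last_value
        self.last_value = raw_ns
        return raw_ns

    def resume(self):
        # What the patch adds: forget the old host's high-water mark.
        self.last_value = 0


# Without the fix: time appears stuck after moving to a host whose
# system clock is behind the source host's.
stuck = PvClock()
stuck.read(1_000_000)                 # last reading on the source host
frozen = [stuck.read(t) for t in (400_000, 500_000, 600_000)]

# With the fix: resume() is called, so readings advance again.
fixed = PvClock()
fixed.read(1_000_000)
fixed.resume()
advancing = [fixed.read(t) for t in (400_000, 500_000, 600_000)]
```

Here `frozen` stays at the old value for every read while `advancing` follows the new host's clock, which would be consistent with the guest appearing hung for minutes and then waking up on its own.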
Maybe you're seeing a different issue with the F15 kernel on the 5030. Were you able to run a rhel6 kernel on there? If so, then testing a rhel6 kernel with the above patch would be a good test to see if all problems go away. If you need me to supply an rpm, then I'll make one tomorrow.
> Maybe you're seeing a different issue with the F15 kernel on the 5030.
> Were you able to run a rhel6 kernel on there?
Yes. With the stock rhel6 kernel, I was able to migrate from a X5680, L5520, or 5160 to the 5030, but not from the 5030 to anything newer. Quite likely, this breakage is something else introduced in F15. Quite certainly, the number of sites sharing production workloads among 4.5-month-old and 4.5-year-old servers is small.
> If you need me to supply an rpm, then I'll make one tomorrow.
"Need" is an exaggeration, but given the nontrivial compile time, I gladly accept.
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update release.
*** Bug 663881 has been marked as a duplicate of this bug. ***
*** Bug 613513 has been marked as a duplicate of this bug. ***
(In reply to comment #20)
If 2.6.32-95.el6fixmigrate.x86_64 is originally booted on the 5030, then I can live-migrate at will among the various CPU generations. I made 9 circuits of 3 hosts without incident. Opening xm console often pops up messages like "Clocksource tsc unstable (delta = -160004294 ns)," but I think that's just a notice, right?
If 2.6.32-95.el6fixmigrate.x86_64 is originally booted on the L5520 or X5680, then I can ping-pong between the L5520 and X5680, but got a crash when migrating to the 5030. I noticed "abrtd: Crash is in database already (dup of /var/spool/abrt/ccpp-1294688849-1378)" -- the first crash coming under the fc15 kernel. Is that directory of interest? It probably contains nothing sensitive, but I'd rather send it privately.
(In reply to comment #27)
> "Clocksource tsc unstable (delta = -160004294 ns)," but I think that's just a
> notice, right?
Hmm... I'd rather the message wasn't there (I don't see it on my machines), but it's true that it shouldn't matter, as xen guests aren't using the tsc clock. I don't believe there is anything that can be done about it, other than manipulating the kernel to not watchdog the tsc, but it's not worth it to just eliminate the message.
> If 2.6.32-95.el6fixmigrate.x86_64 is originally booted on the L5520 or X5680,
> then I can ping-pong between the L5520 and X5680, but got a crash when
> migrating to the 5030. I noticed "abrtd: Crash is in database already (dup of
> /var/spool/abrt/ccpp-1294688849-1378)" -- the first crash coming under the fc15
> kernel. Is that directory of interest?
The crash would certainly be of interest. It's also interesting that abrt thinks it already exists even though the first time was from the fc15 kernel and now it's from a rhel6 kernel. The heuristics must not consider the addresses, only the symbols? Anyway, can you get a backtrace after the crash, either from a core or from the xenctx util? We should open a new bug with that backtrace though, as the patch posted for this bug seems to generally fix the migration issues.
To use xenctx to get the backtrace for each vcpu do the following. First change the guest config to "preserve" for 'on_crash', then run
/usr/lib64/xen/bin/xenctx -s System.map-2.6.32-95.el6fixmigrate.x86_64 <domid> 0
/usr/lib64/xen/bin/xenctx -s System.map-2.6.32-95.el6fixmigrate.x86_64 <domid> 1
/usr/lib64/xen/bin/xenctx -s System.map-2.6.32-95.el6fixmigrate.x86_64 <domid> <vcpu-number...>
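If the guest has many vcpus, the per-vcpu invocations above can be generated rather than typed. A small sketch in Python, assuming only the argument order shown above; the helper name `xenctx_commands` is made up, and the domid and vcpu count have to be supplied by whoever runs it.

```python
XENCTX = "/usr/lib64/xen/bin/xenctx"


def xenctx_commands(domid, vcpus, system_map, xenctx=XENCTX):
    """Build one xenctx argument list per vcpu, numbered from 0."""
    return [
        [xenctx, "-s", system_map, str(domid), str(vcpu)]
        for vcpu in range(vcpus)
    ]
```

Each returned list can be passed to `subprocess.run` on dom0 once the crashed guest has been preserved.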
I'll create another bug on the down-migrate-to-5030 case. Not very important to me since that machine is slated for retirement anyway.
So this can be closed as a dupe, or moved to whatever state is most appropriate until engineering, product, and QA agree to release it as ERRATA.
Aside: I believe Josh is going to create another bug noting that rhel6 guests lose time. Mine was 3-4 seconds behind at boot, and after 3 days the lag has grown to 8 seconds. This cannot be worked around with ntpd because /proc/sys/xen/independent_wallclock is not available.
The patched kernel you provided works great for Xen domU migration now! I'm looking forward to it being included as an errata update.
But yes, as Rich stated, the domUs are losing time. One of my VMs has been up for 3 days and is 10 seconds behind dom0. Any thoughts on what would cause domUs to lose time, or ideas for a fix? Does this need a separate bug report?
Yes, please open a new bug for the time loss. When you open it, please give as many details as you know, such as whether it just started showing up with the pvclock patch that fixed migration, or whether you've also seen it on older kernels.
*** Bug 658720 has been marked as a duplicate of this bug. ***
Patch(es) available on kernel-2.6.32-112.el6
(In reply to comment #33)
> Patch(es) available on kernel-2.6.32-112.el6
When will this patch be available?
(In reply to comment #36)
> (In reply to comment #33)
> > Patch(es) available on kernel-2.6.32-112.el6
> When will this patch be available?
I pointed to the upstream patch in comment 20, which has been available since Oct 25th. It will certainly be in the RHEL-6.1 kernel when it's released. If you feel you need it in 6.0.z, then you can make arrangements with your Red Hat representative.
I have reproduced this on -71 kernel migrating from Intel W3520 @ 2.67GHz to E5504 @ 2.00GHz.
Host config is: kernel-xen-2.6.18-255.el5, xen-3.0.3-127.el5.
With kernel -71 the RHEL 6 domU stopped responding after the migration was complete. Could not ping, xm console, or vnc to it.
With kernel -128 the domU prints a line of message to the serial console:
Clocksource tsc unstable (delta = -125940234 ns)
It continued to work, though. This could then be put in VERIFIED.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.