This bug has been copied from Engineering bug #586285 and has been proposed to be backported to 5.5 z-stream (EUS).
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
This change fixes forward time drift which has been observed with 64-bit Red Hat Enterprise Linux 5 virtual machines using PM timer based kernel tick accounting while running on KVM or HYPER-V hypervisor.

Virtual machines that are booted with the "divider=x" kernel parameter set to a value greater than 1 and that show the following line of text in the kernel boot messages are prone to the problem:

  time.c: Using 3.579545 MHz WALL PM GTOD PM timer.

However, this change also uncovered a bug in the Xen hypervisor, possibly causing backward time drift when this change is applied. If such time drift is observed with fully-virtualized Xen guests satisfying the aforementioned conditions, the kernel parameter "pmtimer_fine_grained=0" can be used as a temporary workaround until the hypervisor is updated.

Fixing the hypervisor for Red Hat Enterprise Linux 5 hosts is being tracked by EngineeringBZ #633028.
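The two conditions named in the note (a "divider=x" value greater than 1, and the PM timer line in the kernel boot messages) can be checked mechanically. The following is only a minimal sketch: the helper names `tick_divider` and `is_prone` are hypothetical, the sample strings are illustrative, and treating a missing "divider=" parameter as 1 is an assumption. It does not check the 64-bit or hypervisor conditions.

```python
import re

# Boot message quoted in the technical note (without the trailing period).
PM_TIMER_LINE = "time.c: Using 3.579545 MHz WALL PM GTOD PM timer"


def tick_divider(cmdline: str) -> int:
    """Return the divider= value from a kernel command line.

    Assumption: a missing divider= parameter is treated as 1.
    """
    match = re.search(r"(?:^|\s)divider=(\d+)(?:\s|$)", cmdline)
    return int(match.group(1)) if match else 1


def is_prone(cmdline: str, boot_messages: str) -> bool:
    """True if both conditions from the technical note hold."""
    return tick_divider(cmdline) > 1 and PM_TIMER_LINE in boot_messages


# Illustrative inputs; on a real guest these would come from
# /proc/cmdline and the output of dmesg.
sample_cmdline = "ro root=/dev/VolGroup00/LogVol00 rhgb quiet divider=10 clock=pmtimer"
sample_dmesg = "time.c: Using 3.579545 MHz WALL PM GTOD PM timer.\n"

print(is_prone(sample_cmdline, sample_dmesg))
```

A guest for which this returns True and which shows backward drift on a fully-virtualized Xen host is the case the "pmtimer_fine_grained=0" workaround is meant for.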
in kernel 2.6.18-194.21.1.el5

linux-2.6-time-implement-fine-grained-accounting-for-pm-timer.patch
linux-2.6-time-initialize-tick_nsec-based-on-kernel-parameters.patch
linux-2.6-time-introduce-pmtimer_fine_grained-kernel-parameter.patch
Reproduced this bug with 2.6.18-194.el5 and verified it with 2.6.18-194.24.1.el5, with the following results:

mode  | 2.6.18-194.el5      | AMD: 2.6.18-194.24.1.el5   | Intel: 2.6.18-194.24.1.el5
------+---------------------+----------------------------+---------------------------
TSC   | falls ~3s in 2 mins | 1s advance seen in 2 min   | no drift seen in 2 min
      |                     | 6s advance seen in 20 min  | no drift seen in 20 min
      |                     | 22s advance seen in 80 min | 1s advance seen in 80 min
------+---------------------+----------------------------+---------------------------
PMTMR | falls ~4s in 2 mins | no drift seen in 2 min     | no drift seen in 2 min
      |                     | no drift seen in 20 min    | no drift seen in 20 min
      |                     | no drift seen in 80 min    | no drift seen in 80 min
------+---------------------+----------------------------+---------------------------

AMD host:
processor       : 3
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 2
model name      : AMD Phenom(tm) 9600B Quad-Core Processor
stepping        : 3
cpu MHz         : 1200.000
cache size      : 512 KB
physical id     : 0
siblings        : 4
core id         : 3
cpu cores       : 4
apicid          : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc nonstop_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw
bogomips        : 4587.45
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate [8]

Intel host:
processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz
stepping        : 10
cpu MHz         : 2660.001
cache size      : 3072 KB
physical id     : 0
siblings        : 4
core id         : 3
cpu cores       : 4
apicid          : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm
bogomips        : 5320.14
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

Jiri, do the test results above show that this bug has been fixed?
Chris, could you please reply to comment #6 ? Thanks!
changing needinfo to the person who actually did the patch
Per comment 11, retested this bug as follows:

Test config:
CLI: /usr/libexec/qemu-kvm -M pc -m 2048 -smp 2 -name rhel55-64-amd -monitor stdio -drive file=/root/RHEL5.5-64.img,if=virtio,boot=on -net nic,macaddr=52:00:12:31:4A:16,vlan=0 -net tap,scprit=/etc/qemu-ifup,vlan=0 -serial pty -parallel none -usb -usbdevice tablet -vnc :1 -boot c

# cat /proc/cmdline
ro root=/dev/VolGroup00/LogVol00 rhgb quiet divider=10 clock=pmtimer single

# dmesg|grep time.c
time.c: Using 3.579545 MHz WALL PM GTOD PM timer.
time.c: Detected 2293.700 MHz processor.

1. Reproduce this bug on kernel-2.6.18-194.17.4.el5 on the AMD host:

crash> pd tick_divider
tick_divider = $2 = 10
crash> pd tick_nsec
tick_nsec = $3 = 999848
crash> pd pmtimer_fine_grained
No symbol "pmtimer_fine_grained" in current context.
p: gdb request failed: p pmtimer_fine_grained

Before running the qdly script on the host:
crash> sym lost_count
symbol not found: lost_count
possible alternatives:
  ffffffff804500a8 (b) lost_count.21981
crash> rd -d ffffffff804500a8
ffffffff804500a8: 4

After running the qdly script for 15 mins:
crash> rd -d ffffffff804500a8
ffffffff804500a8: 7062
crash> pd monotonic_base/1000000000
$9 = 1046
crash> pd xtime.tv_sec+wall_to_monotonic.tv_sec
$8 = 1115
crash> pd 1115/60
$3 = 18

Note: guest time is 1 minute and 8 secs ahead of host time
xtime is 1 minute and 9 seconds ahead of monotonic_base

# uptime
04:14:29 up 18 min, 1 user, load average: 0.08, 0.03, 0.01

2. Verify this bug with kernel-2.6.18-194.24.1.el5 on the AMD and Intel hosts:

crash> pd pmtimer_fine_grained
pmtimer_fine_grained = $2 = 1

AMD:

Before:
crash> sym lost_count
symbol not found: lost_count
possible alternatives:
  ffffffff804500a8 (b) lost_count.21984
crash> rd -d ffffffff804500a8
ffffffff804500a8: 961

After 15 mins:
crash> pd monotonic_base/1000000000
$14 = 1214
crash> pd xtime.tv_sec+wall_to_monotonic.tv_sec
$15 = 1214
crash> pd 1214/60
$2 = 20
[root@localhost ~]# uptime
05:08:12 up 20 min, 1 user, load average: 0.00, 0.02, 0.06

Note: no visible difference between guest time and KVM host time
xtime is consistent with monotonic_base

After 30 mins:
[root@dhcp-91-149 ~]# ./qdly_new 17000
Mon Nov 8 04:55:10 EST 2010
delaying qemu PID 19181 by 17000 microseconds
press control C to terminate the test
Mon Nov 8 05:25:10 EST 2010

crash> rd -d ffffffff804500a8
ffffffff804500a8: 14305
crash> pd monotonic_base/1000000000
$9 = 2261
crash> pd xtime.tv_sec+wall_to_monotonic.tv_sec
$10 = 2261
crash> pd 2261/60
$11 = 37
[root@localhost ~]# uptime
05:25:48 up 37 min, 1 user, load average: 0.00, 0.00, 0.00

Note: no visible difference between guest time and KVM host time
xtime is consistent with monotonic_base

Intel:

Before:
crash> sym lost_count
symbol not found: lost_count
possible alternatives:
  ffffffff804500a8 (b) lost_count.21984
crash> rd -d ffffffff804500a8
ffffffff804500a8: 618

After 15 mins:
crash> rd -d ffffffff804500a8
ffffffff804500a8: 6871
crash> pd monotonic_base/1000000000
$12 = 1182
crash> pd xtime.tv_sec+wall_to_monotonic.tv_sec
$13 = 1182
crash> pd 1182/60
$14 = 19
crash> q
[root@localhost ~]# uptime
05:44:31 up 19 min, 1 user, load average: 0.00, 0.00, 0.0

Note: no visible difference between guest time and KVM host time
xtime is consistent with monotonic_base

After 30 mins:
crash> sym lost_count
symbol not found: lost_count
possible alternatives:
  ffffffff804500a8 (b) lost_count.21984
crash> rd -d ffffffff804500a8
ffffffff804500a8: 13786
crash> pd xtime.tv_sec+wall_to_monotonic.tv_sec
$6 = 2077
crash> pd monotonic_base/1000000000
$7 = 2077
crash> pd 2077/60
$8 = 34
[root@localhost ~]# uptime
05:59:30 up 34 min, 1 user, load average: 0.05, 0.01, 0.00

Note: no visible difference between guest time and KVM host time
xtime is consistent with monotonic_base

So this bug has been fixed.
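The comparison used in this verification (xtime.tv_sec+wall_to_monotonic.tv_sec against monotonic_base/1000000000) reduces to a subtraction. The sketch below only replays the values reported in the crash sessions of this comment; the function names and the sample labels are made up for illustration and are not part of any tool used in the test.

```python
def drift_seconds(xtime_monotonic_sec: int, monotonic_base_sec: int) -> int:
    """Seconds by which xtime-derived time leads monotonic_base (negative if it lags)."""
    return xtime_monotonic_sec - monotonic_base_sec


def format_drift(seconds: int) -> str:
    """Render a drift value in the style of the notes in this comment."""
    if seconds == 0:
        return "xtime is consistent with monotonic_base"
    direction = "ahead of" if seconds > 0 else "behind"
    minutes, secs = divmod(abs(seconds), 60)
    return f"xtime is {minutes} min {secs} s {direction} monotonic_base"


# (label, xtime.tv_sec + wall_to_monotonic.tv_sec, monotonic_base / 1000000000)
samples = [
    ("2.6.18-194.17.4.el5, AMD, 15 mins", 1115, 1046),
    ("2.6.18-194.24.1.el5, AMD, 15 mins", 1214, 1214),
    ("2.6.18-194.24.1.el5, AMD, 30 mins", 2261, 2261),
    ("2.6.18-194.24.1.el5, Intel, 15 mins", 1182, 1182),
    ("2.6.18-194.24.1.el5, Intel, 30 mins", 2077, 2077),
]

for label, xtime_sec, base_sec in samples:
    print(f"{label}: {format_drift(drift_seconds(xtime_sec, base_sec))}")
```

For the unpatched kernel this gives 1115 - 1046 = 69 seconds, which matches the "1 minute and 9 seconds" note; every sample from the patched kernel gives zero.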
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0839.html
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,9 +1,5 @@
-This change fixes forward time drift which has been observed with 64-bit Red Hat Enterprise Linux 5 virtual machines using PM timer based kernel tick accounting while running on KVM or HYPER-V hypervisor.
+Previously, a forward time drift was observed on 64-bit Red Hat Enterprise Linux 5 virtual guests which were using a PM timer based kernel tick accounting and running on KVM or HYPER-V hypervisor. Virtual guests that were booted with the divider=x kernel parameter set to a value greater than 1 and that showed the following line of text in the kernel boot messages were the subject of the aforementioned behavior:

-Virtual machines that are booted with the "divider=x" kernel parameter set to a value greater than 1 and that show the following line of text in the kernel boot messages are prone to the problem:
+time.c: Using 3.579545 MHz WALL PM GTOD PM timer

-  time.c: Using 3.579545 MHz WALL PM GTOD PM timer.
+However, this flaw also uncovered a bug in the Xen hypervisor, possibly causing backward time drift. With this update, the fine grained accounting for the PM timer is introduced which eliminates the time difference issues.
-
-However, this change also uncovered a bug in the Xen hypervisor, possibly causing backward time drift when this change is applied. If such time drift is observed with fully-virtualized Xen guests satisfying the aforementioned conditions, the kernel parameter "pmtimer_fine_grained=0" can be used as a temporary workaround until the hypervisor is updated.
-
-Fixing the hypervisor for Red Hat Enterprise Linux 5 hosts is being tracked by EngineeringBZ #633028.
> However, this flaw also uncovered a bug in the Xen hypervisor, possibly
> causing backward time drift. With this update, the fine grained accounting for
> the PM timer is introduced which eliminates the time difference issues.

This adjustment to the technical notes is incomplete. It doesn't specify how to avoid the bug in the Xen hypervisor. I suggest something like this:

With this update, fine grained accounting for the PM timer is introduced which eliminates the time drift issues. However, this fix also uncovered a bug in the Xen hypervisor, which can in turn cause backward time drift. If Xen HVM guests are using the PM timer, it is suggested that the host uses the kernel-xen-2.6.18-194.21.1.el5 package or a newer version.
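Whether a host meets the suggested kernel-xen level can be approximated by comparing the numeric fields of the version-release string. This is a simplified sketch, not RPM's own version comparison: the helper names are hypothetical, and it assumes every field before ".el" is a plain integer (a string with other text there raises ValueError).

```python
import re

# Minimum kernel-xen version-release suggested in the proposed note text.
MINIMUM = "2.6.18-194.21.1.el5"


def numeric_key(version_release: str) -> tuple:
    """Turn e.g. '2.6.18-194.21.1.el5' into (2, 6, 18, 194, 21, 1).

    Simplification: everything from '.el' onward is ignored.
    """
    stem = version_release.split(".el", 1)[0]
    return tuple(int(part) for part in re.split(r"[.-]", stem))


def host_meets_minimum(version_release: str) -> bool:
    """True if the given version-release is at or above MINIMUM."""
    return numeric_key(version_release) >= numeric_key(MINIMUM)


for candidate in ("2.6.18-194.el5", "2.6.18-194.17.4.el5", "2.6.18-194.24.1.el5"):
    print(candidate, host_meets_minimum(candidate))
```

Tuple comparison treats a shorter release such as 194 as older than 194.21.1, which is the intended ordering for these kernel builds. On a real system, the package manager's own comparison should be preferred.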
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -2,4 +2,4 @@

 time.c: Using 3.579545 MHz WALL PM GTOD PM timer

-However, this flaw also uncovered a bug in the Xen hypervisor, possibly causing backward time drift. With this update, the fine grained accounting for the PM timer is introduced which eliminates the time difference issues.
+With this update, fine grained accounting for the PM timer is introduced which eliminates the time drift issues. However, this fix also uncovered a bug in the Xen hypervisor, which can in turn cause backward time drift. If Xen HVM guests are using the PM timer, it is suggested that the host uses the kernel-xen-2.6.18-194.21.1.el5 package or a newer version.