Red Hat Bugzilla – Attachment 317537 Details for
Bug 463573
Patches to improve timekeeping for RHEL kernels running under VMware.
[patch]
x86_64: Add a minimal TSC based clocksource implementation for 64bit.
minimal-clocksource-for-64bit.patch (text/plain), 8.15 KB, created by
Alok Kataria
on 2008-09-23 21:37:16 UTC
x86_64: Add a minimal TSC based clocksource implementation for 64bit.

From: Alok N Kataria <akataria@vmware.com>

The 2.6.18 64bit RHEL based linux kernel keeps time by counting timer
interrupts.

This is problematic when running in a virtual machine. The VM can be
descheduled for some portion of time. When the VM is rescheduled, the
hypervisor needs to "catch up" delivering timer interrupts so that the
kernel can determine the correct time.

Until the VM is caught up, the kernel's time will be behind, causing
short term divergence of the kernel's time with wallclock time.
Additionally, under certain overcommitment conditions, it may not be
possible for the hypervisor to fully catch up. In this case, the kernel
time can fall behind over the long term.

The solution is to change the kernel's timekeeping algorithm to keep
time based on how much time has elapsed according to a time counter
rather than by counting interrupts. This is similar to the timeofday
algorithm used by clocksource enabled mainline kernels or the RHEL5
32bit kernel.

The time counter that is used to keep time is the virtual TSC. The
virtual TSC is an idealized TSC that does not suffer from the issues
that many physical TSCs suffer from. However, measuring the frequency
of the TSC inside a VM is difficult. So, when running on a VMware
hypervisor, query the hypervisor to discover the TSC frequency.

Note that this new TSC-based time keeping algorithm is enabled by
default only after a VMware hypervisor has been detected, eliminating
any effect when running on non-VMware systems (besides executing the
VMware hypervisor detection code).

Signed-off-by: Alok N Kataria <akataria@vmware.com>
Signed-off-by: Dan Hecht <dhecht@vmware.com>
---

 Documentation/kernel-parameters.txt |    8 ++
 arch/x86_64/kernel/time.c           |  124 +++++++++++++++++++++++++++++++----
 2 files changed, 118 insertions(+), 14 deletions(-)


diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index bd4e2a9..8dd58ca 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1676,6 +1676,14 @@ running once the system is up.
 	thash_entries=	[KNL,NET]
 			Set number of hash buckets for TCP connection
 
+	timekeeping_use_tsc
+			Enable TSC based timekeeping mode, which keeps time
+			using tsc's rather than counting interrupts.
+
+	no_timekeeping_use_tsc
+			Disable autoconfiguring of the above timekeeping
+			mode on a VMware Hypervisor.
+
 	time		Show timing data prefixed to each printk message line
 
 	clocksource=	[GENERIC_TIME] Override the default clocksource
diff --git a/arch/x86_64/kernel/time.c b/arch/x86_64/kernel/time.c
index fab019d..3bf6012 100644
--- a/arch/x86_64/kernel/time.c
+++ b/arch/x86_64/kernel/time.c
@@ -11,6 +11,7 @@
  *  Copyright (c) 2002,2006  Vojtech Pavlik
  *  Copyright (c) 2003  Andi Kleen
  *  Copyright (c) 2008 VMware, Inc. get TSC frequency from the hypervisor.
+				TSC timekeeping gtod mode.
  *  RTC support code taken from arch/i386/kernel/timers/time_hpet.c
  */
 
@@ -91,6 +92,10 @@ unsigned long __wall_jiffies __section_wall_jiffies = INITIAL_JIFFIES;
 struct timespec __xtime __section_xtime;
 struct timezone __sys_tz __section_sys_tz;
 
+/* -1=>disabled, 0=>autoconfigure, 1=>enabled */
+static int timekeeping_use_tsc;
+static cycles_t cycles_per_jiffy, cycles_accounted_limit;
+
 extern unsigned long preset_lpj;
 /*
  * do_gettimeoffset() returns microseconds since last timer interrupt was
@@ -362,21 +367,11 @@ static noinline void handle_lost_ticks(int lost, struct pt_regs *regs)
 #endif
 }
 
-void main_timer_handler(struct pt_regs *regs)
+static void do_timer_account_lost_ticks(struct pt_regs *regs)
 {
-	static unsigned long rtc_update = 0;
 	unsigned long tsc;
 	int delay = 0, offset = 0, lost = 0, i;
 
-/*
- * Here we are in the timer irq handler. We have irqs locally disabled (so we
- * don't need spin_lock_irqsave()) but we don't know if the timer_bh is running
- * on the other CPU, so we need a lock. We also need to lock the vsyscall
- * variables, because both do_timer() and us change them -arca+vojtech
- */
-
-	write_seqlock(&xtime_lock);
-
 	if (vxtime.hpet_address)
 		offset = hpet_readl(HPET_COUNTER);
 
@@ -446,12 +441,65 @@ void main_timer_handler(struct pt_regs *regs)
 		jiffies += (u64)lost - (tick_divider - 1);
 	}
 
+	/* Do the timer stuff */
+	for (i = 0; i < tick_divider; i++)
+		do_timer(regs);
+}
+
+/*
+ * Measure time based on the TSC, rather than counting interrupts.
+ */
+static void do_timer_tsc_timekeeping(struct pt_regs *regs)
+{
+	int i;
+	cycles_t tsc, tsc_accounted, tsc_not_accounted;
+
+	tsc = get_cycles_sync();
+	tsc_accounted = vxtime.last_tsc;
+
+	if (unlikely(tsc < tsc_accounted))
+		return;
+
+	tsc_not_accounted = tsc - tsc_accounted;
+
+	if (tsc_not_accounted > cycles_accounted_limit) {
+		/* Be extra safe and limit the loop below. */
+		tsc_accounted += tsc_not_accounted - cycles_accounted_limit;
+		tsc_not_accounted = cycles_accounted_limit;
+	}
+
+	while (tsc_not_accounted >= cycles_per_jiffy) {
+		for (i = 0; i < tick_divider; i++)
+			do_timer(regs);
+		tsc_not_accounted -= cycles_per_jiffy;
+		tsc_accounted += cycles_per_jiffy;
+	}
+
+	monotonic_base += ((tsc_accounted - vxtime.last_tsc) *
+				1000000 / cpu_khz);
+	vxtime.last_tsc = tsc_accounted;
+}
+
+void main_timer_handler(struct pt_regs *regs)
+{
+	int i;
+	static unsigned long rtc_update = 0;
+
 	/*
-	 * Do the timer stuff.
+	 * Here we are in the timer irq handler. We have irqs locally disabled (so we
+	 * don't need spin_lock_irqsave()) but we don't know if the timer_bh is running
+	 * on the other CPU, so we need a lock. We also need to lock the vsyscall
+	 * variables, because both do_timer() and us change them -arca+vojtech
 	 */
 
+	write_seqlock(&xtime_lock);
+
+	if (timekeeping_use_tsc > 0)
+		do_timer_tsc_timekeeping(regs);
+	else
+		do_timer_account_lost_ticks(regs);
+
 	for (i = 0; i < tick_divider; i++) {
-		do_timer(regs);
 #ifndef CONFIG_SMP
 		update_process_times(user_mode(regs), regs);
 #endif
@@ -1007,11 +1055,42 @@ void __init time_init(void)
 			tsc_khz = vm_tsc_khz;
 			cpu_khz = tsc_khz;
 			preset_lpj = (vm_tsc_khz * 1000) / HZ;
+			if (timekeeping_use_tsc >= 0) {
+
+				/* Enable "lazy" timer emulation. Rather than holding back virtual
+				 * time when timer interrupt delivery falls behind and attempting
+				 * to "catch up", in lazy mode, missed periodic interrupts are
+				 * skipped and virtual time always reflects real time. This is
+				 * possible with timekeeping_use_tsc.
+				 */
+				if (vmware_enable_lazy_timer_emulation())
+					timekeeping_use_tsc = 1;
+				else {
+					printk(KERN_WARNING
+						"time.c: failed to enable lazy timer "
+						"emulation. Disabling tsc based "
+						"timekeeping\n");
+					timekeeping_use_tsc = 0;
+				}
+			}
 		} else
 			printk(KERN_WARNING "time.c: failed to get tsc "
 				"frequency from hypervisor.\n");
 	}
 
+	/* Keep time based on the TSC rather than by counting interrupts. */
+	if (timekeeping_use_tsc > 0) {
+		cycles_per_jiffy = (cpu_khz * 1000) / HZ;
+		/*
+		 * The maximum cycles we will account per
+		 * timer interrupt is 10 minutes.
+		 */
+		cycles_accounted_limit = cycles_per_jiffy * HZ * 60 * 10;
+		tick_nsec = NSEC_PER_SEC / HZ;
+		printk(KERN_INFO
+			"time.c: Using tsc for timekeeping HZ %d\n", HZ);
+	}
+
 	vxtime.mode = VXTIME_TSC;
 	vxtime.quot = (USEC_PER_SEC << US_SCALE) / vxtime_hz;
 	vxtime.tsc_quot = (USEC_PER_MSEC << US_SCALE) / cpu_khz;
@@ -1074,7 +1153,10 @@ void time_init_gtod(void)
 	else
 		vgetcpu_mode = VGETCPU_LSL;
 
-	if (vxtime.hpet_address && notsc) {
+	if (timekeeping_use_tsc > 0) {
+		timetype = "TSC Timekeeping";
+		vxtime.mode = VXTIME_TSC;
+	} else if (vxtime.hpet_address && notsc) {
 		timetype = hpet_use_timer ? "HPET" : "PIT/HPET";
 		if (hpet_use_timer)
 			vxtime.last = hpet_readl(HPET_T0_CMP) - hpet_tick_real;
@@ -1451,3 +1533,17 @@ static int __init divider_setup(char *s)
 
 __setup("divider=", divider_setup);
 #endif
+
+static int __init timekeeping_use_tsc_setup(char *s)
+{
+	timekeeping_use_tsc = 1;
+	return 0;
+}
+__setup("timekeeping_use_tsc", timekeeping_use_tsc_setup);
+
+static int __init no_timekeeping_use_tsc_setup(char *s)
+{
+	timekeeping_use_tsc = -1;
+	return 0;
+}
+__setup("no_timekeeping_use_tsc", no_timekeeping_use_tsc_setup);
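
For readers who want to experiment with the accounting logic outside the kernel, the following is a minimal user-space sketch of the idea behind do_timer_tsc_timekeeping(): on each timer interrupt, convert the cycles elapsed since the last accounted TSC value into whole ticks, carry the remainder forward, and cap the work done per interrupt at ten minutes of cycles. It is not part of the patch. The names struct tsc_clock, tsc_clock_init() and tsc_clock_interrupt() are hypothetical, the TSC value is passed in by the caller instead of being read from hardware, and tick_divider is assumed to be 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model of the patch's accounting state. */
struct tsc_clock {
	uint64_t last_tsc;         /* cycles already converted to ticks */
	uint64_t cycles_per_jiffy; /* TSC cycles in one tick */
	uint64_t cycles_limit;     /* most cycles accounted per interrupt */
	uint64_t jiffies;          /* ticks accounted so far */
};

/* Derive the per-tick cycle count and the 10 minute cap, as time_init() does. */
void tsc_clock_init(struct tsc_clock *c, uint64_t khz, unsigned int hz,
		    uint64_t start_tsc)
{
	assert(khz > 0 && hz > 0);
	c->last_tsc = start_tsc;
	c->cycles_per_jiffy = khz * 1000 / hz;
	c->cycles_limit = c->cycles_per_jiffy * hz * 60 * 10;
	c->jiffies = 0;
}

/*
 * Account one timer interrupt observed at TSC value now_tsc.
 * Returns the number of ticks credited by this interrupt.
 */
uint64_t tsc_clock_interrupt(struct tsc_clock *c, uint64_t now_tsc)
{
	uint64_t accounted = c->last_tsc;
	uint64_t pending;
	uint64_t ticks = 0;

	/* Ignore a counter that appears to have gone backwards. */
	if (now_tsc < accounted)
		return 0;

	pending = now_tsc - accounted;

	/* Bound the loop below: skip ahead past anything over the cap. */
	if (pending > c->cycles_limit) {
		accounted += pending - c->cycles_limit;
		pending = c->cycles_limit;
	}

	/* Credit whole ticks; a partial tick stays pending for next time. */
	while (pending >= c->cycles_per_jiffy) {
		ticks++;
		pending -= c->cycles_per_jiffy;
		accounted += c->cycles_per_jiffy;
	}

	c->jiffies += ticks;
	c->last_tsc = accounted;
	return ticks;
}
```

With a 1 GHz counter and HZ of 1000 (tsc_clock_init(&c, 1000000, 1000, 0)), an interrupt arriving 3.5 million cycles after the start credits three ticks and leaves half a tick pending, so a late or skipped interrupt does not lose time the way pure interrupt counting would.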