Bug 181780
Summary: | Gettimeofday() timer related slowdown and scaling issue | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Andrew Bond <andrew.bond> |
Component: | kernel | Assignee: | Jim Paradis <jparadis> |
Status: | CLOSED ERRATA | QA Contact: | Brian Brock <bbrock> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4.0 | CC: | jbaron, jturner, k.georgiou, peterm, tao |
Hardware: | x86_64 | ||
OS: | Linux | ||
Fixed In Version: | RHSA-2006-0575 | Doc Type: | Bug Fix |
Last Closed: | 2006-08-10 22:18:35 UTC | Type: | --- |
Bug Blocks: | 181409 | ||
Attachments: |
Description
Andrew Bond
2006-02-16 15:30:43 UTC
Created attachment 124758 [details]
program to measure gettimeofday performance and scaling
This is the program I used to generate the gettimeofday() perf data.
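For readers without access to the attachment, a benchmark of this kind can be sketched roughly as follows. This is not the attached program; it is a minimal illustration that assumes POSIX threads, and every name in it (`struct worker`, `usec_between`, `spin`, `bench_gtod`) is hypothetical. Each thread spins on gettimeofday() for a fixed interval and the per-thread call counts are summed into an aggregate calls-per-second figure.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/time.h>

/* Per-thread state; all names here are hypothetical. */
struct worker {
    pthread_t tid;
    double seconds;            /* how long this thread should spin */
    unsigned long long calls;  /* gettimeofday() calls completed */
};

/* Microseconds elapsed from *a to *b. */
long long usec_between(const struct timeval *a, const struct timeval *b)
{
    return (long long)(b->tv_sec - a->tv_sec) * 1000000LL
         + (long long)(b->tv_usec - a->tv_usec);
}

/* Call gettimeofday() in a tight loop until the interval has elapsed. */
static void *spin(void *arg)
{
    struct worker *w = arg;
    struct timeval start, now;
    long long limit = (long long)(w->seconds * 1e6);
    unsigned long long n = 1;  /* local counter avoids cache-line sharing */

    gettimeofday(&start, NULL);
    do {
        gettimeofday(&now, NULL);
        n++;
    } while (usec_between(&start, &now) < limit);

    w->calls = n;
    return NULL;
}

/* Run nthreads workers in parallel for the given number of seconds.
 * Returns aggregate calls per second, or -1.0 on failure. */
double bench_gtod(int nthreads, double seconds)
{
    assert(nthreads > 0 && seconds > 0.0);

    struct worker *w = calloc((size_t)nthreads, sizeof *w);
    if (!w)
        return -1.0;

    int started = 0;
    unsigned long long total = 0;

    for (int i = 0; i < nthreads; i++) {
        w[i].seconds = seconds;
        if (pthread_create(&w[i].tid, NULL, spin, &w[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(w[i].tid, NULL);
        total += w[i].calls;
    }
    free(w);

    return started == nthreads ? (double)total / seconds : -1.0;
}
```

Calling `bench_gtod(n, 2.0)` for n = 1, 2, 4, ... and printing each result gives one data point per thread count. If the timer scales, the figure should grow roughly in proportion to n; a flat line means the calls are serializing somewhere.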
Created attachment 124759 [details]
Chart showing bmark test data from U1, U2, and custom U2.
The U2-PIT test was run with a kernel that had the PM timer config file option
turned off.
Is this an AMD system or Intel?
AMD.
Created attachment 124794 [details]
gettimeofday fixes
I think with this patch, and passing 'nohpet' and 'nopmtimer' at the command
line, the tsc will be used. You can verify the gettimeofday timer via 'dmesg |
grep timer.c'.
I think the straight line is just going to be the default unless the
command-line arguments are passed for AMD systems. It would be interesting to
see how upstream benchmarks too. Thanks.
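Before comparing runs, it can help to confirm that the boot options actually reached the kernel. The following is a small sketch, assuming a Linux system that exposes /proc/cmdline; the function names `has_boot_option` and `read_cmdline` are made up for illustration. It only reports whether an option was passed, not which timer the kernel ended up selecting, so the dmesg check above is still needed for that.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if opt appears as a whole whitespace-delimited token in
 * cmdline, 0 otherwise. Hypothetical helper, not a kernel API. */
int has_boot_option(const char *cmdline, const char *opt)
{
    size_t optlen = strlen(opt);
    const char *p = cmdline;

    assert(optlen > 0);
    while (*p) {
        p += strspn(p, " \t\n");              /* skip separators */
        size_t len = strcspn(p, " \t\n");     /* length of this token */
        if (len == optlen && strncmp(p, opt, len) == 0)
            return 1;
        p += len;
    }
    return 0;
}

/* Read the first line of /proc/cmdline into buf.
 * Returns 0 on success, -1 on failure. */
int read_cmdline(char *buf, size_t size)
{
    FILE *f = fopen("/proc/cmdline", "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)size, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return 0;
}
```

A caller would read the command line once with `read_cmdline()` and then test for "nohpet" and "nopmtimer" separately. Matching whole tokens avoids false positives from longer options that merely contain one of these strings.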
Created attachment 124817 [details]
upstream reference patch
Also, passing 'nopmtimer' and 'nohpet' on the command line with the shipping kernel should get the better scaling behavior.

What issue do the patches in comment #5 and #6 address? This was a regression introduced when the PMTimer changes went into U2. It looks like the regression is probably in time.c, since it exists even when using PIT. Since the PMTimer patches got us in line with upstream, the lack of scalability almost certainly exists there as well (although it would certainly be useful to know this for sure).

The patch in comment #5 addresses the fact that 'unsynchronized_tsc' will almost always return true because the clustermap data structure is not properly initialized. This means that Intel chips never use the tsc for gtod, which they really should. I agree that the 'flatline' in the chart is now going to be the default for most x86_64 systems. However, I suspect the 'PIT' line in the chart is incorrect, and is likely one of the other timers.

The patch in comment #5 may be a good thing to investigate for Intel-related time scalability issues, but this BZ affects AMD, not Intel. AMD doesn't use tsc for timekeeping because it's deemed unreliable. This is why the PMTimer patch went into U2. It would be nice to see a graph of PIT, HPET and PMTimer scalability for U2 and U3 to see if this is truly a generic regression or a regression that only affects one method of timekeeping.

How can I force hpet timer usage in the stock kernel? I'm using RHEL4 U3 .29.
nohpet and nopmtimer: forced use of PIT/TSC according to time.c
nopmtimer only: also forced use of PIT/TSC according to time.c
no options: used PM timer as expected.
Created attachment 124829 [details]
Graph of RHEL4 U3 versus 2.6.15.4
When using PM timer the 2.6.15.4 kernel maps directly with RHEL4 U3. Using
nopmtimer in RHEL4 U3 shows scaling unlike my initial test with RHEL4 U2.
However, 2.6.15.4 with nopmtimer starts over 2x higher at 1 thread (6,736,148
vs 3,145,266).
OK, that looks much better :) I suspect that the tsc code might run faster with this patch: http://marc.theaimsgroup.com/?l=git-commits-head&m=113705338714125&w=2. The comment says it's Intel-specific, but it really isn't. If you think it's important for us to scale better, we might try backporting it. I'm also curious how you encountered this issue in the first place.

I went back and ran the stock U2 kernel with the nopmtimer flag and it performs just like U3 did in my latest comparison graph. It scales linearly starting from 3.1 million gtime()/sec at 1 thread. In my original test with U2 forcing PIT, I had recompiled the U2 kernel with CONFIG_X86_PM_TIMER unset. That is what produced a higher 1-thread number in my first chart, but it exhibited the same flat line across multiple threads.

We had originally been doing some testing with IA-64 gettimeofday() calls to solve a similar non-scaling issue with that architecture. A comparison test was run with an Opteron box running U3 and we were surprised to find that it didn't show any scaling. We stepped back to U2 and U1 to see how they performed and discovered the U1/U2 delta.
Created attachment 128670 [details]
patch to resolve "nopmtimer" not calling the virtual gettimeofday syscall
Committed in stream U4 build 35.3. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/
Created attachment 128895 [details]
Partner help from Andy to plot the new RHEL4 U4 kernel against other data
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2006-0575.html