Description of problem:
I think I reported this once before. The jitter recorded by the clock test depends on the total number of cpus: the time stamp of the last cpu through the loop is compared against the time stamp of cpu #0, the first one through the loop. Working on a patch.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Created attachment 585859 [details]
Added some debug printfs with start and end times
On a small system with the attached patch I get:

[root@uvmid5-sys ~]# ./clocktest
Testing for clock jitter on 48 cpus using CPU_CALLOC
DEBUG time_start=1337614424785436706 time_end=1337614424799660345 total_nsec=14223639
cpu=0 nsec=1353771   cpu=1 nsec=1507386   cpu=2 nsec=1901677   cpu=3 nsec=2050957
cpu=4 nsec=2246597   cpu=5 nsec=2309142   cpu=6 nsec=2418847   cpu=7 nsec=2488762
cpu=8 nsec=2538832   cpu=9 nsec=2592332   cpu=10 nsec=3432573  cpu=11 nsec=3963604
cpu=12 nsec=4027399  cpu=13 nsec=4240009  cpu=14 nsec=4287979  cpu=15 nsec=4386449
cpu=16 nsec=4489709  cpu=17 nsec=4553434  cpu=18 nsec=5522405  cpu=19 nsec=6032876
cpu=20 nsec=6233891  cpu=21 nsec=6283141  cpu=22 nsec=6363866  cpu=23 nsec=6451086
cpu=24 nsec=6586511  cpu=25 nsec=7511572  cpu=26 nsec=7581832  cpu=27 nsec=8124978
cpu=28 nsec=8269608  cpu=29 nsec=8322173  cpu=30 nsec=8414973  cpu=31 nsec=8533763
cpu=32 nsec=8621619  cpu=33 nsec=9510609  cpu=34 nsec=9643709  cpu=35 nsec=10158940
cpu=36 nsec=10333850 cpu=37 nsec=10433930 cpu=38 nsec=10526055 cpu=39 nsec=11590916
cpu=40 nsec=11740792 cpu=41 nsec=12225402 cpu=42 nsec=12377092 cpu=43 nsec=12446897
cpu=44 nsec=12518567 cpu=45 nsec=13501693 cpu=46 nsec=13760364 cpu=47 nsec=14223164
DEBUG: max jitter for pass 0 was 0.012869 (cpu 0,47)
PASSED, largest jitter seen was 0.012869
clock direction test: start time 1337614424, stop time 1337614484, sleeptime 60, delta 0
PASSED
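Working through the numbers above: the reported jitter of 0.012869 s is exactly NSEC(time[47]) - NSEC(time[0]) = 14223164 - 1353771 = 12869393 ns. In the same run the largest single adjacent-cpu step is only about 1.06 ms (cpu 38 to cpu 39: 11590916 - 10526055 = 1064861 ns), and the average step is 12869393 / 47, roughly 0.27 ms. Nearly all of the reported figure is the accumulated cost of walking the process across 47 cpus rather than any disagreement between clocks.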
The following loop from clocktest.c is bogus:

    slow_cpu = fast_cpu = 0;
    for (cpu = 0; cpu < num_cpus; cpu++) {
        nsec = NSEC(time[cpu]);
        if (nsec < NSEC(time[slow_cpu])) {
            slow_cpu = cpu;
        }
        if (nsec > NSEC(time[fast_cpu])) {
            fast_cpu = cpu;
        }
    }
    jitter = ((double)(NSEC(time[fast_cpu]) - NSEC(time[slow_cpu])) /
              (double)NSEC_PER_SEC);

Assume that the clock is *perfect* and that no jitter at all exists, just the actual time required to execute instructions and copy memory to move the thread to a new cpu. A few things to note:

1) The first time through the loop, cpu == slow_cpu == fast_cpu == 0.

2) Consider the first conditional on the second and all subsequent passes through the loop. nsec is always greater than NSEC(time[0]), so the conditional never fires and slow_cpu is always 0 at the end of the loop.

3) Consider the second conditional: each pass compares the current cpu's timestamp against the previous maximum, and since the timestamps increase with every move the conditional always fires. So fast_cpu always ends up being the last cpu (num_cpus - 1).

Clock skew could change the above slightly, but never if the jitter is in fact within bounds. In practice, because SGI has inter-node synchronization of real-time clocks as part of the hub, the times recorded for the cpus always increase linearly with the cpu number.

Now look at the calculation of jitter with the slow_cpu and fast_cpu values substituted:

    jitter = ((double)(NSEC(time[num_cpus - 1]) - NSEC(time[0])) /
              (double)NSEC_PER_SEC);

My point is that even with a *perfect* clock, jitter is going to scale linearly with the number of cpus!! The *average* jitter would be

    jitter = ((double)(NSEC(time[num_cpus - 1]) - NSEC(time[0])) /
              (double)NSEC_PER_SEC) / (double)num_cpus;
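To make the scaling concrete, here is a minimal standalone sketch (not part of clocktest.c) that assumes a perfect clock and a hypothetical fixed per-move cost of about 0.3 ms, loosely matching the average step seen in the debug output above:

    #include <stdio.h>

    #define NSEC_PER_SEC 1000000000.0

    int main(void)
    {
        /* Hypothetical fixed cost of migrating the thread one cpu over. */
        long t_move_nsec = 300000;
        int counts[] = { 32, 48, 4096 };
        int i;

        for (i = 0; i < 3; i++) {
            int n = counts[i];
            /* With a perfect clock, time[n-1] - time[0] is just n-1 moves. */
            double first_to_last = (double)(n - 1) * t_move_nsec / NSEC_PER_SEC;
            double per_move = t_move_nsec / NSEC_PER_SEC;
            printf("%4d cpus: first-to-last %.6f s, per move %.6f s\n",
                   n, first_to_last, per_move);
        }
        return 0;
    }

The per-move figure stays constant while the first-to-last figure grows with the cpu count, which is the quantity the current test reports as "jitter".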
If I understand correctly, the suggested fix is to change the test's requirement to scale with cpu count, rather than impose an absolute value independent of cpu count?
Not precisely. An absolute value between any arbitrary pair of sockets is appropriate; that way any two processes on the system are basically looking at the same clock value.

The current measurement technique (I don't know of another one) moves a process from cpu A to cpu B, and the time (actual wall time) for the move becomes part of the "jitter". What is wrong with the current approach is that on a 16 core system the process moves only 31 times (HT turned on), but on a 2048 core system it is moved 4095 times, so the wall clock time consumed actually moving the process is 132 times greater than in the 16 core case.

Recording the delta time for each move would allow a tighter absolute value on each move, which is actually a stronger guarantee for the customer. I had a sample program that filled in every off-diagonal location in an nr_cpus * nr_cpus matrix, so O(n**2); I found that it wasn't too bad if I was careful to start most jumps from the current cpu.

Does this make sense?

George
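A minimal sketch of the per-move check described above. The NSEC() definition and the per-move limit here are assumptions for illustration, not the shipped clocktest.c code; time[] is assumed to hold one timestamp per cpu, taken right after the thread was migrated to that cpu in ascending order:

    #include <time.h>

    /* Assumed form of the NSEC() helper used in clocktest.c */
    #define NSEC(ts) ((long long)(ts).tv_sec * 1000000000LL + (ts).tv_nsec)

    /* Hypothetical bound on the delta for a single cpu-to-cpu move */
    #define MOVE_LIMIT_NSEC 200000000LL    /* 0.2 s */

    static int check_per_move_deltas(const struct timespec *time, int num_cpus)
    {
        int cpu;

        for (cpu = 1; cpu < num_cpus; cpu++) {
            /* Each move starts from the previous cpu, so this delta is one
             * migration cost plus any real clock disagreement. */
            long long delta = NSEC(time[cpu]) - NSEC(time[cpu - 1]);

            if (delta < 0 || delta > MOVE_LIMIT_NSEC)
                return -1;    /* clocks disagree, or a move took too long */
        }
        return 0;
    }

Because each delta covers exactly one move, the bound does not need to grow with the cpu count, which is the stronger per-pair guarantee described above.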
Would it be an improvement to traverse the cpus (by number), take the sum of the deltas, divide it by the cpu count, and compare it to a limit? Something like:

    total = 0;
    minDelta = maxDelta = NSEC(time[1]) - NSEC(time[0]);
    for (cpu = 1; cpu < num_cpus; cpu++) {
        delta = NSEC(time[cpu]) - NSEC(time[cpu - 1]);
        if (delta < minDelta)
            minDelta = delta;
        if (delta > maxDelta)
            maxDelta = delta;
        total += delta;
    }

Perhaps set a bound on total "jitter" per cpu, then a min and a max bound on the delta? Does some other traversal make sense across all types of systems?
Not resolved in v7-1.6.4; proposing for 6.4.1.
Created attachment 734372 [details]
Changes the jitter measurement to test adjacent cpus and average the result

This patch changes the way jitter is calculated. The old test in essence calculates jitter from the first to the last cpu. This change calculates jitter between adjacent (in numbering) cpus: it adds up the adjacent-cpu jitter on each pass and divides by the number of measurements (the number of cpus minus one). It sets the limit at 0.2 sec of jitter, for both the average and the worst adjacent-cpu measurement.
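The attachment itself is not reproduced here; the following is only a sketch of the calculation the description outlines, reusing the assumed NSEC() macro from the earlier sketch and the 0.2 s limit mentioned above:

    #define MAX_JITTER_SEC 0.2    /* limit from the patch description */

    /* Returns 0 if both the average and the worst adjacent-cpu jitter for
     * this pass are within MAX_JITTER_SEC, nonzero otherwise. */
    static int check_adjacent_jitter(const struct timespec *time, int num_cpus)
    {
        double total = 0.0, worst = 0.0, avg;
        int cpu;

        for (cpu = 1; cpu < num_cpus; cpu++) {
            double delta = (NSEC(time[cpu]) - NSEC(time[cpu - 1])) / 1000000000.0;
            total += delta;
            if (delta > worst)
                worst = delta;
        }
        avg = total / (num_cpus - 1);

        return (avg > MAX_JITTER_SEC || worst > MAX_JITTER_SEC) ? 1 : 0;
    }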
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1139.html