433659 – ib_send_bw test program fails on intel-s6e4533-01-mm

Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	openib
Version:	5.2
Hardware:	ia64
OS:	Linux
Priority:	high
Severity:	low
Target Milestone:	rc
Target Release:	---
Assignee:	Doug Ledford
Blocks:	RHEL5u2_relnotes RHEL5u3_relnotes

Reported:	2008-02-20 18:17 UTC by Gurhan Ozen
Modified:	2013-11-04 01:35 UTC
CC List:	5 users
Fixed In Version:	RHBA-2008-0432
Doc Type:	Bug Fix
Doc Text:	(ia64) Running perftest will fail if different CPU speeds are detected. As such, you should disable CPU speed scaling before running perftest.
Last Closed:	2008-05-21 17:25:21 UTC

Attachments
get_clock.c (4.49 KB, text/plain) 2008-04-08 14:01 UTC, Doug Ledford
get_clock.h (2.49 KB, text/plain) 2008-04-08 14:02 UTC, Doug Ledford
possibly fixed get_clock.c (4.93 KB, text/plain) 2008-04-08 14:22 UTC, Doug Ledford

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2008:0432	0	normal	SHIPPED_LIVE	openib bug fix and enhancement update	2008-05-20 16:47:22 UTC

Description Gurhan Ozen 2008-02-20 18:17:33 UTC

Description of problem:
The following error is given on an ia64 box:

------------------------------------------------------------------
 #bytes #iterations    BW peak[MB/sec]    BW average[MB/sec]  
Conflicting CPU frequency values detected: 1466.000000 != 1667.000000
  65536        1000               0.00                  0.00
------------------------------------------------------------------

Note that it only happens when ia64 is on the client side. When it's used as
server, things work fine. 

Version-Release number of selected component (if applicable):
RHEL5.2-Server-20080212.0 tree

How reproducible:
Every time.

Steps to Reproduce:
1. Run ib_send_bw utility where the client is an ia64 node
2.
3.
  
Actual results:


Expected results:


Additional info:
This is a regression

Comment 1 Doug Ledford 2008-03-28 20:39:42 UTC

OK, this isn't an openib issue, it's a kernel issue.  Basically, the output of
/proc/cpuinfo on the 16-way Itanium this bug comes from is not, in any way,
constant.  The nominal cpu MHz is 1466, but by running this command:

while true; do grep "cpu MHz" /proc/cpuinfo | grep 1667; done

you do in fact get an occasional jump up from 1466 to 1667.  What's more, the
bogomips in this file can be totally horked.  These are the 16 bogomips values
from a single cat of /proc/cpuinfo:

BogoMIPS   : 1658.88
BogoMIPS   : 1671.16
BogoMIPS   : 16.35
BogoMIPS   : 16.35
BogoMIPS   : 3301.37
BogoMIPS   : 3309.56
BogoMIPS   : 3309.56
BogoMIPS   : 3309.56
BogoMIPS   : 1671.16
BogoMIPS   : 1683.45
BogoMIPS   : 1662.97
BogoMIPS   : 1667.07
BogoMIPS   : 3309.56
BogoMIPS   : 3309.56
BogoMIPS   : 3309.56
BogoMIPS   : 3301.37

In any case, it really looks like the contents of /proc/cpuinfo on ia64 aren't
reliable, and the perftest program is noticing that and refusing to compute
bandwidth numbers from an inconsistent divisor (the numbers don't have much
value if you don't have a reference between wall time and cpu time).

Comment 4 Prarit Bhargava 2008-04-01 12:21:11 UTC

Changing severity to low.  This Intel whitebox system appears to be the only
system with this issue.  In the past, Intel whiteboxes have shown other strange
behavior.  I suspect that an SCI is being issued during system init and that is
causing problems with the bogomips calibration.

P.

Comment 6 Prarit Bhargava 2008-04-01 12:22:11 UTC

Gurhan -- is this system connected to a serial console?

P.

Comment 7 Gurhan Ozen 2008-04-01 12:51:00 UTC

(In reply to comment #6)
> Gurhan -- is this system connected to a serial console?
> 
> P.

Prarit, 
No, not yet:(
Right now the box is being used by mjenner, you can grab me or him to show you
in the lab where the box is if you are in the office.

Comment 8 Gurhan Ozen 2008-04-02 18:00:10 UTC

Ok, so trying this on another ia64 box, I can get the client program working,
however it also prints out this warning message:

Warning: measured timestamp frequency 399.187 differs from nominal 1594 MHz

Prarit, I'll let you decide what to do about it since I don't know what could be
 causing it or if it's a bug. 

This was, by the way, on the hp-sapphire-02.rhts.boston.redhat.com box, borrowed
from dchapman.

Comment 9 Luming Yu 2008-04-07 05:44:58 UTC

Why does the tool use the bogomips value in /proc/cpuinfo as definitive
information for deciding whether to run?  Instead of using the boot-time data,
the tool should calibrate the value itself to reflect the most recent state.

Comment 10 Doug Ledford 2008-04-07 06:59:38 UTC

The tool doesn't use the bogomips value.  I only posted the bogomips value to
demonstrate how screwed up the values in /proc/cpuinfo are on the machine in
question.  The tool uses the CPU MHz value only, and that value varies on this
particular machine.

Comment 11 Luming Yu 2008-04-07 08:22:32 UTC

It is also an iffy decision to use the CPU MHz value for your test program,
because with DBS and cpuspeed enabled, the CPU MHz value in /proc/cpuinfo is
supposed to track the current CPU p-state, which is adjusted from time to time
based on the workload at that moment.

Comment 13 Luming Yu 2008-04-08 02:30:46 UTC

I think it is not proper to use the CPU MHz value from /proc/cpuinfo in the
ib_send_bw test program, given how changeable it is.  The tool should calibrate
the CPU MHz value by itself.

After bootup, if calibration still yields the bad bogomips values shown in
comment #1, then that is a real problem, and we need to worry about it.

Comment 14 Luming Yu 2008-04-08 02:32:51 UTC

moving it back to openib issue for now.

Comment 15 Doug Ledford 2008-04-08 03:40:18 UTC

In response to comment #13, the program *is* generating its own CPU MHz rating.
 That's the whole point of the message in comment #8.  Contrary to Prarit's
comment #12, the program isn't comparing an itc clock to a cpu clock, it's
comparing the CPU MHz as generated using this method:

/*
 Use linear regression to calculate cycles per microsecond.
 http://en.wikipedia.org/wiki/Linear_regression#Parameter_estimation
*/

versus the value reported in /proc/cpuinfo and is then reporting the variance
whenever the variance is > 1%.
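
For illustration, the regression and the 1% comparison described above can be
sketched as follows.  This is a minimal sketch, not the code from get_clock.c:
the function names estimate_mhz and differs_by_more_than_1pct are hypothetical,
and it assumes the caller has already collected paired samples of elapsed
microseconds and elapsed cycle counts.

```c
#include <assert.h>
#include <stddef.h>

/* Least-squares slope of cycles over usec, i.e. cycles per microsecond (MHz).
 * Hypothetical helper for illustration; returns 0.0 when fewer than two
 * samples are given or the slope is undefined. */
double estimate_mhz(const double *usec, const double *cycles, size_t n)
{
    assert(usec != NULL && cycles != NULL);

    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (size_t i = 0; i < n; i++) {
        sx  += usec[i];
        sy  += cycles[i];
        sxx += usec[i] * usec[i];
        sxy += usec[i] * cycles[i];
    }

    double denom = (double)n * sxx - sx * sx;
    if (n < 2 || denom == 0.0)
        return 0.0;
    return ((double)n * sxy - sx * sy) / denom;
}

/* Nonzero when measured differs from nominal by more than 1% of nominal. */
int differs_by_more_than_1pct(double measured, double nominal)
{
    double diff = measured - nominal;
    double limit = 0.01 * nominal;
    if (diff < 0.0)
        diff = -diff;
    if (limit < 0.0)
        limit = -limit;
    return diff > limit;
}
```

A constant offset in the cycle samples does not affect the result, since only
the slope is used.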

Basically, the program has two checks it performs on CPU MHz.  The first is that
it reads all of the CPU MHz values from /proc/cpuinfo.  It makes sure that all
CPUs report the same speed.  If, in a single reading of /proc/cpuinfo, some CPUs
have one speed and others have another, then it reports the message that
originally caused this bug to be opened.  Once the reading of /proc/cpuinfo has
passed the "all cpus are identical speed" test, the code generates its own
value of CPU MHz based upon the linear regression technique and compares that to
the value reported in the /proc/cpuinfo file.  If the difference between the two
is greater than 1%, you get the second message.
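
The first check could look roughly like the sketch below.  It is an
illustration only, not the perftest source: check_cpu_mhz is a hypothetical
name, it takes the text of /proc/cpuinfo as a string so it can be exercised
without the file, and it assumes lines of the form "cpu MHz : value".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Scan /proc/cpuinfo-style text for "cpu MHz" lines (hypothetical helper).
 * Returns  0 when every CPU reports the same value (stored in *mhz),
 *          1 when two values conflict (*mhz holds the first value seen,
 *            *other the conflicting one),
 *         -1 when no "cpu MHz" line is present. */
int check_cpu_mhz(const char *text, double *mhz, double *other)
{
    assert(text != NULL && mhz != NULL && other != NULL);

    int found = 0;
    const char *line = text;
    while (line != NULL && *line != '\0') {
        double value;
        /* Whitespace in the format matches any run of blanks or tabs. */
        if (sscanf(line, "cpu MHz : %lf", &value) == 1) {
            if (!found) {
                *mhz = value;
                found = 1;
            } else if (value != *mhz) {
                *other = value;
                return 1;
            }
        }
        /* Advance to the start of the next line, if there is one. */
        const char *eol = strchr(line, '\n');
        line = eol ? eol + 1 : NULL;
    }
    return found ? 0 : -1;
}
```

Because all values come from a single pass over the text, a frequency change
between two CPUs' entries is enough to trigger the conflict case.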

Now, I should point out that these same tests have never produced any problems
anywhere other than ia64, so I'm pretty sure the linear regression method is
working properly (at least on i686/x86_64 and probably ppc64 too).  That would
seem to indicate that, contrary to Luming's comment #11, the values in
/proc/cpuinfo are *not* in fact being kept consistent with the current processor
state.

All that makes me think that this is still a kernel problem, not a problem with
the ib test code.

Comment 16 Luming Yu 2008-04-08 05:01:02 UTC

ok, could you share the test case that I can try on my ia64 box ...

Comment 17 Doug Ledford 2008-04-08 14:00:44 UTC

OK, it looks like Prarit's comment #12 was correct.  My mistake on that.  While
looking through the header file get_clock.h from the source code, it appears
that the method by which the code gets the cycle count on ia64 is to access
ar.itc, which I can only assume is the itc clock that Prarit referred to.

I'm attaching get_clock.c and get_clock.h to this report.  These contain the
functions the perftest programs use to calibrate/check the cpu cycle times.

Now, if the itc clock isn't always the same as the cpu clock, is it true that
they are always a set multiple of each other, and if so what are the possible
multiples?  I could write the code so that on ia64 it checks alternative
multiples before declaring the values bad.

Comment 18 Doug Ledford 2008-04-08 14:01:56 UTC

Created attachment 301633
get_clock.c

Comment 19 Doug Ledford 2008-04-08 14:02:23 UTC

Created attachment 301634
get_clock.h

Comment 20 Doug Ledford 2008-04-08 14:22:32 UTC

Created attachment 301635
possibly fixed get_clock.c

This version of get_clock.c attempts to determine if a multiple is in use
between the itc and cpu clocks, and if so adjusts things accordingly.
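
One way such a multiple check might work is sketched below.  The attachment
itself is not reproduced here, so this is an assumption about the approach
rather than its actual logic: find_clock_multiple and its max_multiple
parameter are hypothetical, and the 1% tolerance is borrowed from the check
described in comment #15.  With the values from comment #8 (measured 399.187,
nominal 1594 MHz) and a max_multiple of 8, it returns 4.

```c
#include <assert.h>

/* Returns the smallest k in [1, max_multiple] for which k * measured_mhz is
 * within 1% of nominal_mhz, or 0 when no such multiple exists.
 * Hypothetical helper; the tolerance and search range are assumptions. */
int find_clock_multiple(double measured_mhz, double nominal_mhz,
                        int max_multiple)
{
    assert(max_multiple >= 1);

    if (measured_mhz <= 0.0 || nominal_mhz <= 0.0)
        return 0;

    for (int k = 1; k <= max_multiple; k++) {
        double diff = k * measured_mhz - nominal_mhz;
        if (diff < 0.0)
            diff = -diff;
        if (diff <= 0.01 * nominal_mhz)
            return k;
    }
    return 0;
}
```

A caller could then scale the measured value by the returned multiple before
comparing it to the /proc/cpuinfo value, and treat a return of 0 as a genuine
mismatch.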

Comment 23 Luming Yu 2008-04-09 03:17:43 UTC

I tested it and got the following results.  DBS does make the test results
completely different.  Probably the RHEL 5 kernel doesn't have /proc/cpuinfo
linked to the CPUFREQ driver for retrieving the current cpu frequency.  Another
possible reason is that the current cpu frequency really is different at
calibration time than at the time /proc/cpuinfo is read.  (That is quite
possible, because that is what DBS does to adapt to different loads for power
saving.)

[root@tigerG tmp]# service cpuspeed stop
Disabling ondemand cpu frequency scaling:                  [  OK  ]
[root@tigerG tmp]# ./a.out
proc frequency values detected: 1667.000000 , 1667.000000
proc frequency values detected: 1667.000000 , 1667.000000
proc frequency values detected: 1667.000000 , 1667.000000
proc frequency values detected: 1667.000000 , 1667.000000
proc frequency values detected: 1667.000000 , 1667.000000
proc frequency values detected: 1667.000000 , 1667.000000
proc frequency values detected: 1667.000000 , 1667.000000
proc frequency values detected: 1667.000000 , 1667.000000
proc frequency values detected: 1667.000000 , 1667.000000
proc frequency values detected: 1667.000000 , 1667.000000
proc frequency values detected: 1667.000000 , 1667.000000
proc frequency values detected: 1667.000000 , 1667.000000
proc frequency values detected: 1667.000000 , 1667.000000
proc frequency values detected: 1667.000000 , 1667.000000
proc frequency values detected: 1667.000000 , 1667.000000
[root@tigerG tmp]# service cpuspeed start
Enabling ondemand cpu frequency scaling:                   [  OK  ]
[root@tigerG tmp]# ./a.out
proc frequency values detected: 1466.000000 , 1466.000000
proc frequency values detected: 1466.000000 , 1466.000000
proc frequency values detected: 1466.000000 , 1466.000000
proc frequency values detected: 1466.000000 , 1466.000000
proc frequency values detected: 1466.000000 , 1466.000000
proc frequency values detected: 1466.000000 , 1667.000000
Conflicting CPU frequency values detected: 1466.000000 != 1667.000000
[root@tigerG tmp]#

Comment 24 Doug Ledford 2008-04-09 15:36:51 UTC

OK, I've built a new version of perftest with the modified routine to check for
a clock multiple of the itc.  It will still fail if it detects different cpu
speeds, and there's not much I think we should do about that.  I would be more
inclined to tell people to disable cpu speed scaling during performance runs.

Comment 26 Gurhan Ozen 2008-04-24 04:13:29 UTC

So even on this Intel whitebox I can reproduce the behavior Luming pointed out:

[root@intel-s6e4533-01-mm 2008:8175]# service cpuspeed stop
Disabling ondemand cpu frequency scaling:                  [  OK  ]
[root@intel-s6e4533-01-mm 2008:8175]# ib_send_bw -m 2048 dell-pe1950-03.rhts.boston.redhat.com
------------------------------------------------------------------
                    Send BW Test
Connection type : RC
Inline data is used up to 400 bytes message
  local address:  LID 0x08, QPN 0x7c0406, PSN 0x7747da
  remote address: LID 0x01, QPN 0x0003, PSN 0xaab168
Mtu : 2048
------------------------------------------------------------------
 #bytes #iterations    BW peak[MB/sec]    BW average[MB/sec]  
  65536        1000            3687.79               3687.69
------------------------------------------------------------------
[root@intel-s6e4533-01-mm 2008:8175]# service cpuspeed start
Enabling ondemand cpu frequency scaling:                   [  OK  ]
[root@intel-s6e4533-01-mm 2008:8175]# ib_send_bw -m 2048 dell-pe1950-03.rhts.boston.redhat.com
------------------------------------------------------------------
                    Send BW Test
Connection type : RC
Inline data is used up to 400 bytes message
  local address:  LID 0x08, QPN 0x7d0406, PSN 0xef907e
  remote address: LID 0x01, QPN 0x0004, PSN 0xa5f393
Mtu : 2048
------------------------------------------------------------------
 #bytes #iterations    BW peak[MB/sec]    BW average[MB/sec]  
Conflicting CPU frequency values detected: 1466.000000 != 1667.000000
  65536        1000               0.00                  0.00
------------------------------------------------------------------
[root@intel-s6e4533-01-mm 2008:8175]# 

Shall we release-note this per comment #24?

Comment 28 Don Domingo 2008-04-24 04:39:31 UTC

Thanks Gurhan. Added the following note to the RHEL 5.2 release notes updates:

<quote>
(ia64) Running perftest will fail if different CPU speeds are detected. As such,
you should disable CPU speed scaling before running perftest
</quote>

please advise if any further revisions are required. thanks!

Comment 31 errata-xmlrpc 2008-05-21 17:25:21 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0432.html

Comment 32 Ryan Lerch 2008-08-11 01:22:49 UTC

Tracking this bug for the Red Hat Enterprise Linux 5.3 Release Notes. 

This Release Note is currently located in the Known Issues section.

Comment 33 Ryan Lerch 2008-08-11 01:22:49 UTC

Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.
