Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
For bugs related to Red Hat Enterprise Linux 5 product line. The current stable release is 5.10. For Red Hat Enterprise Linux 6 and above, please visit Red Hat JIRA https://issues.redhat.com/secure/CreateIssue!default.jspa?pid=12332745 to report new issues.

Bug 462622

Summary: spufs in RHEL5.3: missing context switch notification log
Product: Red Hat Enterprise Linux 5 Reporter: IBM Bug Proxy <bugproxy>
Component: kernelAssignee: Ameet Paranjape <aparanja>
Status: CLOSED ERRATA QA Contact: Martin Jenner <mjenner>
Severity: urgent Docs Contact:
Priority: medium    
Version: 5.3CC: aparanja, bpeters, cward, dhowells, hannsj_uhl, jjarvis, peterm, syeghiay
Target Milestone: rcKeywords: OtherQA
Target Release: ---   
Hardware: other   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-01-20 20:07:13 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 439483    
Attachments:
Description Flags
[POWERPC] spufs: add context switch notification log
none
add on patch for review
none
powerpc-spufs-switch_log-little-fixes.patch none

Description IBM Bug Proxy 2008-09-17 15:40:48 UTC
=Comment: #0=================================================
Christian Krafft <KRAFFT.com> - 2008-09-17 10:20 EDT
---Problem Description---
Hello RedHat,

The following patch slipped through the list of required patches for cell.
This patch is required for performance analysis on cell, as it introduces a log
file containing timing information of spu context switches.
The patch has been tested successfully.
 
---Additional Hardware Info---
IBM QS22
 
---uname output---
2.6.18-113.el5
 
Machine Type = CHRP IBM,0793-2RZ
 
---Debugger---
A debugger is not configured
 
---Steps to Reproduce---
[shvadron@qdc211 test]$ ./pdt test
PDT using the following environment varilable:
- LD_LIBRARY_PATH = /usr/lib64/trace:
- PDT_CONFIG_FILE = example_config.xml
- PDT_TRACE_OUTPUT = .
- PDT_OUTPUT_PREFIX =.
OK to run: 'test ' ? (y/n)
y
(PDT) WARNING: Context switch log does not exist.
Hello world from SPE!
An user event in SPE!
An user interval in SPE 0ll
The value before DMA in SPE is -1
The value received in SPE is 1
Final Value = 100
 
Userspace tool common name: Performance Debug Tool

The userspace tool has the following bit modes: both
 
Userspace rpm:
http://www.bsc.es/projects/deepcomputing/linuxoncell/cellsimulator/sdk3.1/SRPMS/pdt-module-2.6.18-
92.el5.src.rpm
 
=Comment: #1=================================================
Christian Krafft <KRAFFT.com> - 2008-09-17 10:23 EDT

[POWERPC] spufs: add context switch notification log

This patch adds the "context switch notification log", so that Performance
Debug Tool (PDT) is working correctly again.
=Comment: #2=================================================
Hanns-Joachim Uhl <hannsj_uhl.com> - 2008-09-17 10:38 EDT
Hello Red Hat,
as already announced at 9/15 in the RHEL5.3 feature request in LTC bug 43388 - 
RHBZ439483- LTC:5.3:201549:kernel: update spufs for Cell in RHEL5.3 to the 
upstream version
this bugzilla is now addressing the problem as outlined again below.
.
During our investigation of the integrated patches into the RHEL5.3 test 
kernel -111 we found that an important patch for a RHEL5.3 feature request is 
not included because we missed to provide it to you at 5/21 for integration 
into RHEL5.3.
.
The feature affected is the 'Kernel support for Cell Performance Debugging 
Tool (PDT)' requested with the RHEL5.3 feature request in LTC bug 38861 - 
RHBZ293611: 201139: Kernel support for Cell Performance Debugging Tool (PDT).
.
The patch "[POWERPC] spufs: add context switch notification log" is missing 
which was included upstream in kernel 2.6.26 with the commit id
5158e9b5218bd3799c9fa8c401ad24d7f0c0a0a1 
to make this capability working.
.
This bugzilla now provides you the backported patch to satisfy this RHEL5.3 
feature request. As pointed out already the attached patch was successfully 
verified using the -113 kernel. 
.
Please note that the above mentioned RHEL5.3 feature request is unusable 
without this patch. Therefore we would like to ask you to integrate this 
patch into the next RHEL5.3 test kernel.
.
Please keep me informed in case of any questions.
Thanks in advance for your support.

Comment 1 IBM Bug Proxy 2008-09-17 15:40:54 UTC
Created attachment 316980 [details]
[POWERPC] spufs: add context switch notification log

Comment 2 John Jarvis 2008-09-17 19:29:18 UTC
The kernel freeze deadline was Sept. 15.  Please provide a business justification so an exception can be requested.

Comment 3 IBM Bug Proxy 2008-09-17 20:00:32 UTC
Hello John,
as pointed out above this bugzilla is related to the RHEL5.3 feature request
as in LTC bug 38861 - RHBZ293611: 201139: Kernel support for Cell Performance
Debugging Tool (PDT).
.
Due to the missing patch from this bugzilla this RHEL5.3 feature request is
not usable with the current RHEL5.3 test kernel.
.
The PDT tool for Cell is an important performance monitoring tool for the Cell
platform and vital for customers trying to exploit the Cell hardware in an
optimal way. Therefore the enablement of this tool with RHEL5.3 is very
important for the Cell platform.
.
Therefore I would lke to ask you to raise an execption request to establish
the kernel enablement of this tool with RHEL5.3.
.
Thanks in advance for your support.

Comment 4 IBM Bug Proxy 2008-09-18 19:20:59 UTC
Hi John, here is my best understanding the the customer impact of this patch.
Hanns, Christian, please feel free to add more detail.

Q. Who is impacted?
A. All Cell B.E. customers are deeply concerned about performance and the pdt
performance analysis tool represents a significant improvement in their ability
to tune their workload.

Q. What is the impact if this patch is not included?
A. The pdt tool reports the missing log and emits erroneous output. Customers
are unable to use the best tool to tune performance on a Cell B.E. system. We
expect this to be a serious customer satisfaction issue.

Q. Does pdt currently work on 5.2?
A. No

Q. How do customers currently tune their executables on 5.2?
A. Customers currently must use oprofile which provides less granular information.

Comment 7 RHEL Program Management 2008-09-22 12:52:47 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 8 IBM Bug Proxy 2008-10-07 17:50:28 UTC
(In reply to comment #24)
> The kernel maintainer has some concerns about this patch:
>
> dhowells> my main objection is that opening the switch_log preemptively clears
> it, even if someone else is currently using it
> <dhowells> that means potential data loss
> <dhowells> and anyone can open it; its umode is 0444
>
>
> Christian,
>
> Please update so I can reply to the Red Hat kernel maintainer ASAP.  Thanks.
>

We can change the semantics to only clear on first open.  I'll cook up a patch.

Comment 9 IBM Bug Proxy 2008-10-15 12:23:43 UTC
Created attachment 320426 [details]
add on patch for review

The following patch addresses comments from RKML.
The patch applies cleanly on top of 
el5.119 plus the patch https://bugzilla.linux.ibm.com/attachment.cgi?id=39098.
The functional verification is still outstanding because the necessary people are on public holidays.
I will report success as soon as I get the information from those people.

Comment 10 IBM Bug Proxy 2008-10-16 00:40:56 UTC
I've been wanting to address this for some time, seems like now would be appropriate. Here's what I've just posted to cbe-oss-dev:

http://patchwork.ozlabs.org/bundle/jk/log-updates/

Patch 3/4 should address David's concerns, without changing the kABI. Patch 1/4 makes the same change to sputrace.

Perhaps we'd just need 3/4 added to the RH backport?

Comment 11 David Howells 2008-10-16 12:28:41 UTC
> I've been wanting to address this for some time, seems like now would be
> appropriate. Here's what I've just posted to cbe-oss-dev:
>
> http://patchwork.ozlabs.org/bundle/jk/log-updates/
>
> Patch 3/4 should address David's concerns, without changing the kABI. Patch
> 1/4 makes the same change to sputrace.

That looks better:-)

Note that you should probably use kmalloc() rather than kzalloc() to allocate the buffer - you don't wipe the entries you've copied to userspace, so wiping the buffer up front would seem to be a waste of time.

However, now that you've taken out the spinlock, what protects spu_run_fini()'s call to spu_switch_log_notify() against collisions with spu_bind_context() and spu_unbind_context()?  run uses ctx->run_mutex, but bind/unbind use cbe_spu_info[node].list_mutex.

Comment 12 David Howells 2008-10-16 12:30:02 UTC
(In reply to comment #9)
> Created an attachment (id=320426) [details]
> add on patch for review
> 
> The following patch addresses comments from RKML.
> The patch applies cleanly on top of 
> el5.119 plus the patch
> https://bugzilla.linux.ibm.com/attachment.cgi?id=39098.
> The functional verification is still outstanding because the necessary
> people are on public holidays.
> I will report success as soon as I get the information from those people.

That patch looks okay.

Comment 13 IBM Bug Proxy 2008-10-17 12:17:34 UTC
(In reply to comment #29, copying previously-offline reply to bugzilla)
Hi David,

> Note that you should probably use kmalloc() rather than kzalloc() to
> allocate the buffer - you don't wipe the entries you've copied to
> userspace, so wiping the buffer up front would seem to be a waste of
> time.

The existing code uses kzalloc to set the head and tail to NULL, but I
agree - would be better to kmalloc + set head and tail explicitly.

I'll post a separate patch.

> However, now that you've taken out the spinlock, what protects
> spu_run_fini()'s call to spu_switch_log_notify() against collisions
> with spu_bind_context() and spu_unbind_context()?  run uses
> ctx->run_mutex, but bind/unbind use cbe_spu_info[node].list_mutex.

All context switches are done with ctx->state_mutex held (ie,
spu_acquire/spu_release). This is why I moved the switch_log_notify in
run.c to spu_run_fini, before we release the context.


(In reply to comment #30)
> I'll post a separate patch.

Additional patch for kzalloc -> kmalloc change is available here:

http://patchwork.ozlabs.org/patch/4793/

Christian - shall we send as a single patch? perhaps 3/4, 4/4, and this kmalloc change?

Comment 14 IBM Bug Proxy 2008-10-17 17:23:20 UTC
I just started a build including 3/4 (took out hch's version), 4/4 and
http://patchwork.ozlabs.org/patch/4793/
It wont finish today, I'll give the rpm to PDT on monday.

jk, I'll attach them as soon as PDT team has been testing.
Would be cool, If we would have commit id's by then ;-)

Comment 15 IBM Bug Proxy 2008-10-21 18:41:01 UTC
Created attachment 321075 [details]
powerpc-spufs-switch_log-little-fixes.patch

this folded patch addresses all comments from RKML regarding the patch spufs-add-context-switch-notification-log.
See patch header for their upstream commit ids.
This patch goes on top of spufs-add-context-switch-notification-log.patch

Comment 16 IBM Bug Proxy 2008-10-30 14:51:10 UTC
*** Bug 49481 has been marked as a duplicate of this bug. ***

Comment 18 Don Zickus 2008-11-12 16:37:22 UTC
in kernel-2.6.18-123.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 19 IBM Bug Proxy 2008-11-13 17:01:56 UTC
Tested on qdc164 - negative....  :
--------------------------------------
[shvadron@qdc164 test]$ uname -a
Linux qdc164.austin.ibm.com 2.6.18-121.el5 #1 SMP Mon Oct 27 21:52:33 EDT 2008 ppc64 ppc64 ppc64 GNU/Linux
[shvadron@qdc164 test]$ ./pdt test
PDT using the following environment varilable:
- LD_LIBRARY_PATH = /usr/lib64/trace:
- PDT_CONFIG_FILE = example_config.xml
- PDT_TRACE_OUTPUT = .
- PDT_OUTPUT_PREFIX =
OK to run: 'test ' ? (y/n)
y
(PDT) WARNING: Context switch log does not exist.
Hello world from SPE!
An user event in SPE
An user interval in SPE 0ll
The value before DMA in SPE is -1
The value received in SPE is 1
Final Value = 100
[shvadron@qdc164 test]$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.3 Beta (Tikanga)
[shvadron@qdc164 test]$
---------------------------------------------

As you can see from the warning - the context switch notification interface does not work.


mmmm- I now see that release -123 is needed, not -121. I will ask for an upgrade. Stay tuned.

Comment 21 IBM Bug Proxy 2008-11-13 17:21:45 UTC
OK - on release -123 things are fine:

19    0.050771      0.188ms   PPE[0] 0300 CONTEXT_SWITCH EventID=300 Processor=2 EventCount=A CallingThread=400023DF220 .......

Comment 22 IBM Bug Proxy 2008-11-13 17:31:02 UTC
bugzilla verified with -123 kernel .. closing bugzilla ...

Comment 24 IBM Bug Proxy 2009-01-06 01:01:13 UTC
> However, now that you've taken out the spinlock, what protects spu_run_fini()'s
> call to spu_switch_log_notify() against collisions with spu_bind_context() and
> spu_unbind_context()?  run uses ctx->run_mutex, but bind/unbind use
> cbe_spu_info[node].list_mutex.

ctx->state_mutex must be held over bind and unbind, and spu_run_fini is always called with state mutex held, but drops the mutex (spu_release) after calling spu_switch_log notify.

Comment 25 errata-xmlrpc 2009-01-20 20:07:13 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html