Bug 462622
| Summary: | spufs in RHEL5.3: missing context switch notification log | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | IBM Bug Proxy <bugproxy> | ||||||||
| Component: | kernel | Assignee: | Ameet Paranjape <aparanja> | ||||||||
| Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner> | ||||||||
| Severity: | urgent | Docs Contact: | |||||||||
| Priority: | medium | ||||||||||
| Version: | 5.3 | CC: | aparanja, bpeters, cward, dhowells, hannsj_uhl, jjarvis, peterm, syeghiay | ||||||||
| Target Milestone: | rc | Keywords: | OtherQA | ||||||||
| Target Release: | --- | ||||||||||
| Hardware: | other | ||||||||||
| OS: | All | ||||||||||
| Whiteboard: | |||||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||||
| Doc Text: | Story Points: | --- | |||||||||
| Clone Of: | Environment: | ||||||||||
| Last Closed: | 2009-01-20 20:07:13 UTC | Type: | --- | ||||||||
| Regression: | --- | Mount Type: | --- | ||||||||
| Documentation: | --- | CRM: | |||||||||
| Verified Versions: | Category: | --- | |||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||
| Embargoed: | |||||||||||
| Bug Depends On: | |||||||||||
| Bug Blocks: | 439483 | ||||||||||
| Attachments: |
|
||||||||||
|
Description
IBM Bug Proxy
2008-09-17 15:40:48 UTC
Created attachment 316980 [details]
[POWERPC] spufs: add context switch notification log
The kernel freeze deadline was Sept. 15. Please provide a business justification so an exception can be requested. Hello John, as pointed out above this bugzilla is related to the RHEL5.3 feature request as in LTC bug 38861 - RHBZ293611: 201139: Kernel support for Cell Performance Debugging Tool (PDT). . Due to the missing patch from this bugzilla this RHEL5.3 feature request is not usable with the current RHEL5.3 test kernel. . The PDT tool for Cell is an important performance monitoring tool for the Cell platform and vital for customers trying to exploit the Cell hardware in an optimal way. Therefore the enablement of this tool with RHEL5.3 is very important for the Cell platform. . Therefore I would lke to ask you to raise an execption request to establish the kernel enablement of this tool with RHEL5.3. . Thanks in advance for your support. Hi John, here is my best understanding the the customer impact of this patch. Hanns, Christian, please feel free to add more detail. Q. Who is impacted? A. All Cell B.E. customers are deeply concerned about performance and the pdt performance analysis tool represents a significant improvement in their ability to tune their workload. Q. What is the impact if this patch is not included? A. The pdt tool reports the missing log and emits erroneous output. Customers are unable to use the best tool to tune performance on a Cell B.E. system. We expect this to be a serious customer satisfaction issue. Q. Does pdt currently work on 5.2? A. No Q. How do customers currently tune their executables on 5.2? A. Customers currently must use oprofile which provides less granular information. Posted for review: http://post-office.corp.redhat.com/archives/rhkernel-list/2008-September/msg00990.html This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release. (In reply to comment #24) > The kernel maintainer has some concerns about this patch: > > dhowells> my main objection is that opening the switch_log preemptively clears > it, even if someone else is currently using it > <dhowells> that means potential data loss > <dhowells> and anyone can open it; its umode is 0444 > > > Christian, > > Please update so I can reply to the Red Hat kernel maintainer ASAP. Thanks. > We can change the semantics to only clear on first open. I'll cook up a patch. Created attachment 320426 [details] add on patch for review The following patch addresses comments from RKML. The patch applies cleanly on top of el5.119 plus the patch https://bugzilla.linux.ibm.com/attachment.cgi?id=39098. The functional verification is still outstanding because the necessary people are on public holidays. I will report success as soon as I get the information from those people. I've been wanting to address this for some time, seems like now would be appropriate. Here's what I've just posted to cbe-oss-dev: http://patchwork.ozlabs.org/bundle/jk/log-updates/ Patch 3/4 should address David's concerns, without changing the kABI. Patch 1/4 makes the same change to sputrace. Perhaps we'd just need 3/4 added to the RH backport? > I've been wanting to address this for some time, seems like now would be > appropriate. Here's what I've just posted to cbe-oss-dev: > > http://patchwork.ozlabs.org/bundle/jk/log-updates/ > > Patch 3/4 should address David's concerns, without changing the kABI. Patch > 1/4 makes the same change to sputrace. That looks better:-) Note that you should probably use kmalloc() rather than kzalloc() to allocate the buffer - you don't wipe the entries you've copied to userspace, so wiping the buffer up front would seem to be a waste of time. However, now that you've taken out the spinlock, what protects spu_run_fini()'s call to spu_switch_log_notify() against collisions with spu_bind_context() and spu_unbind_context()? run uses ctx->run_mutex, but bind/unbind use cbe_spu_info[node].list_mutex. (In reply to comment #9) > Created an attachment (id=320426) [details] > add on patch for review > > The following patch addresses comments from RKML. > The patch applies cleanly on top of > el5.119 plus the patch > https://bugzilla.linux.ibm.com/attachment.cgi?id=39098. > The functional verification is still outstanding because the necessary > people are on public holidays. > I will report success as soon as I get the information from those people. That patch looks okay. (In reply to comment #29, copying previously-offline reply to bugzilla) Hi David, > Note that you should probably use kmalloc() rather than kzalloc() to > allocate the buffer - you don't wipe the entries you've copied to > userspace, so wiping the buffer up front would seem to be a waste of > time. The existing code uses kzalloc to set the head and tail to NULL, but I agree - would be better to kmalloc + set head and tail explicitly. I'll post a separate patch. > However, now that you've taken out the spinlock, what protects > spu_run_fini()'s call to spu_switch_log_notify() against collisions > with spu_bind_context() and spu_unbind_context()? run uses > ctx->run_mutex, but bind/unbind use cbe_spu_info[node].list_mutex. All context switches are done with ctx->state_mutex held (ie, spu_acquire/spu_release). This is why I moved the switch_log_notify in run.c to spu_run_fini, before we release the context. (In reply to comment #30) > I'll post a separate patch. Additional patch for kzalloc -> kmalloc change is available here: http://patchwork.ozlabs.org/patch/4793/ Christian - shall we send as a single patch? perhaps 3/4, 4/4, and this kmalloc change? I just started a build including 3/4 (took out hch's version), 4/4 and http://patchwork.ozlabs.org/patch/4793/ It wont finish today, I'll give the rpm to PDT on monday. jk, I'll attach them as soon as PDT team has been testing. Would be cool, If we would have commit id's by then ;-) Created attachment 321075 [details]
powerpc-spufs-switch_log-little-fixes.patch
this folded patch addresses all comments from RKML regarding the patch spufs-add-context-switch-notification-log.
See patch header for their upstream commit ids.
This patch goes on top of spufs-add-context-switch-notification-log.patch
*** Bug 49481 has been marked as a duplicate of this bug. *** in kernel-2.6.18-123.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5 Tested on qdc164 - negative.... : -------------------------------------- [shvadron@qdc164 test]$ uname -a Linux qdc164.austin.ibm.com 2.6.18-121.el5 #1 SMP Mon Oct 27 21:52:33 EDT 2008 ppc64 ppc64 ppc64 GNU/Linux [shvadron@qdc164 test]$ ./pdt test PDT using the following environment varilable: - LD_LIBRARY_PATH = /usr/lib64/trace: - PDT_CONFIG_FILE = example_config.xml - PDT_TRACE_OUTPUT = . - PDT_OUTPUT_PREFIX = OK to run: 'test ' ? (y/n) y (PDT) WARNING: Context switch log does not exist. Hello world from SPE! An user event in SPE An user interval in SPE 0ll The value before DMA in SPE is -1 The value received in SPE is 1 Final Value = 100 [shvadron@qdc164 test]$ cat /etc/redhat-release Red Hat Enterprise Linux Server release 5.3 Beta (Tikanga) [shvadron@qdc164 test]$ --------------------------------------------- As you can see from the warning - the context switch notification interface does not work. mmmm- I now see that release -123 is needed, not -121. I will ask for an upgrade. Stay tuned. OK - on release -123 things are fine: 19 0.050771 0.188ms PPE[0] 0300 CONTEXT_SWITCH EventID=300 Processor=2 EventCount=A CallingThread=400023DF220 ....... bugzilla verified with -123 kernel .. closing bugzilla ... > However, now that you've taken out the spinlock, what protects spu_run_fini()'s
> call to spu_switch_log_notify() against collisions with spu_bind_context() and
> spu_unbind_context()? run uses ctx->run_mutex, but bind/unbind use
> cbe_spu_info[node].list_mutex.
ctx->state_mutex must be held over bind and unbind, and spu_run_fini is always called with state mutex held, but drops the mutex (spu_release) after calling spu_switch_log notify.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-0225.html |