Bug 438342 - Oprofile does not work in R2 or MRG
Summary: Oprofile does not work in R2 or MRG
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel
Version: beta
Hardware: x86_64
OS: All
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Clark Williams
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-03-20 13:56 UTC by IBM Bug Proxy
Modified: 2008-06-03 06:08 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-06-02 22:32:02 UTC
Target Upstream Version:
Embargoed:


Attachments
Patch to make oprofile only use counters that are available (2.11 KB, patch)
2008-04-01 18:16 UTC, William Cohen
no flags


Links
System                        ID     Private  Priority  Status  Summary  Last Updated
IBM Linux Technology Center   43249  0        None      None    None     Never

Description IBM Bug Proxy 2008-03-20 13:56:23 UTC
=Comment: #0=================================================
Vernon Mauery <mauery.com> - 2008-03-18 17:14 EDT
When trying to run oprofile, I see this:
[root@elm3b198 ~]# opcontrol --reset
Signalling daemon... done
[root@elm3b198 ~]# opcontrol --shutdown
Stopping profiling.
Killing daemon.
[root@elm3b198 ~]# opcontrol --vmlinux=/usr/lib/debug/lib/modules/2.6.24.3-29.el5rt/vmlinux
[root@elm3b198 ~]# opcontrol --start
Using default event: CPU_CLK_UNHALTED:100000:0:1:1
/usr/bin/opcontrol: line 1031: /dev/oprofile/0/enabled: No such file or directory
/usr/bin/opcontrol: line 1031: /dev/oprofile/0/event: No such file or directory
/usr/bin/opcontrol: line 1031: /dev/oprofile/0/count: No such file or directory
/usr/bin/opcontrol: line 1031: /dev/oprofile/0/kernel: No such file or directory
/usr/bin/opcontrol: line 1031: /dev/oprofile/0/user: No such file or directory
/usr/bin/opcontrol: line 1031: /dev/oprofile/0/unit_mask: No such file or directory
Using 2.6+ OProfile kernel interface.
Reading module info.
Using log file /var/lib/oprofile/samples/oprofiled.log
Daemon started.
Profiler running.
[root@elm3b198 ~]# opcontrol --stop
Stopping profiling.
[root@elm3b198 ~]# opreport --long-filenames
opreport error: No sample file found: try running opcontrol --dump
or specify a session containing sample files



This bug exhibits similar symptoms to bug #40996, but the patches that were
applied to the kernel to fix that bug are already in the R2/MRG kernels.

I have seen this running 2.6.24.3-29.el5rt and Alan Stevens has seen it running
2.6.24.1-24ibmrt.
=Comment: #1=================================================
Vernon Mauery <mauery.com> - 2008-03-18 17:22 EDT
Since this blocks further investigation of Bug #35584 - RH244819-Multiple
streams degrades network performance, I am marking it as blocking.
=Comment: #2=================================================
Ankita Garg <ankigarg.com> - 2008-03-19 06:20 EDT
Vernon, could you check whether oprofile works for you with a version of the
oprofile package that acme provided some time back:

http://oops.ghostprotocols.net:81/acme/oprofile-0.9.3-6.acme.x86_64.rpm

or can be obtained from here:

http://userweb.kernel.org/~acme/oprofile-0.9.3-6.acme.x86_64.rpm
=Comment: #3=================================================
Ankita Garg <ankigarg.com> - 2008-03-19 06:21 EDT
On HS21 & LS41, oprofile is working fine with the latest R2 kernel without any
userspace package changes. I have not tried with the MRG kernel though.
=Comment: #4=================================================
Vernon Mauery <mauery.com> - 2008-03-19 16:06 EDT
I tried using acme's build of oprofile.  It didn't seem to fix the problem.  I
also tried downloading the source and building it myself.  I always saw the same
errors.

This is on an LS21.  I don't recall what machine type Alan was working on.
=Comment: #5=================================================
Alan P. Stevens <alan_stevens.com> - 2008-03-20 05:41 EDT
(In reply to comment #4)
> This is on an LS21.  I don't recall what machine type Alan was working on.

Vernon, I'm also on an LS21. 

NOTE: Since the Austin Performance Tools tprof is also broken on MRG kernels
(because of the variable tick interval), I now have no working profilers for
MRG / V2...

=Comment: #6=================================================
Ankita Garg <ankigarg.com> - 2008-03-20 05:53 EDT
I tried the oprofile rpms from comment #2 on an LS21 on top of R2. With this
version of the rpm, oprofile worked fine for me. I have not been able to try it
on MRG yet, but since R2 is based on MRG, I wonder what could be missing.

Comment 1 IBM Bug Proxy 2008-03-25 20:08:26 UTC
------- Comment From mauery.com 2008-03-25 16:00 EDT-------
In all my attempts to get oprofile working with various -rt kernels and various
versions of oprofile, I must have messed something up.

I just tried again using a fresh install of MRG plus acme's build of oprofile
and that combination seems to work fine for me.  I am going to close this bug out.

*** This bug has been marked as a duplicate of 40996 ***

Comment 2 IBM Bug Proxy 2008-03-25 21:08:26 UTC
------- Comment From jstultz.com 2008-03-25 17:06 EDT-------
Reopening as we need to track this into MRG.

Comment 3 IBM Bug Proxy 2008-03-31 12:08:41 UTC
------- Comment From sripathi.com 2008-03-31 08:00 EDT-------
RH has put an updated oprofile rpm in their repos. I installed this rpm
(oprofile-0.9.3-16.el5), but I continued to see the same error as before!! I
then installed http://userweb.kernel.org/~acme/oprofile-0.9.3-6.acme.x86_64.rpm
on the same machine and saw that the errors disappear.

So http://userweb.kernel.org/~acme/oprofile-0.9.3-6.acme.x86_64.rpm works, but
oprofile-0.9.3-16.el5 doesn't seem to.

Ankita, could you please verify this? In case I did something wrong...

Comment 4 William Cohen 2008-04-01 16:12:12 UTC
The difference between the two rpms is that the one that works has a patch that
checks /dev/oprofile/ for the various counter directories ([0-9]+). The
patched code generates a bit mask from the counter directories that are
available and only puts events in the available counters.
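As a rough illustration of the approach described above, and not the contents of attachment 299937, the following shell sketch builds such a bit mask by scanning an oprofilefs directory for purely numeric subdirectories. The function names available_counter_mask and first_available_counter are hypothetical; the directory argument defaults to /dev/oprofile, matching the paths in the errors from the original report.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the code in attachment 299937.

# available_counter_mask [DIR]
# Print a decimal bit mask with bit N set for each numeric counter
# directory DIR/N that exists. DIR defaults to /dev/oprofile.
available_counter_mask() {
    local dir="${1:-/dev/oprofile}" mask=0 entry name
    for entry in "$dir"/*; do
        [ -d "$entry" ] || continue       # skip plain files and an unmatched glob
        name=${entry##*/}                 # keep only the last path component
        case $name in
            ''|*[!0-9]*) continue ;;      # only [0-9]+ names are counters
        esac
        mask=$(( $mask | (1 << $name) ))
    done
    echo "$mask"
}

# first_available_counter MASK
# Print the index of the lowest set bit in MASK; return 1 if MASK is 0.
first_available_counter() {
    local mask="$1" i=0
    while [ "$mask" -gt 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            echo "$i"
            return 0
        fi
        mask=$(( mask >> 1 ))
        i=$(( i + 1 ))
    done
    return 1
}
```

In the situation from the original report, where /dev/oprofile/0 was missing, and assuming counters 1 through 3 were still present, available_counter_mask would print 14 and first_available_counter 14 would print 1, so an event would be placed in counter 1 instead of the missing counter 0.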

Comment 5 Clark Williams 2008-04-01 16:45:12 UTC
Sripathi,

Would you boot our latest kernel with nmi_watchdog=0 and see if the version of
oprofile delivered with MRG works?

Clark


Comment 7 William Cohen 2008-04-01 18:16:12 UTC
Created attachment 299937 [details]
Patch to make oprofile only use counters that are available

The attached patch was proposed in November 2006. It received no comments and
was not pushed into upstream oprofile. Kernel changes made after 2.6.19 meant
this was no longer an apparent problem (e.g. I was unable to replicate it on the
2.6.24.3-50.f8 kernel with the watchdog timer). The RT kernel appears to be
doing things differently.

Comment 10 Don Zickus 2008-04-01 21:20:09 UTC
After discussing this with Clark and trying the workaround successfully, I was
asked to summarize our results here.

- In RHEL-5, oprofile works because the oprofile-kernel piece disables the
nmi_watchdog, allowing it to access all 4 performance counters (IOW, either
nmi_watchdog or oprofile can run, but not both).

- Upstream 2.6.19 and later, I added code that allows both oprofile and
nmi_watchdog to run together (at the cost of one perf counter).

- Fedora follows upstream, and upstream has nmi_watchdog disabled by default, so
the oprofile-userspace patch was never needed (unless you enabled nmi_watchdog,
in which case it would be necessary).

- kernel-rt follows RHEL-5 and enables the nmi_watchdog by default thus causing
this bugzilla (and the need for the oprofile-userspace patch).

- Testing with nmi_watchdog=1 on the boot command line failed to reproduce this
failure because nmi_watchdog=1 uses the IO_APIC for the watchdog, which does
_not_ use perfcounters. Booting with nmi_watchdog=2 uses the LOCAL_APIC and
does use the perfcounters.
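To check which watchdog mode a kernel was booted with, a small sketch like the following can read the nmi_watchdog= setting from a kernel command line string (for example, the contents of /proc/cmdline) and classify it according to the analysis above. The helper names are hypothetical, and they only report what was passed at boot; they do not inspect the live watchdog state.

```shell
#!/bin/sh
# Hypothetical helpers; they only parse a command-line string.

# nmi_watchdog_mode CMDLINE
# Print the value of the last nmi_watchdog= parameter in CMDLINE,
# or an empty line if the parameter is absent.
nmi_watchdog_mode() (
    set -f                                  # no glob expansion while splitting
    mode=
    for word in $1; do
        case $word in
            nmi_watchdog=*) mode=${word#nmi_watchdog=} ;;
        esac
    done
    printf '%s\n' "$mode"
)

# describe_nmi_watchdog CMDLINE
# Classify the boot-time watchdog mode per the analysis in this bug.
describe_nmi_watchdog() {
    case $(nmi_watchdog_mode "$1") in
        0)  echo "disabled at boot" ;;
        1)  echo "IO_APIC watchdog: does not use a performance counter" ;;
        2)  echo "LOCAL_APIC watchdog: uses a performance counter" ;;
        '') echo "not set on the command line: kernel default applies" ;;
        *)  echo "unrecognized nmi_watchdog value" ;;
    esac
}
```

On a running system this might be invoked as describe_nmi_watchdog "$(cat /proc/cmdline)".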

The temporary workaround, which avoids needing the userspace patch, is to boot
kernel-rt normally and run the following commands to use oprofile:

# echo 0 > /proc/sys/kernel/nmi_watchdog    # disable nmi_watchdog
# *oprofile stuff*
# opcontrol --deinit                        # unload oprofile driver module
# echo 1 > /proc/sys/kernel/nmi_watchdog    # re-enable nmi_watchdog

Booting with nmi_watchdog=0 is not recommended if you would like to turn the
nmi_watchdog on later, because 'echo 1 > /proc/sys/kernel/nmi_watchdog' has a
bug that prevents turning on the nmi_watchdog for the first time.
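The workaround above could be wrapped in a small shell function. This is a sketch, not part of any shipped package: with_nmi_watchdog_off and the NMI_WATCHDOG and OPCONTROL override variables (added only so it can be exercised without root) are hypothetical. Instead of always writing 1 afterward, it writes back whatever value it read, so a system booted with nmi_watchdog=0 stays at 0 and never hits the first-time enable bug described above.

```shell
#!/bin/sh
# Hypothetical wrapper for the nmi_watchdog workaround in this comment.
# NMI_WATCHDOG and OPCONTROL are assumed override variables, added only so
# the function can be exercised without root; the defaults match the
# commands shown above.

# with_nmi_watchdog_off COMMAND [ARGS...]
with_nmi_watchdog_off() {
    local knob="${NMI_WATCHDOG:-/proc/sys/kernel/nmi_watchdog}"
    local opcontrol="${OPCONTROL:-opcontrol}"
    local saved status
    saved=$(cat "$knob") || return 1      # remember the current setting
    echo 0 > "$knob" || return 1          # disable nmi_watchdog
    "$@"                                  # the "oprofile stuff"
    status=$?
    "$opcontrol" --deinit                 # unload the oprofile driver module
    echo "$saved" > "$knob"               # restore the saved setting
    return "$status"
}
```

For example, with_nmi_watchdog_off sh -c 'opcontrol --start && sleep 30 && opcontrol --stop' would stand in for the *oprofile stuff* step. The driver is unloaded before the watchdog setting is restored, following the order of the commands above.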

I think that covers everything.

Cheers,
Don

Comment 11 IBM Bug Proxy 2008-04-02 22:24:42 UTC
------- Comment From mauery.com 2008-04-02 18:23 EDT-------
I have verified that the 'echo [01] > /proc/sys/kernel/nmi_watchdog' workaround
as described above works with the oprofile-0.9.3-16.el5 package.

Comment 12 Clark Williams 2008-04-07 14:56:26 UTC
We'll need to wait for a user-space update to opcontrol for a permanent fix. 

With a workaround in place, I'm going to drop the severity from urgent to medium.

Clark


Comment 13 Clark Williams 2008-05-02 19:28:11 UTC
We've pushed the RHEL5.2 candidate rpm into the MRG beta repository and will
carry that until MRG RT rebases to RHEL5.2 (from RHEL5.1).



Comment 14 IBM Bug Proxy 2008-05-27 16:56:45 UTC
------- Comment From sripathi.com 2008-05-27 12:52 EDT-------
(In reply to comment #26)
> ------- Comment From williams 2008-05-02 15:28 EST-------
> We've pushed the RHEL5.2 candidate rpm into the MRG beta repository and will
> carry that until MRG RT rebases to RHEL5.2 (from RHEL5.1).

I don't see the rpm under
http://ftp.redhat.com/pub/redhat/linux/beta/MRG/RHEL-5/. Isn't that the place to
find it?

Comment 15 Clark Williams 2008-05-27 22:16:48 UTC
I believe that the 5.2 packages got pulled from the beta repository when RHEL5.2
GA'ed.  Do we need to put it back?

Comment 16 IBM Bug Proxy 2008-05-28 09:56:34 UTC
------- Comment From sripathi.com 2008-05-28 05:48 EDT-------
(In reply to comment #28)
> ------- Comment From williams 2008-05-27 18:16 EST-------
> I believe that the 5.2 packages got pulled from the beta repository when RHEL5.2
> GA'ed.  Do we need to put it back?

Yes, if MRG is going to be supported on RHEL5.1. I thought it was so.

Comment 17 Clark Williams 2008-06-02 22:32:02 UTC
oprofile packages back in the repository, closing

Comment 18 IBM Bug Proxy 2008-06-03 06:08:32 UTC
------- Comment From sripathi.com 2008-06-03 02:03 EDT-------
Closing on our side as well.

