=Comment: #0================================================= Vernon Mauery <mauery.com> - 2008-03-18 17:14 EDT
When trying to run oprofile, I see this:

[root@elm3b198 ~]# opcontrol --reset
Signalling daemon... done
[root@elm3b198 ~]# opcontrol --shutdown
Stopping profiling.
Killing daemon.
[root@elm3b198 ~]# opcontrol --vmlinux=/usr/lib/debug/lib/modules/2.6.24.3-29.el5rt/vmlinux
[root@elm3b198 ~]# opcontrol --start
Using default event: CPU_CLK_UNHALTED:100000:0:1:1
/usr/bin/opcontrol: line 1031: /dev/oprofile/0/enabled: No such file or directory
/usr/bin/opcontrol: line 1031: /dev/oprofile/0/event: No such file or directory
/usr/bin/opcontrol: line 1031: /dev/oprofile/0/count: No such file or directory
/usr/bin/opcontrol: line 1031: /dev/oprofile/0/kernel: No such file or directory
/usr/bin/opcontrol: line 1031: /dev/oprofile/0/user: No such file or directory
/usr/bin/opcontrol: line 1031: /dev/oprofile/0/unit_mask: No such file or directory
Using 2.6+ OProfile kernel interface.
Reading module info.
Using log file /var/lib/oprofile/samples/oprofiled.log
Daemon started.
Profiler running.
[root@elm3b198 ~]# opcontrol --stop
Stopping profiling.
[root@elm3b198 ~]# opreport --long-filenames
opreport error: No sample file found: try running opcontrol --dump or specify a session containing sample files

This bug exhibits similar symptoms to bug #40996, but the patches that were applied to the kernel to fix that bug are already in the R2/MRG kernels. I have seen this running 2.6.24.3-29.el5rt and Alan Stevens has seen it running 2.6.24.1-24ibmrt.

=Comment: #1================================================= Vernon Mauery <mauery.com> - 2008-03-18 17:22 EDT
Since this blocks further investigation of Bug #35584 - RH244819-Multiple streams degrades network performance, I am marking it as blocking.
=Comment: #2================================================= Ankita Garg <ankigarg.com> - 2008-03-19 06:20 EDT
Vernon, could you check whether oprofile works for you with a version of the oprofile package that acme provided some time back:
http://oops.ghostprotocols.net:81/acme/oprofile-0.9.3-6.acme.x86_64.rpm
It can also be obtained from here:
http://userweb.kernel.org/~acme/oprofile-0.9.3-6.acme.x86_64.rpm

=Comment: #3================================================= Ankita Garg <ankigarg.com> - 2008-03-19 06:21 EDT
On HS21 & LS41, oprofile is working fine with the latest R2 kernel without any userspace package changes. I have not tried with the MRG kernel though.

=Comment: #4================================================= Vernon Mauery <mauery.com> - 2008-03-19 16:06 EDT
I tried using acme's build of oprofile. It didn't seem to fix the problem. I also tried downloading the source and building it myself. I always saw the same errors. This is on an LS21. I don't recall what machine type Alan was working on.

=Comment: #5================================================= Alan P. Stevens <alan_stevens.com> - 2008-03-20 05:41 EDT
(In reply to comment #4)
> This is on an LS21. I don't recall what machine type Alan was working on.

Vernon, I'm also on an LS21.

NOTE: Since the Austin Performance Tools tprof is also broken on MRG kernels (because of the variable tick interval), I now have no working profilers for MRG / V2...

=Comment: #6================================================= Ankita Garg <ankigarg.com> - 2008-03-20 05:53 EDT
I tried the oprofile rpms from comment #2 on an LS21 on top of R2. With this version of the rpm, oprofile worked fine for me. I have not been able to try it on MRG yet, but since R2 is based on MRG, I wonder what could be missing.
------- Comment From mauery.com 2008-03-25 16:00 EDT-------
In all my attempts to get oprofile working with various -rt kernels and various versions of oprofile, I must have messed something up. I just tried again using a fresh install of MRG plus acme's build of oprofile, and that combination seems to work fine for me. I am going to close this bug out.

*** This bug has been marked as a duplicate of 40996 ***
------- Comment From jstultz.com 2008-03-25 17:06 EDT------- Reopening as we need to track this into MRG.
------- Comment From sripathi.com 2008-03-31 08:00 EDT-------
RH has put an updated oprofile rpm in their repos. I installed this rpm (oprofile-0.9.3-16.el5), but I continued to see the same error as before!! I then installed http://userweb.kernel.org/~acme/oprofile-0.9.3-6.acme.x86_64.rpm on the same machine and saw that the errors disappeared. So http://userweb.kernel.org/~acme/oprofile-0.9.3-6.acme.x86_64.rpm works, but oprofile-0.9.3-16.el5 doesn't seem to. Ankita, could you please verify this, in case I did something wrong?
The difference between the two rpms is that the one that works has a patch that checks /dev/oprofile/ for the various counter directories ([0-9]+). The patched code generates a bit mask based on the counter directories that are available and only assigns events to those available counters.
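The counter-directory check described above can be sketched in shell roughly as follows. This is not the actual patch: the function name available_counter_mask and its optional directory argument are hypothetical, and the sketch only computes the mask; it does not assign any events.

```shell
# Hypothetical sketch of the counter check (not the actual patch):
# print a bitmask with bit N set for every counter directory N found
# under the oprofilefs mount point.
available_counter_mask() {
    dir=${1:-/dev/oprofile}   # oprofilefs mount point; overridable for testing
    mask=0
    for path in "$dir"/*; do
        # Counters are directories; skip plain files such as "enable" or "dump".
        [ -d "$path" ] || continue
        name=${path##*/}
        case $name in
            ''|*[!0-9]*) continue ;;  # not a [0-9]+ name (e.g. "stats")
            0?*) continue ;;          # names like "01" would be read as octal
        esac
        mask=$(( $mask | (1 << $name) ))
    done
    echo "$mask"
}
```

Running `available_counter_mask /dev/oprofile` prints the mask as a decimal number. On the machine from comment #0, where /dev/oprofile/0 did not exist, bit 0 would be clear, so a tool honoring this mask would skip counter 0 instead of writing to /dev/oprofile/0/enabled.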
Sripathi, would you boot our latest kernel with nmi_watchdog=0 and see whether the version of oprofile delivered with MRG works?

Clark
Created attachment 299937: Patch to make oprofile only use counters that are available
The attached patch was proposed in November 2006. It received no comments and wasn't pushed into upstream oprofile. Changes made to the kernel meant that this was no longer an apparent problem after 2.6.19 (e.g. I was unable to replicate the problem on the 2.6.24.3-50.f8 kernel with the watchdog timer). The RT kernel appears to be doing things differently.
After discussing this with Clark and successfully trying the workaround, I was asked to summarize our results here.

- In RHEL-5, oprofile works because the oprofile kernel piece disables the nmi_watchdog, allowing oprofile to access all 4 performance counters (IOW, either nmi_watchdog or oprofile can run, but not both).
- In upstream 2.6.19 and later, I added code that allows oprofile and nmi_watchdog to run together (at the cost of one perf counter).
- Fedora follows upstream, and upstream has nmi_watchdog disabled by default, so the oprofile-userspace patch was never needed (it would be necessary only if you enabled nmi_watchdog).
- kernel-rt follows RHEL-5 and enables the nmi_watchdog by default, which causes this bugzilla (and the need for the oprofile-userspace patch).
- Testing with nmi_watchdog=1 on the boot command line failed to reproduce this failure because nmi_watchdog=1 uses the IO_APIC for the watchdog, which does _not_ use perf counters. Booting with nmi_watchdog=2 uses the LOCAL_APIC, which does use the perf counters.

A temporary workaround that avoids the userspace patch is to boot kernel-rt normally and run the following commands to use oprofile:

# echo 0 > /proc/sys/kernel/nmi_watchdog    # disable nmi_watchdog
# *oprofile stuff*
# opcontrol --deinit                        # unload oprofile driver module
# echo 1 > /proc/sys/kernel/nmi_watchdog    # re-enable nmi_watchdog

Booting with nmi_watchdog=0 is not recommended if you would like to turn the nmi_watchdog on later, because 'echo 1 > /proc/sys/kernel/nmi_watchdog' has a bug that prevents it from turning on the nmi_watchdog for the first time.

I think that covers everything.

Cheers,
Don
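The workaround above can be wrapped in a small shell function so that the oprofile module is unloaded and the watchdog setting restored even when the profiling commands fail. This is only a sketch, not something from the thread: the function name with_nmi_watchdog_disabled and the NMI_WATCHDOG_CTL override (which lets the logic be tried against a scratch file) are hypothetical. Against the real /proc file it has to run as root.

```shell
# Hypothetical override; defaults to the real control file from the workaround.
NMI_WATCHDOG_CTL=${NMI_WATCHDOG_CTL:-/proc/sys/kernel/nmi_watchdog}

# Hypothetical wrapper: run "$@" with the NMI watchdog turned off.
with_nmi_watchdog_disabled() {
    # Remember the current setting so it can be restored afterwards.
    prev=$(cat "$NMI_WATCHDOG_CTL") || return 1
    # Disable the watchdog so oprofile can use all performance counters.
    echo 0 > "$NMI_WATCHDOG_CTL" || return 1
    # Run the caller's profiling commands (the "oprofile stuff").
    "$@"
    status=$?
    # Unload the oprofile driver module before touching the watchdog again.
    opcontrol --deinit
    # Restore the previous setting instead of writing 1 unconditionally.
    echo "$prev" > "$NMI_WATCHDOG_CTL"
    return "$status"
}
```

For example, `with_nmi_watchdog_disabled sh -c 'opcontrol --start; sleep 30; opcontrol --stop'`. Restoring the saved value rather than always writing 1 leaves a machine booted with nmi_watchdog=0 as it was, which sidesteps the first-time enable bug mentioned above.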
------- Comment From mauery.com 2008-04-02 18:23 EDT------- I have verified that the 'echo [01] > /proc/sys/kernel/nmi_watchdog' workaround as described above works with the oprofile-0.9.3-16.el5 package.
We'll need to wait for a user-space update to opcontrol for a permanent fix. With a workaround in place, I'm going to drop the severity from urgent to medium.

Clark
We've pushed the RHEL5.2 candidate rpm into the MRG beta repository and will carry that until MRG RT rebases to RHEL5.2 (from RHEL5.1).
------- Comment From sripathi.com 2008-05-27 12:52 EDT-------
(In reply to comment #26)
> ------- Comment From williams 2008-05-02 15:28 EST-------
> We've pushed the RHEL5.2 candidate rpm into the MRG beta repository and will
> carry that until MRG RT rebases to RHEL5.2 (from RHEL5.1).

I don't see the rpm under http://ftp.redhat.com/pub/redhat/linux/beta/MRG/RHEL-5/. Isn't that the place to find it?
I believe that the 5.2 packages got pulled from the beta repository when RHEL5.2 GA'ed. Do we need to put it back?
------- Comment From sripathi.com 2008-05-28 05:48 EDT-------
(In reply to comment #28)
> ------- Comment From williams 2008-05-27 18:16 EST-------
> I believe that the 5.2 packages got pulled from the beta repository when RHEL5.2
> GA'ed. Do we need to put it back?

Yes, if MRG is going to be supported on RHEL5.1. I thought it was.
oprofile packages are back in the repository; closing.
------- Comment From sripathi.com 2008-06-03 02:03 EDT------- Closing on our side as well.