Bug 483957
Summary: | [FOCUS] [DESTINY] oprofile does not work on nehalem chips | | |
---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | IBM Bug Proxy <bugproxy> |
Component: | realtime-kernel | Assignee: | Red Hat Real Time Maintenance <rt-maint> |
Status: | CLOSED NOTABUG | QA Contact: | David Sommerseth <davids> |
Severity: | high | Priority: | low |
Version: | 1.1 | CC: | bhu, ovasik, williams |
Hardware: | x86_64 | OS: | All |
Doc Type: | Bug Fix | Last Closed: | 2011-09-12 19:21:34 UTC |
Description
IBM Bug Proxy
2009-02-04 11:40:33 UTC
Created attachment 330852 [details]
oprofile: Don't report Nehalem as core_2
Created attachment 330853 [details]
oprofile: Implement Intel architectural perfmon support
Created attachment 330854 [details]
add Nehalem to list of ppro cores
Note to RH: The three patches attached to this bug are ported from Linus' tree to make oprofile work on MRG on Nehalem chips. We have been able to run oprofile with these patches (using the CVS version of the oprofile user-space tools). However, when we stop and restart oprofile, the system seems to crash or hang. We will try to get more information about this problem. So the patches above are required, but we probably need some more fixes to have a completely working oprofile on MRG on Nehalem.

I'm trying this on elm3a112.

I am trying to find the problem with Kiran's kernel. The Java console was hardly useful. After a lot of juggling with the SOL settings in the BIOS, I got it working. Hopefully I will get much better information now.

I recreated the problem with oprofile today and tried to grab more information from the panic that happens. However, so far I have not been able to get anything on the SOL. The panic text scrolls by on the Java console and there is no way to scroll back and see the beginning of the oops/panic. I have kdump enabled, but it doesn't trigger. I have verified that kdump works fine on this hardware when I do "echo c >/proc/sysrq-trigger".

Okay, I gathered a bit more information about the problem using what little text scrolls by on the Java console. I will attach the backtrace screenshot to this bug. The machine probably goes into a deadlock. The backtrace shows nmi_cpu_setup trying to take a spin lock. From the code, it is possibly here:

    static void nmi_cpu_setup(void * dummy)
    {
        int cpu = smp_processor_id();
        struct op_msrs * msrs = &cpu_msrs[cpu];
        spin_lock(&oprofilefs_lock);    <=========
        model->setup_ctrs(msrs);
        spin_unlock(&oprofilefs_lock);

However, the backtrace shows a branch from nmi_cpu_setup+0x0/0x66, which doesn't make sense to me. Also, I can't see the top of the backtrace, so I can't be sure what the problem is.

Created attachment 333010 [details]
Console screen capture when the system 'hangs'.
Thinking of an alternative approach, I wanted to see if oprofile works as-is on 2.6.29-rc4-rt2-tip. However, with this kernel I saw the original problem reported by this bug. Also, trying to stop and restart oprofile resulted in a system hang. Hence it looks like the problem is more deeply rooted. I'll see how this behaves on non-RT kernels on this hardware.

With the 2.6.29-rc6-tip kernel (non-RT), oprofile worked just fine on this machine. However, with the 2.6.29-rc4-rt2-tip kernel it fails as I reported in my previous comment. Hence the problem seems to be RT specific.

------- Comment From kirpraka.com 2009-06-30 02:54 EDT-------
I get the following error when I run oprofile on the 2.6.29.4-23.el5rt MRG kernel:

    [root@elm3a112 kiran]# opcontrol --start
    cpu_type 'unset' is not valid
    you should upgrade oprofile or force the use of timer mode
    cpu_type 'unset' is not valid
    you should upgrade oprofile or force the use of timer mode

The oprofile version I'm using is 0.9.3. I'll try upgrading oprofile to 0.9.4 and check if it works.

------- Comment From kirpraka.com 2009-07-01 07:39 EDT-------
Tried with the oprofile 0.9.5 CVS version. The bug has been fixed. It worked fine on the 2.6.29-based MRG kernel 2.6.29.4-23.el5rt and on the mainline RT kernel 2.6.29.5-rt22.

------- Comment From dvhltc.com 2009-07-01 14:49 EDT-------
Let's discuss at our next MRG Interlock (Jul 13). I plan to look for every rt bug with a RH bug number in the title on that call. However, it wouldn't hurt to send a pointer to this bug to the rhel-rt-ibm mailing list, Cc'ing Clark Williams, and asking how they would like to proceed.

------- Comment From kirpraka.com 2009-10-23 02:13 EDT-------
I tested oprofile on MRG 1.2 based on RHEL 5.4 and it works perfectly fine without any errors.

------- Comment From sripathik.com 2009-10-26 11:10 EDT-------
Okay, so we can close this bug.
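
For anyone reading this later: the "cpu_type 'unset' is not valid" error earlier in this bug is consistent with the user-space tools not recognising the CPU type string that the kernel reports, which would also explain why a newer oprofile made it go away. The sketch below only illustrates that kind of lookup under that assumption. It is not the oprofile source. The names `cpu_table` and `resolve_cpu_type` are hypothetical, and the table contents (including the strings "i386/arch_perfmon" and "i386/core_i7") are assumed examples, not a record of what any particular release supports.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the list of CPU type names that one
 * release of the user-space tools knows about. */
struct cpu_table {
	const char *const *names;
	size_t count;
};

/* Return the matching table entry, or "unset" when the string
 * reported by the kernel is not in the table. */
static const char *resolve_cpu_type(const struct cpu_table *tbl,
				    const char *reported)
{
	assert(tbl != NULL && reported != NULL);
	for (size_t i = 0; i < tbl->count; i++) {
		if (strcmp(tbl->names[i], reported) == 0)
			return tbl->names[i];
	}
	return "unset";
}

/* Illustrative tables only (assumption): an older tool set that
 * predates the newer names, and a newer one that includes them. */
static const char *const older_names[] = {
	"i386/ppro", "i386/core_2",
};
static const char *const newer_names[] = {
	"i386/ppro", "i386/core_2", "i386/arch_perfmon", "i386/core_i7",
};

static const struct cpu_table older_tools = {
	older_names, sizeof older_names / sizeof older_names[0],
};
static const struct cpu_table newer_tools = {
	newer_names, sizeof newer_names / sizeof newer_names[0],
};
```

With these tables, `resolve_cpu_type(&older_tools, "i386/core_i7")` falls through to "unset", while the same call against `newer_tools` returns the recognised name. That would match the behaviour seen here, where the patched kernel needed newer user-space tools.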