Bug 214832
Summary: | getting unlock_cpu_hotplug warning at boot on rhel5-b2 | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Matthew Coffey <mcoffey> |
Component: | kernel | Assignee: | Prarit Bhargava <prarit> |
Status: | CLOSED DUPLICATE | QA Contact: | Brian Brock <bbrock> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 5.0 | CC: | jason_mack, jfeeney, rhentosh |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | athlon | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2006-11-29 20:12:23 UTC | Type: | --- |
Bug Blocks: | 200812 |
Description
Matthew Coffey
2006-11-09 17:43:21 UTC
RHEL5-Beta2-x86_64: BUG: warning at kernel/cpu.c:56/unlock_cpu_hotplug()

Description:
While running ad hoc overnight stress testing, the PE1950 failed the Newburn test on RHEL5-Beta2-x86_64. Later the same issue was observed on the PE840.

Dell's CTCS Newburn SYSLOG:
>> Nov 16 06:39:34 pe1950-r5-b2-rc kernel: BUG: warning at kernel/cpu.c:56/unlock_cpu_hotplug() (Not tainted)
Thu Nov 16 06:40:18 CST 2006: SYSLOG FAILED: on 4/0 after 10h28m1s 4 fail 0 succeed 4 count

While the PE1950 was very sluggish until Newburn exited (cleanly), the system at no time locked. It rebooted fine, and all lights are blue. BIOS is at v1.3.0. Kernel is 2.6.18-1.2747.el5. Mem is at 2.5 GB using 2x 256MB and 2x 1024MB DIMMs.

Steps to Re-Create:
1: On a PE1950 w/ 2 Quad-core Intel CPUs, install and run RHEL5-B2 x86_64.
2: Using Newburn from Dell's RHEL(4) CTCS, run stress overnight.
3: Next morning, the system has flashing fail messages on screen, from Newburn.

I also see this issue on the PE840, another Xeon system. It also seems to be the same on every arch; I can see at least 5 reports here on RH's Bugzilla. In all these cases the function unlock_cpu_hotplug, line 56, is seen as the cause of the BUG. Since the issue always comes down to line 56 of cpu.c, it could be given the common name 'line 56 bug.' During subsequent testing, I noted that this bug is not specific to the Newburn test.
The trace looks like this:

Nov 16 06:39:34 pe1950-r5-b2-rc kernel: BUG: warning at kernel/cpu.c:56/unlock_cpu_hotplug() (Not tainted)
Nov 16 06:39:34 pe1950-r5-b2-rc kernel:
Nov 16 06:39:34 pe1950-r5-b2-rc kernel: Call Trace:
Nov 16 06:39:34 pe1950-r5-b2-rc kernel: [<ffffffff80069632>] show_trace+0x34/0x47
Nov 16 06:39:34 pe1950-r5-b2-rc kernel: [<ffffffff80069657>] dump_stack+0x12/0x17
Nov 16 06:39:34 pe1950-r5-b2-rc kernel: [<ffffffff800a0c60>] unlock_cpu_hotplug+0x47/0x74
Nov 16 06:39:34 pe1950-r5-b2-rc kernel: [<ffffffff882e52aa>] :cpufreq_ondemand:do_dbs_timer+0x11c/0x174
Nov 16 06:39:34 pe1950-r5-b2-rc kernel: [<ffffffff8004b5cc>] run_workqueue+0x94/0xe5
Nov 16 06:39:34 pe1950-r5-b2-rc kernel: [<ffffffff80048018>] worker_thread+0xf0/0x122
Nov 16 06:39:34 pe1950-r5-b2-rc kernel: [<ffffffff800322e7>] kthread+0xf6/0x12a
Nov 16 06:39:34 pe1950-r5-b2-rc kernel: [<ffffffff8005c365>] child_rip+0xa/0x11
Nov 16 06:39:34 pe1950-r5-b2-rc kernel: DWARF2 unwinder stuck at child_rip+0xa/0x11
Nov 16 06:39:35 pe1950-r5-b2-rc kernel: Leftover inexact backtrace:
Nov 16 06:39:36 pe1950-r5-b2-rc kernel: [<ffffffff8009c368>] keventd_create_kthread+0x0/0x61
Nov 16 06:39:36 pe1950-r5-b2-rc kernel: [<ffffffff800321f1>] kthread+0x0/0x12a
Nov 16 06:39:37 pe1950-r5-b2-rc kernel: [<ffffffff8005c35b>] child_rip+0x0/0x11

(Note: b2-rc=b2)

In cpu.c the function is:

void unlock_cpu_hotplug(void)
{
	WARN_ON(recursive != current);
	if (recursive_depth) {
		recursive_depth--;
		return;
	}
	mutex_unlock(&cpu_bitmask_lock);
	recursive = NULL;
}
EXPORT_SYMBOL_GPL(unlock_cpu_hotplug);

It is the second function in cpu.c. Some proposed fixes for this bug are seen in Bug 211301, for the ia64 platform.

Oh. Line 56 is:

	WARN_ON(recursive != current);

Well, ok, the "The cpu.c line 56 bug". So what else can I do to help?

Prarit please review this issue, bring in Konrad as required. If possible resolve this for R4.5.

Dup of 213455.

*** This bug has been marked as a duplicate of 213455 ***
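
For readers unfamiliar with the check on line 56 quoted above, the following is a minimal user-space sketch of the owner-tracking pattern that unlock_cpu_hotplug() appears to use. It is not kernel code and makes no claim about the root cause of this bug. The names struct task, model_lock, model_unlock, and demo_unbalanced_unlock are hypothetical, the lock side is an assumption (only the unlock function is quoted in this report), and the real mutex is omitted. It only illustrates one way the condition recursive != current can become true: the task calling unlock is not the task recorded as the lock holder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct task { const char *name; };

static struct task *owner;   /* plays the role of `recursive` */
static int depth;            /* plays the role of `recursive_depth` */
static int warnings;         /* counts simulated WARN_ON hits */

/* Assumed lock side: record the caller as owner, or nest if it already is. */
static void model_lock(struct task *cur)
{
    if (cur == owner) {
        depth++;
        return;
    }
    /* A real implementation would take a mutex here; the model skips it. */
    owner = cur;
}

/* Mirrors the quoted unlock logic, with `cur` standing in for `current`. */
static void model_unlock(struct task *cur)
{
    if (owner != cur) {      /* analogue of WARN_ON(recursive != current) */
        warnings++;
        printf("warning: unlock by %s, owner is %s\n",
               cur->name, owner ? owner->name : "(none)");
    }
    if (depth) {
        depth--;
        return;
    }
    /* A real implementation would release the mutex here. */
    owner = NULL;
}

/* Returns the number of simulated warnings after a mismatched unlock. */
int demo_unbalanced_unlock(void)
{
    struct task a = { "task-a" }, b = { "task-b" };

    owner = NULL;
    depth = 0;
    warnings = 0;

    model_lock(&a);
    model_unlock(&a);        /* balanced: no warning expected */
    assert(warnings == 0);

    model_lock(&a);
    model_unlock(&b);        /* different task unlocks: warning expected */
    return warnings;
}
```

Calling demo_unbalanced_unlock() from a main() function exercises both cases. In this model an unlock with no lock held would trip the same check, because the recorded owner is then NULL.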