Description of problem:
The customer's dual-core box is occasionally hanging. A core dump shows both CPUs are waiting to acquire bdev_lock. Further inspection reveals bdev_lock seems to have been acquired with two waiters, leading us to infer one of the CPUs acquired the lock twice.

Version-Release number of selected component (if applicable):
2.6.9-34.0.2.ELsmp

How reproducible:
Seems to happen with some regularity in the customer's environment.

Steps to Reproduce:
Unknown

Actual results:
Double acquisition

Expected results:
Correct locking

Additional info:
The lock is used only in a few places, and all of them are balanced. It doesn't look like there was an interrupt that allowed the double-acquisition to happen.
Both vmcores show the double-lock problem pretty clearly. The si_meminfo() function calls nr_blockdev_pages(), which takes the lock:

PID: 3533   TASK: cd06a630  CPU: 1   COMMAND: "cmahostd"
 #0 [c03cbf1c] smp_call_function_interrupt at c0116cab
 #1 [c03cbf24] call_function_interrupt at c02d30a9
    EAX: c038ff00  EBX: c038ff00  ECX: c0326f00  EDX: 00000000  EBP: c03cbfcc
    DS:  007b      ESI: f8b763c1  ES:  007b      EDI: ffffffff
    CS:  0060      EIP: c02d122f  ERR: fffffffb  EFLAGS: 00000282
 #2 [c03cbf58] _spin_lock at c02d122f
 #3 [c03cbf60] nr_blockdev_pages at c0160c76
 #4 [c03cbf68] si_meminfo at c0143d21
 #5 [c03cbf70] update_defense_level at f8b761dd
 #6 [c03cbfc0] defense_timer_handler at f8b763c1
 #7 [c03cbfc4] run_timer_softirq at c012a2cf
 #8 [c03cbfe8] __do_softirq at c0126752
--- <soft IRQ> ---
 #0 [f7ddcde8] do_softirq at c0108143
 #1 [f7ddcdf0] smp_apic_timer_interrupt at c0117483
 #2 [f7ddcdf8] apic_timer_interrupt at c02d30c9
 #3 [f7ddce34] si_meminfo at c0143d21
 #4 [f7ddce3c] meminfo_read_proc at c0189753
 #5 [f7ddcf50] proc_file_read at c0187c2c
 #6 [f7ddcf88] vfs_read at c015a41a
 #7 [f7ddcfa4] sys_read at c015a62b
 #8 [f7ddcfc0] system_call at c02d2688
    EAX: 00000003  EBX: 00000003  ECX: b7ff3000  EDX: 00000400
    DS:  007b      ESI: 080a76a8  ES:  007b      EDI: 00000000
    SS:  007b      ESP: bffff9ec  EBP: bffffa08
    CS:  0073      EIP: 00ba87a2  ERR: 00000003  EFLAGS: 00000246
crash>

In the trace above, si_meminfo() in frame #3 has called nr_blockdev_pages():

crash> dis -r c0143d21
0xc0143d07 <si_meminfo>:        push   %ebx
0xc0143d08 <si_meminfo+1>:      mov    %eax,%ebx
0xc0143d0a <si_meminfo+3>:      mov    0xc0436f3c,%eax
0xc0143d0f <si_meminfo+8>:      movl   $0x0,0x18(%ebx)
0xc0143d16 <si_meminfo+15>:     mov    %eax,0x10(%ebx)
0xc0143d19 <si_meminfo+18>:     call   0xc0143ae8 <nr_free_pages>
0xc0143d1e <si_meminfo+23>:     mov    %eax,0x14(%ebx)
0xc0143d21 <si_meminfo+26>:     call   0xc0160c6e <nr_blockdev_pages>
crash>

and took a timer interrupt while holding the bdev_lock in nr_blockdev_pages().
After servicing the timer interrupt, it performed soft IRQ handling, and defense_timer_handler() called update_defense_level(), which called si_meminfo() again.

Comparing the callers of update_defense_level() in the hanging version of RHEL4 (2.6.9-34.0.2) vs. what's in the current RHEL4 tree, there is this:

RHEL4 -- 2.6.9-34.0.2:

Functions calling this function: update_defense_level

  File                   Function              Line
0 ipv4/ipvs/ip_vs_ctl.c  defense_timer_handler  219 update_defense_level();
1 ipv4/ipvs/ip_vs_ctl.c  proc_do_defense_mode  1363 update_defense_level();

RHEL4 -- currently in CVS:

Functions calling this function: update_defense_level

  File                   Function              Line
0 ipv4/ipvs/ip_vs_ctl.c  defense_work_handler   229 update_defense_level();
1 ipv4/ipvs/ip_vs_ctl.c  proc_do_defense_mode  1369 update_defense_level();

So in 2.6.9-34.0.2, it is called from a timer handler in soft IRQ context (allowing for the potential double-lock scenario), whereas in current RHEL4, it's called from a work queue, which only runs in process context.

The RHEL4 patch that fixed it is Thomas Graf's "linux-2.6.12-network.patch", which went into 2.6.9-34.5, and was related to BZ #174990:

* Fri Mar 17 2006 Jason Baron <jbaron> [2.6.9-34.5]
-add skge net driver (John Linville) [157247 167768]
-knfsd: improve hasing function (Steve Dickson) [176173]
-revert: USB storage change which breaks remote installs using an IBM RSAII adapter (Mike Gahagan) [178271]
-Move ip_vs defense work to keventd (Thomas Graf) [174990]
-Fix [rw]mem_max < [rw]mem_default (Thomas Graf) [174709]
-Remove CAP_NET_ADMIN requirement for INFOQUERY ioctl (Thomas Graf) [174833]
-device-mapper mirrors: fix missing monitoring workqueue destruction (Alasdair Kergon) [180138]
-ia64: fix system crash (Anil Keshavamurthy) [183495]

Time for a customer upgrade...
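
For anyone who wants to see the double-lock scenario in isolation, here is a minimal user-space sketch. It is not kernel code: toy_bdev_lock, toy_trylock(), toy_timer_handler_would_deadlock() and toy_read_meminfo() are hypothetical names made up for illustration, and the "timer interrupt" is simulated by a plain function call on the same thread. A try-lock is used so the model can report the condition instead of hanging the way a real spin_lock() would.

```c
/* User-space model of the self-deadlock; all toy_* names are hypothetical. */
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for bdev_lock: a non-recursive lock with no owner tracking. */
static atomic_flag toy_bdev_lock = ATOMIC_FLAG_INIT;

/* Returns true if the lock was acquired, false if it was already held. */
static bool toy_trylock(void)
{
    return !atomic_flag_test_and_set(&toy_bdev_lock);
}

static void toy_unlock(void)
{
    atomic_flag_clear(&toy_bdev_lock);
}

/*
 * Models the soft IRQ path:
 *   defense_timer_handler -> update_defense_level -> si_meminfo
 *   -> nr_blockdev_pages -> spin_lock(&bdev_lock)
 * Returns true where a real spin_lock() would spin forever.
 */
static bool toy_timer_handler_would_deadlock(void)
{
    if (!toy_trylock())
        return true;    /* lock already held by the code we interrupted */
    toy_unlock();
    return false;
}

/*
 * Models the process-context path:
 *   meminfo_read_proc -> si_meminfo -> nr_blockdev_pages
 * If interrupt_inside is true, the simulated timer fires while the lock
 * is held. Returns true if the handler would have hung.
 */
static bool toy_read_meminfo(bool interrupt_inside)
{
    bool deadlock = false;

    if (!toy_trylock())
        return true;    /* should not happen in this single-threaded model */
    if (interrupt_inside)
        deadlock = toy_timer_handler_would_deadlock();
    toy_unlock();
    return deadlock;
}
```

Calling toy_read_meminfo(true) reports the hang condition, while toy_read_meminfo(false) does not. The lock holder can never release the lock because it is the one being interrupted, which matches the picture in the vmcore.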
RHKL post:

http://post-office.corp.redhat.com/archives/rhkernel-list/2006-January/msg00271.html

 * From: Thomas Graf <tgraf redhat com>
 * To: rhkernel-list redhat com
 * Subject: [RHEL4 U4 BZ174990]: Move ip_vs defense work to keventd
 * Date: Mon, 23 Jan 2006 11:09:50 +0100

update_defense_level() called from timer context uses si_meminfo()
which is not irq-safe. Move update_defense_level() to keventd and
add/reorder the locks accordingly.

The patch is a combination of Andrew Morton's original fix including
Roland Dreier's feedback and the follow up fix from Julian Anastasov
to add/reorder the locks all upstream for quite some time.

...
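
The idea behind moving the work to keventd can be sketched in user space as well. This is only a model under stated assumptions, not the actual patch: wq_bdev_lock, wq_defense_timer(), wq_run_pending_work() and wq_scenario() are hypothetical names, the "work queue" is a single pending flag, and the interrupt is again simulated by a function call on the same thread. The point it shows is that the timer path only queues the work and never touches the lock itself.

```c
/* User-space model of deferring timer work; all wq_* names are hypothetical. */
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag wq_bdev_lock = ATOMIC_FLAG_INIT;  /* stand-in for bdev_lock */
static bool wq_work_pending;                         /* one-slot "work queue" */
static int wq_updates_done;                          /* completed updates */

static bool wq_trylock(void)
{
    return !atomic_flag_test_and_set(&wq_bdev_lock);
}

static void wq_unlock(void)
{
    atomic_flag_clear(&wq_bdev_lock);
}

/* Timer path after the fix: queue the work, take no locks. */
static void wq_defense_timer(void)
{
    wq_work_pending = true;
}

/*
 * Worker path, modelling a work handler run later in process context.
 * Returns true if the deferred update ran.
 */
static bool wq_run_pending_work(void)
{
    if (!wq_work_pending)
        return false;
    if (!wq_trylock())
        return false;           /* leave the work queued for a later pass */
    wq_work_pending = false;
    wq_updates_done++;          /* stands in for update_defense_level() */
    wq_unlock();
    return true;
}

/* Timer fires while the lock is held; the update runs after release. */
static int wq_scenario(void)
{
    if (!wq_trylock())
        return -1;              /* not expected in this model */
    wq_defense_timer();         /* simulated interrupt under the lock */
    wq_unlock();
    wq_run_pending_work();      /* worker runs once the lock is free */
    return wq_updates_done;
}
```

In this model wq_scenario() completes one update with no self-deadlock, because the code that needs the lock no longer runs on top of the code that holds it. A real worker would block or spin briefly on contention from another CPU instead of giving up, so the try-lock here is purely a simplification.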
> The RHEL4 patch that fixed is Thomas Graf's "linux-2.6.12-network.patch",
> which went into 2.6.9-34.5, and was related to BZ #174990:

FYI: Thomas's patch for this issue ended up being a patch-to-a-patch, updating the monstrous "linux-2.6.12-network.patch" with the piece shown in his RHKL post above.
*** This bug has been marked as a duplicate of bug 174990 ***