Bug 190616 - SMP spinlock problems with multiple devices
Status: CLOSED NOTABUG
Product: Fedora
Classification: Fedora
Component: kernel
Version: 5
Hardware: i386 Linux
Priority: medium, Severity: high
Assigned To: Kernel Maintainer List
QA Contact: Brian Brock
Reported: 2006-05-03 18:58 EDT by Alan McIvor
Modified: 2007-11-30 17:11 EST
Doc Type: Bug Fix
Last Closed: 2006-10-01 17:43:17 EDT

Description Alan McIvor 2006-05-03 18:58:52 EDT
Description of problem:

On a P4 3 GHz machine running an SMP kernel, if there are multiple
devices in the machine, the system sooner or later panics as
shown below. With a small number of devices (fewer than 5) it does not
appear to happen at all. With 10 or more devices it happens straight away.

There is a unique fgBT878 process for each device. If the process is
locked to a CPU by setting its affinity then the problem does not
occur. 
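
For reference, locking a process to a CPU as described above can be done from userspace with sched_setaffinity(). The following is a minimal sketch, not the code used by fgBT878; the helper names pin_to_cpu() and allowed_cpu_count() are hypothetical and chosen only for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* needed for cpu_set_t macros and sched_setaffinity() */
#endif
#include <assert.h>
#include <sched.h>

/* Pin the calling thread to a single CPU.
 * Returns 0 on success, -1 on error (errno is set by sched_setaffinity). */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* pid 0 means "the calling thread" */
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Number of CPUs the calling thread is allowed to run on, or -1 on error. */
int allowed_cpu_count(void)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    return CPU_COUNT(&set);
}
```

Calling pin_to_cpu() early in each process, before it opens its device, keeps that process on one CPU for its lifetime. This hides the symptom reported here but does not address the underlying fault.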

No problems occur when running a UP kernel. Nor do they happen with
Fedora Core 2.


BUG: spinlock wrong CPU on CPU#1, fgBT878/1849 (Not tainted)
 lock: d01b7ef0, .magic: dead4ead, .owner: fgBT878/1849, .owner_cpu: 0
 [<c01d6501>] spin_bug+0x87/0xe9
 [<c01d65ba>] _raw_spin_unlock+0x57/0x6c
 [<c02f2232>] _spin_unlock_irqrestore+0x8/0xc
 [<d01b0ab0>] bttv_do_ioctl+0x3d7/0x74e [bt878_c]
 [<c011d6a9>] activate_task+0x9d/0xaa
 [<c011da09>] try_to_wake_up+0x353/0x35d
 [<d01aa34d>] video_usercopy+0x119/0x18d [videodev]
 [<c011c78f>] __wake_up_common+0x2f/0x53
 [<c02f2156>] __down+0xce/0x107
 [<c0136bb7>] hrtimer_cancel+0xa/0x10
 [<c02f1b37>] schedule_hrtimer+0x33/0x6e
 [<c011da13>] default_wake_function+0x0/0xc
 [<d01b06d6>] bttv_ioctl+0xe/0x11 [bt878_c]
 [<d01b06d9>] bttv_do_ioctl+0x0/0x74e [bt878_c]
 [<c01717c3>] do_ioctl+0x47/0x5d
 [<c0171a23>] vfs_ioctl+0x24a/0x25c
 [<c0171a7d>] sys_ioctl+0x48/0x5f
 [<c0103d25>] sysenter_past_esp+0x56/0x79
Kernel panic - not syncing: bad locking
 [<c01234b6>] panic+0x3e/0x174
 [<c01d6524>] spin_bug+0xaa/0xe9
 [<c01d65ba>] _raw_spin_unlock+0x57/0x6c
 [<c02f2232>] _spin_unlock_irqrestore+0x8/0xc
 [<d01b0ab0>] bttv_do_ioctl+0x3d7/0x74e [bt878_c]
 [<c011d6a9>] activate_task+0x9d/0xaa
 [<c011da09>] try_to_wake_up+0x353/0x35d
 [<d01aa34d>] video_usercopy+0x119/0x18d [videodev]
 [<c011c78f>] __wake_up_common+0x2f/0x53
 [<c02f2156>] __down+0xce/0x107
 [<c0136bb7>] hrtimer_cancel+0xa/0x10
 [<c02f1b37>] schedule_hrtimer+0x33/0x6e
 [<c011da13>] default_wake_function+0x0/0xc
 [<d01b06d6>] bttv_ioctl+0xe/0x11 [bt878_c]
 [<d01b06d9>] bttv_do_ioctl+0x0/0x74e [bt878_c]
 [<c01717c3>] do_ioctl+0x47/0x5d
 [<c0171a23>] vfs_ioctl+0x24a/0x25c
 [<c0171a7d>] sys_ioctl+0x48/0x5f
 [<c0103d25>] sysenter_past_esp+0x56/0x79
<0>BUG: spinlock lockup on CPU#0, swapper/0, d01b7ef0 (Not tainted)
 [<c01d6727>] _raw_spin_lock+0xb9/0xd7
 [<d01b0263>] bttv_irq+0x36d/0x42e [bt878_c]
 [<c01456fe>] handle_IRQ_event+0x23/0x4c
 [<c01457b4>] __do_IRQ+0x8d/0xdd
 [<c0105e8e>] do_IRQ+0x60/0x7b
 =======================
 [<c010474e>] common_interrupt+0x1a/0x20
 [<c0102f2e>] mwait_idle+0x1f/0x33
 [<c0102ef6>] cpu_idle+0x8f/0xa8
 [<c03c8715>] start_kernel+0x2fe/0x304

Version-Release number of selected component (if applicable):

kernel-2.6.15-1.2054_FC5smp

Also happens using 2.6.16-1.2096_FC5smp

How reproducible:

It always happens.

Steps to Reproduce:
1. Install 10 or more devices in the machine
2. Start processing on all devices (one fgBT878 process per device)
  
Actual results:

Panic as described above

Expected results:

Normal operation


Additional info:

The fgBT878 processes run under SCHED_FIFO at the highest priority.
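
For readers unfamiliar with this setup, a process typically requests SCHED_FIFO at the highest priority roughly as follows. This is a minimal sketch under the assumption that the standard POSIX scheduling calls are used; set_fifo_max_priority() is a hypothetical helper name, and the actual fgBT878 code is not shown in this report.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>

/* Ask for SCHED_FIFO at the highest priority the system allows.
 * Returns 0 on success, or the errno value on failure
 * (typically EPERM when the caller lacks the needed privilege). */
int set_fifo_max_priority(void)
{
    struct sched_param sp;
    int max = sched_get_priority_max(SCHED_FIFO);

    if (max == -1)
        return errno;
    sp.sched_priority = max;
    /* pid 0 means "the calling process" */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        return errno;
    /* Sanity check: the policy change took effect. */
    assert(sched_getscheduler(0) == SCHED_FIFO);
    return 0;
}
```

The call normally requires root or an equivalent capability, so an EPERM result is expected for an unprivileged user.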
Comment 1 Alan McIvor 2006-10-01 17:43:17 EDT
The problem was a fault in the condition being passed to
wait_event_interruptible_timeout().
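
The exact fault is not recorded here. To illustrate why the condition matters, the sketch below is a simplified userspace model of a timed wait on a condition, not the kernel macro itself: it polls instead of sleeping on a wait queue, and it does not model interruption by a signal. The names wait_cond_timeout(), struct dev_state, frame_is_done(), never_true() and the 1 ms tick are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define TICK_NS 1000000L   /* hypothetical tick length: 1 ms */

typedef bool (*cond_fn)(void *arg);

/* Wait until cond(arg) is true or timeout_ticks have elapsed.
 * Returns the ticks remaining (at least 1) if the condition became true,
 * or 0 if the timeout elapsed with the condition still false. */
long wait_cond_timeout(cond_fn cond, void *arg, long timeout_ticks)
{
    struct timespec tick = { 0, TICK_NS };
    long remaining = timeout_ticks;

    assert(timeout_ticks > 0);
    while (!cond(arg)) {          /* the condition is re-evaluated after every wakeup */
        if (remaining == 0)
            return 0;             /* timed out, condition still false */
        nanosleep(&tick, NULL);
        remaining--;
    }
    return remaining > 0 ? remaining : 1;
}

/* Hypothetical device state; in a driver the IRQ handler would update it. */
struct dev_state { volatile int frame_done; };

/* A well-formed condition: tests the state the waker actually changes. */
bool frame_is_done(void *arg)
{
    const struct dev_state *s = arg;
    return s->frame_done != 0;
}

/* A faulty condition: it can never become true, so every wait times out. */
bool never_true(void *arg)
{
    (void)arg;
    return false;
}
```

With a well-formed condition the caller can tell a completed wait (positive return) from a timeout (0). With a faulty condition the return value no longer reflects the device state, and the caller may go on to act on state it does not own.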
