Bug 662384

Summary: Pid: 797, comm: ivtv0 Not tainted 2.6.37-0.rc5.git2.1.fc15.x86_64 #1
Product: [Fedora] Fedora Reporter: Nicolas Mailhot <nicolas.mailhot>
Component: kernelAssignee: Jarod Wilson <jarodwilson>
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: rawhideCC: awalls, dougsland, gansalmon, itamar, jarod, jonathan, kernel-maint, kmcmartin, madhu.chinakonda, mchehab
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-02-21 20:08:07 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
dmesg
none
lspci
none
Patch to get rid of lockdep's initial INFO gripe
none
dmesg for subsequent potential problem lockdep detects none

Description Nicolas Mailhot 2010-12-12 09:54:57 UTC
[   22.850699] ivtv: Start initialization, version 1.4.2
[   22.850789] ivtv0: Initializing card 0
[   22.850791] ivtv0: Autodetected Hauppauge card (cx23416 based)
[   22.852236] INFO: trying to register non-static key.
[   22.852259] the code is fine but needs lockdep annotation.
[   22.852279] turning off the locking correctness validator.
[   22.852299] Pid: 797, comm: ivtv0 Not tainted 2.6.37-0.rc5.git2.1.fc15.x86_64 #1
[   22.852324] Call Trace:
[   22.852339]  [<ffffffff8107fb73>] register_lock_class+0x102/0x2e2
[   22.852361]  [<ffffffff81082108>] ? __lock_acquire+0x56a/0xd65
[   22.852383]  [<ffffffff81011269>] ? paravirt_read_tsc+0x9/0xd
[   22.852404]  [<ffffffff81081c86>] __lock_acquire+0xe8/0xd65
[   22.852424]  [<ffffffff8107fa16>] ? lock_release_holdtime+0x54/0x5b
[   22.852448]  [<ffffffff814b4ca0>] ? _raw_spin_unlock_irq+0x30/0x3a
[   22.852470]  [<ffffffff810831f1>] ? trace_hardirqs_on_caller+0x10b/0x12f
[   22.852494]  [<ffffffff8106ebab>] ? kthread_worker_fn+0xa9/0x14c
[   22.852516]  [<ffffffff81082dd2>] lock_acquire+0xc8/0xf3
[   22.852535]  [<ffffffff8106ebab>] ? kthread_worker_fn+0xa9/0x14c
[   22.852558]  [<ffffffff814b4491>] _raw_spin_lock_irq+0x40/0x75
[   22.852579]  [<ffffffff8106ebab>] ? kthread_worker_fn+0xa9/0x14c
[   22.852600]  [<ffffffff8106ebab>] kthread_worker_fn+0xa9/0x14c
[   22.852621]  [<ffffffff8106eb02>] ? kthread_worker_fn+0x0/0x14c
[   22.852642]  [<ffffffff8106ecee>] kthread+0xa0/0xa8
[   22.852660]  [<ffffffff810831f1>] ? trace_hardirqs_on_caller+0x10b/0x12f
[   22.852685]  [<ffffffff8100bb24>] kernel_thread_helper+0x4/0x10
[   22.852706]  [<ffffffff814b5110>] ? restore_args+0x0/0x30
[   22.852725]  [<ffffffff8106ec4e>] ? kthread+0x0/0xa8
[   22.852744]  [<ffffffff8100bb20>] ? kernel_thread_helper+0x0/0x10

Comment 1 Nicolas Mailhot 2010-12-12 09:56:16 UTC
Created attachment 468200 [details]
dmesg

Comment 2 Nicolas Mailhot 2010-12-12 09:56:46 UTC
Created attachment 468201 [details]
lspci

Comment 3 Jarod Wilson 2010-12-17 22:00:59 UTC
Hm... This is probably some fallout from some of the locking that has been redone upstream as a result of the BKL getting killed... Mauro, does this look familiar to you, by chance? (And even better, do we have a patch for it upstream already?).

Comment 4 Mauro Carvalho Chehab 2010-12-17 23:23:26 UTC
(In reply to comment #3)
> Hm... This is probably some fallout from some of the locking that has been
> redone upstream as a result of the BKL getting killed... Mauro, does this look
> familiar to you, by chance? (And even better, do we have a patch for it
> upstream already?).

This bug is new to me. It may be related to BKL removal stuff. Those patches are somewhat aggressive. Let's see if awalls have some ideas about it.

Comment 5 Mauro Carvalho Chehab 2010-12-17 23:28:00 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > Hm... This is probably some fallout from some of the locking that has been
> > redone upstream as a result of the BKL getting killed... Mauro, does this look
> > familiar to you, by chance? (And even better, do we have a patch for it
> > upstream already?).
> 
> This bug is new to me. It may be related to BKL removal stuff. Those patches
> are somewhat aggressive. Let's see if awalls have some ideas about it.

Btw, as Linus merged the BKL fixes upstream today, the better is to rebase rawhide kernel with today's patches and see if this fixes the issue.

Comment 6 Nicolas Mailhot 2010-12-18 10:34:28 UTC
[   21.769255] ivtv: Start initialization, version 1.4.2
[   21.769347] ivtv0: Initializing card 0
[   21.769349] ivtv0: Autodetected Hauppauge card (cx23416 based)
[   21.771615] INFO: trying to register non-static key.
[   21.771640] the code is fine but needs lockdep annotation.
[   21.771661] turning off the locking correctness validator.
[   21.771684] Pid: 776, comm: ivtv0 Not tainted 2.6.37-0.rc6.git0.1.fc15.x86_64 #1
[   21.771712] Call Trace:
[   21.771728]  [<ffffffff8107fce7>] register_lock_class+0x102/0x2e2
[   21.771750]  [<ffffffff8108227c>] ? __lock_acquire+0x56a/0xd65
[   21.771775]  [<ffffffff81011269>] ? paravirt_read_tsc+0x9/0xd
[   21.771796]  [<ffffffff81081dfa>] __lock_acquire+0xe8/0xd65
[   21.771816]  [<ffffffff8107fb8a>] ? lock_release_holdtime+0x54/0x5b
[   21.771840]  [<ffffffff814b4d70>] ? _raw_spin_unlock_irq+0x30/0x3a
[   21.771864]  [<ffffffff81083365>] ? trace_hardirqs_on_caller+0x10b/0x12f
[   21.771890]  [<ffffffff8106ed1f>] ? kthread_worker_fn+0xa9/0x14c
[   21.771915]  [<ffffffff81082f46>] lock_acquire+0xc8/0xf3
[   21.771939]  [<ffffffff8106ed1f>] ? kthread_worker_fn+0xa9/0x14c
[   21.771962]  [<ffffffff814b4561>] _raw_spin_lock_irq+0x40/0x75
[   21.771984]  [<ffffffff8106ed1f>] ? kthread_worker_fn+0xa9/0x14c
[   21.772005]  [<ffffffff8106ed1f>] kthread_worker_fn+0xa9/0x14c
[   21.772005]  [<ffffffff8106ec76>] ? kthread_worker_fn+0x0/0x14c
[   21.772005]  [<ffffffff8106ee62>] kthread+0xa0/0xa8
[   21.772005]  [<ffffffff81083365>] ? trace_hardirqs_on_caller+0x10b/0x12f
[   21.772005]  [<ffffffff8100bb24>] kernel_thread_helper+0x4/0x10
[   21.772005]  [<ffffffff814b51d0>] ? restore_args+0x0/0x30
[   21.772005]  [<ffffffff8106edc2>] ? kthread+0x0/0xa8
[   21.772005]  [<ffffffff8100bb20>] ? kernel_thread_helper+0x0/0x10
[   21.772307] ivtv 0000:06:01.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
[   21.772339] ivtv0: Unreasonably low latency timer, setting to 64 (was 32)
[   21.827155] tveeprom 6-0050: The eeprom says no radio is present, but the tuner type
[   21.827185] tveeprom 6-0050: indicates otherwise. I will assume that radio is present.
[   21.827212] tveeprom 6-0050: Hauppauge model 26039, rev C1A5, serial# 8203263
[   21.827236] tveeprom 6-0050: tuner model is TCL MPE05-2 (idx 105, type 38)
[   21.827260] tveeprom 6-0050: TV standards PAL(B/G) PAL(I) SECAM(L/L') PAL(D/D1/K) (eeprom 0x74)
[   21.827288] tveeprom 6-0050: audio processor is CX25842 (idx 36)
[   21.827309] tveeprom 6-0050: decoder processor is CX25842 (idx 29)
[   21.827330] tveeprom 6-0050: has radio, has IR receiver, has IR transmitter
[   21.827353] ivtv0: Autodetected Hauppauge WinTV PVR-150
[   21.827371] ivtv0: Reopen i2c bus for IR-blaster support
[   22.160241] cx25840 6-0044: cx25842-23 found @ 0x88 (ivtv i2c driver #0)

Comment 7 Andy Walls 2010-12-18 16:49:43 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > Hm... This is probably some fallout from some of the locking that has been
> > redone upstream as a result of the BKL getting killed... Mauro, does this look
> > familiar to you, by chance? (And even better, do we have a patch for it
> > upstream already?).
> 
> This bug is new to me. It may be related to BKL removal stuff. Those patches
> are somewhat aggressive. Let's see if awalls have some ideas about it.

This is not BKL related.

This is not a critical ivtv bug, the driver code is correct.

This is a lockdep gripe on how a kernel API improperly initializes a spinlock.
This lockdep gripe is related to Tejun Heo's improved implementation of kenrel
workqueues for handling deferred work.

Specifically ivtv had a single threaded workqueue with real-time
priority for timing critical contraints on VBI data over I2C.  Tejun had to
convert it to a kthread worker with realtime priority.  The new workqueue
implementation uses pools of workers, and real-time priority obviously can't be
designated on an ivtv work object. Hence the dedicated worker thread for each
ivtv card instance.

The problem appears to be init_kthread_worker() and KTHREAD_WORKER_INIT() using
initializers for static spin_lock objects (SPIN_LOCK_UNLOCKED) on the
dynamically allocated, per ivtv card instance, itv->irq_worker object's 
internal spin lock.  I think it should be using spin_lock_init() for
those objects' internal locks.

I could hack up a patch if needed.  However Tejun or someone familiar with
kthread worker internals and/or lockdep would turn around an optimal and
correct solution in shorter time.

BTW when did Fedora start shiping with all the lock validation enabled in the
config?

Regards,
Andy

Comment 8 Nicolas Mailhot 2010-12-18 17:28:17 UTC
(In reply to comment #7)

> BTW when did Fedora start shiping with all the lock validation enabled in the
> config?

This is a fedora-devel kernel (aka rawhide). They always have all sorts of snail/debugging options activated

Comment 9 Andy Walls 2010-12-18 21:47:04 UTC
Created attachment 469544 [details]
Patch to get rid of lockdep's initial INFO gripe

Comment 10 Andy Walls 2010-12-18 21:48:18 UTC
Created attachment 469545 [details]
dmesg for subsequent potential problem lockdep detects

Comment 11 Andy Walls 2010-12-18 21:53:35 UTC
(In reply to comment #7)
> (In reply to comment #4)
> > (In reply to comment #3)
> > > Hm... This is probably some fallout from some of the locking that has been
> > > redone upstream as a result of the BKL getting killed... Mauro, does this look
> > > familiar to you, by chance? (And even better, do we have a patch for it
> > > upstream already?).
> > 
> > This bug is new to me. It may be related to BKL removal stuff. Those patches
> > are somewhat aggressive. Let's see if awalls have some ideas about it.
> 

> This is a lockdep gripe on how a kernel API improperly initializes a spinlock.
> This lockdep gripe is related to Tejun Heo's improved implementation of kenrel
> workqueues for handling deferred work.
 
> The problem appears to be init_kthread_worker() and KTHREAD_WORKER_INIT() using
> initializers for static spin_lock objects (SPIN_LOCK_UNLOCKED) on the
> dynamically allocated, per ivtv card instance, itv->irq_worker object's 
> internal spin lock.  I think it should be using spin_lock_init() for
> those objects' internal locks.
> 
> I could hack up a patch if needed.  However Tejun or someone familiar with
> kthread worker internals and/or lockdep would turn around an optimal and
> correct solution in shorter time.

The patch to get rid of the lockdep INFO gripe was easier than I thought, since
ivtv is the only current user of the kernel API call in question.

With the attached patch, lockdep now is happy about how the dynamically
allocated kthread_worker's spinlock_t is initialized. :)

However, lockdep is now able to gripe about a potential locking problem with
the new V4L2 control framework and ivtv's kthread worker. :(

Dmesg output is attached, since I won't be able to continue investigation for
at least a day or two.

Regards,
Andy

Comment 12 Andy Walls 2010-12-19 13:55:13 UTC
(In reply to comment #11)

> The patch to get rid of the lockdep INFO gripe was easier than I thought, since
> ivtv is the only current user of the kernel API call in question.
> 
> With the attached patch, lockdep now is happy about how the dynamically
> allocated kthread_worker's spinlock_t is initialized. :)

I have sent the patch I attached to the LKML for review, hopefully
by at least Tejun, and inclusion upstream.

https://lkml.org/lkml/2010/12/19/101
 
> However, lockdep is now able to gripe about a potential locking problem with
> the new V4L2 control framework and ivtv's kthread worker. :(
> 
> Dmesg output is attached, since I won't be able to continue investigation for
> at least a day or two.

Hans Verkuil provided confirmation that the recursive lock is a problem in
the new V4L2 control handling framework.  I'm not sure if that problem is
in the rawhide kernel in question, yet.

I recommend declaring the attached patch as a fix for this bug report and
writing a new bug report *if* the new V4L2 control framework (and recursive
lock) is in the rawhide kernel in question.

Regards,
Andy

Comment 13 Andy Walls 2010-12-27 21:41:07 UTC
(In reply to comment #12)
> (In reply to comment #11)

> > With the attached patch, lockdep now is happy about how the dynamically
> > allocated kthread_worker's spinlock_t is initialized. :)
> 
> I have sent the patch I attached to the LKML for review, hopefully
> by at least Tejun, and inclusion upstream.
> 
> https://lkml.org/lkml/2010/12/19/101

Here is the patch for this bug, that Tejun is pushing to Linus:
http://git.kernel.org/?p=linux/kernel/git/tj/wq.git;a=commitdiff;h=4f32e9b1f812fd6c00cc85a127583fefbdedaedc

> > However, lockdep is now able to gripe about a potential locking problem with
> > the new V4L2 control framework
 
> Hans Verkuil provided confirmation that the recursive lock is a problem in
> the new V4L2 control handling framework.  I'm not sure if that problem is
> in the rawhide kernel in question, yet.

The potential recursive mutex lock that lockdep finds in the V4L2 control framework is not a real issue, as no deadlocks can actually occur, if V4L2 drivers are written properly.

However, lockdep_set_class_and_name() or mutex_lock_nested() calls need to be added in the V4L2 control framework to inform lockdep that, that particular recursive lock on the struct v4l2_ctrl_handler.lock (aka &hdl>lock) mutex lock class is OK.

That is a separate bug for which no final fix is yet available.
 
Regards,
Andy

Comment 14 Josh Boyer 2012-02-21 20:08:07 UTC
The ivtv bug has been fixed since 2.6.37.  Closing this out...