Bug 905572 - GFS2: suspicious RCU usage during mount
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 19
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Assignee: Steve Whitehouse
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-01-29 16:50 UTC by Abhijith Das
Modified: 2013-04-30 15:38 UTC
CC: 12 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2013-04-30 15:38:25 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
Proposed fix (1.04 KB, patch)
2013-01-30 10:52 UTC, Steve Whitehouse

Description Abhijith Das 2013-01-29 16:50:57 UTC
Noticed this in my kernel logs:
A 100G device mounted with lock_nolock on a hand-compiled kernel running the latest nmw (next merge window) tree bits as of today.

[   87.799733] GFS2 installed
[   87.812359] GFS2: fsid=dm-14: Trying to join cluster "lock_nolock", "dm-14"
[   87.820199] GFS2: fsid=dm-14: Now mounting FS...
[   87.825628] 
[   87.827321] ===============================
[   87.832428] [ INFO: suspicious RCU usage. ]
[   87.837132] 3.8.0-rc2 #9 Tainted: G        W  O
[   87.842216] -------------------------------
[   87.846913] include/linux/rculist_bl.h:23 suspicious rcu_dereference_check() usage!
[   87.855482] 
[   87.855482] other info that might help us debug this:
[   87.855482] 
[   87.864448] 
[   87.864448] rcu_scheduler_active = 1, debug_locks = 0
[   87.871761] 1 lock held by mount/1197:
[   87.875973]  #0:  (&type->s_umount_key#30/1){+.+.+.}, at: [<ffffffff811cfa4d>] sget+0x37d/0x640
[   87.885893] 
[   87.885893] stack backtrace:
[   87.890783] Pid: 1197, comm: mount Tainted: G        W  O 3.8.0-rc2 #9
[   87.898099] Call Trace:
[   87.900861]  [<ffffffff810d9f8d>] lockdep_rcu_suspicious+0xfd/0x130
[   87.907917]  [<ffffffffa03582c8>] search_bucket+0x138/0x180 [gfs2]
[   87.914854]  [<ffffffffa03592e8>] gfs2_glock_get+0x618/0x770 [gfs2]
[   87.921896]  [<ffffffffa0358cd5>] ? gfs2_glock_get+0x5/0x770 [gfs2]
[   87.928931]  [<ffffffffa035b7b0>] gfs2_glock_nq_num+0x30/0xa0 [gfs2]
[   87.936068]  [<ffffffffa0367b7d>] fill_super+0x64d/0xe40 [gfs2]
[   87.942720]  [<ffffffff81358104>] ? snprintf+0x34/0x40
[   87.948479]  [<ffffffff816bc14d>] ? __mutex_unlock_slowpath+0xdd/0x180
[   87.955796]  [<ffffffffa03685f3>] gfs2_mount+0x283/0x2e0 [gfs2]
[   87.962431]  [<ffffffff811d0cc3>] mount_fs+0x43/0x1b0
[   87.968091]  [<ffffffff8118b2b0>] ? __alloc_percpu+0x10/0x20
[   87.974433]  [<ffffffff811eff13>] vfs_kern_mount+0x73/0x110
[   87.980676]  [<ffffffff811f26e6>] do_mount+0x216/0xa70
[   87.986432]  [<ffffffff8118501b>] ? memdup_user+0x4b/0x90
[   87.992475]  [<ffffffff811850bb>] ? strndup_user+0x5b/0x80
[   87.998620]  [<ffffffff811f2fce>] sys_mount+0x8e/0xe0
[   88.004281]  [<ffffffff816c8899>] system_call_fastpath+0x16/0x1b
[   88.076900] GFS2: fsid=dm-14.0: jid=0, already locked for use
[   88.083342] GFS2: fsid=dm-14.0: jid=0: Looking at journal...
[   91.208829] GFS2: fsid=dm-14.0: jid=0: Done
[   91.213559] GFS2: fsid=dm-14.0: first mount done, others may mount

Comment 1 Steve Whitehouse 2013-01-30 10:21:45 UTC
This looks like something that needs fixing upstream.


Notice that rcu_dereference_check() checks only for rcu_read_lock(), and nothing else. So when we repeat the search for an existing glock under the spinlock (rather than under RCU), the check triggers. I think the bug is probably in rculist_bl.h, since we have:

static inline struct hlist_bl_node *hlist_bl_first_rcu(struct hlist_bl_head *h)
{
        return (struct hlist_bl_node *)
                ((unsigned long)rcu_dereference(h->first) & ~LIST_BL_LOCKMASK);
}

What we probably need to do is add a helper, hlist_bl_is_locked(), and then have:

static inline struct hlist_bl_node *hlist_bl_first_rcu(struct hlist_bl_head *h)
{
        return (struct hlist_bl_node *)
                ((unsigned long)rcu_dereference_check(h->first,
                                                      hlist_bl_is_locked(h)) &
                 ~LIST_BL_LOCKMASK);
}

or something along those lines.

Comment 2 Steve Whitehouse 2013-01-30 10:52:37 UTC
Created attachment 690282 [details]
Proposed fix

Does this fix the issue for you?

Comment 3 Abhijith Das 2013-01-30 18:01:10 UTC
The above patch seems to fix the issue. I used to see these messages on the first mount after a reboot, but I don't see them anymore with this patch.

Comment 4 Steve Whitehouse 2013-01-31 10:14:14 UTC
I've posted the proposed fix upstream.

Comment 5 Fedora End Of Life 2013-04-03 20:21:40 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we have not run this process for some time, it may also affect pre-Fedora 19 development
cycle bugs. We are very sorry. This will help us with cleanup during the Fedora 19 End Of Life. Thank you.)

More information on the reason for this action is available here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Comment 6 Justin M. Forbes 2013-04-05 15:35:34 UTC
Is this still a problem with 3.9 based F19 kernels?

Comment 7 Steve Whitehouse 2013-04-05 15:37:29 UTC
I think so; the patch was scheduled for the next merge window upstream, last I saw.

Comment 8 Steve Whitehouse 2013-04-30 15:38:25 UTC
Fix in upstream kernel:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/include/linux/list_bl.h?id=49d0de082c31de34cc896c14eec5f1c2ade0415a

So it will be in the 3.10 kernel.

