Bug 905572
| Summary: | GFS2: suspicious RCU usage during mount |
|---|---|
| Product: | Fedora |
| Component: | kernel |
| Version: | 19 |
| Status: | CLOSED UPSTREAM |
| Severity: | low |
| Priority: | low |
| Reporter: | Abhijith Das <adas> |
| Assignee: | Steve Whitehouse <swhiteho> |
| QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| CC: | adas, anprice, bmarzins, gansalmon, itamar, jforbes, jonathan, kernel-maint, madhu.chinakonda, pevans, rpeterso, swhiteho |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Doc Type: | Bug Fix |
| Type: | Bug |
| Last Closed: | 2013-04-30 15:38:25 UTC |
This looks like something that needs fixing upstream. Notice that rcu_dereference_check() here checks only that rcu_read_lock() is held, and nothing else. So when we repeat the search for an existing glock under the spinlock, the check triggers. I think the bug is probably in rculist_bl.h, since we have:

```c
static inline struct hlist_bl_node *hlist_bl_first_rcu(struct hlist_bl_head *h)
{
	return (struct hlist_bl_node *)
		((unsigned long)rcu_dereference(h->first) & ~LIST_BL_LOCKMASK);
}
```
What we probably need to do is add a function hlist_bl_is_locked() and then have:

```c
static inline struct hlist_bl_node *hlist_bl_first_rcu(struct hlist_bl_head *h)
{
	return (struct hlist_bl_node *)
		((unsigned long)rcu_dereference_check(h->first,
						      hlist_bl_is_locked(h)) &
		 ~LIST_BL_LOCKMASK);
}
```

...or something along those lines.
Created attachment 690282: Proposed fix

Does this fix the issue for you?
The above patch seems to fix the issue. I've seen the messages before on the first mount after a reboot, but I don't see them anymore with this patch. I've posted the proposed fix upstream.

This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle. Changing version to '19'. (As we did not run this process for some time, it could also affect pre-Fedora 19 development cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.) More information and the reason for this action is here: https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Is this still a problem with 3.9-based F19 kernels?

I think so; the patch was scheduled for the next merge window upstream, last I saw.

Fix in upstream kernel: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/include/linux/list_bl.h?id=49d0de082c31de34cc896c14eec5f1c2ade0415a
So it will be in the 3.10 kernel.
Noticed this in my kernel logs: 100G device mounted with lock_nolock on a hand-compiled kernel running the latest nmw kernel bits as of today.

```
[   87.799733] GFS2 installed
[   87.812359] GFS2: fsid=dm-14: Trying to join cluster "lock_nolock", "dm-14"
[   87.820199] GFS2: fsid=dm-14: Now mounting FS...
[   87.825628]
[   87.827321] ===============================
[   87.832428] [ INFO: suspicious RCU usage. ]
[   87.837132] 3.8.0-rc2 #9 Tainted: G W O
[   87.842216] -------------------------------
[   87.846913] include/linux/rculist_bl.h:23 suspicious rcu_dereference_check() usage!
[   87.855482]
[   87.855482] other info that might help us debug this:
[   87.855482]
[   87.864448]
[   87.864448] rcu_scheduler_active = 1, debug_locks = 0
[   87.871761] 1 lock held by mount/1197:
[   87.875973] #0: (&type->s_umount_key#30/1){+.+.+.}, at: [<ffffffff811cfa4d>] sget+0x37d/0x640
[   87.885893]
[   87.885893] stack backtrace:
[   87.890783] Pid: 1197, comm: mount Tainted: G W O 3.8.0-rc2 #9
[   87.898099] Call Trace:
[   87.900861] [<ffffffff810d9f8d>] lockdep_rcu_suspicious+0xfd/0x130
[   87.907917] [<ffffffffa03582c8>] search_bucket+0x138/0x180 [gfs2]
[   87.914854] [<ffffffffa03592e8>] gfs2_glock_get+0x618/0x770 [gfs2]
[   87.921896] [<ffffffffa0358cd5>] ? gfs2_glock_get+0x5/0x770 [gfs2]
[   87.928931] [<ffffffffa035b7b0>] gfs2_glock_nq_num+0x30/0xa0 [gfs2]
[   87.936068] [<ffffffffa0367b7d>] fill_super+0x64d/0xe40 [gfs2]
[   87.942720] [<ffffffff81358104>] ? snprintf+0x34/0x40
[   87.948479] [<ffffffff816bc14d>] ? __mutex_unlock_slowpath+0xdd/0x180
[   87.955796] [<ffffffffa03685f3>] gfs2_mount+0x283/0x2e0 [gfs2]
[   87.962431] [<ffffffff811d0cc3>] mount_fs+0x43/0x1b0
[   87.968091] [<ffffffff8118b2b0>] ? __alloc_percpu+0x10/0x20
[   87.974433] [<ffffffff811eff13>] vfs_kern_mount+0x73/0x110
[   87.980676] [<ffffffff811f26e6>] do_mount+0x216/0xa70
[   87.986432] [<ffffffff8118501b>] ? memdup_user+0x4b/0x90
[   87.992475] [<ffffffff811850bb>] ? strndup_user+0x5b/0x80
[   87.998620] [<ffffffff811f2fce>] sys_mount+0x8e/0xe0
[   88.004281] [<ffffffff816c8899>] system_call_fastpath+0x16/0x1b
[   88.076900] GFS2: fsid=dm-14.0: jid=0, already locked for use
[   88.083342] GFS2: fsid=dm-14.0: jid=0: Looking at journal...
[   91.208829] GFS2: fsid=dm-14.0: jid=0: Done
[   91.213559] GFS2: fsid=dm-14.0: first mount done, others may mount
```