| Summary: | GFS2: glock issue w/ xattr in upstream kernel | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Nate Straz <nstraz> |
| Component: | kernel | Assignee: | Andreas Gruenbacher <agruenba> |
| Status: | CLOSED UPSTREAM | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | rawhide | CC: | gansalmon, gfs2-maint, ichavero, itamar, jonathan, kernel-maint, madhu.chinakonda, mchehab, nstraz, pmoore, rpeterso |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-08-09 21:20:51 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | |||
| Bug Blocks: | 437984, 1426295 | ||
| Attachments: | |||
|
Description
Nate Straz
2016-10-26 16:18:11 UTC
If I try to use the demote_rq interface to free up the blocked process I get an error [root@host-053 STSRHTS30869:locksmith0]# echo "2:134032 EX" > demote_rq [68369.503224] gfs2: fsid=STSRHTS30869:locksmith0.0: demote interface used -bash: echo: write error: No such file or directory [root@host-053 STSRHTS30869:locksmith0]# echo "2:131513 EX" > demote_rq -bash: echo: write error: No such file or directory Created attachment 1214356 [details]
Instrumentation patch 1 - glock history
This is an upstream patch for glock history.
Created attachment 1214357 [details]
Instrumentation patch 2 - glock state machine instrumentation
Here is a second instrumentation patch. The idea is to apply them
both, recreate the glock problems with locksmith, then dump the
glocks debugfs file, plus the /proc/<pid>stack files of all the
glock workqueue workers. I tested similar patches on RHEL6, but
I haven't tested the upstream versions yet.
I was able to git bisect to this commit.
[root@dash-01 linux-upstream]# git bisect bad
f39814f60ad0fa982ac87a97af1fb0bde244c319 is the first bad commit
[root@dash-01 linux-upstream]# git show
commit f39814f60ad0fa982ac87a97af1fb0bde244c319
Author: Andreas Gruenbacher <agruenba>
Date: Thu Dec 24 11:09:40 2015 -0500
gfs2: Invalid security labels of inodes when they go invalid
When gfs2 releases the glock of an inode, it must invalidate all
information cached for that inode, including the page cache and acls.
Use the new security_inode_invalidate_secctx hook to also invalidate
security labels in that case. These items will be reread from disk
when needed after reacquiring the glock.
Signed-off-by: Andreas Gruenbacher <agruenba>
Acked-by: Bob Peterson <rpeterso>
Acked-by: Steven Whitehouse <swhiteho>
Cc: cluster-devel
[PM: fixed spelling errors and description line lengths]
Signed-off-by: Paul Moore <pmoore>
diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index f348cfb..437fd73 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -13,6 +13,7 @@
#include <linux/gfs2_ondisk.h>
#include <linux/bio.h>
#include <linux/posix_acl.h>
+#include <linux/security.h>
#include "gfs2.h"
#include "incore.h"
@@ -262,6 +263,7 @@ static void inode_go_inval(struct gfs2_glock *gl, int flags)
if (ip) {
set_bit(GIF_INVALID, &ip->i_flags);
forget_all_cached_acls(&ip->i_inode);
+ security_inode_invalidate_secctx(&ip->i_inode);
gfs2_dir_hash_inval(ip);
}
}
Reassigning to Andreas based on comment #4 Here are the backtraces from all blocked processes on one node. [271972.808843] sysrq: SysRq : Show Blocked State [271972.809593] task PC stack pid father [271972.810561] kworker/0:2H D ffff880000c7bc38 0 11211 2 0x00000080 [271972.811720] Workqueue: glock_workqueue glock_work_func [gfs2] [271972.812661] ffff880000c7bc38 ffff88003d090000 ffff88003c0a2a00 ffff880000c7c000 [271972.813902] ffff880017d0c43c ffff88003c0a2a00 00000000ffffffff ffff880017d0c440 [271972.815175] ffff880000c7bc50 ffffffff8169e415 ffff880017d0c438 ffff880000c7bc60 [271972.816422] Call Trace: [271972.816822] [<ffffffff8169e415>] schedule+0x35/0x80 [271972.817633] [<ffffffff8169e6fe>] schedule_preempt_disabled+0xe/0x10 [271972.818651] [<ffffffff816a0194>] __mutex_lock_slowpath+0xb4/0x130 [271972.819638] [<ffffffff816a022f>] mutex_lock+0x1f/0x2f [271972.820472] [<ffffffff812b348d>] selinux_inode_invalidate_secctx+0x1d/0x30 [271972.821571] [<ffffffff812addd5>] security_inode_invalidate_secctx+0x25/0x40 [271972.822690] [<ffffffffa037e61a>] inode_go_inval+0x6a/0xf0 [gfs2] [271972.823664] [<ffffffffa037d255>] do_xmote+0x155/0x250 [gfs2] [271972.824597] [<ffffffffa037d480>] run_queue+0x130/0x250 [gfs2] [271972.825544] [<ffffffffa037d600>] glock_work_func+0x60/0x130 [gfs2] [271972.826560] [<ffffffff810a0b1c>] process_one_work+0x14c/0x3c0 [271972.827505] [<ffffffff810a1354>] worker_thread+0x114/0x470 [271972.828410] [<ffffffff810a1240>] ? rescuer_thread+0x310/0x310 [271972.829413] [<ffffffff810a6e58>] kthread+0xd8/0xf0 [271972.830220] [<ffffffff8108bb9c>] ? do_exit+0x66c/0xab0 [271972.831050] [<ffffffff810a6d80>] ? kthread_park+0x60/0x60 [271972.831910] [<ffffffff816a228f>] ret_from_fork+0x3f/0x70 [271972.832799] [<ffffffff810a6d80>] ? kthread_park+0x60/0x60 [271972.833727] kworker/0:1H D ffff88003bc8bc38 0 11978 2 0x00000080 [271972.834910] Workqueue: glock_workqueue glock_work_func [gfs2] [271972.835865] ffff88003bc8bc38 ffff88003e310000 ffff88003ae65400 ffff88003bc8c000 [271972.837162] ffff880035aa2dec ffff88003ae65400 00000000ffffffff ffff880035aa2df0 [271972.838426] ffff88003bc8bc50 ffffffff8169e415 ffff880035aa2de8 ffff88003bc8bc60 [271972.839685] Call Trace: [271972.840120] [<ffffffff8169e415>] schedule+0x35/0x80 [271972.840907] [<ffffffff8169e6fe>] schedule_preempt_disabled+0xe/0x10 [271972.841932] [<ffffffff816a0194>] __mutex_lock_slowpath+0xb4/0x130 [271972.842917] [<ffffffff816a022f>] mutex_lock+0x1f/0x2f [271972.843754] [<ffffffff812b348d>] selinux_inode_invalidate_secctx+0x1d/0x30 [271972.844852] [<ffffffff812addd5>] security_inode_invalidate_secctx+0x25/0x40 [271972.845986] [<ffffffffa037e61a>] inode_go_inval+0x6a/0xf0 [gfs2] [271972.846981] [<ffffffffa037d255>] do_xmote+0x155/0x250 [gfs2] [271972.847917] [<ffffffffa037d480>] run_queue+0x130/0x250 [gfs2] [271972.848984] [<ffffffffa037d600>] glock_work_func+0x60/0x130 [gfs2] [271972.849986] [<ffffffff810a0b1c>] process_one_work+0x14c/0x3c0 [271972.850932] [<ffffffff810a1354>] worker_thread+0x114/0x470 [271972.851837] [<ffffffff8169de19>] ? __schedule+0x349/0x910 [271972.852731] [<ffffffff810a1240>] ? rescuer_thread+0x310/0x310 [271972.853671] [<ffffffff810a6e58>] kthread+0xd8/0xf0 [271972.854463] [<ffffffff810a6d80>] ? kthread_park+0x60/0x60 [271972.855354] [<ffffffff816a228f>] ret_from_fork+0x3f/0x70 [271972.856228] [<ffffffff810a6d80>] ? kthread_park+0x60/0x60 [271972.857120] xdoio D ffff880017cbfbd0 0 12004 12003 0x00000080 [271972.858290] ffff880017cbfbd0 ffff88003a876900 ffff88003c0a6900 ffff880017cc0000 [271972.859565] ffff880017cbfc60 ffff88003ff314e8 0000000000000002 ffffffff8169ec20 [271972.860819] ffff880017cbfbe8 ffffffff8169e415 ffff880017cbfc48 ffff880017cbfbf8 [271972.862120] Call Trace: [271972.862530] [<ffffffff8169ec20>] ? out_of_line_wait_on_atomic_t+0xf0/0xf0 [271972.863619] [<ffffffff8169e415>] schedule+0x35/0x80 [271972.864443] [<ffffffff8169ec4c>] bit_wait+0x2c/0x50 [271972.865253] [<ffffffff8169e83d>] __wait_on_bit+0x5d/0x90 [271972.866137] [<ffffffff8169ec20>] ? out_of_line_wait_on_atomic_t+0xf0/0xf0 [271972.867233] [<ffffffff8169e8f1>] out_of_line_wait_on_bit+0x81/0xb0 [271972.868246] [<ffffffff810cba70>] ? autoremove_wake_function+0x40/0x40 [271972.869285] [<ffffffffa037c103>] gfs2_glock_wait+0x43/0x80 [gfs2] [271972.870287] [<ffffffffa037d95d>] gfs2_glock_nq+0x21d/0x3f0 [gfs2] [271972.871293] [<ffffffffa038b9ac>] gfs2_getxattr+0xfc/0x140 [gfs2] [271972.872278] [<ffffffffa038b9a2>] ? gfs2_getxattr+0xf2/0x140 [gfs2] [271972.873291] [<ffffffff811e5f18>] ? kmem_cache_alloc_trace+0x38/0x140 [271972.874325] [<ffffffff812b8044>] inode_doinit_with_dentry+0xf4/0x640 [271972.875366] [<ffffffff810cba30>] ? prepare_to_wait_event+0xf0/0xf0 [271972.876369] [<ffffffff812b85e1>] __inode_security_revalidate+0x51/0x60 [271972.877432] [<ffffffff812b984f>] selinux_file_permission+0x2f/0x130 [271972.878454] [<ffffffff812afe1d>] security_file_permission+0x3d/0xc0 [271972.879468] [<ffffffff81205cc2>] rw_verify_area+0x52/0xd0 [271972.880346] [<ffffffff81205da6>] vfs_read+0x66/0x130 [271972.881177] [<ffffffff81206c35>] SyS_read+0x55/0xc0 [271972.881959] [<ffffffff816a1f2e>] entry_SYSCALL_64_fastpath+0x12/0x71 [271972.882990] xdoio D ffff880035a33bd0 0 12053 12052 0x00000080 [271972.884184] ffff880035a33bd0 ffff88003e310000 ffff88003ab7d400 ffff880035a34000 [271972.885450] ffff880035a33c60 ffff88003ff314b8 0000000000000002 ffffffff8169ec20 [271972.886710] ffff880035a33be8 ffffffff8169e415 ffff880035a33c48 ffff880035a33bf8 [271972.887974] Call Trace: [271972.888409] [<ffffffff8169ec20>] ? out_of_line_wait_on_atomic_t+0xf0/0xf0 [271972.889516] [<ffffffff8169e415>] schedule+0x35/0x80 [271972.890324] [<ffffffff8169ec4c>] bit_wait+0x2c/0x50 [271972.891140] [<ffffffff8169e83d>] __wait_on_bit+0x5d/0x90 [271972.891989] [<ffffffffa038496b>] ? gfs2_write_end+0x1eb/0x440 [gfs2] [271972.893029] [<ffffffff8169ec20>] ? out_of_line_wait_on_atomic_t+0xf0/0xf0 [271972.894142] [<ffffffff8169e8f1>] out_of_line_wait_on_bit+0x81/0xb0 [271972.895155] [<ffffffff810cba70>] ? autoremove_wake_function+0x40/0x40 [271972.896208] [<ffffffffa037c103>] gfs2_glock_wait+0x43/0x80 [gfs2] [271972.897197] [<ffffffffa037d95d>] gfs2_glock_nq+0x21d/0x3f0 [gfs2] [271972.898192] [<ffffffffa038b9ac>] gfs2_getxattr+0xfc/0x140 [gfs2] [271972.899186] [<ffffffffa038b9a2>] ? gfs2_getxattr+0xf2/0x140 [gfs2] [271972.900196] [<ffffffff811e5f18>] ? kmem_cache_alloc_trace+0x38/0x140 [271972.901217] [<ffffffff812b8044>] inode_doinit_with_dentry+0xf4/0x640 [271972.902240] [<ffffffff810cba30>] ? prepare_to_wait_event+0xf0/0xf0 [271972.903237] [<ffffffff812b85e1>] __inode_security_revalidate+0x51/0x60 [271972.904286] [<ffffffff812b984f>] selinux_file_permission+0x2f/0x130 [271972.905307] [<ffffffff812afe1d>] security_file_permission+0x3d/0xc0 [271972.906319] [<ffffffff81205cc2>] rw_verify_area+0x52/0xd0 [271972.907197] [<ffffffff81205da6>] vfs_read+0x66/0x130 [271972.907979] [<ffffffff81206c35>] SyS_read+0x55/0xc0 [271972.908772] [<ffffffff816a1f2e>] entry_SYSCALL_64_fastpath+0x12/0x71 Created attachment 1219100 [details]
[PATCH 1/3] selinux: Minor cleanups
Fix the comment for function __inode_security_revalidate, which returns
an integer.
Use the LABEL_* constants consistently for isec->initialized.
Created attachment 1219101 [details]
[PATCH 2/3] selinux: Clean up initialization of isec->sclass
When isec->initialized is LABEL_INITIALIZED, assume that isec->sclass is
initialized already. Otherwise, initialize it right at the beginning of
inode_doinit_with_dentry.
Created attachment 1219102 [details]
[PATCH 3/3] selinux: Convert isec->lock into spinlock
Convert isec->lock from a mutex into a spinlock. Instead of holding the
lock in a sleeping context in inode_doinit_with_dentry, set
isec->initialized to LABEL_PENDING and release the lock. Then, when the
sid has been determined, re-acquire the lock. If isec->initialized is
still set to LABEL_PENDING, set isec->sid; otherwise, the sid has been
set by another task (LABEL_INITIALIZED) or invalidated (LABEL_INVALID)
in the meantime.
This fixes a deadlock on gfs2 where
* one task is in inode_doinit_with_dentry -> gfs2_getxattr, holds
isec->lock, and tries to acquire the inode's glock, and
* another task is in do_xmote -> inode_go_inval ->
selinux_inode_invalidate_secctx, holds the inode's glock, and tries
to acquire isec->lock.
Nate, can you check if this set of three selinux patches fixes things? (They are rather unlikely to make it into the 4.9 kernel, but still ...) Created attachment 1219388 [details]
Fixed patches
It turned out that the middle patch depended on an additional proc fix. The other patches are mostly unchanged; here is an updated patch queue for testing.
I was able to get through a QUICK version of locksmith with the fixed patches. I'll keep going with the rest of the regression suite. A quick note, it's really awkward to review patches via Bugzilla, doubly so if the patches are sent as a tarball. That said, looking at patch 4/4, this looks like what we talked about on IRC a few days ago, with the exception of the change from a mutex to a spinlock. However, given the short duration of the lock that seems like a reasonable change. The rest of the regression suite ran without issues. I re-ran this on an upstream 4.12.5 kernel and locksmith passed without issues. |