Hide Forgot
Description of problem: While running our locksmith test on GFS2 w/ SELinux enabled some tests will get stuck and not finish. Glock analysis shows multiple processes stuck waiting for glocks. If I mount the file system with a forced context, the tests run without blocking. [nstraz@tin ~]$ python glocker.py host-053:/mnt/locksmith0 Getting mount info from host-053 Using /dev/mapper/STSRHTS30869-locksmith0 /mnt/locksmith0 gfs2 rw,seclabel,relatime Getting GFS2 lock table name on host-053 Getting node list from host-053 Checking that debugfs is mounted on host-053 debugfs mounted on host-053 at /sys/kernel/debug Checking that debugfs is mounted on host-054 debugfs mounted on host-054 at /sys/kernel/debug Getting GFS2 locks for /mnt/locksmith0 from host-053 489 glocks parsed 48 glocks kept, 441 discarded Getting DLM locks for locksmith0 from host-053 88 DLM locks parsed Getting DLM locks for locksmith0 from host-054 87 DLM locks parsed Getting GFS2 locks for /mnt/locksmith0 from host-054 489 glocks parsed 48 glocks kept, 441 discarded Dumping all glocks which have waiters ================================================================================ 2 20b90 dlm Master copy from node host-054 Resource ffff88003b6d73c0 Name (len=24) " 2 20b90" Master Copy Granted Queue 00000096 EX Remote: 1 0000006d Conversion Queue 00000093 NL (EX) Waiting Queue host-053 GFS2 Glock (2, 20b90) G: s:EX n:2/20b90 f:lDpiIqo t:UN d:UN/76273000 a:0 v:0 r:4 m:150 H: s:SH f:AW e:0 p:9447 [xdoio] gfs2_xattr_get+0x18d/0x1e0 [gfs2] I: n:31/134032 t:8 f:0x11 d:0x00000000 s:512000 host-054 GFS2 Glock (2, 20b90) G: s:UN n:2/20b90 f:lIqob t:EX d:EX/0 a:0 v:0 r:3 m:200 H: s:EX f:W e:0 p:20377 [xdoio] gfs2_dirty_inode+0x1b1/0x240 [gfs2] ================================================================================ 2 201b9 dlm Master copy from node host-054 Resource ffff88001a265640 Name (len=24) " 2 201b9" Master Copy Granted Queue 00000042 EX Remote: 1 00000043 Conversion Queue 0000003d NL (EX) Waiting Queue host-053 GFS2 Glock (2, 201b9) G: s:EX n:2/201b9 f:lDpiIqo t:UN d:UN/344322000 a:0 v:0 r:4 m:10 H: s:SH f:AW e:0 p:8847 [xdoio] gfs2_xattr_get+0x18d/0x1e0 [gfs2] I: n:10/131513 t:8 f:0x11 d:0x00000000 s:512000 host-054 GFS2 Glock (2, 201b9) G: s:UN n:2/201b9 f:ldIqob t:EX d:UN/0 a:0 v:0 r:3 m:10 H: s:EX f:W e:0 p:19591 [xdoio] gfs2_dirty_inode+0x1b1/0x240 [gfs2] ================================================================================ host-053 8847 xdoio [<ffffffffa03a6203>] gfs2_glock_wait+0x43/0x80 [gfs2] [<ffffffffa03a78ef>] gfs2_glock_nq+0x24f/0x430 [gfs2] [<ffffffffa03a1247>] gfs2_xattr_get+0x197/0x1e0 [gfs2] [<ffffffff81252ad6>] generic_getxattr+0x56/0x70 [<ffffffff812e7128>] inode_doinit_with_dentry+0xf8/0x650 [<ffffffff812e8146>] __inode_security_revalidate+0x36/0x60 [<ffffffff812e921a>] selinux_file_permission+0x5a/0x120 [<ffffffff812dec3d>] security_file_permission+0x3d/0xc0 [<ffffffff8122c0f9>] rw_verify_area+0x49/0xb0 [<ffffffff8122c1c9>] vfs_read+0x69/0x140 [<ffffffff8122d6e5>] SyS_read+0x55/0xc0 [<ffffffff81003a47>] do_syscall_64+0x67/0x160 [<ffffffff816ff361>] return_from_SYSCALL_64+0x0/0x6a [<ffffffffffffffff>] 0xffffffffffffffff host-053 9447 xdoio [<ffffffffa03a6203>] gfs2_glock_wait+0x43/0x80 [gfs2] [<ffffffffa03a78ef>] gfs2_glock_nq+0x24f/0x430 [gfs2] [<ffffffffa03a1247>] gfs2_xattr_get+0x197/0x1e0 [gfs2] [<ffffffff81252ad6>] generic_getxattr+0x56/0x70 [<ffffffff812e7128>] inode_doinit_with_dentry+0xf8/0x650 [<ffffffff812e8146>] __inode_security_revalidate+0x36/0x60 [<ffffffff812e9392>] selinux_inode_permission+0xb2/0x190 [<ffffffff812de701>] security_inode_permission+0x41/0x60 [<ffffffff81236f68>] __inode_permission+0x48/0xd0 [<ffffffff81237008>] inode_permission+0x18/0x50 [<ffffffff8123709a>] may_open+0x5a/0x100 [<ffffffff8123b080>] path_openat+0x510/0x1370 [<ffffffff8123d161>] do_filp_open+0x91/0x100 [<ffffffff8122ac34>] do_sys_open+0x124/0x210 [<ffffffff8122ad3e>] SyS_open+0x1e/0x20 [<ffffffff81003a47>] do_syscall_64+0x67/0x160 [<ffffffff816ff361>] return_from_SYSCALL_64+0x0/0x6a [<ffffffffffffffff>] 0xffffffffffffffff host-054 19591 xdoio [<ffffffffa039e203>] gfs2_glock_wait+0x43/0x80 [gfs2] [<ffffffffa039f8ef>] gfs2_glock_nq+0x24f/0x430 [gfs2] [<ffffffffa03bb5fb>] gfs2_dirty_inode+0x1bb/0x240 [gfs2] [<ffffffff8125ad0f>] __mark_inode_dirty+0x17f/0x360 [<ffffffff81248607>] generic_update_time+0x87/0xd0 [<ffffffff81247e7d>] file_update_time+0xbd/0x110 [<ffffffff8119d98d>] __generic_file_write_iter+0x9d/0x1e0 [<ffffffff8119dbaa>] generic_file_write_iter+0xda/0x1c0 [<ffffffffa03a8318>] gfs2_file_write_iter+0x98/0x100 [gfs2] [<ffffffff8122bcc3>] __vfs_write+0xe3/0x160 [<ffffffff8122c352>] vfs_write+0xb2/0x1b0 [<ffffffff8122d7a5>] SyS_write+0x55/0xc0 [<ffffffff81003a47>] do_syscall_64+0x67/0x160 [<ffffffff816ff361>] return_from_SYSCALL_64+0x0/0x6a [<ffffffffffffffff>] 0xffffffffffffffff host-054 20377 xdoio [<ffffffffa039e203>] gfs2_glock_wait+0x43/0x80 [gfs2] [<ffffffffa039f8ef>] gfs2_glock_nq+0x24f/0x430 [gfs2] [<ffffffffa03bb5fb>] gfs2_dirty_inode+0x1bb/0x240 [gfs2] [<ffffffff8125ad0f>] __mark_inode_dirty+0x17f/0x360 [<ffffffff81248607>] generic_update_time+0x87/0xd0 [<ffffffff81247e7d>] file_update_time+0xbd/0x110 [<ffffffff8119d98d>] __generic_file_write_iter+0x9d/0x1e0 [<ffffffff8119dbaa>] generic_file_write_iter+0xda/0x1c0 [<ffffffffa03a8318>] gfs2_file_write_iter+0x98/0x100 [gfs2] [<ffffffff8122bcc3>] __vfs_write+0xe3/0x160 [<ffffffff8122c352>] vfs_write+0xb2/0x1b0 [<ffffffff8122d7a5>] SyS_write+0x55/0xc0 [<ffffffff81003a47>] do_syscall_64+0x67/0x160 [<ffffffff816ff361>] return_from_SYSCALL_64+0x0/0x6a [<ffffffffffffffff>] 0xffffffffffffffff Version-Release number of selected component (if applicable): linux-4.8.4 How reproducible: Easily Steps to Reproduce: 1. run locksmith test with selinux enabled 2. 3. Actual results: Expected results: Additional info:
If I try to use the demote_rq interface to free up the blocked process I get an error [root@host-053 STSRHTS30869:locksmith0]# echo "2:134032 EX" > demote_rq [68369.503224] gfs2: fsid=STSRHTS30869:locksmith0.0: demote interface used -bash: echo: write error: No such file or directory [root@host-053 STSRHTS30869:locksmith0]# echo "2:131513 EX" > demote_rq -bash: echo: write error: No such file or directory
Created attachment 1214356 [details] Instrumentation patch 1 - glock history This is an upstream patch for glock history.
Created attachment 1214357 [details] Instrumentation patch 2 - glock state machine instrumentation Here is a second instrumentation patch. The idea is to apply them both, recreate the glock problems with locksmith, then dump the glocks debugfs file, plus the /proc/<pid>stack files of all the glock workqueue workers. I tested similar patches on RHEL6, but I haven't tested the upstream versions yet.
I was able to git bisect to this commit. [root@dash-01 linux-upstream]# git bisect bad f39814f60ad0fa982ac87a97af1fb0bde244c319 is the first bad commit [root@dash-01 linux-upstream]# git show commit f39814f60ad0fa982ac87a97af1fb0bde244c319 Author: Andreas Gruenbacher <agruenba> Date: Thu Dec 24 11:09:40 2015 -0500 gfs2: Invalid security labels of inodes when they go invalid When gfs2 releases the glock of an inode, it must invalidate all information cached for that inode, including the page cache and acls. Use the new security_inode_invalidate_secctx hook to also invalidate security labels in that case. These items will be reread from disk when needed after reacquiring the glock. Signed-off-by: Andreas Gruenbacher <agruenba> Acked-by: Bob Peterson <rpeterso> Acked-by: Steven Whitehouse <swhiteho> Cc: cluster-devel [PM: fixed spelling errors and description line lengths] Signed-off-by: Paul Moore <pmoore> diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c index f348cfb..437fd73 100644 --- a/fs/gfs2/glops.c +++ b/fs/gfs2/glops.c @@ -13,6 +13,7 @@ #include <linux/gfs2_ondisk.h> #include <linux/bio.h> #include <linux/posix_acl.h> +#include <linux/security.h> #include "gfs2.h" #include "incore.h" @@ -262,6 +263,7 @@ static void inode_go_inval(struct gfs2_glock *gl, int flags) if (ip) { set_bit(GIF_INVALID, &ip->i_flags); forget_all_cached_acls(&ip->i_inode); + security_inode_invalidate_secctx(&ip->i_inode); gfs2_dir_hash_inval(ip); } }
Reassigning to Andreas based on comment #4
Here are the backtraces from all blocked processes on one node. [271972.808843] sysrq: SysRq : Show Blocked State [271972.809593] task PC stack pid father [271972.810561] kworker/0:2H D ffff880000c7bc38 0 11211 2 0x00000080 [271972.811720] Workqueue: glock_workqueue glock_work_func [gfs2] [271972.812661] ffff880000c7bc38 ffff88003d090000 ffff88003c0a2a00 ffff880000c7c000 [271972.813902] ffff880017d0c43c ffff88003c0a2a00 00000000ffffffff ffff880017d0c440 [271972.815175] ffff880000c7bc50 ffffffff8169e415 ffff880017d0c438 ffff880000c7bc60 [271972.816422] Call Trace: [271972.816822] [<ffffffff8169e415>] schedule+0x35/0x80 [271972.817633] [<ffffffff8169e6fe>] schedule_preempt_disabled+0xe/0x10 [271972.818651] [<ffffffff816a0194>] __mutex_lock_slowpath+0xb4/0x130 [271972.819638] [<ffffffff816a022f>] mutex_lock+0x1f/0x2f [271972.820472] [<ffffffff812b348d>] selinux_inode_invalidate_secctx+0x1d/0x30 [271972.821571] [<ffffffff812addd5>] security_inode_invalidate_secctx+0x25/0x40 [271972.822690] [<ffffffffa037e61a>] inode_go_inval+0x6a/0xf0 [gfs2] [271972.823664] [<ffffffffa037d255>] do_xmote+0x155/0x250 [gfs2] [271972.824597] [<ffffffffa037d480>] run_queue+0x130/0x250 [gfs2] [271972.825544] [<ffffffffa037d600>] glock_work_func+0x60/0x130 [gfs2] [271972.826560] [<ffffffff810a0b1c>] process_one_work+0x14c/0x3c0 [271972.827505] [<ffffffff810a1354>] worker_thread+0x114/0x470 [271972.828410] [<ffffffff810a1240>] ? rescuer_thread+0x310/0x310 [271972.829413] [<ffffffff810a6e58>] kthread+0xd8/0xf0 [271972.830220] [<ffffffff8108bb9c>] ? do_exit+0x66c/0xab0 [271972.831050] [<ffffffff810a6d80>] ? kthread_park+0x60/0x60 [271972.831910] [<ffffffff816a228f>] ret_from_fork+0x3f/0x70 [271972.832799] [<ffffffff810a6d80>] ? kthread_park+0x60/0x60 [271972.833727] kworker/0:1H D ffff88003bc8bc38 0 11978 2 0x00000080 [271972.834910] Workqueue: glock_workqueue glock_work_func [gfs2] [271972.835865] ffff88003bc8bc38 ffff88003e310000 ffff88003ae65400 ffff88003bc8c000 [271972.837162] ffff880035aa2dec ffff88003ae65400 00000000ffffffff ffff880035aa2df0 [271972.838426] ffff88003bc8bc50 ffffffff8169e415 ffff880035aa2de8 ffff88003bc8bc60 [271972.839685] Call Trace: [271972.840120] [<ffffffff8169e415>] schedule+0x35/0x80 [271972.840907] [<ffffffff8169e6fe>] schedule_preempt_disabled+0xe/0x10 [271972.841932] [<ffffffff816a0194>] __mutex_lock_slowpath+0xb4/0x130 [271972.842917] [<ffffffff816a022f>] mutex_lock+0x1f/0x2f [271972.843754] [<ffffffff812b348d>] selinux_inode_invalidate_secctx+0x1d/0x30 [271972.844852] [<ffffffff812addd5>] security_inode_invalidate_secctx+0x25/0x40 [271972.845986] [<ffffffffa037e61a>] inode_go_inval+0x6a/0xf0 [gfs2] [271972.846981] [<ffffffffa037d255>] do_xmote+0x155/0x250 [gfs2] [271972.847917] [<ffffffffa037d480>] run_queue+0x130/0x250 [gfs2] [271972.848984] [<ffffffffa037d600>] glock_work_func+0x60/0x130 [gfs2] [271972.849986] [<ffffffff810a0b1c>] process_one_work+0x14c/0x3c0 [271972.850932] [<ffffffff810a1354>] worker_thread+0x114/0x470 [271972.851837] [<ffffffff8169de19>] ? __schedule+0x349/0x910 [271972.852731] [<ffffffff810a1240>] ? rescuer_thread+0x310/0x310 [271972.853671] [<ffffffff810a6e58>] kthread+0xd8/0xf0 [271972.854463] [<ffffffff810a6d80>] ? kthread_park+0x60/0x60 [271972.855354] [<ffffffff816a228f>] ret_from_fork+0x3f/0x70 [271972.856228] [<ffffffff810a6d80>] ? kthread_park+0x60/0x60 [271972.857120] xdoio D ffff880017cbfbd0 0 12004 12003 0x00000080 [271972.858290] ffff880017cbfbd0 ffff88003a876900 ffff88003c0a6900 ffff880017cc0000 [271972.859565] ffff880017cbfc60 ffff88003ff314e8 0000000000000002 ffffffff8169ec20 [271972.860819] ffff880017cbfbe8 ffffffff8169e415 ffff880017cbfc48 ffff880017cbfbf8 [271972.862120] Call Trace: [271972.862530] [<ffffffff8169ec20>] ? out_of_line_wait_on_atomic_t+0xf0/0xf0 [271972.863619] [<ffffffff8169e415>] schedule+0x35/0x80 [271972.864443] [<ffffffff8169ec4c>] bit_wait+0x2c/0x50 [271972.865253] [<ffffffff8169e83d>] __wait_on_bit+0x5d/0x90 [271972.866137] [<ffffffff8169ec20>] ? out_of_line_wait_on_atomic_t+0xf0/0xf0 [271972.867233] [<ffffffff8169e8f1>] out_of_line_wait_on_bit+0x81/0xb0 [271972.868246] [<ffffffff810cba70>] ? autoremove_wake_function+0x40/0x40 [271972.869285] [<ffffffffa037c103>] gfs2_glock_wait+0x43/0x80 [gfs2] [271972.870287] [<ffffffffa037d95d>] gfs2_glock_nq+0x21d/0x3f0 [gfs2] [271972.871293] [<ffffffffa038b9ac>] gfs2_getxattr+0xfc/0x140 [gfs2] [271972.872278] [<ffffffffa038b9a2>] ? gfs2_getxattr+0xf2/0x140 [gfs2] [271972.873291] [<ffffffff811e5f18>] ? kmem_cache_alloc_trace+0x38/0x140 [271972.874325] [<ffffffff812b8044>] inode_doinit_with_dentry+0xf4/0x640 [271972.875366] [<ffffffff810cba30>] ? prepare_to_wait_event+0xf0/0xf0 [271972.876369] [<ffffffff812b85e1>] __inode_security_revalidate+0x51/0x60 [271972.877432] [<ffffffff812b984f>] selinux_file_permission+0x2f/0x130 [271972.878454] [<ffffffff812afe1d>] security_file_permission+0x3d/0xc0 [271972.879468] [<ffffffff81205cc2>] rw_verify_area+0x52/0xd0 [271972.880346] [<ffffffff81205da6>] vfs_read+0x66/0x130 [271972.881177] [<ffffffff81206c35>] SyS_read+0x55/0xc0 [271972.881959] [<ffffffff816a1f2e>] entry_SYSCALL_64_fastpath+0x12/0x71 [271972.882990] xdoio D ffff880035a33bd0 0 12053 12052 0x00000080 [271972.884184] ffff880035a33bd0 ffff88003e310000 ffff88003ab7d400 ffff880035a34000 [271972.885450] ffff880035a33c60 ffff88003ff314b8 0000000000000002 ffffffff8169ec20 [271972.886710] ffff880035a33be8 ffffffff8169e415 ffff880035a33c48 ffff880035a33bf8 [271972.887974] Call Trace: [271972.888409] [<ffffffff8169ec20>] ? out_of_line_wait_on_atomic_t+0xf0/0xf0 [271972.889516] [<ffffffff8169e415>] schedule+0x35/0x80 [271972.890324] [<ffffffff8169ec4c>] bit_wait+0x2c/0x50 [271972.891140] [<ffffffff8169e83d>] __wait_on_bit+0x5d/0x90 [271972.891989] [<ffffffffa038496b>] ? gfs2_write_end+0x1eb/0x440 [gfs2] [271972.893029] [<ffffffff8169ec20>] ? out_of_line_wait_on_atomic_t+0xf0/0xf0 [271972.894142] [<ffffffff8169e8f1>] out_of_line_wait_on_bit+0x81/0xb0 [271972.895155] [<ffffffff810cba70>] ? autoremove_wake_function+0x40/0x40 [271972.896208] [<ffffffffa037c103>] gfs2_glock_wait+0x43/0x80 [gfs2] [271972.897197] [<ffffffffa037d95d>] gfs2_glock_nq+0x21d/0x3f0 [gfs2] [271972.898192] [<ffffffffa038b9ac>] gfs2_getxattr+0xfc/0x140 [gfs2] [271972.899186] [<ffffffffa038b9a2>] ? gfs2_getxattr+0xf2/0x140 [gfs2] [271972.900196] [<ffffffff811e5f18>] ? kmem_cache_alloc_trace+0x38/0x140 [271972.901217] [<ffffffff812b8044>] inode_doinit_with_dentry+0xf4/0x640 [271972.902240] [<ffffffff810cba30>] ? prepare_to_wait_event+0xf0/0xf0 [271972.903237] [<ffffffff812b85e1>] __inode_security_revalidate+0x51/0x60 [271972.904286] [<ffffffff812b984f>] selinux_file_permission+0x2f/0x130 [271972.905307] [<ffffffff812afe1d>] security_file_permission+0x3d/0xc0 [271972.906319] [<ffffffff81205cc2>] rw_verify_area+0x52/0xd0 [271972.907197] [<ffffffff81205da6>] vfs_read+0x66/0x130 [271972.907979] [<ffffffff81206c35>] SyS_read+0x55/0xc0 [271972.908772] [<ffffffff816a1f2e>] entry_SYSCALL_64_fastpath+0x12/0x71
Created attachment 1219100 [details] [PATCH 1/3] selinux: Minor cleanups Fix the comment for function __inode_security_revalidate, which returns an integer. Use the LABEL_* constants consistently for isec->initialized.
Created attachment 1219101 [details] [PATCH 2/3] selinux: Clean up initialization of isec->sclass When isec->initialized is LABEL_INITIALIZED, assume that isec->sclass is initialized already. Otherwise, initialize it right at the beginning of inode_doinit_with_dentry.
Created attachment 1219102 [details] [PATCH 3/3] selinux: Convert isec->lock into spinlock Convert isec->lock from a mutex into a spinlock. Instead of holding the lock in a sleeping context in inode_doinit_with_dentry, set isec->initialized to LABEL_PENDING and release the lock. Then, when the sid has been determined, re-acquire the lock. If isec->initialized is still set to LABEL_PENDING, set isec->sid; otherwise, the sid has been set by another task (LABEL_INITIALIZED) or invalidated (LABEL_INVALID) in the meantime. This fixes a deadlock on gfs2 where * one task is in inode_doinit_with_dentry -> gfs2_getxattr, holds isec->lock, and tries to acquire the inode's glock, and * another task is in do_xmote -> inode_go_inval -> selinux_inode_invalidate_secctx, holds the inode's glock, and tries to acquire isec->lock.
Nate, can you check if this set of three selinux patches fixes things? (They are rather unlikely to make it into the 4.9 kernel, but still ...)
Created attachment 1219388 [details] Fixed patches It turned out that the middle patch depended on an additional proc fix. The other patches are mostly unchanged; here is an updated patch queue for testing.
I was able to get through a QUICK version of locksmith with the fixed patches. I'll keep going with the rest of the regression suite.
A quick note, it's really awkward to review patches via Bugzilla, doubly so if the patches are sent as a tarball. That said, looking at patch 4/4, this looks like what we talked about on IRC a few days ago, with the exception of the change from a mutex to a spinlock. However, given the short duration of the lock that seems like a reasonable change.
The rest of the regression suite ran without issues.
I re-ran this on an upstream 4.12.5 kernel and locksmith passed without issues.