Bug 812798 - selinux policy -114 causes all FUSE mounts to hang in Fedora 17
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: selinux-policy
Version: 17
Assigned To: Miroslav Grepl
QA Contact: Fedora Extras Quality Assurance
Duplicates: 812550 812669 812795 812914 813060 813088
Blocks: 1060423
Reported: 2012-04-16 05:22 EDT by Richard W.M. Jones
Modified: 2014-02-01 10:15 EST
Fixed In Version: selinux-policy-3.10.0-116.fc17
Doc Type: Bug Fix
Clones: 1060423
Last Closed: 2012-04-18 19:08:22 EDT
Type: Bug

Attachments
strace of hanging process (17.62 KB, text/plain)
2012-04-16 05:22 EDT, Richard W.M. Jones
sysrq w when a process is hanging (31.65 KB, text/plain)
2012-04-16 15:47 EDT, Richard W.M. Jones

Description Richard W.M. Jones 2012-04-16 05:22:57 EDT
Created attachment 577673
strace of hanging process

Description of problem:

This has just started to happen recently.  FUSE mounts simply
hang in Fedora 17.

Version-Release number of selected component (if applicable):

fuse-2.8.7-1.fc17.x86_64 [1]
kernel 3.3.1-5.fc17.x86_64

[1] see bug 784823 -- I also tried fuse 2.8.6 and it hangs
in the same way

How reproducible:

100%

Steps to Reproduce:
1. mount any fuse filesystem
2. it hangs
3.

The hang occurs in fusermount, here:

22376 mount("/dev/fuse", ".", "fuse", MS_NOSUID|MS_NODEV, "fd=4,rootmode=40000,user_id=1000,group_id=1000" <unfinished ...>

After killing fusermount, the use count of the fuse module
stays above zero, even though there are no fuse mounts and
nothing else uses the module:

$ sudo lsmod | grep fuse
fuse                   77772  3 

(It increases by 3 every time I kill a hanging mount)

'sync' command hangs in D state, arrgghh!

I have attached a full 'strace' showing the hanging process.
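
The two symptoms described above (a fuse use count that keeps growing, and processes stuck in D state) can also be checked from a script. What follows is a minimal sketch, assuming Linux with procfs and Python 3; the function names are made up for illustration and are not part of any FUSE or SELinux tool.

```python
import os


def module_refcount(modules_text, name):
    """Return the use count for `name` from lsmod or /proc/modules text."""
    for line in modules_text.splitlines():
        fields = line.split()
        # Both formats start with: module name, size, use count.
        if len(fields) >= 3 and fields[0] == name and fields[2].isdigit():
            return int(fields[2])
    return None


def parse_stat(stat_line):
    """Split a /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    left, _, rest = stat_line.rpartition(")")
    pid, _, comm = left.partition(" (")
    return int(pid), comm, rest.split()[0]


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in D state."""
    found = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return found  # no procfs available on this system
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while we were looking
        if state == "D":
            found.append((pid, comm))
    return found
```

Feeding the lsmod line quoted above to module_refcount(text, "fuse") gives the use count as an integer, and uninterruptible_processes() would list a hung 'sync' while it is stuck.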
Comment 1 Richard W.M. Jones 2012-04-16 05:25:03 EDT
Additional details:

SELinux is permissive.

No output in dmesg, except:

[  119.265866] fuse init (API version 7.18)
[  119.504612] SELinux: (dev fusectl, type fusectl) has no xattr support
Comment 2 Richard W.M. Jones 2012-04-16 05:31:42 EDT
I also tried 2.9.0-pre0 (ie. upstream git) of the userspace
library, but same thing.
Comment 3 Richard W.M. Jones 2012-04-16 05:49:55 EDT
Also tried latest f17 kernel (kernel-3.3.2-1.fc17), same thing.
Comment 4 Richard W.M. Jones 2012-04-16 06:47:59 EDT
Same problem with 3.3.1-3.fc17.x86_64.
Comment 5 Richard W.M. Jones 2012-04-16 07:16:32 EDT
OK, *disabling* selinux fixes it.

Note, just setting selinux to permissive doesn't fix it.

I suspect that what's happening is this is a reappearance
of bug 493565.  That bug has an extremely long history, but
if you read through the comments starting here:

https://bugzilla.redhat.com/show_bug.cgi?id=493565#c44

then you'll see it's some sort of problem with auditd
trying to access the mount point during the mount process,
and causing a deadlock.
Comment 6 Mamoru TASAKA 2012-04-16 07:40:50 EDT
Is bug 812588 related to this?
Comment 7 Richard W.M. Jones 2012-04-16 07:59:35 EDT
Simple reproducer:

git clone git://fuse.git.sourceforge.net/gitroot/fuse/fuse
cd fuse
./configure
make
mkdir /tmp/mnt
cd examples
./hello /tmp/mnt

The final command hangs if selinux is enabled OR permissive.
The final command completes if selinux is disabled.
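
When repeating this test across SELinux modes, it helps to run the mount command under a timeout so that a hang is reported instead of blocking the shell. This is a minimal sketch, assuming Python 3 and that selinuxfs is mounted at /sys/fs/selinux (older systems used /selinux); the function names and the 10-second timeout are illustrative choices, not part of the reproducer.

```python
import os
import subprocess


def selinux_mode(selinuxfs="/sys/fs/selinux"):
    """Return 'disabled', 'permissive' or 'enforcing'."""
    try:
        with open(os.path.join(selinuxfs, "enforce")) as handle:
            return "enforcing" if handle.read().strip() == "1" else "permissive"
    except OSError:
        # No enforce file: treated here as SELinux being disabled.
        return "disabled"


def completes_within(command, seconds):
    """Run `command`; return True if it exits before the timeout."""
    try:
        subprocess.run(command, timeout=seconds, check=False)
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed the direct child here.
        # Helpers it spawned (such as fusermount) may need a separate kill.
        return False
```

For example, completes_within(["./hello", "/tmp/mnt"], 10) could be called once per mode, with the result recorded next to selinux_mode().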
Comment 8 Richard W.M. Jones 2012-04-16 08:27:32 EDT
Interesting, downgrading selinux-policy from 0:3.10.0-114.fc17
to -110.fc17 fixes the problem.
Comment 9 Miroslav Grepl 2012-04-16 09:01:09 EDT
Could you test it with the -115.fc17 release, which is available from koji?
Comment 10 Richard W.M. Jones 2012-04-16 09:35:42 EDT
No, it still hangs with -115.
Comment 11 Miroslav Grepl 2012-04-16 09:46:39 EDT
This is related to the latest fixes which we added.
Comment 12 Daniel Walsh 2012-04-16 09:47:45 EDT
This is a kernel/fuse issue.
Comment 13 Miroslav Grepl 2012-04-16 09:54:19 EDT
The question is whether we can get a fix in time. -114.fc16 is going to be marked as stable.
Comment 14 Richard W.M. Jones 2012-04-16 09:55:38 EDT
One additional thing I found is that something (auditd possibly)
makes a 'getattr' call very early on after the filesystem is
mounted, and that seems to be what causes the deadlock.

So an selinux rule that made getattr into a 'dontaudit' seems
like it might make the problem go away (or conversely adding
an audit rule would cause the problem).

I looked through all the changes from -110 through -115 which
might do this, and there are a couple of lines like this:

++      dontaudit $1 tmp_t:dir getattr;

(My test used /tmp)
Comment 15 Richard W.M. Jones 2012-04-16 09:58:48 EDT
... although that would be the reverse, causing the
problem to go away.  Still looking ...
Comment 16 Eric Paris 2012-04-16 10:40:57 EDT
What changed in policy?  FUSE does not support xattrs from an SELinux PoV.  We would like to see that change, but for now, if you added xattr support for FUSE in policy, take it out.  It is wrong.  That IS a policy bug.
Comment 17 Richard W.M. Jones 2012-04-16 10:49:20 EDT
Eric, which bit of policy specifically?  I see two sections
that were added that seem to relate to fuse:

++interface(`fs_fusefs_domtrans',`
++      gen_require(`
++              type fusefs_t;
++      ')
++
++      allow $1 fusefs_t:dir search_dir_perms;
++      domain_auto_transition_pattern($1, fusefs_t, $2)
++')

and

+ type fusefs_t;
+-fs_noxattr_type(fusefs_t)
++fs_type(fusefs_t)
++files_type(fusefs_t)
+ files_mountpoint(fusefs_t)
++files_poly_parent(fusefs_t)
++dev_associate(fusefs_t)
++
+ allow fusefs_t self:filesystem associate;
+ allow fusefs_t fs_t:filesystem associate;
+-genfscon fuse / gen_context(system_u:object_r:fusefs_t,s0)
+-genfscon fuseblk / gen_context(system_u:object_r:fusefs_t,s0)
+-genfscon fusectl / gen_context(system_u:object_r:fusefs_t,s0)
+ 
++# Use a transition SID based on the allocating task SID and the
++# filesystem SID to label inodes in the following filesystem types,
++# and label the filesystem itself with the specified context.
++# This is appropriate for pseudo filesystems like devpts and tmpfs
++# where we want to label objects with a derived type.
++fs_use_xattr fuse gen_context(system_u:object_r:fusefs_t,s0);
++fs_use_xattr fuseblk gen_context(system_u:object_r:fusefs_t,s0);
++fs_use_xattr fusectl gen_context(system_u:object_r:fusefs_t,s0);
++allow fusefs_t noxattrfs:filesystem associate;
Comment 18 Eric Paris 2012-04-16 11:04:54 EDT
It's that whole second hunk.  Dan, you knew the FUSE tools cannot support xattrs.  If we can convince the FUSE people to fix their stuff you can make these changes (and add a dependency on the new version of FUSE utils) but for now, that is a bug and should be reverted.
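
A change like the one in that second hunk (genfscon labeling removed, fs_use_xattr added for the same filesystem types) can be picked out of a policy diff mechanically. The sketch below assumes the nested-diff layout quoted in comment 17, where an outer '+' is followed by an inner marker; the function names are hypothetical and this is not an existing SELinux tool.

```python
import re

# Assumes the nested-diff layout quoted in comment 17: an outer '+' followed
# by the inner marker ('+' added, '-' removed, ' ' unchanged context).
STATEMENT = re.compile(r"^\+([ +-])\s*(fs_use_xattr|genfscon)\s+(fuse\w*)\b")


def fuse_labeling_changes(diff_text):
    """Collect FUSE labeling statements that a policy patch adds or removes."""
    changes = {"added": [], "removed": []}
    for line in diff_text.splitlines():
        match = STATEMENT.match(line)
        if not match or match.group(1) == " ":
            continue  # unrelated line, or unchanged context
        marker, keyword, fstype = match.groups()
        bucket = "added" if marker == "+" else "removed"
        changes[bucket].append((keyword, fstype))
    return changes


def switched_to_xattr(changes):
    """Return the fs types moved from genfscon labeling to fs_use_xattr."""
    added = {fs for kw, fs in changes["added"] if kw == "fs_use_xattr"}
    removed = {fs for kw, fs in changes["removed"] if kw == "genfscon"}
    return sorted(added & removed)
```

Run against the hunk quoted in comment 17, switched_to_xattr() would report fuse, fuseblk and fusectl.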
Comment 19 Daniel Walsh 2012-04-16 11:35:23 EDT
Ok, I did not know that the change was causing the problem.

Miroslav, let's revert.
Comment 20 Josef Bacik 2012-04-16 14:33:59 EDT
Can I get sysrq+w when the process hangs?  I feel like I know what's going on here and I'd like to have a trace to show Miklos when I resend the patch I wrote forever ago to fix this (if it's the same problem).
Comment 21 Eric Paris 2012-04-16 14:47:33 EDT
Josef, it's a different problem.  I should get you and Jeff Darcy talking.  We've been arguing about how to solve this problem....
Comment 22 Tom "spot" Callaway 2012-04-16 14:51:42 EDT
*** Bug 813060 has been marked as a duplicate of this bug. ***
Comment 23 Richard W.M. Jones 2012-04-16 15:47:35 EDT
Created attachment 577813
sysrq w when a process is hanging

FWIW here is the sysrq 'w' output when a FUSE process is hanging.
Comment 24 Richard W.M. Jones 2012-04-16 15:49:35 EDT
I've just realized that output is essentially empty.

The FUSE process itself doesn't go into TASK_UNINTERRUPTIBLE,
and is killable (albeit only by root & kill -9).

'sync' would have gone into D state, had I run it.
Comment 25 Tom "spot" Callaway 2012-04-16 16:51:30 EDT
*** Bug 812795 has been marked as a duplicate of this bug. ***
Comment 26 Miroslav Grepl 2012-04-16 16:59:50 EDT
(In reply to comment #19)
> Ok, I did not know that the change was causing the problem.
> 
> Miroslav, let's revert.

Fixed in selinux-policy-3.10.0-116.fc17
Comment 27 Fedora Update System 2012-04-16 17:29:01 EDT
selinux-policy-3.10.0-116.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/selinux-policy-3.10.0-116.fc17
Comment 28 Richard W.M. Jones 2012-04-16 18:18:42 EDT
-116 fixes the problem over here.
Comment 29 Jaroslav Franek 2012-04-16 18:38:25 EDT
Yep, with -116 NTFS mounts work again (Bug #813060). Thanks guys.
Comment 30 Mikhail 2012-04-17 00:12:19 EDT
The worst part is that while the mount.ntfs process is hanging, other devices are not recognized. For example, I connected a 3G modem, and it was not detected until I killed mount.ntfs.


[16286.732046] usb 2-2: new high-speed USB device number 9 using ehci_hcd
[16286.921300] usb 2-2: New USB device found, idVendor=8564, idProduct=1000
[16286.921307] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[16286.921311] usb 2-2: Product: Mass Storage Device
[16286.921315] usb 2-2: Manufacturer: JetFlash
[16286.921319] usb 2-2: SerialNumber: 85P8KCQ2A75S3WRH
[16286.922411] scsi11 : usb-storage 2-2:1.0
[16288.324206] scsi 11:0:0:0: Direct-Access     JetFlash Transcend 16GB   1100 PQ: 0 ANSI: 0 CCS
[16288.325786] sd 11:0:0:0: Attached scsi generic sg1 type 0
[16288.328546] sd 11:0:0:0: [sdb] 31703040 512-byte logical blocks: (16.2 GB/15.1 GiB)
[16288.329293] sd 11:0:0:0: [sdb] Write Protect is off
[16288.329300] sd 11:0:0:0: [sdb] Mode Sense: 43 00 00 00
[16288.330049] sd 11:0:0:0: [sdb] No Caching mode page present
[16288.330056] sd 11:0:0:0: [sdb] Assuming drive cache: write through
[16288.333287] sd 11:0:0:0: [sdb] No Caching mode page present
[16288.333292] sd 11:0:0:0: [sdb] Assuming drive cache: write through
[16288.334486]  sdb: sdb1
[16288.337175] sd 11:0:0:0: [sdb] No Caching mode page present
[16288.337183] sd 11:0:0:0: [sdb] Assuming drive cache: write through
[16288.337188] sd 11:0:0:0: [sdb] Attached SCSI removable disk
[16292.688991] usb 2-2: USB disconnect, device number 9
[16292.699032] sd 11:0:0:0: [sdb] Unhandled error code
[16292.699036] sd 11:0:0:0: [sdb]  Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[16292.699040] sd 11:0:0:0: [sdb] CDB: Write(10): 2a 00 00 5c 78 f8 00 00 f0 00
[16292.699048] end_request: I/O error, dev sdb, sector 6060280
[16292.699053] Buffer I/O error on device sdb1, logical block 6060248
[16292.699055] lost page write due to I/O error on sdb1
[16292.699058] Buffer I/O error on device sdb1, logical block 6060249
[16292.699060] lost page write due to I/O error on sdb1
[16292.699062] Buffer I/O error on device sdb1, logical block 6060250
[16292.699064] lost page write due to I/O error on sdb1
[16292.699066] Buffer I/O error on device sdb1, logical block 6060251
[16292.699068] lost page write due to I/O error on sdb1
[16292.699070] Buffer I/O error on device sdb1, logical block 6060252
[16292.699072] lost page write due to I/O error on sdb1
[16292.699074] Buffer I/O error on device sdb1, logical block 6060253
[16292.699076] lost page write due to I/O error on sdb1
[16292.699078] Buffer I/O error on device sdb1, logical block 6060254
[16292.699080] lost page write due to I/O error on sdb1
[16292.699082] Buffer I/O error on device sdb1, logical block 6060255
[16292.699083] lost page write due to I/O error on sdb1
[16292.699088] Buffer I/O error on device sdb1, logical block 6060256
[16292.699090] lost page write due to I/O error on sdb1
[16292.699092] Buffer I/O error on device sdb1, logical block 6060257
[16292.699094] lost page write due to I/O error on sdb1
[16292.728147] sd 11:0:0:0: [sdb] Unhandled error code
[16292.728151] sd 11:0:0:0: [sdb]  Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[16292.728155] sd 11:0:0:0: [sdb] CDB: Write(10): 2a 00 00 5c 79 e8 00 00 f0 00
[16292.728164] end_request: I/O error, dev sdb, sector 6060520
[19662.601503] show_signal_msg: 470 callbacks suppressed
[19662.601508] Compositor[1992]: segfault at 0 ip   (null) sp af4a68bc error 14
[24483.840535] SELinux: initialized (dev proc, type proc), uses genfs_contexts
[27629.446130] SELinux: initialized (dev proc, type proc), uses genfs_contexts
[27633.335602] SELinux: initialized (dev proc, type proc), uses genfs_contexts
[85599.356284] SELinux: (dev sdb1, type fuseblk) getxattr errno 4
[85599.569063] usb 2-2: new high-speed USB device number 10 using ehci_hcd
[85599.685686] usb 2-2: New USB device found, idVendor=0502, idProduct=3223
[85599.685693] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[85599.685698] usb 2-2: Product: Android HSUSB Device
[85599.685702] usb 2-2: Manufacturer: Acer Incorporated
[85599.685705] usb 2-2: SerialNumber: 0000038460334896
[85599.691061] rndis_host 2-2:1.0: usb0: register 'rndis_host' at usb-0000:00:1d.7-2, RNDIS device, 4a:e9:c0:20:4d:99
[85615.730020] usb0: no IPv6 routers present
[mikhail@telecon17l ~]$
Comment 31 Ankur Sinha (FranciscoD) 2012-04-17 01:17:11 EDT
*** Bug 813088 has been marked as a duplicate of this bug. ***
Comment 32 Miroslav Grepl 2012-04-17 04:26:31 EDT
(In reply to comment #28)
> -116 fixes the problem over here.

Could you update karma? Thanks.
Comment 33 Tom "spot" Callaway 2012-04-17 14:56:08 EDT
*** Bug 812669 has been marked as a duplicate of this bug. ***
Comment 34 Tom "spot" Callaway 2012-04-17 14:56:14 EDT
*** Bug 812914 has been marked as a duplicate of this bug. ***
Comment 35 David L. Crow 2012-04-17 21:01:15 EDT
I have a dual boot machine with NTFS filesystems, and the problem reported here prevented the successful install of Fedora 17 Beta.  The install hung when running grub2-mkconfig (which calls os-prober) during the bootloader update.  The only recovery was to change virtual terminals to the shell and kill -9 the mount command over and over for every one of my NTFS partitions.

Fortunately, I was able to manually merge the partially generated grub.cfg with the pre-existing one and was able to boot successfully.

Installing the -116 update to the selinux packages after booting allowed me to successfully run grub2-mkconfig.
Comment 36 Fedora Update System 2012-04-18 19:08:22 EDT
selinux-policy-3.10.0-116.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 37 Jeff Darcy 2012-04-19 11:53:12 EDT
(In reply to comment #18)
> It's that whole second hunk.  Dan, you knew the FUSE tools cannot support
> xattrs.

Eric, you know they can.  FUSE filesystems can support xattrs generally, and can even support the way SELinux uses them from within mount() if they do certain things.  You and I have discussed this on bug 811217 and in email during the last few days.  The problem is that the SELinux policy doesn't distinguish among FUSE filesystems.  As a result it disables labeling even for those that can support it, but enabling it for those that can't leads to this bug, which is even worse.  We need some way to distinguish between "good" and "bad" FUSE filesystems either in the policy or at run time (e.g. presence of a new mount option causes "promotion" from safe-but-limited fusefs_t to generic fs_t).
Comment 38 Eric Paris 2012-04-19 14:07:53 EDT
Jeff, agree with everything you said.  Dan was premature on the policy changes.  When we get there we also need to make sure that the policy conflicts with any FUSE userspace which would leave us with a deadlock.

So it seems like you already did the hard part.  You solved the deadlock!  Can I get that patch from you?  Josef Bacik agreed to jam it into Fedora so we can get some testing and see what blows up.

I'll also try to sort out the selinux kernel and policy problems; hopefully we can cleanly figure out which FS support xattrs on the fly (I did patches once before for this, but dropped them when we found that FUSE deadlocked).
Comment 39 Jeff Darcy 2012-04-19 15:01:10 EDT
I'm working on the patch now.  I've talked to Miklos and he seems OK with the idea, but if we want to get it into Fedora separately that's OK too.  I just need to iron out the way we (eventually) collect the mount status.

As for the rest, the current solution does avoid the deadlock but does so at the cost of fuse_mount_sys returning prematurely, which can cause spurious errors if somebody actually tries to use the new mountpoint while the forked mount process is still working.  To avoid that, a FUSE filesystem would not only need to use the patched libfuse but would need to call fuse_mount_sys with special arguments after it had a thread ready to handle getxattr requests.  I can make those changes for GlusterFS, but not for others which would then have glitchy behavior (but I guess at least they wouldn't deadlock).  Since I don't think we can determine by probing that they handle that part right, I think we should skip the probe and act as if it failed unless a specific mount option is present.
Comment 40 Richard W.M. Jones 2012-04-19 15:17:06 EDT
Couple of things:

- Please put this in Rawhide, not F17.

- Is there a summary available anywhere of what changes need
to be made to an existing FUSE filesystem so that it has
getxattr-compatible-with-SELinux support?  I would like to make
the necessary changes to guestmount.
Comment 41 Peter Lemenkov 2012-04-19 16:06:52 EDT
*** Bug 812550 has been marked as a duplicate of this bug. ***
Comment 42 Jeff Darcy 2012-04-19 17:26:48 EDT
There are three things that are necessary for some arbitrary FooFS to work.

(1) FooFS must open /dev/fuse itself, and be polling already in a separate thread when it passes that fd to fuse_mount_sys (vs. getting it back from fuse_mount_sys as currently).

(2) FooFS must also create a pipe and pass the write side to fuse_mount_sys, which will use it (in the child) to pass PID and status back when the mount call is complete.

(3) The policies have to be fixed (see bug 811217#c4).

http://review.gluster.com/#change,3199 has the FUSE and FS changes for GlusterFS.  The mount.c changes are to Gluster's own version, unfortunately, so I still need to "backport" those to the upstream version before sending on to Miklos.
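
The ordering in steps (1) and (2) can be simulated without FUSE. This is only a sketch under stated assumptions: ordinary pipes stand in for /dev/fuse, a subprocess stands in for the forked mount helper, and the 8-byte PID/status record is a made-up wire format, not what libfuse or GlusterFS actually use. It does not call fuse_mount_sys.

```python
import os
import struct
import subprocess
import sys
import threading

# Stand-in for the forked mount helper: it asks one "getxattr" question,
# blocks until it is answered, then reports its PID and a status code.
CHILD = r"""
import os, struct, sys
req_w, rep_r, status_w = (int(a) for a in sys.argv[1:4])
os.write(req_w, b"getxattr")   # the query issued during mount()
reply = os.read(rep_r, 16)     # blocks until the filesystem answers
status = 0 if reply == b"ok" else 1
os.write(status_w, struct.pack("ii", os.getpid(), status))
"""


def serve_requests(req_r, rep_w):
    """Poller thread: answer the request that arrives during the mount."""
    if os.read(req_r, 16) == b"getxattr":
        os.write(rep_w, b"ok")


def mount_with_handshake():
    req_r, req_w = os.pipe()        # stands in for /dev/fuse (requests)
    rep_r, rep_w = os.pipe()        # stands in for /dev/fuse (replies)
    status_r, status_w = os.pipe()  # the pipe from step (2)

    # Step (1): the poller is already running before the helper starts.
    poller = threading.Thread(target=serve_requests, args=(req_r, rep_w))
    poller.start()

    # Step (2): hand the write side of the status pipe to the helper.
    helper = subprocess.Popen(
        [sys.executable, "-c", CHILD, str(req_w), str(rep_r), str(status_w)],
        pass_fds=(req_w, rep_r, status_w),
    )
    for fd in (req_w, rep_r, status_w):
        os.close(fd)  # the parent keeps only its own ends

    # Collect PID and status once the helper has finished its "mount".
    record = os.read(status_r, struct.calcsize("ii"))
    reported_pid, status = struct.unpack("ii", record)
    helper.wait()
    poller.join()
    for fd in (req_r, rep_w, status_r):
        os.close(fd)
    return helper.pid, reported_pid, status
```

If the poller thread were started only after the helper, the helper's read would block with nobody to answer it, which is the shape of the deadlock discussed in this bug.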
Comment 43 Mikhail 2012-04-20 04:08:08 EDT
After updating selinux-policy to version 3.10.0-116, the problem from https://bugzilla.redhat.com/show_bug.cgi?id=801600 occurs again when mounting network shares.
Comment 44 Colin Walters 2014-02-01 07:26:46 EST
What's the status of this bug?  This came up to the top of my list now; my build
system is using libguestfs's "guestmount" to mount filesystems
from the host, so I can create disk images, and also do
incremental upgrades.

I think I can do an elaborate workaround by skipping xattr copying using FUSE, and then do it via the libguestfs API, but it'll be incredibly ugly and painful.  Would be far nicer to be able to just call setxattr().

Since this one is closed, OK to clone it as a new bug for tracking adding libguestfs+FUSE+SELinux?
Comment 45 Richard W.M. Jones 2014-02-01 08:42:44 EST
The actual bug (selinux-policy) is of course fixed.

Unfortunately fixing the guestmount + xattr problem is quite a
lot more complex than it seems.  I will try and find the links to
everything when I have more reasonable internet access.

The basic problem is it requires us to change the libguestfs API
in order to allow multithreaded guestmount to be written, since
threads are required in order for guestmount to "answer" the getxattr
call that happens during mount when SELinux is enabled.
