Created attachment 577673
strace of hanging process
Description of problem:
This has just started to happen recently. FUSE mounts simply
hang in Fedora 17.
Version-Release number of selected component (if applicable):
 see bug 784823 -- I also tried fuse 2.8.6 and it hangs
in the same way
Steps to Reproduce:
1. mount any fuse filesystem
2. it hangs
The hang occurs in fusermount, here:
22376 mount("/dev/fuse", ".", "fuse", MS_NOSUID|MS_NODEV, "fd=4,rootmode=40000,\
user_id=1000,group_id=1000" <unfinished ...>
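For anyone reading the strace line above, the option string can be decoded as follows. This is only an illustrative sketch: `parse_fuse_opts` is a hypothetical helper, not part of fusermount. The `rootmode` value is written in octal, and 40000 is `S_IFDIR`, i.e. the root of the mount is a directory.

```python
import stat


def parse_fuse_opts(opts):
    """Split a FUSE mount option string into a dict (hypothetical helper)."""
    parsed = dict(item.split("=", 1) for item in opts.split(","))
    # rootmode is written in octal; the other values here are decimal.
    parsed["rootmode"] = int(parsed["rootmode"], 8)
    for key in ("fd", "user_id", "group_id"):
        parsed[key] = int(parsed[key])
    return parsed


opts = parse_fuse_opts("fd=4,rootmode=40000,user_id=1000,group_id=1000")
print(opts["fd"], stat.S_ISDIR(opts["rootmode"]))
```

So the mount is being attempted on fd 4 (the already-open /dev/fuse) for uid/gid 1000, which looks like a perfectly ordinary request.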
After killing fusermount, the use count of the fuse module
is > 1, even though there are no fuse mounts and nothing else
uses the module:
$ sudo lsmod | grep fuse
fuse 77772 3
(It increases by 3 every time I kill a hanging mount)
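To watch the use count without eyeballing lsmod each time, something like the following could be used. It is a minimal sketch, assuming the usual `name size use-count ...` column order of lsmod and /proc/modules; `module_use_count` is a name made up for this example.

```python
def module_use_count(name, modules_text):
    """Return the use count of `name` from lsmod or /proc/modules text, or None."""
    for line in modules_text.splitlines():
        fields = line.split()
        # Columns: module name, size in bytes, use count, ...
        if len(fields) >= 3 and fields[0] == name:
            return int(fields[2])
    return None


sample = "fuse 77772 3\n"
print(module_use_count("fuse", sample))

# On a live Linux system:
# with open("/proc/modules") as f:
#     print(module_use_count("fuse", f.read()))
```

Running it before and after killing a hanging mount should show the count going up by 3 each time, as described above.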
'sync' command hangs in D state, arrgghh!
I have attached a full 'strace' showing the hanging process.
SELinux is permissive.
No output in dmesg, except:
[ 119.265866] fuse init (API version 7.18)
[ 119.504612] SELinux: (dev fusectl, type fusectl) has no xattr support
I also tried 2.9.0-pre0 (i.e. upstream git) of the userspace
library, but same thing.
Also tried latest f17 kernel (kernel-3.3.2-1.fc17), same thing.
Same problem with 3.3.1-3.fc17.x86_64.
OK, *disabling* selinux fixes it.
Note, just setting selinux to permissive doesn't fix it.
I suspect that what's happening is this is a reappearance
of bug 493565. That bug has an extremely long history, but
if you read through the comments starting here:
then you'll see it's some sort of problem with auditd
trying to access the mount point during the mount process,
and causing a deadlock.
Is bug 812588 related to this?
git clone git://fuse.git.sourceforge.net/gitroot/fuse/fuse
The final command hangs if selinux is enabled OR permissive.
The final command completes if selinux is disabled.
Interesting, downgrading selinux-policy from 0:3.10.0-114.fc17
to -110.fc17 fixes the problem.
Could you test it with
-115.fc17 release which is available from koji.
No, it still hangs with -115.
This is related to the latest fixes which we added.
This is a kernel/fuse issue.
The question is whether we can get a fix in time. -114.fc16 is going to be marked as stable.
One additional thing I found is that something (auditd possibly)
makes a 'getattr' call very early on after the filesystem is
mounted, and that seems to be what causes the deadlock.
So an selinux rule that made getattr into a 'dontaudit' seems
like it might make the problem go away (or conversely adding
an audit rule would cause the problem).
I looked through all the changes from -110 through -115 which
might do this, and there are a couple of lines like this:
++ dontaudit $1 tmp_t:dir getattr;
(My test used /tmp)
... although that would be the reverse, causing the
problem to go away. Still looking ...
What changed in policy? FUSE does not support xattrs from an SELinux PoV. We would like to see that change, but for now, if you added xattr support for FUSE in policy, take it out. It is wrong. That IS a policy bug.
Eric, which bit of policy specifically? I see two sections
that were added that seem to relate to fuse:
++ type fusefs_t;
++ allow $1 fusefs_t:dir search_dir_perms;
++ domain_auto_transition_pattern($1, fusefs_t, $2)
+ type fusefs_t;
+ allow fusefs_t self:filesystem associate;
+ allow fusefs_t fs_t:filesystem associate;
+-genfscon fuse / gen_context(system_u:object_r:fusefs_t,s0)
+-genfscon fuseblk / gen_context(system_u:object_r:fusefs_t,s0)
+-genfscon fusectl / gen_context(system_u:object_r:fusefs_t,s0)
++# Use a transition SID based on the allocating task SID and the
++# filesystem SID to label inodes in the following filesystem types,
++# and label the filesystem itself with the specified context.
++# This is appropriate for pseudo filesystems like devpts and tmpfs
++# where we want to label objects with a derived type.
++fs_use_xattr fuse gen_context(system_u:object_r:fusefs_t,s0);
++fs_use_xattr fuseblk gen_context(system_u:object_r:fusefs_t,s0);
++fs_use_xattr fusectl gen_context(system_u:object_r:fusefs_t,s0);
++allow fusefs_t noxattrfs:filesystem associate;
It's that whole second hunk. Dan, you knew the FUSE tools cannot support xattrs. If we can convince the FUSE people to fix their stuff you can make these changes (and add a dependency on the new version of FUSE utils) but for now, that is a bug and should be reverted.
OK, I did not know that the change was causing the problem.
Miroslav, let's revert.
Can I get sysrq+w when the process hangs? I feel like I know what's going on here and I'd like to have a trace to show Miklos when I resend the patch I wrote forever ago to fix this (if it's the same problem).
Josef, it's a different problem. I should get you and Jeff Darcy talking. We've been arguing about how to solve this problem....
*** Bug 813060 has been marked as a duplicate of this bug. ***
Created attachment 577813
sysrq w when a process is hanging
FWIW here is the sysrq 'w' output when a FUSE process is hanging.
I've just realized that output is essentially empty.
The FUSE process itself doesn't go into TASK_UNINTERRUPTIBLE,
and is killable (albeit only by root & kill -9).
'sync' would have gone into D state, had I run it.
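Since sysrq 'w' only reports tasks in uninterruptible sleep, a quick userspace check for D-state tasks may also help here. This is a rough sketch that reads the state field from /proc/<pid>/stat; the function names and the sample line are made up for illustration.

```python
import glob


def task_state(stat_line):
    """Extract the one-letter state from a /proc/<pid>/stat line."""
    # comm is in parentheses and may contain spaces, so split after the last ')'.
    return stat_line[stat_line.rindex(")") + 1:].split()[0]


def uninterruptible_tasks():
    """Return the PIDs (as strings) of tasks currently in D state."""
    found = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                line = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        if task_state(line) == "D":
            found.append(line.split(" ", 1)[0])
    return found


# Hypothetical sample line, shortened to the first few fields.
print(task_state("4242 (sync) D 1 4242 4242 0 -1 4194304"))
```

Calling `uninterruptible_tasks()` while a mount is hanging should return nothing for the FUSE process itself, but would list 'sync' if it had been started.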
*** Bug 812795 has been marked as a duplicate of this bug. ***
(In reply to comment #19)
> OK, I did not know that the change was causing the problem.
> Miroslav, let's revert.
Fixed in selinux-policy-3.10.0-116.fc17
selinux-policy-3.10.0-116.fc17 has been submitted as an update for Fedora 17.
-116 fixes the problem over here.
Yep, with -116 NTFS mounts work again (Bug #813060). Thanks guys.
The worst part in this case is that while the mount.ntfs process is hanging, other devices are not recognized. Here's an example: I connected a 3G modem, and it was not detected until I killed mount.ntfs.
[16286.732046] usb 2-2: new high-speed USB device number 9 using ehci_hcd
[16286.921300] usb 2-2: New USB device found, idVendor=8564, idProduct=1000
[16286.921307] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[16286.921311] usb 2-2: Product: Mass Storage Device
[16286.921315] usb 2-2: Manufacturer: JetFlash
[16286.921319] usb 2-2: SerialNumber: 85P8KCQ2A75S3WRH
[16286.922411] scsi11 : usb-storage 2-2:1.0
[16288.324206] scsi 11:0:0:0: Direct-Access JetFlash Transcend 16GB 1100 PQ: 0 ANSI: 0 CCS
[16288.325786] sd 11:0:0:0: Attached scsi generic sg1 type 0
[16288.328546] sd 11:0:0:0: [sdb] 31703040 512-byte logical blocks: (16.2 GB/15.1 GiB)
[16288.329293] sd 11:0:0:0: [sdb] Write Protect is off
[16288.329300] sd 11:0:0:0: [sdb] Mode Sense: 43 00 00 00
[16288.330049] sd 11:0:0:0: [sdb] No Caching mode page present
[16288.330056] sd 11:0:0:0: [sdb] Assuming drive cache: write through
[16288.333287] sd 11:0:0:0: [sdb] No Caching mode page present
[16288.333292] sd 11:0:0:0: [sdb] Assuming drive cache: write through
[16288.334486] sdb: sdb1
[16288.337175] sd 11:0:0:0: [sdb] No Caching mode page present
[16288.337183] sd 11:0:0:0: [sdb] Assuming drive cache: write through
[16288.337188] sd 11:0:0:0: [sdb] Attached SCSI removable disk
[16292.688991] usb 2-2: USB disconnect, device number 9
[16292.699032] sd 11:0:0:0: [sdb] Unhandled error code
[16292.699036] sd 11:0:0:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[16292.699040] sd 11:0:0:0: [sdb] CDB: Write(10): 2a 00 00 5c 78 f8 00 00 f0 00
[16292.699048] end_request: I/O error, dev sdb, sector 6060280
[16292.699053] Buffer I/O error on device sdb1, logical block 6060248
[16292.699055] lost page write due to I/O error on sdb1
[16292.699058] Buffer I/O error on device sdb1, logical block 6060249
[16292.699060] lost page write due to I/O error on sdb1
[16292.699062] Buffer I/O error on device sdb1, logical block 6060250
[16292.699064] lost page write due to I/O error on sdb1
[16292.699066] Buffer I/O error on device sdb1, logical block 6060251
[16292.699068] lost page write due to I/O error on sdb1
[16292.699070] Buffer I/O error on device sdb1, logical block 6060252
[16292.699072] lost page write due to I/O error on sdb1
[16292.699074] Buffer I/O error on device sdb1, logical block 6060253
[16292.699076] lost page write due to I/O error on sdb1
[16292.699078] Buffer I/O error on device sdb1, logical block 6060254
[16292.699080] lost page write due to I/O error on sdb1
[16292.699082] Buffer I/O error on device sdb1, logical block 6060255
[16292.699083] lost page write due to I/O error on sdb1
[16292.699088] Buffer I/O error on device sdb1, logical block 6060256
[16292.699090] lost page write due to I/O error on sdb1
[16292.699092] Buffer I/O error on device sdb1, logical block 6060257
[16292.699094] lost page write due to I/O error on sdb1
[16292.728147] sd 11:0:0:0: [sdb] Unhandled error code
[16292.728151] sd 11:0:0:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[16292.728155] sd 11:0:0:0: [sdb] CDB: Write(10): 2a 00 00 5c 79 e8 00 00 f0 00
[16292.728164] end_request: I/O error, dev sdb, sector 6060520
[19662.601503] show_signal_msg: 470 callbacks suppressed
[19662.601508] Compositor: segfault at 0 ip (null) sp af4a68bc error 14
[24483.840535] SELinux: initialized (dev proc, type proc), uses genfs_contexts
[27629.446130] SELinux: initialized (dev proc, type proc), uses genfs_contexts
[27633.335602] SELinux: initialized (dev proc, type proc), uses genfs_contexts
[85599.356284] SELinux: (dev sdb1, type fuseblk) getxattr errno 4
[85599.569063] usb 2-2: new high-speed USB device number 10 using ehci_hcd
[85599.685686] usb 2-2: New USB device found, idVendor=0502, idProduct=3223
[85599.685693] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[85599.685698] usb 2-2: Product: Android HSUSB Device
[85599.685702] usb 2-2: Manufacturer: Acer Incorporated
[85599.685705] usb 2-2: SerialNumber: 0000038460334896
[85599.691061] rndis_host 2-2:1.0: usb0: register 'rndis_host' at usb-0000:00:1d.7-2, RNDIS device, 4a:e9:c0:20:4d:99
[85615.730020] usb0: no IPv6 routers present
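A side note on the `getxattr errno 4` line in the log above: on Linux, errno 4 is EINTR (interrupted system call). That would be consistent with the getxattr issued during the mount being interrupted when mount.ntfs was killed, though this reading is an inference and not something the log states. A quick way to decode such numbers:

```python
import errno
import os

code = 4  # taken from "getxattr errno 4" in the log
print(errno.errorcode[code], "-", os.strerror(code))
```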
*** Bug 813088 has been marked as a duplicate of this bug. ***
(In reply to comment #28)
> -116 fixes the problem over here.
Could you update karma? Thanks.
*** Bug 812669 has been marked as a duplicate of this bug. ***
*** Bug 812914 has been marked as a duplicate of this bug. ***
I have a dual boot machine with NTFS filesystems, and the problem reported here prevented the successful install of Fedora 17 Beta. The install hung when running grub2-mkconfig (which calls os-prober) during the bootloader update. The only recovery was to change virtual terminals to the shell and kill -9 the mount command over and over for every one of my NTFS partitions.
Fortunately, I was able to manually merge the partially generated grub.cfg with the pre-existing one and was able to boot successfully.
Installing the -116 update to the selinux packages after booting allowed me to successfully run grub2-mkconfig.
selinux-policy-3.10.0-116.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.
(In reply to comment #18)
> It's that whole second hunk. Dan, you knew the FUSE tools cannot support
Eric, you know they can. FUSE filesystems can support xattrs generally, and can even support the way SELinux uses them from within mount() if they do certain things. You and I have discussed this on bug 811217 and in email during the last few days. The problem is that the SELinux policy doesn't distinguish among FUSE filesystems. As a result it disables labeling even for those that can support it, but allowing it for those that don't leads to this bug, which is even worse. We need some way to distinguish between "good" and "bad" FUSE filesystems either in the policy or at run time (e.g. presence of a new mount option causes "promotion" from safe-but-limited fusefs_t to generic fs_t).
Jeff, agree with everything you said. Dan was premature on the policy changes. When we get there we also need to make sure that the policy conflicts with any FUSE userspace which would leave us with a deadlock.
So it seems to me like you already did the hard part. You solved the deadlock! Can I get that patch from you? Josef Bacik agreed to jam it into Fedora so we can get some testing and see what blows up.
I'll also try to sort out the selinux kernel and policy problems, hopefully we can cleanly figure out which FS support xattrs on the fly (I did patches once before for this, but dropped them when we found that FUSE deadlocked)
I'm working on the patch now. I've talked to Miklos and he seems OK with the idea, but if we want to get it into Fedora separately that's OK too. I just need to iron out the way we (eventually) collect the mount status.
As for the rest, the current solution does avoid the deadlock but does so at the cost of fuse_mount_sys returning prematurely, which can cause spurious errors if somebody actually tries to use the new mountpoint while the forked mount process is still working. To avoid that, a FUSE filesystem would not only need to use the patched libfuse but would need to call fuse_mount_sys with special arguments after it had a thread ready to handle getxattr requests. I can make those changes for GlusterFS, but not for others which would then have glitchy behavior (but I guess at least they wouldn't deadlock). Since I don't think we can determine by probing that they handle that part right, I think we should skip the probe and act as if it failed unless a specific mount option is present.
Couple of things:
- Please put this in Rawhide, not F17.
- Is there a summary available anywhere of what changes need
to be made to an existing FUSE filesystem so that it has
getxattr-compatible-with-SELinux support? I would like to make
the necessary changes to guestmount.
*** Bug 812550 has been marked as a duplicate of this bug. ***
There are three things that are necessary for some arbitrary FooFS to work.
(1) FooFS must open /dev/fuse itself, and be polling already in a separate thread when it passes that fd to fuse_mount_sys (vs. getting it back from fuse_mount_sys as currently).
(2) FooFS must also create a pipe and pass the write side to fuse_mount_sys, which will use it (in the child) to pass PID and status back when the mount call is complete.
(3) The policies have to be fixed (see bug811217#c4).
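The ordering in (1) and (2) can be modelled in userspace without touching /dev/fuse or mount(2). The following is only a sketch under stated assumptions: three plain pipes stand in for the FUSE device and the status pipe, a child Python process stands in for the forked mount helper, and none of the names here are libfuse APIs. The child sends one "getxattr" request, blocks until it is answered, then reports its PID and status.

```python
import os
import struct
import subprocess
import sys
import threading

# Stand-in for the forked mount helper. A real helper would call mount(2);
# this one sends one "getxattr" request, waits for the answer, and then
# writes its PID and a status code to the status pipe.
CHILD = r"""
import os, struct, sys
req_w, rep_r, status_w = (int(a) for a in sys.argv[1:4])
os.write(req_w, b"getxattr")
reply = os.read(rep_r, 16)            # blocks until the server thread answers
status = 0 if reply == b"ok" else 1
os.write(status_w, struct.pack("ii", os.getpid(), status))
"""


def serve_requests(req_r, rep_w):
    """Server thread: answer every request arriving on the fake device."""
    while True:
        request = os.read(req_r, 16)
        if not request:               # EOF: every writer has closed its end
            break
        os.write(rep_w, b"ok")


def mount_with_handshake():
    req_r, req_w = os.pipe()          # fake device, request direction
    rep_r, rep_w = os.pipe()          # fake device, reply direction
    status_r, status_w = os.pipe()    # step (2): pipe for PID and status

    # Step (1): the server is already polling before the mount is started.
    server = threading.Thread(target=serve_requests, args=(req_r, rep_w))
    server.start()

    child = subprocess.Popen(
        [sys.executable, "-c", CHILD, str(req_w), str(rep_r), str(status_w)],
        pass_fds=(req_w, rep_r, status_w),
    )
    # Close the parent's copies so the server sees EOF once the child exits.
    for fd in (req_w, rep_r, status_w):
        os.close(fd)

    pid, status = struct.unpack("ii", os.read(status_r, 8))
    child.wait()
    server.join()
    for fd in (req_r, rep_w, status_r):
        os.close(fd)
    return pid == child.pid, status


print(mount_with_handshake())
```

If the server thread were started only after waiting for the status, the child would block forever on its reply, which is the same shape as the deadlock in this bug.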
http://review.gluster.com/#change,3199 has the FUSE and FS changes for GlusterFS. The mount.c changes are to Gluster's own version, unfortunately, so I still need to "backport" those to the upstream version before sending on to Miklos.
After updating selinux-policy to version 3.10.0-116, the problem from https://bugzilla.redhat.com/show_bug.cgi?id=801600 has started occurring again when mounting network shares.
What's the status of this bug? This came up to the top of my list now; my build
system is using libguestfs's "guestmount" to mount filesystems
from the host, so I can create disk images, and also do
I think I can do an elaborate workaround by skipping xattr copying using FUSE, and then do it via the libguestfs API, but it'll be incredibly ugly and painful. Would be far nicer to be able to just call setxattr().
Since this one is closed, is it OK to clone it as a new bug to track adding libguestfs+FUSE+SELinux support?
The actual bug (selinux-policy) is of course fixed.
Unfortunately fixing the guestmount + xattr problem is quite a
lot more complex than it seems. I will try and find the links to
everything when I have more reasonable internet access.
The basic problem is it requires us to change the libguestfs API
in order to allow multithreaded guestmount to be written, since
threads are required in order for guestmount to "answer" the getxattr
call that happens during mount when SELinux is enabled.