Bug 1060423 - Support setting xattrs via FUSE
Summary: Support setting xattrs via FUSE
Keywords:
Status: NEW
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libguestfs
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Richard W.M. Jones
QA Contact:
URL:
Whiteboard:
Depends On: 812798
Blocks:
Reported: 2014-02-01 15:15 UTC by Colin Walters
Modified: 2021-04-19 10:35 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 812798
Environment:
Last Closed:
Embargoed:



Description Colin Walters 2014-02-01 15:15:17 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=812798#c45

 Richard W.M. Jones 2014-02-01 08:42:44 EST

The actual bug (selinux-policy) is of course fixed.

Unfortunately fixing the guestmount + xattr problem is quite a
lot more complex than it seems.  I will try and find the links to
everything when I have more reasonable internet access.

The basic problem is it requires us to change the libguestfs API
in order to allow multithreaded guestmount to be written, since
threads are required in order for guestmount to "answer" the getxattr
call that happens during mount when SELinux is enabled.

Comment 1 Colin Walters 2014-02-01 15:19:06 UTC
Thanks for the explanation, Richard.  I'm going to investigate the workaround I mentioned where I do file copying via guestmount, and write special code to use libguestfs directly for xattrs.

Comment 2 Richard W.M. Jones 2014-02-05 14:32:27 UTC
Let me write an explanation of this bug as best I remember it.  This
is going to be quite long because it involves describing the history
and the current APIs.

History lessons
---------------

FUSE has several levels of API.  Although demo/simple programs use the
high level API, libguestfs and any other serious FUSE user are going to
use the medium or low level APIs because those offer a lot more control.
The way we use the medium level API is that we call [ignoring error
handling etc]:

  ch = fuse_mount (the_mount_point, &some_options);
  f = fuse_new (ch, &some_options, &operations, sizeof operations, opaque);
  fuse_loop (f);

The important point here is that fuse_mount calls into the kernel to
create the mountpoint, fuse_new (I believe) just creates a userspace
object, and fuse_loop is where filesystem requests get processed.

In other words, there is a gap between where the mountpoint is
created in the kernel, and when filesystem operations (requests) can
be processed by that same mountpoint.

If some other userspace process comes along in that time, say reading
the mountpoint, the userspace process is blocked until request
processing begins.  There's no race condition here.  As long as
fuse_loop is called, sooner or later processes accessing the
mountpoint will get unblocked and be able to complete filesystem
operations on the new mountpoint.

Normally the kernel does not perform any filesystem operations on the
mountpoint during the mount.  If it did, that would cause a deadlock
because fuse_mount would never be able to return, and so fuse_loop
would never be entered, and so requests would never start to be
processed.

Unfortunately, with SELinux this changes.  SELinux (for perfectly good
reasons) has to determine the SELinux label of the mountpoint.  This
is stored in the FUSE filesystem, so SELinux has to issue a getxattr
call to get that.  It has to do it during the mount, since otherwise
it is possible that a process could jump in just before SELinux has
worked out the label and run some operation against the unlabelled
filesystem (ie. it could be a security problem).  So SELinux issues
this getxattr call during the mount (fuse_mount), resulting in the
deadlock described in the previous paragraph.

Of course it isn't going to work if SELinux starts unconditionally
doing getxattr calls on FUSE filesystems.  FUSE filesystems which
follow the long-established FUSE API would all deadlock.  Therefore
SELinux has to distinguish between what we might call "traditional
FUSE API filesystems" and "FUSE filesystems that are able to handle
xattr during mount".  It does this by having two labels which I
believe are:

  fs_noxattr_type(fusefs_t)  (traditional API)
  fs_type(fusefs_t)          (can handle xattr during mount)

[https://bugzilla.redhat.com/show_bug.cgi?id=812798#c17]

So how do we actually handle xattr during mount?
------------------------------------------------

Well it's not easy.  Jeff Darcy actually explains it better than I
could, so go and read that instead:

https://bugzilla.redhat.com/show_bug.cgi?id=811217#c4

Basically it involves having multiple threads or processes, opening
/dev/fuse explicitly (which I believe is even lower level than the
low-level FUSE API -- we might need to patch libfuse), and passing the
fd between threads.

More history lessons: the libguestfs API
----------------------------------------

Originally FUSE was implemented in a separate program (guestmount).

However we realized soon after that FUSE functionality was pretty
useful for all libguestfs API users, and for that reason we
reimplemented FUSE support as a libguestfs API:

http://libguestfs.org/guestfs.3.html#mount-local

guestmount is now just a thin wrapper that does command-line parsing
and calls into this API.  But you don't need to use guestmount, you
can call the API directly as in this example:

http://libguestfs.org/guestfs-examples.3.html#example:-the-mount-local-api

The API is basically split into two parts:

 guestfs_mount_local() calls:

  ch = fuse_mount (the_mount_point, &some_options);
  f = fuse_new (ch, &some_options, &operations, sizeof operations, g);

 guestfs_mount_local_run() calls:

  fuse_loop (f);

(cf. traditional FUSE API as described above)

The libguestfs API "ignores" threads: Callers have to promise not to
reuse the same guestfs_h* handle in two threads at the same time.  The
mount-local part of the libguestfs API also ignores threads [to some
extent, this is not the whole truth].

So how do we do this in libguestfs?
-----------------------------------

The current mount-local API model simply does not work for this case.
That means we need a new API to handle it.

What exactly this new API looks like is not currently very clear to
me.  First of all we'd need to write a standalone FUSE program which
works right with SELinux (or examine glusterfs very carefully).  That
would give us an idea of what the shape of the new API might be.

Threads are going to be an issue here.

Also backwards compatibility is going to be an issue.  We absolutely
cannot break existing mount-local API users.

Comment 3 Richard W.M. Jones 2014-02-08 13:40:15 UTC
A few more thoughts about this, mainly notes to self ...

(1) You can set (eg) user xattrs via FUSE at the moment.  For example
in a guestmount-ed disk:

$ setfattr home -n user.test -v system_u:object_r:home_root_t:s0
$ getfattr -d -m ^user home
# file: home
user.test="system_u:object_r:home_root_t:s0"

I checked with guestmount and the lsetxattr call is passed through
to libguestfs:

libguestfs: trace: lsetxattr "user.test" "system_u:object_r:home_root_t:s0" 32 "/home"
libguestfs: trace: lsetxattr = 0
libguestfs: trace: lgetxattrs "/home"
libguestfs: trace: lgetxattrs = <struct guestfs_xattr_list *>

(2) However the security.selinux attribute is handled specially by
some layer in the host kernel.  Writes are not permitted:

$ setfattr home -n security.selinux -v system_u:object_r:home_root_t:s0
setfattr: home: Operation not supported

And reads always return a fixed value:

$ getfattr -d -m ^security home
# file: home
security.selinux="system_u:object_r:fusefs_t:s0"

I checked in guestmount, and libguestfs does not even see a lgetxattr
call in this case.

(3) If we get SELinux labels working over FUSE, it's not clear to
me what will happen if you label a guest file with a label which is
not known by the host SELinux policy.  (Say for example you need to
label a RHEL 6 guest, using a Fedora 20 host).

It may be that setting the security.selinux attribute can be made
to work (ie. using lsetxattr, but not setfilecon).

Comment 4 Colin Walters 2014-02-08 17:34:59 UTC
I reread the bug history here, and realized that I could make it work by entirely disabling SELinux on the build server - to make it work on the client side.  Kind of ironic, but it's OK for now.

That de-escalates the priority of this bug a lot for me - I have more work to do to be sure updates work right wrt. SELinux on the client side, which is more important.

For (3) - it should work as long as the writing process has "mac_admin".

Comment 5 Colin Walters 2014-05-23 20:53:17 UTC
An alternative workaround for this would be to avoid FUSE, and have programs directly call into the libguestfs API to set the xattrs.   That would be *really* painful to do though for OSTree, because it's heavily oriented around writing to the raw filesystem APIs.

It might be possible to split the writes so that "normal" stuff goes via FUSE, but all xattrs are done in a second pass where we unmount the FUSE mount, then use the libguestfs API just for xattrs.

Comment 6 Richard W.M. Jones 2014-05-24 06:39:29 UTC
(In reply to Colin Walters from comment #5)
> An alternative workaround for this would be to avoid FUSE, and have programs
> directly call into the libguestfs API to set the xattrs.   That would be
> *really* painful to do though for OSTree, because it's heavily oriented
> around writing to the raw filesystem APIs.
> 
> It might be possible to split the writes so that "normal" stuff goes via
> FUSE, but all xattrs are done in a second pass where we unmount the FUSE
> mount, then use the libguestfs API just for xattrs.

A couple of things are happening for 1.28, being driven by ptoscano:

- There will be a "relabel this filesystem" API call.  If you have
  an SELinux policy in the guest, then there will just be a single
  call you have to make to relabel the whole guest filesystem.

- We're going to make the whole API thread-safe, which means we
  can implement a multi-threaded guestmount which implements
  SELinux labels (note that my reservations in comment 3 about
  whether SELinux will allow this to work across different guest/host
  policies may still apply).

Comment 7 Colin Walters 2014-05-24 13:58:33 UTC
(In reply to Richard W.M. Jones from comment #6)

> - There will be a "relabel this filesystem" API call.  If you have
>   an SELinux policy in the guest, then there will just be a single
>   call you have to make to relabel the whole guest filesystem.

That will likely work for "mainline", but it's unlikely to work for OSTree-based installs.  In the OSTree model there is more than one OS in a physical storage - potentially many.  Each with a potentially different SELinux policy.

So what I am currently doing is using the "default deployment" (ie the first in the boot order) to relabel the disk and itself:

https://git.gnome.org/browse/ostree/commit/?id=e11de9357cea643b45a2e5e3f94d33dbd84d9ca3

Unfortunately the OSTree model invalidates all of the "high level" virt-* tools that are expecting to find exactly one operating system in the physical /.  Another good example of this is I can't use the "-i" option to guestmount because there's no /etc/fstab - that's really /ostree/deploy/fedora-atomic/deploy/$deployment/etc/fstab for a given $deployment.

That's a conversation to have somewhere else though...

> - We're going to make the whole API thread-safe, which means we
>   can implement a multi-threaded guestmount which implements
>   SELinux labels (note that my reservations in comment 3 about
>   whether SELinux will allow this to work across different guest/host
>   policies may still apply).

I think so - assuming a process has self:capability { mac_admin } it can lay down security.selinux values unknown to the system policy.  Likewise if we call the raw getxattr() I believe we should see the untranslated value, even if the label isn't known to the system.

Comment 8 Richard W.M. Jones 2014-05-24 15:03:21 UTC
There are several issues here and I think it's best to discuss
the design on the mailing list.  However my brief thoughts
are in-line below.

(In reply to Colin Walters from comment #7)
> (In reply to Richard W.M. Jones from comment #6)
> 
> > - There will be a "relabel this filesystem" API call.  If you have
> >   an SELinux policy in the guest, then there will just be a single
> >   call you have to make to relabel the whole guest filesystem.
> 
> That will likely work for "mainline", but it's unlikely to work for
> OSTree-based installs.

The design is by no means set in stone, and so we should discuss
what your requirements are and make it so that it works for the
existing user [virt-customize/virt-builder] and your use-case too.
If we need to have multiple APIs then we can do that too.

I will start a thread and CC you & Pino.

> In the OSTree model there is more than one OS in a
> physical storage - potentially many.  Each with a potentially different
> SELinux policy.
> 
> So what I am currently doing is using the "default deployment" (ie the first
> in the boot order) to relabel the disk and itself:
> 
> https://git.gnome.org/browse/ostree/commit/?id=e11de9357cea643b45a2e5e3f94d33dbd84d9ca3
> 
> Unfortunately the OSTree model invalidates all of the "high level" virt-*
> tools that are expecting to find exactly one operating system in the
> physical /.  Another good example of this is I can't use the "-i" option to
> guestmount because there's no /etc/fstab - that's really
> /ostree/deploy/fedora-atomic/deploy/$deployment/etc/fstab for a given
> $deployment.

Indeed, and another thing that could be fixed.  Note that we've
already been through this with btrfs -- libguestfs can now (often)
recognize multiple btrfs snapshots as different operating systems.

Although the '-i' option still won't work as it currently requires
a single root.  (Could also be fixed ..)

> > - We're going to make the whole API thread-safe, which means we
> >   can implement a multi-threaded guestmount which implements
> >   SELinux labels (note that my reservations in comment 3 about
> >   whether SELinux will allow this to work across different guest/host
> >   policies may still apply).
> 
> I think so - assuming a process has self:capability { mac_admin } it can lay
> down security.selinux values unknown to the system policy.  Likewise if we
> call the raw getxattr() I believe we should see the untranslated value, even
> if the label isn't known to the system.

OK that's hopeful.

Comment 9 Eric Paris 2014-05-24 15:24:09 UTC
(In reply to Colin Walters from comment #7)
> (In reply to Richard W.M. Jones from comment #6)

> I think so - assuming a process has self:capability { mac_admin } it can lay
> down security.selinux values unknown to the system policy.

Correct

> Likewise if we
> call the raw getxattr() I believe we should see the untranslated value, even
> if the label isn't known to the system.

Sorta correct.  If you have mac admin (both in DAC and SELinux) you will see the unknown label.  If you don't, you will see unlabeled_t no matter what userspace interface you use.

Comment 10 Richard W.M. Jones 2014-05-24 15:25:54 UTC
Let's move discussion of the SELinux relabelling API to this thread:
https://www.redhat.com/archives/libguestfs/2014-May/msg00094.html

