This bug has been migrated to another issue tracking site. It has been closed here and may no longer be monitored.

If you would like to get updates for this issue, or to participate in it, you may do so at the Red Hat Issue Tracker.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets there.

Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against the components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED".

If you cannot log in to RH Jira, please consult article #7032570. Failing that, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat.

Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and have "MigratedToJIRA" set in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of the form "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). The same link will be available in a blue banner at the top of the page informing you that the bug has been migrated.
Bug 2034630 - [libvirt] Add support to run virtiofsd inside a user namespace (unprivileged)
Summary: [libvirt] Add support to run virtiofsd inside a user namespace (unprivileged)
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: libvirt
Version: 9.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Ján Tomko
QA Contact: Lili Zhu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-12-21 14:49 UTC by Vivek Goyal
Modified: 2023-11-28 12:39 UTC (History)
19 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-09-22 16:35:56 UTC
Type: Feature Request
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker CNV-27131 0 None None None 2023-08-15 10:47:32 UTC
Red Hat Issue Tracker   RHEL-7386 0 None Migrated None 2023-11-28 12:39:53 UTC
Red Hat Issue Tracker RHELPLAN-106361 0 None None None 2021-12-21 14:51:56 UTC

Description Vivek Goyal 2021-12-21 14:49:44 UTC
Description of problem:

virtiofsd (Rust) now has the capability to run unprivileged (inside a user namespace). The next step is to figure out how to integrate this capability into libvirt and where it will make the most sense.

Thanks to German, who configured, tested, and provided instructions on how to run virtiofsd unprivileged.

https://listman.redhat.com/archives/virtio-fs/2021-December/msg00054.html

One big advantage of running virtiofsd as non-root is that system security goes up. A guest cannot drop setuid-root binaries on the host and somehow manage to execute them, and there are more examples like this. So if we can run virtiofsd unprivileged, I think that's a huge win in terms of system security w.r.t. virtiofsd.

One limitation of running unprivileged is that one cannot create block or char device nodes; that's not allowed.

Hence opening this bug to figure out how this functionality can be integrated into libvirt so that users can enable this unprivileged/rootless mode.

I think if users are running unprivileged VMs, it will make sense to run virtiofsd unprivileged as well. Even in the normal case, I think qemu runs as the "qemu" user, so it will probably make sense not to run virtiofsd as root but as the "qemu" user instead.

There will probably be a few dependencies:

- We need to allocate a range of uids/gids to the qemu user.

- We need to select a range dynamically to set up a user namespace, maybe a
  range of 64K for each virtiofsd instance (see the sketch after this list).
  If a filesystem is being shared by multiple VMs, they will have to use the
  same uid/gid range.

- Depending on the use case, the shared directory will either have to be
  chowned according to the uid/gid range of the user namespace, or one will
  have to create an idmapped mount mapping. Basic idmapped mount support is
  now upstream, so that's not a technology barrier anymore.


Opening this bug to carry out the conversation about how libvirt and its users can benefit from this new unprivileged virtiofsd mode and how to integrate it up the stack.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Vivek Goyal 2021-12-21 15:40:51 UTC
Thinking more about idmapped mounts: we probably don't want to use them in the beginning, because they will shift the unprivileged id back to root. IOW, if the user creates a file as uid "1000", it might be saved as uid "0" on disk, and if one accesses it through a non-idmapped mount, it will be visible as a root-owned file.

IOW, this takes us back to the risk of the VM user dropping setuid-root binaries and somehow arranging for an unprivileged entity on the host to execute them.

I guess in the simplest form we first need to wire it up without idmapped mounts and suggest a chown of the shared dir as needed.

Comment 2 Vivek Goyal 2021-12-21 20:45:44 UTC
In the simplest form, we can probably run all virtiofsd instances using the same uid/gid range. This still makes sure there are no real root-owned files in the shared dir.

And if all VMs are running as user "qemu" and different qemu processes don't have any uid/gid-based isolation between them, then it is probably OK to run virtiofsd as "qemu" acting as pseudo-root, with all processes using the same uid/gid mapping.

It simplifies setup. One downside is that one virtiofsd instance can attack another if it manages to break out of its sandbox.

Comment 3 Martin Kletzander 2022-01-21 09:44:30 UTC
(In reply to Vivek Goyal from comment #2)
Maybe an even simpler option would be just running the virtiofsd instance under the same privileges as qemu.  That could be a step in the right direction in order to figure out what's needed next.  It would already secure some bits and, I would imagine, give us enough of a headache to make sure it works properly.

Comment 4 Vivek Goyal 2022-01-21 14:46:52 UTC
(In reply to Martin Kletzander from comment #3)
> (In reply to Vivek Goyal from comment #2)
> Maybe even simpler one would be just running the virtiofsd instance under
> the same privileges as qemu.  That could be a step in the right direction in
> order to figure out what's needed next.  It would already secure some bits
> and I would imagine give us enough of a headache to make sure it works
> properly.

IIUC, you are suggesting that we run virtiofsd as the "qemu" user without user namespaces?

If yes, that will not go very far, because we lose the ability to switch between arbitrary user ids as needed by the guest in many cases (like Kata Containers).

Anyway, with user namespaces that's what we are effectively doing: the qemu user will set up a user namespace and become root inside it. That way it is practically running with the privileges of qemu on the host, but inside the namespace it runs as root and is able to switch between the uids/gids visible inside the user namespace.

Comment 5 Martin Kletzander 2022-01-24 08:17:34 UTC
(In reply to Vivek Goyal from comment #4)
Oh, I see.  Would that require all the user IDs to be mapped to IDs in the user namespace?  Wouldn't it be the same as just dropping the capabilities for virtiofsd running as root (without user namespaces)?

Comment 6 Vivek Goyal 2022-01-24 14:03:10 UTC
(In reply to Martin Kletzander from comment #5)
> (In reply to Vivek Goyal from comment #4)
> Oh, I see.  Would that require all the user IDs being mapped to the IDs in
> the user namespace?  Wouldn't it be the same like just dropping the
> capabilities for virtiofsd running as root (without user namespaces)?

Yes, setting up a user namespace will require assigning some subuids/subgids to the "qemu" user and then mapping those subuids/subgids into the newly launched user namespace.

In theory you can launch a new user namespace by just mapping the "qemu" user to "root" inside the user namespace and no other mappings. But in that case virtiofsd will fail to switch to other uids/gids and the guest will see all of these errors.

So to support filesystem semantics with multiple uids/gids and switching between them, we will need to create a user namespace with the qemu user's subuids/subgids mapped into it.

I think one open question here is what range of subuids/subgids the qemu user can use so that it does not conflict with other use cases. Right now I don't see any sort of fixed allocation of subuids/subgids per user, so if we arbitrarily choose a range, it can conflict with other use cases.

I am copying Dan Walsh. He might know if something has happened in this area.

Until then, I guess we can ask the system admin to allocate a range of subuids/subgids to the qemu user, which libvirt will make use of. This involves a manual step on the sysadmin's part, which is not good, and we will need to get rid of it somehow later. But that's not necessarily a libvirt problem; it's more of a subuid/subgid allocation problem in the system/cluster.
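
For example, such a manual allocation step could look roughly like this (the 100000-165535 range and the "qemu" user are only illustrative; usermod's --add-subuids/--add-subgids options exist in recent shadow-utils, but check your version):

  # hypothetical one-time step by the sysadmin:
  usermod --add-subuids 100000-165535 --add-subgids 100000-165535 qemu

  # resulting entries (name:first_id:count):
  #   /etc/subuid: qemu:100000:65536
  #   /etc/subgid: qemu:100000:65536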

Comment 7 Vivek Goyal 2022-01-24 14:05:27 UTC
Dan, has there been any progress on the issue of how to allocate subuids/subgids for a user in an automated way (without conflicting with other users)? We might need a chunk of subuid/subgid allocation for the "qemu" user.

Comment 8 Daniel Walsh 2022-01-24 14:12:51 UTC
In Podman we suggest that users allocate the top 2 billion UIDs as a range in /etc/subuid and /etc/subgid under the user name "containers".

podman run --userns=auto

This grabs unused (by it) 64K blocks from that range to assign to its containers.  I would suggest that libvirt do the same. It would be nice if there was some way to collaborate, but right now the only standard is the /etc/subuid and /etc/subgid files.

With the support of libsubid, these files can now come from the network, which might make this easier or more difficult.

Comment 9 Vivek Goyal 2022-01-24 14:51:14 UTC
(In reply to Daniel Walsh from comment #8)
> In Podman we suggest that users allocate the top 2 billion UIDs for the
> range in /etc/subuid and /etc/subgid with the user name containers.
> 
> podman run --userns=auto
> 
> Grabs unused (By it) 64k Blocks from that range to assign to its containers.
> I would suggest that libvirt do the same. It would be nice if there was some
> way to collaborate, but
> right now the only standard is the /etc/subuid and /etc/subgid files.
> 
> With the support of libsubid, now these files can come from the internet,
> which might make this more easier or more difficult.


Hmmm... so the uid/gid range is probably 32-bit, which is around 4 billion uids. The top 2 billion are gone to Podman, so maybe the next range can be picked by libvirt.

Libvirt might not have to grab as big a range, since the number of VMs launched will probably not be as high as for native containers.

So if a 64K range is given to each VM, and say one launches at most 512 or 1024 VMs on the host, then roughly 32 million (512 x 65536) or 64 million (1024 x 65536) subuids/subgids could be reserved for the qemu user.

Anyway, it looks like for now reserving subuids/subgids will be done by the sysadmin, and libvirt will just need to either use a user-specified subuid/subgid range or automatically pick a subrange.

We should probably start simple, and that is to allow the user to specify the subuid/subgid range in the virtiofsd configuration. libvirt will just need to verify that the user has ownership of these subuids/subgids before mapping them.

Automatic selection of subuid/subgid ranges should probably be the next step.

Comment 10 Ján Tomko 2022-02-04 15:36:14 UTC
I played around with 'unshare' and the C impl of virtiofsd (qemu-common-6.2.0-2.fc34.x86_64):

* fv_socket_lock tries to mkdir /var/run/virtiofsd to lock the pathname; I would not
expect it to do that for unprivileged users
* even the chroot sandbox requires capabilities (I have not investigated further what
caps are required by the rest of virtiofsd)

Libvirt already does uid_map for libvirt_lxc containers:
https://libvirt.org/formatdomain.html#container-boot
so it should be possible to reuse some of that code.
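
For reference, the <idmap> element documented on that page looks roughly like this (values are the example from the docs; how exactly this would be attached to a virtiofs <filesystem> is still an open design question):

<idmap>
  <uid start='0' target='1000' count='10'/>
  <gid start='0' target='1000' count='10'/>
</idmap>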

Asking the libvirt user to provide the uidmap should be easily doable.

Automatic assignment for the 'qemu' user should be theoretically possible in libvirtd alone since
no other program needs to reserve subuids so far.

But to make it usable for regular users with unprivileged libvirt, some sort of coordination
is needed, and virtiofsd can't require all the capabilities it has now (but that might be out of scope of this bug).

Comment 11 Vivek Goyal 2022-02-04 16:51:29 UTC
(In reply to Ján Tomko from comment #10)
> I played around with 'unshare' and the C impl of virtiofsd
> (qemu-common-6.2.0-2.fc34.x86_64):
> 

The current C version of virtiofsd does not run well inside user namespaces. I think somebody tried it and it did not work; we never put in the effort to investigate and make it work.

Rust virtiofsd works inside user namespaces, and given that's the future, we are not planning to add user namespace support to the C version of virtiofsd.

So any testing we do with user namespaces needs to be done with Rust virtiofsd.

https://gitlab.com/virtio-fs/virtiofsd

> * fv_socket_lock tries to mkdir /var/run/virtiofsd to lock the pathname,
> I would not expect it to do that for unprivileged users

Agreed. Where are we supposed to store the local state for an unprivileged user? We
use qemu_get_local_state_pathname(). I assumed that it would automatically
give some path, say in $HOME/.local/, and then we could save the pid file in,
say, $HOME/.local/virtiofsd/<pidfile>.

I am not sure what the qemu convention is for unprivileged users.

> * even the chroot sandbox requires capabilities (I have not investigated
> further what caps
> are required by the rest of virtiofsd)

The Rust version seems to work reasonably well. We can't enable certain features like
--inode-file-handles. Down the line we will not be able to use SELinux either, as
that needs to set trusted xattrs and requires CAP_SYS_ADMIN.

IOW, we expect that not all features will work when running unprivileged. So at
the end of the day it will be a trade-off between security and functionality/feature-richness.

> 
> Libvirt already does uid_map for libvirt_lxc containers:
> https://libvirt.org/formatdomain.html#container-boot
> so it should be possible to reuse some of that code.

Nice. I am assuming you are referring to <idmap>. This looks good.

> 
> Asking the libvirt user to provide the uidmap should be easily doable.
> 
> Automatic assignment for the 'qemu' user should be theoretically possible in
> libvirtd alone since
> no other program needs to reserve subuids so far.

Sounds reasonable. I think Dan was mentioning that there are upstream changes which
automatically allocate a small subuid/subgid range to a user upon creation. I am not
sure about the details, though.

> 
> But to make it usable for regular users with unprivileged libvirt, some sort
> of coordination
> is needed and virtiofsd can't require all the capabilities it has now (but
> that might be out of scope of this bug)

Comment 12 Klaus Heinrich Kiwi 2022-05-25 15:26:37 UTC
Can we get an update? Has this RFE been planned for? How should we prioritize it?

Comment 13 Klaus Heinrich Kiwi 2022-08-15 17:20:56 UTC
(In reply to Klaus Heinrich Kiwi from comment #12)
> Can we get an update? Has this RFE been planned for? How should we
> prioritize it?

Assigning medium priority/severity according to my understanding of the issue. It looks like Ján targeted it for RHEL 9.2.0, which is good. Thanks.

Comment 14 Daniel Walsh 2022-08-15 17:37:52 UTC
Containers are just allocating the bottom half of all UIDs in the /etc/subuid and /etc/subgid files.

containers:2147483647:2147483648

There is no way I know of yet to coordinate this other than those two files.
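
For readers unfamiliar with the format: each line in /etc/subuid and /etc/subgid is name:first_id:count, so the entry above gives the "containers" name 2147483648 subordinate IDs starting at 2147483647. A hypothetical entry for the qemu user discussed earlier in this bug would look like:

  qemu:100000:65536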

Comment 16 German Maglione 2023-03-16 14:01:50 UTC
Upstream we have a few requests for this feature; GNOME Boxes is also waiting for it[0] to be able to support virtio-fs.

[0] https://gitlab.gnome.org/GNOME/gnome-boxes/-/issues/292

Comment 20 Dan Kenigsberg 2023-08-15 10:47:32 UTC
May I increase the severity of this bug, as it blocks consumption of virtiofs by KubeVirt and OpenShift Virtualization?

Comment 21 Vivek Goyal 2023-09-11 12:09:24 UTC
(In reply to Dan Kenigsberg from comment #20)
> May I increase the severity of this bug, as it blocks consumption of
> virtiofs by KubeVirt and OpenShift Virtualziation.

I am wondering what the plan is w.r.t. usage of user namespaces in KubeVirt.

If KubeVirt decides to use user namespaces and launch all the pods in user namespaces of their own, then virtiofsd can probably run in the same user namespace (and not set up one of its own separately). I think that will be a simpler model than virtiofsd sandboxing itself using a user namespace.

Does KubeVirt have any plans to start making use of user namespaces for its pods?

Comment 23 Ján Tomko 2023-09-11 13:54:53 UTC
v1 proposed upstream:
https://listman.redhat.com/archives/libvir-list/2023-September/242012.html

Comment 24 German Maglione 2023-09-11 14:23:47 UTC
(In reply to Ján Tomko from comment #23)
> v1 proposed upstream:
> https://listman.redhat.com/archives/libvir-list/2023-September/242012.html

Is the idea to create a user namespace and then launch virtiofsd, or will libvirt pass the id mapping to virtiofsd's --uid-map/--gid-map parameters?

Comment 25 German Maglione 2023-09-11 14:34:42 UTC
(In reply to Ján Tomko from comment #23)
> v1 proposed upstream:
> https://listman.redhat.com/archives/libvir-list/2023-September/242012.html

I'd suggest not using virtiofsd's --uid-map/--gid-map parameters; in the future we want to move all the sandboxing code to an external tool (but we can discuss it).

Comment 27 Ján Tomko 2023-09-12 09:30:44 UTC
(In reply to German Maglione from comment #25)
> (In reply to Ján Tomko from comment #23)
> > v1 proposed upstream:
> > https://listman.redhat.com/archives/libvir-list/2023-September/242012.html
> 
> I'll suggest not using virtiofsd's --uid-map/--gid-map parameters, in the
> future we want to remove all the sandboxing code to an external tool (but we
> can discuss it)

The version (or rather, draft) I wrote uses virtiofsd's --uid-map. So far,
the user needs to allocate and specify the uids themselves.

It also allows running as non-root without ID mapping (I hadn't noticed when
virtiofsd started to support that).


As for sandboxing, having libvirt create a user namespace for virtiofsd seems
reasonable to me (but unnecessary for KubeVirt, if they create their own
namespace for virtiofsd). But I can't imagine how libvirt would set up seccomp
or separate capabilities for different threads of virtiofsd, so from libvirt's
point of view this new tool would become a wrapper for virtiofsd (preferably
libvirt could find out about it from the virtiofsd.json file).
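
To make the --uid-map approach concrete, an invocation could look roughly like this (a sketch only: the binary path, socket path, shared directory, and the 100000/65536 range are made-up values, and the exact option syntax should be checked against the installed virtiofsd's --help):

  # hypothetical: map guest uids/gids 0-65535 to host uids/gids 100000-165535
  /usr/libexec/virtiofsd \
      --socket-path=/tmp/vhostqemu.sock \
      --shared-dir=/srv/shared-dir \
      --uid-map=:0:100000:65536: \
      --gid-map=:0:100000:65536: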

Comment 28 German Maglione 2023-09-12 09:49:56 UTC
(In reply to Ján Tomko from comment #27)
> (In reply to German Maglione from comment #25)
> > (In reply to Ján Tomko from comment #23)
> > > v1 proposed upstream:
> > > https://listman.redhat.com/archives/libvir-list/2023-September/242012.html
> > 
> > I'll suggest not using virtiofsd's --uid-map/--gid-map parameters, in the
> > future we want to remove all the sandboxing code to an external tool (but we
> > can discuss it)
> 
> The version (or rather - draft) I wrote uses virtiofsd's --uid-map. So far,
> the user needs to allocate and specify the uids themself.
> 
> It also allows running as non-root without ID mapping (I haven't noticed when
> virtiofsd started to support that).

recently, since version 1.6.0

> 
> 
> As for sandboxing, having libvirt create a user namespace for virtiofsd seems
> reasonable to me (but unnecessary for the KubeVirt, if they create their own

I agree with you; I don't think libvirt user NS support is necessary for KubeVirt.

> namespace for virtiofsd). But I can't imagine how libvirt would set up
> seccomp
> or separate capabilities for different threads of virtiofsd, so from
> libvirt's
> point of view this new tool would become a wrapper for virtiofsd (preferably
> it
> could find out about it from the virtiofsd.json file)

(I probably use the term sandboxing too broadly.)
The idea is to create a user namespace and launch virtiofsd as "fake" root inside it,
with --sandbox=chroot or --sandbox=none. virtiofsd will drop capabilities and set the
seccomp filters, so from virtiofsd's point of view it will be running as root.

But if you think virtiofsd should be the one creating the user namespace instead of libvirt,
we can work on the required changes when we move the sandboxing code to an external
tool (that will not be soon, btw).
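
For illustration, the model described above could be approximated from a shell roughly like this (a sketch only: it assumes a recent util-linux whose unshare supports --map-users/--map-groups, that the invoking user has matching subuid/subgid ranges so newuidmap/newgidmap can apply them, and the paths and the 100000/65536 range are made-up values):

  # hypothetical: enter a user namespace as "fake" root with a 64K id mapping,
  # then start virtiofsd and let it drop capabilities and set seccomp itself
  unshare --user --map-users=0:100000:65536 --map-groups=0:100000:65536 -- \
      /usr/libexec/virtiofsd \
          --socket-path=/tmp/vhostqemu.sock \
          --shared-dir=/srv/shared-dir \
          --sandbox=chroot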

Comment 29 Dan Kenigsberg 2023-09-12 11:14:27 UTC
(In reply to Vivek Goyal from comment #21)
> (In reply to Dan Kenigsberg from comment #20)
> > May I increase the severity of this bug, as it blocks consumption of
> > virtiofs by KubeVirt and OpenShift Virtualziation.
> 
> I am wondering what's the plan w.r.t usage of user namespaces in Kubevirt.
> 
> If kubevirt decides to use user namespaces and launch all the pods in a user
> namepsace of its own, then virtiofsd probably can run in
> same user namespace (and not setup one of its own separately). I think that
> will be a simpler model instead of virtiofsd sandboxing
> itself using a user namespace.
> 
> Does kubevirt have any plans to start making use of user namespaces for its
> pods?

I may well have misunderstood the nature of this bug. I raised attention to it only because it is the only RHEL bug marked as blocking https://issues.redhat.com/browse/CNV-27131 . If this is not correct, please fix the dependency. I suspect that I should have asked for higher priority on https://issues.redhat.com/browse/RHELPLAN-165875 instead.

Comment 30 RHEL Program Management 2023-09-22 16:35:32 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 31 RHEL Program Management 2023-09-22 16:35:56 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to the Jira issue's "Watchers" field to continue receiving updates, and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.

