Bug 1854595 - virtiofsd sandboxing requires CAP_SYS_ADMIN and does not run in a container
Summary: virtiofsd sandboxing requires CAP_SYS_ADMIN and does not run in a container
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 8.3
Assignee: Dr. David Alan Gilbert
QA Contact: xiagao
URL:
Whiteboard:
Depends On:
Blocks: 1860491
 
Reported: 2020-07-07 18:13 UTC by Cole Robinson
Modified: 2021-12-17 06:39 UTC (History)
CC List: 12 users

Fixed In Version: 5.2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-12-17 06:39:20 UTC
Type: Feature Request
Target Upstream Version:
Embargoed:



Description Cole Robinson 2020-07-07 18:13:03 UTC
kubevirt has an unmerged PR adding virtiofs support: https://github.com/kubevirt/kubevirt/pull/3493

However virtio-fs requires granting the VM pod CAP_SYS_ADMIN which is root equivalent, so it doesn't fit kubevirt's containment model.

It was mentioned to me that this requirement comes from virtiofsd's sandboxing via unshare() but I didn't confirm this myself, though `man 2 unshare` backs it up: https://man7.org/linux/man-pages/man2/unshare.2.html

Comment 1 Roman Mohr 2020-07-08 07:56:01 UTC
I suppose that the mount namespace is unshared? Let me add that for the kubevirt use case this does not provide any additional security and is therefore not needed. The pods where the VMs are running already have a user-owned mount namespace, and all data in it is already owned by the user.

Comment 3 Michal Privoznik 2020-07-09 07:37:39 UTC
When virtiofsd starts, it tries to constrain itself by calling:

    if (unshare(CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWNET) != 0) {
        fuse_log(FUSE_LOG_ERR, "unshare(CLONE_NEWPID | CLONE_NEWNS): %m\n");
        exit(1);
    }

All three namespaces it tries to create require CAP_SYS_ADMIN. However, I'm not sure if we can remove it, because later virtiofsd sets up some extra mount points which it doesn't want to propagate into the parent namespace. Anyway, I'll leave this for somebody from the QEMU team.
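
For illustration, here is a minimal sketch (not the actual virtiofsd source) of this kind of namespace-plus-pivot_root(2) confinement; the shared_dir path and the sandbox() helper are hypothetical, and every step shown requires CAP_SYS_ADMIN:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mount.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Rough sketch only; the real daemon also forks so that a child
     * actually lands in the new PID namespace. */
    static void sandbox(const char *shared_dir)
    {
        /* Creating each of these namespaces needs CAP_SYS_ADMIN. */
        if (unshare(CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWNET) != 0) {
            perror("unshare");
            exit(1);
        }

        /* Keep later mount changes out of the parent namespace. */
        if (mount(NULL, "/", NULL, MS_REC | MS_SLAVE, NULL) != 0) {
            perror("mount MS_SLAVE");
            exit(1);
        }

        /* Turn the shared directory into a mount point... */
        if (mount(shared_dir, shared_dir, NULL, MS_BIND | MS_REC, NULL) != 0) {
            perror("bind mount");
            exit(1);
        }

        /* ...and make it the new root so path lookups cannot escape it. */
        if (chdir(shared_dir) != 0 ||
            syscall(SYS_pivot_root, ".", ".") != 0 ||
            umount2(".", MNT_DETACH) != 0) {
            perror("pivot_root");
            exit(1);
        }
    }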

Comment 4 Roman Mohr 2020-07-09 07:41:07 UTC
(In reply to Michal Privoznik from comment #3)
> When virtiofsd starts, it tries to constrain itself by calling:
> 
>     if (unshare(CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWNET) != 0) {
>         fuse_log(FUSE_LOG_ERR, "unshare(CLONE_NEWPID | CLONE_NEWNS): %m\n");
>         exit(1);
>     }
> 
> All three namespaces it tries to create require CAP_SYS_ADMIN. However, I'm
> not sure if we can remove it, because later the virtiofsd sets up some extra
> mount points which it doesn't want to propagate into the parent namespace.
> Anyway, I'll leave this for somebody from QEMU team.

I think in our case it also cannot assume that it can do mounts. It is already in a restricted environment and will not be able to create mounts, but it also does not have to.

Comment 5 Roman Mohr 2020-07-10 07:16:43 UTC
By the way, in our case it would make a lot of sense to also run virtiofsd standalone in a separate container  and only share the socket to the container where libvirt and qemu are running. Would it make sense to allow libvirt to connect qemu to a prestarted virtiofsd? While I don't think it is too critical from a security perspective, as written before, it would harden the virtiofsd setup a little bit more if it does not do its unsharing. If we can run it in a separate container it would run in completely separate namespaces, except for the networking namespace.

Comment 6 Cole Robinson 2020-07-10 13:55:27 UTC
(In reply to Roman Mohr from comment #5)
> By the way, in our case it would make a lot of sense to also run virtiofsd
> standalone in a separate container  and only share the socket to the
> container where libvirt and qemu are running. Would it make sense to allow
> libvirt to connect qemu to a prestarted virtiofsd? While I don't think it is
> too critical from a security perspective, as written before, it would harden
> the virtiofsd setup a little bit more if it does not do its unsharing. If we
> can run it in a separate container it would run in completely separate
> namespaces, except for the networking namespace.

For vhostuser net libvirt has XML like:

  <interface type='vhostuser'>
    <mac address='52:54:00:3b:83:1a'/>
    <source type='unix' path='/tmp/vhost1.sock' mode='server'/>
    <model type='virtio'/>
  </interface>

Maybe we can do the same for virtiofs. I filed a libvirt RFE and cc'd you:
https://bugzilla.redhat.com/show_bug.cgi?id=1855789

Comment 9 Dr. David Alan Gilbert 2020-07-13 14:50:43 UTC
This isn't that trivial a change; we've got one suggestion for changing virtiofsd to run unpriv, but we haven't figured out
what the permissions would look like.

Comment 10 Stefan Hajnoczi 2020-07-14 14:53:57 UTC
CLONE_NEWPID - not needed inside a virtiofsd container (there are no other processes). Can be dropped.
CLONE_NEWNS - currently needed because virtiofsd is designed to run in a mount namespace with the shared directory as / (and no other mounts). This is for security: it ensures path traversal cannot escape the shared directory.
CLONE_NEWNET - virtiofsd only needs the UNIX domain socket; no further network communication should be allowed. This can be dropped without impacting functionality, but security is affected.

Note that CAP_SYS_ADMIN is required for some extended attributes (the trusted. namespace that overlayfs uses and the security. namespace that SELinux uses). If CAP_SYS_ADMIN is removed then overlayfs will not work on the virtiofs mount inside the guest.
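
To make the xattr point concrete, a small illustrative example (not virtiofsd code; the file name is hypothetical) of what happens when a "trusted." xattr is written without CAP_SYS_ADMIN:

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/xattr.h>

    int main(void)
    {
        const char *value = "y";

        /* overlayfs uses xattrs such as "trusted.overlay.opaque"; writing
         * them requires CAP_SYS_ADMIN, so without that capability this
         * call fails with EPERM. */
        if (setxattr("shared-file", "trusted.overlay.opaque",
                     value, strlen(value), 0) != 0) {
            fprintf(stderr, "setxattr: %s\n", strerror(errno));
            return 1;
        }
        return 0;
    }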

Comment 11 Roman Mohr 2020-07-14 15:31:56 UTC
(In reply to Stefan Hajnoczi from comment #10)
> CLONE_NEWPID - not needed inside a virtiofsd container (there are no other
> processes). Can be dropped.
> CLONE_NEWNS - currently needed because virtiofsd is designed to run in a
> mount namespace with the shared directory as / (and no other mounts). This
> is for security: it ensures path traversal cannot escape the shared
> directory.

From a kubevirt perspective that is not needed. We can run virtiofsd in its own mount namespace. k8s will mount everything which we want to allow to be shared into that namespace.

> CLONE_NEWNET - virtiofsd only needs the UNIX domain socket, no further
> network communication should be allowed. This can be dropped without
> impacting functionality, but security is affected.

In k8s containers have most capabilities dropped by default. It would not have privileges like CAP_NET_ADMIN.
Any privileges left would not make it more harmful than an arbitrary http server (but maybe I miss something).

> 
> Note that CAP_SYS_ADMIN is required for some extended attributes (the
> trusted. namespace that overlayfs uses and the security. namespace that
> SELinux uses). If CAP_SYS_ADMIN is removed then overlayfs will not work on
> the virtiofs mount inside the guest.

There may be some increase in security from using overlayfs here, but I don't think that we can leverage that. By giving virtiofsd the CAP_SYS_ADMIN privilege in our case, we may make it more secure with regard to isolating different shares, but we may also make it more harmful for the whole node. Here too I may be missing something.

Comment 12 Stefan Hajnoczi 2020-07-15 09:01:36 UTC
(In reply to Roman Mohr from comment #11)
> (In reply to Stefan Hajnoczi from comment #10)
> > CLONE_NEWPID - not needed inside a virtiofsd container (there are no other
> > processes). Can be dropped.
> > CLONE_NEWNS - currently needed because virtiofsd is designed to run in a
> > mount namespace with the shared directory as / (and no other mounts). This
> > is for security: it ensures path traversal cannot escape the shared
> > directory.
> 
> From a kubevirt perspective that is not needed. We can run virtiofsd in its
> own mount namespace. k8s will mount everything into that namespace which we
> want to allow to be shared.

The shared directory cannot be / in the container because the init process is still executed from inside the container. I think the best k8s can do is:

  /
    bin/virtiofsd
    lib/...dependencies...
    shared-dir/
      ..shared files..

An absolute symlink inside shared-dir/ resolves to / instead of /shared-dir/. A .. traversal on shared-dir/ resolves to /.

Ways to deal with this:
1. Mount namespaces + pivot_root(2). What virtiofsd does today but it requires CAP_SYS_ADMIN. Won't work in containers.
2. Use chroot(2). This requires CAP_SYS_CHROOT, which is a default capability in Docker and podman.
3. Use openat2(2) to restrict traversal. This is trickier to use and does not sandbox virtiofsd since many other syscalls still take paths that could escape.
4. Accept that malicious guests can escape the shared directory. Assume this is okay because no sensitive data is available inside the container (e.g. just executables and libraries mounted with 'ro').
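
As a rough illustration of option 2 (and of why it only needs CAP_SYS_CHROOT), a chroot-based confinement step could look roughly like this; the shared_dir path and helper name are hypothetical, and this is a sketch rather than the eventual virtiofsd implementation:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    static void sandbox_chroot(const char *shared_dir)
    {
        /* Needs only CAP_SYS_CHROOT, which Docker and podman grant by default. */
        if (chroot(shared_dir) != 0) {
            perror("chroot");
            exit(1);
        }

        /* Without this chdir the working directory would still point
         * outside the new root. */
        if (chdir("/") != 0) {
            perror("chdir");
            exit(1);
        }
    }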

> > CLONE_NEWNET - virtiofsd only needs the UNIX domain socket, no further
> > network communication should be allowed. This can be dropped without
> > impacting functionality, but security is affected.
> 
> In k8s containers have most capabilities dropped by default. It would not
> have privileges like CAP_NET_ADMIN.
> Any privileges left would not make it more harmful than an arbitrary http
> server (but maybe I miss something).

The process launches with privileges but it drops them, so once it has started it's more isolated than an unprivileged container.

The container should not be allowed to make outgoing connections (external or to other containers). It should only accept connections on its UNIX domain socket. Is it possible to enforce this policy in k8s?

> > Note that CAP_SYS_ADMIN is required for some extended attributes (the
> > trusted. namespace that overlayfs uses and the security. namespace that
> > SELinux uses). If CAP_SYS_ADMIN is removed then overlayfs will not work on
> > the virtiofs mount inside the guest.
> 
> There may be potential increased security with using overlayfs here, but I
> don't think that we can leverage that. By giving in our case virtiofsd the
> CAP_SYS_ADMIN privilege we may make it more secure regarding to isolating
> different shares, but we may make it more harmful for the whole node. Also
> here I may miss something.

Disabling CAP_SYS_ADMIN on virtiofsd means the guest cannot use overlayfs on virtiofs mounts. This means containers on virtiofs won't work inside VMs :(.

Comment 13 Roman Mohr 2020-07-15 09:24:54 UTC
(In reply to Stefan Hajnoczi from comment #12)
> (In reply to Roman Mohr from comment #11)
> > (In reply to Stefan Hajnoczi from comment #10)
> > > CLONE_NEWPID - not needed inside a virtiofsd container (there are no other
> > > processes). Can be dropped.
> > > CLONE_NEWNS - currently needed because virtiofsd is designed to run in a
> > > mount namespace with the shared directory as / (and no other mounts). This
> > > is for security: it ensures path traversal cannot escape the shared
> > > directory.
> > 
> > From a kubevirt perspective that is not needed. We can run virtiofsd in its
> > own mount namespace. k8s will mount everything into that namespace which we
> > want to allow to be shared.
> 
> The shared directory cannot be / in the container because the init process
> is still executed from inside the container. I think the best k8s can do is:
> 
>   /
>     bin/virtiofsd
>     lib/...dependencies...
>     shared-dir/
>       ..shared files..
> 
> An absolute symlink inside shared-dir/ resolves to / instead of
> /shared-dir/. A .. traversal on shared-dir/ resolves to /.
> 
> Ways to deal with this:
> 1. Mount namespaces + pivot_root(2). What virtiofsd does today but it
> requires CAP_SYS_ADMIN. Won't work in containers.
> 2. Use chroot(2). This requires CAP_SYS_CHROOT, which is a default
> capability in Docker and podman.
> 3. Use openat2(2) to restrict traversal. This is trickier to use and does
> not sandbox virtiofsd since many other syscalls still take paths that could
> escape.
> 4. Accept that malicious guests can escape the shared directory. Assume this
> is okay because no sensitive data is available inside the container (e.g.
> just executables and libraries mounted with 'ro').


I would recommend providing virtiofsd as a static binary.
There would be nothing in the container other than the data and this single binary.


> 
> > > CLONE_NEWNET - virtiofsd only needs the UNIX domain socket, no further
> > > network communication should be allowed. This can be dropped without
> > > impacting functionality, but security is affected.
> > 
> > In k8s containers have most capabilities dropped by default. It would not
> > have privileges like CAP_NET_ADMIN.
> > Any privileges left would not make it more harmful than an arbitrary http
> > server (but maybe I miss something).
> 
> The process launches with privileges but it drops them, so once it has
> started it's more isolated than an unprivileged container.

While the process may drop them, the container itself still keeps the privilege. For instance, if someone manages to run `oc exec` to enter that container, all privileges are still present. Further, k8s has a permission model where a user does not delegate requests to a more privileged entity; one normally needs to have the privilege to start a pod with the said privileges. This means that we would have to give all users the permission to create such privileged pods, just to start the pod. Right now we work around this by delegating the pod creation to a more privileged entity and hope that other security mechanisms prevent people from running e.g. `oc exec`, but we need to get away from this for security reasons and for integration reasons, since right now we deliberately bypass the k8s permission system this way. We bring all this up because a clean implementation is one of the most important goals, and it is one where kubevirt is currently not where it should be.

> 
> The container should not be allowed to make outgoing connections (external
> or to other containers). It should only accept connections on its UNIX
> domain socket. Is it possible to enforce this policy in k8s?

I don't think it is possible, but I would argue that this is only of interest for people who also want to prevent users inside the VM from interacting with the network.
There are other ways in k8s to enforce security. Namespaces can be isolated with NetworkPolicies, we can think about adding flags for super-secure use cases inside the VM (note up to the creator of the VM), where we then help the user avoid potential unexpected security pitfalls by rejecting VMs which use virtiofs, and so forth.

I really think that this is not a blocker at this stage. It is really expected that k8s users who can create pods (so everyone you would call a user) can make use of the network interface in their network namespace.

> 
> > > Note that CAP_SYS_ADMIN is required for some extended attributes (the
> > > trusted. namespace that overlayfs uses and the security. namespace that
> > > SELinux uses). If CAP_SYS_ADMIN is removed then overlayfs will not work on
> > > the virtiofs mount inside the guest.
> > 
> > There may be potential increased security with using overlayfs here, but I
> > don't think that we can leverage that. By giving in our case virtiofsd the
> > CAP_SYS_ADMIN privilege we may make it more secure regarding to isolating
> > different shares, but we may make it more harmful for the whole node. Also
> > here I may miss something.
> 
> Disabling CAP_SYS_ADMIN on virtiofsd means the guest cannot use overlayfs on
> virtiofs mounts. This means containers on virtiofs won't work inside VMs :(.

Oh, now I understand your point. What exactly is virtiofs doing there? Could this eventually be done by someone else, with virtiofs e.g. consuming the result and exposing it (anything is thinkable, even things like sharing file descriptors over sockets)?

Comment 15 Stefan Hajnoczi 2020-07-20 12:57:30 UTC
(In reply to Roman Mohr from comment #13)
> (In reply to Stefan Hajnoczi from comment #12)
> > (In reply to Roman Mohr from comment #11)
> > > > CLONE_NEWNET - virtiofsd only needs the UNIX domain socket, no further
> > > > network communication should be allowed. This can be dropped without
> > > > impacting functionality, but security is affected.
> > > 
> > > In k8s containers have most capabilities dropped by default. It would not
> > > have privileges like CAP_NET_ADMIN.
> > > Any privileges left would not make it more harmful than an arbitrary http
> > > server (but maybe I miss something).
> > 
> > The process launches with privileges but it drops them, so once it has
> > started it's more isolated than an unprivileged container.
> 
> While the process may drop them, the container itself still keeps the
> privilege. For instance if someone manages to run `oc exec` to enter that
> container, still all privileges are present.

Thanks for explaining. Then it's not worth using privileges to achieve tighter sandbox isolation.

> > > > Note that CAP_SYS_ADMIN is required for some extended attributes (the
> > > > trusted. namespace that overlayfs uses and the security. namespace that
> > > > SELinux uses). If CAP_SYS_ADMIN is removed then overlayfs will not work on
> > > > the virtiofs mount inside the guest.
> > > 
> > > There may be potential increased security with using overlayfs here, but I
> > > don't think that we can leverage that. By giving in our case virtiofsd the
> > > CAP_SYS_ADMIN privilege we may make it more secure regarding to isolating
> > > different shares, but we may make it more harmful for the whole node. Also
> > > here I may miss something.
> > 
> > Disabling CAP_SYS_ADMIN on virtiofsd means the guest cannot use overlayfs on
> > virtiofs mounts. This means containers on virtiofs won't work inside VMs :(.
> 
> Oh now I understand your point. What exactly is virtiofs doing there? Can
> this be eventually done by someone else and virtiofs may e.g. consume the
> result and expose it (anything is thinkable, even things like sharing
> file-descriptor over sockets)?

overlayfs inside the guest accesses "trusted.*" xattrs. If the overlayfs is on a virtiofs mount then virtiofsd will perform those xattr accesses on behalf of the guest.

The "trusted.*" xattr namespace requires CAP_SYS_ADMIN, so if we remove this capability then the xattr accesses will fail and overlayfs will not work inside the guest.

There have been discussions about how to handle privileged xattrs recently. One option is to prefix them with an unprivileged string like "virtiofs.trusted.*" so that the guest can still read/write them but they will not be considered "trusted.*" xattrs by the host. This would make overlayfs work inside the guest at the cost of the xattrs not working when the file system is mounted directly on the host (not via virtiofs).
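
A sketch of that remapping idea, purely for illustration (the exact prefix and any helper like this were still under discussion at this point; the follow-up work is tracked separately):

    #include <stdio.h>
    #include <string.h>

    /* Map a guest-visible xattr name to the name stored on the host, so the
     * guest can keep using "trusted.*" while the host never sees a real
     * privileged "trusted.*" xattr. */
    static void map_guest_xattr(const char *guest_name,
                                char *host_name, size_t len)
    {
        if (strncmp(guest_name, "trusted.", strlen("trusted.")) == 0) {
            /* e.g. "trusted.overlay.opaque" -> "virtiofs.trusted.overlay.opaque" */
            snprintf(host_name, len, "virtiofs.%s", guest_name);
        } else {
            snprintf(host_name, len, "%s", guest_name);
        }
    }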

Comment 16 Fabian Deutsch 2020-07-20 13:26:12 UTC
I suspect that we'll need both modes.

There will be unprivileged guests which need simple host file access - but like an unprivileged user (restricted pods).
Then there can be privileged pods which really want to craft special things on the volumes, including xattrs - but here the expectation is that the pod will really have the necessary privileges to do this.

For CNV I'd imagine that the vast majority of cases will be the first case.

For kata I expect that it will almost exclusively use the second.

Comment 17 Stefan Hajnoczi 2020-07-21 09:54:51 UTC
(In reply to Fabian Deutsch from comment #16)
> I suspect that we'll need both modes.
> 
> There will be unprivileged guests which need simple host file access - but
> like an unprivileged user (restricted pods).
> Then there can be privileged pods which really want to craft special things
> on the volumes, including xattrs - but here the expectation is that the pod
> will really have the necessary privileges to do this.
> 
> For CNV I'd imagine that the vast majority of cases will be the first case.
> 
> For kata I expect that it will almost exclusively use the second.

I'm not sure Kata Containers has the same limitations because virtiofsd does not run in a container.

Comment 18 Roman Mohr 2020-07-21 09:57:35 UTC
(In reply to Fabian Deutsch from comment #16)
> I suspect that we'll need both modes.
> 
> There will be unprivileged guests which need simple host file access - but
> like an unprivileged user (restricted pods).
> Then there can be privileged pods which really want to craft special things
> on the volumes, including xattrs - but here the expectation is that the pod
> will really have the necessary privileges to do this.
> 
> For CNV I'd imagine that the vast majority of cases will be the first case.
> 


I agree here. If we can pass through secrets and configmaps (read-only) live, we make a big step forward, and xattrs are not relevant for that.
For PVCs (read-write access on shared storage with application data) it should also be OK in many cases not to have xattrs supported.

Comment 19 Roman Mohr 2020-07-21 09:58:52 UTC
(In reply to Stefan Hajnoczi from comment #17)

> I'm not sure Kata Containers has the same limitations because virtiofsd does
> not run in a container.

Maybe. Let's focus on CNV here. We would benefit a lot from virtiofsd even without xattrs.

Comment 20 Stefan Hajnoczi 2020-07-21 14:00:49 UTC
(In reply to Roman Mohr from comment #19)
> (In reply to Stefan Hajnoczi from comment #17)
> 
> > I'm not sure Kata Containers has the same limitations because virtiofsd does
> > not run in a container.
> 
> Maybe. Let's focus on CNV here. We would benefit a lot from virtiofsd even
> without xattrs.

Great, I am working on patches that let virtiofsd run in a container.

Comment 21 Stefan Hajnoczi 2020-07-24 18:58:20 UTC
I have updated the BZ title to focus just on running virtiofsd in containers.

bz1860491 has been created to track overlayfs trusted.* xattr support.

Comment 27 menli@redhat.com 2020-12-28 02:32:10 UTC
I went through the whole bug and did some searching on the Internet:

'Container runtimes handle namespace setup and remove privileges needed by
virtiofsd to perform sandboxing. Introduce a new "virtiofsd -o sandbox=chroot" option that uses chroot(2)
instead of namespaces'

My question is: is there some way we can verify this bug on the qemu side once it is ON_QA? Is it to drop CAP_SYS_ADMIN first and then use the '-o sandbox=chroot' option to start virtiofsd to test this on the qemu side?

Please feel free to correct me, thanks.



Thanks

Menghuan

Comment 28 Dr. David Alan Gilbert 2021-01-11 16:53:56 UTC
(In reply to menli from comment #27)
> I went through the whole bug and did some searching on the Internet:
> 
> 'Container runtimes handle namespace setup and remove privileges needed by
> virtiofsd to perform sandboxing.Introduce a new "virtiofsd -o
> sandbox=chroot" option that uses chroot(2)
> instead of namespaces' 
> 
> my question is that is there some way that we can do verify this bug on qemu
> side once this bug on_qa? is it drop 'CAP_SYS_ADMIN'  first then use option
> '-o sandbox=chroot' to start virtiofs to test this on qemu side?
> 
> please feel free to correct it , thanks

I think the answer here is to start your virtiofsd under a root shell using capsh and remove the CAP_SYS_ADMIN capability; it should fail. Then add -o sandbox=chroot and it should be OK.
Note this applies to the virtiofsd, not to qemu itself.
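
For reference, an equivalent effect to the capsh invocation can be sketched with a tiny launcher that drops CAP_SYS_ADMIN from the bounding set before exec'ing virtiofsd; the socket path, source directory, and this launcher itself are only illustrative, not the official test procedure:

    #include <linux/capability.h>
    #include <stdio.h>
    #include <sys/prctl.h>
    #include <unistd.h>

    int main(void)
    {
        /* Dropping from the bounding set needs CAP_SETPCAP, which a root
         * shell has; the exec'd virtiofsd then starts without CAP_SYS_ADMIN. */
        if (prctl(PR_CAPBSET_DROP, CAP_SYS_ADMIN, 0, 0, 0) != 0) {
            perror("prctl(PR_CAPBSET_DROP)");
            return 1;
        }

        /* Expected to fail at startup; adding "-o", "sandbox=chroot" to the
         * arguments below should make it start successfully. */
        execlp("virtiofsd", "virtiofsd",
               "--socket-path=/tmp/vhostqemu",
               "-o", "source=/tmp/shared",
               (char *)NULL);
        perror("execlp");
        return 1;
    }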

> 
> 
> Thanks
> 
> Menghuan

Comment 29 menli@redhat.com 2021-01-12 07:05:35 UTC
(In reply to Dr. David Alan Gilbert from comment #28)
> (In reply to menli from comment #27)
> > I went through the whole bug and did some searching on the Internet:
> > 
> > 'Container runtimes handle namespace setup and remove privileges needed by
> > virtiofsd to perform sandboxing.Introduce a new "virtiofsd -o
> > sandbox=chroot" option that uses chroot(2)
> > instead of namespaces' 
> > 
> > my question is that is there some way that we can do verify this bug on qemu
> > side once this bug on_qa? is it drop 'CAP_SYS_ADMIN'  first then use option
> > '-o sandbox=chroot' to start virtiofs to test this on qemu side?
> > 
> > please feel free to correct it , thanks
> 
> I think the answer here is to start your virtiofsd under a root shell using
> capsh and remove the 
> CAP_SYS_ADMIN capability;  it should fail.  Then add the -o sandbox=chroot 
> and it should be OK.
> Note this is to the virtiofsd, not the qemu itself.
> 
> > 
> > 
> > Thanks
> > 
> > Menghuan


Thanks for helping to confirm this.

But I have a doubt: in my understanding, after removing the CAP_SYS_ADMIN capability the guest can boot successfully once Bug 1860491 is fixed (test steps like https://bugzilla.redhat.com/show_bug.cgi?id=1860491#c12), and virtiofsd does not have to add '-o sandbox=chroot'.
Or is it my misunderstanding, and do I have to add '-o sandbox=chroot' after removing the CAP_SYS_ADMIN capability to confirm that the guest boots successfully when verifying Bug 1860491? If so, I will update the test steps in https://bugzilla.redhat.com/show_bug.cgi?id=1860491#c12.

Please feel free to correct me, thanks.


Thanks
Menghuan

Comment 30 Dr. David Alan Gilbert 2021-02-03 10:55:00 UTC
(In reply to menli from comment #29)
> [...]
> Thanks for help confirm this~
>  
> But I have a doubt, in my understanding after remove the CAP_SYS_ADMIN
> capability, guest can boot successfully once Bug 1860491 fixed.(test steps
> like https://bugzilla.redhat.com/show_bug.cgi?id=1860491#c12), virtiofs
> don't have to add '-o sandbox=chroot'.
> Or its my misunderstanding I have to add '-o sandbox=chroot' to verify Bug
> 1860491 after remove CAP_SYS_ADMIN capability  to confirm the guest boot
> successfully ? if so I will update test steps
> https://bugzilla.redhat.com/show_bug.cgi?id=1860491#c12
> 
> please feel free to correct me, thanks
> 
> 
> Thanks
> Menghuan

Yes, see the comment I put on 1860491 yesterday; you need the sandbox=chroot to get past the startup
without CAP_SYS_ADMIN.

Comment 31 menli@redhat.com 2021-02-04 09:26:02 UTC
(In reply to Dr. David Alan Gilbert from comment #30)
> [...]
> Yes, see the comment I put on 1860491 yesterday; you need the sandbox=chroot
> to get past the startup
> without CAP_SYS_ADMIN.

Thanks for your confirmation.
I notice this bug is fixed. Could you please change the status to ON_QA?

Comment 32 Dr. David Alan Gilbert 2021-02-04 09:32:09 UTC
I'll move it; as per comments 30/31, moving to ON_QA - it's part of 1860491.

Comment 33 menli@redhat.com 2021-02-04 12:05:07 UTC
Bug 1860491 has been verified; since this bug is part of 1860491, changing it to VERIFIED as well.

