Bug 1921826

Summary: NetworkManager is running with dac_override capability
Product: Red Hat Enterprise Linux 8
Component: NetworkManager
Version: 8.4
Hardware: Unspecified
OS: Unspecified
Status: CLOSED WONTFIX
Severity: unspecified
Priority: low
Target Milestone: rc
Keywords: Triaged
Reporter: Jan Pazdziora <jpazdziora>
Assignee: Thomas Haller <thaller>
QA Contact: Desktop QE <desktop-qa-list>
CC: acardace, bgalvani, ferferna, fge, fleitner, jpazdziora, lrintel, rkhan, sfaye, sukulkar, thaller, till, zpytela
Doc Type: Release Note
Type: Bug
Clones: 1956820 1986076
Bug Depends On: 1986076, 2053639
Bug Blocks: 1956820
Last Closed: 2023-03-01 18:49:07 UTC

Description Jan Pazdziora 2021-01-28 16:59:49 UTC
Description of problem:

NetworkManager is running with the dac_override capability because CAP_DAC_OVERRIDE is listed in CapabilityBoundingSet in NetworkManager.service.

Traditionally, needing the dac_override capability is a sign that the file permissions on something the process needs to access or manipulate are wrong, and it increases the impact of a potential vulnerability.
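To see which capabilities a process actually holds without pscap, one can decode the CapEff (effective) and CapBnd (bounding set) hex masks from /proc/<pid>/status. The following is a minimal Python sketch, assuming a Linux /proc layout; the name table deliberately covers only the capabilities that appear in this report, with bit numbers taken from <linux/capability.h>.

```python
from __future__ import annotations

# Bit numbers from <linux/capability.h>; only the capabilities seen in this report.
CAP_NAMES = {
    1: "dac_override",
    5: "kill",
    6: "setgid",
    7: "setuid",
    10: "net_bind_service",
    12: "net_admin",
    13: "net_raw",
    16: "sys_module",
    18: "sys_chroot",
    29: "audit_write",
}


def decode_caps(hex_mask: str) -> list[str]:
    """Return the names of known capabilities whose bit is set in hex_mask."""
    mask = int(hex_mask, 16)
    return [name for bit, name in sorted(CAP_NAMES.items()) if (mask >> bit) & 1]


def read_cap_masks(pid: str = "self") -> dict[str, str]:
    """Read the CapEff and CapBnd hex strings of a process (Linux only)."""
    masks = {}
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            key, _, value = line.partition(":")
            if key in ("CapEff", "CapBnd"):
                masks[key] = value.strip()
    return masks
```

Usage: `decode_caps(read_cap_masks(str(nm_pid))["CapBnd"])`, where `nm_pid` is NetworkManager's PID (e.g. from `pidof NetworkManager`). With a drop-in like the one in Comment 1 applied, `dac_override` should no longer appear in the bounding set.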

Version-Release number of selected component (if applicable):

NetworkManager-1.30.0-0.3.el9.x86_64

How reproducible:

Deterministic.

Steps to Reproduce:
1. pscap | grep NetworkManager | grep dac_override

Actual results:

1     625   root        NetworkManager      dac_override, kill, setgid, setuid, net_bind_service, net_admin, net_raw, sys_module, sys_chroot, audit_write +

Expected results:

No output and NetworkManager continues working.

Additional info:

Comment 1 Jan Pazdziora 2021-01-28 17:01:17 UTC
I quickly checked my Fedora 33 with

  /usr/lib/systemd/system/NetworkManager.service.d/capability.conf:
  [Service]
  CapabilityBoundingSet=~CAP_DAC_OVERRIDE

and it connected to authenticated WiFi or VPN just fine, so NetworkManager is not completely broken without that capability.

Comment 2 Thomas Haller 2021-02-03 10:29:27 UTC
> NetworkManager is not completely broken without that capability.

"not completely broken" is a good start :)


NetworkManager has lots of capabilities, not necessarily for itself but also because it forks processes that might need them (like VPN plugins, dnsmasq, arping). Ideally, we would fix this by not spawning processes ourselves but running them in a different context (e.g. as systemd services).

Anyway, the goal is certainly to reduce the capabilities as much as we can, and for that we first need to identify why those capabilities are there at all.

I don't know why NetworkManager would need CAP_DAC_OVERRIDE, and the commit that introduced it ([1]) doesn't explain it.

Maybe it's because users can configure certificates in files, which might be in their home directory with wrong permissions. But certificates in files are problematic anyway, also due to SELinux.

[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/4ffd57f83d9cc36c8908c42bcf3d452392bb0e60



Let's try to drop it -- that way we can at least find out if somebody depends on it. And if something requires it, we should rather investigate how that can be fixed so that it no longer does.

Comment 9 Thomas Haller 2021-02-03 11:18:20 UTC
If it's not a suitable thing to do on a minor rhel-8 release, we likely also don't want to do it on a major rhel release.

It either breaks, or it does not break. There is no need to behave differently.

NetworkManager in 8.4 will also be rebased to 1.30, which comes from upstream master. Of course, this means that we only do things in upstream master that are backward compatible enough to meet the criteria of a Y-stream release in RHEL.

(exceptions prove the rule).

Comment 10 Thomas Haller 2021-02-09 15:58:02 UTC
We decided this might be too dangerous and too late for rhel-8.4 (nm-1-30), because it's unclear what might break.

We will merge it upstream post-1.30, and it is slated for rhel-8.5 (and rhel-9.0).

Comment 11 Thomas Haller 2021-02-12 13:11:15 UTC
upstream reverted:

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/4d66d6c7a195b9d57613d5f47741b5e470b3f2b2



so, unless (at least) the way we talk to OVS changes, CAP_DAC_OVERRIDE cannot be dropped...

Comment 12 Jan Pazdziora 2021-02-12 18:34:05 UTC
Couldn't the

  srwxr-x---. 1 openvswitch openvswitch 0 Xxx xx xx:xx /run/openvswitch/db.sock

be g+w and NetworkManager (or its OVS plugin) could add supplementary group openvswitch in the code ... or something similar?

Comment 13 Thomas Haller 2021-02-12 20:40:40 UTC
(In reply to Jan Pazdziora from comment #12)
> Couldn't the
> 
>   srwxr-x---. 1 openvswitch openvswitch 0 Xxx xx xx:xx
> /run/openvswitch/db.sock
> 
> be g+w and NetworkManager (or its OVS plugin) could add supplementary group
> openvswitch in the code ... or something similar?

What do you mean by "add suppl. group in the code"?

NetworkManager runs as user "root", so root could also be put in the "openvswitch" group, which seems a bit odd...

Comment 14 Jan Pazdziora 2021-02-12 20:58:40 UTC
For a file (or socket) with permissions 750, "other" users have no permissions. Root can override that and read/access the file despite being neither the openvswitch user nor in the openvswitch group, if it has the CAP_DAC_OVERRIDE capability. But that means that daemons running as root can ignore whatever permissions were laid out on the filesystem, increasing the severity of a potential vulnerability.

The purpose of dropping the capabilities is to make root-owned processes obey the standard DAC rules: if "other" has no permission, the process will be denied access, unless it is the owner or a member of the group that has permission.

Root processes with the CAP_SETGID capability can use the setgroups(2) call to give themselves more groups (beyond the default root group) at runtime. In this case, the daemon could add itself to the openvswitch group. That is the preferred way (over the generic CAP_DAC_OVERRIDE) for root processes to access filesystem items not owned by the root user or root group.
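The DAC rule described above can be modelled in a few lines. This is a simplified Python sketch, not the kernel's actual algorithm: it ignores ACLs and SELinux, treats CAP_DAC_OVERRIDE as an unconditional bypass (the kernel still requires some execute bit for exec), and the uid/gid 980 for openvswitch is a hypothetical value. Connecting to a Unix socket requires write permission on the socket file (see unix(7)), so the socket cases ask for the w bit.

```python
from __future__ import annotations

from dataclasses import dataclass, field

R, W, X = 4, 2, 1  # permission bits within one rwx triplet


@dataclass
class Cred:
    uid: int
    gid: int
    groups: set[int] = field(default_factory=set)  # supplementary groups, cf. setgroups(2)
    dac_override: bool = False  # CAP_DAC_OVERRIDE in the effective set


def may_access(mode: int, owner_uid: int, owner_gid: int, cred: Cred, want: int) -> bool:
    """Simplified DAC check: select the owner, group or other triplet, then test bits."""
    if cred.dac_override:
        return True  # simplification: ignores the execute-bit special case
    if cred.uid == owner_uid:
        perm = (mode >> 6) & 7
    elif owner_gid == cred.gid or owner_gid in cred.groups:
        perm = (mode >> 3) & 7
    else:
        perm = mode & 7
    return (perm & want) == want


OVS_UID = OVS_GID = 980  # hypothetical ids of the openvswitch user and group
root = Cred(uid=0, gid=0)
root_with_cap = Cred(uid=0, gid=0, dac_override=True)
root_in_ovs_group = Cred(uid=0, gid=0, groups={OVS_GID})
```

With mode 750, `root_with_cap` may connect and plain `root` may not; `root_in_ovs_group` still may not, because the group triplet is r-x. Only with mode 770 (or an equivalent ACL) does joining the group via setgroups() help.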

Of course, using sudo can be another way of doing it, especially if the operations that need to access /run/openvswitch/db.sock are well confined.

Comment 18 Gris Ge 2021-03-24 04:56:08 UTC
Hi Flavio,

Is it possible for OVS to use 750 permissions on the daemon socket?

Comment 19 Flavio Leitner 2021-03-24 12:39:55 UTC
(In reply to Gris Ge from comment #18)
> Is it possible for OVS to use 750 permissions on the daemon socket?

Isn't that the current situation?

[root@wsfd-netdev73 ~]# stat -c "%a %n" /var/run/openvswitch/*sock
750 /var/run/openvswitch/db.sock
[root@wsfd-netdev73 ~]# ovs-vsctl show | grep ersion
    ovs_version: "2.15.0"
[root@wsfd-netdev73 ~]# rpm -q openvswitch2.15
openvswitch2.15-2.15.0-2.el8fdp.x86_64

Comment 20 Gris Ge 2021-03-24 12:50:24 UTC
Hi Beniamino,

Are there other obstacles outside of NM, then?

Comment 21 Till Maas 2021-03-24 16:42:48 UTC
> Are there other obstacles outside of NM, then?

We need 770/write access for a group (could also be via an ACL) for the proposed solution.

Comment 22 Thomas Haller 2021-03-24 19:12:52 UTC
> > Is it possible for OVS to use 750 permissions on the daemon socket?
> 
> Isn't that the current situation?

That's the current situation.

But maybe not the desired one: the read+execute permission for unix sockets
means that a user from that group still cannot connect to the unix socket (`man unix`).

I think it would be beneficial if users of the same group could talk to ovsdb,
which requires `chmod 770` -- IIUC.
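The unix(7) rule that connecting to a stream socket needs write permission on the socket file can be checked directly. A minimal Python sketch, assuming Linux; it varies the owner triplet because an unprivileged test process cannot easily switch groups, but the group triplet follows the same rule. Run as an ordinary user, mode 0500 (r-x) refuses the connection; run as root with CAP_DAC_OVERRIDE, the connection succeeds anyway, which is exactly the bypass this bug is about.

```python
import os
import socket
import tempfile


def can_connect(mode: int) -> bool:
    """Bind a Unix stream socket, chmod it to `mode`, and try to connect to it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "db.sock")  # stand-in for /run/openvswitch/db.sock
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            server.bind(path)
            server.listen(1)
            os.chmod(path, mode)
            client.connect(path)  # the kernel checks write permission on the socket inode
            return True
        except PermissionError:
            return False
        finally:
            client.close()
            server.close()
```

Write permission alone (mode 0200) is enough to connect, while read+execute (0500) is not, which is why a 750 socket does not let group members talk to ovsdb.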


(In reply to Till Maas from comment #21)
> > Are there other obstacles outside of NM, then?
> 
> We need 770/write access for a group (could also be via an ACL) for the
> proposed solution.

Yes and no. Only if the proposed solution is to use CAP_SETGID/setgroups() to add NM to the OVS group.
To me, that does not seem like the favorable solution, because:

 - we still require additional capabilities that we would rather not have.
 - to access the socket, we gain all powers of the unix group (which we otherwise don't want to have).
 - Worst: having a suboptimal solution will decrease the incentive for a good solution.

IMO the better solution is an nm-sudo service. That is a medium-size effort to add once, but once
we have it, we can solve various other use cases (like dropping CAP_SYS_MODULE, which we currently need to call modprobe).

Without an nm-sudo service there will be always limits as to how many capabilities we can drop.

Comment 23 Till Maas 2021-03-25 11:18:57 UTC
(In reply to Thomas Haller from comment #22)
> > > Is it possible for OVS to use 750 permissions on the daemon socket?
> > 
> > Isn't that the current situation?
> 
> That's the current situation.
> 
> But maybe not the desired one: the read+execute permission for unix sockets
> means that a user from that group still cannot connect to the unix socket (`man
> unix`).
> 
> I think it would be beneficial if users of the same group could
> talk to ovsdb,
> which requires `chmod 770` -- IIUC.
> 
> 
> (In reply to Till Maas from comment #21)
> > > Are there other obstacles outside of NM, then?
> > 
> > We need 770/write access for a group (could also be via an ACL) for the
> > proposed solution.
> 
> Yes and no. Only if the proposed solution is to use CAP_SETGID/setgroups()
> to add NM to the OVS group.
> To me, that does not seem like the favorable solution, because:
> 
>  - we still require additional capabilities that we would rather not have.

This bug is about the dac_override capability, so an attempt to remove other capabilities would need more investigation.


>  - to access the socket, we gain all powers of the unix group (which we
> otherwise don't want to have).

What additional powers are these? Why is it a problem?

>  - Worst: having a suboptimal solution will decrease the incentive for a
> good solution.

If it makes sense to remove other capabilities, we can still discuss this after this change. To me, it seems more like a possibly perfect solution is now blocking a good solution.

> IMO the better solution is an nm-sudo service. That is a medium-size effort to
> add once, but once
> we have it, we can solve various other use cases (like dropping
> CAP_SYS_MODULE, which we currently need to call modprobe).
> 
> Without an nm-sudo service there will be always limits as to how many
> capabilities we can drop.


I am open to discussing a more detailed proposal for such a service that describes the nm-sudo design.

Comment 24 Thomas Haller 2021-03-25 12:10:50 UTC
(In reply to Till Maas from comment #23)
> (In reply to Thomas Haller from comment #22)
> > > > Is it possible for OVS to use 750 permissions on the daemon socket?
> > > 
> > > Isn't that the current situation?
> > 
> > That's the current situation.
> > 
> > But maybe not the desired one: the read+execute permission for unix sockets
> > means that a user from that group still cannot connect to the unix socket (`man
> > unix`).
> > 
> > I think it would be beneficial if users of the same group could
> > talk to ovsdb,
> > which requires `chmod 770` -- IIUC.
> > 
> > 
> > (In reply to Till Maas from comment #21)
> > > > Are there other obstacles outside of NM, then?
> > > 
> > > We need 770/write access for a group (could also be via an ACL) for the
> > > proposed solution.
> > 
> > Yes and no. Only if the proposed solution is to use CAP_SETGID/setgroups()
> > to add NM to the OVS group.
> > To me, that does not seem like the favorable solution, because:
> > 
> >  - we still require additional capabilities that we would rather not have.
> 
> This bug is about the dac_override capability, so an attempt to remove other
> capabilities would need more investigation.

I meant: while with this approach we may drop CAP_DAC_OVERRIDE, we would need to
add CAP_SETGID. The goal is to reduce capabilities, and if a solution requires us to
trade one capability for another, then that is a downside of that solution (even if
the trade may be worth it, because CAP_DAC_OVERRIDE is more dangerous than CAP_SETGID).


> >  - to access the socket, we gain all powers of the unix group (which we
> > otherwise don't want to have).
> 
> What additional powers are these? Why is it a problem?

On my machine (Fedora 33), the socket is owned by group hugetlbfs.
What other powers does this group have? I don't know; that would require investigation.

But the goal is to have less privileges. It's a downside when a solution for that
still gives us more than we strictly need/want.


> >  - Worst: having a suboptimal solution will decrease the incentive for a
> > good solution.
> 
> If it makes sense to remove other capabilities, we can still discuss this
> after this change. To me, it seems more like a possibly perfect solution
> is now blocking a good solution.

The Perfect is the enemy of the Good. But the Bad is also the enemy of the Good.
Or: nothing is as durable as a workaround.

Each of the benefits that a potential nm-sudo solution could provide is not large on its own
(otherwise, we would already have addressed those issues individually). Only the sum of the
benefits makes it very attractive.

We can discuss many things. But we cannot implement one solution now and
discuss another solution afterwards, because by then the priorities will have
shifted.


> > IMO the better solution is an nm-sudo service. That is a medium-size effort to
> > add once, but once
> > we have it, we can solve various other use cases (like dropping
> > CAP_SYS_MODULE, which we currently need to call modprobe).
> > 
> > Without an nm-sudo service there will be always limits as to how many
> > capabilities we can drop.
> 
> 
> I am open to discussing a more detailed proposal for such a service that
> describes the nm-sudo design.

Have a D-Bus activatable service (like we already have NetworkManager-dispatcher
at "org.freedesktop.nm_dispatcher"). The service can auto-quit after a short idle
period.

That process runs with more capabilities.

It has a very strict API that allows access to a very small set of operations, like
"open /run/openvswitch/db.sock and provide a file descriptor" (D-Bus supports sending
of file descriptors).

The nm-sudo service can use D-Bus to authenticate the request. On D-Bus, everybody connected
to the bus gets a unique identifier. nm-sudo sees the originator of a request, and the
request is only allowed for the process that is NetworkManager.
Thanks to D-Bus, nm-sudo can also easily find out which process is NetworkManager. It watches
the name owner of org.freedesktop.NetworkManager and that's it. Additionally, it can check that the
name owner's process runs as uid 0 (root). D-Bus really helps with tracking the service lifetime
and authentication.
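The core of this design, handing an already-opened file descriptor to a less privileged peer after checking who the peer is, can be sketched without D-Bus. The Python below is a hypothetical illustration, not nm-sudo's implementation: it uses a plain AF_UNIX socket pair, SO_PEERCRED for the caller check instead of D-Bus name-owner tracking, and SCM_RIGHTS via `socket.send_fds()`/`socket.recv_fds()` (Python 3.9+, Linux), which is the same kernel mechanism D-Bus uses for Unix fd passing. The request name "ovsdb", the allowlist, and the helper names are assumptions; a real broker would connect to /run/openvswitch/db.sock rather than open a plain file.

```python
from __future__ import annotations

import os
import socket
import struct
import tempfile
import threading


def peer_uid(conn: socket.socket) -> int:
    """Return the uid of the process on the other end of a Unix socket (Linux)."""
    creds = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize("3i"))
    _pid, uid, _gid = struct.unpack("3i", creds)
    return uid


def serve_one(conn: socket.socket, allowlist: dict[str, str], allowed_uid: int) -> None:
    """Broker side: answer one request by name; raw paths are never accepted."""
    request = conn.recv(64).decode()  # sketch: no message framing
    path = allowlist.get(request)
    if path is None or peer_uid(conn) != allowed_uid:
        conn.sendall(b"denied")
        return
    fd = os.open(path, os.O_RDONLY)
    try:
        socket.send_fds(conn, [b"ok"], [fd])  # SCM_RIGHTS: the peer gets its own copy
    finally:
        os.close(fd)  # the broker's copy is no longer needed


def request_fd(conn: socket.socket, name: str) -> int | None:
    """Client side: ask for a descriptor by name; return it, or None if denied."""
    conn.sendall(name.encode())
    msg, fds, _flags, _addr = socket.recv_fds(conn, 64, 1)
    return fds[0] if msg == b"ok" and fds else None


def demo(content: bytes, name: str) -> bytes | None:
    """Run broker and client in one process; return what the client reads, or None."""
    with tempfile.NamedTemporaryFile() as target:
        target.write(content)
        target.flush()
        broker, client = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
        allowlist = {"ovsdb": target.name}
        worker = threading.Thread(target=serve_one, args=(broker, allowlist, os.getuid()))
        worker.start()
        try:
            fd = request_fd(client, name)
        finally:
            worker.join()
            broker.close()
            client.close()
        if fd is None:
            return None
        try:
            return os.read(fd, len(content) + 1)
        finally:
            os.close(fd)
```

The key design choice is that the client names an operation ("ovsdb") rather than a path, so the broker never offers anything like an "OpenArbitraryFile()" call.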

The only downside: this requires D-Bus. That may make it difficult, for example, for an initrd solution.
Well, the proper initrd solution here is also to run D-Bus. Otherwise, we need a fallback path that
NetworkManager uses if it is not connected to D-Bus (the initrd case). Note that in the initrd,
NetworkManager currently doesn't run as a systemd service either, so its capabilities are not
limited there anyway.

What makes nm-sudo really attractive is that we can imagine all kinds of privileged operations.
Of course, it will not provide an operation "OpenArbitraryFile()". But it can guard access to
certain files and operations that are carefully allowed.

 - open special files, so that CAP_DAC_OVERRIDE is no longer needed
 - load kernel modules so that CAP_SYS_MODULE is no longer needed.
 - we need CAP_NET_BIND_SERVICE, mostly for DHCP. Let nm-sudo do the bind,
   but only for the DHCP port.
 - open raw sockets for us, so that CAP_NET_RAW is no longer needed.
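To make "do the bind, but only for the DHCP port" concrete: the privileged side can accept only a (family, port) pair from a fixed policy and refuse everything else before it ever calls bind(). A hypothetical Python sketch; the function names and the policy table are assumptions, and binding port 68 or 546 still requires CAP_NET_BIND_SERVICE (or root) in the broker itself. The returned socket would then be handed back to NetworkManager as a file descriptor.

```python
import socket

# Hypothetical policy: only the DHCP client ports (68 for DHCPv4, 546 for DHCPv6).
DHCP_CLIENT_PORTS = {socket.AF_INET: 68, socket.AF_INET6: 546}


def is_allowed(family: int, port: int) -> bool:
    """True only for the one client port permitted for this address family."""
    return DHCP_CLIENT_PORTS.get(family) == port


def open_dhcp_socket(family: int, port: int) -> socket.socket:
    """Bind a UDP socket for a DHCP client, refusing any other family/port pair."""
    if not is_allowed(family, port):
        raise ValueError(f"refusing to bind port {port} for address family {family}")
    sock = socket.socket(family, socket.SOCK_DGRAM)
    try:
        sock.bind(("", port))  # below 1024, so this needs CAP_NET_BIND_SERVICE
        return sock
    except OSError:
        sock.close()
        raise
```

The check runs before any socket is created, so a compromised NetworkManager could at most obtain a DHCP client socket, not an arbitrary privileged port.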


E.g. with CAP_NET_RAW, one valid question is: if nm-sudo merely opens any raw socket for
NetworkManager, why is that safer than granting CAP_NET_RAW? If done naively, it is indeed
not. But maybe the permissions of a raw socket can be restricted before passing it on? Dunno.
I think there are possibilities here. Theoretically, we could even proxy raw sockets (but that
may be so complicated that it is not worth it).

The biggest area where NetworkManager has too many capabilities is file access. Sure, we
have SELinux, but it would be nice to do better. One could even imagine nm-sudo proxying
access to a small set of files (basically, the profiles on disk), so that we could
run with ProtectHome= and more sandboxing options. That's the hardest part, but introducing
nm-sudo would open up possibilities.

Comment 28 Thomas Haller 2021-07-26 15:25:26 UTC
upstream merged nm-sudo to main with [1], [2]:

[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/938#note_1005391
[2] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/438fd3aa9cb0b2d5315de12040b1562d00140e02


What is still missing is actually dropping CAP_DAC_OVERRIDE, because we first need a new SELinux policy to allow it.

Moving rhbz back to NEW and cloning this rhbz for the SELinux part.

Comment 29 Thomas Haller 2021-07-26 15:53:27 UTC
(In reply to Thomas Haller from comment #28)

> Moving rhbz back to NEW and cloning this rhbz for the SELinux part.

bug 1986076

Comment 30 Thomas Haller 2021-10-07 12:52:32 UTC
Summary of the current state:


The goal is to run NetworkManager without CAP_DAC_OVERRIDE.


We attempted to drop that capability from the NetworkManager.service unit earlier.

That uncovered the problem that the OVS plugin cannot open the Unix socket at /run/openvswitch/db.sock (due to file permissions).
This problem is going to be solved by introducing the nm-sudo.service D-Bus service: NetworkManager will ask nm-sudo for the file descriptor.

E.g. see here: https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/098a963e42a69b544a8505317f28a7ed1e616780/src/core/devices/ovs/nm-ovsdb.c#L2415

That is almost working, except that SELinux disallows nm-sudo to pass the file descriptor. For this we need to first fix bug 1986076.


After bug 1986076 is fixed, we can try again to drop CAP_DAC_OVERRIDE. Then we start testing again (e.g. with our NetworkManager-ci) and see whether the capability is needed for other reasons too. So far, no other issues are known.

Comment 31 Thomas Haller 2021-10-13 14:49:33 UTC
On my machine, I have an openvpn profile which specifies the key as `key = /etc/openvpn/client/tun7-keys/rh1.key`.
That file is not accessible without CAP_DAC_OVERRIDE, so dropping the capability breaks my setup.

Sure, it's probably a configuration error; in my case, the file should have been owned by root and readable by root.

But generally, users are supposed to have certificates owned by themselves (non-root), and they should not be required to make the file readable to anybody else. It seems CAP_DAC_OVERRIDE is rather important to VPN plugins.


The problem here is that VPN plugins are spawned by NetworkManager itself. Instead, either:

 (1) there could be a separate service with more capabilities/permissions which spawns the process. Theoretically, this could be nm-sudo, but we don't actually want to grant unnecessary permissions to nm-sudo either (and the "nm-vpn-runner" would require most permissions). I think this could be something very similar to nm-sudo.

 (2) VPN plugins should not require an nm-vpn-runner service; instead, they should themselves be D-Bus activatable.


Advantages:

 - (1) works with old plugins and requires no changes to existing plugins
 - (1) works probably without systemd, because it only depends on D-Bus activation (in practice however, we would sandbox nm-vpn-runner via systemd)
 - (2) does not require an additional runner which does not really know the capabilities that the VPN requires. The plugin could sandbox itself better.


TL;DR: we cannot drop CAP_DAC_OVERRIDE until we introduce a D-Bus activatable nm-vpn-runner service that spawns the actual VPN plugin for us.

Comment 39 RHEL Program Management 2022-07-28 07:44:45 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 44 Fernando F. Mancera 2022-08-04 13:47:47 UTC
We are reopening this and planning it for 8.8. Thanks!

Comment 49 RHEL Program Management 2023-02-04 07:27:46 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 52 Till Maas 2023-03-01 18:49:07 UTC
To our understanding, this is a high-effort, low-impact issue. We don't have the capacity to handle this in the foreseeable future, therefore I am closing this Bugzilla. Thank you for reporting this; we will still consider it in future design decisions.