Bug 1931934 - RFE: firefox control groups organization for systemd-oomd
Summary: RFE: firefox control groups organization for systemd-oomd
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: firefox
Version: 34
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Gecko Maintainer
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1913794
Reported: 2021-02-23 15:40 UTC by Chris Murphy
Modified: 2021-04-01 00:52 UTC
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-01 00:52:21 UTC
Type: Bug
Embargoed:


Description Chris Murphy 2021-02-23 15:40:02 UTC
Description of problem:

Fedora 34 includes a change to enable systemd-oomd system-wide. Earlyoom worked on a per-process basis, so it tended to SIGTERM/SIGKILL individual tab (content) processes. By contrast, sd-oomd works at the cgroup level, and all Firefox processes are currently located in the same scope. This means Firefox as a whole is subject to being killed when it exceeds resource control limits.

https://fedoraproject.org/wiki/Changes/EnableSystemdOomd


Version-Release number of selected component (if applicable):
final release version of Firefox for Fedora 34


Additional info:


$ systemctl --user status app-gnome-firefox-2277.scope
● app-gnome-firefox-2277.scope - Application launched by gnome-shell
     Loaded: loaded (/run/user/1000/systemd/transient/app-gnome-firefox-2277.scope; transient)
  Transient: yes
     Active: active (running) since Tue 2021-02-23 07:44:54 MST; 31min ago
      Tasks: 334 (limit: 9364)
     Memory: 1.6G
        CPU: 17min 9.492s
     CGroup: /user.slice/user-1000.slice/user/app.slice/app-gnome-firefox-2277.scope
             ├─2277 /usr/lib64/firefox/firefox
             ├─2423 /usr/lib64/firefox/firefox -contentproc -childID 1 -isForBrowser -prefsLen 1 -prefMapSize 246427 -parentBuildID 20210203130351 -appdir /u>
             ├─2492 /usr/lib64/firefox/firefox -contentproc -childID 2 -isForBrowser -prefsLen 7248 -prefMapSize 246427 -parentBuildID 20210203130351 -appdir>
             ├─2548 /usr/lib64/firefox/firefox -contentproc -childID 3 -isForBrowser -prefsLen 8046 -prefMapSize 246427 -parentBuildID 20210203130351 -appdir>
             ├─2595 /usr/lib64/firefox/firefox -contentproc -childID 4 -isForBrowser -prefsLen 10261 -prefMapSize 246427 -parentBuildID 20210203130351 -appdi>
             ├─2672 /usr/lib64/firefox/firefox -contentproc -childID 5 -isForBrowser -prefsLen 10876 -prefMapSize 246427 -parentBuildID 20210203130351 -appdi>
             ├─2715 /usr/lib64/firefox/firefox -contentproc -childID 6 -isForBrowser -prefsLen 10876 -prefMapSize 246427 -parentBuildID 20210203130351 -appdi>
             ├─2748 /usr/lib64/firefox/firefox -contentproc -childID 7 -isForBrowser -prefsLen 10876 -prefMapSize 246427 -parentBuildID 20210203130351 -appdi>
             └─3095 /usr/lib64/firefox/firefox -contentproc -childID 8 -isForBrowser -prefsLen 11883 -prefMapSize 246427 -parentBuildID 20210203130351 -appdi>

Comment 1 Martin Stransky 2021-02-23 15:44:00 UTC
How can that be fixed on the Firefox side?

Comment 2 Anita Zhang 2021-03-02 09:39:21 UTC
This is an idea I had after looking at the Firefox code:

It seems that there's always one process (the first one) that acts as a primary and spawns the other Firefox workers. This primary should read a flag or environment variable indicating that the init system is systemd, that /sys/fs/cgroup is a cgroup2 mount, and that Firefox is being started in its own unit. The primary then needs to figure out which cgroup it is in. This can usually be done by parsing `/proc/self/cgroup`, taking the line that starts with "0::", and prepending "/sys/fs/cgroup" to get the absolute path. (On a fully cgroup2 system there is usually only one line, but on occasion we've seen an additional line starting with "1:name=systemd:", so don't assume it will be the first line.) Create a new directory under that absolute path to create a new sub-cgroup: e.g. if "/sys/fs/cgroup/firefox.service/" is the primary PID's current cgroup, then "/sys/fs/cgroup/firefox.service/main" could be the new path. Write the primary's PID into "/sys/fs/cgroup/firefox.service/main/cgroup.procs" to move it into this sub-cgroup.
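
A minimal Python sketch of the primary-process steps above, for illustration only. It assumes a pure cgroup2 mount at /sys/fs/cgroup and that the unit's cgroup is delegated to the process (so it may create sub-cgroups and write to cgroup.procs). The function names and the leaf name "main" are hypothetical, not Firefox code.

```python
import os

CGROUP_ROOT = "/sys/fs/cgroup"  # assumed cgroup2 mount point


def cgroup_dir_from_proc(proc_cgroup_text: str, root: str = CGROUP_ROOT) -> str:
    """Map the contents of /proc/<pid>/cgroup to an absolute cgroup2 directory.

    The cgroup2 entry is the line starting with "0::". It is not guaranteed to
    be the first line (a "1:name=systemd:" line may also be present), so every
    line is checked.
    """
    for line in proc_cgroup_text.splitlines():
        if line.startswith("0::"):
            relative = line[len("0::"):]  # e.g. "/user.slice/.../app.scope"
            # Strip the leading "/" so os.path.join keeps the root prefix.
            return os.path.join(root, relative.lstrip("/"))
    raise ValueError("no cgroup2 ('0::') entry in /proc/<pid>/cgroup")


def move_self_into_leaf(leaf_name: str = "main") -> str:
    """Create <own cgroup>/<leaf_name> and move the calling process into it."""
    with open("/proc/self/cgroup") as f:
        unit_dir = cgroup_dir_from_proc(f.read())
    leaf_dir = os.path.join(unit_dir, leaf_name)
    os.makedirs(leaf_dir, exist_ok=True)  # mkdir in cgroupfs creates a sub-cgroup
    # Writing a PID to cgroup.procs migrates that process into the cgroup.
    with open(os.path.join(leaf_dir, "cgroup.procs"), "w") as f:
        f.write(str(os.getpid()))
    return leaf_dir
```

Parsing is kept in a pure function so it can be checked without touching the real cgroup filesystem; only `move_self_into_leaf` performs the actual migration.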

Whenever the primary spawns a new worker, it should fork, create a new sibling directory (e.g. "/sys/fs/cgroup/firefox.service/<child PID>"), and write the child PID into "/sys/fs/cgroup/firefox.service/<child PID>/cgroup.procs" to move the child into its own cgroup. Each child should have a unique cgroup path. Whenever a child exits, the primary should clean up the child's cgroup.
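
The per-worker step could be sketched as follows, again in hypothetical Python rather than Firefox's actual C++ process launcher. `unit_dir` is assumed to be the delegated unit cgroup (e.g. /sys/fs/cgroup/firefox.service). The pipe handshake, which keeps the child blocked until it has been moved so that anything it spawns starts in its own cgroup, is an addition of this sketch and not part of the comment.

```python
import os
import signal


def spawn_worker(unit_dir: str, argv: list[str]) -> int:
    """Fork a worker, move it into <unit_dir>/<pid>, then let it exec argv."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: block until the parent has moved us into our own cgroup.
        try:
            os.close(write_fd)
            os.read(read_fd, 1)  # returns b"" (EOF) once the parent closes write_fd
            os.close(read_fd)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if something above failed
    # Parent (the primary).
    os.close(read_fd)
    try:
        child_dir = os.path.join(unit_dir, str(pid))  # unique per child
        os.mkdir(child_dir)  # sibling of the primary's "main" cgroup
        with open(os.path.join(child_dir, "cgroup.procs"), "w") as f:
            f.write(str(pid))
    except OSError:
        # Don't leave a worker running in the wrong cgroup: kill and reap it.
        os.kill(pid, signal.SIGKILL)
        os.waitpid(pid, 0)
        raise
    finally:
        # Closing the write end delivers EOF, which unblocks the child.
        os.close(write_fd)
    return pid


def reap_worker(unit_dir: str, pid: int) -> int:
    """Wait for a worker to exit, then remove its (now empty) cgroup."""
    _, status = os.waitpid(pid, 0)
    # On cgroupfs, rmdir succeeds only once the cgroup has no processes left.
    os.rmdir(os.path.join(unit_dir, str(pid)))
    return status
```

Without the handshake, a worker that forks immediately after exec could leave its own children behind in the primary's cgroup, which would defeat the per-worker accounting that systemd-oomd relies on.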

GNOME, or whatever creates the unit for Firefox, should pass the unit property Delegate=yes to fully follow the contract set out by systemd. More generally, the cgroup2 contract says that processes should only live on leaf nodes, which is why the primary PID and all of its children are placed in leaf nodes under the firefox.service unit.
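
To check that a delegated subtree actually follows the leaf-only layout described above, one could walk it and flag any cgroup that has both child cgroups and member processes. This is a hypothetical Python helper, not a systemd tool. Note that the kernel's own "no internal processes" rule is narrower (it only prevents enabling domain controllers for the children of a cgroup that has processes), so this check enforces the stricter convention stated in the comment.

```python
import os


def internal_cgroups_with_processes(subtree: str) -> list[str]:
    """Return cgroups under `subtree` that have child cgroups AND processes.

    With the leaf-only layout (primary in <unit>/main, each worker in
    <unit>/<pid>), the result should be an empty list.
    """
    offenders = []
    for dirpath, dirnames, _filenames in os.walk(subtree):
        if not dirnames:
            continue  # leaf cgroup: processes are allowed here
        procs_path = os.path.join(dirpath, "cgroup.procs")
        if not os.path.exists(procs_path):
            continue
        with open(procs_path) as f:
            # cgroup.procs lists one PID per line; empty means no processes.
            if f.read().strip():
                offenders.append(dirpath)
    return sorted(offenders)
```

Pointed at the unit's cgroup directory (for example the scope shown in the `systemctl --user status` output above), a non-empty result would indicate that the primary has not yet moved itself out of the unit's top-level cgroup.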

It would be easier if Firefox could make D-Bus calls to systemd to spawn new cgroups for it. However, the sandboxing code in Firefox uses some namespaces that are not supported by systemd services. I think manually managing the cgroups as described above would get us to the goal without having to change any namespacing features.

Comment 3 Benjamin Berg 2021-03-17 16:11:04 UTC
I believe that in the long run we want a nice interface through an XDG portal that allows doing all this (and possibly more). But it is not yet clear what this might look like, and having browsers implement something now only to have to move to such a new XDG portal later is also weird.

So, in the short term, a good option could be to solve this outside of the browsers entirely. As an experiment I implemented
  https://gitlab.freedesktop.org/benzea/cgroupify
which solves the issue by starting a tiny service using systemd for each browser instance.

Note that this can happen entirely without the browser knowing about it. This means we could, e.g., ship this as part of the uresourced package and avoid needing hacks inside the browser packages. Obviously, we need to be careful to remove such a hack again once the browsers start managing their cgroups themselves.

Comment 4 Fedora Update System 2021-03-30 20:23:03 UTC
FEDORA-2021-af75ff35e7 has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2021-af75ff35e7

Comment 5 Fedora Update System 2021-03-31 01:20:27 UTC
FEDORA-2021-af75ff35e7 has been pushed to the Fedora 34 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-af75ff35e7`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-af75ff35e7

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 6 Fedora Update System 2021-04-01 00:52:21 UTC
FEDORA-2021-af75ff35e7 has been pushed to the Fedora 34 stable repository.
If the problem still persists, please make note of it in this bug report.

