Bug 2258599

Summary: Leak in uresourced inotify watches
Product: Fedora
Reporter: Raman Gupta <rocketraman>
Component: uresourced
Assignee: Benjamin Berg <benjamin-fedora>
Status: NEW
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: unspecified
Version: 39
CC: benjamin-fedora, danielkza2, hpa, jyx21, tom
Hardware: x86_64   
OS: Linux   

Description Raman Gupta 2024-01-16 13:23:58 UTC
`uresourced` appears to leak inotify watches.

Using `inotify-info`, one can see that the number of watches consumed by `uresourced` keeps increasing, and that it drops back down immediately when the service is restarted.

Before restart:

```
------------------------------------------------------------------------------
INotify Limits:
max_queued_events    16384
max_user_instances   512
max_user_watches     524288
------------------------------------------------------------------------------
Pid  App                        Watches   Instances
4782 uresourced                     43439   1
```

After restart:

```
Pid  App                        Watches   Instances
1365099 uresourced                     160   1
```
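
The same number can be cross-checked without `inotify-info` by reading /proc directly: every watch shows up as an `inotify` line in the process's fdinfo files. A minimal sketch (assuming a single uresourced process for the user):

```
#!/bin/bash
# Count uresourced's inotify watches by summing the "inotify" lines
# across all of its /proc/<pid>/fdinfo/* files.
pid=$(pgrep -u "$(id -u)" uresourced)
cat /proc/${pid}/fdinfo/* 2>/dev/null | grep -c '^inotify'
```

Running this repeatedly (or under `watch`) shows the count climbing every few seconds on an affected machine.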


Reproducible: Always

Steps to Reproduce:
1. Wait.
2. Run `watch 'inotify-info | grep resourced'`; the leak happens fast enough that the watch count visibly rises every few seconds.



Another user confirmed the bug here: https://discussion.fedoraproject.org/t/uresourced-461312-inotify-watches-normal/72952/2.

Comment 1 Tom Hughes 2024-01-16 13:51:38 UTC
I wrote a script to demonstrate the problem. It looks at all the inodes being watched by the current user's uresourced and tries to find them in the cgroup tree; many of them simply don't exist any more:

#!/bin/bash
# List every inode watched by the current user's uresourced and try to find
# it in the cgroup tree; stale watches print an empty name.

uid=$(id -u)
pid=$(pgrep -u ${uid} uresourced)

# fd 5 is the inotify instance here; check /proc/<pid>/fd if it differs.
for watch in $(fgrep inotify /proc/${pid}/fdinfo/5 | awk '{ print $3 }')
do
  # the third field is "ino:<hex>"; strip the prefix and convert to decimal
  inum=$((16#${watch##ino:}))
  name=$(find /sys/fs/cgroup -inum ${inum})

  echo "${inum} - ${name}"
done

exit 0
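
Not part of the original script, but the same loop can be reduced to a stale-vs-live count, under the same assumption that the inotify instance is on fd 5:

```
#!/bin/bash
# Count how many of uresourced's watched inodes no longer exist in the
# cgroup tree (stale) versus how many can still be found.
pid=$(pgrep -u "$(id -u)" uresourced)
total=0
stale=0

for watch in $(fgrep inotify /proc/${pid}/fdinfo/5 | awk '{ print $3 }')
do
  inum=$((16#${watch##ino:}))
  total=$((total + 1))
  if [ -z "$(find /sys/fs/cgroup -inum ${inum})" ]; then
    stale=$((stale + 1))
  fi
done

echo "${stale} of ${total} watched inodes are gone from /sys/fs/cgroup"
```

On an affected system the stale count dominates, matching comment 2 below, where only a few dozen of 12758 watches still map to existing cgroups.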

Comment 2 jyx21 2024-01-16 14:17:48 UTC
Twelve hours after restarting uresourced, it is already back up to 12758 watches, which is consistent with my pattern of restarting uresourced roughly once a week (whenever vscode starts complaining). Among those watches, only the following items can still be found in /sys.

    8353 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice
    8408 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/app-podman\x2dcompose.slice
    8463 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/dbus.socket
    8573 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/app-podman\x2dcompose.slice/podman-compose
    8699 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/app-podman\x2dcompose.slice/podman-compose
   10491 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/run-rfb2b8060ecf24d3b925426e11f9ff9be.scope
   10766 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/podman.service
 8103720 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/app-gnome\x2dsession\x2dmanager.slice
 8103775 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/gnome-session-monitor.service
 8103885 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/app-gnome\x2dsession\x2dmanager.slice/gnome-session-manager
 8104302 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/dbus-:1.223-org.a11y.atspi.Registry
 8104454 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/dbus-:1.2-org.gnome.Shell.CalendarServer
 8104509 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/evolution-source-registry.service
 8104624 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/dbus-:1.2-org.gnome.Shell.Notifications
 8105461 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/dbus-:1.2-org.freedesktop.portal.IBus
 8105516 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/dbus-:1.2-org.freedesktop.problems.applet
 8105741 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/app-gnome-org.gnome.Evolution\x2dalarm\x2dnotify-3399597.scope
 8105796 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/app-gnome-org.gnome.SettingsDaemon.DiskUtilityNotify-3399562.scope
 8105851 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/app-gnome-org.gnome.Software-3399490.scope
 8105906 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/dbus-:1.2-org.gnome.ScreenSaver
 8106003 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/dbus-:1.2-org.gnome.OnlineAccounts
 8106100 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/dbus-:1.2-org.gnome.Identity
 8106155 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/evolution-calendar-factory.service
 8106396 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/evolution-addressbook-factory.service
 8106655 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/dconf.service
 8106854 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/xdg-desktop-portal-gnome.service
 8106909 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/xdg-desktop-portal-gtk.service
 8126880 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/dbus-:1.2-org.gnome.Calendar
 8127265 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/obex.service
 8127435 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/tracker-miner-fs-3.service
 8127885 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/app-cgroupify.slice
21813140 /sys/fs/cgroup/user.slice/user-1000.slice/user/app.slice/app-org.gnome.Terminal.slice

Comment 3 Benjamin Berg 2024-01-16 16:48:28 UTC
lol, I am surprised no one noticed this earlier … can you try this patch (seems to work here)?

diff --git a/src/r-app-monitor.c b/src/r-app-monitor.c
index 71f7244..4a39fb2 100644
--- a/src/r-app-monitor.c
+++ b/src/r-app-monitor.c
@@ -64,12 +64,17 @@ r_app_monitor_finalize (GObject *object)
   g_clear_pointer (&self->wd_to_path_map, g_hash_table_destroy);
   g_clear_pointer (&self->app_info_map, g_hash_table_destroy);
 
+  if (self->inotify_fd >= 0)
+    close (self->inotify_fd);
+
   G_OBJECT_CLASS (r_app_monitor_parent_class)->finalize (object);
 }
 
 gboolean
 inotify_add_cgroup_dir (RAppMonitor *self, gchar *path)
 {
+  gpointer old_path;
+  gpointer old_wd;
   gint wd;
 
   wd = inotify_add_watch (self->inotify_fd, path,
@@ -77,6 +82,14 @@ inotify_add_cgroup_dir (RAppMonitor *self, gchar *path)
   if (wd == -1)
     return FALSE;
 
+  if (g_hash_table_steal_extended(self->path_to_wd_map, path, &old_path, &old_wd))
+    {
+      g_free (old_path);
+      g_hash_table_remove (self->wd_to_path_map, old_wd);
+
+      inotify_rm_watch (self->inotify_fd, GPOINTER_TO_INT (old_wd));
+    }
+
   g_hash_table_replace (self->path_to_wd_map, g_strdup (path),
                         GINT_TO_POINTER (wd));
   g_hash_table_replace (self->wd_to_path_map, GINT_TO_POINTER (wd),
@@ -362,6 +375,8 @@ handle_inotify_event (RAppMonitor *self, struct inotify_event *i)
       g_hash_table_remove (self->path_to_wd_map, app_path);
       g_hash_table_remove (self->wd_to_path_map, wd_temp);
       g_hash_table_remove (self->app_info_map, app_path);
+
+      inotify_rm_watch (self->inotify_fd, GPOINTER_TO_INT (wd_temp));
     }
 }
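
Reading the diff, the patch releases the kernel-side watch with inotify_rm_watch() when a watched cgroup path disappears or is replaced, and closes the inotify fd on finalize. For what it's worth, here is an untested sketch of one way to check the behaviour after rebuilding with the patch: start and stop a short-lived transient scope under the user slice and see whether uresourced's watch count returns to its previous value (counting watches via the same fdinfo trick as above):

```
#!/bin/bash
# Untested sketch: compare uresourced's watch count before and after a
# transient scope comes and goes. With the fix the count should drop back
# once the scope's cgroup is removed.
pid=$(pgrep -u "$(id -u)" uresourced)

count_watches() {
  cat /proc/${pid}/fdinfo/* 2>/dev/null | grep -c '^inotify'
}

echo "watches before: $(count_watches)"
systemd-run --user --scope --quiet sleep 2
sleep 3   # give uresourced a moment to process the removal events
echo "watches after:  $(count_watches)"
```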

Comment 4 Raman Gupta 2024-01-16 17:10:46 UTC
(In reply to Benjamin Berg from comment #3)
> lol, I am surprised no one noticed this earlier … can you try this patch
> (seems to work here)?

Installed the patch and I can confirm that the watch counts now make sense: they go both up *and* down over time instead of only growing.

I can also confirm that the script from Tom Hughes (https://bugzilla.redhat.com/show_bug.cgi?id=2258599#c1) only outputs valid watches.

Comment 5 jyx21 2024-01-17 01:50:25 UTC
I'm not an expert, but is this "leak" tied to specific use cases, e.g. podman rootless containers? I'm surprised that searching turns up so little information about it.

Comment 6 Daniel Miranda 2024-02-06 17:37:20 UTC
I am observing this issue, but I am not using podman, rootless or otherwise.

Any chance we can get the patch applied and pushed to Fedora?