This bug has been migrated to another issue tracking site. It has been closed here and may no longer be being monitored.

If you would like to get updates for this issue, or to participate in it, you may do so at Red Hat Issue Tracker.
Bug 2181467 - fapolicyd can leak FDs and never answer request, causing target process to hang forever
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: fapolicyd
Version: 8.7
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Radovan Sroka
QA Contact: BaseOS QE Security Team
URL:
Whiteboard:
Depends On:
Blocks: 2182065
Reported: 2023-03-24 08:16 UTC by Renaud Métrich
Modified: 2023-06-19 17:36 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2182065
Environment:
Last Closed: 2023-06-19 17:36:17 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
Script simulating a "queue full" when executing sha256sum (772 bytes, text/plain)
2023-03-24 08:24 UTC, Renaud Métrich
Script simulating a "queue full" when executing sha256sum VERSION 2 (858 bytes, text/plain)
2023-03-27 06:37 UTC, Renaud Métrich


Links
System                             ID               Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker              RHEL-632         0        None      None    None     2023-06-19 17:36:16 UTC
Red Hat Issue Tracker              RHELPLAN-152992  0        None      None    None     2023-03-24 08:17:59 UTC
Red Hat Issue Tracker              RHINRULE-185     0        None      None    None     2023-03-28 01:44:45 UTC
Red Hat Issue Tracker              SECENGSP-5121    0        None      None    None     2023-03-24 08:18:09 UTC
Red Hat Knowledge Base (Solution)  6998702          0        None      None    None     2023-03-27 03:36:35 UTC

Description Renaud Métrich 2023-03-24 08:16:50 UTC
Description of problem:

A customer is seeing processes hang because they wait for a reply from fapolicyd that never comes.
On the fapolicyd side we could see what looked like an "FD leak", yet the daemon itself was mostly idle, waiting for events.

Digging into the code, we found a bug in the path taken when an event cannot be queued. It causes both the FD leak and the hang of the target process:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
323 static void enqueue_event(const struct fanotify_event_metadata *metadata)
324 {
325         if (q_append(q, metadata))
326                 msg(LOG_DEBUG, "enqueue error");
327         else
328                 set_ready();
329 }

341 void handle_events(void)
342 {
 :
374                 if (metadata->fd >= 0) {
375                         if (metadata->mask & mask) {
376                                 if (metadata->pid == our_pid)
377                                         approve_event(metadata);
378                                 else
379                                         enqueue_event(metadata);
380                         }
381                         // For now, prevent leaking descriptors
382                         // in the near future we should do processing
383                         // to update the cache.
384                         else {
385                                 close(metadata->fd);
386                                 goto out;
387                         }
388                 }
 :

106 /* add DATA to Q */
107 int q_append(struct queue *q, const struct fanotify_event_metadata *data)
108 {
 :
112         if (q->queue_length == q->num_entries) {
113                 errno = ENOSPC;
114                 return -1;
115         }
 :
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

In the code above, when handle_events() enqueues an event (line 379), the enqueue can fail, for example because the queue is full (line 113 of q_append()) or for another reason.
In that case enqueue_event() only logs a DEBUG-level message (line 326) and returns, so the FD is never closed, which leaks it.
In addition, the target process hangs forever: because the event was never enqueued, it never gets a reply.
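
One possible shape for a fix, given only as a sketch: when queuing fails, answer the permission event right away and close the descriptor. The helper name reply_and_close() and its parameters are hypothetical and do not come from fapolicyd; the only real API used is the fanotify reply mechanism, that is, writing a struct fanotify_response to the fanotify descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/fanotify.h>

/*
 * Hypothetical helper (not part of fapolicyd): answer the permission
 * event that owns event_fd, then close event_fd so it cannot leak.
 * reply_fd is assumed to be the descriptor returned by fanotify_init().
 * decision is FAN_ALLOW or FAN_DENY.
 * Returns 0 if the full reply was written, -1 otherwise.
 */
static int reply_and_close(int reply_fd, int event_fd, uint32_t decision)
{
	struct fanotify_response resp = {
		.fd = event_fd,
		.response = decision,
	};
	ssize_t n;

	assert(event_fd >= 0);

	/* Retry the write if a signal interrupts it. */
	do {
		n = write(reply_fd, &resp, sizeof(resp));
	} while (n < 0 && errno == EINTR);

	/* Close the event descriptor whether or not the reply succeeded. */
	close(event_fd);

	return n == (ssize_t)sizeof(resp) ? 0 : -1;
}
```

With a helper of this kind, the failing branch of enqueue_event() (line 326) could reply and close instead of only logging. Which decision to send in that case is a policy question that this sketch leaves open.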

I can reproduce this through simulation using a systemtap script.

Version-Release number of selected component (if applicable):

fapolicyd-1.1.3-8.el8_7.1.x86_64

How reproducible:

Always, using the attached systemtap script

Steps to Reproduce:
1. Start the attached "hack_sha256sum.stp" stap script

  The script simulates a "queue full" condition when fapolicyd gets an event for the "sha256sum" process
  # stap -v -g ./hack_sha256sum.stp

2. From a shell, execute the "sha256sum" program

  # sha256sum

3. The stap script shows the fake "queue full" event

  q_append(): Event for PID 2437 (sha256sum), FD 11: queue_len=0, num_entries=640
  q_append(): returning -1
  q_append(): restoring queue_length from 640 to 0
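
The stap output above suggests that the queue length is forced to the maximum for one call and then restored. A user-space analogue of that trick, as a sketch only: struct toy_queue, toy_q_append() and append_with_fake_full() are hypothetical stand-ins, not fapolicyd code and not the attached script.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for the fapolicyd queue: only the two counters. */
struct toy_queue {
	size_t queue_length;	/* entries currently queued */
	size_t num_entries;	/* capacity */
};

/* Same full-queue check as q_append() lines 112-115 in the excerpt. */
static int toy_q_append(struct toy_queue *q)
{
	if (q->queue_length == q->num_entries) {
		errno = ENOSPC;
		return -1;
	}
	q->queue_length++;
	return 0;
}

/*
 * Pretend the queue is full for a single append, then restore the
 * real length, mirroring "restoring queue_length from 640 to 0".
 */
static int append_with_fake_full(struct toy_queue *q)
{
	size_t saved = q->queue_length;
	int rc;

	q->queue_length = q->num_entries;
	rc = toy_q_append(q);
	q->queue_length = saved;

	assert(rc == -1);
	return rc;
}
```

After the faked failure the queue is untouched, so later appends succeed. This matches the report: the daemon stays mostly idle, and only the one event that failed to queue is left without a reply.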

Actual results:

- sha256sum hangs forever and cannot be killed (even with SIGKILL signal):

  # cat /proc/2437/stack 
  [<0>] fanotify_handle_event+0x306/0x360
  [<0>] fsnotify+0x253/0x580
  [<0>] do_dentry_open+0xce/0x340
  [<0>] path_openat+0x53e/0x14f0
  [<0>] do_filp_open+0x93/0x100
  [<0>] do_sys_open+0x184/0x220
  [<0>] do_syscall_64+0x5b/0x1a0
  [<0>] entry_SYSCALL_64_after_hwframe+0x65/0xca
  
  # kill -9 2437
  --> no effect

- fapolicyd leaks FD 11

  # ll /proc/1792/fd
  total 0
  lr-x------. 1 root root 64 Mar 24 08:51 0 -> /dev/null
  lrwx------. 1 root root 64 Mar 24 08:51 1 -> 'socket:[29169]'
  lrwx------. 1 root root 64 Mar 24 08:51 10 -> 'anon_inode:[fanotify]'
  lr-x------. 1 root root 64 Mar 24 08:51 11 -> /etc/ld.so.cache          <<<<<<<<<< FD of "target process 2437"
  lrwx------. 1 root root 64 Mar 24 08:51 2 -> 'socket:[29169]'
  lrwx------. 1 root root 64 Mar 24 08:51 3 -> /run/fapolicyd/fapolicyd.fifo
  lr-x------. 1 root root 64 Mar 24 08:51 4 -> /var/lib/sss/mc/passwd
  lrwx------. 1 root root 64 Mar 24 08:51 5 -> 'socket:[29173]'
  lr-x------. 1 root root 64 Mar 24 08:51 6 -> /var/lib/sss/mc/group
  lrwx------. 1 root root 64 Mar 24 08:51 7 -> /var/lib/fapolicyd/lock.mdb
  lrwx------. 1 root root 64 Mar 24 08:51 8 -> /var/lib/fapolicyd/data.mdb
  lr-x------. 1 root root 64 Mar 24 08:51 9 -> /proc/1792/mounts

Expected results:

- sha256sum continues execution (or gets denied; which of the two needs to be evaluated, but in Permissive mode it should certainly continue)
- fapolicyd doesn't leak FD 11; fapolicyd "complains hard" and authorizes or denies the target process execution (in Permissive mode it should certainly authorize)
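
The expected behaviour can be written down as a small decision function. This is a sketch under assumptions: fallback_decision() is a hypothetical name, and the answer to give in enforcing mode is left as a parameter because the report does not settle it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/fanotify.h>

/*
 * Hypothetical fallback for an event that could not be queued.
 * In permissive mode the access is always allowed; otherwise the
 * caller-supplied default (FAN_ALLOW or FAN_DENY) is used, since the
 * right choice for enforcing mode still needs to be evaluated.
 */
static uint32_t fallback_decision(bool permissive, uint32_t enforcing_default)
{
	assert(enforcing_default == FAN_ALLOW ||
	       enforcing_default == FAN_DENY);

	return permissive ? FAN_ALLOW : enforcing_default;
}
```

Either way the target process gets an answer, which is the point of the expected results: no request is left pending.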

Comment 1 Renaud Métrich 2023-03-24 08:24:18 UTC
Created attachment 1953338
Script simulating a "queue full" when executing sha256sum

Comment 2 Renaud Métrich 2023-03-24 08:35:30 UTC
Note: because SIGKILL has no effect, the system takes a long time to reboot, since systemd itself tries to SIGKILL the remaining processes.

Comment 8 Renaud Métrich 2023-03-27 06:37:58 UTC
Created attachment 1953872
Script simulating a "queue full" when executing sha256sum VERSION 2

Comment 12 Radovan Sroka 2023-06-19 17:32:24 UTC
This bug is going to be migrated.

Contact point for migration questions or issues: rsroka
Guidance for Bugzilla users to test their Jira account or create one if needed:

https://redhat.service-now.com/help?id=kb_article_view&sysparm_article=KB0016394
https://redhat.service-now.com/help?id=kb_article_view&sysparm_article=KB0016694
https://redhat.service-now.com/help?id=kb_article_view&sysparm_article=KB0016774

