Bug 2181467
| Summary: | fapolicyd can leak FDs and never answer request, causing target process to hang forever | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Renaud Métrich <rmetrich> | ||||||
| Component: | fapolicyd | Assignee: | Radovan Sroka <rsroka> | ||||||
| Status: | CLOSED MIGRATED | QA Contact: | BaseOS QE Security Team <qe-baseos-security> | ||||||
| Severity: | high | Docs Contact: | |||||||
| Priority: | high | ||||||||
| Version: | 8.7 | CC: | qguo, tnagata | ||||||
| Target Milestone: | rc | Keywords: | MigratedToJIRA, Triaged | ||||||
| Target Release: | --- | Flags: | pm-rhel: mirror+ | ||||||
| Hardware: | All | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | |||||||||
| : | 2182065 (view as bug list) | Environment: | |||||||
| Last Closed: | 2023-06-19 17:36:17 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Bug Depends On: | |||||||||
| Bug Blocks: | 2182065 | ||||||||
| Attachments: | ||||||||
Created attachment 1953338 [details]
Script simulating a "queue full" when executing sha256sum

Note: because SIGKILL has no effect, the system takes a long time to reboot, since systemd itself tries to SIGKILL the remaining processes.

Created attachment 1953872 [details]
Script simulating a "queue full" when executing sha256sum VERSION 2

This bug is going to be migrated.
Contact point for migration questions or issues: rsroka
Guidance for Bugzilla users to test their Jira account or create one if needed:
https://redhat.service-now.com/help?id=kb_article_view&sysparm_article=KB0016394
https://redhat.service-now.com/help?id=kb_article_view&sysparm_article=KB0016694
https://redhat.service-now.com/help?id=kb_article_view&sysparm_article=KB0016774

Description of problem:

We have a customer whose processes hang because they wait for a reply from fapolicyd that never comes.
On the fapolicyd side we could see a kind of "FD leak", yet the daemon itself was mostly idle, waiting for events.

Digging into the code, we found a bug in the path taken when an event cannot be queued, which leads to both the FD leak and the hang of the target process:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
323 static void enqueue_event(const struct fanotify_event_metadata *metadata)
324 {
325         if (q_append(q, metadata))
326                 msg(LOG_DEBUG, "enqueue error");
327         else
328                 set_ready();
329 }

341 void handle_events(void)
342 {
  :
374                 if (metadata->fd >= 0) {
375                         if (metadata->mask & mask) {
376                                 if (metadata->pid == our_pid)
377                                         approve_event(metadata);
378                                 else
379                                         enqueue_event(metadata);
380                         }
381                         // For now, prevent leaking descriptors
382                         // in the near future we should do processing
383                         // to update the cache.
384                         else {
385                                 close(metadata->fd);
386                                 goto out;
387                         }
388                 }
  :

106 /* add DATA to Q */
107 int q_append(struct queue *q, const struct fanotify_event_metadata *data)
108 {
  :
112         if (q->queue_length == q->num_entries) {
113                 errno = ENOSPC;
114                 return -1;
115         }
  :
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

In the code above, when handle_events() enqueues an event (line 379), the enqueue can fail because the queue is full or for another reason (line 113 of q_append()).
In that case enqueue_event() only logs a message at DEBUG level (line 326) and returns, so the FD is never closed, which creates an FD leak.
In addition, the target process hangs forever: because the event was never enqueued, it never gets a reply.

I can reproduce this through simulation using a systemtap script.

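The FD leak described above can be watched from outside the daemon by counting the entries under /proc/<pid>/fd at intervals. The following is only a minimal sketch of that idea; the helper name `count_open_fds` and the convention that a PID of 0 means "the calling process" are assumptions made for this illustration and are not part of fapolicyd.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: count the descriptors listed in /proc/<pid>/fd.
 * pid == 0 inspects the calling process through /proc/self/fd.
 * Returns -1 if the directory cannot be opened (no such PID, no permission).
 * When a process inspects itself, the count includes the descriptor that
 * opendir() holds while the directory is being read.
 */
static int count_open_fds(pid_t pid)
{
    char path[64];
    DIR *dir;
    struct dirent *entry;
    int count = 0;

    assert(pid >= 0);
    if (pid == 0)
        snprintf(path, sizeof(path), "/proc/self/fd");
    else
        snprintf(path, sizeof(path), "/proc/%ld/fd", (long)pid);

    dir = opendir(path);
    if (dir == NULL)
        return -1;

    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] != '.')   /* skip "." and ".." */
            count++;
    }
    closedir(dir);
    return count;
}
```

Calling this periodically for the fapolicyd PID (as root) and seeing the number grow while the daemon is idle would be consistent with the leak reported here.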
Version-Release number of selected component (if applicable):

fapolicyd-1.1.3-8.el8_7.1.x86_64

How reproducible:

Always, using the systemtap script below

Steps to Reproduce:

1. Start the "hack_sha256sum.stp" stap script from the attachment.
   The script simulates a "queue full" condition when an event arrives for a "sha256sum" process.

   # stap -v -g ./hack_sha256sum.stp

2. From a shell, execute the "sha256sum" program

   # sha256sum

3. The stap script shows the fake "queue full" event

   q_append(): Event for PID 2437 (sha256sum), FD 11: queue_len=0, num_entries=640
   q_append(): returning -1
   q_append(): restoring queue_length from 640 to 0

Actual results:

- sha256sum hangs forever and cannot be killed (even with SIGKILL):

  # cat /proc/2437/stack
  [<0>] fanotify_handle_event+0x306/0x360
  [<0>] fsnotify+0x253/0x580
  [<0>] do_dentry_open+0xce/0x340
  [<0>] path_openat+0x53e/0x14f0
  [<0>] do_filp_open+0x93/0x100
  [<0>] do_sys_open+0x184/0x220
  [<0>] do_syscall_64+0x5b/0x1a0
  [<0>] entry_SYSCALL_64_after_hwframe+0x65/0xca

  # kill -9 2437
  --> no effect

- fapolicyd leaks FD 11

  # ll /proc/1792/fd
  total 0
  lr-x------. 1 root root 64 Mar 24 08:51 0 -> /dev/null
  lrwx------. 1 root root 64 Mar 24 08:51 1 -> 'socket:[29169]'
  lrwx------. 1 root root 64 Mar 24 08:51 10 -> 'anon_inode:[fanotify]'
  lr-x------. 1 root root 64 Mar 24 08:51 11 -> /etc/ld.so.cache      <<<<<<<<<< FD of "target process 2437"
  lrwx------. 1 root root 64 Mar 24 08:51 2 -> 'socket:[29169]'
  lrwx------. 1 root root 64 Mar 24 08:51 3 -> /run/fapolicyd/fapolicyd.fifo
  lr-x------. 1 root root 64 Mar 24 08:51 4 -> /var/lib/sss/mc/passwd
  lrwx------. 1 root root 64 Mar 24 08:51 5 -> 'socket:[29173]'
  lr-x------. 1 root root 64 Mar 24 08:51 6 -> /var/lib/sss/mc/group
  lrwx------. 1 root root 64 Mar 24 08:51 7 -> /var/lib/fapolicyd/lock.mdb
  lrwx------. 1 root root 64 Mar 24 08:51 8 -> /var/lib/fapolicyd/data.mdb
  lr-x------. 1 root root 64 Mar 24 08:51 9 -> /proc/1792/mounts

Expected results:

- sha256sum continues execution (or gets denied; which one needs to be evaluated, but in Permissive mode it should certainly continue)
- fapolicyd doesn't leak FD 11; instead it "complains hard" and authorizes or denies execution of the target process (in Permissive mode it should certainly authorize)
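
The expected behaviour can be sketched in C as follows. This is not the upstream fix, only an illustration of the idea under stated assumptions: every `demo_*` name is hypothetical, the queue is reduced to the "full" check quoted in the description, and denying outside Permissive mode is an assumption, since the report leaves that choice open. The reply uses the standard `struct fanotify_response` written back to the fanotify descriptor, after which the event's FD is closed.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/fanotify.h>

/* Hypothetical stand-in for the daemon's queue: only the "full" check matters. */
struct demo_queue {
    unsigned int queue_length;
    unsigned int num_entries;
};

/* Mirrors the quoted q_append() check: fail with ENOSPC when the queue is full. */
static int demo_q_append(struct demo_queue *q,
                         const struct fanotify_event_metadata *data)
{
    assert(q != NULL && data != NULL);
    if (q->queue_length == q->num_entries) {
        errno = ENOSPC;
        return -1;
    }
    q->queue_length++;   /* a real queue would also store a copy of *data */
    return 0;
}

/* Answer a permission event on fan_fd, then release the event's descriptor. */
static int demo_reply_and_close(int fan_fd, int event_fd, unsigned int verdict)
{
    struct fanotify_response resp;
    ssize_t n;

    memset(&resp, 0, sizeof(resp));
    resp.fd = event_fd;
    resp.response = verdict;
    n = write(fan_fd, &resp, sizeof(resp));
    close(event_fd);     /* closed even if the reply failed, so nothing leaks */
    return n == (ssize_t)sizeof(resp) ? 0 : -1;
}

/*
 * Enqueue an event; if that fails, complain loudly and answer the event
 * immediately instead of dropping it. Returns 0 when queued, -1 when the
 * fallback path was taken.
 */
static int demo_enqueue_event(struct demo_queue *q, int fan_fd,
                              const struct fanotify_event_metadata *metadata,
                              int permissive)
{
    unsigned int verdict;

    if (demo_q_append(q, metadata) == 0)
        return 0;

    fprintf(stderr, "enqueue error (%s): answering pid %d directly\n",
            strerror(errno), (int)metadata->pid);
    /* Assumption: allow in Permissive mode, deny otherwise. */
    verdict = permissive ? FAN_ALLOW : FAN_DENY;
    demo_reply_and_close(fan_fd, metadata->fd, verdict);
    return -1;
}
```

With a fallback of this kind, the target process would receive a verdict even when the queue is full, and the descriptor shown as FD 11 above would be closed rather than left open in the daemon.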