Description of problem:

This is a continuation of BZ #1871869 related to setroubleshootd.

When a system has /tmp, /var/tmp and /dev/shm mounted with the "noexec" option, ffi_closure_alloc() called by Python fails.
This shows up in the journal through this message (setroubleshootd being a Python script):

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
setroubleshootd[8102]: could not allocate closure
setroubleshootd[8102]: could not allocate closure
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

The message above comes from /usr/lib64/libgirepository-1.0.so.1.0.0, which basically just calls ffi_closure_alloc():

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
ffi_closure *
g_callable_info_prepare_closure (GICallableInfo       *callable_info,
                                 ffi_cif              *cif,
                                 GIFFIClosureCallback  callback,
                                 gpointer              user_data)
{
:
  closure = ffi_closure_alloc (sizeof (GIClosureWrapper), &exec_ptr);
  if (!closure)
    {
      g_warning ("could not allocate closure\n");    <<<<<<<<< HERE
      return NULL;
    }
:
  return exec_ptr;
}
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Stracing setroubleshootd shows that /proc/self/mounts was processed (then /etc/mtab):

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
5950 11:08:44.098683 openat(AT_FDCWD, "/proc/mounts", O_RDONLY) = 6</proc/5950/mounts> <0.000010>
5950 11:08:44.098707 fstat(6</proc/5950/mounts>, {st_dev=makedev(0, 4), st_ino=36348, st_mode=S_IFREG|0444, st_nlink=1, st_uid=986, st_gid=982, st_blksize=1024, st_blocks=0, st_size=0, st_atime=1599124124 /* 2020-09-03T11:08:44.098161186+0200 */, st_atime_nsec=98161186, st_mtime=1599124124 /* 2020-09-03T11:08:44.098161186+0200 */, st_mtime_nsec=98161186, st_ctime=1599124124 /* 2020-09-03T11:08:44.098161186+0200 */, st_ctime_nsec=98161186}) = 0 <0.000002>
:
5950 11:08:44.098810 close(6</proc/5950/mounts>) = 0 <0.000003>
5950 11:08:44.098825 openat(AT_FDCWD, "/tmp", O_RDWR|O_EXCL|O_CLOEXEC|O_TMPFILE, 0700) = 6</tmp/#36349 (deleted)> <0.000012>
5950 11:08:44.098851 ftruncate(6</tmp/#36349 (deleted)>, 4096) = 0 <0.000003>
5950 11:08:44.098864 mmap(NULL, 4096, PROT_READ|PROT_EXEC, MAP_SHARED, 6</tmp/#36349 (deleted)>, 0) = -1 EPERM (Operation not permitted) <0.000004>
:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

From my understanding, libffi tries the following locations when looking for somewhere to create the file it maps for execution:

libffi-3.1/src/closures.c:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
#ifdef HAVE_MNTENT
/* Open a temporary file in an executable and writable mount point
   listed in the mounts file.  Subsequent calls with the same mounts
   keep searching for mount points in the same file.  Providing NULL
   as the mounts file closes the file.  */
static int
open_temp_exec_file_mnt (const char *mounts)
{
:
}
#endif /* HAVE_MNTENT */

and

/* Instructions to look for a location to hold a temporary file that
   can be mapped in for execution.  */
static struct
{
  int (*func)(const char *);
  const char *arg;
  int repeat;
} open_temp_exec_file_opts[] = {
  { open_temp_exec_file_env, "TMPDIR", 0 },
  { open_temp_exec_file_dir, "/tmp", 0 },
  { open_temp_exec_file_dir, "/var/tmp", 0 },
  { open_temp_exec_file_dir, "/dev/shm", 0 },
  { open_temp_exec_file_env, "HOME", 0 },
#ifdef HAVE_MNTENT
  { open_temp_exec_file_mnt, "/etc/mtab", 1 },
  { open_temp_exec_file_mnt, "/proc/mounts", 1 },
#endif /* HAVE_MNTENT */
};
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

On some customers' systems which were hardened (for STIG maybe, but I'm not sure), /tmp, /var/tmp and /dev/shm are all mounted with the "noexec" flag, so this just cannot work (at least for system services such as setroubleshootd).
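
The failing sequence in the strace output above (anonymous temporary file, ftruncate, then mmap with PROT_READ|PROT_EXEC and MAP_SHARED) can be approximated from Python to check a directory by hand. This is only a rough sketch of that sequence, not libffi's code; can_map_exec is a hypothetical helper name, and the directories probed are the ones named in this report.

```python
import mmap
import os
import tempfile


def can_map_exec(directory, size=4096):
    """Return True if a temp file in `directory` can be mapped PROT_READ|PROT_EXEC."""
    try:
        # Anonymous temporary file, comparable to the O_TMPFILE open in the trace.
        with tempfile.TemporaryFile(dir=directory) as tmp:
            os.ftruncate(tmp.fileno(), size)
            mapping = mmap.mmap(
                tmp.fileno(),
                size,
                flags=mmap.MAP_SHARED,
                prot=mmap.PROT_READ | mmap.PROT_EXEC,
            )
            mapping.close()
            return True
    except OSError:
        # EPERM on a "noexec" mount; also covers missing or unwritable directories.
        return False


if __name__ == "__main__":
    for candidate in ("/tmp", "/var/tmp", "/dev/shm"):
        verdict = "exec mapping OK" if can_map_exec(candidate) else "exec mapping refused"
        print(candidate, verdict)
```

On a system set up as described here, all three directories would be expected to report a refused mapping. A False result alone does not distinguish "noexec" from a missing or unwritable directory.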
Version-Release number of selected component (if applicable):

libffi-3.1-21.el8.x86_64

How reproducible:

Always with setroubleshootd (sorry, reproducer is complicated)

Steps to Reproduce:

1. Set up /tmp, /var/tmp and /dev/shm with "noexec" option

/etc/fstab:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
tmpfs /tmp tmpfs defaults,nodev,nosuid,noexec 0 0
tmpfs /var/tmp tmpfs defaults,nodev,nosuid,noexec 0 0
tmpfs /dev/shm tmpfs nodev,nosuid,noexec 0 0
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

2. Reboot for changes to apply

3. Execute "sealert" to trigger "setroubleshootd"

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# sealert -l \*
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

4. Check the journal

Actual results:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
setroubleshootd[8102]: could not allocate closure
setroubleshootd[8102]: could not allocate closure
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
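
Before running sealert, it may help to confirm that the "noexec" option from step 1 is actually in effect after the reboot. The sketch below parses text in fstab or /proc/mounts format and reports which of the three mount points carry "noexec". The function name noexec_mount_points is made up for illustration, and the sample text is the fstab fragment from step 1.

```python
def noexec_mount_points(mounts_text, wanted=("/tmp", "/var/tmp", "/dev/shm")):
    """Map each wanted mount point found in mounts_text to True if it has noexec."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected layout: device, mount point, fstype, options, dump, pass.
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        mount_point, options = fields[1], fields[3].split(",")
        if mount_point in wanted:
            # A later entry for the same mount point replaces an earlier one.
            result[mount_point] = "noexec" in options
    return result


# fstab fragment taken from step 1 of this report.
SAMPLE_FSTAB = """\
tmpfs /tmp tmpfs defaults,nodev,nosuid,noexec 0 0
tmpfs /var/tmp tmpfs defaults,nodev,nosuid,noexec 0 0
tmpfs /dev/shm tmpfs nodev,nosuid,noexec 0 0
"""

if __name__ == "__main__":
    for mount_point, noexec in sorted(noexec_mount_points(SAMPLE_FSTAB).items()):
        print(mount_point, "noexec" if noexec else "exec allowed")
```

To check a live system, pass the contents of /proc/mounts instead of the sample text.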
Unfortunately, this is a consequence of how JIT support works with SELinux. We have a conceptual draft for an ABI-compatible version of libffi that does not require run-time code generation, but there's nothing yet that can be tested.
As of libffi-3.1-22.el8 we will allow systems to set the LIBFFI_TMPDIR environment variable to specify a libffi-specific directory which *does* have the required permissions on it, for when the "normal" temp directories are hardened. While we continue to research a better way to handle libffi closures, this workaround is our current answer to this particular problem.
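For a process started by hand, the variable can simply be placed in the child's environment. The sketch below shows one way to do that from Python; the helper names and the directory /var/lib/libffi-tmp are hypothetical, and the directory has to exist and allow executable mappings for the workaround to have any effect.

```python
import os
import subprocess
import sys


def env_with_libffi_tmpdir(exec_dir, base_env=None):
    """Return a copy of the environment with LIBFFI_TMPDIR pointing at exec_dir."""
    env = dict(os.environ if base_env is None else base_env)
    env["LIBFFI_TMPDIR"] = exec_dir
    return env


def run_with_libffi_tmpdir(argv, exec_dir):
    """Run argv with LIBFFI_TMPDIR set, capturing its output as text."""
    return subprocess.run(
        argv,
        env=env_with_libffi_tmpdir(exec_dir),
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Hypothetical directory; the child only echoes the variable back.
    child = run_with_libffi_tmpdir(
        [sys.executable, "-c", "import os; print(os.environ['LIBFFI_TMPDIR'])"],
        "/var/lib/libffi-tmp",
    )
    print(child.stdout.strip())
```

This only covers processes launched this way; it does nothing for services started by the init system.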
This LIBFFI_TMPDIR environment variable will not really help in the customer's scenario, where all tmpfs filesystems and /dev/shm have the "noexec" flag. Indeed, services won't be able to use LIBFFI_TMPDIR unless it's somehow hardcoded in their environment. We can do that by modifying the default environment (using the "systemctl set-environment" command) to include LIBFFI_TMPDIR and additionally setting up a special mount for this. Hence I think we need to involve the systemd maintainers as well to come up with a transparent solution. IMHO a real solution for services would be the ability to mount a private tmpfs as /tmp and /var/tmp, which isn't possible yet because there are no systemd properties for that: there is only PrivateTmp=true, which basically unshares /tmp and /var/tmp and hence inherits the "noexec" flag.
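As a rough illustration of the "systemctl set-environment" approach mentioned above, the sketch below builds that command for a given directory and only runs it when explicitly asked to. The function names and the directory /var/lib/libffi-tmp are hypothetical; the special mount still has to be set up separately.

```python
import shlex
import subprocess


def set_environment_command(exec_dir):
    """Build the systemctl invocation that adds LIBFFI_TMPDIR to the manager environment."""
    if not exec_dir.startswith("/"):
        raise ValueError("expected an absolute path, got %r" % exec_dir)
    return ["systemctl", "set-environment", "LIBFFI_TMPDIR=" + exec_dir]


def apply_libffi_tmpdir(exec_dir, dry_run=True):
    """Return the command line; execute it only when dry_run is False (needs privileges)."""
    command = set_environment_command(exec_dir)
    if not dry_run:
        subprocess.run(command, check=True)
    return shlex.join(command)


if __name__ == "__main__":
    # Dry run: prints the command without touching the system.
    print(apply_libffi_tmpdir("/var/lib/libffi-tmp"))
```

The dry-run default keeps the example safe to execute on a machine where changing the manager environment is not wanted.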
We are investigating the possibility of using memfd_create() as a potential workaround for filesystems that lack exec support. This may take some time and some upstream work.
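To give an idea of why memfd_create() is attractive here: the file descriptor it returns is not backed by any of the mounted temporary directories, so the "noexec" mount option on /tmp and friends does not come into play. The sketch below is not the libffi change under investigation, just a probe that checks whether such a descriptor can be mapped PROT_READ|PROT_EXEC on the current system. It assumes Linux and Python 3.8 or later, the function name is hypothetical, and other policy (SELinux, for example) may still refuse the mapping.

```python
import mmap
import os


def memfd_allows_exec_mapping(size=4096):
    """Return True if a memfd of `size` bytes can be mapped PROT_READ|PROT_EXEC."""
    if not hasattr(os, "memfd_create"):
        return False  # os.memfd_create needs Linux and Python 3.8+
    try:
        fd = os.memfd_create("closure-probe")  # the name is only a debugging label
    except OSError:
        return False
    try:
        os.ftruncate(fd, size)
        mapping = mmap.mmap(
            fd,
            size,
            flags=mmap.MAP_SHARED,
            prot=mmap.PROT_READ | mmap.PROT_EXEC,
        )
        mapping.close()
        return True
    except OSError:
        return False
    finally:
        os.close(fd)


if __name__ == "__main__":
    print("memfd exec mapping possible:", memfd_allows_exec_mapping())
```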
When applying DISA STIG profile at installation time, the profile automatically sets "noexec" to /tmp, /var/tmp and /dev/shm, hence this is a real problem here:

Kickstart sample:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# Disk partitioning information
part /boot --fstype="xfs"
part pv.24 --fstype="lvmpv" --noformat
volgroup VolGroup --noformat --useexisting
logvol /tmp --fstype="xfs" --useexisting --name=tmp --vgname=VolGroup
logvol /var/log --fstype="xfs" --useexisting --name=log --vgname=VolGroup
logvol /var --fstype="xfs" --useexisting --name=var --vgname=VolGroup
logvol /home --fstype="xfs" --useexisting --name=home --vgname=VolGroup
logvol /var/log/audit --fstype="xfs" --useexisting --name=audit --vgname=VolGroup
logvol swap --fstype="swap" --size=2048 --useexisting --resize --name=swap --vgname=VolGroup
logvol /var/tmp --fstype="xfs" --useexisting --name=vartmp --vgname=VolGroup
logvol / --fstype="xfs" --size=8192 --useexisting --resize --name=root --vgname=VolGroup

%addon org_fedora_oscap
content-type = scap-security-guide
profile = xccdf_org.ssgproject.content_profile_stig
%end
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Resulting /etc/fstab options:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
/dev/mapper/VolGroup-tmp /tmp xfs defaults,nosuid,noexec,nodev 0 0
/dev/mapper/VolGroup-vartmp /var/tmp xfs defaults,noexec,nodev,nosuid 0 0
tmpfs /dev/shm tmpfs defaults,relatime,nodev,noexec,nosuid 0 0
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
Accidentally overwrote doc text
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (libffi bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:2054