Hello Ceph security,

during a routine review of systemd services in the openSUSE Linux distribution I noticed a crossing of security boundaries in the ceph-crash.service, which is part of the Ceph project. I believe I have found a notable vulnerability in this setup.

The Vulnerability
=================

The ceph-crash.service [1] runs the ceph-crash Python script [2] as root. The script operates in the directory /var/lib/ceph/crash, which is controlled by the unprivileged ceph user (ceph:ceph mode 0750). This constellation is subject to file system race conditions that can allow the ceph user to:

1) post arbitrary data as a "crash dump", even content from private files owned by root. The consequences of this are not fully clear to me; it could be an information leak if the security domain of "root" on the system is different from the security domain of wherever the ceph-crash data will be sent to / be accessible afterwards. The `ceph crash post` command expects JSON input, however, which reduces the degree of freedom for this.

2) cause a denial of service by feeding large amounts of data into the `ceph crash post` process. This can cause high memory and CPU consumption. By placing a symlink or FIFO into the directory instead of an actual file, the script can even be made to read from a device file like /dev/random or to block forever.

3) cause a local ceph to root user privilege escalation by tricking ceph-crash into moving a ceph controlled file into a privileged file system location.

Item 3) is the most critical of these possibilities. The ceph-crash script basically does the following at a regular interval (by default every 10 minutes):

a) it iterates over all sub-directories of /var/lib/ceph/crash and for each sub-directory it does the following:

b) it checks whether <crash>/meta is a regular file; if not then it skips the dir.
c) it checks whether <crash>/done is a regular file; if not then it sleeps for a second and checks again; if still not then it skips the dir.

d) it feeds the content of <crash>/meta to the command line `timeout 30 ceph -n <auth> crash post -i -` via stdin.

e) only if the crash post succeeded (exit code 0) will the script attempt to perform os.rename("/var/lib/ceph/crash/<crash>", "/var/lib/ceph/crash/posted/<crash>").

Due to the sleep of one second in step c) there is a nice synchronization point for winning the race condition. A possible approach for a compromised ceph user account to exploit this is the following:

- create a fake crash directory named 'mount', containing an empty 'meta' file:

  ceph$ mkdir /var/lib/ceph/crash/mount
  ceph$ touch /var/lib/ceph/crash/mount/meta

- wait for c) to happen, i.e. ceph-crash sleeps for a second waiting for the "done" file to appear. This can be done in an event-triggered fashion by using the inotify API to detect the service opening the crash directory. While ceph-crash is sleeping, create the "done" file and replace "meta" by a FIFO:

  ceph$ touch /var/lib/ceph/crash/mount/done
  ceph$ rm /var/lib/ceph/crash/mount/meta
  ceph$ mkfifo /var/lib/ceph/crash/mount/meta

On success the ceph-crash script, upon returning from the one second sleep, will block on the FIFO until the attacker writes data into it, giving the attacker enough time to stage the rest of the attack (30 seconds, because of the `timeout` frontend command used in step d). Another approach could be to place a rather large "meta" file there so that step d) takes relatively long. The file content must be accepted by `ceph crash post`, though, because its exit code must be zero for step e) to happen.
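For reference, the loop of steps a) to e) condenses into roughly the following Python sketch. This is paraphrased from the description above, not the verbatim upstream script; in particular the `post_cmd` parameter is an addition of this sketch (and the `-n <auth>` option is omitted) so the logic can be exercised without a ceph cluster:

```python
import os
import subprocess
import time

CRASH_DIR = "/var/lib/ceph/crash"  # ceph:ceph, mode 0750

# Default mirrors step d); replaceable so the sketch runs without a cluster.
POST_CMD = ["timeout", "30", "ceph", "crash", "post", "-i", "-"]

def scan_and_post(crash_dir=CRASH_DIR, post_cmd=POST_CMD):
    """One scan iteration of the ceph-crash logic, steps a) to e)."""
    for name in os.listdir(crash_dir):                     # a)
        path = os.path.join(crash_dir, name)
        if name == "posted" or not os.path.isdir(path):
            continue
        meta = os.path.join(path, "meta")
        done = os.path.join(path, "done")
        if not os.path.isfile(meta):                       # b)
            continue
        if not os.path.isfile(done):                       # c)
            time.sleep(1)      # the attacker's synchronization point
            if not os.path.isfile(done):
                continue
        with open(meta, "rb") as f:                        # d)
            # if "meta" was swapped for a FIFO, this open blocks here
            ret = subprocess.run(post_cmd, stdin=f).returncode
        if ret == 0:                                       # e)
            os.rename(path, os.path.join(crash_dir, "posted", name))
```

Note that every path check and the final rename operate on attacker-controlled directory entries, with no protection against the entries changing type between steps.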
- while ceph-crash is busy forwarding data to `ceph crash post`, the ceph user can replace the "mount" directory by a regular file and prepare a symlink attack:

  ceph$ mv /var/lib/ceph/crash/mount /var/lib/ceph/crash/oldmount
  ceph$ echo 'echo evil code' >/var/lib/ceph/crash/mount
  ceph$ chmod 755 /var/lib/ceph/crash/mount
  ceph$ mv /var/lib/ceph/crash/posted /var/lib/ceph/crash/posted.old
  ceph$ ln -s /usr/bin /var/lib/ceph/crash/posted
  # unblock the ceph-crash script
  ceph$ echo "$FAKE_JSON_DATA" >/var/lib/ceph/crash/oldmount/meta

If this succeeds in time, then during step e) the ceph-crash script will rename the ceph controlled "mount" file to /usr/bin/mount, thereby replacing the system binary "mount" with the ceph controlled script. Any root process invoking it then executes exploit code. Any other binary could be used for this, or also configuration files in /etc that could allow cracking the system.

Because /var/lib/ceph/crash is not world-writable and has no sticky bit, the Linux kernel's symlink protection does not come to the rescue in this constellation. A precondition is, however, that /var/lib/ceph resides on the same file system as the target directory of the `rename()`, because `rename()` does not work across file system boundaries. For many default setups this is the case, though.

Reproducer
==========

Attached to this e-mail is a proof of concept exploit script that demonstrates the vulnerability. Running the script with ceph:ceph credentials quite reliably replaces /usr/bin/mount with a ceph controlled script. Since ceph-crash only executes its routine every 10 minutes, it can take a bit of time to succeed if the race is not won, but success is well within reach in a real world scenario.

Possible Fix
============

To fix the issue, the simplest route I see would be to execute the ceph-crash script as ceph:ceph as well.
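To illustrate why step e) is dangerous when executed by root, the rename through an attacker-planted "posted" symlink can be reproduced in isolation. The following sketch uses throwaway temporary directories as unprivileged stand-ins for /var/lib/ceph/crash and /usr/bin:

```python
import os
import tempfile

# Stand-ins for /var/lib/ceph/crash and /usr/bin (throwaway tmpdirs).
crash = tempfile.mkdtemp(prefix="crash-")
target = tempfile.mkdtemp(prefix="usr-bin-")

# Attacker: "mount" is now a regular file (the fake binary) and
# "posted" is a symlink pointing at the privileged directory.
with open(os.path.join(crash, "mount"), "w") as f:
    f.write("echo evil code\n")
os.symlink(target, os.path.join(crash, "posted"))

# Victim (ceph-crash, step e): rename() resolves the "posted" symlink
# as a path component, so the file lands in the symlink's target.
os.rename(os.path.join(crash, "mount"),
          os.path.join(crash, "posted", "mount"))

print(os.path.isfile(os.path.join(target, "mount")))  # True
```

In the real attack the only differences are that the victim process runs as root and the symlink target is /usr/bin.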
If this is *not* possible for some reason, then a careful selection of system calls and/or temporary privilege drops will be necessary in the ceph-crash script to avoid any symlink attacks and other race conditions on the file system level.

The systemd service, the ceph-crash script and also the directory permissions for /var/lib/ceph/crash are not specific to SUSE packaging but are already found in the upstream sources. Fedora Linux, for example, also ships with the same setup.

Coordinated Disclosure
======================

The SUSE security team offers coordinated disclosure based on the openSUSE security disclosure policy [3]. I suggest an initial embargo period of 14 days, resulting in a publication date no later than 2022-10-04. The maximum embargo period we can offer is 90 days (2022-12-19). Please let us know whether you want to follow the coordinated disclosure process and what your preferred publication date would be. The alternative is that we make the issue public right away.

Please also let us know whether you acknowledge the issue(s). I suggest assigning one CVE for the fact that ceph-crash operates with root privileges in the ceph controlled directory tree. If you want, you can assign/request a CVE yourself; we can also offer to request one directly from MITRE for this purpose.

Best Regards

Matthias

[1]: https://github.com/ceph/ceph/blob/main/systemd/ceph-crash.service.in
[2]: https://github.com/ceph/ceph/blob/main/src/ceph-crash.in
[3]: https://en.opensuse.org/openSUSE:Security_disclosure_policy
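To make the "careful selection of system calls" route from the Possible Fix section more concrete, here is a hedged sketch (my own illustration, not upstream code) of the symlink-safe part of such a fix: the directories are pinned with O_DIRECTORY|O_NOFOLLOW and the move is performed via dir_fd arguments (i.e. renameat() underneath), so a symlinked "posted" is rejected instead of followed:

```python
import os

def safe_move_to_posted(crash_dir, name):
    """Move <crash_dir>/<name> into <crash_dir>/posted/ while refusing
    to follow symlinks in the directory components (sketch only)."""
    # O_NOFOLLOW makes opening a symlinked "posted" fail with ELOOP,
    # O_DIRECTORY rejects anything that is not a real directory.
    flags = os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW
    crash_fd = os.open(crash_dir, flags)
    try:
        posted_fd = os.open("posted", flags, dir_fd=crash_fd)
    except OSError:
        os.close(crash_fd)
        raise RuntimeError("'posted' is not a real directory, refusing")
    try:
        # Both paths are relative to trusted directory fds, so no
        # attacker-controlled symlink component is traversed.
        os.rename(name, name, src_dir_fd=crash_fd, dst_dir_fd=posted_fd)
    finally:
        os.close(posted_fd)
        os.close(crash_fd)
```

This only addresses the rename of step e); reading "meta" would need the same fd-relative treatment (plus an O_NOFOLLOW open to reject FIFOs and symlinks), which is why running the whole script as ceph:ceph remains the simpler fix.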
Created ceph tracking bugs for this issue:

Affects: fedora-35 [bug 2137598]
Affects: fedora-36 [bug 2137599]
*** Embargoed bug 2129447 has been marked as a duplicate of this bug. ***
This issue has been addressed in the following products:

  Red Hat Ceph Storage 5.3

Via RHSA-2023:0980 https://access.redhat.com/errata/RHSA-2023:0980
This bug is now closed. Further updates for individual products will be reflected on the CVE page(s): https://access.redhat.com/security/cve/cve-2022-3650