Bug 1759954

Summary: systemd-cryptsetup dies due to memory corruption
Product: Red Hat Enterprise Linux 8 Reporter: Renaud Métrich <rmetrich>
Component: systemd Assignee: systemd-maint
Status: CLOSED DUPLICATE QA Contact: Frantisek Sumsal <fsumsal>
Severity: high Docs Contact:
Priority: high    
Version: --- CC: scorreia, systemd-maint-list
Target Milestone: rc   
Target Release: 8.0   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-10-09 15:11:05 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Renaud Métrich 2019-10-09 13:29:05 UTC
Description of problem:

With Clevis/Tang, when multiple instances of systemd-cryptsetup run to unlock a system (6 or 8 instances), systemd-cryptsetup can die with a "malloc(): memory corruption" or "realloc(): invalid next size" glibc message.

This is due to zeroing memory beyond the allocated buffer in src/shared/ask-password-api.c:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
 63 static int retrieve_key(key_serial_t serial, char ***ret) {
 :
 70         for (;;) {
 71                 p = new(char, m);
 :
 75                 n = keyctl(KEYCTL_READ, (unsigned long) serial, (unsigned long) p, (unsigned long) m, 0);
 :
 82                 explicit_bzero(p, n);
 83                 free(p);
 84                 m *= 2;
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Line 82 is reached when the buffer "p" of size "m" is too small to contain the key ("n" bytes required, with "n" larger than "m"). There, explicit_bzero() zeroes "n" bytes instead of "m" bytes (the size of the buffer), so it writes past the end of the allocation and corrupts the heap.
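
The overrun can be avoided by wiping only the "m" bytes that were actually allocated. The following is a minimal, self-contained sketch of that idea, not the actual systemd patch. fake_key_read(), wipe() and read_key_growing() are hypothetical names introduced for illustration: fake_key_read() stands in for keyctl(KEYCTL_READ, ...) and is assumed to return the full key length even when the buffer is smaller, and wipe() stands in for explicit_bzero().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for keyctl(KEYCTL_READ, ...): copies at most
 * buflen bytes of the key and returns the key's full length, which
 * may exceed buflen. */
static long fake_key_read(const char *key, size_t keylen, char *buf, size_t buflen) {
        memcpy(buf, key, keylen < buflen ? keylen : buflen);
        return (long) keylen;
}

/* Stand-in for explicit_bzero(): the volatile pointer keeps the
 * compiler from optimizing the stores away. */
static void wipe(void *p, size_t n) {
        volatile unsigned char *v = p;

        while (n-- > 0)
                *v++ = 0;
}

/* Reads the key into a freshly allocated buffer, doubling the buffer
 * size until the key fits. Returns the key length, or -1 if the
 * allocation fails. The caller owns *ret. */
static long read_key_growing(const char *key, size_t keylen, char **ret) {
        size_t m = 4; /* deliberately small so the retry path runs */

        assert(ret);

        for (;;) {
                char *p = malloc(m);
                long n;

                if (!p)
                        return -1;

                n = fake_key_read(key, keylen, p, m);
                if ((size_t) n <= m) {
                        *ret = p;
                        return n;
                }

                /* Buffer too small: wipe only the m bytes we own,
                 * never the n bytes the key would have needed. */
                wipe(p, m);
                free(p);
                m *= 2;
        }
}
```

Calling read_key_growing() with a 20-byte key retries with buffers of 4, 8 and 16 bytes before succeeding with 32 bytes; each discarded buffer is wiped for exactly its own size.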


Version-Release number of selected component (if applicable):

systemd-239-13.el8_0.5 + systemd from RHEL 8.1 beta

How reproducible:

Always, with many LUKS devices


Steps to Reproduce:
1. Install a VM with many LUKS devices

  To do so, don't click on "Click here to create them automatically" in the storage panel, but directly on the "+" button.
  This will create 1 LVM VG on /dev/vda2 + as many LUKS devices as there are logical volumes.
  Split the system in many parts e.g. / /usr /tmp /home /var /var/log /var/log/audit swap
  Use "redhat" password for encryption.

2. Install Clevis+Tang and rebuild initramfs

  # yum -y install clevis-dracut
  # for i in $(seq 0 7); do clevis luks bind -f -k- -d /dev/rhel/0$i tang '{"url":"http://vm-tang7","thp":"txJOw9zhjTgEwcNnlRN-NGsv3hU"}' <<< "redhat"; done
  # dracut -f

3. Add all LUKS1 devices to the kernel command line

  # awk '{ print "rd.luks.uuid="$1 }' /etc/crypttab
  rd.luks.uuid=luks-569a3783-4ec6-4bd0-a122-d6ca32ff8c23
  rd.luks.uuid=luks-ba80a2b3-dd86-4321-8a51-bf65a94c7c9c
  ...

  Edit /etc/default/grub to
  - add all devices to GRUB_CMDLINE_LINUX=
  - replace all "rd.lvm.lv" occurrences by a single "rd.lvm.vg=rhel" (rhel == VG)

  # grub2-mkconfig -o /etc/grub2.cfg
  
4. Reboot the system

Actual results:

dracut enters emergency mode. journalctl shows systemd-cryptsetup dying:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
\x2d9e85\x2dd2af2e550912.service: Main process exited, code=dumped, status=6/ABRT
Oct 09 09:29:01 vm-clevis8-luksv1 systemd[1]: systemd-cryptsetup@luks\x2d4243c2be\x2dad56\x2d43ed\x2d9e85\x2dd2af2e550912.service: Failed with result 'core-dump'.
Oct 09 09:29:02 vm-clevis8-luksv1 systemd-cryptsetup[873]: malloc(): memory corruption
...
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Entering the passphrase manually fails. The system needs to be rebooted and the Tang server disconnected.

Expected results:

Automatic unlocking

Comment 3 Renaud Métrich 2019-10-09 15:11:05 UTC

*** This bug has been marked as a duplicate of bug 1752050 ***