Bug 1927877

Summary: CVE-2021-27645 glibc: Use-after-free in addgetnetgrentX function in netgroupcache.c [rhel-8]
Product: Red Hat Enterprise Linux 8
Reporter: schanzle
Component: glibc
Assignee: Arjun Shankar <ashankar>
Status: CLOSED ERRATA
QA Contact: Sergey Kolosov <skolosov>
Severity: low
Docs Contact:
Priority: low
Version: 8.3
CC: ashankar, codonell, dj, fweimer, mnewsome, pfrankli, sipoyare
Target Milestone: rc
Keywords: Security, SecurityTracking, Triaged
Target Release: 8.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glibc-2.28-155.el8
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-11-09 19:28:09 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1932589
Deadline: 2022-08-23
Attachments:
  Description: proposed fix to double free in nscd
  Flags: none

Description schanzle 2021-02-11 17:34:30 UTC
This bug report is coming from a CentOS 8.3.2011 system.  I hope it will be welcomed.

Description of problem:
nscd dies about every 19 seconds for me on this Dell R740xd server.

Feb 11 06:53:12 darkstar systemd[1]: nscd.service: Main process exited, code=killed, status=6/ABRT
Feb 11 06:53:12 darkstar systemd[1]: nscd.service: Failed with result 'signal'.
Feb 11 06:53:12 darkstar systemd[1]: nscd.service: Service RestartSec=100ms expired, scheduling restart.
Feb 11 06:53:12 darkstar systemd[1]: nscd.service: Scheduled restart job, restart counter is at 1828.
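
To quantify how often the service aborts, journal lines like the ones above can be summarized with a short script. This is a minimal sketch, not part of the original report: the function name `summarize`, the regular expressions, and the assumed year (syslog-style timestamps carry none) are illustrative choices.

```python
import re
from datetime import datetime

# Sample lines copied from the report above.
JOURNAL = """\
Feb 11 06:53:12 darkstar systemd[1]: nscd.service: Main process exited, code=killed, status=6/ABRT
Feb 11 06:53:12 darkstar systemd[1]: nscd.service: Failed with result 'signal'.
Feb 11 06:53:12 darkstar systemd[1]: nscd.service: Service RestartSec=100ms expired, scheduling restart.
Feb 11 06:53:12 darkstar systemd[1]: nscd.service: Scheduled restart job, restart counter is at 1828.
"""

# A SIGABRT exit as reported by systemd (status=6/ABRT).
ABRT_RE = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ systemd\[1\]: \S+: "
    r"Main process exited, code=killed, status=6/ABRT"
)
COUNTER_RE = re.compile(r"restart counter is at (\d+)")


def summarize(text, year=2021):
    """Return (list of abort timestamps, last restart counter seen or None).

    The year is an assumption passed in by the caller because the
    log timestamps do not include one.
    """
    abort_times = []
    last_counter = None
    for line in text.splitlines():
        m = ABRT_RE.match(line)
        if m:
            stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            abort_times.append(stamp)
        m = COUNTER_RE.search(line)
        if m:
            last_counter = int(m.group(1))
    return abort_times, last_counter


if __name__ == "__main__":
    times, counter = summarize(JOURNAL)
    print(f"aborts seen: {len(times)}, last restart counter: {counter}")
```

Fed a longer journal excerpt, the differences between consecutive abort timestamps would give the crash interval directly.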

Version-Release number of selected component (if applicable):
nscd-2.28-127.el8.x86_64

How reproducible:
always

Steps to Reproduce:
1. systemctl start nscd
2. monitor logs, notice it exits.

Additional info:
NIS is used with a netgroup db of about 21,030 bytes (ypcat -k netgroup).  I have numerous other systems, similarly configured, on which nscd does not die.  Perhaps the difference is the number of cores on this box (2x Xeon 8168, HT enabled, 96 total) or the RAM (1.5TB, MemTotal=1583374272 kB).

[root@darkstar ~]# gdb nscd 
GNU gdb (GDB) Red Hat Enterprise Linux 8.2-12.el8
[snip]
Reading symbols from nscd...Reading symbols from /usr/lib/debug/usr/sbin/nscd.debug...done.
done.
(gdb) run -dF
Starting program: /usr/sbin/nscd -dF
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
warning: Loadable section ".note.gnu.property" outside of ELF segments
[20-ish repeats of previous line deleted]
[New Thread 0x7fffdb10c700 (LWP 26641)]
[New Thread 0x7fffdaf0b700 (LWP 26642)]
[New Thread 0x7fffdad0a700 (LWP 26643)]
[New Thread 0x7fffdab09700 (LWP 26644)]
[New Thread 0x7fffda908700 (LWP 26645)]
[New Thread 0x7fffda707700 (LWP 26646)]
[New Thread 0x7fffda506700 (LWP 26647)]
[New Thread 0x7fffda305700 (LWP 26648)]
[New Thread 0x7fffda104700 (LWP 26649)]

Thread 6 "nscd" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffda908700 (LWP 26645)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	  return ret;
Missing separate debuginfos, use: yum debuginfo-install audit-libs-3.0-0.17.20191104git1c2f876.el8.x86_64 keyutils-libs-1.5.10-6.el8.x86_64 krb5-libs-1.18.2-5.el8.x86_64 libblkid-2.32.1-24.el8.x86_64 libcap-2.26-4.el8.x86_64 libcap-ng-0.7.9-5.el8.x86_64 libcom_err-1.45.6-1.el8.x86_64 libgcc-8.3.1-5.1.el8.x86_64 libmount-2.32.1-24.el8.x86_64 libnsl2-1.2.0-2.20180605git4a062cf.el8.x86_64 libselinux-2.9-4.el8_3.x86_64 libtirpc-1.1.4-4.el8.x86_64 libuuid-2.32.1-24.el8.x86_64 nss_nis-3.0-8.el8.x86_64 openssl-libs-1.1.1g-12.el8_3.x86_64 pcre2-10.32-2.el8.x86_64 systemd-libs-239-41.el8_3.1.x86_64 zlib-1.2.11-16.el8_2.x86_64



I have also seen:
free(): double free detected in tcache 2
Thread 6 "nscd" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffda908700 (LWP 22399)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	  return ret;

Comment 2 schanzle 2021-02-12 16:21:00 UTC
Sorry for omitting the backtrace.  It is easily reproduced via:  echo -e "run -dF\nbt\nquit" | gdb /usr/sbin/nscd 


#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff71b8c35 in __GI_abort () at abort.c:79
#2  0x00007ffff7211987 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff731e11d "%s\n")
    at ../sysdeps/posix/libc_fatal.c:181
#3  0x00007ffff7218d8c in malloc_printerr (str=str@entry=0x7ffff731fd40 "free(): double free detected in tcache 2")
    at malloc.c:5374
#4  0x00007ffff721aafd in _int_free (av=0x7fffb8000020, p=0x7fffb8000b90, have_lock=<optimized out>) at malloc.c:4213
#5  0x0000555555571927 in addinnetgrX (db=0x555555779600 <dbs+1408>, fd=-1, key=<optimized out>, uid=4294967295, 
    he=0x7fffdb10ea38, dh=0x7fffdb10e9f0, req=<optimized out>, req=<optimized out>) at netgroupcache.c:605
#6  0x0000555555571d57 in readdinnetgr (db=<optimized out>, he=<optimized out>, dh=<optimized out>)
    at netgroupcache.c:663
#7  0x0000555555567c6f in prune_cache (table=table@entry=0x555555779600 <dbs+1408>, now=<optimized out>, 
    now@entry=1613146515, fd=fd@entry=-1) at cache.c:415
#8  0x000055555555c3f7 in nscd_run_prune (p=<optimized out>) at connections.c:1555
#9  0x00007ffff7bbc14a in start_thread (arg=<optimized out>) at pthread_create.c:479
#10 0x00007ffff7293f23 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb)
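
When triaging a backtrace like this one, the frame of interest is usually the caller of the allocator routine that detected the problem. The sketch below is an illustration only, with hypothetical helper names (`parse_frames`, `caller_of`) and a regular expression written for the frame layout shown above. It parses an excerpt of the gdb output and picks the frame that called `_int_free`.

```python
import re

# Excerpt (frames 3 to 6) copied from the backtrace above.
BACKTRACE = """\
#3  0x00007ffff7218d8c in malloc_printerr (str=str@entry=0x7ffff731fd40 "free(): double free detected in tcache 2")
    at malloc.c:5374
#4  0x00007ffff721aafd in _int_free (av=0x7fffb8000020, p=0x7fffb8000b90, have_lock=<optimized out>) at malloc.c:4213
#5  0x0000555555571927 in addinnetgrX (db=0x555555779600 <dbs+1408>, fd=-1, key=<optimized out>, uid=4294967295,
    he=0x7fffdb10ea38, dh=0x7fffdb10e9f0, req=<optimized out>, req=<optimized out>) at netgroupcache.c:605
#6  0x0000555555571d57 in readdinnetgr (db=<optimized out>, he=<optimized out>, dh=<optimized out>)
    at netgroupcache.c:663
"""

# "#N  [0xADDR in ]function (args) at file:line"
FRAME_RE = re.compile(
    r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\S+) \((.*)\) at (\S+):(\d+)$"
)


def parse_frames(text):
    """Join wrapped gdb frame lines and return a list of frame dicts."""
    joined = []
    for raw in text.splitlines():
        if raw.startswith("#"):
            joined.append(raw.strip())
        elif raw[:1].isspace() and joined:
            # Indented line: continuation of the previous frame.
            joined[-1] += " " + raw.strip()
    frames = []
    for entry in joined:
        m = FRAME_RE.match(entry)
        if m:
            frames.append({
                "index": int(m.group(1)),
                "function": m.group(2),
                "file": m.group(4),
                "line": int(m.group(5)),
            })
    return frames


def caller_of(frames, function):
    """Return the frame directly above `function`, or None if absent."""
    for i, frame in enumerate(frames[:-1]):
        if frame["function"] == function:
            return frames[i + 1]
    return None


if __name__ == "__main__":
    culprit = caller_of(parse_frames(BACKTRACE), "_int_free")
    print(culprit)
```

For this excerpt the selected frame is `addinnetgrX` at netgroupcache.c:605, reached from `readdinnetgr` during cache pruning, which matches the call chain visible in the full backtrace.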

Comment 4 Carlos O'Donell 2021-02-16 20:05:30 UTC
Thanks for submitting this issue. The backtrace is particularly helpful in identifying a possible cause.

I notice that you're using CentOS, which is a distinct product from Fedora, CentOS Stream, or RHEL.

* Can you reproduce the issue on CentOS Stream?

* Can you reproduce the issue on RHEL 8.3?

In order to prioritize this issue we would need you to confirm that you can reproduce it on a supported release, and even better if you can get a support case attached to this bug by working with Red Hat support.

We don't normally handle CentOS defects in this tracker, instead there is a distinct tracker for that here: https://bugs.centos.org/main_page.php

Comment 6 schanzle 2021-02-24 18:34:07 UTC
Created attachment 1759135 [details]
proposed fix to double free in nscd

I am not comfortable moving this server to an alternate OS at this time (I thought CentOS Linux would be supported through 2021).

Siddhesh, thank you for the Sourceware link.  I rebuilt glibc with the proposed, untested patch, adding it as Patch999 to the spec.  It did not apply directly, likely because other patches had already changed the surrounding context, so with prepared sources, I made the two code changes by hand and generated a new diff (attached).

While it's early to tell for sure, nscd hasn't crashed for 1.5 hours, which is an improvement.

Comment 7 Siddhesh Poyarekar 2021-02-25 04:20:23 UTC
(In reply to schanzle from comment #6)
> Created attachment 1759135 [details]
> proposed fix to double free in nscd
> 
> I am not comfortable moving this server to an alternate OS at this time (I
> thought CentOS Linux would be supported through 2021).

I understand, it's just that CentOS Linux bugs are recorded and prioritized separately through a different tracker, i.e. https://bugs.centos.org/main_page.php.  Luckily Carlos was able to identify a possible cause through the backtrace and was able to easily confirm that it's a bug.

> Siddhesh, thank you for the Sourceware link.  I rebuilt glibc with the
> proposed, untested patch, adding it as Patch999 to the spec.  It did not
> apply directly, likely because other patches had already changed the
> surrounding context, so with prepared sources, I made the two code changes
> by hand and generated a new diff (attached).
> 
> While it's early to tell for sure, nscd hasn't crashed for 1.5 hours, which
> is an improvement.

Thanks for testing the patch; hopefully it fixes your use case.

Comment 10 schanzle 2021-03-04 15:31:19 UTC
> While it's early to tell for sure, nscd hasn't crashed for 1.5 hours, which is an improvement.

Status update:  Running for about a week and no crashes.

I really appreciate a working nscd.  Nightly, this server scans several NFS servers to get metadata - basically 'find -ls'.  Without nscd, the scan time of one server with 7.5 million objects increases from 45 minutes to 3 hours 45 minutes, i.e. it takes about 5 times as long.  I attribute this to slow NIS lookups of uid/gid data.  We are moving to sssd/AD, but until then, this works well enough.
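
The slowdown figures quoted above can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch using only the numbers from this comment; the variable names are illustrative, and treating the whole difference as per-object lookup cost is an assumption.

```python
from datetime import timedelta

OBJECTS = 7_500_000                          # objects on the scanned server
with_nscd = timedelta(minutes=45)            # scan time with nscd running
without_nscd = timedelta(hours=3, minutes=45)  # scan time without nscd

# Dividing two timedeltas yields a plain float ratio.
ratio = without_nscd / with_nscd

# Average wall-clock time per object, in milliseconds.
ms_with = with_nscd.total_seconds() / OBJECTS * 1000
ms_without = without_nscd.total_seconds() / OBJECTS * 1000

if __name__ == "__main__":
    print(f"slowdown: {ratio:.1f}x")
    print(f"per object: {ms_with:.2f} ms with nscd, {ms_without:.2f} ms without")
```

Under these figures the scan takes 5 times as long without nscd, and the average cost per object rises from roughly 0.36 ms to roughly 1.8 ms.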

Thanks for all the effort behind the scenes to make this fix available.

Comment 15 schanzle 2021-09-02 14:17:46 UTC
When will this fix be published as an update?

nscd-2.28-151.el8.x86_64 from glibc-2.28-151.el8.src.rpm still crashes.

Comment 16 Florian Weimer 2021-09-17 12:48:12 UTC
(In reply to schanzle from comment #15)
> When will this fix be published as an update?
> 
> nscd-2.28-151.el8.x86_64 from glibc-2.28-151.el8.src.rpm still crashes.

Sorry, I cannot comment publicly on future release dates.

If you need urgent assistance or a hotfix, please contact Customer Support: https://access.redhat.com/support/cases/

Thank you for your understanding.

Comment 18 errata-xmlrpc 2021-11-09 19:28:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: glibc security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4358