Bug 1734553 - RHEL-8: blkid_get_dev() leaks file descriptor to underlying block device
Summary: RHEL-8: blkid_get_dev() leaks file descriptor to underlying block device
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: util-linux
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 8.1
Assignee: Karel Zak
QA Contact: Radka Skvarilova
URL:
Whiteboard:
Depends On: 1734545
Blocks:
Reported: 2019-07-30 20:42 UTC by Dave Wysochanski
Modified: 2019-11-05 22:27 UTC (History)

Fixed In Version: util-linux-2.32.1-13.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1734545
Environment:
Last Closed: 2019-11-05 22:26:57 UTC
Type: Bug
Target Upstream Version:


Attachments
simple C program demonstrating descriptor leak in libblkid when blkid_get_dev is called with devname=<somepartition> (2.45 KB, text/x-csrc)
2019-07-30 20:48 UTC, Dave Wysochanski
the nfs-server test that originally uncovered the problem (536 bytes, text/plain)
2019-07-30 20:49 UTC, Dave Wysochanski


Links
System ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 4312881 Troubleshoot None rpc.mountd may leave open block device after 'exportfs -u' due to libblkd bug involving leaked file descriptor with part... 2019-07-30 20:59:30 UTC
Red Hat Product Errata RHBA-2019:3603 None None None 2019-11-05 22:27:06 UTC

Internal Links: 1734545

Description Dave Wysochanski 2019-07-30 20:42:00 UTC
+++ This bug was initially created as a clone of Bug #1734545 +++

Description of problem:
A customer reported an issue with the NFS server's rpc.mountd: it retained a reference to an underlying block device even after the filesystem was unexported and no longer in use, and the only way to release the reference was to restart rpc.mountd.  The block device had one partition with a filesystem on top of it.  The NFS reproducer is fairly simple, but it turns out the problem can also be reproduced with a small C program that calls into libblkid and shows the leaked reference.

This bug appears to be present in at least:
RHEL7 (libblkid-2.23.2-59.el7.x86_64, nfs-utils-1.3.0-0.61.el7.x86_64)
RHEL8 (nfs-utils-2.3.3-14.el8.x86_64)
f30 (libblkid-2.33.2-2.fc30.x86_64, nfs-utils-2.4.1-0.fc30.x86_64)

It is not present in RHEL6 (libblkid-2.17.2-12.28.el6_9.2.x86_64, nfs-utils-1.2.3-75.el6_9.x86_64)


Version-Release number of selected component (if applicable):
libblkid-2.23.2-59.el7.x86_64 or above

How reproducible:
Every time with the reproducer

Steps to Reproduce:
util-linux / libblkid
1. Create a block device with one partition on it, then a filesystem on that (or just use /boot)
2. Run attached C program to show the file descriptor of the underlying block device remains open / is leaked
Example:
./blkid-test /boot
calling blkid_get_dev(cache, devname=/dev/vda1, BLKID_DEV_NORMAL)
BUG - file descriptor leak - before=6 after=7
uuid = e2b3e658-cc82-4fad-9ac2-2cf384ecc49e
# gdb ./blkid-test
(gdb) 
98              uuid = get_uuid_blkdev(argv[1]);
99              after = get_num_fds();
100             // printf("AFTER get_uuid_blkdev num_fds = %d\n", after);
101             if (before != after)
102                     printf("BUG - file descriptor leak - before=%d after=%d\n",
103                             before, after);
104             if (!uuid) {
105                     printf("ERROR: uuid = NULL\n");
106                     exit(1);
107             }
(gdb) b 98
Breakpoint 1 at 0x401455: file blkid-test.c, line 98.
(gdb) b 101
Breakpoint 2 at 0x401479: file blkid-test.c, line 101.
(gdb) run /boot
Starting program: /root/blkid-test /boot
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments

Breakpoint 1, main (argc=2, argv=0x7fffffffe478) at blkid-test.c:98
98              uuid = get_uuid_blkdev(argv[1]);
(gdb) 
[1]+  Stopped                 gdb ./blkid-test
# ps
  PID TTY          TIME CMD
 8870 pts/1    00:00:01 bash
13406 pts/1    00:00:01 gdb
13408 pts/1    00:00:00 blkid-test
13412 pts/1    00:00:00 ps
# ls -lh /proc/13408/fd
total 0
lrwx------. 1 root root 64 Jul 30 15:52 0 -> /dev/pts/1
lrwx------. 1 root root 64 Jul 30 15:52 1 -> /dev/pts/1
lrwx------. 1 root root 64 Jul 30 15:52 2 -> /dev/pts/1
# fg
gdb ./blkid-test
c
c
Continuing.
calling blkid_get_dev(cache, devname=/dev/vda1, BLKID_DEV_NORMAL)

Breakpoint 2, main (argc=2, argv=0x7fffffffe478) at blkid-test.c:101
101             if (before != after)
(gdb) 
[1]+  Stopped                 gdb ./blkid-test
# ls -lh /proc/13408/fd
total 0
lrwx------. 1 root root 64 Jul 30 15:52 0 -> /dev/pts/1
lrwx------. 1 root root 64 Jul 30 15:52 1 -> /dev/pts/1
lrwx------. 1 root root 64 Jul 30 15:52 2 -> /dev/pts/1
lr-x------. 1 root root 64 Jul 30 15:52 4 -> /dev/vda


nfs-server reproducer (calls into libblkid with same calls as above)
1. Run attached mountd-test.sh (for NFS server reproducer)


Actual results:
A reference to the underlying block device remains open after the following call is made:
blkid_get_dev(cache, devname=/dev/vda1, BLKID_DEV_NORMAL)


Expected results:
No reference to the underlying block device remains after the following call is made:
blkid_get_dev(cache, devname=/dev/vda1, BLKID_DEV_NORMAL)


Additional info:
If for some reason I've misunderstood and this is somehow misuse of the interface by nfs-utils (rpc.mountd) please let us know.

NOTE: on fc30, the blkid.h file is inside libblkid-debugsource, and I did this to compile / link:
ln -s /usr/lib64/libblkid.so.1 /usr/lib64/libblkid.so
gcc -g -o blkid-test -I/usr/src/debug/util-linux-2.33.2-2.fc30.x86_64/libblkid/src blkid-test.c -lblkid


On RHEL6/RHEL7, there is libblkid-devel and I did this:
ln -s /usr/lib64/libblkid.so.1 /usr/lib64/libblkid.so
gcc -g -o blkid-test -I/usr/include/blkid blkid-test.c -lblkid

In all instances you should be able to repro easily with:
# ./blkid-test /boot

If the bug exists, we see:
calling blkid_get_dev(cache, devname=/dev/vda1, BLKID_DEV_NORMAL)
BUG - file descriptor leak - before=6 after=7
uuid = 3bcb7235-6b3b-494f-ad49-c4a0e18d67b6

If it does not, there is no "BUG" line, as on RHEL6:
calling blkid_get_dev(cache, devname=/dev/vda1, BLKID_DEV_NORMAL)
uuid = e2107ad2-f3f5-4704-94f3-2fe7606ee947
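
The before/after comparison behind the "BUG" line can be sketched as follows. This is an illustration under stated assumptions, not the code of the attached reproducer: the name count_open_fds() is hypothetical (the attachment uses its own get_num_fds(), whose implementation is not shown in this report and may count differently), and /proc is assumed to be mounted.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count the open descriptors of the calling process
 * by listing /proc/self/fd. Returns -1 if the directory cannot be opened.
 * The descriptor that opendir() itself holds is included in the count, so
 * only the difference between two calls is meaningful. */
int count_open_fds(void)
{
    DIR *dir = opendir("/proc/self/fd");
    struct dirent *ent;
    int n = 0;

    if (!dir)
        return -1;

    while ((ent = readdir(dir)) != NULL) {
        /* Ignore the "." and ".." directory entries. */
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        n++;
    }

    closedir(dir);
    return n;
}
```

A caller would record count_open_fds() before the libblkid lookup, record it again afterwards, and report a leak when the two values differ.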

--- Additional comment from Dave Wysochanski on 2019-07-30 20:20 UTC ---


RHEL8 build:
debuginfo-install libblkid
ln -s /usr/lib64/libblkid.so.1 /usr/lib64/libblkid.so
gcc -g -o blkid-test -I/usr/src/debug/util-linux-2.32.1-11.el8.x86_64/libblkid/src/ blkid-test.c -lblkid

Comment 1 Dave Wysochanski 2019-07-30 20:48:48 UTC
Created attachment 1594830 [details]
simple C program demonstrating descriptor leak in libblkid when blkid_get_dev is called with devname=<somepartition>

Comment 2 Dave Wysochanski 2019-07-30 20:49:14 UTC
Created attachment 1594831 [details]
the nfs-server test that originally uncovered the problem

Comment 7 errata-xmlrpc 2019-11-05 22:26:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3603

