Bug 460645 - qdiskd isn't always closing file descriptors properly before forking
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Lon Hohberger
QA Contact: Cluster QE
Blocks: 456578
Reported: 2008-08-29 09:06 EDT by Sean E. Millichamp
Modified: 2009-04-16 18:38 EDT (History)
3 users

Doc Type: Bug Fix
Last Closed: 2009-01-20 16:51:40 EST

Attachments
Patch to close the file descriptors in the event of errors. (725 bytes, patch)
2008-08-29 09:06 EDT, Sean E. Millichamp
Adds pthread_mutex_lock()ing around the opens and fork (3.95 KB, patch)
2008-09-02 16:21 EDT, Sean E. Millichamp

Description Sean E. Millichamp 2008-08-29 09:06:12 EDT
Created attachment 315353 [details]
Patch to close the file descriptors in the event of errors.

Description of problem:

In certain circumstances, qdiskd can encounter errors while scanning for a quorum disk label, which causes the file descriptors used during that scan not to be closed properly.  When qdiskd forks/execs the heuristic, those open FDs are inherited by the heuristic program.

I have attached a patch which solves this problem by closing the file descriptors opened to partitions when an error occurs while accessing a partition in disk.c:qdisk_open().

This is particularly annoying if you are using ping as your heuristic on an SELinux-enabled system, as you get AVC denials at every ping check filling your logs.  This is harmless, since the check can still proceed, but it severely clutters the SELinux logs, making them less useful for finding real security issues.
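A minimal sketch of the error-path fix the attached patch makes. The function and device names here are hypothetical stand-ins, not the actual disk.c code: the point is that the label probe must close its fd on failure instead of leaking it into later fork()/exec() calls.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in for the label check done in qdisk_open() */
static int check_label(int fd)
{
	char magic[4];

	/* A real check would validate the qdisk header; here we only
	   require that 4 bytes are readable from the device. */
	return (read(fd, magic, sizeof(magic)) == sizeof(magic)) ? 0 : -1;
}

/* Open a candidate partition; return its fd, or -1 on any error. */
int qdisk_try_open(const char *dev)
{
	int fd = open(dev, O_RDONLY);

	if (fd < 0)
		return -1;
	if (check_label(fd) < 0) {
		close(fd);	/* the fix: don't leak the fd on the error path */
		return -1;
	}
	return fd;
}
```

Without that close(), each failed probe (e.g. of an extended-partition container like /dev/sda4) leaves an fd open that survives into the heuristic's address space.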

Version-Release number of selected component (if applicable):

# rpm -q cman

How reproducible:

Every time, given the right conditions.  To reproduce this you need:

- A host system with a partition that causes the error.  On my systems the extended partition container (/dev/sda4 and /dev/sdb4) triggered this.

- SELinux set to enforcing with the targeted policy and install-time defaults.

- A working cman configured for a quorum disk using a ping heuristic AND configured to look for a qdisk label instead of a specific block device.

Steps to Reproduce:
1. Have a SELinux enabled and enforcing system using the targeted policy.
2. Setup a working cluster with a quorum disk using /bin/ping as a heuristic and a qdisk found via label.
3. Watch /var/log/audit/audit.log (or /var/log/messages depending on config) for SELinux denials.
Actual results:

If you manage to catch one of the ping processes long enough to list its open file descriptors, you see:
0 -> /dev/null
1 -> /dev/null
2 -> /dev/null
3 -> /dev/sda4
4 -> /dev/sdb4

(where /dev/sda4 and /dev/sdb4 are the extended partition type)

SELinux denials of the following form are the obvious sign:

type=AVC msg=audit(1219980241.221:6801): avc:  denied  { read write } for  pid=3810 comm="ping" path="/dev/sda4" dev=tmpfs ino=1051 scontext=root:system_r:ping_t:s0 tcontext=system_u:object_r:fixed_disk_device_t:s0 tclass=blk_file
type=AVC msg=audit(1219980241.221:6801): avc:  denied  { read write } for  pid=3810 comm="ping" path="/dev/sdb4" dev=tmpfs ino=985 scontext=root:system_r:ping_t:s0 tcontext=system_u:object_r:fixed_disk_device_t:s0 tclass=blk_file
type=SYSCALL msg=audit(1219980241.221:6801): arch=c000003e syscall=59 success=yes exit=0 a0=93d96b0 a1=93d9720 a2=93d8240 a3=3 items=0 ppid=1 pid=3810 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=220 comm="ping" exe="/bin/ping" subj=root:system_r:ping_t:s0 key=(null)

Expected results:

No errors and only FDs 0-2 open to /dev/null on the fork/exec.

Additional info:
Comment 1 Lon Hohberger 2008-09-02 11:36:08 EDT
Pushed to master, STABLE2 and RHEL5 branches
Comment 2 RHEL Product and Program Management 2008-09-02 11:42:10 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update release.
Comment 3 Sean E. Millichamp 2008-09-02 14:59:09 EDT
Actually, I just discovered that this is an incomplete fix for the problem.

There is a race condition between the heuristic fork()s and the open() calls in at least disc_util.c:getuptime() and main.c:update_local_status().  I just received two SELinux alerts relating to these open file descriptors surviving the fork() into my ping test.

The ideal solution would be to use the O_CLOEXEC flag to open(2), but the manpage says that isn't available until Linux 2.6.23.  The only solution here might be to add a pthread_mutex_lock to synchronize the threads such that fork() can't be called until the open either fails or returns successfully and a fcntl(fd,F_SETFD,FD_CLOEXEC) can be called.
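The locking scheme suggested above can be sketched as follows. The wrapper names are hypothetical (the actual patch touches the existing call sites directly): a single mutex serializes every open()+fcntl(FD_SETFD) pair against the fork() in the heuristic path, so no fd can be inherited before its close-on-exec flag is set. O_CLOEXEC would make this atomic without a lock, but it requires Linux >= 2.6.23.

```c
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t fork_mutex = PTHREAD_MUTEX_INITIALIZER;

/* All of qdiskd's opens would go through a wrapper like this. */
int open_cloexec(const char *path, int flags)
{
	int fd;

	pthread_mutex_lock(&fork_mutex);
	fd = open(path, flags);
	if (fd >= 0)
		fcntl(fd, F_SETFD, FD_CLOEXEC);	/* set before fork() can run */
	pthread_mutex_unlock(&fork_mutex);
	return fd;
}

/* The heuristic thread takes the same mutex around fork(), so it can
   never observe an fd that is open but not yet marked close-on-exec. */
pid_t fork_heuristic(void)
{
	pid_t pid;

	pthread_mutex_lock(&fork_mutex);
	pid = fork();
	pthread_mutex_unlock(&fork_mutex);
	return pid;
}
```

Any fd opened outside the lock, or opened inside it, is either already FD_CLOEXEC by the time fork() returns or not yet open at all; either way the exec'd heuristic never sees it.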
Comment 4 Sean E. Millichamp 2008-09-02 16:21:15 EDT
Created attachment 315593 [details]
Adds pthread_mutex_lock()ing around the opens and fork

I believe something like this patch is necessary to fix the race condition.  Please review it carefully as my C is quite rusty - feedback welcome.

It is hard to know for certain if this fixes the problem as it is tricky to reproduce the race condition, but I have this running on a test cluster right now and will report back if it doesn't seem to solve the problem for me.
Comment 7 errata-xmlrpc 2009-01-20 16:51:40 EST
An advisory has been issued which should help the problem
described in this bug report.  This report is therefore being
closed with a resolution of ERRATA.  For more information
on the solution and/or where to find the updated files,
please follow the link below.  You may reopen this bug report
if the solution does not work for you.

