466167 – RHEL5.3: posix-timers race condition causes timer to seize up

Summary: RHEL5.3: posix-timers race condition causes timer to seize up

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.3
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Mark McLoughlin
QA Contact:	Martin Jenner
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	469724

Reported:	2008-10-08 20:14 UTC by Mark McLoughlin
Modified:	2009-01-20 20:15 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-01-20 20:15:59 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
test-posix-timer-sigwait.c (57 bytes, text/plain) 2008-10-08 20:14 UTC, Mark McLoughlin	no flags
posix-timers-race-condition.patch (3.19 KB, patch) 2008-10-08 20:27 UTC, Mark McLoughlin	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2009:0225	0	normal	SHIPPED_LIVE	Important: Red Hat Enterprise Linux 5.3 kernel security and bug fix update	2009-01-20 16:06:24 UTC

Description Mark McLoughlin 2008-10-08 20:14:20 UTC

Created attachment 319794
test-posix-timer-sigwait.c

The RHEL5.3 posix-timers code has a race condition that can cause a timer to effectively seize up if its SIGALRM is being collected at the same time as the timer fires.

This can cause KVM networking to stop working.

The fix went upstream here:
  http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ba661292a2bc6ddd305a212b0526e5dc22195fe7

The fix also went into 2.6.26.2 and 2.6.25.16.

Bug originally reported here:

  http://lkml.org/lkml/2008/7/16/217

  When a timer fires, posix_timer_event() zeroes out its
  pre-allocated siginfo structure, initialises it and then
  queues up the signal with send_sigqueue().

  However, we may have previously queued up this signal, in
  which case we only want to increment si_overrun and
  re-initialising the siginfo structure is incorrect.

  Also, since we are modifying an already queued signal
  without the protection of the sighand spinlock, we may also
  race with e.g. collect_signal() causing it to fail to find
  a signal on the pending list because it happens to look at
  the siginfo struct after it was zeroed and before it was
  re-initialised.

  The race was observed with a modified kvm-userspace when
  running a guest under heavy network load. When it occurs,
  KVM never sees another SIGALRM signal because although
  the signal is queued up the appropriate bit is never set
  in the pending mask. Manually sending the process a SIGALRM
  kicks it out of this state.

  The fix is simple - only modify the pre-allocated sigqueue
  once we're sure that it hasn't already been queued.
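
The ordering problem described above can be sketched in user space. This is a minimal single-threaded model, not the kernel code or the attached patch: `model_siginfo`, `model_sigqueue`, `fire_unpatched()` and `fire_patched()` are hypothetical names introduced for illustration only. It shows how re-initialising an already queued entry discards the accumulated overrun count, and how checking the queued state first avoids touching the payload at all. It does not model the concurrent reader, only the order of operations.

```c
#include <string.h>

/* Hypothetical user-space stand-ins for the kernel structures (assumption). */
struct model_siginfo {
    int si_signo;
    int si_overrun;
};

struct model_sigqueue {
    int queued;                 /* non-zero while the entry is on the pending list */
    struct model_siginfo info;  /* pre-allocated payload owned by the timer */
};

/* Pre-fix ordering: the payload is wiped and rebuilt on every expiry,
 * even when the entry is already queued. Between the memset() and the
 * assignment, a concurrent reader would see si_signo == 0. */
void fire_unpatched(struct model_sigqueue *q, int signo)
{
    memset(&q->info, 0, sizeof q->info);
    q->info.si_signo = signo;
    if (q->queued) {
        q->info.si_overrun++;   /* counts from zero again after the wipe */
        return;
    }
    q->queued = 1;
}

/* Post-fix ordering: an already queued entry is only bumped, never rewritten. */
void fire_patched(struct model_sigqueue *q, int signo)
{
    if (q->queued) {
        q->info.si_overrun++;
        return;
    }
    memset(&q->info, 0, sizeof q->info);
    q->info.si_signo = signo;
    q->queued = 1;
}
```

Firing each variant three times without dequeuing leaves `si_overrun` at 1 in the unpatched model and at 2 in the patched one, since the unpatched path resets the count on every expiry.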

Also attaching a small test case program that can be used to try to reproduce the issue on RHEL5 kernels.
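
For readers without access to the attachment, the general pattern involved (a POSIX timer delivering SIGALRM that is collected with sigwait()) might look roughly like the sketch below. This is not the attached test-posix-timer-sigwait.c, and it is not guaranteed to trigger the race; the function name `collect_timer_signals()`, the choice of CLOCK_MONOTONIC and the interval parameter are assumptions made for illustration. If the timer were to seize up as described, the sigwait() call would block indefinitely.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <string.h>
#include <time.h>

/* Arm a periodic timer that raises SIGALRM and collect `wanted` signals
 * synchronously with sigwait(). interval_ns must be below one second.
 * Returns the number of signals collected, or -1 on a setup error.
 * (Hypothetical helper, not taken from the bug's attachment.) */
int collect_timer_signals(int wanted, long interval_ns)
{
    sigset_t set;
    struct sigevent sev;
    struct itimerspec its;
    timer_t timer;
    int got = 0;

    /* Block SIGALRM so it stays pending until sigwait() collects it. */
    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;

    memset(&sev, 0, sizeof sev);
    sev.sigev_notify = SIGEV_SIGNAL;
    sev.sigev_signo = SIGALRM;
    if (timer_create(CLOCK_MONOTONIC, &sev, &timer) != 0)
        return -1;

    /* First expiry and every following period use the same interval. */
    memset(&its, 0, sizeof its);
    its.it_value.tv_nsec = interval_ns;
    its.it_interval.tv_nsec = interval_ns;
    if (timer_settime(timer, 0, &its, NULL) != 0) {
        timer_delete(timer);
        return -1;
    }

    while (got < wanted) {
        int sig;
        if (sigwait(&set, &sig) != 0)
            break;
        if (sig == SIGALRM)
            got++;
    }

    timer_delete(timer);
    return got;
}
```

On older glibc versions, timer_create() lives in librt, so the program needs to be linked with -lrt.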

Comment 1 Mark McLoughlin 2008-10-08 20:27:01 UTC

Created attachment 319797
posix-timers-race-condition.patch

Comment 4 RHEL Program Management 2008-10-09 11:57:08 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 6 Don Zickus 2008-10-20 15:13:46 UTC

The fix is in kernel-2.6.18-120.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 10 errata-xmlrpc 2009-01-20 20:15:59 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html
