Created attachment 319794 [details]
test-posix-timer-sigwait.c

RHEL5.3 posix-timers has a race condition which can cause the timer to effectively seize up if the SIGALRM is being collected at the same time as the timer fires. This can cause KVM networking to stop working.

The fix went upstream here:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ba661292a2bc6ddd305a212b0526e5dc22195fe7

Also went into 2.6.26.2 and 2.6.25.16.

Bug originally reported here:

http://lkml.org/lkml/2008/7/16/217

When a timer fires, posix_timer_event() zeroes out its pre-allocated siginfo structure, initialises it and then queues up the signal with send_sigqueue().

However, we may have previously queued up this signal, in which case we only want to increment si_overrun and re-initialising the siginfo structure is incorrect.

Also, since we are modifying an already queued signal without the protection of the sighand spinlock, we may also race with e.g. collect_signal() causing it to fail to find a signal on the pending list because it happens to look at the siginfo struct after it was zeroed and before it was re-initialised.

The race was observed with a modified kvm-userspace when running a guest under heavy network load. When it occurs, KVM never sees another SIGALRM signal because although the signal is queued up the appropriate bit is never set in the pending mask. Manually sending the process a SIGALRM kicks it out of this state.

The fix is simple - only modify the pre-allocated sigqueue once we're sure that it hasn't already been queued.

Also attaching a small test case program that can be used to try and reproduce the issue on RHEL5 kernels.
Created attachment 319797 [details] posix-timers-race-condition.patch
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
in kernel-2.6.18-120.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html