Created attachment 333203 [details]
Reproducer for the pselect issue

[root@void ~]# uname -r
2.6.24.7-103.el5rt

WARNING: at kernel/hrtimer.c:439 hrtimer_reprogram()
Pid: 666, comm: test_pselect Not tainted 2.6.24.7-103.el5rt #1

Call Trace:
 [<ffffffff8128818c>] ? rt_spin_lock_slowlock+0x226/0x24c
 [<ffffffff81054f35>] hrtimer_reprogram+0x60/0xb2
 [<ffffffff81054fe6>] enqueue_hrtimer+0x5f/0xe8
 [<ffffffff81055a01>] hrtimer_start+0x111/0x17f
 [<ffffffff8128714e>] schedule_hrtimeout+0x9e/0xeb
 [<ffffffff81055314>] ? hrtimer_wakeup+0x0/0x21
 [<ffffffff8128714e>] ? schedule_hrtimeout+0x9e/0xeb
 [<ffffffff810be499>] do_select+0x4bf/0x52d
 [<ffffffff810be95b>] ? __pollwait+0x0/0xdf
 [<ffffffff810346fc>] ? default_wake_function+0x0/0x11
 [<ffffffff810346fc>] ? default_wake_function+0x0/0x11
 [<ffffffff810346fc>] ? default_wake_function+0x0/0x11
 [<ffffffff810346fc>] ? default_wake_function+0x0/0x11
 [<ffffffff810346fc>] ? default_wake_function+0x0/0x11
 [<ffffffff810346fc>] ? default_wake_function+0x0/0x11
 [<ffffffff8107fb76>] ? __rcu_read_unlock+0x5a/0x5c
 [<ffffffff81085013>] ? find_get_page+0x161/0x173
 [<ffffffff8103470b>] ? default_wake_function+0xf/0x11
 [<ffffffff810304fe>] ? __wake_up_common+0x41/0x74
 [<ffffffff81085399>] ? find_lock_page+0x1e/0x5d
 [<ffffffff81087561>] ? filemap_fault+0x1fd/0x399
 [<ffffffff810be6cd>] core_sys_select+0x1c6/0x275
 [<ffffffff8105ef91>] ? __rt_mutex_adjust_prio+0x11/0x24
 [<ffffffff8105f896>] ? rt_mutex_adjust_prio+0x35/0x3e
 [<ffffffff8128779b>] ? rt_read_slowunlock+0x473/0x4ac
 [<ffffffff8106047b>] ? rt_mutex_up_read+0x9d/0xa1
 [<ffffffff81060b11>] ? rt_up_read+0x9/0xb
 [<ffffffff810be837>] sys_pselect7+0xbb/0x139
 [<ffffffff810b112d>] ? vfs_write+0x13b/0x170
 [<ffffffff810bebcd>] sys_pselect6+0x5d/0x6a
 [<ffffffff8100c22e>] system_call_ret+0x0/0x5
Created attachment 333325 [details]
Possible fix for the NULL timeout handling in sys_pselect7()

This patch has been added to CVS (probably -105) for testing and the commit log says:

commit 62568510b8e2679cbc331d7de10ea9ba81ae8b3d
Author: Bernd Schmidt <bernds_cb1>
Date:   Tue Jan 13 22:14:48 2009 +0100

    Fix timeouts in sys_pselect7

    Since we (Analog Devices) updated our Blackfin kernel to 2.6.28, we've
    seen occasional 5-second hangs from telnet. telnetd calls select with a
    NULL timeout, but with the new kernel, the system call occasionally
    returns 0, which causes telnet to call sleep (5). This did not happen
    with earlier kernels.

    The code in sys_pselect7 looks a bit strange, in particular the variable
    "to" is initialized to NULL, then changed if a non-null timeout was
    passed in, but not used further. It needs to be passed to
    core_sys_select instead of &end_time.

    This bug was introduced by 8ff3e8e85fa6c312051134b3953e397feb639f51
    ("select: switch select() and poll() over to hrtimers").

    Signed-off-by: Bernd Schmidt <bernd.schmidt>
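
To make the commit message's reasoning concrete, here is a simplified userspace model of the bug pattern it describes. This is not the kernel source: wait_is_unbounded(), buggy_wrapper() and fixed_wrapper() are hypothetical names, and end_time is zero-initialized here only to keep the model well defined. The stand-in callee treats a NULL pointer as "block forever", which is the behaviour the commit message implies for core_sys_select.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the callee: a NULL end time means
 * "no timeout, block until an event arrives". Returns 1 in that case. */
static int wait_is_unbounded(const struct timespec *end_time)
{
    return end_time == NULL;
}

/* Buggy shape: 'to' is set up correctly but never used; the address of
 * the local 'end_time' is passed instead, so it is never NULL. */
int buggy_wrapper(const struct timespec *user_timeout)
{
    struct timespec end_time = { 0, 0 };
    struct timespec *to = NULL;

    if (user_timeout) {
        end_time = *user_timeout;
        to = &end_time;
    }
    (void)to;                            /* computed, then ignored */
    return wait_is_unbounded(&end_time); /* always a bounded wait */
}

/* Fixed shape: pass 'to', which stays NULL when no timeout was given. */
int fixed_wrapper(const struct timespec *user_timeout)
{
    struct timespec end_time = { 0, 0 };
    struct timespec *to = NULL;

    if (user_timeout) {
        end_time = *user_timeout;
        to = &end_time;
    }
    return wait_is_unbounded(to);
}
```

With a NULL user timeout, the buggy shape reports a bounded wait while the fixed shape reports an unbounded one; with a non-NULL timeout both report a bounded wait.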
The patch above fixed both the behavior and the backtrace in pselect7().

Before applying the patch, every time the timeout was NULL one would notice:
a) a backtrace in dmesg, and
b) that pselect, which should block until an event occurred (in this example a key press), returned immediately, as if the timeout had expired.

Now it works:

[root@void ~]# dmesg -c
<cut the long messages>
[root@void ~]# /tmp/test_pselect
Entering the test loop...
<keypress>
Out of the test loop...
[root@void ~]# /tmp/test_pselect
Entering the test loop...
<keypress>
Out of the test loop...
[root@void ~]# dmesg
[root@void ~]#
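
The attached reproducer is not quoted in this report. As a rough sketch of a check in the same spirit, the function below calls pselect() with a NULL timeout on a pipe instead of the keyboard, so it needs no key press. The helper name pselect_with_null_timeout() and the 200 ms delay are assumptions made for illustration. On a working kernel the call should block until the child writes and then return 1; a return value of 0 would match the faulty behaviour described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: returns the value pselect() returned,
 * or -1 if the pipe or the child could not be set up. */
int pselect_with_null_timeout(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: wait briefly, then make the read end readable. */
        struct timespec delay = { 0, 200 * 1000 * 1000 };
        close(fds[0]);
        nanosleep(&delay, NULL);
        if (write(fds[1], "x", 1) != 1)
            _exit(1);
        _exit(0);
    }

    /* Parent: watch only the read end of the pipe. */
    close(fds[1]);
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fds[0], &readfds);

    /* NULL timeout and NULL sigmask: block until the descriptor is ready. */
    int ready = pselect(fds[0] + 1, &readfds, NULL, NULL, NULL, NULL);

    close(fds[0]);
    waitpid(pid, NULL, 0);
    return ready;
}
```

Calling this from a small main() and printing the result, then checking dmesg afterwards, would cover both symptoms listed above.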
Found upstream commit 62568510b8e2679cbc331d7de10ea9ba81ae8b3d as mrg-rt-v1.git commit a595346a5640b2f11e3f2f0257274ad6da23333b, implemented in 2.6.24.7-107.

Verified by code review.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0360.html