Bug 487382 - pselect with timeout=NULL triggers a warning at kernel/hrtimer.c:439 hrtimer_reprogram()
Summary: pselect with timeout=NULL triggers a warning at kernel/hrtimer.c:439 hrtimer_...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel
Version: Development
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: 1.1.1
Target Release: ---
Assignee: Red Hat Real Time Maintenance
QA Contact: David Sommerseth
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2009-02-25 18:40 UTC by Luis Claudio R. Goncalves
Modified: 2016-05-22 23:28 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-03-27 00:15:22 UTC
Target Upstream Version:
Embargoed:


Attachments
Reproducer for the pselect issue (574 bytes, text/plain)
2009-02-25 18:40 UTC, Luis Claudio R. Goncalves
no flags
Possible fix for the NULL timeout handling in sys_pselect7() (1.58 KB, patch)
2009-02-26 13:44 UTC, Luis Claudio R. Goncalves
no flags


Links
System: Red Hat Product Errata
ID: RHSA-2009:0360
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: kernel-rt security and bug fix update
Last Updated: 2009-03-27 00:15:06 UTC

Description Luis Claudio R. Goncalves 2009-02-25 18:40:47 UTC
Created attachment 333203
Reproducer for the pselect issue

[root@void ~]# uname -r
2.6.24.7-103.el5rt


WARNING: at kernel/hrtimer.c:439 hrtimer_reprogram()
Pid: 666, comm: test_pselect Not tainted 2.6.24.7-103.el5rt #1

Call Trace:
 [<ffffffff8128818c>] ? rt_spin_lock_slowlock+0x226/0x24c
 [<ffffffff81054f35>] hrtimer_reprogram+0x60/0xb2
 [<ffffffff81054fe6>] enqueue_hrtimer+0x5f/0xe8
 [<ffffffff81055a01>] hrtimer_start+0x111/0x17f
 [<ffffffff8128714e>] schedule_hrtimeout+0x9e/0xeb
 [<ffffffff81055314>] ? hrtimer_wakeup+0x0/0x21
 [<ffffffff8128714e>] ? schedule_hrtimeout+0x9e/0xeb
 [<ffffffff810be499>] do_select+0x4bf/0x52d
 [<ffffffff810be95b>] ? __pollwait+0x0/0xdf
 [<ffffffff810346fc>] ? default_wake_function+0x0/0x11
 [<ffffffff810346fc>] ? default_wake_function+0x0/0x11
 [<ffffffff810346fc>] ? default_wake_function+0x0/0x11
 [<ffffffff810346fc>] ? default_wake_function+0x0/0x11
 [<ffffffff810346fc>] ? default_wake_function+0x0/0x11
 [<ffffffff810346fc>] ? default_wake_function+0x0/0x11
 [<ffffffff8107fb76>] ? __rcu_read_unlock+0x5a/0x5c
 [<ffffffff81085013>] ? find_get_page+0x161/0x173
 [<ffffffff8103470b>] ? default_wake_function+0xf/0x11
 [<ffffffff810304fe>] ? __wake_up_common+0x41/0x74
 [<ffffffff81085399>] ? find_lock_page+0x1e/0x5d
 [<ffffffff81087561>] ? filemap_fault+0x1fd/0x399
 [<ffffffff810be6cd>] core_sys_select+0x1c6/0x275
 [<ffffffff8105ef91>] ? __rt_mutex_adjust_prio+0x11/0x24
 [<ffffffff8105f896>] ? rt_mutex_adjust_prio+0x35/0x3e
 [<ffffffff8128779b>] ? rt_read_slowunlock+0x473/0x4ac
 [<ffffffff8106047b>] ? rt_mutex_up_read+0x9d/0xa1
 [<ffffffff81060b11>] ? rt_up_read+0x9/0xb
 [<ffffffff810be837>] sys_pselect7+0xbb/0x139
 [<ffffffff810b112d>] ? vfs_write+0x13b/0x170
 [<ffffffff810bebcd>] sys_pselect6+0x5d/0x6a
 [<ffffffff8100c22e>] system_call_ret+0x0/0x5

Comment 1 Luis Claudio R. Goncalves 2009-02-26 13:44:55 UTC
Created attachment 333325
Possible fix for the NULL timeout handling in sys_pselect7()

This patch has been added to CVS for testing (probably in -105). Its commit log reads:

commit 62568510b8e2679cbc331d7de10ea9ba81ae8b3d
Author: Bernd Schmidt <bernds_cb1>
Date:   Tue Jan 13 22:14:48 2009 +0100

    Fix timeouts in sys_pselect7
    
    Since we (Analog Devices) updated our Blackfin kernel to 2.6.28, we've
    seen occasional 5-second hangs from telnet.  telnetd calls select with a
    NULL timeout, but with the new kernel, the system call occasionally
    returns 0, which causes telnet to call sleep (5).  This did not happen
    with earlier kernels.
    
    The code in sys_pselect7 looks a bit strange, in particular the variable
    "to" is initialized to NULL, then changed if a non-null timeout was
    passed in, but not used further.  It needs to be passed to
    core_sys_select instead of &end_time.
    
    This bug was introduced by 8ff3e8e85fa6c312051134b3953e397feb639f51
    ("select: switch select() and poll() over to hrtimers").
    
    Signed-off-by: Bernd Schmidt <bernd.schmidt>

Comment 2 Luis Claudio R. Goncalves 2009-02-26 14:43:20 UTC
The patch above fixes both the misbehavior and the backtrace in sys_pselect7().

Before the patch was applied, every call with a NULL timeout produced a) a backtrace in dmesg and b) an immediate return from pselect, as if the timeout had expired, even though it should have blocked until an event occurred (in this example, a key press).

Now it works:

[root@void ~]# dmesg -c
<cut the long messages>

[root@void ~]# /tmp/test_pselect 
Entering the test loop...
<keypress>
Out of the test loop...

[root@void ~]# /tmp/test_pselect 
Entering the test loop...
<keypress>
Out of the test loop...

[root@void ~]# dmesg
[root@void ~]#

Comment 4 David Sommerseth 2009-03-24 15:30:06 UTC
Found upstream commit 62568510b8e2679cbc331d7de10ea9ba81ae8b3d implemented as mrg-rt-v1.git commit a595346a5640b2f11e3f2f0257274ad6da23333b in 2.6.24.7-107.

Verified by code review.

Comment 6 errata-xmlrpc 2009-03-27 00:15:22 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0360.html

