Bug 487382

Summary: pselect with timeout=NULL triggers a warning at kernel/hrtimer.c:439 hrtimer_reprogram()
Product: Red Hat Enterprise MRG
Reporter: Luis Claudio R. Goncalves <lgoncalv>
Component: realtime-kernel
Assignee: Red Hat Real Time Maintenance <rt-maint>
Status: CLOSED ERRATA
QA Contact: David Sommerseth <davids>
Severity: medium
Priority: low
Version: Development
CC: bhu, jburke, lgoncalv, ovasik, williams
Target Milestone: 1.1.1   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2009-03-27 00:15:22 UTC
Attachments:
  Reproducer for the pselect issue (flags: none)
  Possible fix for the NULL timeout handling in sys_pselect7() (flags: none)

Description Luis Claudio R. Goncalves 2009-02-25 18:40:47 UTC
Created attachment 333203
Reproducer for the pselect issue

[root@void ~]# uname -r
2.6.24.7-103.el5rt


WARNING: at kernel/hrtimer.c:439 hrtimer_reprogram()
Pid: 666, comm: test_pselect Not tainted 2.6.24.7-103.el5rt #1

Call Trace:
 [<ffffffff8128818c>] ? rt_spin_lock_slowlock+0x226/0x24c
 [<ffffffff81054f35>] hrtimer_reprogram+0x60/0xb2
 [<ffffffff81054fe6>] enqueue_hrtimer+0x5f/0xe8
 [<ffffffff81055a01>] hrtimer_start+0x111/0x17f
 [<ffffffff8128714e>] schedule_hrtimeout+0x9e/0xeb
 [<ffffffff81055314>] ? hrtimer_wakeup+0x0/0x21
 [<ffffffff8128714e>] ? schedule_hrtimeout+0x9e/0xeb
 [<ffffffff810be499>] do_select+0x4bf/0x52d
 [<ffffffff810be95b>] ? __pollwait+0x0/0xdf
 [<ffffffff810346fc>] ? default_wake_function+0x0/0x11
 [<ffffffff810346fc>] ? default_wake_function+0x0/0x11
 [<ffffffff810346fc>] ? default_wake_function+0x0/0x11
 [<ffffffff810346fc>] ? default_wake_function+0x0/0x11
 [<ffffffff810346fc>] ? default_wake_function+0x0/0x11
 [<ffffffff810346fc>] ? default_wake_function+0x0/0x11
 [<ffffffff8107fb76>] ? __rcu_read_unlock+0x5a/0x5c
 [<ffffffff81085013>] ? find_get_page+0x161/0x173
 [<ffffffff8103470b>] ? default_wake_function+0xf/0x11
 [<ffffffff810304fe>] ? __wake_up_common+0x41/0x74
 [<ffffffff81085399>] ? find_lock_page+0x1e/0x5d
 [<ffffffff81087561>] ? filemap_fault+0x1fd/0x399
 [<ffffffff810be6cd>] core_sys_select+0x1c6/0x275
 [<ffffffff8105ef91>] ? __rt_mutex_adjust_prio+0x11/0x24
 [<ffffffff8105f896>] ? rt_mutex_adjust_prio+0x35/0x3e
 [<ffffffff8128779b>] ? rt_read_slowunlock+0x473/0x4ac
 [<ffffffff8106047b>] ? rt_mutex_up_read+0x9d/0xa1
 [<ffffffff81060b11>] ? rt_up_read+0x9/0xb
 [<ffffffff810be837>] sys_pselect7+0xbb/0x139
 [<ffffffff810b112d>] ? vfs_write+0x13b/0x170
 [<ffffffff810bebcd>] sys_pselect6+0x5d/0x6a
 [<ffffffff8100c22e>] system_call_ret+0x0/0x5

Comment 1 Luis Claudio R. Goncalves 2009-02-26 13:44:55 UTC
Created attachment 333325
Possible fix for the NULL timeout handling in sys_pselect7()

This patch has been added to CVS (probably -105) for testing and the commit log says:

commit 62568510b8e2679cbc331d7de10ea9ba81ae8b3d
Author: Bernd Schmidt <bernds_cb1>
Date:   Tue Jan 13 22:14:48 2009 +0100

    Fix timeouts in sys_pselect7
    
    Since we (Analog Devices) updated our Blackfin kernel to 2.6.28, we've
    seen occasional 5-second hangs from telnet.  telnetd calls select with a
    NULL timeout, but with the new kernel, the system call occasionally
    returns 0, which causes telnet to call sleep (5).  This did not happen
    with earlier kernels.
    
    The code in sys_pselect7 looks a bit strange, in particular the variable
    "to" is initialized to NULL, then changed if a non-null timeout was
    passed in, but not used further.  It needs to be passed to
    core_sys_select instead of &end_time.
    
    This bug was introduced by 8ff3e8e85fa6c312051134b3953e397feb639f51
    ("select: switch select() and poll() over to hrtimers").
    
    Signed-off-by: Bernd Schmidt <bernd.schmidt>
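
The pointer mix-up described in the commit log can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: wait_is_bounded(), pselect7_model_buggy() and pselect7_model_fixed() are hypothetical names invented for illustration, and the stand-in for core_sys_select() merely reports whether it was handed a deadline.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for core_sys_select(): a NULL end_time means
 * "block until an event", a non-NULL one means "stop at this deadline". */
static int wait_is_bounded(const struct timespec *end_time)
{
    return end_time != NULL;
}

/* Model of the behavior before the patch: "to" is set up correctly but
 * never used, and &end_time (never NULL) is passed on instead. */
static int pselect7_model_buggy(const struct timespec *tsp)
{
    struct timespec end_time = {0, 0}; /* zeroed here only to keep the model well defined */
    struct timespec *to = NULL;

    if (tsp) {
        end_time = *tsp;
        to = &end_time;
    }
    (void)to;                          /* computed, then ignored */
    return wait_is_bounded(&end_time);
}

/* Model of the behavior after the patch: "to" is passed on, so a NULL
 * timeout from the caller stays NULL. */
static int pselect7_model_fixed(const struct timespec *tsp)
{
    struct timespec end_time = {0, 0};
    struct timespec *to = NULL;

    if (tsp) {
        end_time = *tsp;
        to = &end_time;
    }
    return wait_is_bounded(to);
}
```

In this model a NULL timeout is treated as a bounded wait by the buggy variant and as an unbounded wait by the fixed one, which matches the symptom of pselect returning immediately instead of blocking.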

Comment 2 Luis Claudio R. Goncalves 2009-02-26 14:43:20 UTC
The patch above fixed both the behavior and the backtrace in pselect7().

Before the patch was applied, every call with a NULL timeout showed two symptoms: a) a backtrace in dmesg, and b) pselect, which should block until an event occurs (in this example, a key press), returned immediately, as if the timeout had expired.

Now it works:

[root@void ~]# dmesg -c
<cut the long messages>

[root@void ~]# /tmp/test_pselect 
Entering the test loop...
<keypress>
Out of the test loop...

[root@void ~]# /tmp/test_pselect 
Entering the test loop...
<keypress>
Out of the test loop...

[root@void ~]# dmesg
[root@void ~]#
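
The attached reproducer is not included in this report. A minimal sketch of the kind of call it exercises, assuming it simply waits for a file descriptor to become readable with a NULL timeout, might look like the following; wait_for_input() is a hypothetical helper name, not taken from the attachment.

```c
#include <sys/select.h>
#include <unistd.h>

/* Block until fd becomes readable.
 * The fifth argument (timeout) is NULL, which asks pselect() to wait
 * indefinitely; the sixth (sigmask) is NULL, which leaves the signal
 * mask unchanged. Returns the pselect() result: the number of ready
 * descriptors, or -1 on error. A return of 0 would mean "timed out",
 * which should not happen with a NULL timeout. */
static int wait_for_input(int fd)
{
    fd_set readfds;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    return pselect(fd + 1, &readfds, NULL, NULL, NULL, NULL);
}
```

A test program could print "Entering the test loop...", call wait_for_input(STDIN_FILENO), and print "Out of the test loop..." once it returns. On an affected kernel the call would be expected to return at once without any key press.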

Comment 4 David Sommerseth 2009-03-24 15:30:06 UTC
Found upstream commit 62568510b8e2679cbc331d7de10ea9ba81ae8b3d in mrg-rt-v1.git as commit a595346a5640b2f11e3f2f0257274ad6da23333b, included in 2.6.24.7-107.

Verified by code review.

Comment 6 errata-xmlrpc 2009-03-27 00:15:22 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0360.html