251038 – poll system call behavior during suspend/resume

Bug 251038 - poll system call behavior during suspend/resume

Summary: poll system call behavior during suspend/resume

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.0
Hardware:	i686
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Lenny Szubowicz
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2007-08-06 18:05 UTC by Lomesh Agarwal
Modified:	2013-08-15 12:26 UTC (History)
CC List:	1 user (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2013-08-15 12:26:30 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Description Lomesh Agarwal 2007-08-06 18:05:36 UTC

Description of problem:
poll system call retunrs with EINTR when system goes to standby.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:
it should not return. below is the email exchange with Chris -
* Agarwal, Lomesh (lomesh.agarwal) wrote:
> > Chris Wright (chrisw) wrote:
> > Yes, please.
> [Agarwal, Lomesh] how do I open a bug against RHEL?

bugzilla.redhat.com -> enter new bug
Red Hat Enterprise Linux
Version 5
Component Kernel

Feel free to put me on Cc: list.

> >   Also note, turns out the patch will not work as is.  If you 
> > specify a (non-negative) timeout value and send STOP/CONT before 
> > timeout expires the poll will restart with original timeout.  Repeat 
> > this and you can see this process never returning to userspace.

> [Agarwal, Lomesh] non-negative timeout means infinite timeout and app 
> is

Negative timeout means infinite timeout, non-negative timeout means a real 
timeout value.  IOW, if you specifiy a timeout you'd expect to return to 
userspace at the end of the timeout instead of never.

> supposed to hang until one of the file descriptor is ready for IO or a 
> signal is sent. The only behavior change due to this patch would be 
> that app won't get control due to any signal which it does not handle 
> (signal which it handles will still return the control). I don't think 
> any app writer will rely on unhandled signal to get the control back.
> Also if this is a acceptable behavior in select then why is it not 
> acceptable in poll?

Well, it's slightly more subtle than that.  Both select and ppoll have a 
timeout structure that can be rewritten with time left until timeout, so they 
can easily restart without redoing the full timeout.  If updating the timeout 
structure fails, then select/ppoll will still return early with EINTR.

>   The full solution
> is more invasive, and we'll need to re-evaluate when it's been done.

Doesn't look as bad as I thought it might.  Here's a first cut.


 fs/select.c |   34 +++++++++++++++++++++++++++++++++-
 1 files changed, 33 insertions(+), 1 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index a974082..50e6d8e 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -736,10 +736,29 @@ out_fds:
 	return err;
 }
 
+long do_restart_poll(struct restart_block *restart_block) {
+	struct pollfd __user *ufds = (struct pollfd __user*)restart_block-
>arg0;
+	int nfds = restart_block->arg1;
+	s64 timeout = ((s64)restart_block->arg3<<32) | (s64)restart_block-
>arg2;
+	int ret;
+
+	restart_block->fn = do_no_restart_syscall;
+	ret = do_sys_poll(ufds, nfds, &timeout);
+	if (ret == -EINTR) {
+		restart_block->fn = do_restart_poll;
+		restart_block->arg2 = timeout & 0xFFFFFFFF;
+		restart_block->arg3 = timeout >> 32;
+		ret = -ERESTART_RESTARTBLOCK;
+	}
+	return ret;
+}
+
 asmlinkage long sys_poll(struct pollfd __user *ufds, unsigned int nfds,
 			long timeout_msecs)
 {
 	s64 timeout_jiffies;
+	int ret;
 
 	if (timeout_msecs > 0) {
 #if HZ > 1000
@@ -754,7 +773,20 @@ asmlinkage long sys_poll(struct pollfd __user *ufds, 
unsigned int nfds,
 		timeout_jiffies = timeout_msecs;
 	}
 
-	return do_sys_poll(ufds, nfds, &timeout_jiffies);
+	ret = do_sys_poll(ufds, nfds, &timeout_jiffies);
+	if (ret == -EINTR) {
+		if (timeout_msecs > 0) {
+			struct restart_block *restart_block;
+			restart_block = &current_thread_info()->restart_block;
+			restart_block->fn = do_restart_poll;
+			restart_block->arg0 = (unsigned long)ufds;
+			restart_block->arg1 = nfds;
+			restart_block->arg2 = timeout_jiffies & 0xFFFFFFFF;
+			restart_block->arg3 = timeout_jiffies >> 32;
+			ret = -ERESTART_RESTARTBLOCK;
+		}
+	}
+	return ret;
 }
 
 #ifdef TIF_RESTORE_SIGMASK



Additional info:

Comment 2 Lenny Szubowicz 2013-08-15 12:26:30 UTC

Unfortunately, given the age of the problem report, it's relatively low severity, and the lifecycle stage that RHEL 5 is in, this problem is not going to be fixed in RHEL 5.

                                  -Lenny.

Note You need to log in before you can comment on or make changes to this bug.