Bug 251038

Summary: poll system call behavior during suspend/resume
Product: Red Hat Enterprise Linux 5 Reporter: Lomesh Agarwal <lomesh.agarwal>
Component: kernelAssignee: Lenny Szubowicz <lszubowi>
Status: CLOSED WONTFIX QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium Docs Contact:
Priority: low    
Version: 5.0CC: chrisw
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-08-15 12:26:30 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Lomesh Agarwal 2007-08-06 18:05:36 UTC
Description of problem:
poll system call retunrs with EINTR when system goes to standby.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:
it should not return. below is the email exchange with Chris -
* Agarwal, Lomesh (lomesh.agarwal) wrote:
> > Chris Wright (chrisw) wrote:
> > Yes, please.
> [Agarwal, Lomesh] how do I open a bug against RHEL?

bugzilla.redhat.com -> enter new bug
Red Hat Enterprise Linux
Version 5
Component Kernel

Feel free to put me on Cc: list.

> >   Also note, turns out the patch will not work as is.  If you 
> > specify a (non-negative) timeout value and send STOP/CONT before 
> > timeout expires the poll will restart with original timeout.  Repeat 
> > this and you can see this process never returning to userspace.

> [Agarwal, Lomesh] non-negative timeout means infinite timeout and app 
> is

Negative timeout means infinite timeout, non-negative timeout means a real 
timeout value.  IOW, if you specifiy a timeout you'd expect to return to 
userspace at the end of the timeout instead of never.

> supposed to hang until one of the file descriptor is ready for IO or a 
> signal is sent. The only behavior change due to this patch would be 
> that app won't get control due to any signal which it does not handle 
> (signal which it handles will still return the control). I don't think 
> any app writer will rely on unhandled signal to get the control back.
> Also if this is a acceptable behavior in select then why is it not 
> acceptable in poll?

Well, it's slightly more subtle than that.  Both select and ppoll have a 
timeout structure that can be rewritten with time left until timeout, so they 
can easily restart without redoing the full timeout.  If updating the timeout 
structure fails, then select/ppoll will still return early with EINTR.

>   The full solution
> is more invasive, and we'll need to re-evaluate when it's been done.

Doesn't look as bad as I thought it might.  Here's a first cut.


 fs/select.c |   34 +++++++++++++++++++++++++++++++++-
 1 files changed, 33 insertions(+), 1 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index a974082..50e6d8e 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -736,10 +736,29 @@ out_fds:
 	return err;
 }
 
+long do_restart_poll(struct restart_block *restart_block) {
+	struct pollfd __user *ufds = (struct pollfd __user*)restart_block-
>arg0;
+	int nfds = restart_block->arg1;
+	s64 timeout = ((s64)restart_block->arg3<<32) | (s64)restart_block-
>arg2;
+	int ret;
+
+	restart_block->fn = do_no_restart_syscall;
+	ret = do_sys_poll(ufds, nfds, &timeout);
+	if (ret == -EINTR) {
+		restart_block->fn = do_restart_poll;
+		restart_block->arg2 = timeout & 0xFFFFFFFF;
+		restart_block->arg3 = timeout >> 32;
+		ret = -ERESTART_RESTARTBLOCK;
+	}
+	return ret;
+}
+
 asmlinkage long sys_poll(struct pollfd __user *ufds, unsigned int nfds,
 			long timeout_msecs)
 {
 	s64 timeout_jiffies;
+	int ret;
 
 	if (timeout_msecs > 0) {
 #if HZ > 1000
@@ -754,7 +773,20 @@ asmlinkage long sys_poll(struct pollfd __user *ufds, 
unsigned int nfds,
 		timeout_jiffies = timeout_msecs;
 	}
 
-	return do_sys_poll(ufds, nfds, &timeout_jiffies);
+	ret = do_sys_poll(ufds, nfds, &timeout_jiffies);
+	if (ret == -EINTR) {
+		if (timeout_msecs > 0) {
+			struct restart_block *restart_block;
+			restart_block = &current_thread_info()->restart_block;
+			restart_block->fn = do_restart_poll;
+			restart_block->arg0 = (unsigned long)ufds;
+			restart_block->arg1 = nfds;
+			restart_block->arg2 = timeout_jiffies & 0xFFFFFFFF;
+			restart_block->arg3 = timeout_jiffies >> 32;
+			ret = -ERESTART_RESTARTBLOCK;
+		}
+	}
+	return ret;
 }
 
 #ifdef TIF_RESTORE_SIGMASK



Additional info:

Comment 2 Lenny Szubowicz 2013-08-15 12:26:30 UTC
Unfortunately, given the age of the problem report, it's relatively low severity, and the lifecycle stage that RHEL 5 is in, this problem is not going to be fixed in RHEL 5.

                                  -Lenny.