Bug 1349641

Summary: Regression in 6.8: rlogin to another machine sometimes fails, sometimes succeeds
Product: Red Hat Enterprise Linux 6 Reporter: Michael Lampe <lampe>
Component: util-linux-ng Assignee: Karel Zak <kzak>
Status: CLOSED DUPLICATE QA Contact: qe-baseos-daemons
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.8   
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-06-23 20:34:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Michael Lampe 2016-06-23 20:28:22 UTC
Description of problem:

Machine B allows (passwordless) rlogin from machine A.

A> rlogin B
rlogin: connection closed.
A> rlogin B
rlogin: connection closed.
A> rlogin B
rlogin: connection closed.
A> rlogin B
B> exit
logout
rlogin: connection closed.
A> 

Version-Release number of selected component (if applicable):

util-linux-ng-2.17.2-12.24.el6.x86_64

How reproducible:

Always, when rlogin B is run several times in a row while B is under full load.

Steps to Reproduce:
1. Put machine B under full load, e.g. by running Intel's LINPACK benchmark binary. B has 2 CPUs, 20 cores and 40 threads.
2. Run rlogin B several times in a row, as in the example above.

Actual results:

Some rlogins fail, some succeed.

Expected results:

All rlogins succeed.

Additional info:

- This is a 6.8 regression. It worked in 6.7, and it works again if util-linux-ng is downgraded to util-linux-ng-2.17.2-12.18.el6.x86_64 or /bin/login is replaced with the version from util-linux-ng-2.17.2-12.18.el6.x86_64.

- /var/log/messages on B

Jun 23 22:01:26 B xinetd[16563]: START: login pid=16905 from=192.168.22.100
Jun 23 22:01:26 B xinetd[16563]: EXIT: login status=0 pid=16905 duration=0(sec)
Jun 23 22:01:28 B xinetd[16563]: START: login pid=16907 from=192.168.22.100
Jun 23 22:01:28 B xinetd[16563]: EXIT: login status=0 pid=16907 duration=0(sec)
Jun 23 22:01:29 B xinetd[16563]: START: login pid=16909 from=192.168.22.100
Jun 23 22:01:29 B xinetd[16563]: EXIT: login status=0 pid=16909 duration=0(sec)
Jun 23 22:01:30 B xinetd[16563]: START: login pid=16911 from=192.168.22.100
Jun 23 22:03:25 B xinetd[16563]: EXIT: login status=0 pid=16911 duration=8(sec)

These are the three failures followed by the one success from the example above.
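When collecting more attempts, the START/EXIT pairs can be checked mechanically. The following is a minimal Python sketch, assuming the xinetd syslog line format quoted above; the helpers pair_sessions and classify are hypothetical, and treating duration=0 as a failed login is only a heuristic (a user who logs in and out within the same second would look identical). It reads the duration field exactly as xinetd reports it rather than deriving it from the syslog timestamps.

```python
import re
from dataclasses import dataclass
from typing import Optional

# xinetd syslog lines as quoted in this report, e.g.
#   "... xinetd[16563]: START: login pid=16905 from=192.168.22.100"
#   "... xinetd[16563]: EXIT: login status=0 pid=16905 duration=0(sec)"
EVENT_RE = re.compile(r"xinetd\[\d+\]: (START|EXIT): (\S+) (.*)$")
FIELD_RE = re.compile(r"(\w+)=(\S+)")


@dataclass
class Attempt:
    pid: int                        # pid of the login server forked by xinetd
    start: str                      # syslog timestamp of the START line
    duration: Optional[int] = None  # seconds, as reported on the EXIT line


def pair_sessions(lines, service="login"):
    """Pair START/EXIT lines of one xinetd service by server pid (hypothetical helper)."""
    attempts = {}
    for line in lines:
        m = EVENT_RE.search(line)
        if not m or m.group(2) != service:
            continue
        event, fields = m.group(1), dict(FIELD_RE.findall(m.group(3)))
        pid = int(fields["pid"])
        if event == "START":
            # assumes the classic 15-character syslog prefix "Mon DD HH:MM:SS"
            attempts[pid] = Attempt(pid=pid, start=line[:15])
        elif pid in attempts:
            # "0(sec)" -> 0
            attempts[pid].duration = int(fields["duration"].split("(")[0])
    return list(attempts.values())


def classify(attempt):
    """Heuristic only: a login server that exits after 0 s likely never gave a shell."""
    if attempt.duration is None:
        return "running"
    return "failed" if attempt.duration == 0 else "ok"


SAMPLE_LOG = """\
Jun 23 22:01:26 B xinetd[16563]: START: login pid=16905 from=192.168.22.100
Jun 23 22:01:26 B xinetd[16563]: EXIT: login status=0 pid=16905 duration=0(sec)
Jun 23 22:01:28 B xinetd[16563]: START: login pid=16907 from=192.168.22.100
Jun 23 22:01:28 B xinetd[16563]: EXIT: login status=0 pid=16907 duration=0(sec)
Jun 23 22:01:29 B xinetd[16563]: START: login pid=16909 from=192.168.22.100
Jun 23 22:01:29 B xinetd[16563]: EXIT: login status=0 pid=16909 duration=0(sec)
Jun 23 22:01:30 B xinetd[16563]: START: login pid=16911 from=192.168.22.100
Jun 23 22:03:25 B xinetd[16563]: EXIT: login status=0 pid=16911 duration=8(sec)
""".splitlines()

if __name__ == "__main__":
    for a in pair_sessions(SAMPLE_LOG):
        print(a.start, a.pid, classify(a))
```

Applied to the excerpt above, this flags pids 16905, 16907 and 16909 as immediate exits and 16911 as the one real session.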

- xinetd debug output on B (xinetd running with '-d')

* Failed attempt

16/6/23@22:07:06: DEBUG: 17016 {main_loop} select returned 1
16/6/23@22:07:06: DEBUG: 17016 {server_start} Starting service login
16/6/23@22:07:06: DEBUG: 17016 {main_loop} active_services = 2
16/6/23@22:07:06: DEBUG: 17021 {exec_server} duping 8
16/6/23@22:07:06: DEBUG: 17016 {main_loop} active_services = 2
16/6/23@22:07:06: DEBUG: 17016 {main_loop} select returned 1
16/6/23@22:07:06: DEBUG: 17016 {check_pipe} Got signal 17 (Child exited)
16/6/23@22:07:06: DEBUG: 17016 {child_exit} waitpid returned = 17021
16/6/23@22:07:06: DEBUG: 17016 {server_end} login server 17021 exited
16/6/23@22:07:06: DEBUG: 17016 {svc_postmortem} Checking log size of login service
16/6/23@22:07:06: INFO: 17016 {conn_free} freeing connection
16/6/23@22:07:06: DEBUG: 17016 {child_exit} waitpid returned = -1
16/6/23@22:07:06: DEBUG: 17016 {main_loop} active_services = 2

* Successful attempt

16/6/23@22:07:56: DEBUG: 17016 {main_loop} select returned 1
16/6/23@22:07:56: DEBUG: 17016 {server_start} Starting service login
16/6/23@22:07:56: DEBUG: 17016 {main_loop} active_services = 2
16/6/23@22:07:56: DEBUG: 17024 {exec_server} duping 8
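To compare the two traces, a small parser can follow each forked login server from its {exec_server} line to the parent's {server_end} line. This is a minimal Python sketch, assuming the '-d' line layout shown above (timestamp, level, pid, {function}, message); child_outcomes is a hypothetical helper, and reading the pid column of the {exec_server} line as the forked server's pid is inferred from these traces (17021 is later reaped by waitpid), not taken from xinetd documentation.

```python
import re

# xinetd -d debug lines as quoted in this report, e.g.
#   "16/6/23@22:07:06: DEBUG: 17021 {exec_server} duping 8"
DEBUG_RE = re.compile(
    r"^(?P<ts>\S+): (?P<level>[A-Z]+): (?P<pid>\d+) \{(?P<func>\w+)\} (?P<msg>.*)$"
)
SERVER_END_RE = re.compile(r"^(?P<service>\S+) server (?P<child>\d+) exited$")


def child_outcomes(lines):
    """Map each forked server pid to its start and exit timestamps (hypothetical helper).

    Assumption inferred from the traces: the {exec_server} line is logged by the
    forked child, so its pid column is the server pid. {server_end} is logged by
    the parent and names the child. "exited" stays None if no exit is in the trace.
    """
    outcomes = {}
    for line in lines:
        m = DEBUG_RE.match(line.strip())
        if not m:
            continue
        if m["func"] == "exec_server":
            outcomes[int(m["pid"])] = {"started": m["ts"], "exited": None}
        elif m["func"] == "server_end":
            end = SERVER_END_RE.match(m["msg"])
            if end and int(end["child"]) in outcomes:
                outcomes[int(end["child"])]["exited"] = m["ts"]
    return outcomes


FAILED_TRACE = """\
16/6/23@22:07:06: DEBUG: 17016 {main_loop} select returned 1
16/6/23@22:07:06: DEBUG: 17016 {server_start} Starting service login
16/6/23@22:07:06: DEBUG: 17016 {main_loop} active_services = 2
16/6/23@22:07:06: DEBUG: 17021 {exec_server} duping 8
16/6/23@22:07:06: DEBUG: 17016 {main_loop} active_services = 2
16/6/23@22:07:06: DEBUG: 17016 {main_loop} select returned 1
16/6/23@22:07:06: DEBUG: 17016 {check_pipe} Got signal 17 (Child exited)
16/6/23@22:07:06: DEBUG: 17016 {child_exit} waitpid returned = 17021
16/6/23@22:07:06: DEBUG: 17016 {server_end} login server 17021 exited
16/6/23@22:07:06: DEBUG: 17016 {svc_postmortem} Checking log size of login service
16/6/23@22:07:06: INFO: 17016 {conn_free} freeing connection
16/6/23@22:07:06: DEBUG: 17016 {child_exit} waitpid returned = -1
16/6/23@22:07:06: DEBUG: 17016 {main_loop} active_services = 2
""".splitlines()

SUCCESS_TRACE = """\
16/6/23@22:07:56: DEBUG: 17016 {main_loop} select returned 1
16/6/23@22:07:56: DEBUG: 17016 {server_start} Starting service login
16/6/23@22:07:56: DEBUG: 17016 {main_loop} active_services = 2
16/6/23@22:07:56: DEBUG: 17024 {exec_server} duping 8
""".splitlines()

if __name__ == "__main__":
    print("failed: ", child_outcomes(FAILED_TRACE))
    print("success:", child_outcomes(SUCCESS_TRACE))
```

For the failed attempt the login server 17021 starts and exits within the same second; for the successful one, 17024 is still running when the excerpt ends.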

- My guess is that this is a race condition between in.rlogind and login.