Bug 164002 - a write to a disconnected RS-232 serial device can cause the process to hang in an uninterruptible state on exit()
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All Linux
Priority: medium
Severity: medium
Assigned To: Don Howard
QA Contact: Brian Brock
Blocks: 143573
Reported: 2005-07-22 15:53 EDT by Jason Vas Dias
Modified: 2007-11-30 17:07 EST

Doc Type: Bug Fix
Last Closed: 2007-10-19 14:57:33 EDT
Description Jason Vas Dias 2005-07-22 15:53:11 EDT
Description of problem:

If a process writes to a serial RS-232 device that is switched off,
and then calls exit(), it can enter an uninterruptible state where
it cannot be kill()-ed or ptrace()-ed.

For instance, mgetty (bug 162174), when configured to serve e.g. a serial
console over an RS-232 null-modem cable on ttyS0, is unable to exit if the
cable is not connected, and it blocks normal system shutdown because its
open lock file prevents /var from being unmounted.

With mgetty configured for a "direct" line on /dev/ttyS0 with 
/etc/mgetty+sendfax/mgetty.config containing:
"port ttyS0
    direct y
    toggle-dtr n
"
and /etc/inittab containing:
"
S0:234:respawn:/sbin/mgetty ttyS0
"
after mgetty has run for more than two minutes with the cable disconnected 
or the external modem switched off, it will time out and call exit(),
after it has written a login prompt to the device. Its exit() call
never completes and it cannot be killed or ptraced. 

"ps" shows it with the process name in brackets and it cannot be 
killed or traced:
# ps -ef | grep mgetty
root      5680     1  0 09:48 ttyS0    00:00:00 [mgetty]
# kill -9 5680
# gcore 5680
...
ptrace: Operation not permitted.
...

I then generated this sysrq-trigger output:

mgetty        S 00000000     0  5680      1          5824  5679 (L-TLB)
 Call Trace:   [<e004ccc8>] do_get_write_access [jbd] 0x328 (0xde6b5d74)
 [<c0124144>] schedule [kernel] 0x2f4 (0xde6b5d88)
 [<c013522c>] schedule_timeout [kernel] 0xbc (0xde6b5dcc)
 [<c015ff04>] __pte_chain_free [kernel] 0x24 (0xde6b5dec)
 [<c01b806a>] tty_wait_until_sent [kernel] 0x9a (0xde6b5e04)
 [<c01ca73c>] rs_close [kernel] 0x14c (0xde6b5e60)
 [<c01b355e>] release_dev [kernel] 0x6ce (0xde6b5e84)
 [<c0143ca1>] handle_mm_fault [kernel] 0xd1 (0xde6b5ec0)
 [<c013f2af>] free_one_pmd [kernel] 0x8f (0xde6b5eec)
 [<c013f202>] __free_pte [kernel] 0x52 (0xde6b5ef8)
 [<c01b39f2>] tty_release [kernel] 0x32 (0xde6b5f30)
 [<c016587a>] __fput [kernel] 0xea (0xde6b5f3c)
 [<c01639ee>] filp_close [kernel] 0x8e (0xde6b5f58)
 [<c012d01c>] put_files_struct [kernel] 0x6c (0xde6b5f74)
 [<c012d90a>] do_exit [kernel] 0x1ba (0xde6b5f90)
 [<c012dc6b>] do_group_exit [kernel] 0x8b (0xde6b5fac)

The kernel should not hang the process on exit in an uninterruptible state
when closing the tty device if the attached device is not switched on.
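
The trace shows the process blocked in tty_wait_until_sent(), called from
rs_close(). On Linux serial drivers that honour the closing_wait setting,
that wait can be turned off per port from user space. The following is only
a sketch under that assumption: the helper name is hypothetical, the call
typically needs root, and it has not been tried against this kernel.

```c
#include <sys/ioctl.h>
#include <linux/serial.h>

/* Hypothetical helper: ask the serial driver not to wait for unsent
 * output when fd is closed.
 * Returns 0 on success, or -1 with errno set if fd is not a serial port
 * or the driver rejects the request. */
int serial_disable_closing_wait(int fd)
{
    struct serial_struct ss;

    /* Read the current port settings so only closing_wait is changed. */
    if (ioctl(fd, TIOCGSERIAL, &ss) < 0)
        return -1;

    ss.closing_wait = ASYNC_CLOSING_WAIT_NONE;

    return ioctl(fd, TIOCSSERIAL, &ss) < 0 ? -1 : 0;
}
```

The same setting is available from the command line as
"setserial /dev/ttyS0 closing_wait none".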

Can't the kernel detect whether the device on the other end of the cable
is switched on and listening on the RS-232 line? If so, it should
not hang the process on a close() with unwritten data to a device that
is not switched on.
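
For reference, the only presence information RS-232 carries is the state of
the modem control lines, which user space can read with the TIOCMGET ioctl.
A minimal sketch follows, assuming the cable actually wires DSR or DCD
through; a three-wire null-modem cable carries neither, in which case nothing
can be detected this way. The function names are hypothetical.

```c
#include <sys/ioctl.h>

/* Interpret a modem-status bitmask as returned by TIOCMGET.
 * Returns 1 if the peer asserts DSR or DCD, otherwise 0. */
int peer_signals_present(int status)
{
    return (status & (TIOCM_DSR | TIOCM_CAR)) != 0;
}

/* Query the port open on fd.
 * Returns 1 or 0 as above, or -1 if the ioctl fails (for example when
 * fd is not a serial port). */
int serial_peer_present(int fd)
{
    int status;

    if (ioctl(fd, TIOCMGET, &status) < 0)
        return -1;

    return peer_signals_present(status);
}
```

A result of 0 only means that neither line is asserted; it does not
distinguish a powered-off device from a cable that omits those wires.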

When the device is switched on, then the kernel is able to write the data,
and the mgetty process is able to exit (so the kernel must have known that
it was switched off and should not have hung the process when it was trying
to do an exit).

Yes, I can probably fix this in mgetty by doing a tcflush(1,TCOFLUSH) before 
the exit(), but I think it is wrong for the kernel to prevent the exit() and
leave a process where it cannot be kill()-ed or ptrace()-ed, and where it can
potentially prevent the whole system from shutting down cleanly, just because 
it has written to a device that is disconnected - if this can be fixed in the
kernel, it should be.
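
The user-space workaround mentioned above could look roughly like this. It
is a sketch only, not the actual mgetty change: the helper names are
hypothetical, and whether discarding the output queue is enough to avoid the
hang on this kernel is an assumption that has not been verified.

```c
#include <stdlib.h>
#include <termios.h>

/* Discard output that has been written to the tty on fd but not yet
 * transmitted. Returns 0 on success, -1 with errno set on failure. */
int discard_pending_output(int fd)
{
    return tcflush(fd, TCOFLUSH);
}

/* Hypothetical exit path: drop any unsent output on fd, then exit.
 * A failed flush is ignored because the process is exiting anyway. */
void exit_after_flush(int fd, int code)
{
    (void)discard_pending_output(fd);
    exit(code);
}
```

In the mgetty case the call would be exit_after_flush(1, status), since the
login prompt is written to the tty on file descriptor 1.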

Version-Release number of selected component (if applicable):

kernel-2.4.21-32.ELsmp

How reproducible:
100%

Steps to Reproduce:
Run mgetty in direct mode for a disconnected tty
  
Actual results:
After 2 minutes, mgetty cannot be killed and the system cannot be shut down
cleanly (/var cannot be unmounted).

Expected results:
mgetty should be able to exit and the system should shut down cleanly.
Comment 1 Alan Cox 2005-07-22 17:51:23 EDT
Probably needs every serial driver to be modified and someone to look at the
spec in detail about close delay behaviour (same problem with ldisc switch in 2.6)
Comment 3 RHEL Product and Program Management 2007-10-19 14:57:33 EDT
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
 
For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.
