Bug 1212993 - ksh login crash on disk full
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: ksh
Version: 7.1
Hardware: All
OS: All
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Michal Hlavinka
QA Contact: BaseOS QE - Apps
Keywords: ZStream
Depends On: 1212992 1212994
Blocks: 1133060 1227420
Reported: 2015-04-17 16:53 EDT by Paulo Andrade
Modified: 2015-11-20 05:29 EST
CC List: 6 users

See Also:
Fixed In Version: ksh-20120801-24.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1212992
: 1227420
Environment:
Last Closed: 2015-11-20 05:29:56 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
ksh-20120801-nodiskspace.patch (869 bytes, patch)
2015-04-17 16:55 EDT, Paulo Andrade
ksh-20120801-nodiskspace.patch (883 bytes, patch)
2015-05-08 17:32 EDT, Paulo Andrade

Description Paulo Andrade 2015-04-17 16:53:26 EDT
+++ This bug was initially created as a clone of Bug #1212992 +++

Quoting upstream bug report at:
http://lists.research.att.com/pipermail/ast-users/2015q2/004751.html

"""
I have a user with a ksh crashing problem, and that has
some "Write error: No space left on device" messages
in /var/log/messages.

After some debugging, creating a chroot on a disk image
file with a test user, and slowly filling that file-backed
filesystem, e.g.

dd if=/dev/zero of=/mnt/tmp/zerosN bs=1M count=1024
dd if=/dev/zero of=/mnt/tmp/zerosN bs=1K count=2

until only about 12K was left, I managed to reproduce the
problem and debug it with valgrind and vgdb. Debugging
under these conditions is tricky: one cannot tell valgrind
to spawn gdb, because gdb itself would then fail to start.

So, after following the code far enough, I learned that in
places where it handles SH_JMPEXIT, handling of SH_JMPERREXIT
was almost nonexistent.

ksh would eventually crash because the global subshell_data
was left pointing at the struct subshell allocated on the stack
in sh/subshell.c:sh_subshell after a siglongjmp, caused by the
incompletely handled out-of-disk-space errors, unwound that
stack. It would print a message every time a pipe was created,
e.g.:

/etc/profile: line 28: write to 3 failed [No space left on device]

until eventually crashing due to corrupted memory, namely the
references to sh_subshell stack data held in the global
subshell_data. One thing that seemed strange to me in the core
dump analysis was that the prev field of subshell_data pointed
to itself when it finally crashed, which was later understood
and expected...

The attached patch handles SH_JMPERREXIT in the code
paths where SH_JMPEXIT is handled, and the failed login
on a full disk now ends in a pause() call:

---terminal 1---
$ valgrind -q --leak-check=full --free-fill=0x5a --vgdb=full --vgdb-error=0 /bin/ksh -l
==17730== (action at startup) vgdb me ...
==17730==
==17730== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==17730==   /path/to/gdb /bin/ksh
==17730== and then give GDB the following command
==17730==   target remote | /usr/lib64/valgrind/../../bin/vgdb --pid=17730
==17730== --pid is optional if only one valgrind process is running
==17730==
==17730== Syscall param mount(type) points to unaddressable byte(s)
==17730==    at 0x563377A: mount (in /usr/lib64/libc-2.17.so)
==17730==    by 0x493E58: fs3d_mount (fs3d.c:115)
==17730==    by 0x493C8B: fs3d (fs3d.c:57)
==17730==    by 0x423E41: sh_init (init.c:1302)
==17730==    by 0x405CD3: sh_main (main.c:141)
==17730==    by 0x405B84: main (pmain.c:45)
==17730==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==17730==
==17730== (action on error) vgdb me ...
==17730== Continuing ...
/etc/profile: line 28: write to 3 failed [No space left on device]
---8<---

---terminal 2---
(gdb) c
Continuing.
^C
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00000000055fa470 in __pause_nocancel () from /lib64/libc.so.6
(gdb) bt
#0  0x00000000055fa470 in __pause_nocancel () from /lib64/libc.so.6
#1  0x000000000041e73d in sh_done (ptr=0x793360 <sh>, sig=255) at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/fault.c:665
#2  0x0000000000407407 in exfile (shp=0x4542, iop=0xff, fno=0) at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:604
#3  0x0000000000405c43 in sh_source (shp=0x793360 <sh>, iop=0x0, file=0x524804 <e_sysprofile> "/etc/profile")
    at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:109
#4  0x00000000004060e4 in sh_main (ac=2, av=0xfff000498, userinit=0x0) at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:202
#5  0x0000000000405b85 in main (argc=2, argv=0xfff000498) at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/pmain.c:45
(gdb)
---8<---
"""
Comment 1 Paulo Andrade 2015-04-17 16:55:33 EDT
Created attachment 1015755 [details]
ksh-20120801-nodiskspace.patch
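The report describes the patch only at a high level: SH_JMPERREXIT receives the same handling as SH_JMPEXIT. A minimal C sketch of that shape follows. The enum and function names are hypothetical stand-ins, not the real ksh identifiers or values, and the actual patch may test the jump value differently.

```c
#include <stdbool.h>

/* Hypothetical jump-reason codes modeled on the names in the report;
 * the numeric values are illustrative only. */
enum jmp_reason {
    JMP_NONE = 0,
    JMP_ERR,       /* ordinary error, shell keeps running */
    JMP_EXIT,      /* stands in for SH_JMPEXIT: normal "exit the shell" unwind */
    JMP_ERREXIT    /* stands in for SH_JMPERREXIT: exit caused by an error */
};

/* Before: only the normal exit triggers exit-path cleanup, so an
 * error-driven exit skips it and leaves state behind. */
bool needs_exit_cleanup_before(enum jmp_reason r) {
    return r == JMP_EXIT;
}

/* After: the error exit takes the same path as the normal exit. */
bool needs_exit_cleanup_after(enum jmp_reason r) {
    switch (r) {
    case JMP_EXIT:
    case JMP_ERREXIT:
        return true;
    default:
        return false;
    }
}
```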
Comment 3 Paulo Andrade 2015-05-08 17:32:31 EDT
Created attachment 1023599 [details]
ksh-20120801-nodiskspace.patch

I think it may be better to pass 0 to sh_done, because otherwise
it passes 255 as the signal number to sigaction.

Passing 0 would cause it to exit immediately, like a
normal shell execution.
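The backtrace in the description shows sh_done called with sig=255. That value reads like an exit status rather than a signal number; on Linux, valid signal numbers stop well below 255, so sigaction rejects it with EINVAL. A small sketch, assuming a POSIX system, checks whether a number is an acceptable signal by querying its disposition (a NULL new action changes nothing). is_valid_signal() is a hypothetical helper, not part of ksh.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Returns 1 if sig is a signal number sigaction() accepts, 0 otherwise.
 * Passing NULL as the new action only reads the current disposition,
 * so the check has no side effects. */
int is_valid_signal(int sig) {
    struct sigaction old;
    return sigaction(sig, NULL, &old) == 0;
}
```

With this helper, is_valid_signal(SIGINT) is 1, while is_valid_signal(255) and is_valid_signal(0) are both 0.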

I just noticed the issue while debugging a problem
in tcsh: ksh would fall into the pause() instead
of exiting if ^C was pressed during:

eval sleep 10
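A common POSIX idiom for a process that wants to terminate "by signal" (so its parent sees WIFSIGNALED) is to reset the signal to its default action, unblock it, and re-raise it. If the number is not a valid signal, the re-raise simply fails, and code that then calls pause() waits indefinitely; that is consistent with the hang described above, although the exact ksh code path is not shown in this bug. The sketch below uses hypothetical helpers, terminate_with() and run_in_child(), and falls back to _exit() instead of pause().

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Terminates the calling process by re-raising sig with its default action.
 * If sig is invalid, cannot be reset, or its default action does not kill
 * the process, fall back to a plain exit status instead of blocking. */
void terminate_with(int sig, int status) {
    if (sig > 0) {
        struct sigaction sa;
        sa.sa_handler = SIG_DFL;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        if (sigaction(sig, &sa, NULL) == 0) {
            sigset_t set;
            sigemptyset(&set);
            sigaddset(&set, sig);
            sigprocmask(SIG_UNBLOCK, &set, NULL);
            raise(sig);  /* does not return if the default action terminates */
        }
    }
    _exit(status & 0xff);
}

/* Usage helper: runs terminate_with() in a child process and returns the
 * child's wait status, or -1 if fork() or waitpid() fails. */
int run_in_child(int sig, int status) {
    int wstatus;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        terminate_with(sig, status);  /* never returns */
    if (waitpid(pid, &wstatus, 0) < 0)
        return -1;
    return wstatus;
}
```

For example, run_in_child(SIGTERM, 1) yields a status for which WIFSIGNALED is true and WTERMSIG is SIGTERM, while run_in_child(255, 255) yields a normal exit with WEXITSTATUS 255 rather than a hung process.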
Comment 8 Jan Kurik 2015-11-20 05:29:56 EST
This bug has been closed as CURRENTRELEASE due to delivery of the fix in a z-stream. As the component is not on ACL, the fix is currently included in y-stream as well.

For more information please see the zstream process documentation:
* https://engineering.redhat.com/trac/ZStream/attachment/wiki/WikiStart/Z-Stream_process_update_4.odp .
