
Bug 1212992

Summary: ksh login crash on disk full
Product: Red Hat Enterprise Linux 6
Reporter: Paulo Andrade <pandrade>
Component: ksh
Assignee: Michal Hlavinka <mhlavink>
Status: CLOSED ERRATA
QA Contact: Martin Kyral <mkyral>
Severity: high
Docs Contact:
Priority: high
Version: 6.6
CC: bnater, bs168, fkrska, jkejda, jkurik, mhlavink, mkyral, ovasik, pandrade, phracek, qe-baseos-apps, syangsao, vanhoof, zpytela
Target Milestone: rc
Keywords: ZStream
Target Release: ---
Hardware: All
OS: All
Whiteboard:
Fixed In Version: ksh-20120801-33.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1212993 1212994 1297319
Environment:
Last Closed: 2016-05-11 00:45:31 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1075802, 1172231, 1212993, 1212994, 1297319
Attachments:
Description                      Flags
ksh-20120801-nodiskspace.patch   none
write.c                          none
ksh-20120801-nodiskspace.patch   none

Description Paulo Andrade 2015-04-17 20:51:47 UTC
Quoting the upstream bug report at:
http://lists.research.att.com/pipermail/ast-users/2015q2/004751.html

"""
I have a user with a ksh crashing problem, and that has
some "Write error: No space left on device" messages
in /var/log/messages.

After some debugging, I created a chroot on a disk-image
file, added a test user, and slowly filled that file-backed
filesystem, e.g.

dd if=/dev/zero of=/mnt/tmp/zerosN bs=1M count=1024
dd if=/dev/zero of=/mnt/tmp/zerosN bs=1K count=2

until only about 12K were left. That reproduced the
problem, and I was able to debug it with valgrind and vgdb.
Debugging under these conditions is tricky: one cannot tell
valgrind to spawn gdb, because gdb itself would then fail
to start.

After following the code far enough, I learned that in the
places where it handles SH_JMPEXIT, there was almost no
handling of SH_JMPERREXIT.

ksh would eventually crash because the struct subshell
allocated on the stack in sh/subshell.c:sh_subshell() was
still referenced by the global subshell_data after
siglongjmp() unwound the stack, since the out-of-disk-space
errors were not fully handled. Before crashing, it printed a
message every time a pipe was created, e.g.:

/etc/profile: line 28: write to 3 failed [No space left on device]

until it eventually crashed due to corrupted memory, i.e. the
references to sh_subshell() stack data kept in the global
subshell_data. One thing that puzzled me in the core dump
analysis was that the prev field of subshell_data pointed to
itself at the time of the crash; that was later understood
and is expected.

The attached patch handles SH_JMPERREXIT in the code
paths where SH_JMPEXIT is handled; with it, the failed
login on a full disk ends in a pause() call:

---terminal 1---
$ valgrind -q --leak-check=full --free-fill=0x5a --vgdb=full \
    --vgdb-error=0 /bin/ksh -l
==17730== (action at startup) vgdb me ...
==17730==
==17730== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==17730==   /path/to/gdb /bin/ksh
==17730== and then give GDB the following command
==17730==   target remote | /usr/lib64/valgrind/../../bin/vgdb --pid=17730
==17730== --pid is optional if only one valgrind process is running
==17730==
==17730== Syscall param mount(type) points to unaddressable byte(s)
==17730==    at 0x563377A: mount (in /usr/lib64/libc-2.17.so)
==17730==    by 0x493E58: fs3d_mount (fs3d.c:115)
==17730==    by 0x493C8B: fs3d (fs3d.c:57)
==17730==    by 0x423E41: sh_init (init.c:1302)
==17730==    by 0x405CD3: sh_main (main.c:141)
==17730==    by 0x405B84: main (pmain.c:45)
==17730==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==17730==
==17730== (action on error) vgdb me ...
==17730== Continuing ...
/etc/profile: line 28: write to 3 failed [No space left on device]
---8<---

---terminal 2---
(gdb) c
Continuing.
^C
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00000000055fa470 in __pause_nocancel () from /lib64/libc.so.6
(gdb) bt
#0  0x00000000055fa470 in __pause_nocancel () from /lib64/libc.so.6
#1  0x000000000041e73d in sh_done (ptr=0x793360 <sh>, sig=255)
    at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/fault.c:665
#2  0x0000000000407407 in exfile (shp=0x4542, iop=0xff, fno=0)
    at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:604
#3  0x0000000000405c43 in sh_source (shp=0x793360 <sh>, iop=0x0, file=0x524804 <e_sysprofile> "/etc/profile")
    at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:109
#4  0x00000000004060e4 in sh_main (ac=2, av=0xfff000498, userinit=0x0)
    at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:202
#5  0x0000000000405b85 in main (argc=2, argv=0xfff000498)
    at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/pmain.c:45
(gdb)
---8<---
"""

Comment 1 Paulo Andrade 2015-04-17 20:54:58 UTC
Created attachment 1015754
ksh-20120801-nodiskspace.patch
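
The failure described in the quoted report (a stack-allocated struct subshell left linked into the global subshell_data after siglongjmp() unwinds past its frame) and the idea behind the patch (handle the error-exit code wherever the exit code is handled) can be sketched in isolation. This is a simplified model, not ksh code: current_sub, run_subshell(), failing_write(), JMP_EXIT and JMP_ERREXIT are hypothetical stand-ins for subshell_data, sh_subshell(), a failing write, SH_JMPEXIT and SH_JMPERREXIT.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Illustrative jump codes loosely modeled on SH_JMPEXIT / SH_JMPERREXIT. */
enum { JMP_NONE, JMP_EXIT, JMP_ERREXIT };

/* Stand-in for ksh's struct subshell: lives on the stack of the function
   running the subshell and is linked into a global list via prev. */
struct subshell {
    struct subshell *prev;
};

static struct subshell *current_sub = NULL;  /* plays the role of subshell_data */
static sigjmp_buf unwind;
static volatile int jmp_code = JMP_NONE;

/* Simulates a write failing with "No space left on device" deep inside
   the subshell, which unwinds with an error-exit code. */
static void failing_write(void)
{
    jmp_code = JMP_ERREXIT;
    siglongjmp(unwind, 1);
}

/* Runs one "subshell". With handle_errexit == 0 only JMP_EXIT unlinks the
   stack-allocated struct (the pre-patch pattern); with 1, JMP_ERREXIT is
   handled on the same path. Returns 1 if the global still pointed at this
   stack frame when the function was about to return (a dangling pointer). */
static int run_subshell(int handle_errexit)
{
    struct subshell sub;
    int leaked;

    sub.prev = current_sub;
    current_sub = &sub;
    jmp_code = JMP_NONE;

    if (sigsetjmp(unwind, 0) == 0)
        failing_write();          /* does not return */

    assert(jmp_code != JMP_NONE); /* only reachable via siglongjmp() */
    if (jmp_code == JMP_EXIT || (handle_errexit && jmp_code == JMP_ERREXIT))
        current_sub = sub.prev;   /* unlink before this frame goes away */

    leaked = (current_sub == &sub);
    if (leaked)
        current_sub = sub.prev;   /* demo only: never return a dangling pointer */
    return leaked;
}
```

run_subshell(0) reports the leak that, in ksh, left subshell_data pointing into dead stack memory; run_subshell(1) unlinks the frame on the error-exit path as well.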

Comment 3 Paulo Andrade 2015-04-20 17:19:45 UTC
Created attachment 1016463
write.c

Attached is the best test case I could come up with so far,
other than creating a chroot on some kind of (loop)
filesystem and leaving only a few KB free.
It may be possible to build one that does not need
LD_PRELOAD, perhaps by using /dev/full.
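
As background for the /dev/full idea: on Linux, every write to /dev/full fails with ENOSPC, the same error ksh reports on a full filesystem. The sketch below only confirms that errno; the helper name write_errno() is hypothetical, and whether redirecting ksh's writes to /dev/full actually reproduces the crash is not established in this report.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Attempts a one-byte write to `path` and reports the resulting errno:
   0 if the write succeeded, -1 if `path` could not be opened. */
static int write_errno(const char *path)
{
    int fd, err = 0;

    assert(path != NULL);
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    if (write(fd, "x", 1) < 0)
        err = errno;              /* saved before close() can change errno */
    close(fd);
    return err;
}
```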

To test:

$ gcc -fPIC -shared write.c -o write.so -ldl -D_GNU_SOURCE=1
$ LD_PRELOAD=$PWD/write.so ksh -l
/etc/profile: line 45: write to 3 failed [No space left on device]
Memory fault(coredump)

With the proposed patch applied, it hangs in a pause() call instead:

$ LD_PRELOAD=$PWD/write.so /home/pcpa/rhel/ksh/ksh-20120801/arch/linux.i386-64/bin/ksh -l
/etc/profile: line 45: write to 3 failed [No space left on device]
^C$

and pressing ^C returns to the calling shell.
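
The attached write.c is not reproduced here. The sketch below is a hypothetical shim of the same kind, not the attachment: it interposes write() via LD_PRELOAD, passes writes on descriptors 0 to 2 through to libc's write() (found with dlsym(RTLD_NEXT, ...)), and fails every other write with ENOSPC. The cut-off FIRST_FAILING_FD is an assumption chosen because the error messages above mention descriptor 3; whether this exact policy reproduces the crash has not been checked.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1   /* for RTLD_NEXT */
#endif
#include <assert.h>
#include <dlfcn.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical policy: stdin, stdout and stderr pass through; every
   other descriptor behaves as if the filesystem were full. */
#define FIRST_FAILING_FD 3

typedef ssize_t (*write_fn)(int, const void *, size_t);

/* When built as a shared object and loaded with LD_PRELOAD, this
   definition takes precedence over libc's write(). */
ssize_t write(int fd, const void *buf, size_t count)
{
    static write_fn real_write;

    if (fd >= FIRST_FAILING_FD) {
        errno = ENOSPC;           /* pretend the disk is full */
        return -1;
    }
    if (real_write == NULL) {
        /* Look up the next definition of write(), normally libc's. */
        real_write = (write_fn)dlsym(RTLD_NEXT, "write");
        assert(real_write != NULL);
    }
    return real_write(fd, buf, count);
}
```

It would be built and loaded the same way as the attachment, e.g. with a hypothetical file name: `gcc -fPIC -shared write_shim.c -o write.so -ldl` followed by `LD_PRELOAD=$PWD/write.so ksh -l`.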

Comment 5 Michal Hlavinka 2015-05-06 08:26:58 UTC
reproducible

Comment 6 Paulo Andrade 2015-05-08 21:30:53 UTC
Created attachment 1023598
ksh-20120801-nodiskspace.patch

I think it is better to pass 0 to sh_done(), because otherwise
it passes 255, which is not a valid signal number, as an
argument to sigaction().

Passing 0 makes it exit immediately, as a normal shell
exit would.
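
Why 255 ends in pause() rather than an exit can be modeled roughly. The backtrace shows sh_done() called with sig=255 and stopped in __pause_nocancel(); assuming, as this comment suggests, that sh_done() tries to install the default action for a nonzero sig and re-raise it before falling through to pause(), an invalid number means the re-raise never happens. On x86-64 Linux, signal numbers stop at 64, so sigaction(255, ...) fails with EINVAL. The enum and function names below are hypothetical; this is not ksh's sh_done().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical model of the exit decision discussed in this comment. */
enum exit_path {
    EXIT_NORMALLY,    /* sig == 0: plain exit with the shell's status */
    RERAISE_SIGNAL,   /* valid signal: reset to SIG_DFL and re-raise it */
    BLOCK_IN_PAUSE    /* invalid signal: re-raise fails, pause() waits */
};

/* Returns 1 if the system accepts `sig` as a signal number. Only queries
   the current disposition; nothing is changed or sent. */
static int is_valid_signal(int sig)
{
    struct sigaction old;
    return sigaction(sig, NULL, &old) == 0;
}

static enum exit_path exit_path_for(int sig)
{
    assert(sig >= 0);
    if (sig == 0)
        return EXIT_NORMALLY;
    if (is_valid_signal(sig))
        return RERAISE_SIGNAL;
    return BLOCK_IN_PAUSE;    /* e.g. 255: sigaction() fails with EINVAL */
}
```

Under this model, passing 0 selects the normal exit path, which matches the behavior described above.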

I noticed the issue while debugging a problem in tcsh:
ksh would fall into the pause() instead of exiting
if ^C was pressed during:

eval sleep 10

Comment 24 Michal Hlavinka 2016-01-26 15:28:12 UTC
I tried to reproduce this in a loop, but was not successful. What are your steps for reproducing this?

Comment 29 errata-xmlrpc 2016-05-11 00:45:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0932.html