1212992 – ksh login crash on disk full


Summary: ksh login crash on disk full

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	ksh
Sub Component:
Version:	6.6
Hardware:	All
OS:	All
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Michal Hlavinka
QA Contact:	Martin Kyral
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1075802 1172231 1212993 1212994 1297319

Reported:	2015-04-17 20:51 UTC by Paulo Andrade
Modified:	2023-08-21 15:05 UTC
CC List:	14 users
Fixed In Version:	ksh-20120801-33.el6
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	1212993 1212994 1297319
Environment:
Last Closed:	2016-05-11 00:45:31 UTC
Target Upstream Version:
Embargoed:

Attachments
ksh-20120801-nodiskspace.patch (869 bytes, patch) 2015-04-17 20:54 UTC, Paulo Andrade
write.c (470 bytes, text/plain) 2015-04-20 17:19 UTC, Paulo Andrade
ksh-20120801-nodiskspace.patch (883 bytes, patch) 2015-05-08 21:30 UTC, Paulo Andrade

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2016:0932	0	normal	SHIPPED_LIVE	ksh bug fix update	2016-05-10 22:54:18 UTC

Description Paulo Andrade 2015-04-17 20:51:47 UTC

Quoting upstream bug report at:
http://lists.research.att.com/pipermail/ast-users/2015q2/004751.html

"""
I have a user with a ksh crashing problem, and that has
some "Write error: No space left on device" messages
in /var/log/messages.

After some debugging, and creating a chroot on a file
disk image, and a test user, and slowly filling the
"on file" filesystem, e.g.

dd if=/dev/zero of=/mnt/tmp/zerosN bs=1M count=1024
dd if=/dev/zero of=/mnt/tmp/zerosN bs=1K count=2

until only around 12K was left, I managed to reproduce the
problem and debug it with valgrind and vgdb;
debugging under these conditions is tricky, as one cannot tell
valgrind to spawn gdb, because gdb itself would then fail
to start.

So, after following the code far enough, I learned that in the places
where it handles SH_JMPEXIT, there was almost no
handling of SH_JMPERREXIT.

ksh would eventually crash because the struct
subshell allocated on the stack in sh/subshell.c:sh_subshell
stayed assigned to the global subshell_data after it siglongjmp'ed
back up the stack, as the out-of-disk-space errors were not
fully handled. It would print a few messages, every time
a pipe was created, e.g.:

/etc/profile: line 28: write to 3 failed [No space left on device]

until eventually crashing due to corrupted memory; e.g. the
references to stack data from sh_subshell in the global
subshell_data. One thing that looked strange to me in the coredump analysis
was that the subshell_data prev field was pointing to itself when
it eventually crashed, which was later understood and expected...

The attached patch handles SH_JMPERREXIT in the code
paths where SH_JMPEXIT is handled, and the failed login on
a full disk ends in a pause() call:

---terminal 1---
$ valgrind -q --leak-check=full --free-fill=0x5a --vgdb=full --vgdb-error=0 /bin/ksh -l
==17730== (action at startup) vgdb me ...
==17730==
==17730== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==17730==   /path/to/gdb /bin/ksh
==17730== and then give GDB the following command
==17730==   target remote | /usr/lib64/valgrind/../../bin/vgdb --pid=17730
==17730== --pid is optional if only one valgrind process is running
==17730==
==17730== Syscall param mount(type) points to unaddressable byte(s)
==17730==    at 0x563377A: mount (in /usr/lib64/libc-2.17.so)
==17730==    by 0x493E58: fs3d_mount (fs3d.c:115)
==17730==    by 0x493C8B: fs3d (fs3d.c:57)
==17730==    by 0x423E41: sh_init (init.c:1302)
==17730==    by 0x405CD3: sh_main (main.c:141)
==17730==    by 0x405B84: main (pmain.c:45)
==17730==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==17730==
==17730== (action on error) vgdb me ...
==17730== Continuing ...
/etc/profile: line 28: write to 3 failed [No space left on device]
---8<---

---terminal 2---
(gdb) c
Continuing.
^C
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00000000055fa470 in __pause_nocancel () from /lib64/libc.so.6
(gdb) bt
#0  0x00000000055fa470 in __pause_nocancel () from /lib64/libc.so.6
#1  0x000000000041e73d in sh_done (ptr=0x793360 <sh>, sig=255) at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/fault.c:665
#2  0x0000000000407407 in exfile (shp=0x4542, iop=0xff, fno=0) at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:604
#3  0x0000000000405c43 in sh_source (shp=0x793360 <sh>, iop=0x0, file=0x524804 <e_sysprofile> "/etc/profile") at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:109
#4  0x00000000004060e4 in sh_main (ac=2, av=0xfff000498, userinit=0x0) at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:202
#5  0x0000000000405b85 in main (argc=2, argv=0xfff000498) at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/pmain.c:45
(gdb)
---8<---
"""
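
The failure mode described in the quoted report can be illustrated outside of ksh. The following is a minimal sketch, not ksh code: `struct frame`, `current_frame`, `run_nested`, `chain_is_dangling`, and the `JMP_*` constants are hypothetical names that only loosely mirror `struct subshell`, `subshell_data`, `SH_JMPEXIT`, and `SH_JMPERREXIT`. It assumes the essence of the bug is a global pointer left aimed at a stack frame that a siglongjmp has already abandoned, because the cleanup runs for one jump mode but not the other.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical jump modes, loosely modeled on SH_JMPEXIT / SH_JMPERREXIT. */
enum jump_mode { JMP_NONE = 0, JMP_EXIT = 1, JMP_ERREXIT = 2 };

/* A stack-allocated record that a global pointer refers to while active. */
struct frame {
    struct frame *prev;
};

struct frame *current_frame = NULL;   /* plays the role of subshell_data */
static sigjmp_buf jump_target;

/* Links its own stack frame into the global chain, then either returns
 * normally or jumps out with the requested mode. */
static void run_nested(enum jump_mode mode)
{
    struct frame self;

    self.prev = current_frame;
    current_frame = &self;

    if (mode != JMP_NONE)
        siglongjmp(jump_target, mode);   /* skips the unlink below */

    current_frame = self.prev;           /* normal path: unlink first */
}

/* Runs run_nested() and reports whether the global chain was left pointing
 * at a dead stack frame. When handle_errexit is 0, only JMP_EXIT triggers
 * the cleanup, which is the situation the report describes. */
int chain_is_dangling(enum jump_mode mode, int handle_errexit)
{
    struct frame *saved = current_frame;
    int dangling;

    switch (sigsetjmp(jump_target, 0)) {
    case 0:
        run_nested(mode);
        break;
    case JMP_EXIT:
        current_frame = saved;           /* restore the pre-call state */
        break;
    case JMP_ERREXIT:
        if (handle_errexit)
            current_frame = saved;       /* the extra handling */
        break;
    }

    dangling = (current_frame != saved);
    current_frame = saved;               /* reset so calls are independent */
    return dangling;
}
```

Calling `chain_is_dangling(JMP_ERREXIT, 0)` reports a stale pointer, while passing `1` as the second argument does not. Any later code that followed the stale pointer would read stack memory that has since been reused.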

Comment 1 Paulo Andrade 2015-04-17 20:54:58 UTC

Created attachment 1015754
ksh-20120801-nodiskspace.patch

Comment 3 Paulo Andrade 2015-04-20 17:19:45 UTC

Created attachment 1016463
write.c

  Attached is the best test case I could come up with so far,
other than creating a chroot on some kind of (loop)
filesystem and leaving only a few KB free.
Maybe it is possible to somehow get one without needing
LD_PRELOAD, e.g. by using /dev/full.
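
As background for the /dev/full idea: on Linux, writes to that device always fail with ENOSPC, which is the same error a full filesystem produces. The sketch below only demonstrates that property. `errno_from_dev_full` is a hypothetical helper name, and nothing here shows how to point ksh's own descriptors at the device.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Opens /dev/full, attempts a one-byte write, and returns the errno that
 * the write produced (0 if it unexpectedly succeeds, -1 if the device
 * cannot be opened at all). */
int errno_from_dev_full(void)
{
    int err = 0;
    int fd = open("/dev/full", O_WRONLY);

    if (fd < 0)
        return -1;
    if (write(fd, "x", 1) < 0)
        err = errno;
    close(fd);
    return err;
}
```

On a system that provides the device, the return value should equal `ENOSPC`.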

  To test:

$ gcc -fPIC  -shared write.c -o write.so -ldl -D_GNU_SOURCE=1
$ LD_PRELOAD=$PWD/write.so ksh -l
/etc/profile: line 45: write to 3 failed [No space left on device]
Memory fault(coredump)

With the proposed patch applied, it hangs in a pause() call:

$ LD_PRELOAD=$PWD/write.so /home/pcpa/rhel/ksh/ksh-20120801/arch/linux.i386-64/bin/ksh -l
/etc/profile: line 45: write to 3 failed [No space left on device]
^C$

and after ^C it returns to the main shell.
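
The attached write.c is not reproduced in this report. For readers without access to it, the following is a hypothetical stand-in showing how an LD_PRELOAD shim of this kind can work. It is an assumption-laden sketch, not the attachment: it fails every write to a descriptor above stderr with ENOSPC and forwards the rest through the raw system call. With this variant the `-ldl` and `-D_GNU_SOURCE=1` flags from the build line are unnecessary but harmless. Whether it reproduces the crash is not something the sketch establishes.

```c
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for a "disk full" shim; NOT the attached write.c.
 * Writes to stdin/stdout/stderr are forwarded to the kernel unchanged;
 * writes to any other descriptor fail as if the filesystem were full. */
ssize_t write(int fd, const void *buf, size_t count)
{
    if (fd > STDERR_FILENO) {
        errno = ENOSPC;   /* "No space left on device" */
        return -1;
    }
    /* Forward via the raw system call so the shim does not call itself. */
    return (ssize_t)syscall(SYS_write, fd, buf, count);
}
```

Built as a shared object and preloaded, a definition like this takes precedence over the C library's `write` for the preloaded process.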

Comment 5 Michal Hlavinka 2015-05-06 08:26:58 UTC

reproducible

Comment 6 Paulo Andrade 2015-05-08 21:30:53 UTC

Created attachment 1023598
ksh-20120801-nodiskspace.patch

I think it may be better to pass 0 to sh_done, because otherwise
it would pass 255 as an argument to sigaction.

Passing 0 would cause it to exit immediately, like a
normal shell execution.

I only noticed the issue while debugging a problem
in tcsh: ksh would fall back to the pause(), instead
of exiting, if ^C was pressed during:

eval sleep 10

Comment 24 Michal Hlavinka 2016-01-26 15:28:12 UTC

I tried to reproduce this in a loop, but was not successful. What are your steps for reproducing this?

Comment 29 errata-xmlrpc 2016-05-11 00:45:31 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0932.html
