Bug 618743

Summary:	[RHEL6] malloc's error path deadlocks
Product:	Red Hat Enterprise Linux 6	Reporter:	Adam Jackson <ajax>
Component:	glibc	Assignee:	Andreas Schwab <schwab>
Status:	CLOSED DUPLICATE	QA Contact:	qe-baseos-tools-bugs
Severity:	urgent	Docs Contact:
Priority:	urgent
Version:	6.1	CC:	cmeadors, dgregor, drepper, ebachalo, fweimer, gholms, jakub, jburke, jkurik, jwest, mgordon, moshiro, myamazak
Target Milestone:	rc
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:	618356	Environment:
Last Closed:	2011-06-03 13:45:03 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	658636

Description Adam Jackson 2010-07-27 15:57:02 UTC

+++ This bug was initially created as a clone of Bug #618356 +++

Description of problem:
When launching emacs, I periodically get a "hang" crash. The system GUI is hung and I can't do anything, but I am still able to ssh into the system.

Version-Release number of selected component (if applicable):
xorg-x11-server-Xorg-1.7.7-21.el6.x86_64

How reproducible:
Intermittent

Steps to Reproduce:
1. Install RHEL6.0-Snapshot-7-Refresh
2. Log in to the desktop
3. Open a terminal window and run: emacs /tmp/foo &
  
Actual results:
[mi] EQ overflowing. The server is probably stuck in an infinite loop.

Backtrace:
0: /usr/bin/Xorg (xorg_backtrace+0x28) [0x469138]
1: /usr/bin/Xorg (mieqEnqueue+0x1f4) [0x4a2fe4]
2: /usr/bin/Xorg (xf86PostMotionEventP+0xc4) [0x4739d4]
3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f0c40a09000+0x524f) [0x7f0c40a0e24f]
4: /usr/bin/Xorg (0x400000+0x74897) [0x474897]
5: /usr/bin/Xorg (0x400000+0x10dad3) [0x50dad3]
6: /lib64/libpthread.so.0 (0x31b4800000+0xf4c0) [0x31b480f4c0]
7: /lib64/libc.so.6 (0x31b4400000+0xf0dce) [0x31b44f0dce]
8: /lib64/libc.so.6 (0x31b4400000+0x7c1f8) [0x31b447c1f8]
9: /lib64/libc.so.6 (__libc_malloc+0x62) [0x31b4479af2]
10: /lib64/libc.so.6 (0x31b4400000+0x6fdbb) [0x31b446fdbb]
11: /lib64/libc.so.6 (0x31b4400000+0x75736) [0x31b4475736]
12: /lib64/libc.so.6 (0x31b4400000+0x78e78) [0x31b4478e78]
13: /lib64/libc.so.6 (__libc_malloc+0x6d) [0x31b4479afd]
14: /usr/bin/Xorg (miRegionCreate+0x23) [0x454be3]
15: /usr/bin/Xorg (miRectsToRegion+0x33) [0x455e43]
16: /usr/bin/Xorg (miChangeClip+0x8e) [0x55491e]
17: /usr/lib64/xorg/modules/libexa.so (0x7f0c42287000+0x2c6d) [0x7f0c42289c6d]
18: /usr/bin/Xorg (0x400000+0xd42b4) [0x4d42b4]
19: /usr/bin/Xorg (SetClipRects+0xbf) [0x4368ef]
20: /usr/bin/Xorg (0x400000+0x297a6) [0x4297a6]
21: /usr/bin/Xorg (0x400000+0x2ab5c) [0x42ab5c]
22: /usr/bin/Xorg (0x400000+0x21ffa) [0x421ffa]
23: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x31b441ec5d]
24: /usr/bin/Xorg (0x400000+0x21bb9) [0x421bb9]


Expected results:
The X server should continue to operate normally.

Additional info:

--- Additional comment from jburke on 2010-07-26 14:36:59 EDT ---

nouveau - nVidia Corporation G96 [Quadro FX 580] (rev a1)

--- Additional comment from pm-rhel on 2010-07-26 14:42:38 EDT ---

Since this issue was entered in bugzilla, the release flag has been
set to ? to ensure that it is properly evaluated for this release.

--- Additional comment from jburke on 2010-07-26 14:42:56 EDT ---

Created an attachment (id=434494)
xorg log

--- Additional comment from pm-rhel on 2010-07-26 14:57:38 EDT ---

This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 2 Adam Jackson 2010-07-27 17:05:15 UTC

The backtrace above shows malloc calling __libc_message(abort=1), which in turn calls malloc again.  This deadlocks.  That's awful.  Hung processes are worse than crashed processes.

This needs to either use some statically allocated bit of memory in .bss, or use sbrk() and just cope with leaking if the app is insane enough to trap SIGABRT and try to carry on.

Comment 3 Adam Jackson 2010-07-27 17:05:47 UTC

Reassigning back to glibc.  I didn't change component for nothing.

Comment 4 Andreas Schwab 2010-07-28 15:40:27 UTC

This has nothing to do with catching SIGABRT but with having the abort message visible in coredumps.

Comment 6 Adam Jackson 2010-07-29 14:28:29 UTC

(In reply to comment #4)
> This has nothing to do with catching SIGABRT but with having the abort message
> visible in coredumps.    

If the app catches SIGABRT, it may continue instead of exiting.  If it does, and then the abort is raised _again_, we try to free() the old message.  That's what I meant by "use sbrk() and just leak"; we could allocate the storage for the crash message with sbrk(), but we'd have no way of freeing it.

We could call into some internal bit of malloc that assumes the lock has already been taken, but that's dangerous: we're already at this point _because_ malloc's bookkeeping is corrupted.

We could use alloca, but then you'd get no record of the crash if the app does catch SIGABRT.

We could use mmap, but then you'd leak maps instead of leaking heap.

Or we could use a static buffer in .bss, but then that's additional data space in every process.

But really, at this point in a process's death throes, who cares?  Allocate with sbrk because it's easy.  Anyone trying to survive a SIGABRT is already in a state of sin.
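
For comparison, the static-buffer option from the list above might look roughly like the sketch below. It is only an illustration under stated assumptions: the buffer name, its 256-byte size, and the three helper functions are all hypothetical, and none of this is the fix that glibc adopted. The point is that saving and printing the message never enters the allocator, at the cost of a fixed amount of .bss in every process and truncation of long messages.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical fixed-size buffer living in .bss.  The size is an
   arbitrary choice for this sketch. */
static char abort_msg_buf[256];

/* Copy msg into the static buffer, truncating if it does not fit, and
   return the number of bytes stored.  No allocator calls are made, so
   this cannot re-enter malloc. */
static size_t save_abort_message(const char *msg)
{
    size_t len = strlen(msg);
    if (len >= sizeof abort_msg_buf)
        len = sizeof abort_msg_buf - 1;
    memcpy(abort_msg_buf, msg, len);
    abort_msg_buf[len] = '\0';
    return len;
}

/* Accessor for the saved message.  Because the buffer is static, it
   stays readable afterwards, for example from a core dump. */
static const char *saved_abort_message(void)
{
    return abort_msg_buf;
}

/* Best-effort write of the saved message to stderr using write(2),
   which does not allocate.  A short or failed write is ignored. */
static void emit_abort_message(void)
{
    ssize_t n = write(STDERR_FILENO, abort_msg_buf, strlen(abort_msg_buf));
    (void)n;
}
```

A caller on the error path would run save_abort_message() followed by emit_abort_message() before raising SIGABRT. If the app catches the signal and aborts again, the buffer is simply overwritten, so there is nothing to free and nothing to leak.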

Comment 12 Andreas Schwab 2011-01-10 09:23:13 UTC

*** Bug 664365 has been marked as a duplicate of this bug. ***

Comment 20 Eric Bachalo 2011-02-24 21:57:11 UTC

Moving to RHEL 6.2 release, as no fix is upstream yet.  This will need to be fixed upstream before it is considered for a RHEL release.

Comment 30 Andreas Schwab 2011-06-03 13:45:03 UTC


*** This bug has been marked as a duplicate of bug 676591 ***