618743 – [RHEL6] malloc's error path deadlocks



Summary: [RHEL6] malloc's error path deadlocks

Keywords:
Status:	CLOSED DUPLICATE of bug 676591
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	glibc
Sub Component:
Version:	6.1
Hardware:	All
OS:	Linux
Priority:	urgent
Severity:	urgent
Target Milestone:	rc
Target Release:	---
Assignee:	Andreas Schwab
QA Contact:	qe-baseos-tools-bugs
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	664365
Depends On:
Blocks:	GSS_6_2_PROPOSED

Reported:	2010-07-27 15:57 UTC by Adam Jackson
Modified:	2018-11-27 21:45 UTC
CC List:	13 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:	618356
Environment:
Last Closed:	2011-06-03 13:45:03 UTC
Target Upstream Version:
Embargoed:


Links
System	ID	Private	Priority	Status	Summary	Last Updated
Sourceware	11901	0	None	None	None	Never

Description Adam Jackson 2010-07-27 15:57:02 UTC

+++ This bug was initially created as a clone of Bug #618356 +++

Description of problem:
When launching emacs, I periodically get a "hang" crash: the system GUI is hung and I can't do anything, although I am still able to ssh into the system.

Version-Release number of selected component (if applicable):
xorg-x11-server-Xorg-1.7.7-21.el6.x86_64

How reproducible:
Intermittent

Steps to Reproduce:
1. Install RHEL6.0-Snapshot-7-Refresh
2. Log in to the desktop
3. Open a terminal window and run: emacs /tmp/foo &

Actual results:
[mi] EQ overflowing. The server is probably stuck in an infinite loop.

Backtrace:
0: /usr/bin/Xorg (xorg_backtrace+0x28) [0x469138]
1: /usr/bin/Xorg (mieqEnqueue+0x1f4) [0x4a2fe4]
2: /usr/bin/Xorg (xf86PostMotionEventP+0xc4) [0x4739d4]
3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f0c40a09000+0x524f) [0x7f0c40a0e24f]
4: /usr/bin/Xorg (0x400000+0x74897) [0x474897]
5: /usr/bin/Xorg (0x400000+0x10dad3) [0x50dad3]
6: /lib64/libpthread.so.0 (0x31b4800000+0xf4c0) [0x31b480f4c0]
7: /lib64/libc.so.6 (0x31b4400000+0xf0dce) [0x31b44f0dce]
8: /lib64/libc.so.6 (0x31b4400000+0x7c1f8) [0x31b447c1f8]
9: /lib64/libc.so.6 (__libc_malloc+0x62) [0x31b4479af2]
10: /lib64/libc.so.6 (0x31b4400000+0x6fdbb) [0x31b446fdbb]
11: /lib64/libc.so.6 (0x31b4400000+0x75736) [0x31b4475736]
12: /lib64/libc.so.6 (0x31b4400000+0x78e78) [0x31b4478e78]
13: /lib64/libc.so.6 (__libc_malloc+0x6d) [0x31b4479afd]
14: /usr/bin/Xorg (miRegionCreate+0x23) [0x454be3]
15: /usr/bin/Xorg (miRectsToRegion+0x33) [0x455e43]
16: /usr/bin/Xorg (miChangeClip+0x8e) [0x55491e]
17: /usr/lib64/xorg/modules/libexa.so (0x7f0c42287000+0x2c6d) [0x7f0c42289c6d]
18: /usr/bin/Xorg (0x400000+0xd42b4) [0x4d42b4]
19: /usr/bin/Xorg (SetClipRects+0xbf) [0x4368ef]
20: /usr/bin/Xorg (0x400000+0x297a6) [0x4297a6]
21: /usr/bin/Xorg (0x400000+0x2ab5c) [0x42ab5c]
22: /usr/bin/Xorg (0x400000+0x21ffa) [0x421ffa]
23: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x31b441ec5d]
24: /usr/bin/Xorg (0x400000+0x21bb9) [0x421bb9]


Expected results:
Should continue to operate normally

Additional info:

--- Additional comment from jburke on 2010-07-26 14:36:59 EDT ---

nouveau - nVidia Corporation G96 [Quadro FX 580] (rev a1)

--- Additional comment from pm-rhel on 2010-07-26 14:42:38 EDT ---

Since this issue was entered in bugzilla, the release flag has been
set to ? to ensure that it is properly evaluated for this release.

--- Additional comment from jburke on 2010-07-26 14:42:56 EDT ---

Created an attachment (id=434494)
xorg log

--- Additional comment from pm-rhel on 2010-07-26 14:57:38 EDT ---

This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 2 Adam Jackson 2010-07-27 17:05:15 UTC

The backtrace above shows malloc calling __libc_message(abort=1), which in turn calls malloc. Because the outer malloc call still holds the arena lock, this deadlocks. That's awful: hung processes are worse than crashed processes.

Either this needs to use some statically allocated bit of memory in .bss, or use sbrk() and just cope with leaking if the app is insane enough to trap SIGABRT and try to carry on.

Comment 3 Adam Jackson 2010-07-27 17:05:47 UTC

Reassigning back to glibc.  I didn't change component for nothing.

Comment 4 Andreas Schwab 2010-07-28 15:40:27 UTC

This has nothing to do with catching SIGABRT, but with having the abort message visible in coredumps.

Comment 6 Adam Jackson 2010-07-29 14:28:29 UTC

(In reply to comment #4)
> This has nothing to do with catching SIGABRT but with having the abort message
> visible in coredumps.

If the app catches SIGABRT, it may continue instead of exiting.  If it does, and then the abort is raised _again_, we try to free() the old message.  That's what I meant by "use sbrk() and just leak"; we could allocate the storage for the crash message with sbrk(), but we'd have no way of freeing it.
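The sbrk() option might look like the following sketch. The names last_abort_msg and store_abort_msg_sbrk are hypothetical, and the sketch does not address thread safety: nothing serializes this sbrk() call with malloc's own use of the program break.

```c
#define _DEFAULT_SOURCE /* exposes sbrk() in <unistd.h> */
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical global holding the most recent abort message. */
static const char *last_abort_msg;

/* Copy msg into memory taken directly from sbrk(), never touching
 * malloc's locks or bookkeeping. There is deliberately no free: if the
 * app survives SIGABRT and this runs again, the previous copy leaks.
 * Returns the stored copy, or NULL if sbrk() fails. */
static const char *store_abort_msg_sbrk(const char *msg)
{
    size_t len = strlen(msg) + 1;
    void *p = sbrk((intptr_t) len);
    if (p == (void *) -1)
        return NULL;
    memcpy(p, msg, len);
    last_abort_msg = p;
    return last_abort_msg;
}
```

The design choice is the one described above: allocation is trivially safe even with a corrupted heap, at the cost of leaking one message per survived abort.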

We could call into some internal bit of malloc that assumes the lock has already been taken, but that's dangerous: we're already at this point _because_ malloc's bookkeeping is corrupted.

We could use alloca, but then you'd get no record of the crash if the app does catch SIGABRT.

We could use mmap, but then you'd leak maps instead of leaking heap.
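A sketch of the mmap option, under the same caveats (hypothetical names, single-threaded, no real glibc structures). Each message gets its own anonymous mapping that records its own length; this version unmaps the previous message rather than leaking it, which assumes nothing else still points at the old text.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical record: mapping length stored alongside the text. */
struct abort_msg_map {
    size_t map_len;
    char text[];
};

static struct abort_msg_map *last_abort_map;

/* Copy msg into a fresh anonymous mapping, bypassing malloc entirely,
 * then release the previous mapping (if any). Skipping the munmap()
 * would leak one mapping per abort. Returns the stored text, or NULL
 * if mmap() fails. */
static const char *store_abort_msg_mmap(const char *msg)
{
    size_t len = strlen(msg) + 1;
    size_t total = offsetof(struct abort_msg_map, text) + len;
    void *p = mmap(NULL, total, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    struct abort_msg_map *m = p;
    m->map_len = total;
    memcpy(m->text, msg, len);

    struct abort_msg_map *old = last_abort_map;
    last_abort_map = m;
    if (old != NULL)
        munmap(old, old->map_len);
    return m->text;
}
```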

Or we could use a static buffer in .bss, but then that's additional data space in every process.
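For comparison, a sketch of the static .bss option. ABORT_MSG_CAP (256 here, an arbitrary illustrative size) and store_abort_msg_static are hypothetical; the fixed size means long messages are truncated, which is the price of needing no allocation at crash time.

```c
#include <string.h>

/* Zero-initialized static storage lands in .bss: no allocation is
 * needed when crashing, but every process pays for the space. */
#define ABORT_MSG_CAP 256
static char abort_msg_buf[ABORT_MSG_CAP];

/* Copy msg into the static buffer, truncating if needed and always
 * NUL-terminating. A repeated abort overwrites the old text, so
 * nothing leaks and nothing needs freeing. */
static const char *store_abort_msg_static(const char *msg)
{
    size_t len = strlen(msg);
    if (len >= ABORT_MSG_CAP)
        len = ABORT_MSG_CAP - 1;
    memcpy(abort_msg_buf, msg, len);
    abort_msg_buf[len] = '\0';
    return abort_msg_buf;
}
```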

But really, at this point in a process' death throes, who cares.  Allocate with sbrk because it's easy.  Anyone trying to survive from a SIGABRT is already in a state of sin.

Comment 12 Andreas Schwab 2011-01-10 09:23:13 UTC

*** Bug 664365 has been marked as a duplicate of this bug. ***

Comment 20 Eric Bachalo 2011-02-24 21:57:11 UTC

Moving to RHEL 6.2 release, as no fix is upstream yet.  This will need to be fixed upstream before it is considered for a RHEL release.

Comment 30 Andreas Schwab 2011-06-03 13:45:03 UTC


*** This bug has been marked as a duplicate of bug 676591 ***
